Re: [HACKERS] How to determine that a TransactionId is really aborted?
> On Oct 22, 2017, at 3:24 PM, Peter Geoghegan wrote:
>
> On Sun, Oct 22, 2017 at 2:19 PM, Eric Ridge wrote:
>> I'm looking for the status as any concurrent open transaction might see it.
>> For example, if any concurrent transaction might see it as "in progress",
>> that's what I'd want returned. Does that make sense?
>
> Maybe, but note that that's fundamentally something that can become
> stale immediately. And, just because an MVCC snapshot cannot see a row
> does not mean that it cannot affect its transaction/statement in some
> other way (e.g. unique index enforcement).

Sure, but I don't think I care if it becomes stale immediately. If I ask "now" and PG says "in progress" but it then aborts a few cycles later, I'll just ask again sometime in the future. It's not like a transaction is going to go from "aborted" back to "in progress" -- geez, at least I hope not!

I think the question I want to answer is "will all active and future transactions see this TransactionId as aborted at the time they started?"

> Again, you'll probably need to put this low level requirement into
> context if you want sound advice from this list.

I'm just thinking out loud here, but the context is likely something along the lines of externally storing all transaction ids, and periodically asking Postgres if they're known-to-be-aborted-by-all-transactions -- one at a time.

eric

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to determine that a TransactionId is really aborted?
> On Oct 22, 2017, at 2:50 PM, Jaime Casanova wrote:
>
> so, what you want is txid_status() [1]... while this is new in v10 you
> can use the code as guide or just migrate to v10 ;)

Oh neat, thanks. Doesn't that tell you the status relative to the transaction calling txid_status()? I'm looking for the status as any concurrent open transaction might see it. For example, if any concurrent transaction might see it as "in progress", that's what I'd want returned. Does that make sense?

That's why I was thinking GetOldestXmin() was the right thing to use rather than GetActiveSnapshot()->xmin, and also why I thought to check TransactionIdPrecedes() first.

I am curious about the lock on ClogTruncationLock... could any of the TransactionIdDidXXX calls lie without that lock? I haven't seen such a thing used in the 9.3 sources. Maybe it's necessary for 10, or maybe I just missed it in 9.3?

Thanks for your time!

eric
Re: [HACKERS] How to determine that a TransactionId is really aborted?
> On Oct 22, 2017, at 1:50 PM, Peter Geoghegan wrote:
>
> On Sun, Oct 22, 2017 at 12:23 PM, Eric Ridge wrote:
>> Can anyone confirm or deny that this is correct? I feel like it is correct,
>> but I'm no expert.
>
> What are you going to use the code for? I think that that context is
> likely to matter here.

I'm not exactly sure yet, but I'm thinking about storing transaction ids externally and then regularly poking Postgres to see which ones are aborted-and-no-longer-considered-visible so I can clean up my external list.

eric
[HACKERS] How to determine that a TransactionId is really aborted?
When sitting inside an extension, and given an arbitrary TransactionId, how can you determine that it aborted/crashed *and* that no other active transaction thinks it is still running?

I've tried to answer this question myself (against the 9.3 sources), and it seems like it's just:

    {
        TransactionId oldestXmin = GetOldestXmin(false, false);
        TransactionId xid = 42;

        if (TransactionIdPrecedes(xid, oldestXmin) &&
            !TransactionIdDidCommit(xid) &&
            !TransactionIdIsInProgress(xid))    /* not even sure this is necessary? */
        {
            /* xid is aborted/crashed and no active transaction cares */
        }
    }

Can anyone confirm or deny that this is correct? I feel like it is correct, but I'm no expert.

Thanks so much for your time!

eric
Re: [HACKERS] How to get the 'ctid' from a record type?
On Sat, Mar 11, 2017 at 2:14 PM Andres Freund wrote:
> On 2017-03-11 04:31:16 +0000, Eric Ridge wrote:
> > Well shoot. That kinda spoils my plans.
>
> I think you should elaborate on what you're trying to achieve -
> otherwise our advice will be affected by the recent, widely reported,
> crystal ball scarcity.

What I'm trying to do is port https://github.com/zombodb/zombodb to Postgres 9.6+. It's an Access Method that stores full rows, encoded as JSON, in Elasticsearch instead of in local storage.

It was fairly straightforward to do the mechanical work to convert it to use 9.6's new AM API (which is very nice, btw!), but the fact that 9.6 also disallows including system columns (specifically ctid) has me turned upside down.

With <9.6, I was able to cook up a scheme where it was able to answer queries from the remote Elasticsearch index even when Postgres decided to plan a sequential scan. That hinged, mostly, on being able to create a multi-column index where the first column was a function call that included as an argument (among other things) the ctid system column. The ability to answer sequential scans (and filters) using the remote ES index is pretty important, as the knowledge of how to do that exists in Elasticsearch, not my custom operator function in Postgres.

Anyways, I've been trying to find a way to intuit the ctid system column value with 9.6, and it's clear now that that just isn't possible. The closest I got was digging through ActivePortal->queryDesc->estate->es_tuple, but that only works when it's a real tuple, not one that's virtual or minimal.

I'm pretty sure that I need to be implementing a Custom Scan Provider instead, and I've been spending time with that API too. There's a pretty steep learning curve for me, but I'll eventually get over that hump.
I could probably bore you with greater detail, but basically I want to take:

    CREATE INDEX idxfoo ON table USING zombodb (zdb(table), zdb_to_json(table)) WITH (url='http://remote.ip.addr:9200/');

    SELECT * FROM table WHERE zdb(table) ==> 'some full text query' OR id = 42;

And have the "zdb(table) ==> 'some full text query'" bit be answered by my extension, regardless of how PG wants to plan the query. While I was able to hack something together for <9.6, I think that means a Custom Scan Provider now?

eric
Re: [HACKERS] How to get the 'ctid' from a record type?
>> I suspect the tuple at (0,1) has been the subject of a failed update.

Yep.

> Your problem here is that you're mistaking the t_ctid field of a tuple
> header for the tuple's address. It is not that; it's really just garbage
> normally, and is only useful to link forward to the next version of the
> row from an outdated tuple. I think we do initialize it to the tuple's
> own address during an INSERT, but either a completed or failed UPDATE
> would change it.

Thanks. That helps clarify the comments in htup_details.h, actually.

> I do not think there is any way to get the true address of a heap tuple
> out of a composite Datum manufactured from the tuple. Most of the other
> system columns can't be gotten from a composite Datum either, because of
> the field overlay in HeapTupleHeaderData's union t_choice.

Well shoot. That kinda spoils my plans.

What about this? Is the tuple currently being evaluated (I suppose in the case of a sequential scan) available in the context of a function call?

Thanks again for your time! It's much appreciated.

eric
[HACKERS] How to get the 'ctid' from a record type?
This is about Postgres 9.6...

I have a very simple 1-arg function, in C, that I want to return the ctid of the record passed in. Example:

    CREATE OR REPLACE FUNCTION foo(record) RETURNS tid LANGUAGE c IMMUTABLE STRICT AS 'my_extension';

Its implementation is simply:

    Datum foo(PG_FUNCTION_ARGS)
    {
        HeapTupleHeader td = PG_GETARG_HEAPTUPLEHEADER(0);

        PG_RETURN_POINTER(&td->t_ctid);
    }

What I'm seeing is that the ctid returned from this function isn't always correct:

    # select ctid, foo(table) from table limit 10;
      ctid  |    foo
    --------+-----------
     (0,1)  | (19195,1)    -- not correct!
     (0,2)  | (0,2)
     (0,3)  | (0,3)
     (0,4)  | (0,4)
     (0,5)  | (0,5)
     (0,6)  | (0,6)
     (0,7)  | (0,7)
     (1,1)  | (1,1)
     (1,2)  | (1,2)
     (1,3)  | (1,3)
    (10 rows)

I've spent hours tracing through the PG sources trying to figure out why and/or find a different way to do this, but I've got nothing. Of the various examples in the PG sources that use PG_GETARG_HEAPTUPLEHEADER, or otherwise deal with a HeapTupleHeader, I can't find any that want to access a system column, so I'm starting to think I'm just doing it wrong entirely.

Can anyone point me in the right direction?

Thanks for your time!

eric
Re: [HACKERS] Example Custom Scan Provider Implementation?
On Tue, Mar 7, 2017 at 6:39 PM Amit Langote wrote:
>
> Here you go: https://github.com/kaigai/ctidscan

Thanks for the link, Amit! I think that'll get me bootstrapped!

> This was proposed originally [1] to go into contrib/, but that didn't
> happen somehow.

Too bad. :(

eric
[HACKERS] Example Custom Scan Provider Implementation?
Hi all!

Does anyone know of a simple, example Custom Scan Provider implementation for 9.6+?

I found pg_strom by searching GitHub. Its gpuscan.c looks like maybe it implements a pattern similar to what I want to do, but there's a lot of extraneous (to me) stuff to parse through.

I'm kinda surprised there isn't an example in contrib/, actually.

Thanks for your time!

eric
Re: [HACKERS] Disallow unique index on system columns
On Wed, Apr 20, 2016 at 9:24 PM Tom Lane wrote:
>> SELECT FROM table WHERE my_func('table', ctid) ==> 'index condition'
>
> Um, why's the ctid important here, or perhaps more directly, what is
> it you're really trying to do?

This function is defined as my_func(regclass, tid) and simply returns the tid value passed in. The operator is defined as ==>(tid, text).

Behind the scenes, the AM is actually backed by Elasticsearch, and the tuple's ctid value is used as the "_id" in ES. When Postgres decides to plan a sequential scan (or filter) to answer WHERE clause conditions that use the ==>(tid, text) operator, the AM isn't involved, but I still need to use the remote Elasticsearch server to answer that condition. So I came up with this "creative" approach to provide enough context in the query plan for me to figure out a) which table is being used and b) which physical row is being evaluated in the seqscan or filter.

When the operator's procedure is called, I notice that it's the first time I've seen the FuncExpr on the LHS, go query Elasticsearch with the text query from the RHS, build a hashtable of the matching ctids, and look up the LHS's value in the hashtable. If it exists, the row matches.

There just didn't seem to be enough context in the FunctionCallInfo of the operator's procedure to figure this out without something in the query that's basically statically determined at parse time. I suspect what I should be doing for this particular problem is taking advantage of the Custom Scan API, but I'm trying to support PG 9.3.

> We weren't planning to do that.

Great!

eric
Re: [HACKERS] Disallow unique index on system columns
On Sat, Apr 16, 2016 at 12:14 PM Tom Lane wrote:
>
> Pushed. I moved the check into DefineIndex, as that's where user-facing
> complaints about indexes generally ought to be.

If you're planning on back-patching this, please don't. :) It'll literally ruin my life.

I've got an extension that's actually a custom Access Method, and for reasons that are probably too boring to go into here, it requires that the first column in the index be a function that takes the ctid. Ie, something akin to:

    CREATE INDEX idx ON table (my_func('table', ctid), other_func(table));

The AM implementation itself doesn't actually use the result of my_func(), but that construct is necessary so I can detect certain queries that look like:

    SELECT FROM table WHERE my_func('table', ctid) ==> 'index condition'

I don't mind that you're changing this for 9.6... 9.6 is going to change so much other stuff around custom AMs that I'll deal with it when the time comes, but back-patching this into 9.3/4/5 would make life very difficult.

Thanks for listening!

eric
Re: [HACKERS] Given a view relation OID, how to construct a Query?
On Wed, Dec 9, 2015 at 5:07 PM Tom Lane wrote:
>
> FWIW, it's exposed in 9.4 and up. But in older branches you could
> probably just copy it, it's not that big.

That's good to know, thanks. I did copy it, and it's almost 3x faster than going through SPI. Thanks again for the pointer.

eric
Re: [HACKERS] Given a view relation OID, how to construct a Query?
On Wed, Dec 9, 2015 at 4:04 PM Tom Lane wrote:
> Eric Ridge writes:
>> I'm doing some extension development (in C) and have a situation where I
>> need to examine the target list of a view, but all I have is the view's oid.
>
> Open the relation and use get_view_query(), perhaps.

I figured there was something simple, but I couldn't find it. Thanks!

Sadly, it's static.

eric
[HACKERS] Given a view relation OID, how to construct a Query?
I'm doing some extension development (in C) and have a situation where I need to examine the target list of a view, but all I have is the view's oid.

An approach that works is (pseudocode):

    SPI_connect();
    "SELECT ev_action FROM pg_catalog.pg_rewrite WHERE rulename = '_RETURN' and ev_class=?oid";
    Query *query = linitial(stringToNode(ev_action));
    ...
    SPI_finish();

I backed into this by tracing through pg_getviewdef(). Is there a more direct way to do this without going through SPI?

I also looked at using postgres.c#pg_analyze_and_rewrite() against a query like "SELECT * FROM viewname", but the target list of the actual query wasn't what I was expecting (individual entry tags don't match those of the SPI approach above).

Thanks for your time!

eric
Re: [HACKERS] Weirdness using Executor Hooks
On Thu, Jun 18, 2015 at 8:07 PM, Eric Ridge wrote:
>
> According to pg_proc.probin, all 32 of my C language functions point
> to $libdir/my_extension_lib,

That's a damn lie. The deployment server has a number of databases that have this extension installed, and some of them have probin set to just 'my_extension_lib'.

Of course I can't seem to trigger the problem, even by changing the extension, re-compiling, and re-installing. #dang

eric
Re: [HACKERS] Weirdness using Executor Hooks
On Thu, Jun 18, 2015 at 6:04 PM, Tom Lane wrote:
> What we were expecting I guess is that such functions would be declared
> to reference the library underneath $libdir/plugins, not that you'd use
> a symlink.

According to pg_proc.probin, all 32 of my C language functions point to $libdir/my_extension_lib, which makes sense because as part of the extension .sql I declare them as LANGUAGE c AS 'MODULE_PATHNAME', and MODULE_PATHNAME gets substituted by make.

So are you saying that we should instead declare them AS '$libdir/plugins/my_extension_lib' so that it matches what's happening in local_preload_libraries? And is it safe to directly update pg_proc.probin (and ya know, terminate all existing backends)?

Thanks so much for your time!

eric

ps, I think if we just changed our deploy process to terminate existing backends this would just disappear, yeah?
Re: [HACKERS] Weirdness using Executor Hooks
On Thu, Jun 18, 2015 at 5:39 PM, Tom Lane wrote:
> BTW, if it didn't occur to you already --- next time you see this,
> you could try examining dfmgr.c's DynamicFileList data structure
> using gdb. That might help illuminate what Postgres thinks it
> loaded and why.

Hmm, okay. I'll leave the check out of the hook functions so that when this does happen again I can do what you suggest.

thanks!

eric
Re: [HACKERS] Weirdness using Executor Hooks
On Thu, Jun 18, 2015 at 5:30 PM, Tom Lane wrote:
>
> If that's broken, then so are most of our contrib modules.
> Certainly none of them have an extra check as suggested by Andres.

I'm putting it in anyways, along with a guard in the actual hook function to ERROR if prev_hook == my_hook. At least that'll avoid the endless loops until I can figure this out.

>> As a data point, that might be interesting to know, but I'd still be
>> scratching my head about how it happened. Postgres doesn't load an
>> extension library more than once per backend session, does it?
>
> It's not supposed to, and AFAICS internal_load_library() will treat
> either an exact pathname match or an inode-number match as being
> "already loaded". I wonder if you might be doing something that
> confuses those checks. It does not look like we try terribly hard
> to canonicalize library pathnames --- might you have some references
> under different relative paths, for instance? The inode number
> check would perhaps fail if you'd installed a new library version,
> but it's unclear to me why the pathname check would fail.

According to the docs, anything listed in 'local_preload_libraries' has to be in $libdir/plugins (whereas extensions just go to $libdir), so rather than duplicate the .so, the "copy" in $libdir/plugins is just a symlink to ../my_extension.so. Could that be confusing things?

eric
Re: [HACKERS] Weirdness using Executor Hooks
On Thu, Jun 18, 2015 at 4:42 PM, Andres Freund wrote:
>
> How does your _PG_init() actually look like?

    static ExecutorStart_hook_type prev_ExecutorStartHook = NULL;
    static ExecutorEnd_hook_type prev_ExecutorEndHook = NULL;

    void _PG_init(void)
    {
        prev_ExecutorStartHook = ExecutorStart_hook;
        prev_ExecutorEndHook = ExecutorEnd_hook;
        ExecutorStart_hook = my_executor_start_hook;
        ExecutorEnd_hook = my_executor_end_hook;
        RegisterXactCallback(my_xact_callback, NULL);
    }

> I'd suggest adding an error check that prev_ExecutorStartHook is
> unequals to your function when you assign it.

As a data point, that might be interesting to know, but I'd still be scratching my head about how it happened. Postgres doesn't load an extension library more than once per backend session, does it?

> Did you check whether it's compiler/compiler flag dependant?

I didn't. As far as compiler flags go, the extension is just inheriting Postgres' defaults. Any suggestions around which flag(s) to adjust?

I appreciate your time!

eric
[HACKERS] Weirdness using Executor Hooks
I've written an extension that hooks ExecutorStart_hook and ExecutorEnd_hook. The hooks are assigned in _PG_init() (and the previous ones saved to static vars) and reset to the previous values in _PG_fini(). Maybe also of interest is that the extension library is set in postgresql.conf via local_preload_libraries. This is with Postgres 9.3.4.

What happens is that rarely (and of course never on my development machine), the saved "prev_ExecutorXXXHook" gets set to the current value of ExecutorXXX_hook, so when my hook function is called:

    static void my_executor_start_hook(QueryDesc *queryDesc, int eflags)
    {
        executorDepth++;

        if (prev_ExecutorStartHook)   /* this ends up equal to my_executor_start_hook, so it recurses forever */
            prev_ExecutorStartHook(queryDesc, eflags);
        else
            standard_ExecutorStart(queryDesc, eflags);
    }

it endlessly loops on itself (ie, prev_ExecutorStartHook == my_executor_start_hook). Based on GDB backtraces, it looks like gcc compiles this into some form of tail recursion, as the backtraces just sit on the line that calls prev_ExecutorXXXHook(...). The backend has to be SIGKILL'd.

I've followed the patterns set forth in both the 'auto_explain' and 'pg_stat_statements' contrib extensions, and I've been over my code about 3 dozen times now, and I just can't figure out what's going on. Clearly the hooks are known-to-work, and I'm stumped.

One theory I have is that I've got a bug somewhere that's overwriting memory, but it's quite the coincidence that only the two saved prev hook pointers are being changed, and changed to very specific values.
Since it only happens rarely (and never for me during development), another theory is based on the fact that this extension is under pretty constant development/deployment, and when we deploy a new binary (and run ALTER EXTENSION UPDATE) we don't restart Postgres. So maybe the already-active-and-initialized-with-the-previous-version backends are getting confused (maybe the kernel re-mmaps the .so or something, I dunno?). I always seem to hear about the problem after a backend has been endlessly spinning for a few days. :(

Have any of y'all seen anything like this, and could I be on the right track with my second theory?

*scratching head*,

eric
Re: [HACKERS] Change pg_cancel_*() to ignore current backend
On Fri, May 22, 2015 at 4:51 PM Jim Nasby wrote:
> Interesting idea. I suspect that would be even more invasive than
> modifying the functions though...

Here's the solution. I can't see how anyone could possibly disagree with this... ;)

Change the sort order for pg_stat_activity so the current session sorts last. That way all the other sessions get killed first when doing:

    select pg_terminate_backend(pid) from pg_stat_activity;

When I get bitten by this, I don't really care that my session gets killed; I care that it gets killed before all the others. Personally, I only do this sort of thing interactively via psql, and typically ^D as soon as the query finishes (which means the second time I run psql I add a WHERE clause to the query).

Alternatively, queue up pg_cancel/terminate_backend calls and process them only on successful transaction commit, in such an order that the current session is processed last. They'd be consistent with NOTIFY too, which might be an added bonus.

eric

ps, sorry my last message was from corporate email w/ the stupid "legal" disclaimer.
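[For reference, the WHERE clause being alluded to is the standard workaround today. An illustrative query (note that an ORDER BY would not help here, since the pg_terminate_backend() calls fire during the scan, before any sorting happens):]

```sql
-- Terminate every session except the one issuing the command:
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE pid <> pg_backend_pid();
```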
Re: [HACKERS] Change pg_cancel_*() to ignore current backend
> On May 19, 2015, at 6:59 PM, Jim Nasby wrote:
>
> I find it annoying to have to specifically exclude pg_backend_pid() from
> pg_stat_activity if I'm trying to kill a bunch of backends at once, and I
> can't think of any reason why you'd ever want to call a pg_cancel_* function
> with your own PID.

I'm just a lurker, but regularly get bitten by this exact thing.

Rather than change the behavior of pg_cancel/terminate_backend(), why not change pg_stat_activity to exclude the current session? Seems like showing a row in pg_stat_activity for "SELECT * FROM pg_stat_activity" is kinda useless anyways.

eric

PROPRIETARY AND COMPANY CONFIDENTIAL COMMUNICATIONS
The information contained in this communication is intended only for the use of the addressee. Any other use is strictly prohibited. Please notify the sender if you have received this message in error. This communication is protected by applicable legal privileges and is company confidential.
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
On Fri, Oct 31, 2014 at 6:07 PM, Tom Lane wrote:
> I don't know if/when that will happen as such, but Simon was making noises
> about writing code to treat hash indexes as unlogged automatically, which
> would more or less fix the worst risks. That's not just a special case
> for hash indexes, but any index AM that lacks WAL support, as third-party
> AMs might well do.

As someone writing a 3rd-party AM, literally right this moment, do you have a link to that thread? While I follow this list fairly closely, I don't remember seeing this. I'd love to understand the thoughts around handling extension-based AMs.

eric
[HACKERS] pg_get_indexdef() doesn't quote string reloptions
Hi all!

I've been working on implementing a custom index using the Index Access Method API and have the need for custom reloptions that are "complex" strings (ie, also contain non-alphanumerics).

pg_get_indexdef() and pg_dump don't quote the reloption values, making a restore (or cut-n-paste of the pg_get_indexdef() output) impossible if the reloption value contains non-alphanumerics.

For example, the statement:

    # CREATE INDEX idxfoo ON table USING myindex (col) WITH (option = 'some complex string');

cannot be restored, as it gets rewritten as:

    CREATE INDEX idxfoo ON table USING myindex (col) WITH (option = some complex string);

(note the lack of quotes around the option value)

Looks like (at least) ruleutils.c:flatten_reloptions() needs to be smarter.

eric
Re: [HACKERS] Thoughts on "SELECT * EXCLUDING (...) FROM ..."?
On Tue, Nov 1, 2011 at 11:47 PM, Tom Lane wrote:
> Doesn't mean anything, I think --- the SQL standard seems to exclude it.
> It's fairly hard to prevent it at the grammar level, since we regard
> "foo.*" as a type of primitive expression, but I suspect it might be a
> good idea for transformTargetList to throw an error instead of silently
> ignoring the column label.

Let me take a stab at fixing it in the grammar... if it's even do-able. If it can be fixed at the grammar, it'll get me a lot closer to doing what I want to do with the actual feature.

eric
Re: [HACKERS] Thoughts on "SELECT * EXCLUDING (...) FROM ..."?
On Tue, Nov 1, 2011 at 1:33 PM, Pavel Stehule wrote:
>> COPY (SELECT * EXCLUDING (a, b, c) FROM ) TO 'somefile.csv' WITH
>> CSV;
>
> sorry, I don't accept it. I am able to understand your request for
> adhoc queries. But not for COPY.

I apologize if that example was confusing. I wasn't suggesting expanding COPY's syntax, either. I was merely pointing out that if "EXCLUDING(…)" were implemented, you'd be able to use it within the query given to the COPY command.

eric

ps, it seems my messages aren't hitting the list? weird.
Re: [HACKERS] Thoughts on "SELECT * EXCLUDING (...) FROM ..."?
On Tue, Nov 1, 2011 at 12:24 PM, Pavel Stehule wrote:
> some other idea - but only for psql
>
> we can define a special values, that ensure a some necessary
> preexecution alchemy with entered query
>
> \pset star_exclude_names col1, col2, col3
> \pset star_exclude_types xml, bytea, text(unlimited)

Sure, something like that could be useful too. It might be confusing to users if they forget that they set an exclusion list, but there's probably ways to work around that.

However, the nice thing about the feature being in SQL is that you can use it from all clients, and even in other useful ways. COPY would be an example (something I also do frequently):

    COPY (SELECT * EXCLUDING (a, b, c) FROM ) TO 'somefile.csv' WITH CSV;

Right now, if you want to exclude a column, you have to list all the others out manually, or just dump everything and deal with it in an external tool.

I generally agree with everyone that says using this in application code is a bad idea, but I don't think that's reason alone to reject the idea on its face.

eric
Re: [HACKERS] Thoughts on "SELECT * EXCLUDING (...) FROM ..."?
On Tue, Nov 1, 2011 at 12:03 PM, Stephen Frost wrote:
> Note- I haven't looked at the * production or tried to do anything w/ gram.y
> to support this yet, but it's a heck of a lot shorter..

My original thought, that I probably didn't explain too clearly, was to make the "EXCLUDING (...)" bit a modifier to the A_Star node. The idea being that you could write "* EXCLUDING (...)" anywhere you can currently write "*".

It's dead simple for the case of:

    SELECT * FROM ...

but because of the "indirection" productions in gram.y, it's literally impossible for:

    SELECT tablename.* FROM ...

without possibly breaking existing queries.

Nonetheless, even if it were only available for the first form, it would be very useful. For the ad-hoc type stuff I do, it'd still be great to write:

    SELECT * EXCLUDING (x.a, x.b, x.c) FROM (SELECT x);

I've already got gram.y working the way it needs to, and I've started to get the exclusion list passed into the places it needs to go. If y'all would be willing to accept it in this "limited" form, I'll continue to work on it.

eric
Re: [HACKERS] Thoughts on "SELECT * EXCLUDING (...) FROM ..."?
On Sun, Oct 30, 2011 at 3:17 PM, Tom Lane wrote:
> "fails to not break anything else" category.

From what I've seen watching this list, you're usually right. :)

It looks like it's perfectly okay to write:

    SELECT pg_class.* AS foo FROM pg_class;

(with or without the AS)

I don't know what the above actually means, but it stops "SELECT pg_class.* EXCLUDING(...)" dead in its tracks. So, I'd have to break a syntax (albeit silly?) that currently works to support this. :(

eric
Re: [HACKERS] Thoughts on "SELECT * EXCLUDING (...) FROM ..."?
On Sun, Oct 30, 2011 at 4:43 PM, Darren Duncan wrote:
> SELECT foo.* EXCLUDING foo.x, bar.* EXCLUDING bar.y, baz.z, (a+b) AS c FROM ...
>
> Is that where you're going with this?

Yes. It's basically a modifier to the star that immediately precedes it. In order to support excluding multiple columns, it needs parens:

    SELECT foo.* EXCLUDING (foo.x, foo.y), bar.* EXCLUDING (bar.y), baz.z, (a+b) AS c

but yes, that's what I'm thinking.

I think doing this will require more changes to the grammar than I had first thought, because there'd be no point in supporting:

    SELECT foo.* EXCLUDING (foo.* EXCLUDING foo.y) FROM ...

It looks like the above would be implicitly allowed without a bit of extra work. But, if you've got a complex query consisting of a few joins, it'd be nice to say:

    SELECT * EXCLUDING (table1.*, table2.x) FROM table1 INNER JOIN table2 ...

> If so, I think that would make the feature even more valuable and more
> syntactically clean than I had previously thought.

I don't actually like the term "EXCLUDING", but it conveys what's happening and is already defined as a keyword. I thought about "EXCEPT", but that doesn't work for obvious reasons, and "NOT" might just be confusing.

eric
Re: [HACKERS] Thoughts on "SELECT * EXCLUDING (...) FROM ..."?
On Sun, Oct 30, 2011 at 4:03 PM, Tom Lane wrote: > That's just a gut feeling, I've not tried it ... but the proposed > syntax sure looks a lot like a call to a function named EXCLUDING. I think what makes it okay is that its new use is only defined to immediately follow an asterisk in the "target_el" production. If you look at gram.y:11578 (from git HEAD), I was thinking this:

    | a_expr
        {
            $$ = makeNode(ResTarget);
            $$->name = NULL;
            $$->indirection = NIL;
            $$->val = (Node *)$1;
            $$->location = @1;
        }
    +| '*' EXCLUDING '(' columnref_list ')'
    +    {
    +        /** make magic happen */
    +    }
    | '*'
        {
            ColumnRef *n = makeNode(ColumnRef);
            n->fields = list_make1(makeNode(A_Star));
            n->location = @1;

            $$ = makeNode(ResTarget);
            $$->name = NULL;
            $$->indirection = NIL;
            $$->val = (Node *)n;
            $$->location = @1;
        }

And it looks like something similar would be necessary in the "indirection_el" production, around line 11478. But that might be overly simplistic (and wrong). eric -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Thoughts on "SELECT * EXCLUDING (...) FROM ..."?
On Sun, Oct 30, 2011 at 3:38 PM, Mark Mielke wrote: > Stupid question: > > Is this just a display thing? Well, it's a "display thing" as much as any SELECT statement (especially via psql) is a "display thing". It's more like "I want all 127 columns, except the giant ::xml column, and I'm too lazy to type each column name out by hand". > Or does this have impact for things such as COUNT(*) vs COUNT(1)? If it does, it's broken. > Is it "like a view, but on the fly"? Naw, it's just short-hand for omitting columns from the output target list. As I'm envisioning the feature, it would be SQL-level syntax, so you could bake it into a view, but... > 1) Not standards compliant, Sure, no doubt. I think the "::typename" syntax is not standards compliant either, and I bet I'm not the only Postgres user to use that every day. But I secretly tend to agree, I wouldn't necessarily want to use this in production/application code. > 2) Not deterministic (i.e. a database change might cause my code to break), Okay, I'm inventing a use-case here, but say you have a "users" table with various bits of metadata about the user, including password. Maybe, regardless of database changes, you never want the password column returned: SELECT * EXCLUDING (password) FROM tbl_users; Changes of omission can break your code just as easily. > 3) Working around a problem that maybe shouldn't exist in the first place? > It's like buying a rug, so that nobody sees the scratches on the floor. Sometimes, rugs are cheaper than new floors. eric -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Thoughts on "SELECT * EXCLUDING (...) FROM ..."?
On Sun, Oct 30, 2011 at 3:17 PM, Tom Lane wrote: > > It's entirely possible that it will get bounced on standards-compliance > grounds. And that's a perfectly valid reason to reject it. > In particular, I don't think it's acceptable to introduce a > new reserved keyword for this --- that would fall under the "fails to > not break anything else" category. Please correct me if I'm wrong, but if we choose the word carefully (which is why I chose EXCLUDING), I think we're okay? EXCLUDING is already defined as an "ordinary key word". And its new use in this situation seems to be completely unambiguous, such that you'd still be able to use "excluding" everywhere you already could. You know more about the grammar than I (or probably most anyone), so I'm wondering why you think it might need to be a "reserved keyword"? Alternatively, would it be okay to use an existing reserved keyword? eric -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Thoughts on "SELECT * EXCLUDING (...) FROM ..."?
On Sun, Oct 30, 2011 at 2:54 PM, Robert Haas wrote: > OTOH, I'm slightly afraid of how much work it would take to implement > this properly. I think first, the A_Star node struct will need to be expanded to include a List of qualified column references to exclude. From there, the "target_el" rule in gram.y will need to be expanded to support a syntax like: '*' EXCLUDING '(' columnref_list ')' { ... } I also think that the "indirection_el" rule will need to be expanded to support something similar. Together, that would let us write both: SELECT * EXCLUDING(table1.col1, table2.col1) FROM ... and SELECT table.* EXCLUDING(col1, col2) FROM ... or even SELECT * EXCLUDING(table1.col1), table2.* EXCLUDING(col1) FROM ... I think changing the "indirection_el" rule might have an impact on OLD/NEW, but I'm not sure. Is it legal to write OLD.*, and if so, would you also want to write OLD.* EXCLUDING (...) in those cases? I think this only applies to RULES or SQL-type trigger functions, but not pl/pgsql? Then it looks like touching various functions in src/backend/nodes/*.c to do the right things with the new exclusion list field in A_Star. I haven't traced through everything yet, but it looks like if the various places in src/backend/nodes/*.c are done correctly, then regurgitating a view definition or whatnot that includes this syntax will be automatic (i.e., no specific support required for pg_dump)? Anyways, at first I thought it would be about 8hrs of work just to get something working. Maybe it's more like 20, but even still, it seems fairly straightforward. eric -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Thoughts on "SELECT * EXCLUDING (...) FROM ..."?
On Sun, Oct 30, 2011 at 1:51 PM, Eric B. Ridge wrote: > eric > > PROPRIETARY AND COMPANY CONFIDENTIAL COMMUNICATIONS my bad. Switched email accounts without realizing. :( eric -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Thoughts on "SELECT * EXCLUDING (...) FROM ..."?
On Sat, Oct 29, 2011 at 6:35 PM, Stephen Frost wrote: >> Maybe the SQL spec says something about this and nobody's done the work yet? > > I don't know of anything like this in the spec. Also, there would be > concern about this possibly going against spec, breaking possibly valid > queries, promoting keywords to reserved words, and maybe ending up in a > bad situation if the SQL committee decides to support that kind of > syntax for something completely different. At least concerning breaking valid queries and promoting keywords, I don't think the former can happen (they'd fail to parse today) and the latter doesn't seem necessary as "EXCLUDING"'s use in this case appears to be completely unambiguous. However, I realize there's no second-guessing what the SQL committee might do in the future. > In general, I doubt this is something we'd implement, but others may > feel differently. I hope so. :) > What might be interesting to consider is how hard it > would be to make psql smarter when it comes to line editing in this > regard. Maybe if there was a way to easily expand the '*' from psql and > then you could remove the columns from the list easily..? Probably really dang hard, especially when you consider a "SELECT *" involving lots of joins. And even if it turned out to be easy, it would be limited to psql. Anyways, it's just something I've wanted for quite a while and thought I'd actually do the work to make it happen, *if* y'all would take it. eric -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Thoughts on "SELECT * EXCLUDING (...) FROM ..."?
Would y'all accept a patch that extended the "SELECT *" syntax to let you list fields to exclude from the A_Star? Quite regularly I'll be testing queries via psql and want to see all the columns from a fairly wide table except maybe a giant text or xml column. A syntax like: SELECT * EXCLUDING (big_col1, big_col2) FROM foo; would be pretty handy. It would definitely save some typing in certain cases. It seems like such a syntax would better document the intent of a query too, rather than leaving one wondering if "big_col1" was supposed to be omitted from the target list or not. Anyways, I just wanted to run the idea by youse guys before I put too much more effort into it. I've already made what appear to be the minimum necessary changes to gram.y, and a few quick greps through the code make me think the rest will be pretty easy. Maybe the SQL spec says something about this and nobody's done the work yet? Thanks for your input! eric -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bug in SignalSomeChildren
On Tue, Dec 21, 2010 at 2:33 PM, Robert Haas wrote: > The point of the patch was to improve cases where attaching gdb > *didn't* work well. Any cases where it was already working for you > aren't going to be made worse by this. Okay, great. Thanks for the clarification. eric -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bug in SignalSomeChildren
On Mon, Dec 20, 2010 at 3:36 PM, Martijn van Oosterhout wrote: > On Mon, Dec 20, 2010 at 03:08:02PM -0500, Robert Haas wrote: >> The attached patch appears to work correctly on MacOS X. I did check, >> BTW: getppid() in the attached process returns gdb's pid. Poor! > > This appears to be a BSDism at least. On Linux and BSD derivatives the > man pages specifically mention the reparenting (needed for catching > signals) but on Linux getppid() is specifically documented to return > the correct value anyway. I'm just a random lurker here, and happened to catch the last bit of this thread. Could one of you who understands this issue straighten something out for me? Every now and again we've been known to attach gdb to a production Postgres backend to troubleshoot problems. Ya know, just trying to get an idea of what Postgres is actually doing via a backtrace. This is always on Linux, BTW. Does this thread mean that the above no longer works with v9? Or is this only on non-Linux systems, or does the patch Robert Haas committed "fix" it? We're still using 8.1 (slowly moving to 8.4) in production, but have plans of picking up 9.x later in '11. Just wondering if we need to actually be a bit more careful in the future? Thanks! eric -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] User Defined Functions/AM's inherently slow?
I don't have the original thread in my inbox anymore, so for reference: http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=20129.1074484755%40sss.pgh.pa.us Tom Lane last said: I have committed a patch into CVS HEAD --- give it a try. It took me awhile, but I finally got around to trying this (by plopping it into my 7.4 sources), and yes, the hash lookup does significantly improve the performance of calling "C" functions. I agree with your commit message that they're now roughly on par with built-in functions. thanks! eric ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Question about indexes
On Jan 30, 2004, at 6:31 AM, Bruce Momjian wrote: I like the idea of building in-memory bitmapped indexes. Me too (FWIW)! This thread is really interesting as the whole idea would help to solve the biggest issue with my (currently stalled) project to integrate Xapian as a full-text search engine. Getting index scans used in all situations would drastically improve performance. eric ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] User Defined Functions/AM's inherently slow?
On Jan 18, 2004, at 1:34 AM, Christopher Kings-Lynne wrote: Theory B would be that there's some huge overhead in calling non-built-in functions on your platform. We do know that looking up a "C" function is significantly more expensive than looking up a "builtin" function, but there should only be half a dozen such calls involved in this test case; it's hard to credit that that takes 200 msec. Does the time drop at all on second and subsequent repetitions in a single backend run? Yes, it drops from about .680ms to the .250ish that I posted. I suppose I should try compiling this little stub into postgres, eh? What if you try the new preload_libraries (or whatever it's called) config variable in postgresql.conf in the 7.4 release? yes, that takes care of the initial load time of the library itself (bringing the initial .680ms back to .250ish for the first run), but this is not the problem. eric ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] User Defined Functions/AM's inherently slow?
On Jan 17, 2004, at 11:33 PM, Tom Lane wrote: The difference between "total runtime" and the top plan node's runtime has to represent plan startup/shutdown time. I'm suspicious that your stubs are somehow not initializing something, though on first glance I do not see what. I can't see anything either... which is why I brought it up. I'm still a noob with this stuff, and thought maybe I was just missing something. Theory B would be that there's some huge overhead in calling non-built-in functions on your platform. We do know that looking up a "C" function is significantly more expensive than looking up a "builtin" function, but there should only be half a dozen such calls involved in this test case; it's hard to credit that that takes 200 msec. Does the time drop at all on second and subsequent repetitions in a single backend run? Yes, it drops from about .680ms to the .250ish that I posted. I suppose I should try compiling this little stub into postgres, eh? eric ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] User Defined Functions/AM's inherently slow?
On Jan 17, 2004, at 11:27 PM, Tom Lane wrote: Eric Ridge <[EMAIL PROTECTED]> writes: costestimate: { PG_RETURN_VOID(); } This at least needs to set some values into the output parameters --- zeroes are okay, not setting them at all isn't. I'm surprised the planner doesn't go nuts. It looks from your EXPLAIN results like the values are coming up zero anyway, but seeing that cost_index() doesn't do anything to initialize those local variables, I'd expect fairly unpredictable behavior. I have tried setting them all to zero, and even using the backend/utils/adt/selfuncs.c:genericcostestimate() code (it's not exported, it seems), but no matter what sane (or crazy!) numbers I provide for the cost estimate, the "fact" remains, my AM, which does "nothing" is slower than the builtin btree AM. Could this just be related to the fact that my AM is in a .so, and there's just some operating system/C runtime overhead in calling functions in dynamically loaded libraries? eric ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] User Defined Functions/AM's inherently slow?
On Jan 17, 2004, at 10:22 PM, Tom Lane wrote: Eric Ridge <[EMAIL PROTECTED]> writes: I've created a stub AM that literally does nothing. It's not possible for an index AM to "do nothing", at least not for an indexscan. It has to return tuple pointers. What are you doing for that? I should have included the entire explain output:

    stub AM:
      Index Scan using idxa_stub on test2 (cost=0.00..2.68 rows=1 width=5) (actual time=0.002..0.002 rows=0 loops=1)
        Index Cond: (a ==> '1'::text)
      Total runtime: 0.247 ms

    builtin btree AM:
      Index Scan using idxa_btree on test2 (cost=0.00..4.68 rows=1 width=5) (actual time=0.024..0.026 rows=1 loops=1)
        Index Cond: (a = '1'::text)
      Total runtime: 0.060 ms

If the "actual time" numbers are really a measure of the amount of time spent in (at least) the index, it seems the stub should report a smaller "total runtime", but alas, it doesn't. eric ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] User Defined Functions/AM's inherently slow?
On Jan 17, 2004, at 10:22 PM, Tom Lane wrote: Eric Ridge <[EMAIL PROTECTED]> writes: I've created a stub AM that literally does nothing. It's not possible for an index AM to "do nothing", at least not for an indexscan. It has to return tuple pointers. What are you doing for that?

    costestimate:
    {
        PG_RETURN_VOID();
    }

    beginscan:
    {
        Relation index = (Relation) PG_GETARG_POINTER(0);
        int keysz = PG_GETARG_INT32(1);
        ScanKey scankey = (ScanKey) PG_GETARG_POINTER(2);
        IndexScanDesc scan = RelationGetIndexScan(index, keysz, scankey);
        PG_RETURN_POINTER(scan);
    }

    rescan:   { PG_RETURN_VOID(); }
    gettuple: { PG_RETURN_BOOL(false); }
    endscan:  { PG_RETURN_VOID(); }

I think the above is about as close to "nothing" as one can get. In trying to track down some performance issues with my real AM, I decided to make this stub AM just to see what the overhead is... then I started seeing these crazy results. eric ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] User Defined Functions/AM's inherently slow?
I've created a stub AM that literally does nothing. It indexes nothing. It scans for nothing. Nada. It does just enough work to specify return values that prevent PG from dumping core. What I've found is that this stub AM, compiled with the same options as postgres itself (-O2 -fno-strict-aliasing), is roughly 4 times slower than the built-in btree AM that actually does something useful! The test table contains 1 column, and 1 row:

    My stub AM:
      explain analyze select * from test where a ==> '1';
      Total runtime: 0.254 ms

    builtin btree AM:
      explain analyze select * from test where a = '1';
      Total runtime: 0.058 ms

(I ran each one a number of times, with basically the same results). What gives? *scratches head* I know btrees are efficient, but geez, can they really be more efficient than O(zero)? :) Looking at the backtrace from the beginscan function of each AM, PG doesn't appear to do anything different for user-provided AM's. My platform is OS X 10.3.2, using PG 7.4, gcc version 3.3 (Apple's build #1495). Any insight would be greatly appreciated. Thanks! eric ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: Postgres + Xapian (was Re: [HACKERS] fulltext searching via a custom index type )
Thanks to everyone that provided input about this idea. After taking everything into consideration and talking with Olly Betts from the Xapian project, what I'm going to do is implement a btree that supports multiple roots and a "tag" value of arbitrary length for each item in the tree. Then implement a new Xapian backend that uses it. In the end, it'll be much more efficient and much less dirty than this silly filesystem thing I have now. And hopefully, concurrency and WAL will be easier to deal with too. Having the existing nbtree AM as a guide will be very useful. I have one issue that could potentially make all of this completely useless, and it's related to Postgres deciding to use a Filter instead of an Index Scan. I asked this question last week, and Tom Lane responded with a solution, but I don't think I explained myself very well. And now that I've got a bit more experience with some of the PG internals, maybe I can ask the question more intelligently. Given this query:

    select id from table where document_text ==> 'food' and title ==> 'water';

Postgres generates this plan:

    Index Scan using idxtitle on table (cost=0.00..4.01 rows=1 width=8)
      Index Cond: (title ==> 'water'::text)
      Filter: (document_text ==> 'food'::text)

The problem is, the "document_text" column is *really* big. Average value length is 171k. With this query plan, my operator procedure is forced to re-parse the document_text column from each row returned by the index scan against the title column, and do a bunch of Xapian tricks for each row. The overhead is huge. The composite index solution doesn't seem ideal here because there's absolutely no way I can anticipate every combination of fields a user might choose to search. Forgetting about composite indexes for a moment, is postgres even capable of doing 2 index scans in this situation? Does it know how to take the intersection of two scans? 
My AM's cost estimate function literally sets the selectivity, correlation, total cost, and startup cost values to zero (and I've tried tons of combinations of really big, really small, and really negative values). My thought behind setting them all to zero was that if the cost function basically says, "There is no cost", I could fool Postgres into wanting to use the index everywhere it can. Sadly, this doesn't work out. Now, I realize I can fix my cost estimate function to return better costs for the title and document_text fields so that PG will instead decide to index scan on document_text, but what I really want to do is make PG do an index scan for each field. Can PG even do this? I'd appreciate any thoughts on this. Hopefully, I'm just missing something really obvious. thanks! eric ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: Postgres + Xapian (was Re: [HACKERS] fulltext searching via a custom index type )
On Jan 2, 2004, at 4:54 PM, Alvaro Herrera wrote: I think your approach is too ugly. You will have tons of problems the minute you start thinking about concurrency (unless you want to allow only a single user accessing the index) It might be ugly, but it's very fast. Surprisingly fast, actually. Concerning concurrency, Xapian internally supports multiple readers and only 1 concurrent writer. So the locking requirements should be far less complex than a true concurrent solution. Now, I'm not arguing that this is ideal, but if Xapian is a search engine you're interested in, then you've already made up your mind that you're willing to deal with 1 writer at a time. However, Xapian does have built-in support for searching multiple databases at once. One thought I've had is to simply create a new 1-document database on every INSERT/UPDATE beyond the initial CREATE INDEX. Then whenever you do an index scan, tell Xapian to use all the little databases that exist in the index. This would give some bit of concurrency. Then on VACUUM (or FULL), all these little databases could be merged back into the main index. and recovery (unless you want to force users to REINDEX when the system crashes). I don't yet understand how the WAL stuff works. I haven't looked at the API's yet, but if something you can record is "write these bytes to this BlockNumber at this offset", or if you can say, "index Tuple X from Relation Y", then it seems like recovery is still possible. If ya can't do any of that, then I need to go look at WAL further. I think one way of attacking the problem would be using the existing nbtree by allowing it to store the five btrees. First read the README in the nbtree dir, and then poke at the metapage's only structure. You will see that it has a BlockNumber to the root page of the index. Right, I had gotten this far in my investigation already. The daunting thing about trying to use the nbtree code is the code itself. It's very complex. 
Plus, I just don't know how well the rest of Xapian would respond to all of a sudden having a concurrent backend. It's likely that it would make no difference, but it's just an unknown to me at this time. Try modifying that to make it have a BlockNumber to every index's root page. You will have to provide ways to access each root page and maybe other nonstandard things (such as telling the root split operation what root page are you going to split), but you will get recovery and concurrency (at least to a point) for free. And I'm not convinced that recovery and concurrency would be "for free" in this case either. The need to keep essentially 5 different trees in sync greatly complicates the concurrency issue, I would think. thanks for your time! eric ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] fulltext searching via a custom index type
On Dec 26, 2003, at 4:04 PM, Tom Lane wrote: Eric Ridge <[EMAIL PROTECTED]> writes: Xapian has its own storage subsystem, and that's what I'm using to store the index... not using anything internal to postgres (although this could change). I would say you have absolutely zero chance of making it work that way. thanks for the encouragement! :) You will not be able to get it to interoperate reliably with transactions, checkpointing, or WAL replay; to say nothing of features we might add in the future, such as tablespaces and point-in-time recovery. You need to migrate all the data into the Postgres storage mechanism. And these are the things I'm struggling with now. The basic indexing and searching currently works flawlessly, but the moment another user connects up, everything goes to hell. It might be worth pointing out here that an index AM is not bound to use exactly the typical Postgres page layout. I think you probably do have to use the standard page header, but the page contents don't have to look like tuples if you don't want 'em to. For precedent see the hash index AM, which stores ordinary index tuples on some index pages but uses other pages for its own purposes. That's useful information. Thanks. I've been using the hash AM as my guide b/c it's not as complex as the other index types (at least on the public interface side). Obviously, I'm trying to save the time and energy of re-inventing the wheel when it comes to full text indexing and searching. Xapian is an awesome standalone engine (and it's amazingly fast too!), so it seemed like a good place to start. Its backend storage subsystem is pluggable, and after our little exchange here today, I'm now considering writing a postgres backend for Xapian. I assume the doc chapter on Page Files and the various storage-related README files are good places for more information. Any other tips or pointers? eric ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] fulltext searching via a custom index type
On Dec 26, 2003, at 3:22 PM, Tom Lane wrote: 3) How does one get the $PGDATA directory? DataDir. Why should you care? An index AM that wants to know this is probably broken IMHO; it's certainly trying to do something that's outside the charter of index AMs, and is likely to cause lots of headaches. Xapian has its own storage subsystem, and that's what I'm using to store the index... not using anything internal to postgres (although this could change). As such, Xapian needs to know *where* to save its indexes... $PGDATA seemed like a good place to start. 4) Can a function be registered as part of a transaction, pre-commit -- so the function can have an opportunity to abort the transaction. Why would that be a good idea? When exactly would you expect it to be called (relative to the other ninety-nine things that happen at commit)? How are you going to recover if something else aborts the transaction, either before or after your function runs? I don't really have an answer to this. :) I get the impression from your questions that you are trying to make an end run around the transaction mechanism. Perhaps. I'm actually fishing for ideas to bridge xapian's transaction facilities to postgres. Your comment confirms my suspicions that it just ain't gunna work out. This is almost certainly doomed to failure. You need to find a way of storing all your data within the ordinary index structure. You are probably right. :) I'm just playing around right now. I do appreciate your response, it's given me a lot to think about. eric ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[HACKERS] fulltext searching via a custom index type
(I started with this addressed to the -general list, but it just doesn't seem like a general usage topic. If -hackers is the wrong place, please point me in the right direction). I've been working on a custom index type (access method) that allows postgres to do fulltext searching (including phrase and proximity) that I believe is more flexible and natural than "tsearch2" (no offense to the tsearch2 guys... it's been a source of inspiration). I also plan on providing "term browsing" and hit-position information. I'm using Xapian (www.xapian.org) as the backend fulltext engine, and I'm trying to learn more of the innards of postgres as I go. I've only spent about 18-24 hours on this, so it's nowhere near complete, and I really hesitate to even mention it in a public forum. But if in the end it doesn't suck it might make a great contrib/ package. *shrug* who knows? In a nutshell, I've implemented all the necessary access method functions to teach postgres about a new index type named "xapian", and so far, everything is really great. It works just like any other index type would:

    create table test (stuff varchar(255), more_stuff text);
    create index idxstuff on test using xapian (stuff);
    create index idxmore_stuff on test using xapian (more_stuff);

    insert into test (stuff, more_stuff) values ('this is stuff', 'this is more stuff');
    insert into test (stuff, more_stuff) values ('my name is eric ridge', 'i like to drink beer');

    select * from test where stuff => 'stuff' or more_stuff => '"drink beer"';

             stuff         |      more_stuff
    -----------------------+----------------------
     this is stuff         | this is more stuff
     my name is eric ridge | i like to drink beer
    (2 rows)

All this aside, I've got some questions related to indexes and query plans. 1) Is it possible for an access method to receive some kind of "DROP INDEX" notification? As far as I can tell, the answer is "no", but it can't hurt to ask. 
2) When does the query planner decide that it needs to "Filter" results, and is there any way to turn this off or otherwise fool the query planner into NOT doing Filters (even if only for certain operators)? For example, this query:

    select * from test where stuff => 'stuff' AND NOT more_stuff => '"drink beer"';

has this plan:

    Index Scan using idxstuff on test (cost=0.00..-0.98 rows=250 width=177)
      Index Cond: ((stuff)::text => 'stuff'::text)
      Filter: (NOT (more_stuff => '"drink beer"'::text))

In this case, postgres is forced to re-parse the contents of the "more_stuff" field (via the defined procedure for the => operator) for every row returned by the previous index scan, just so it can determine if the field contains the phrase 'drink beer' or not. Since so much overhead is involved here, it would be cool if postgres could somehow do another index scan. Maybe there's some way for the operator function to know not only the Datum value, but the actual field (and ItemPointer)? 3) How does one get the $PGDATA directory? getenv() doesn't seem ideal since PGDATA can be specified as a command-line argument. 4) Can a function be registered as part of a transaction, pre-commit -- so the function can have an opportunity to abort the transaction. I've seen RegisterEOXactCallback, but it's not quite what I'm looking for. 5) Are there discussions of fulltext searching in the archives of this list (or other pgsql lists) that y'all think are worth reading? thanks in advance for your time and input! eric ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] [GENERAL] Can't Build 7.3.4 on OS X
On Sep 21, 2003, at 9:03 PM, Tom Lane wrote:
> Eric Ridge <[EMAIL PROTECTED]> writes:
>> any ideas here? 7.3.2 and 7.4beta3 compile just fine (I noticed that
>> 7.4 has something more cross-platform for tas). What happened in
>> 7.3.4 that broke it?
>
> That makes no sense at all --- AFAICT there were *no* darwin or ppc
> specific changes between 7.3.2 and 7.3.4. Can you double check?

Looks like I misspoke. 7.3.2 does not link on 10.3 either. I don't know if this makes things better or worse. Here's the output from 7.3.2; 7.3.4 says the same thing.

   gcc -no-cpp-precomp -O3 -D__APPLE__ -Wall -Wmissing-prototypes -Wmissing-declarations access/SUBSYS.o bootstrap/SUBSYS.o catalog/SUBSYS.o parser/SUBSYS.o commands/SUBSYS.o executor/SUBSYS.o lib/SUBSYS.o libpq/SUBSYS.o main/SUBSYS.o nodes/SUBSYS.o optimizer/SUBSYS.o port/SUBSYS.o postmaster/SUBSYS.o regex/SUBSYS.o rewrite/SUBSYS.o storage/SUBSYS.o tcop/SUBSYS.o utils/SUBSYS.o -lz -lreadline -ltermcap -lresolv -ldl -lm -o postgres
   ld: Undefined symbols:
   _tas
   make[2]: *** [postgres] Error 1

I purposely defined -O3 and -D__APPLE__ when I ./configure-d. With or w/o -D__APPLE__, things still fail.

I had a whole mess of things going wrong with my 10.3 beta box (including a hardware problem), and I must have gotten things confused. Again, 7.3.2 doesn't link either.

I don't think the OS X 10.3 betas are readily available (I've paid to be in Apple's developer program), so if you don't have access to 10.3 but have some idea as to what would cause this problem with tas, I'll do whatever I can to help test.

eric
Re: [HACKERS] [GENERAL] Can't Build 7.3.4 on OS X
On Sep 21, 2003, at 3:11 PM, Tom Lane wrote:
> BTW, is anyone interested in looking into whether we can be made to
> build without using either flag?  I tried it and saw a number of

I did this... before I knew about -no-cpp-precomp. :( I read all about -traditional-cpp in the gcc man page, but could never find the corresponding "not traditional cpp" flag.

It boiled down to two things: use of macros that used the token-pasting syntax, and whitespace around macro arguments.

Take src/include/nodes/nodes.h, around line 265, for example:

   #define makeNode(_type_) ((_type_ *) newNode(sizeof(_type_), T_##_type_))
   ...
   #define IsA(nodeptr, _type_) (nodeTag(nodeptr) == T_##_type_)

gcc 3.3 just didn't like this. So I had to fake it out:

   #define T_UNDER() T_
   #define makeNode(_type_) ((_type_ *) newNode(sizeof(_type_), T_UNDER()_type_))
   ...
   #define IsA(nodeptr, _type_) (nodeTag(nodeptr) == T_UNDER()_type_)

But it gets better. Apparently with gcc 3.3, whitespace around macro arguments is preserved! So, in the case of calls to (at least) the IsA macro:

   before:  if (IsA(foo, Short))
   after:   if (IsA(foo,Short))
                        ^- no space!

From what I could tell, the statement would be expanded into (using my re-defined version above):

   if (nodeTag(nodeptr) == T_ Short)

which of course isn't legal syntax b/c of the space. So I went through with some Perl and did a bunch of global substitutions on the files that gcc complained about. There were a few more than the above examples, but not too many.

> too.  It would be interesting to understand what the problem is.

There it is.

eric
Re: [HACKERS] [GENERAL] Can't Build 7.3.4 on OS X
On Sep 21, 2003, at 3:11 PM, Tom Lane wrote:
> Great.  I was afraid it might have been new with 10.2.

Also, 7.3.4 doesn't link on the OS X 10.3 betas. Apparently tas is never being defined. I could never fix this. In the list archives I found all sorts of references to tas()/TAS and older versions of postgres on other operating systems, but I simply couldn't figure out how to make it happen on OS X 10.3.

The #if defined(__APPLE__) || defined(__ppc__) section of s_lock.c was being used, as best I could tell. It defines a method named "void tas_dummy()". And s_lock.h declares "extern int tas(slock_t *lock)" and "#define TAS(lock) tas(lock)", but how those match up to tas_dummy() I don't know. Everything compiles, then ld says:

   ld: Undefined symbol: _tas

any ideas here? 7.3.2 and 7.4beta3 compile just fine (I noticed that 7.4 has something more cross-platform for tas). What happened in 7.3.4 that broke it?

eric