Re: [HACKERS] How to determine that a TransactionId is really aborted?

2017-10-22 Thread Eric Ridge
> On Oct 22, 2017, at 3:24 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> 
> On Sun, Oct 22, 2017 at 2:19 PM, Eric Ridge <eeb...@gmail.com> wrote:
>> I'm looking for the status as any concurrent open transaction might see it.  
>> For example, if any concurrent transaction might see it as "in progress", 
>> that's what I'd want returned.  Does that make sense?
> 
> Maybe, but note that that's fundamentally something that can become
> stale immediately. And, just because an MVCC snapshot cannot see a row
> does not mean that it cannot affect its transaction/statement in some
> other way (e.g. unique index enforcement).

Sure, but I don't think I care if it becomes stale immediately.  If I ask "now" 
and PG says "in progress" but it then aborts a few cycles later, I'll just ask 
again sometime in the future.  It's not like a transaction is going to go from 
"aborted" back to "in progress" -- geez, at least I hope not!

I think the question I want to answer is "will all active and future transactions 
see this TransactionId as aborted at the time they started?"

> Again, you'll probably need to put this low level requirement into
> context if you want sound advice from this list.

I'm just thinking out loud here, but the context is likely something along the 
lines of externally storing all transaction ids, and periodically asking 
Postgres if they're known-to-be-aborted-by-all-transactions -- one at a time.

eric

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to determine that a TransactionId is really aborted?

2017-10-22 Thread Eric Ridge
> On Oct 22, 2017, at 2:50 PM, Jaime Casanova wrote:
> 
> so, what you want is txid_status() [1]... while this is new in v10 you
> can use the code as guide or just migrate to v10 ;)

Oh neat, thanks.  
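(For reference, a minimal sketch of exercising the suggested v10 function from SQL; the xid value here is purely illustrative:)

```sql
-- txid_status() returns 'committed', 'aborted', 'in progress',
-- or NULL if the transaction is too old for its status to be known
SELECT txid_status(12345);
```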



Doesn't that tell you the status relative to the transaction calling 
txid_status()?  

I'm looking for the status as any concurrent open transaction might see it.  
For example, if any concurrent transaction might see it as "in progress", 
that's what I'd want returned.  Does that make sense?  

That's why I was thinking GetOldestXmin() was the right thing to use rather 
than GetActiveSnapshot()->xmin and also why I thought to check 
TransactionIdPrecedes() first.

I am curious about ClogTruncationLock, though... could any of the 
TransactionIdDidXXX calls lie without that lock held?  I haven't seen such a 
thing used in the 9.3 sources.  Maybe it's necessary for 10, or maybe I just 
missed it in 9.3?

Thanks for your time!

eric



Re: [HACKERS] How to determine that a TransactionId is really aborted?

2017-10-22 Thread Eric Ridge
> On Oct 22, 2017, at 1:50 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> 
> On Sun, Oct 22, 2017 at 12:23 PM, Eric Ridge <eeb...@gmail.com> wrote:
>> Can anyone confirm or deny that this is correct?  I feel like it is correct, 
>> but I'm no expert.
> 
> What are you going to use the code for? I think that that context is
> likely to matter here.

I'm not exactly sure yet, but I'm thinking about storing transaction ids 
externally and then regularly poking Postgres to see which ones are 
aborted-and-no-longer-considered-visible so I can clean up my external list.

eric



[HACKERS] How to determine that a TransactionId is really aborted?

2017-10-22 Thread Eric Ridge
When sitting inside an extension, and given an arbitrary TransactionId, how can 
you determine that it aborted/crashed *and* that no other active transaction 
thinks it is still running?

I've tried to answer this question myself (against the 9.3 sources), and it 
seems like it's just:

{
    TransactionId oldestXmin = GetOldestXmin(false, false);
    TransactionId xid = 42;

    if (TransactionIdPrecedes(xid, oldestXmin) &&
        !TransactionIdDidCommit(xid) &&
        !TransactionIdIsInProgress(xid)) /* not even sure this is necessary? */
    {
        /* xid is aborted/crashed and no active transaction cares */
    }
}

Can anyone confirm or deny that this is correct?  I feel like it is correct, 
but I'm no expert.

Thanks so much for your time!

eric



Re: [HACKERS] How to get the 'ctid' from a record type?

2017-03-11 Thread Eric Ridge
On Sat, Mar 11, 2017 at 2:14 PM Andres Freund <and...@anarazel.de> wrote:

> On 2017-03-11 04:31:16 +0000, Eric Ridge wrote:
> > Well shoot.  That kinda spoils my plans.
>
> I think you should elaborate on what you're trying to achieve -
> otherwise our advice will be affected by the recent, widely reported,
> crystal ball scarcity.
>

What I'm trying to do is port https://github.com/zombodb/zombodb to
Postgres 9.6+.  It's an Access Method that stores full rows, encoded as
JSON, in Elasticsearch instead of in local storage.  It was fairly
straightforward to do the mechanical work to convert it to use 9.6's new AM
API (which is very nice, btw!), but the fact that 9.6 also disallows
including system columns (specifically ctid) has me turned upside down.

With <9.6, I was able to cook-up a scheme where it was able to answer
queries from the remote Elasticsearch index even when Postgres decided to
plan a sequential scan.  That hinged, mostly, on being able to create a
multi-column index where the first column was a function call that included
as an argument (among other things) the ctid system column.

The ability to answer sequential scans (and filters) using the remote ES
index is pretty important as the knowledge of how to do that exists in
Elasticsearch, not my custom operator function in Postgres.

Anyways, I've been trying to find a way to intuit the ctid system column
value with 9.6 and it's clear now that that just isn't possible.  The
closest I got was digging through
ActivePortal->queryDesc->estate->es_tuple, but that only works when it's a
real tuple, not one that's virtual or minimal.

I'm pretty sure that I need to be implementing a Custom Scan Provider
instead, and I've been spending time with that API too.  There's a pretty
steep learning curve for me, but I'll eventually get over that hump.

I could probably bore you with greater detail but basically, I want to take:
   CREATE INDEX idxfoo ON table USING zombodb (zdb(table), zdb_to_json(table)) WITH (url='http://remote.ip.addr:9200/');
   SELECT * FROM table WHERE zdb(table) ==> 'some full text query' OR id = 42;

And have the "zdb(table) ==> 'some full text query'" bit be answered by my
extension, regardless of how PG wants to plan the query.  While I was able
to hack something together for <9.6, I think that means a Custom Scan
Provider now?

eric


Re: [HACKERS] How to get the 'ctid' from a record type?

2017-03-10 Thread Eric Ridge
>
> I suspect the tuple at (0,1) has been the subject of a failed update.
>

Yep.


> Your problem here is that you're mistaking the t_ctid field of a tuple
> header for the tuple's address.  It is not that; it's really just garbage
> normally, and is only useful to link forward to the next version of the
> row from an outdated tuple.  I think we do initialize it to the tuple's
> own address during an INSERT, but either a completed or failed UPDATE
> would change it.
>

Thanks.  That helps clarify the comments in htup_details.h, actually.


> I do not think there is any way to get the true address of a heap tuple
> out of a composite Datum manufactured from the tuple.  Most of the other
> system columns can't be gotten from a composite Datum either, because of
> the field overlay in HeapTupleHeaderData's union t_choice.


Well shoot.  That kinda spoils my plans.

What about this?  Is the tuple currently being evaluated (I suppose in the
case of a sequential scan) available in the context of a function call?

Thanks again for your time!  It's much appreciated.

eric


[HACKERS] How to get the 'ctid' from a record type?

2017-03-10 Thread Eric Ridge
This is about Postgres 9.6...

I have a very simple 1-arg function, in C, that I want to return the ctid
of the record passed in.  Example:

CREATE OR REPLACE FUNCTION foo(record) RETURNS tid LANGUAGE c IMMUTABLE
STRICT AS 'my_extension';

Its implementation is simply:

Datum foo(PG_FUNCTION_ARGS) {
    HeapTupleHeader td = PG_GETARG_HEAPTUPLEHEADER(0);

    PG_RETURN_POINTER(&td->t_ctid);
}

What I'm seeing is that the ctid returned from this function isn't always
correct:

# select ctid, foo(table) from table limit 10;
 ctid  |    foo
-------+-----------
 (0,1) | (19195,1)    -- not correct!
 (0,2) | (0,2)
 (0,3) | (0,3)
 (0,4) | (0,4)
 (0,5) | (0,5)
 (0,6) | (0,6)
 (0,7) | (0,7)
 (1,1) | (1,1)
 (1,2) | (1,2)
 (1,3) | (1,3)
(10 rows)

I've spent hours tracing through the PG sources trying to figure out why
and/or find a different way to do this, but I've got nothing.

Of the various examples in the PG sources that use
PG_GETARG_HEAPTUPLEHEADER, or otherwise deal with a HeapTupleHeader, I
can't find any that want to access a system column, so I'm starting to
think I'm just doing it wrong entirely.

Can anyone point me in the right direction?

Thanks for your time!

eric


Re: [HACKERS] Example Custom Scan Provider Implementation?

2017-03-08 Thread Eric Ridge
On Tue, Mar 7, 2017 at 6:39 PM Amit Langote wrote:

> Here you go: https://github.com/kaigai/ctidscan


Thanks for the link, Amit!  I think that'll get me bootstrapped!

> This was proposed originally [1] to go into contrib/, but that didn't
> happen somehow.

Too bad.  :(

eric


[HACKERS] Example Custom Scan Provider Implementation?

2017-03-07 Thread Eric Ridge
Hi all!

Does anyone know of a simple, example Custom Scan Provider implementation
for 9.6+?

I found pg_strom by searching GitHub.  Its gpuscan.c looks like maybe it
implements a pattern similar to what I want to do, but there's a lot of
extraneous (to me) stuff to parse through.

I'm kinda surprised there isn't an example in contrib/, actually.

Thanks for your time!

eric


Re: [HACKERS] Disallow unique index on system columns

2016-04-21 Thread Eric Ridge
On Wed, Apr 20, 2016 at 9:24 PM Tom Lane wrote:

>> SELECT FROM table WHERE my_func('table', ctid) ==> 'index condition'
>
> Um, why's the ctid important here, or perhaps more directly, what is
> it you're really trying to do?

This function is defined as my_func(regclass, tid) and simply returns the
tid value passed in.  The operator is defined as ==>(tid, text).

Behind the scenes, the AM is actually backed by Elasticsearch, and the
tuple's ctid value is used as the "_id" in ES.

When Postgres decides to plan a sequential scan (or filter) to answer WHERE
clause conditions that use the ==>(tid, text) operator the AM isn't
involved but I still need to use the remote Elasticsearch server to answer
that condition.

So I came up with this "creative" approach to provide enough context in the
query plan for me to figure out a) which table is being used and b) which
physical row is being evaluated in the seqscan or filter.

When the operator's procedure is called, I notice that it's the first time
I've seen the FuncExpr on the LHS, go query Elasticsearch with the text
query from the RHS, build a hashtable of the matching ctids and lookup the
LHS's value in the hashtable.  If it exists, the row matches.

There just didn't seem to be enough context in the FunctionCallInfo of the
operator's procedure to figure this out without something in the query
that's basically statically determined at parse time.

I suspect what I should be doing for this particular problem is taking
advantage of the Custom Scan API, but I'm trying to support PG 9.3.

> We weren't planning to do that.

Great!

eric


Re: [HACKERS] Disallow unique index on system columns

2016-04-20 Thread Eric Ridge
On Sat, Apr 16, 2016 at 12:14 PM Tom Lane wrote:

>
> Pushed.  I moved the check into DefineIndex, as that's where user-facing
> complaints about indexes generally ought to be.
>

If you're planning on back-patching this, please don't.  :)  It'll
literally ruin my life.

I've got an extension that's actually a custom Access Method, and for
reasons that are probably too boring to go into here, it requires that the
first column in the index be a function that takes the ctid.  Ie, something
akin to:

   CREATE INDEX idx ON table (my_func('table', ctid), other_func(table));

The AM implementation itself doesn't actually use the result of my_func(),
but that construct is necessary so I can detect certain queries that look
like:
SELECT FROM table WHERE my_func('table', ctid) ==> 'index condition'

I don't mind that you're changing this for 9.6... 9.6 is going to change so
much other stuff around custom AMs that I'll deal with it when the time
comes, but back-patching this into 9.3/4/5 would make life very difficult.

Thanks for listening!

eric


[HACKERS] Given a view relation OID, how to construct a Query?

2015-12-09 Thread Eric Ridge
I'm doing some extension development (in C) and have a situation where I
need to examine the target list of a view, but all I have is the view's oid.

An approach that works is (pseudocode):
   SPI_connect();
   "SELECT ev_action FROM pg_catalog.pg_rewrite WHERE rulename = '_RETURN'
and ev_class=?oid";
   Query *query = linitial(stringToNode(ev_action));
   ...
   SPI_finish();

I backed into this by tracing through pg_getviewdef().  Is there a more
direct way to do this without going through SPI?

I also looked at using postgres.c#pg_analyze_and_rewrite() against a query
like "SELECT * FROM viewname" but the target list of the actual query
wasn't what I was expecting (individual entry tags don't match those of the
SPI approach above).

Thanks for your time!

eric


Re: [HACKERS] Given a view relation OID, how to construct a Query?

2015-12-09 Thread Eric Ridge
On Wed, Dec 9, 2015 at 5:07 PM Tom Lane wrote:

>
> FWIW, it's exposed in 9.4 and up.  But in older branches you could
> probably just copy it, it's not that big.
>

That's good to know, thanks.  I did copy it and it's almost 3x faster than
going through SPI.  Thanks again for the pointer.

eric


Re: [HACKERS] Given a view relation OID, how to construct a Query?

2015-12-09 Thread Eric Ridge
On Wed, Dec 9, 2015 at 4:04 PM Tom Lane <t...@sss.pgh.pa.us> wrote:

> Eric Ridge <eeb...@gmail.com> writes:
> > I'm doing some extension development (in C) and have a situation where I
> > need to examine the target list of a view, but all I have is the view's
> oid.
>
> Open the relation and use get_view_query(), perhaps.


I figured there was something simple, but I couldn't find it.  Thanks!
Sadly, it's static.

eric


Re: [HACKERS] Weirdness using Executor Hooks

2015-06-18 Thread Eric Ridge
On Thu, Jun 18, 2015 at 8:07 PM, Eric Ridge eeb...@gmail.com wrote:

> According to pg_proc.probin, all 32 of my C language functions point
> to $libdir/my_extension_lib,

That's a damn lie.  The deployment server has a number of databases
that have this extension installed and some of them have probin set to
just 'my_extension_lib'

*short pause*

Of course I can't seem to trigger the problem, even by changing the
extension, re-compiling, and re-installing.  #dang

eric




Re: [HACKERS] Weirdness using Executor Hooks

2015-06-18 Thread Eric Ridge
On Thu, Jun 18, 2015 at 4:42 PM, Andres Freund and...@anarazel.de wrote:


> How does your _PG_init() actually look like?

static ExecutorStart_hook_type prev_ExecutorStartHook = NULL;
static ExecutorEnd_hook_type prev_ExecutorEndHook  = NULL;

void _PG_init(void)
{
   prev_ExecutorStartHook = ExecutorStart_hook;
   prev_ExecutorEndHook = ExecutorEnd_hook;

   ExecutorStart_hook = my_executor_start_hook;
   ExecutorEnd_hook = my_executor_end_hook;

   RegisterXactCallback(my_xact_callback, NULL);
}

> I'd suggest adding an error check that prev_ExecutorStartHook is
> unequals to your function when you assign it.

As a data point, that might be interesting to know, but I'd still be
scratching my head about how it happened.  Postgres doesn't load an
extension library more than once per backend session, does it?

> Did you check whether it's compiler/compiler flag dependant?

I didn't.  As far as compiler flags go, the extension is just
inheriting Postgres' defaults.  Any suggestions around which flag(s)
to adjust?

I appreciate your time!

eric




[HACKERS] Weirdness using Executor Hooks

2015-06-18 Thread Eric Ridge
I've written an extension that hooks ExecutorStart_hook and
ExecutorEnd_hook.  The hooks are assigned in _PG_init() (and the previous
ones saved to static vars) and reset to the previous values in _PG_fini().
Maybe also of interest is the extension library is set in postgresql.conf
as a local_preload_libraries.  This is with Postgres 9.3.4.

What happens is that rarely (and of course never on my development
machine), the saved prev_ExecutorXXXHook gets set to the current value of
ExecutorXXX_hook, so when my hook function is called:

static void my_executor_start_hook(QueryDesc *queryDesc, int eflags)
{
    executorDepth++;

    /* this ends up equal to my_executor_start_hook, so it recurses forever */
    if (prev_ExecutorStartHook)
        prev_ExecutorStartHook(queryDesc, eflags);
    else
        standard_ExecutorStart(queryDesc, eflags);
}

it loops endlessly on itself (ie, prev_ExecutorStartHook ==
my_executor_start_hook).  Based on GDB backtraces, it looks like gcc
compiles this into some form of tail recursion as the backtraces just sit
on the line that calls prev_ExecutorXXXHook(...).  The backend has to be
SIGKILL'd.

I've followed the patterns set forth in both the 'auto_explain' and
'pg_stat_statements' contrib extensions and I've been over my code about 3
dozen times now, and I just can't figure out what's going on.  Clearly the
hooks are known-to-work, and I'm stumped.

One theory I have is that I've got a bug somewhere that's overwriting
memory, but it's quite the coincidence that only the two saved prev hook
pointers are being changed and being changed to very specific values.

Since it only happens rarely (and never for me during development), another
theory is based on the fact that this extension is under pretty constant
development/deployment and when we deploy a new binary (and run ALTER
EXTENSION UPDATE) we don't restart Postgres and so maybe the
already-active-and-initialized-with-the-previous-version backends are
getting confused (maybe the kernel re-mmaps the .so or something, I
dunno?).  I always seem to hear about the problem after a backend has been
endlessly spinning for a few days.  :(

Have any of y'all seen anything like this and could I be on the right track
with my second theory?

*scratching head*,

eric


Re: [HACKERS] Weirdness using Executor Hooks

2015-06-18 Thread Eric Ridge
On Thu, Jun 18, 2015 at 5:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> If that's broken, then so are most of our contrib modules.
> Certainly none of them have an extra check as suggested by Andres.

I'm putting it in anyways, along with a guard in the actual hook
function to ERROR if prev_hook == my_hook.  At least that'll avoid the
endless loops until I can figure this out.
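As a standalone toy model (plain C, not actual PostgreSQL source; all names here are made up for illustration) of why that guard matters: if the init routine somehow runs twice in one backend, the second run saves our own hook as "previous", and the next call would recurse forever without the check.

```c
#include <stddef.h>

/* Toy model of the executor-hook chain: a global hook pointer,
 * the saved "previous" value, and a guard against self-recursion. */
typedef void (*hook_type)(const char *query);

hook_type Executor_hook = NULL;   /* stands in for ExecutorStart_hook */
hook_type prev_hook = NULL;       /* stands in for prev_ExecutorStartHook */
int standard_calls = 0;
int guard_tripped = 0;

/* stands in for standard_ExecutorStart() */
void standard_executor(const char *query)
{
    (void) query;
    standard_calls++;
}

void my_hook(const char *query)
{
    /* Guard: if the saved pointer is this very function, calling it
     * would recurse forever; a real extension would ereport(ERROR). */
    if (prev_hook == my_hook)
    {
        guard_tripped = 1;
        return;
    }

    if (prev_hook)
        prev_hook(query);
    else
        standard_executor(query);
}

/* Models _PG_init(): save the old hook, then install ours.  Running
 * this twice in one "backend" leaves prev_hook == my_hook. */
void install_hook(void)
{
    prev_hook = Executor_hook;
    Executor_hook = my_hook;
}
```

Install once and the chain behaves normally; install twice (as a double library load would) and prev_hook points at my_hook itself, which is exactly the self-reference the guard catches.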


>> As a data point, that might be interesting to know, but I'd still be
>> scratching my head about how it happened.  Postgres doesn't load an
>> extension library more than once per backend session, does it?
>
> It's not supposed to, and AFAICS internal_load_library() will treat
> either an exact pathname match or an inode-number match as being
> already loaded.  I wonder if you might be doing something that
> confuses those checks.  It does not look like we try terribly hard
> to canonicalize library pathnames --- might you have some references
> under different relative paths, for instance?  The inode number
> check would perhaps fail if you'd installed a new library version,
> but it's unclear to me why the pathname check would fail.

According to the docs, anything listed in 'local_preload_libraries'
has to be in $libdir/plugins (whereas extensions just go to $libdir),
so rather than duplicate the .so, the copy in $libdir/plugins is
just a symlink to ../my_extension.so.

Could that be confusing things?

eric




Re: [HACKERS] Weirdness using Executor Hooks

2015-06-18 Thread Eric Ridge
On Thu, Jun 18, 2015 at 5:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> BTW, if it didn't occur to you already --- next time you see this,
> you could try examining dfmgr.c's DynamicFileList data structure
> using gdb.  That might help illuminate what Postgres thinks it
> loaded and why.

Hmm, okay.  I'll hold off on checking for the condition in the hook functions so
that when this does happen again I can do what you suggest.

thanks!

eric




Re: [HACKERS] Weirdness using Executor Hooks

2015-06-18 Thread Eric Ridge
On Thu, Jun 18, 2015 at 6:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> What we were expecting I guess is that such functions would be declared
> to reference the library underneath $libdir/plugins, not that you'd use
> a symlink.

According to pg_proc.probin, all 32 of my C language functions point
to $libdir/my_extension_lib, which makes sense because as part of the
extension .sql I declare them as LANGUAGE c AS 'MODULE_PATHNAME', and
MODULE_PATHNAME gets substituted by make.

So are you saying that we should instead declare them AS
'$libdir/plugins/my_extension_lib' so that it matches what's happening
in local_preload_libraries?  And is it safe to directly update
pg_proc.probin (and ya know, terminate all existing backends)?

Thanks so much for your time!

eric

ps, I think if we just changed our deploy process to terminate
existing backends this would just disappear, yeah?




Re: [HACKERS] Change pg_cancel_*() to ignore current backend

2015-05-22 Thread Eric Ridge
> On May 19, 2015, at 6:59 PM, Jim Nasby jim.na...@bluetreble.com wrote:
>
> I find it annoying to have to specifically exclude pg_backend_pid() from
> pg_stat_activity if I'm trying to kill a bunch of backends at once, and I
> can't think of any reason why you'd ever want to call a pg_cancel_* function
> with your own PID.

I'm just a lurker, but regularly get bitten by this exact thing.

Rather than change the behavior of pg_cancel/terminate_backend(), why not 
change pg_stat_activity to exclude the current session?  Seems like showing a 
row in pg_stat_activity for SELECT * FROM pg_stat_activity is kinda useless 
anyways.

eric

PROPRIETARY AND COMPANY CONFIDENTIAL COMMUNICATIONS
The information contained in this communication is intended only for
the use of the addressee. Any other use is strictly prohibited.
Please notify the sender if you have received this message in error.
This communication is protected by applicable legal privileges and is
company confidential.





Re: [HACKERS] Change pg_cancel_*() to ignore current backend

2015-05-22 Thread Eric Ridge
On Fri, May 22, 2015 at 4:51 PM Jim Nasby jim.na...@bluetreble.com wrote:

> Interesting idea. I suspect that would be even more invasive than
> modifying the functions though...


Here's the solution.  I can't see how anyone could possibly disagree with
this... ;)

Change the sort order for pg_stat_activity so the current session sorts
last.  That way all the other sessions get killed first when doing select
pg_terminate_backend(pid) from pg_stat_activity.
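(A rough SQL approximation of that ordering trick, relying on false sorting before true; the per-row evaluation order here is an implementation detail, not a guarantee:)

```sql
-- kill every session, with the current one last so the earlier
-- terminations aren't cut off mid-query
SELECT pg_terminate_backend(pid)
  FROM (SELECT pid
          FROM pg_stat_activity
         ORDER BY pid = pg_backend_pid()) AS others;
```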

When I get bitten by this, I don't really care that my session gets killed,
I care that it gets killed before all the others.  Personally, I only do
this sort of thing interactively via psql, and typically ^D as soon as the
query finishes (which means the second time I run psql and add a WHERE
clause to the query).

Alternatively, queue up pg_cancel/terminate_backend calls and process them
only on successful transaction commit, in such an order that the current
session is processed last.  They'd be consistent with NOTIFY too, which
might be an added bonus.

eric

ps, sorry my last message was from corporate email w/ the stupid legal
disclaimer.


Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-10-31 Thread Eric Ridge
On Fri, Oct 31, 2014 at 6:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> I don't know if/when that will happen as such, but Simon was making noises
> about writing code to treat hash indexes as unlogged automatically, which
> would more or less fix the worst risks.  That's not just a special case
> for hash indexes, but any index AM that lacks WAL support, as third-party
> AMs might well do.


As someone writing a 3rd-party AM, literally right this moment, do you have
a link to that thread?  While I follow this list fairly closely I don't
remember seeing this.  I'd love to understand the thoughts around handling
extension-based AMs.

eric


[HACKERS] pg_get_indexdef() doesn't quote string reloptions

2014-10-13 Thread Eric Ridge
Hi all!

I've been working on implementing a custom index using the Index Access Method 
API and have the need for custom reloptions that are complex strings (ie, they 
also contain non-alphanumerics).

pg_get_indexdef() and pg_dump don't quote the reloption values, making a 
restore (or cut-n-paste of the pg_get_indexdef() output) impossible if the 
reloption value contains non-alphanumerics.

For example, the statement:

# CREATE INDEX idxfoo ON table USING myindex (col) WITH (option = 'some complex string');

cannot be restored as it gets rewritten as:

CREATE INDEX idxfoo ON table USING myindex (col) WITH (option = some complex string);
(note the lack of quotes around the option value)

Looks like (at least) ruleutils.c:flatten_reloptions() needs to be smarter.

eric









Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-11-02 Thread Eric Ridge
On Tue, Nov 1, 2011 at 11:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> Doesn't mean anything, I think --- the SQL standard seems to exclude it.
> It's fairly hard to prevent it at the grammar level, since we regard
> foo.* as a type of primitive expression, but I suspect it might be a
> good idea for transformTargetList to throw an error instead of silently
> ignoring the column label.

Let me take a stab at fixing it in the grammar... if it's even do-able.

If it can be fixed at the grammar, it'll get me a lot closer to doing
what I want to do with the actual feature.

eric



Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-11-01 Thread Eric Ridge
On Tue, Nov 1, 2011 at 12:03 PM, Stephen Frost sfr...@snowman.net wrote:
> Note- I haven't looked at the * production or tried to do anything w/ gram.y to
> support this yet, but it's a heck of a lot shorter..

My original thought, that I probably didn't explain too clearly, was
to make the EXCLUDING (...) bit a modifier to the A_Star node.  The
idea being that you could write * EXCLUDING (...) anywhere you can
currently write *.

It's dead simple for the case of:
 SELECT * FROM ...
but because of the indirection productions in gram.y, it's literally
impossible for:
 SELECT tablename.* FROM ...
without possibly breaking existing queries.

Nonetheless, even if it were only available for the first form, it
would be very useful.  For the ad-hoc type stuff I do, it'd still be
great to write:
SELECT * EXCLUDING (x.a, x.b, x.c) FROM (SELECT  x);

I've already got gram.y working the way it needs to, and I've started
to get the exclusion list passed into the places it needs to go.

If y'all would be willing to accept it in this limited form, I'll
continue to work on it.

eric



Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-11-01 Thread Eric Ridge
On Tue, Nov 1, 2011 at 12:24 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
> some other idea - but only for psql
>
> we can define a special values, that ensure a some necessary
> preexecution alchemy with entered query
>
> \pset star_exclude_names col1, col2, col3
> \pset star_exclude_types xml, bytea, text(unlimited)


Sure, something like that could be useful too.  It might be confusing
to users if they forget that they set an exclusion list, but there's
probably ways to work around that.

However, the nice thing about the feature being in SQL is that you can
use it from all clients, and even in other useful ways.  COPY would be
an example (something I also do frequently):

COPY (SELECT * EXCLUDING (a, b, c) FROM big query) TO 'somefile.csv' WITH CSV;

Right now, if you want to exclude a column, you have to list all the
others out manually, or just dump everything and deal with it in an
external tool.

I generally agree with everyone that says using this in application
code is a bad idea, but I don't think that's reason alone to reject
the idea on its face.

eric



Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-11-01 Thread Eric Ridge
On Tue, Nov 1, 2011 at 1:33 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
>> COPY (SELECT * EXCLUDING (a, b, c) FROM big query) TO 'somefile.csv' WITH CSV;
>
> sorry, I don't accept it. I am able to understand your request for
> adhoc queries. But not for COPY.

I apologize if that example was confusing.  I wasn't also suggesting
expanding COPY's syntax.  I was merely pointing out that if
EXCLUDING(…) were implemented, you'd be able to use it within the
query given to the COPY command.

eric

ps, it seems my messages aren't hitting the list?  weird.



Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-10-30 Thread Eric Ridge
On Sun, Oct 30, 2011 at 1:51 PM, Eric B. Ridge e...@tcdi.com wrote:

> eric
>
> PROPRIETARY AND COMPANY CONFIDENTIAL COMMUNICATIONS

snip

my bad.  Switched email accounts without realizing.  :(

eric



Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-10-30 Thread Eric Ridge
On Sun, Oct 30, 2011 at 2:54 PM, Robert Haas robertmh...@gmail.com wrote:

> OTOH, I'm slightly afraid of how much work it would take to implement
> this properly.

I think first, the A_Star node struct will need to be expanded to
include a List of qualified column references to exclude.  From there,
the target_el rule in gram.y will need to be expanded to support a
syntax like:
  '*' EXCLUDING '(' columnref_list ')' { ... }
I also think that the indirection_el rule will need to be expanded
to support something similar.

Together, that would let us write both:
 SELECT * EXCLUDING(table1.col1, table2.col1) FROM ...
and
 SELECT table.* EXCLUDING(col1, col2) FROM ...
or even
 SELECT * EXCLUDING(table1.col1), table2.* EXCLUDING(col1) FROM ...

I think changing the indirection_el rule might have an impact to
OLD/NEW, but I'm not sure.  Is it legal to write OLD.*, and if so,
would you also want to write OLD.* EXCLUDING (...) in those cases?  I
think this only applies to RULES or SQL-type trigger functions, but
not pl/pgsql?

Then it looks like touching various functions in src/backend/nodes/*.c
to do the right things with the new exclusion list field in A_Star.  I
haven't traced through everything yet, but it looks like if the
various places in src/backend/nodes/*.c are done correctly, then
regurgitating a view definition or whatnot that includes this syntax
will be automatic (i.e., no specific support required for pg_dump)?

Anyways, at first I thought it would be about 8hrs of work just to get
something working.  Maybe it's more like 20, but even still, it seems
fairly straightforward.

eric



Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-10-30 Thread Eric Ridge
On Sun, Oct 30, 2011 at 3:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> It's entirely possible that it will get bounced on standards-compliance
> grounds.

And that's a perfectly valid reason to reject it.

> In particular, I don't think it's acceptable to introduce a
> new reserved keyword for this --- that would fall under the "fails to
> not break anything else" category.

Please correct me if I'm wrong, but if we choose the word carefully
(which is why I chose EXCLUDING), I think we're okay?  EXCLUDING is
already defined as an ordinary keyword, and its new use in this
situation seems to be completely unambiguous, such that you'd still be
able to use EXCLUDING everywhere you already could.

You know more about the grammar than I (or probably most anyone), so
I'm wondering why you think it might need to be a reserved keyword?
Alternatively, would it be okay to use an existing reserved keyword?

eric



Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-10-30 Thread Eric Ridge
On Sun, Oct 30, 2011 at 3:38 PM, Mark Mielke m...@mark.mielke.cc wrote:
> Stupid question:
>
> Is this just a display thing?

Well, it's a display thing as much as any SELECT statement
(especially via psql) is a display thing.  It's more like "I want
all 127 columns, except the giant ::xml column, and I'm too lazy to
type each column name out by hand."

> Or does this have impact for things such as COUNT(*) vs COUNT(1)?

If it does, it's broken.

> Is it like a view, but on the fly?

Naw, it's just short-hand for omitting columns from the output target
list.  As I'm envisioning the feature, it would be SQL-level syntax,
so you could bake it into a view, but...

> 1) Not standards compliant,

Sure, no doubt.  I think the ::typename syntax is not standards
compliant either, and I bet I'm not the only Postgres user to use that
every day.  But I secretly tend to agree, I wouldn't necessarily want
to use this in production/application code.

> 2) Not deterministic (i.e. a database change might cause my code to break),

Okay, I'm inventing a use-case here, but say you have a users table
with various bits of metadata about the user, including password.
Maybe, regardless of database changes, you never want the password
column returned:  SELECT * EXCLUDING (password) FROM tbl_users;

Changes of omission can break your code just as easily.

> 3) Working around a problem that maybe shouldn't exist in the first place?
> It's like buying a rug, so that nobody sees the scratches on the floor.

Sometimes, rugs are cheaper than new floors.

eric



Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-10-30 Thread Eric Ridge
On Sun, Oct 30, 2011 at 4:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> That's just a gut feeling, I've not tried it ... but the proposed
> syntax sure looks a lot like a call to a function named EXCLUDING.

I think what makes it okay is that its new use is only defined to
immediately follow an asterisk in the target_el production.  If you
look at gram.y:11578 (from git HEAD), I was thinking this:
| a_expr
    {
        $$ = makeNode(ResTarget);
        $$->name = NULL;
        $$->indirection = NIL;
        $$->val = (Node *) $1;
        $$->location = @1;
    }
+| '*' EXCLUDING '(' columnref_list ')'
+    {
+        /* make magic happen */
+    }
| '*'
    {
        ColumnRef *n = makeNode(ColumnRef);
        n->fields = list_make1(makeNode(A_Star));
        n->location = @1;

        $$ = makeNode(ResTarget);
        $$->name = NULL;
        $$->indirection = NIL;
        $$->val = (Node *) n;
        $$->location = @1;
    }

And it looks like something similar would be necessary in the
indirection_el production, around line 11478.  But that might be
overly simplistic (and wrong).

eric



Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-10-30 Thread Eric Ridge
On Sun, Oct 30, 2011 at 4:43 PM, Darren Duncan dar...@darrenduncan.net wrote:

>   SELECT foo.* EXCLUDING foo.x, bar.* EXCLUDING bar.y, baz.z, (a+b) AS c FROM ...
> Is that where you're going with this?

Yes.  It's basically a modifier to the star that immediately precedes
it.  In order to support excluding multiple columns, it needs parens:
 SELECT foo.* EXCLUDING (foo.x, foo.y), bar.* EXCLUDING (bar.y),
baz.z, (a+b) AS c

but yes, that's what I'm thinking.  I think doing this will require
more changes to the grammar than I had first thought, because there'd
be no point in supporting:
 SELECT foo.* EXCLUDING (foo.* EXCLUDING foo.y) FROM ...
yet it looks like the above would be implicitly allowed unless extra
work is done to prevent it.

But, if you've got a complex query consisting of a few joins, it'd be
nice to say:
 SELECT * EXCLUDING (table1.*, table2.x) FROM table1 INNER JOIN table2 ...

> If so, I think that would make the feature even more valuable and more
> syntactically clean than I had previously thought.

I don't actually like the term EXCLUDING, but it conveys what's
happening and is already defined as a keyword.  I thought about
EXCEPT, but that doesn't work for obvious reasons, and NOT might
just be confusing.

eric



Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-10-30 Thread Eric Ridge
On Sun, Oct 30, 2011 at 3:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
<snip>

> ... the "fails to not break anything else" category.

From what I've seen watching this list, you're usually right.  :)

It looks like it's perfectly okay to write:
SELECT pg_class.* AS foo FROM pg_class;
(with or without the AS)

I don't know what the above actually means, but it stops SELECT
pg_class.* EXCLUDING(...) dead in its tracks.  So, I'd have to break a
syntax (albeit silly?) that currently works to support this.  :(

eric



[HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-10-29 Thread Eric Ridge
Would y'all accept a patch that extended the SELECT * syntax to let
you list fields to exclude from the A_Star?

Quite regularly I'll be testing queries via psql and want to see all
the columns from a fairly wide table except maybe a giant text or xml
column.  A syntax like:

 SELECT * EXCLUDING (big_col1, big_col2) FROM foo;

would be pretty handy.  It would definitely save some typing in
certain cases.  It seems like such a syntax would better document the
intent of a query too, rather than leaving one wondering if big_col1
was supposed to be omitted from the target list or not.

Anyways, I just wanted to run the idea by youse guys before I put too
much more effort into it.  I've already made what appear to be the
minimum necessary changes to gram.y, and a few quick greps through the
code make me think the rest will be pretty easy.

Maybe the SQL spec says something about this and nobody's done the work yet?

Thanks for your input!

eric



Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-10-29 Thread Eric Ridge
On Sat, Oct 29, 2011 at 6:35 PM, Stephen Frost sfr...@snowman.net wrote:
>> Maybe the SQL spec says something about this and nobody's done the work yet?
>
> I don't know of anything like this in the spec.  Also, there would be
> concern about this possibly going against spec, breaking possibly valid
> queries, promoting keywords to reserved words, and maybe ending up in a
> bad situation if the SQL committee decides to support that kind of
> syntax for something completely different.

At least concerning breaking valid queries and promoting keywords, I
don't think the former can happen (they'd fail to parse today) and the
latter doesn't seem necessary as EXCLUDING's use in this case
appears to be completely unambiguous.

However, I realize there's no second-guessing what the SQL committee
might do in the future.

> In general, I doubt this is something we'd implement, but others may
> feel differently.

I hope so.  :)

> What might be interesting to consider is how hard it
> would be to make psql smarter when it comes to line editing in this
> regard.  Maybe if there was a way to easily expand the '*' from psql and
> then you could remove the columns from the list easily..?

Probably really dang hard, especially when you consider a SELECT *
involving lots of joins.  And even if it turned out to be easy, it
would be limited to psql.

Anyways, it's just something I've wanted for quite awhile and thought
I'd actually do the work to make it happen, *if* y'all would take it.

eric



Re: [HACKERS] bug in SignalSomeChildren

2010-12-21 Thread Eric Ridge
On Mon, Dec 20, 2010 at 3:36 PM, Martijn van Oosterhout
klep...@svana.org wrote:
> On Mon, Dec 20, 2010 at 03:08:02PM -0500, Robert Haas wrote:
>> The attached patch appears to work correctly on MacOS X.  I did check,
>> BTW: getppid() in the attached process returns gdb's pid.  Poor!
>
> This appears to be a BSDism at least. On Linux and BSD derivatives the
> man pages specifically mention the reparenting (needed for catching
> signals) but on Linux getppid() is specifically documented to return
> the correct value anyway.

I'm just a random lurker here, and happened to catch the last bit of
this thread.  Could one of you that understand this issue straighten
something out for me?

Every now and again we've been known to attach gdb to a production
Postgres backend to troubleshoot problems.  Ya know, just trying to
get an idea of what Postgres is actually doing via a backtrace.  This
is always on Linux, BTW.

Does this thread mean that the above no longer works with v9?  Or is
this only on non-Linux systems, or did the patch Robert Haas committed
fix it?  We're still using 8.1 (slowly moving to 8.4) in
production, but have plans of picking up 9.x later in '11.  Just
wondering if we need to actually be a bit more careful in the future?

Thanks!

eric



Re: [HACKERS] bug in SignalSomeChildren

2010-12-21 Thread Eric Ridge
On Tue, Dec 21, 2010 at 2:33 PM, Robert Haas robertmh...@gmail.com wrote:
> The point of the patch was to improve cases where attaching gdb
> *didn't* work well.  Any cases where it was already working for you
> aren't going to be made worse by this.

Okay, great.  Thanks for the clarification.

eric



Re: [HACKERS] User Defined Functions/AM's inherently slow?

2004-03-28 Thread Eric Ridge
I don't have the original thread in my inbox anymore, so for reference:
	http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=20129.1074484755%40sss.pgh.pa.us

Tom Lane last said:
> I have committed a patch into CVS HEAD --- give it a try.
It took me awhile, but I finally got around to trying this (by plopping  
it into my 7.4 sources), and yes, the hashlookup does significantly  
improve the performance of calling C functions.  I agree with your  
commit message that they're now roughly on par with built-in functions.

thanks!

eric



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [HACKERS] Question about indexes

2004-01-30 Thread Eric Ridge
On Jan 30, 2004, at 6:31 AM, Bruce Momjian wrote:

> I like the idea of building in-memory bitmapped indexes.

Me too (FWIW)!  This thread is really interesting as the whole idea 
would help to solve the biggest issue with my (currently stalled) 
project to integrate Xapian as a full-text search engine.  Getting 
index scans used in all situations would drastically improve 
performance.

eric



[HACKERS] User Defined Functions/AM's inherently slow?

2004-01-17 Thread Eric Ridge
I've created a stub AM that literally does nothing.  It indexes 
nothing.  It scans for nothing.  Nadda.  It does just enough work to 
specify return values that prevent PG from dumping core.

What I've found is that this stub AM, compiled with the same options as 
postgres itself (-O2 -fno-strict-aliasing), is roughly 4 times slower 
than the built in btree AM that actually does something useful!

The test table contains 1 column, and 1 row:

My stub AM:
explain analyze select * from test where a == '1';
Total runtime: 0.254 ms
builtin btree AM:
explain analyze select * from test where a = '1';
Total runtime: 0.058 ms
(I ran each one a number of times, with basically the same results).

What gives?  *scratches head*  I know btree's are efficient, but geez, 
can they really be more efficient than O(zero)?  :)  Looking at the 
backtrace from the beginscan function of each AM, PG doesn't appear to 
do anything different for user-provided AM's.

My platform is OS X 10.3.2, using PG 7.4, gcc version 3.3 (Apple's 
build #1495).

Any insight would be greatly appreciated.

Thanks!

eric



Re: [HACKERS] User Defined Functions/AM's inherently slow?

2004-01-17 Thread Eric Ridge
On Jan 17, 2004, at 10:22 PM, Tom Lane wrote:

> Eric Ridge [EMAIL PROTECTED] writes:
>> I've created a stub AM that literally does nothing.
> It's not possible for an index AM to do nothing, at least not for an
> indexscan.  It has to return tuple pointers.  What are you doing for
> that?
costestimate: {
    PG_RETURN_VOID();
}
beginscan: {
    Relation index = (Relation) PG_GETARG_POINTER(0);
    int keysz = PG_GETARG_INT32(1);
    ScanKey scankey = (ScanKey) PG_GETARG_POINTER(2);
    IndexScanDesc scan = RelationGetIndexScan(index, keysz, scankey);
    PG_RETURN_POINTER(scan);
}
rescan: {
    PG_RETURN_VOID();
}
gettuple: {
    PG_RETURN_BOOL(false);
}
endscan: {
    PG_RETURN_VOID();
}
I think the above is about as close to nothing as one can get.

In trying to track down some performance issues with my real AM, I 
decided to make this stub AM just to see what the overhead is... then I 
started seeing these crazy results.

eric



Re: [HACKERS] User Defined Functions/AM's inherently slow?

2004-01-17 Thread Eric Ridge
On Jan 17, 2004, at 10:22 PM, Tom Lane wrote:

> Eric Ridge [EMAIL PROTECTED] writes:
>> I've created a stub AM that literally does nothing.
> It's not possible for an index AM to do nothing, at least not for an
> indexscan.  It has to return tuple pointers.  What are you doing for
> that?
I should have included the entire explain output:

stub AM:
 Index Scan using idxa_stub on test2  (cost=0.00..2.68 rows=1 width=5) (actual time=0.002..0.002 rows=0 loops=1)
   Index Cond: (a == '1'::text)
 Total runtime: 0.247 ms

builtin btree AM:
 Index Scan using idxa_btree on test2  (cost=0.00..4.68 rows=1 width=5) (actual time=0.024..0.026 rows=1 loops=1)
   Index Cond: (a = '1'::text)
 Total runtime: 0.060 ms

If the actual time numbers are really a measure of the amount of time 
spent in (at least) the index, it seems the stub should report a 
smaller total runtime, but alas, it doesn't.

eric



Re: [HACKERS] User Defined Functions/AM's inherently slow?

2004-01-17 Thread Eric Ridge
On Jan 17, 2004, at 11:27 PM, Tom Lane wrote:

> Eric Ridge [EMAIL PROTECTED] writes:
>> costestimate: {
>>     PG_RETURN_VOID();
>> }
> This at least needs to set some values into the output parameters ---
> zeroes are okay, not setting them at all isn't.  I'm surprised the
> planner doesn't go nuts.  It looks from your EXPLAIN results like
> the values are coming up zero anyway, but seeing that cost_index()
> doesn't do anything to initialize those local variables, I'd expect
> fairly unpredictable behavior.
I have tried setting them all to zero, and even using the
backend/utils/adt/selfuncs.c:genericcostestimate() code (it's not
exported, it seems), but no matter what sane (or crazy!) numbers I
provide for the cost estimate, the fact remains: my AM, which does
nothing, is slower than the builtin btree AM.

Could this just be related to the fact that my AM is in a .so, and 
there's just some operating system/C runtime overhead in calling 
functions in dynamically loaded libraries?

eric



Re: [HACKERS] User Defined Functions/AM's inherently slow?

2004-01-17 Thread Eric Ridge
On Jan 17, 2004, at 11:33 PM, Tom Lane wrote:
> The difference between total runtime and the top plan node's runtime
> has to represent plan startup/shutdown time.  I'm suspicious that your
> stubs are somehow not initializing something, though on first glance I
> do not see what.
I can't see anything either... which is why I brought it up.  I'm still 
a noob with this stuff, and thought maybe I was just missing something.

> Theory B would be that there's some huge overhead in calling non-built-in
> functions on your platform.  We do know that looking up a C function is
> significantly more expensive than looking up a builtin function, but
> there should only be half a dozen such calls involved in this test case;
> it's hard to credit that that takes 200 msec.  Does the time drop at all
> on second and subsequent repetitions in a single backend run?
Yes, it drops from about .680ms to the .250ish that I posted.

I suppose I should try compiling this little stub into postgres, eh?

eric





Re: [HACKERS] User Defined Functions/AM's inherently slow?

2004-01-17 Thread Eric Ridge
On Jan 18, 2004, at 1:34 AM, Christopher Kings-Lynne wrote:

>>> Theory B would be that there's some huge overhead in calling non-built-in
>>> functions on your platform.  We do know that looking up a C function is
>>> significantly more expensive than looking up a builtin function, but
>>> there should only be half a dozen such calls involved in this test case;
>>> it's hard to credit that that takes 200 msec.  Does the time drop at all
>>> on second and subsequent repetitions in a single backend run?
>> Yes, it drops from about .680ms to the .250ish that I posted.
>> I suppose I should try compiling this little stub into postgres, eh?
> What if you try the new preload_libraries (or whatever it's called)
> config variable in postgresql.conf in the 7.4 release?
yes, that takes care of the initial load time of the library itself 
(bringing the initial .680ms back to .250ish for the first run), but 
this is not the problem.

eric



Re: Postgres + Xapian (was Re: [HACKERS] fulltext searching via a custom index type )

2004-01-08 Thread Eric Ridge
Thanks to everyone that provided input about this idea.  After taking 
everything into consideration and talking with Olly Betts from the 
Xapian project, what I'm going to do is implement a btree that supports 
multiple roots and a tag value of arbitrary length for each item in
the tree.  Then implement a new Xapian backend that uses it.  In the 
end, it'll be much more efficient and much less dirty than this silly 
filesystem thing I have now.  And hopefully, concurrency and WAL will 
be easier to deal with too.  Having the existing nbtree AM as guide 
will be very useful.

I have one issue that could potentially make all of this completely 
useless, and it's related to Postgres deciding to use a Filter instead 
of an Index Scan.

I asked this question last week, and Tom Lane responded with a 
solution, but I don't think I explained myself very well.  And now that 
I've got a bit more experience with some of the PG internals, maybe I 
can ask the question more intelligently.

Given this query:
 select id from table where document_text == 'food' and title == 'water';

Postgres generates this plan:
 Index Scan using idxtitle on table  (cost=0.00..4.01 rows=1 width=8)
   Index Cond: (title == 'water'::text)
   Filter: (document_text == 'food'::text)
The problem is, the document_text column is *really* big.  Average 
value length is 171k.

With this query plan, my operator procedure is forced to re-parse the 
document_text column from each row returned by the index scan against 
the title column, and do a bunch of Xapian tricks for each row.  The 
overhead is huge.

The composite index solution doesn't seem ideal here because there's 
absolutely no way I can anticipate every combination of fields a user 
might choose to search.

Forgetting about composite indexes for a moment, is postgres even 
capable of doing 2 index scans in this situation?  Does it know how do 
take the intersection of two scans?

My AM's cost estimate function literally sets the selectivity, 
correlation, total cost, and startup cost values to zero (and I've 
tried tons of combinations of really big, really small, and really 
negative values).  My thought behind setting them all to zero was that 
if the cost function basically says, "There is no cost," I could fool
Postgres into wanting to use the index everywhere it can.  Sadly, this 
doesn't work out.

Now, I realize I can fix my cost estimate function to return better 
costs for the title and document_text fields so that PG will instead 
decide to index scan on document_text, but what I really want to do is 
make PG do an index scan for each field.

Can PG even do this?

I'd appreciate any thoughts on this.  Hopefully, I'm just missing 
something really obvious.

thanks!

eric



Re: Postgres + Xapian (was Re: [HACKERS] fulltext searching via a custom index type )

2004-01-05 Thread Eric Ridge
On Jan 2, 2004, at 4:54 PM, Alvaro Herrera wrote:
> I think your approach is too ugly.  You will have tons of problems the
> minute you start thinking about concurrency (unless you want to allow
> only a single user accessing the index)
It might be ugly, but it's very fast.  Surprisingly fast, actually.

Concerning concurrency, Xapian internally supports multiple readers and 
only 1 concurrent writer.  So the locking requirements should be far 
less complex than a true concurrent solution.  Now, I'm not arguing 
that this ideal, but if Xapian is a search engine you're interested in, 
then you've already made up your mind that you're willing to deal with 
1 writer at a time.

However, Xapian does have built-in support for searching multiple 
databases at once.  One thought I've had is to simply create a new 
1-document database on every INSERT/UPDATE beyond the initial CREATE 
INDEX.  Then whenever you do an index scan, tell Xapian to use all the 
little databases that exist in the index.  This would give some bit of 
concurrency.  Then on VACUUM (or FULL), all these little databases 
could be merged back into the main index.

> and recovery (unless you want to force users to REINDEX when the
> system crashes).
I don't yet understand how the WAL stuff works.  I haven't looked at
the APIs yet, but if something you can record is "write these bytes to
this BlockNumber at this offset", or if you can say, "index Tuple X
from Relation Y", then it seems like recovery is still possible.

If ya can't do any of that, then I need to go look at WAL further.

> I think one way of attacking the problem would be using the existing
> nbtree by allowing it to store the five btrees.  First read the README
> in the nbtree dir, and then poke at the metapage's only structure.  You
> will see that it has a BlockNumber to the root page of the index.
Right, I had gotten this far in my investigation already.  The daunting
thing about trying to use the nbtree code is the code itself.  It's
very complex.  Plus, I just don't know how well the rest of Xapian
would respond to all of a sudden having a concurrent backend.  It's 
likely that it would make no difference, but it's just an unknown to me 
at this time.

> Try modifying that to make it have a BlockNumber to every index's root page.
> You will have to provide ways to access each root page and maybe other
> nonstandard things (such as telling the root split operation what root
> page are you going to split), but you will get recovery and concurrency
> (at least to a point) for free.
And I'm not convinced that recovery and concurrency would be for free 
in this case either.  The need to keep essentially 5 different trees in 
sync greatly complicates the concurrency issue, I would think.

thanks for your time!

eric



[HACKERS] fulltext searching via a custom index type

2003-12-26 Thread Eric Ridge
(I started with this addressed to the -general list, but it just 
doesn't seem like a general usage topic.  If -hackers is the wrong 
place, please point me in the right direction).

I've been working on a custom index type (access method) that allows 
postgres to do fulltext searching (including phrase and proximity) that 
I believe is more flexible and natural than tsearch2 (no offense to 
the tsearch2 guys... it's been a source of inspiration).  I also plan 
on providing term browsing and hit-position information.

I'm using Xapian (www.xapian.org) as the backend fulltext engine, and 
I'm trying to learn more of the innards of postgres as I go.  I've only 
spent about 18-24 hours on this, so it's nowhere near complete, and I 
really hesitate to even mention it in a public forum.  But if in the 
end it doesn't suck it might make a great contrib/ package.  *shrug*  
who knows?

In a nutshell, I've implemented all the necessary access method 
functions to teach postgres about a new index type named xapian, and 
so far, everything is really great.  It works just like any other index 
type would:

create table test (stuff varchar(255), more_stuff text);
create index idxstuff on test using xapian (stuff);
create index idxmore_stuff on test using xapian (more_stuff);
insert into test (stuff, more_stuff) values ('this is stuff', 'this is more stuff');
insert into test (stuff, more_stuff) values ('my name is eric ridge', 'i like to drink beer');

select * from test where stuff = 'stuff' or more_stuff = 'drink beer';
         stuff         |      more_stuff
-----------------------+----------------------
 this is stuff         | this is more stuff
 my name is eric ridge | i like to drink beer
(2 rows)

All this aside, I've got some questions related to indexes and query 
plans.

1) Is it possible for an access method to receive some kind of DROP 
INDEX notification?  As far as I can tell, the answer is no, but it 
can't hurt to ask.

2) When does the query planner decide that it needs to Filter 
results, and is there any way to turn this off or otherwise fool the 
query planner into NOT doing Filters (even if only for certain 
operators)?

For example, this query:
	select * from test where stuff = 'stuff' AND NOT more_stuff = 'drink beer';
has this plan:
	Index Scan using idxstuff on test  (cost=0.00..-0.98 rows=250 width=177)
		Index Cond: ((stuff)::text = 'stuff'::text)
		Filter: (NOT (more_stuff = 'drink beer'::text))

In this case, postgres is forced to re-parse the contents of the 
more_stuff field (via the defined procedure for the = operator) for 
every row returned by the previous index scan, just so it can determine 
if the field contains the phrase 'drink beer' or not.  Since so much 
overhead is involved here, it would be cool if postgres could somehow 
do another index scan.

Maybe there's some way for the operator function to know not only the 
Datum value, but the actual field (and ItemPointer)?

3) How does one get the $PGDATA directory? getenv() doesn't seem ideal 
since PGDATA can be specified as a command-line argument.

4) Can a function be registered as part of a transaction, pre-commit -- 
so the function can have an opportunity to abort the transaction.  I've 
seen RegisterEOXactCallback, but it's not quite what I'm looking for.

5) are there discussions in the archives of this list (or other pgsql- 
lists) that discuss fulltext searching that y'all think are worth 
reading?

thanks in advance for your time and input!

eric



Re: [HACKERS] fulltext searching via a custom index type

2003-12-26 Thread Eric Ridge
On Dec 26, 2003, at 3:22 PM, Tom Lane wrote:
>> 3) How does one get the $PGDATA directory?
> DataDir.  Why should you care?  An index AM that wants to know this is
> probably broken IMHO; it's certainly trying to do something that's
> outside the charter of index AMs, and is likely to cause lots of
> headaches.
Xapian has its own storage subsystem, and that's what I'm using to
store the index... not using anything internal to postgres (although 
this could change).  As such, Xapian needs to know *where* to save its 
indexes... $PGDATA seemed like a good place to start.

>> 4) Can a function be registered as part of a transaction, pre-commit --
>> so the function can have an opportunity to abort the transaction.
> Why would that be a good idea?  When exactly would you expect it to be
> called (relative to the other ninety-nine things that happen at commit)?
> How are you going to recover if something else aborts the transaction,
> either before or after your function runs?
I don't really have an answer to this.  :)

> I get the impression from your questions that you are trying to make an
> end run around the transaction mechanism.
Perhaps.  I'm actually fishing for ideas to bridge xapian's transaction 
facilities to postgres.  Your comment confirms my suspicions that it 
just ain't gunna work out.

> This is almost certainly doomed to failure.  You need to find a way of
> storing all your data within the ordinary index structure.
You are probably right.  :)  I'm just playing around right now.  I do 
appreciate your response, it's given me a lot to think about.

eric



Re: [HACKERS] fulltext searching via a custom index type

2003-12-26 Thread Eric Ridge
On Dec 26, 2003, at 4:04 PM, Tom Lane wrote:

> Eric Ridge [EMAIL PROTECTED] writes:
>> Xapian has its own storage subsystem, and that's what I'm using to
>> store the index... not using anything internal to postgres (although
>> this could change).
> I would say you have absolutely zero chance of making it work that way.
thanks for the encouragement!  :)

> You will not be able to get it to interoperate reliably with
> transactions, checkpointing, or WAL replay; to say nothing of features
> we might add in the future, such as tablespaces and point-in-time
> recovery.  You need to migrate all the data into the Postgres storage
> mechanism.
And these are the things I'm struggling with now.  The basic indexing 
and searching currently works flawlessly, but the moment another user 
connects up, everything goes to hell.

> It might be worth pointing out here that an index AM is not bound to
> use exactly the typical Postgres page layout.  I think you probably do
> have to use the standard page header, but the page contents don't have
> to look like tuples if you don't want 'em to.  For precedent see the
> hash index AM, which stores ordinary index tuples on some index pages
> but uses other pages for its own purposes.
That's useful information.  Thanks.  I've been using the hash AM as my 
guide b/c it's not as complex as the other index types (at least on the 
public interface side).  Obviously, I'm trying to save the time and 
energy of re-inventing the wheel when it comes to full text indexing 
and searching.  Xapian is an awesome standalone engine (and it's 
amazingly fast too!), so it seemed like a good place to start.  Its 
backend storage subsystem is pluggable, and after our little exchange 
here today, I'm now considering writing a postgres backend for Xapian.

I assume the doc chapter on Page Files and the various storage-related 
README files are good places for more information.  Any other tips or 
pointers?

eric



Re: [HACKERS] [GENERAL] Can't Build 7.3.4 on OS X

2003-09-22 Thread Eric Ridge
On Sep 21, 2003, at 9:03 PM, Tom Lane wrote:

> Eric Ridge [EMAIL PROTECTED] writes:
>> any ideas here?  7.3.2 and 7.4beta3 compile just fine (I noticed that
>> 7.4 has something more cross-platform for tas).  What happened in 7.3.4
>> that broke it?
> That makes no sense at all --- AFAICT there were *no* darwin or ppc
> specific changes between 7.3.2 and 7.3.4.  Can you double check?
Looks like I misspoke.  7.3.2 does not link on 10.3 either.  I don't 
know if this makes things better or worse.  Here's the output from 
7.3.2.  7.3.4 says the same thing.

gcc -no-cpp-precomp -O3 -D__APPLE__ -Wall -Wmissing-prototypes 
-Wmissing-declarations access/SUBSYS.o bootstrap/SUBSYS.o 
catalog/SUBSYS.o parser/SUBSYS.o commands/SUBSYS.o executor/SUBSYS.o 
lib/SUBSYS.o libpq/SUBSYS.o main/SUBSYS.o nodes/SUBSYS.o 
optimizer/SUBSYS.o port/SUBSYS.o postmaster/SUBSYS.o regex/SUBSYS.o 
rewrite/SUBSYS.o storage/SUBSYS.o tcop/SUBSYS.o utils/SUBSYS.o -lz 
-lreadline -ltermcap -lresolv -ldl -lm  -o postgres
ld: Undefined symbols:
_tas
make[2]: *** [postgres] Error 1

I purposely defined -O3 and -D__APPLE__ when I ./configure-d.  With or 
w/o -D__APPLE__ things still fail.  I had a whole mess of things going 
wrong with my 10.3 beta box (including a hardware problem), and I must 
have gotten things confused.  Again, 7.3.2 doesn't link either.

I don't think the OS X 10.3 betas are readily available (I've paid to 
be in Apple's developer program), so if you don't have access to 10.3 
but have some idea as to what would cause this problem with tas, I'll 
do whatever I can to help test.

eric



Re: [HACKERS] [GENERAL] Can't Build 7.3.4 on OS X

2003-09-21 Thread Eric Ridge
On Sep 21, 2003, at 3:11 PM, Tom Lane wrote:

> Great.  I was afraid it might have been new with 10.2.
Also, 7.3.4 doesn't link on the OS X 10.3 betas.  Apparently tas is 
never being defined.  I could never fix this.  In the list archives I 
found all sorts of references to tas()/TAS and older versions of 
postgres on other operating systems, but I simply couldn't figure out 
how to make it happen on OS X 10.3.

the "#if defined(__APPLE__) || defined(__ppc__)" section of s_lock.c 
was being used, as best I could tell.  It defines a function named 
"void tas_dummy()".  And s_lock.h declares "extern int tas(slock_t 
*lock)" and "#define TAS(lock)  tas(lock)", but how those match up to 
tas_dummy() I don't know.

Everything compiles, then ld says:

ld: Undefined Symbol
_tas
any ideas here?  7.3.2 and 7.4beta3 compile just fine (I noticed that 
7.4 has something more cross-platform for tas).  What happened in 7.3.4 
that broke it?

eric



Re: [HACKERS] [GENERAL] Can't Build 7.3.4 on OS X

2003-09-21 Thread Eric Ridge
On Sep 21, 2003, at 3:11 PM, Tom Lane wrote:

> BTW, is anyone interested in looking into whether we can be made to
> build without using either flag?  I tried it and saw a number of
I did this... before I knew about -no-cpp-precomp.   :(  I read all 
about -traditional-cpp in the gcc man page, but could never find the 
corresponding "not traditional cpp" flag.

It boiled down to two things:  use of macros that relied on the ## 
token-pasting syntax, and whitespace around macro arguments.

Take src/include/nodes/nodes.h, around line 265 for example:

#define makeNode(_type_)  ((_type_ *) newNode(sizeof(_type_),T_##_type_))
...
#define IsA(nodeptr, _type_) (nodeTag(nodeptr) == T_##_type_)
gcc 3.3 just didn't like this.  So I had to fake it out:

#define T_UNDER()  T_
#define makeNode(_type_)  ((_type_ *) newNode(sizeof(_type_),T_UNDER()_type_))
...
#define IsA(nodeptr,_type_) (nodeTag(nodeptr) == T_UNDER()_type_)

But it gets better.   Apparently with gcc 3.3 whitespace around macro 
arguments is preserved!  So, in the case of calls to (at least) the IsA 
macro:

before:  if (IsA(foo, Short))
after:   if (IsA(foo,Short))
                     ^- no space!
From what I could tell, the statement would be expanded into (using my 
re-defined version above):
   if (nodeTag(nodeptr) == T_ Short)

which of course isn't legal syntax b/c of the space.

So I went through with some Perl and did a bunch of global 
substitutions on the files that gcc complained about.  There were a few 
more than the above examples, but not too many.

> too.  It would be interesting to understand what the problem is.
There it is.

eric
