Re: [HACKERS] problem with install scripts.

2007-04-08 Thread Tzahi Fadida
On Sunday 08 April 2007 21:34:52 Peter Eisentraut wrote:
> Tzahi Fadida wrote:
> > What should i do?
>
> Telling what version you use and what error messages you see would help.

  SET client_min_messages = warning;
  \set ECHO none
+ ERROR:  unrecognized configuration parameter "standard_conforming_strings"
+ ERROR:  unrecognized configuration parameter "escape_string_warning"
  RESET client_min_messages;

#define CATALOG_VERSION_NO  200411041




[HACKERS] problem with install scripts.

2007-04-08 Thread Tzahi Fadida
Hi,
I used the development version of PostgreSQL to compile my module
fulldisjunctions. The install script works well there (it passes).
However, I wanted to check whether it still works (is backward compatible) with
the stable version, and make installcheck complains about these lines in the
sql script:
SET standard_conforming_strings = off;
SET escape_string_warning = off;

When I remove these lines, make installcheck passes on the stable version.

What should I do?
10x.



Re: [HACKERS] Problems with the varlena patch in my module

2007-04-07 Thread Tzahi Fadida
On Saturday 07 April 2007 04:08:34 Tom Lane wrote:
> Tzahi Fadida <[EMAIL PROTECTED]> writes:
> > This is an excerpt from my code:
> > newtset->tids = (bytea *) fastgetattr(tupleTSet, LABELS_ALIGNED,
> > fctx->tupleSetDesc, &isnull);
> >
> > It seems that for an empty bytea (only the size of the header), i get
> > that VARSIZE(newtset->tids)==534765440
> > instead of VARHDRSZ.
>
> That code has always been broken, you just failed to exercise the

It always worked before the patch, so it was hard to know that.
You see, I knew from the start that this attribute would never reach certain
sizes, so I never thought it would be toasted.

> problem before.  There needs to be a detoasting call there, not
> merely a cast.  DatumGetByteaP() would probably be appropriate.
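
(For the record, the fixed excerpt then looks roughly like this; just a sketch
following that suggestion, with the names from my code above:)

newtset->tids = DatumGetByteaP(fastgetattr(tupleTSet, LABELS_ALIGNED,
                                           fctx->tupleSetDesc, &isnull));
/* DatumGetByteaP() detoasts when necessary, so VARSIZE() is sane again */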

Thanks. One more question though, what is the CATALOG_VERSION_NO
of the varlena patch? 

The second problem I have (it is a bit long but simple, so please bear with
me):
You see, before there were slots (at least stable, working slots) and the
slot_getattr() machinery, I needed this functionality.
So I created the following function (which broke because the varlena patch
changed the att_align function):
bool
heap_deformtuple_iterative(HeapTuple tuple,
                           TupleDesc tupleDesc,
                           Datum *values,
                           char *nulls,
                           deformTupleIterativeState *ds,
                           int tillAttNum,
                           bool finishAll)

Here tuple is the tuple, and tupleDesc is, of course, its TupleDesc.
values stores the deformed values (the array is already allocated);
nulls, the same.
ds is a pointer to a structure I made to save the state of the
heap_deformtuple in mid-work.
tillAttNum is the number of attributes, from the start, to deform up to.
finishAll means: deform all the remaining attributes.

Now I want to move to the more standard way of doing this. However, I want to
keep the structure where I have tuple->t (the tuple), tuple->v (the values
array), and tuple->n (the nulls array). I keep many tuples in memory like this
and I can't change that now. I want to keep in memory only the heap tuple
tuple->t and the deformed arrays tuple->v and tuple->n, plus possibly another
needed structure (such as ds), for as long as the tuple is not completely
deformed.

What do you recommend (aside from me reimplementing the iterative
heap_deformtuple again and hoping it won't break again)?
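
For what it's worth, here is a minimal sketch of the slot-based direction I
understand to be the standard one (untested; it assumes the executor slot API
from executor/executor.h and my tuple->t naming from above):

TupleTableSlot *slot = MakeSingleTupleTableSlot(tupleDesc);
bool    isnull;
Datum   d;

ExecStoreTuple(tuple->t, slot, InvalidBuffer, false);
/* slot_getattr() deforms lazily, only up to tillAttNum, and caches the
 * results in slot->tts_values / slot->tts_isnull for later calls */
d = slot_getattr(slot, tillAttNum, &isnull);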




[HACKERS] Problems with the varlena patch in my module

2007-04-06 Thread Tzahi Fadida
Hi,
I am having trouble fixing my code after this recent varlena patch:

http://archives.postgresql.org/pgsql-committers/2007-04/msg00081.php

My module is fulldisjunctions.
I have several problems, but first I wish to address the following.
This is an excerpt from my code:
newtset->tids = (bytea *) fastgetattr(tupleTSet, LABELS_ALIGNED, 
fctx->tupleSetDesc, &isnull);

It seems that for an empty bytea (only the size of the header), I get
VARSIZE(newtset->tids) == 534765440
instead of VARHDRSZ.

What can I do?
10x.



Re: [HACKERS] OT: Is there a LinkedIn group for Postgresql?

2006-10-09 Thread Tzahi Fadida
Groups are created by some kind of organization; for example, a mailing list
of Java professionals I am listed on has one. There is no need to advertise,
because if you are part of that organization you can ask the organization's
leader to add you to the group. Seeing that no one on the list has volunteered
a group for PostgreSQL suggests there is no such group.

(P.s. if someone opens one in the future and finds this message in the
archive, please add me to the group. 10x.)

On Monday 09 October 2006 21:22, Jim C. Nasby wrote:
> On Mon, Oct 09, 2006 at 05:56:41PM +, Chris Browne wrote:
> > [EMAIL PROTECTED] (Tzahi Fadida) writes:
> > > Hi,
> > > Is there a LinkedIn group for Postgresql/Hackers list.
> > > If there is, how can i join?
> >
> > The usual way LinkedIn works is that if there are people you know that
> > do PostgreSQL work, they may link to others doing the same.  You
> > should probably see about linking to people you know; you will likely
> > then find relevant relationships.
>
> I believe there's also some kind of group functionality, though I don't
> know how to find it on the site.



[HACKERS] OT: Is there a LinkedIn group for Postgresql?

2006-10-09 Thread Tzahi Fadida
Hi,
Is there a LinkedIn group for the Postgresql/Hackers list?
If there is, how can I join?
Thank you.



Re: [HACKERS] [PATCHES] Adding fulldisjunctions to the contrib

2006-08-12 Thread Tzahi Fadida
On Saturday 12 August 2006 07:22, Bruce Momjian wrote:
> I am still waiting for someone to tell us that they would use this
> capability for a real-world problem.

I suggest looking into web applications.
The example here
http://www.technion.ac.il/~tzahi/soc.html

shows three possibly separate web resources,
i.e., heterogeneous sources. Naturally, since the sources
did not know about each other in advance, nothing kept the relations they
formed from ending up cyclic in the scheme graph.
XML sources are usually like this. Obviously you have to turn them into
relations first.
In addition, I have recently added a feature where you give aliases to column
names, so if you have a "country" column and a "state" column that really
means country, you can write "country=public.relation_with_state.state,..."
dictionary style. This is commonly needed in web applications.

Here is another example (improvising :) ):
site1: user_name,email,favorite_book_isbn
site2: user_name,email,favorite_chat_room
site3: user_name,credit_card

So, let's say I wanted to advertise discounts on certain books when paying
with a certain credit card; I would do FD(site1,site2,site3).
A natural join gives you only the people for whom all the data connects: you
get data on people who read some books, visit certain chat rooms, and have
known credit cards.
FD also gives you the people who did not buy books but have a credit card and
a chat room, so you can advertise to them anyway.
Some people did buy books and use a certain credit card, but you don't know
where they chat; however, you know you want to advertise some best seller
that most people buy anyway.
Certain people did buy books and visit chat rooms, but you can't offer a
specific discount, so you will advertise for all credit cards.
...

A word of caution, however: FD is a very, very expensive operation even with
the new algorithms, so it is best to compute the FD separately, put the
results into a table, and use that table. Unless of course, as is common in
web applications, the relations are quite small (a few thousand rows) and
they don't connect strongly; in these cases, on my P1.6 it comes out at about
2-3 seconds.
However, I can generate the same experiment with strong connectivity
between the relations, and it can take hours to compute.
On the other hand, I have seen experiments with 100 thousand records
that finished in a matter of minutes, so it all depends on how many join
combinations there are in the data.

>
> -------
>
> Tzahi Fadida wrote:
> > On Friday 11 August 2006 07:18, Bruce Momjian wrote:
> > > I have looked over this addition, and I think I finally understand it.
> > > Given three tables, A, B, C, which join as A->B, B->C, C->A, you can
> > > really join them as A->B->C, and A->C->B.  What full disjunction does
> > > is to perform both of those joins, and return a one row for each join.
> > > Here
> >
> > What it does is to return all the possible natural joins, i.e.:
> > A
> > B
> > C
> > A,B
> > A,C
> > ...
> > A,B,C
> >
> > And, it removes any redundant information so that if we have a tuple
> > that already contains another tuple's information that tuple is
> > discarded. Also, note that the full disjunction algorithm i implemented
> > is commonly used in cases where the scheme graph is cyclic
> > and thus, you cannot use natural full outer join
> > to compute the FD.
> >
> > Finally, you can FD(A,B,C,D,...) any number of relations (limited to 32
> > in the implementation) with no regard to the order between them.
> >
> > A case study and comparison can be found here:
> > http://www.technion.ac.il/~tzahi/soc.html
> >
> > > is an example from the README:
> > >
> > > Example of an input and output of a full disjunctions:
> > > INPUT:
> > >
> > > --A---|---B---|---C--
> > > X---Y-|-Y---Z-|-X---Z
> > > a-|-b-|-b-|-c-|-a-|-d
> > >
> > > A,B and C are relations. X,Y and Z are attributes. a,b,c and d are
> > > values.
> > >
> > > Note that A,B and C are connected in a cycle. That is:
> > > A is connected to B on attribute Y,
> > > B is connected to C on attribute Z,
> > > C is connected to A on attribute X.
> > >
> > > The output of the full disjunctions FD(A,B,C):
> > >
> > >FD
> > > X---Y---Z
> > > a-|-b-|-c
> > > a-|-b-|-d
> > >
> > > This code is pretty complex, so I can see why it should be in /contrib.
> > > Are there reasonable use cases for this capability?

Re: [HACKERS] [PATCHES] Adding fulldisjunctions to the contrib

2006-08-12 Thread Tzahi Fadida
On Friday 11 August 2006 07:18, Bruce Momjian wrote:
> I have looked over this addition, and I think I finally understand it.
> Given three tables, A, B, C, which join as A->B, B->C, C->A, you can
> really join them as A->B->C, and A->C->B.  What full disjunction does is
> to perform both of those joins, and return a one row for each join. Here

What it does is to return all the possible natural joins, i.e.:
A
B
C
A,B
A,C
...
A,B,C

Also, it removes any redundant information, so that if one tuple's
information is already contained in another tuple, that tuple is discarded.
Note too that the full disjunction algorithm I implemented
is commonly used in cases where the scheme graph is cyclic,
and thus you cannot use a natural full outer join
to compute the FD.

Finally, you can FD(A,B,C,D,...) any number of relations (limited to 32 in
the implementation) with no regard to the order between them.

A case study and comparison can be found here:
http://www.technion.ac.il/~tzahi/soc.html

> is an example from the README:
>
> Example of an input and output of a full disjunctions:
> INPUT:
>
> --A---|---B---|---C--
> X---Y-|-Y---Z-|-X---Z
> a-|-b-|-b-|-c-|-a-|-d
>
> A,B and C are relations. X,Y and Z are attributes. a,b,c and d are
> values.
>
> Note that A,B and C are connected in a cycle. That is:
> A is connected to B on attribute Y,
> B is connected to C on attribute Z,
> C is connected to A on attribute X.
>
> The output of the full disjunctions FD(A,B,C):
>
>FD
> X---Y---Z
> a-|-b-|-c
> a-|-b-|-d
>
> This code is pretty complex, so I can see why it should be in /contrib.
> Are there reasonable use cases for this capability?
>
> ---
>
> Tzahi Fadida wrote:
> > Hi,
> > I wish to add the fulldisjunctions function to the contrib.
> > With the help of Jonah, we (or rather he :) created a patch with
> > regression tests. The function is finished programmatically but
> > still a little more code documentation touches and improved error
> > messages are needed. All the rest was extensively tested.
> >
> > Attached is the patch.
> >
> > Works great. Just compiled from a fresh cvs which i patched with the
> > attached diff. ran the fulldijsjunction.sql in the
> > share/contrib/fulldisjunction and let it run and it works great.
> > 10x.
> >
> > --
> > Regards,
> > Tzahi.
> > --
> > Tzahi Fadida
> > Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
> > WARNING TO SPAMMERS: ?see at
> > http://members.lycos.co.uk/my2nis/spamwarning.html
>



[HACKERS] 64 bits bitwise operations support

2006-07-30 Thread Tzahi Fadida
Is there a way to tell whether there is support for 64-bit bitwise operations,
since c.h apparently defines BUSTED when there is no support for 64-bit
variables?
10x.
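
(For context, the compile-time guard I have in mind; a hedged sketch, assuming
the macro in question is c.h's INT64_IS_BUSTED:)

#include "c.h"

static uint64
high_bit_mask(void)
{
#ifndef INT64_IS_BUSTED
	return ((uint64) 1) << 33;	/* safe: uint64 really is 64 bits wide */
#else
	return 0;					/* int64 is faked; only 32 usable bits */
#endif
}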



Re: [HACKERS] pgindent ^M

2006-07-29 Thread Tzahi Fadida
These are .c and .h files on Linux.
Also, I am using the patched indent, and compiled entab, etc.,
from 8.2.

On Saturday 29 July 2006 17:57, Andrew Dunstan wrote:
> Tzahi Fadida wrote:
> > I am trying to use pgindent but it adds ^M in my files.
> > However, it doesn't add it to postgresql src files (i tried).
> > I guess it is maybe something with encoding?
> > What can i do?
> > 10x.
>
> On what platform and on what files?
>
> cheers
>
> andrew



[HACKERS] pgindent ^M

2006-07-29 Thread Tzahi Fadida
I am trying to use pgindent, but it adds ^M to my files.
However, it doesn't add them to the postgresql src files (I tried).
I guess it is maybe something to do with encoding?
What can I do?
10x.



Re: [HACKERS] Formulating an sql query with CTID

2006-07-29 Thread Tzahi Fadida
10x,
found my answer thanks to Ragnar:
SELECT DISTINCT ON (a0,a1) a0,a1,ctid FROM t1

To tell the truth, I did not know there was a DISTINCT ON option.
This thing is very handy.

On Saturday 29 July 2006 03:37, Tzahi Fadida wrote:
> Hi,
> I have a query i am having trouble to formulate:
> I used to do:
> SELECT DISTINCT a0,a1 FROM public.t1
> However, now i need to add the CTID attribute, but CTID is unique
> thus, distinct is useless. In addition, i can't seem to be able to use
> max() etc... or casting to anything on CTID in order to use group by
> technique.
>
> What are my options?
>
> (I am using 8.2 or 8.1)



[HACKERS] Formulating an sql query with CTID

2006-07-28 Thread Tzahi Fadida
Hi,
I have a query I am having trouble formulating.
I used to do:
SELECT DISTINCT a0,a1 FROM public.t1
However, now I need to add the CTID attribute, but CTID is unique and thus
DISTINCT is useless. In addition, I can't seem to use max()
etc., or cast CTID to anything, in order to use the GROUP BY technique.

What are my options?

(I am using 8.2 or 8.1)



[HACKERS] Maximum size of tuples in a relation

2006-07-24 Thread Tzahi Fadida
Is there a way to figure out, somehow from the catalogs, the size in bytes of
the largest tuple in a relation, without going over the relation tuple by
tuple and calculating it? Alternatively, without actually pulling the whole
tuples off the disk?
(It is necessary for the algorithm to know that value.)
10x.
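
(If no catalog field holds this, the only fallback I can think of is a single
aggregate pass, which is exactly the scan I'd like to avoid; a heavily hedged
sketch, assuming pg_column_size() (8.1+) accepts a whole-row value, and with
an illustrative relation name:)

SPI_connect();
/* one pass; SPI_tuptable then holds one int4: the largest tuple size */
SPI_exec("SELECT max(pg_column_size(t.*)) FROM myrel t", 1);
SPI_finish();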



[HACKERS] missing tuplestore_gettuple

2006-07-18 Thread Tzahi Fadida
Hi,
In 8.2, is the function tuplestore_gettuple in tuplestore.h missing, or is it
deprecated? Can someone shed some light on this?
(It appears in the tuplestore.c file, and there is no deprecation warning
there.)
10x.




Re: [HACKERS] CTIDs invalidations and dropping columns.

2006-07-11 Thread Tzahi Fadida
On Tuesday 11 July 2006 17:27, Martijn van Oosterhout wrote:
> On Tue, Jul 11, 2006 at 01:50:40AM +0300, Tzahi Fadida wrote:
> > As i understand rowids, i.e ctids, are supposed to allow for fast access
> > to the tables. I don't see the rationale, for example, when casting some
> > attributes, to blank the ctid. So it is not exactly the same, but it
> > still came from the same tuple. What will happen if for read only SPI
> > queries it will not be blank?
>
> Did you read the email Tom sent?

Yes. If it is potentially broken, I think I had better use CTID as an
attribute, even if there is a performance drop.

>
> I worked out the exact issue with your example btw. It's because of the
> DROP COLUMN. After dropping the column the tuples on disk have 3
> columns and you only asked for 2, so an extra step has to be taken.
> This extra step copies the two values, creating a new tuple, which has
> no CTID.

10x, I see what you mean.

>
> If you're tying yourself this tightly to the backend, maybe you should
> just use index_beginscan/heap_beginscan/etc which return actual tuples.

I am considering it; however, it comes with areas that the
SPI/engine previously handled that I will have to manage myself.
I'll have to experiment a bit and see which is more important: the reuse of
SPI, or the performance gains of heap_beginscan...

>
> Have a nice day,



Re: [HACKERS] CTIDs invalidations and dropping columns.

2006-07-10 Thread Tzahi Fadida
On Tuesday 11 July 2006 00:35, Martijn van Oosterhout wrote:
> On Mon, Jul 10, 2006 at 11:47:23PM +0300, Tzahi Fadida wrote:
> > Hi,
> > First, i use CTIDs to immensely speed up my function which is inherently
> > slow because of the problem itself.
> >
> > I have a question about CTID invalidation when you open a read only
> > cursor using SPI. Why does it at all happens? Why is it so important to
> > invalidate a ctid of a read only query (for example when using indices,
> > casting,etc...)?
>
> You're talking about "invalidation" as if it's something someone
> deliberately does. That's incorrect. The t_ctid field is filled in if
> and only if the tuple is exactly the on disk tuple. Otherwise it's a
> new tuple, which by definition does not have a ctid (it doesn't exist
> on disk).

As I understand it, rowids, i.e. ctids, are supposed to allow fast access to
the tables. I don't see the rationale, for example, when casting some
attributes, for blanking the ctid. So it is not exactly the same, but it still
came from the same tuple. What would happen if, for read-only SPI queries,
it were not blanked?

>
> > Specifically, i encountered something unexpected. i created a table:
> > (a2 int4, a0 int4) then i did alter table add column a5 int4, then update
> > set a5=a0, update set a0=a0+1, alter table drop column a0.
> >
> > Now that i run a simple select * from SPI cursor query on this table and
> > look at the  t_data->t_ctid i see that the ctids are invalidated for some
> > unknown reason?
> > previously before the alter table it was ok.
>
> This doesn't make any sense. What is invalidated? Is it blank or what?
> I think you're going to have to provide some example code.

Blank.

I am attaching code; it is not supposed to run to completion, just to print to
the screen using elog.
I used this SQL to declare the function, but you'll need to alter it where it
says fdfuncs:
CREATE OR REPLACE FUNCTION ctest(text) RETURNS SETOF RECORD
AS 'fdfuncs','ctest'
LANGUAGE C STABLE STRICT;

Anyway, run these commands:
create table ctest (a2 int4, a0 int4);
insert into ctest values (10,11);
insert into ctest values (12,13);
then run the function.
It should print 1 in posid and 0 in hi and lo.

Now run the following commands:
alter table ctest add column a5 int4;
alter table ctest DROP COLUMN a0;

and run the function.
It should print 0 in all three fields, which means the ctid is blank.

>
> What do you mean by "invalidation" anyway?
>
> > I highly prefer not to use CTID as an attribute since it is going to
> > greatly lower the performance since it is sitting on a bottleneck.
>
> You've measured this performance difference?

Yes, I played with this in the past. Since I can pass over a relation
potentially hundreds or thousands of times, this can be a bottleneck.
Full disjunctions is a difficult problem; to speed it up, it uses the
tuple-set concept, where a set of tuples is represented as a set of CTIDs.

>
> Have a nice day,

#include "postgres.h"
#include "executor/spi.h"
#include "funcapi.h"

PG_FUNCTION_INFO_V1(ctest);

Datum
ctest(PG_FUNCTION_ARGS)
{
	const char *query = "SELECT * FROM ctest";
	void	   *plan;
	Portal		portal;

	SPI_connect();
	if ((plan = SPI_prepare(query, 0, NULL)) == NULL)
		elog(ERROR, "initialize_SPI_structures: SPI_prepare('%s') returns NULL", query);
	if ((portal = SPI_cursor_open(NULL, plan, NULL, NULL, true)) == NULL)
		elog(ERROR, "initialize_SPI_structures: SPI_cursor_open('%s') returns NULL", query);

	/* fetch the first tuple and print the three fields of its t_ctid */
	SPI_cursor_fetch(portal, true, 1);
	elog(INFO, "ctidhi:%d", SPI_tuptable->vals[0]->t_data->t_ctid.ip_blkid.bi_hi);
	elog(INFO, "ctidlo:%d", SPI_tuptable->vals[0]->t_data->t_ctid.ip_blkid.bi_lo);
	elog(INFO, "ctidip_posid:%d", SPI_tuptable->vals[0]->t_data->t_ctid.ip_posid);
	elog(INFO, "FINISHED PRINTING.");

	SPI_finish();
	/* deliberately crude exit; as noted above, this demo is not expected
	 * to return cleanly the way a real set-returning function would */
	SRF_RETURN_DONE(NULL);
}



[HACKERS] CTIDs invalidations and dropping columns.

2006-07-10 Thread Tzahi Fadida
Hi,
First, I use CTIDs to immensely speed up my function, which is inherently slow
because of the problem itself.

I have a question about CTID invalidation when you open a read-only cursor
using SPI. Why does it happen at all? Why is it so important to invalidate the
ctid of a read-only query (for example when using indices, casting, etc.)?

Specifically, I encountered something unexpected. I created a table
(a2 int4, a0 int4); then I did alter table add column a5 int4, then update set
a5=a0, update set a0=a0+1, alter table drop column a0.

Now, when I run a simple select * from SPI cursor query on this table and
look at the t_data->t_ctid, I see that the ctids are invalidated for some
unknown reason. Previously, before the alter table, it was OK.

I am using 8.1.4; can you tell me whether it is a bug, a feature, or a
"don't care about ctids in SPI" kind of reason?

Is it such a difficult thing to return ctids if the query is read-only? Where
is it invalidated, anyway?
I highly prefer not to use CTID as an attribute, since that would greatly
lower the performance, as it sits on a bottleneck.



Re: [HACKERS] getting type name

2006-07-09 Thread Tzahi Fadida
On Monday 10 July 2006 00:29, Martijn van Oosterhout wrote:
> On Sun, Jul 09, 2006 at 10:08:42PM +0300, Tzahi Fadida wrote:
> > They return format_type_be(INT4OID) = "integer" or
> > format_type_be(FLOAT8OID) = "double precision"
> > I need to use this in a query with the "::" cast operator.
>
> The problem being?

Just whether it is a one-to-one conversion; otherwise the same-type
equality functions I use would potentially not work properly.

>
> test=# select '1'::integer, '4.5'::double precision;
>  int4 | float8
> ------+--------
>     1 |    4.5
> (1 row)
>
> Have a nice day,



Re: [HACKERS] getting type name

2006-07-09 Thread Tzahi Fadida
On Sunday 09 July 2006 21:49, Michael Fuhr wrote:
> On Sun, Jul 09, 2006 at 09:03:21PM +0300, Tzahi Fadida wrote:
> > How do i get a char * type from a type Oid. i.e.
> > getStringTypeFromOid(INT4OID) will return "int4".
>
> Server-side or client-side?  In the backend I think you could use

Backend.

> format_type_be() or format_type_with_typemod(), both declared in
> utils/builtins.h and defined in src/backend/utils/adt/format_type.c.

They return format_type_be(INT4OID) = "integer" or format_type_be(FLOAT8OID)
= "double precision".
I need to use this in a query with the "::" cast operator.
There is a function SPI_gettype, but it works on relations. I wish to work
directly with the type oids, without opening a relation.

10x.



[HACKERS] getting type name

2006-07-09 Thread Tzahi Fadida
Hi,
How do I get a char * type name from a type Oid? I.e., something like
getStringTypeFromOid(INT4OID) that would return "int4".
10x.



Re: [HACKERS] Help with casting and comparing.

2006-07-06 Thread Tzahi Fadida
On Thursday 06 July 2006 21:55, Martijn van Oosterhout wrote:
> On Thu, Jul 06, 2006 at 07:43:20PM +0300, Tzahi Fadida wrote:
> > The downside is that i noticed that the CTID is removed from the tuple
> > if a cast occurs. Is there a way to tell postgresql to not remove the
> > CTID?
>
> Err, the fact the ctid is removed is really just a side-effect. With no
> adjusting of the output, you may just get the actual on-disk tuple. But
> as soon as you do some manipulation, you get a new tuple.
>
> > The other way, of course is to add CTID as an attribute in the query
> > but it seems less efficient since i am accessing it repeatedly.
>
> If you want the ctid, you have to ask for it.
>
> But this seems a little like premature optimisation. First, make it
> work, then make it fast. Once you've got it working you can worry about
> performance. Adding an extra column to the output costs very, very
> little compared to other things...

It works; I use this technique for index accesses. I am not worried about
getting it to work, since I have already tried it. I am more worried about
optimization. Well, it is probably the lesser evil for dealing with casting.
P.s. the code is running and can be found here:
http://pgfoundry.org/projects/fulldisjunction/




Re: [HACKERS] Help with casting and comparing.

2006-07-06 Thread Tzahi Fadida
I looked around in the code, and the whole thing looks complex
and prone to breaking my code often, i.e., whenever someone decides to
change the casting/operators. I thought about just
issuing the proper casts in the SPI_prepare query, like:
SELECT a0::text,a1::text ...
Casting to equal types (when necessary) will allow me to just
use regular equality functions.
And perhaps there is the added benefit that the cast values are cached, since
I restart cursor scans often (by moving to the start, not reopening).
The downside is that I noticed the CTID is removed from the tuple
if a cast occurs. Is there a way to tell postgresql not to remove the
CTID?
The other way, of course, is to add CTID as an attribute in the query,
but that seems less efficient, since I access it repeatedly.


On Wednesday 28 June 2006 18:12, Tom Lane wrote:
> Martijn van Oosterhout  writes:
> > On Wed, Jun 28, 2006 at 03:25:57PM +0300, Tzahi Fadida wrote:
> >> I need help finding out how to determine if two types are equality
> >> compatible and compare them.
> >
> > Fortunatly the backend contains functions that do all this already.
> > Check out parser/parse_oper.c, in particular oper() and
> > compatable_oper().
>
> Note that this still leaves the question of what operator to search for,
> and where to look for it.  The current system doesn't really provide an
> adequate way of identifying a suitable equality operator; you kind of
> have to take it on faith that people won't have made "=" do unexpected
> things (an assumption already violated by some builtin datatypes ...).
> We've been moving gradually in the direction of relying on btree
> operator classes to give us a better understanding of which operators
> really act like equality, but it's far from all done.
>
> The most recent thread about fixing this was
> http://archives.postgresql.org/pgsql-hackers/2006-02/msg00960.php
> Nothing much has been done since then as far as fixing foreign-key
> checks, but you might want to look at the code for interpreting row
> value comparisons (make_row_comparison_op in parse_expr.c).
> SelectSortFunction in tuplesort.c is another example of looking for
> btree info to infer the behavior of an operator.
>
>   regards, tom lane



[HACKERS] Help with casting and comparing.

2006-06-28 Thread Tzahi Fadida
Hi,

I need help finding out how to determine whether two types are
equality-compatible, and how to compare them.

I am using the following call to check for equality between two values:
DatumGetBool(\
FunctionCall2(&(fctx->tupleSetAttEQFunctions[attID]->eq_opr_finfo)\
  , lvalue, rvalue))

The structure fctx->tupleSetAttEQFunctions[attID]->eq_opr_finfo currently 
holds a reference to an equality function that is proper for the type of 
lvalue and rvalue.
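
(For context, the eq_opr_finfo above is obtained along these lines; a hedged
sketch using the type cache from utils/typcache.h, with typid standing in for
the attribute's type oid:)

TypeCacheEntry *tce = lookup_type_cache(typid, TYPECACHE_EQ_OPR_FINFO);

if (!OidIsValid(tce->eq_opr))
    elog(ERROR, "no default equality operator for type %u", typid);
/* &tce->eq_opr_finfo is then what goes into the FunctionCall2() above */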

Currently I only allow two values of the same type, but I wish to allow
comparing values like "20.2"=?20.2 or 20=?20.0, etc.

The first step is to find out whether two attributes are equality- and
casting-compatible, i.e., whether one type can be cast to the other so that
they can be compared. Or just equality-compatible, with the casting done
somehow behind the scenes.
Finally, I have to use a function to compare the two values.



Re: [HACKERS] Planning without reason.

2006-06-23 Thread Tzahi Fadida
Perhaps it is over the top just for my specific query.
Basically, I wish not to do something the system should do,
because, as I have already noticed, when versions change the
database can break your code if you don't keep up.

I guess I could build a map of the attributes participating in each index
of a relation.
I would also have to take into account the type of index used;
for example, a btree should be capable of prefix-key
searches, while hash indices probably can't do them.
Then I would check, for each target tuple, whether it can use an index,
all before constructing the query for the planner.

However, I have already played with this in the past. I reached
the conclusion that it is better to let the planner decide what is
best, since it is too complex for me to make claims about cost
estimates or changing index capabilities.

On Friday 23 June 2006 19:28, Tom Lane wrote:
> Martijn van Oosterhout  writes:
> > On Fri, Jun 23, 2006 at 03:57:19PM +0300, Tzahi Fadida wrote:
> >> (SELECT * FROM R
> >> WHERE a=3, b=6,. ...)
> >> UNION
> >> (SELECT * FROM R
> >> WHERE b=5, d=2,. ...)
> >> UNION
> >> 
> >> And lots of unions.
> >
> > Do you need UNION, or do you actually mean UNION ALL?
> >
> > Also, couldn't you just do:
> >
> > SELECT * FROM R
> > WHERE (a=3, b=6, ...)
> > OR (b=5, d=2, ...)
> > etc
>
> That seems to be what Tzahi wants the system to do for him.  But the OR
> format is not in general logically equivalent to either UNION or UNION
> ALL, because UNION would cause duplicate output rows to be suppressed
> whereas UNION ALL could allow the same table row to be emitted multiple
> times (if two different WHERE conditions could select the same row).
>
> It's conceivable that the planner could prove that neither effect is
> possible in a particular query and then make the transformation
> automatically, but I'm not about to expend that kind of planning effort
> on such an odd case --- checking for it would waste entirely too many
> cycles in most cases.
>
>   regards, tom lane



Re: [HACKERS] Planning without reason.

2006-06-23 Thread Tzahi Fadida
On Friday 23 June 2006 17:47, Martijn van Oosterhout wrote:
> On Fri, Jun 23, 2006 at 05:12:14PM +0300, Tzahi Fadida wrote:
> > My initial reasoning was to avoid extra sorts but i guess that the
> > planner just doesn't get the LIMIT 1. I see now that UNION should be
> > better for the planner to undestand (not performance wise).
> > However, UNION alone, doesn't seem to cut it.
> > Following is an example. t7 has 2 attributes and a non-unique index on
> > one attribute. here is a printout:
> > explain analyze (select * from t7 where a4=113 LIMIT 1) UNION (select *
> > from t7 where a2=139 LIMIT 1);
>
> What are the indexes on?  If you only have an index on a4, the latter
> query has to be an index scan and there's no way to optimise it way.

That's my point: it should have used only a sequential scan, and not also done
an index scan.
In other words, it should have consolidated the index scan and sequential
scan nodes into a single plan node, where you scan the relation sequentially
once and choose a tuple for each UNION clause.

>
> > > Also, couldn't you just do:
> > >
> > > SELECT * FROM R
> > > WHERE (a=3, b=6, ...)
> > > OR (b=5, d=2, ...)
> > > etc
> >
No, a filtering action is not enough, since my goal is to use indices only
when retrieving single tuples each time; thus, if I use OR, I cannot
control the number of tuples returned by each OR clause.
>
> I must admit, this is a really strange way of doing it. For example, if
> multiple rows match, the tuples eventually returned will be a random
> selection of the rows that matched. Especially with the "limit 1"
> there's no way the optimiser could combine the individual scans.

It is a query I use for full disjunctions, as part of the algorithm.
I am doing it manually, so I don't see why it can't do it itself.
I.e.: scan R sequentially; for each UNION clause, find a matching tuple;
the end.

>
> If you really need the "LIMIT 1" and you don't have full index coverage
> then you're quite limited as to how it can be optimised.

You misunderstood me: I wish the planner to use only a sequential scan
whenever even one node is a sequential scan.

>
> > > > I am currently just writing the query as a string and open a cursor.
> > > > Is there a simple way to use Datums instead of converting the
> > > > attributes to strings to create a plan for SPI.
> > > > 10x.
> > >
> > > I imagine SPI_prepare() and SPI_execp() would be used for this.
> >
> > I am already using SPI_prepare but it uses a query of the form of a char
> > string, which i need to prepare and is quite long. I.e. if i have 100
> > tuples i wish to retrieve it can be very wasteful to prepare the string
> > in memory and use SPI_prepare to prepare and later execute it.
> > better to use directly the datums (which i already have deformed from
> > previous operations).
>
> I'm confused here too. I thought the datums you're talking about were
> arguments, thus you could push them straight to SPI_execp(). But you
> seem to be suggesting parts of the actual query are in datum form also?

Example: I have a tuple T I am searching for.
T contains attribute1 and attribute2. I have T in a
heap_deformtuple(T) manner, i.e., I have T->v and T->n (for nulls).
Currently I am doing (loosely):
"(SELECT * FROM R where attribute1=" + convertDatumToCharString(T->v[0]) +
" AND attribute2=" + convertDatumToCharString(T->v[1]) + " LIMIT 1)"
+ "UNION"
... as above.

I can use prepare without conversions, but I still have to construct the long
query each time. I can't prepare just once, because the WHERE clause
structures are always changing. Thus, I was wondering whether I can
also construct the part of the plan where I request SELECT * FROM R...,
i.e., not use strings at all. The structure of the query is the same all the
time: there is the SELECT * FROM R, the WHERE clause with LIMIT 1
nodes, and UNION ALL between the SELECTs.
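
For illustration, the parameterized direction I understand Martijn to mean
looks roughly like this (a sketch: the argument type oids and datums are
assumed to be at hand, and only the values become parameters, not the shape
of the WHERE clause, which is my real problem):

Oid     argtypes[2] = { att1_typid, att2_typid };   /* assumed known */
Datum   values[2] = { T->v[0], T->v[1] };
void   *plan;

plan = SPI_prepare("SELECT * FROM R WHERE attribute1 = $1"
                   " AND attribute2 = $2 LIMIT 1", 2, argtypes);
/* the datums go in directly; no string conversion of the values */
SPI_execp(plan, values, NULL, 1);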

>
> Have a nice day,



Re: [HACKERS] Planning without reason.

2006-06-23 Thread Tzahi Fadida
On Friday 23 June 2006 16:14, Martijn van Oosterhout wrote:
> On Fri, Jun 23, 2006 at 03:57:19PM +0300, Tzahi Fadida wrote:
> > R contains indices but not on all attributes or not on
> > all ordered subset of keys.
> >
> > Query example:
> > (SELECT * FROM R
> > WHERE a=3, b=6,. ...)
> > UNION
> > (SELECT * FROM R
> > WHERE b=5, d=2,. ...)
> > UNION
> > 
> > And lots of unions.
>
> Do you need UNION, or do you actually mean UNION ALL?

I am using UNION ALL, but it doesn't matter; I am actually doing:
(SELECT * FROM R
WHERE a=3, b=6,. ... LIMIT 1)
UNION ALL
(SELECT * FROM R
WHERE b=5, d=2,. ... LIMIT 1)
UNION ALL


with LIMIT 1.
My initial reasoning was to avoid extra sorts, but I guess the planner
just doesn't get the LIMIT 1. I see now that UNION should be better
for the planner to understand (not performance-wise).
However, UNION alone doesn't seem to cut it.
Following is an example: t7 has 2 attributes and a non-unique index on one
attribute. Here is a printout:
explain analyze (select * from t7 where a4=113 LIMIT 1) UNION (select * from t7 where a2=139 LIMIT 1);
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=23.18..23.19 rows=2 width=8) (actual time=0.149..0.165 rows=1 loops=1)
   ->  Sort  (cost=23.18..23.18 rows=2 width=8) (actual time=0.142..0.148 rows=2 loops=1)
         Sort Key: a4, a2
         ->  Append  (cost=0.00..23.17 rows=2 width=8) (actual time=0.052..0.106 rows=2 loops=1)
               ->  Limit  (cost=0.00..5.65 rows=1 width=8) (actual time=0.046..0.049 rows=1 loops=1)
                     ->  Index Scan using indext7 on t7  (cost=0.00..5.65 rows=1 width=8) (actual time=0.038..0.038 rows=1 loops=1)
                           Index Cond: (a4 = 113)
               ->  Limit  (cost=0.00..17.50 rows=1 width=8) (actual time=0.035..0.038 rows=1 loops=1)
                     ->  Seq Scan on t7  (cost=0.00..17.50 rows=1 width=8) (actual time=0.029..0.029 rows=1 loops=1)
                           Filter: (a2 = 139)
 Total runtime: 0.334 ms
(11 rows)


>
> Also, couldn't you just do:
>
> SELECT * FROM R
> WHERE (a=3, b=6, ...)
> OR (b=5, d=2, ...)
> etc

No, a filtering action is not enough, since my goal is to use indices only
when retrieving single tuples each time; thus, if I use OR, I cannot
control the number of tuples returned by each OR clause.

>
> > I am currently just writing the query as a string and open a cursor.
> > Is there a simple way to use Datums instead of converting the attributes
> > to strings to create a plan for SPI.
> > 10x.
>
> I imagine SPI_prepare() and SPI_execp() would be used for this.

I am already using SPI_prepare, but it takes the query as a char string,
which I need to build and which is quite long. I.e., if I have 100 tuples
I wish to retrieve, it can be very wasteful to prepare the string in memory
and use SPI_prepare to prepare and later execute it.
It would be better to use the datums directly (which I already have deformed
from previous operations).

>
> Have a nice day,



[HACKERS] Planning without reason.

2006-06-23 Thread Tzahi Fadida
Hi,
I think there is a bug/miscalculation in some rare query I am using.

Suppose we only query one specific relation R.

R has indices, but not on all attributes, or not on
all ordered subsets of keys.

Query example:
(SELECT * FROM R
WHERE a=3, b=6,. ...)
UNION
(SELECT * FROM R
WHERE b=5, d=2,. ...)
UNION

And lots of unions.

When doing explain analyze, I see that some nodes in the plan use an index
and some use a sequential scan (where the WHERE clause made it impossible
to use an index).
As you can see, having even one sequential scan should collapse the whole
plan into a single sequential-scan node.
Currently, the planner does not seem to understand that.
I circumvent the problem by doing a recursive tree search of the plan,
checking whether there is a sequential-scan node (and redefining the query
as a sequential scan), but obviously it is preferable that the planner do
this.


P.s.:
I am currently just writing the query as a string and opening a cursor.
Is there a simple way to use Datums, instead of converting the attributes to
strings, to create a plan for SPI?
10x.



Re: [HACKERS] CTID issues and a soc student in need of help

2006-06-01 Thread Tzahi Fadida
On Thu, 2006-06-01 at 12:45 -0400, Tom Lane wrote:
> Tzahi Fadida <[EMAIL PROTECTED]> writes:
> > I am not sure about the definition of a context of a single SQL command.
> 
> Well, AFAICS selecting a disjunction ought to qualify as a single SQL
> command using a single snapshot.  It's not that different from a JOIN
> or UNION operation, no?

Yes, it is (at least in the current version I am implementing) a one-shot
computation. It is computed top-down, and not bottom-up as regular
joins are. For example, A natural join B natural join C can be broken down
into a left-deep plan tree. Full disjunctions cannot be broken up like that
(in this version); FD('A,B,C') directly returns a set of
results.

> 
> > Inside C-language FullDisjunctions() function i repeatedly call, using
> > SPI:
> > SELECT * FROM Relation1;
> > SELECT * FROM Relation2;
> > SELECT * FROM Relation1 WHERE...;
> > SELECT * FROM Relation3;
> > 
> 
> You would need to force all these operations to be done with the same
> snapshot; should be possible with SPI_execute_snapshot.  But really the
> above sounds like a toy prototype implementation to me.  Why aren't you
> building this as executor plan-tree machinery?

I actually use cursors, because I reiterate over the
"SELECT * FROM Relation1" queries using the FETCH_ALL technique.
Hopefully cursors use something similar to SPI_execute_snapshot?
(Maybe via the READ_ONLY mode that I use; I see it uses something called
ActiveSnapshot.)
(But the WHERE queries that are intended to exploit indices in
the relations I must execute repeatedly.)

The reason is twofold:
- At this time I don't see any big advantage (aside from the schema)
in putting it in the parser and subsequently the executor.
- I want to work within the time frame of the SoC.

I think that I should first have a stable contrib module that looks
acceptable before I continue to something more problematic to maintain.

We have a new paper, accepted to VLDB yesterday, that breaks the problem
down into smaller ones + iterators + polynomial delay, which is suited
for streaming, hence the possibility of implementing it in
the planner; but that's too complex for the SoC. Let's have a stable
something first.

> 
> > p.s.: In a different version of the function i create a temporary
> > relation and insert tuples in it, but it is exclusively used and
> > destroyed by the specific instance of that function.
> 
> Why?  You could use a tuplestore for transient data.

I do use a tuplestore, but the other version needs an index, and you can't
put an index on a tuplestore. Unless you can give me a hint on how to
create a btree/hash index without a relation, one that can be stored on
disk like a tuplestore; i.e., all the data is stored in the index. The key is
the whole tuple (the array of CTIDs) anyway.

> 
>   regards, tom lane




Re: [HACKERS] CTID issues and a soc student in need of help

2006-06-01 Thread Tzahi Fadida
I am not sure about the definition of a context of a single SQL command.

Example of a run:

A <- SELECT getfdr('Relation1,Relation2,Relation3');
to get the result schema (takes a few milliseconds).
SELECT * FROM FullDisjunctions('Relation1,Relation2,Relation3') AS RECORD A;
This can take a long time.

Inside the C-language FullDisjunctions() function I repeatedly call, using
SPI:
SELECT * FROM Relation1;
SELECT * FROM Relation2;
SELECT * FROM Relation1 WHERE...;
SELECT * FROM Relation3;


So I make one SQL call to FullDisjunctions, with subsequent SQL calls inside
it. I wish the context of the overall SELECT FullDisjunctions
to be perfectly unchanged for all subsequent SQL calls inside it,
and especially the CTIDs.

p.s.: In a different version of the function I create a temporary
relation and insert tuples into it, but it is exclusively used and
destroyed by the specific instance of that function. I hope it does not
affect anything in the general sense of a read-only transaction.

10x!
Regards,
tzahi.

On Thu, 2006-06-01 at 11:28 -0400, Tom Lane wrote:
> 
> I think this is OK within the context of a single SQL command, since
> tuple visibility should not change for that command.  If you were trying
> to use the CTID info across multiple statements then it'd get worrisome.
> 
>   regards, tom lane




Re: [HACKERS] CTID issues and a soc student in need of help

2006-06-01 Thread Tzahi Fadida
I am using CTIDs for the concept of a tuple set.
For example, the set {t1 from relation1, t1 from relation2, t10 from
relation3} will be represented in my function as a list
of (TableID:CTID) pairs,
for example {(1:1),(2:1),(3:10)}.
I then save these in bytea arrays in a tuplestore.
This is essential for the full disjunction algorithm, because
you do not want to recompute joins for every such set using the actual
attributes.
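
(For concreteness, a sketch of the pair representation; the struct name is
made up:)

typedef struct TupleSetEntry
{
    uint16          tableid;    /* which input relation (1, 2, 3, ...) */
    ItemPointerData ctid;       /* the tuple's on-disk location */
} TupleSetEntry;                /* a tuple set is an array of these */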

From the documentation I see that
Dirty Read, Nonrepeatable Read and Phantom Read
are all unacceptable.
Moreover, when I said a long time, I meant that full disjunctions
can run for hours on an extremely difficult input
(not that most people will want to do that, but it matters for
the general case), so it is not
realistic to lock the tables for that period.

So I have 2 follow-up questions:
1) If I execute the function at the serializable isolation level
and the function is read-only with respect to the tables, is it possible
for the function to fail, or for other, updating transactions to
either fail or wait for hours/starve?

2) Inside the function I open and reopen cursors and portals to
   relations; how can I arrange once, for all those openings within
   the function, to have a "serializable" isolation level?
   I suspect that, because of command boundaries, just running
   SET TRANSACTION SERIALIZABLE using SPI at the start should be enough.
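
(I.e., something along these lines at the top of the function; just a sketch
of my own guess, not a confirmed recipe:)

/* try to pin the isolation level once, right after connecting */
SPI_connect();
if (SPI_exec("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE", 0) < 0)
    elog(ERROR, "could not set serializable isolation");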

On Thu, 2006-06-01 at 15:30 +0200, Martijn van Oosterhout wrote:
> 
> The CTID is the location on disk of the tuple, so no, it doesn't change
> while you are running.
> 
> However, if you're running in isolation mode "read committed", then
> someone else could update the tuple while you're looking at it. In this
> case the tuple will appear to vanish and the updated version will
> appear elsewhere with a new CTID. Whether this is a problem or not
> depends on what you're using it for.
> 
> Hope this helps,




[HACKERS] CTID issues and a soc student in need of help

2006-06-01 Thread Tzahi Fadida
Hi,
I am a Google soc student and in need of some help 
with PostgreSQL internals:

My C function can run (and is already running)
for a very, very long time on some
inputs, and it reiterates over relations using SPI.
Basically, I open portals and cursors to relations.
Also note that I always open the
relations in READ ONLY mode using SPI.
A big no-no is for data in the relations
to change, since that would just ruin everything.

I have a great need to identify a tuple uniquely so
my prototype uses the CTID field for that purpose.
Btw, this identification is required by the algorithm
and primary keys are just wasteful and will slow the process
considerably (not to mention some relations don't have primary keys).

The question is, can the CTID field change throughout
the run of my function due to some other processes working
on the relation? Or because of command boundaries it is
pretty much secured inside an implicit transaction?
The problem wouldn't be so great if I didn't want to exploit
indices in the relations (but I do, and it does), since
after you issue a SELECT that uses indices, all you can rely on
is the CTID to uniquely identify a tuple.

The other solution is to temporarily duplicate the relations but
this is less appealing (plus it's already working great with CTIDs).



[HACKERS] SPI bug.

2005-04-29 Thread Tzahi Fadida
Hi,
While trying to determine whether SPI_cursor_move can rewind
(and receiving great help from the guys on IRC), we
found out that, since the count parameter is int
and FETCH_ALL is LONG_MAX, setting
the count parameter to FETCH_ALL to rewind
will not work on 64-bit systems.

On my PIII 32-bit system it works, since there int and long are the same size.

I am using 8.0.2 (i.e., the repositioning bug is already fixed here).

I think the solution can be either changing FETCH_ALL to
INT_MAX, or changing the interface parameter count, and its subsequent
usages, to long.
(FETCH_ALL is at parsenodes.h)
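
(To illustrate the truncation; a sketch:)

/* FETCH_ALL is defined as LONG_MAX, but the count parameter is an int,
 * so on LP64 platforms the implicit conversion mangles the value and
 * the "move all the way" meaning is lost: */
SPI_cursor_move(portal, false, FETCH_ALL);  /* backward "all" == rewind */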

Regards,
tzahi.



[HACKERS] Multicolumn hash tables space complexity.

2005-04-14 Thread Tzahi Fadida
Hi,
I hope it's not off topic.
I have an algorithm to implement that needs
to hold tuples (complete tuples - all columns)
either in a b+tree or a hashtable.
I am concerned about the space such an index will require.
What is the difference, in percentage, between the size of all
the data left unindexed vs. holding it in a b+tree or a hashtable?

Also, what is the difference when those indices are half-full?

E.g., I understand a half-full b+tree can, in the worst case, take
as much space as if it were full. I am guessing it's the same with
hashtables.

I understand that the hash index in PostgreSQL does not support
multiple columns. What would it take to upgrade it to do that?

Regards,
tzahi.



Re: [HACKERS] Query optimizer 8.0.1 (and 8.0)

2005-02-13 Thread Tzahi Fadida
Just my 2 cents.
I am not a super statistics guy, but besides increasing the sample size
and assuming things about the distribution, I understand you want to get more
info on which distribution the data represents.
Usually the problem with these things is that the data needs to be sorted
on the index key, and it can take a while, at least the one time you want
to find out what the distribution is. Example:
for cumulative distributions you need to first sort the data (I am talking
scalars, but probably other keys can work) and run it sequentially through a
KS (Kolmogorov-Smirnov) test.
(There are other tests, but this one is good for general cases.)
The test can be run against all kinds of cumulative distributions, like
normals, etc.
You then get a feel for how close the data is to the selected distribution,
with a parameter that can be rejected at 0.01, 0.05, 0.1, etc.
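
To make the KS step concrete, here is a self-contained sketch (my own
illustration, not PostgreSQL code): data[] must already be sorted ascending,
and the theoretical CDF to test against is supplied by the caller.

#include <math.h>

/* One-sample Kolmogorov-Smirnov statistic. */
static double
ks_statistic(const double *data, int n, double (*cdf) (double))
{
    double  d = 0.0;
    int     i;

    for (i = 0; i < n; i++)
    {
        double  f = cdf(data[i]);
        double  lo = fabs(f - (double) i / n);          /* step below x_i */
        double  hi = fabs((double) (i + 1) / n - f);    /* step above x_i */

        if (lo > d)
            d = lo;
        if (hi > d)
            d = hi;
    }
    return d;       /* e.g. reject at 0.05 if d > ~1.36 / sqrt(n) */
}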

Anyway, it can be done; however, I am not sure how much better it is than
just plain histograms with random() and a uniform distribution, or than just
increasing the sample size and being done with it.
Again, I am talking about the general/common cases.

Regards,
tzahi.

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> [EMAIL PROTECTED]
> Sent: Wednesday, February 09, 2005 3:46 PM
> To: Ron Mayer
> Cc: Mark Kirkwood; Tom Lane; Ron Mayer; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Query optimizer 8.0.1 (and 8.0)
> 
> 
> I wrote a message called "One Big trend vs multiple smaller 
> trends in table statistics" that, I think, explains what 
> we've been seeing.
> 
> 
> > [EMAIL PROTECTED] wrote:
> >>
> >> In this case, the behavior observed could be changed by 
> altering the 
> >> sample size for a table. I submit that an arbitrary fixed 
> sample size 
> >> is not a good base for the analyzer, but that the sample 
> size should 
> >> be based on the size of the table or some calculation of its 
> >> deviation.
> >>
> >
> >Mark,
> >
> > Do you have any evidence that the Sample Size had anything 
> to do with 
> > the performance problem you're seeing?
> 
> Sample size is only a bandaid for the issue, however, more 
> samples always provide more information.
> 
> 
> >
> > I also do a lot with the complete Census/TIGER database.
> >
> > Every problem I have with the optimizer comes down to the fact that 
> > the data is loaded (and ordered on disk) by State/County 
> FIPS codes, 
> > and then queried by zip-code or by city name.  Like this:
> >
> >  Alabama36101 [hundreds of pages with zip's in 36***]
> >  Alaska 99686 [hundreds of pages with zip's in 9]
> >  Arizona85701 [hundreds of pages with zip's in 855**]
> >
> > Note that the zip codes are *NOT* sequential.
> 
> Again, read "One Big Trend..." and let me know what you 
> think. I think it describes exactly the problem that we see.
> 
> For now, the solution that works for me is to seriously up 
> the value of "targrows" in analyze.c. It makes it take 
> longer, and while the stats are not "correct" because they 
> are not designed to detect these sorts of patterns, a larger 
> sample allows them to be "less wrong" enough to give a better 
> hint to the planner.


[HACKERS] Implementing and Experimenting with a Full Disjunctions Operator.

2005-01-03 Thread Tzahi Fadida
Hi All,
As part of my thesis I need to implement a new algorithm for
Full Disjunctions, plus 2 older ones.
A short explanation of what a Full Disjunction is: it comes to
solve what a natural outer join
usually can't do when you have more than 2 relations.
The main goal is to retrieve maximal answers from a set of relations.
The natural representation of the operator, as usually depicted in
the literature, is FD(r1,...,rN).
(r = relation/table)
Ullman's algorithm uses several natural outer joins, so there is no
problem there, but our
algorithm must run internally in the server, since it uses no existing
operators; on the other hand, it is
also not limited to the gamma-acyclic restriction of Ullman's algorithm.

I have already read most of the development documentation, faqs,
and presentations, listened on
this mailing list, etc. I have already compiled a dynamically loaded
library with a function and run it successfully.
The research part of implementing the algorithm is theory and
experimentation.
After looking around in the code and seeing how the SPI generally works,
I have several concerns
(the first one is the most crucial to me):
1) As part of the experimentation I need to know exactly how many blocks
have been read while
the algorithm ran. I need complete control over the process to run my
simulations.
I see that there are functions like heap_getnext(), heap_fetch(), and
SearchSysCache().
Our algorithm usually reads the relations sequentially, and I don't see
how to read complete blocks and
count those blocks. In addition, temporary queues that must be held in
memory will need to be dumped to
disk at various times (because of their size) and fetched back. Is there a
way to control this process
with accuracy and calculate the exact disk writes?
2) As part of the theoretical work and experimentation we want to load
blocks of relation rows into
main memory and cache them using our own techniques. Is there a way to
control the memory blocks so
they won't be swapped out? In addition, is there a way to get a specific
amount of memory, so we can
plan our operator's running path? I see that palloc returns a
chunk of memory, but I don't know
in advance how much is available to me (aside from polling for it).
3) When outputting the results as a set of records, I cannot know in
advance the type of temporary
table that will come out, just as with a subquery like (select * from
relationA,relationB). Is there a problem
outputting this kind of table?
4) When inputting the various tables to the operator, I understand that
the function is limited to a fixed
number of arguments. Is there a way to circumvent that, or would I need
to use an ugly ARRAY construct?

Obviously there are better ways than a dynamically loaded library
function, so after we study the algorithm
I don't think there should be any problem integrating it into postgreSQL,
if, of course, it turns out to be good enough :)

Thank you.

Regards,
tzahi.



