Fwd: [GENERAL] Getting several columns from subselect with LIMIT 1

2008-09-20 Thread Pierre Thibaudeau
Thanks for the suggestion.  Unfortunately I get an INVALID COLUMN
REFERENCE (SQL state: 42P10)
to the effect that the subselect in the FROM clause cannot reference
other tables at the same request level.


2008/9/20 Marcus Engene <[EMAIL PROTECTED]>:
>>
>> SELECT
>>persons.*,
>>(
>>SELECT child.name
>>FROM progeny JOIN persons child ON child.id = progeny.child
>>WHERE progeny.parent = persons.id
>>ORDER BY child.birthdate ASC
>>LIMIT 1
>>) AS firstborn_name
>>  FROM persons;
>>
>> Now, this is probably not the most elegant piece of code, but the real
>> problem is that
>> I cannot see how to extend it to the case where I want not only the
>> firstborn's name but also the firstborn's ID
>> (short of repeating the entire subselect a second time).  At the
>> moment, with this current syntax, my subSELECT statement would not be
>> allowed to return more than a single column.
>
> Would this work?
>
> select
>   p.*
>  ,pp.*
> from
>   persons p
>  ,(
>   SELECT child.name, child.id
>   FROM progeny JOIN persons child ON child.id = progeny.child
>   WHERE progeny.parent = p.id
>   ORDER BY child.birthdate ASC
>   LIMIT 1
>   ) as kid(kid_name,kid_id)

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Getting several columns from subselect with LIMIT 1

2008-09-20 Thread Pierre Thibaudeau
I have a PERSONS table.
I also have a PROGENY table, which is a many-to-many association table
with two foreign keys to the PERSONS table to itself.
(In this day and age, not only can an individual have any number of
children, but also a person can have any number of parents!  At least,
let's assume that's true for the sake of this setup.)

Suppose I wish to construct a view of the persons, along with the name
of their first-born (if they have one;  NULL otherwise).  The
following SELECT does just that:

SELECT
persons.*,
(
SELECT child.name
FROM progeny JOIN persons child ON child.id = progeny.child
WHERE progeny.parent = persons.id
ORDER BY child.birthdate ASC
LIMIT 1
) AS firstborn_name
 FROM persons;

Now, this is probably not the most elegant piece of code, but the real
problem is that
I cannot see how to extend it to the case where I want not only the
firstborn's name but also the firstborn's ID
(short of repeating the entire subselect a second time).  At the
moment, with this current syntax, my subSELECT statement would not be
allowed to return more than a single column.

Any suggestion?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] tsearch strategy for incremental search

2008-06-30 Thread Pierre Thibaudeau
2008/6/30 Oleg Bartunov <[EMAIL PROTECTED]>:
> you, probably, can use custom configuration, which uses pg_3chars
> dictionary.

Thanks Oleg.  That sounds like a cleverer (and more natural) idea than mine.

I am intrigued:  is "pg_3chars" something that's part of the current
distribution of postgresql (I did a quick search and found nothing),
or are you implying that it would be worth writing a tsearch
configuration along the lines of my earlier ideas?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] tsearch strategy for incremental search

2008-06-29 Thread Pierre Thibaudeau
2008/6/30 Oleg Bartunov <[EMAIL PROTECTED]>:
> tsearch will have prefix search support in 8.4.

Thanks Oleg!  That's fantastic news!

In the meantime, carrying on with my earlier idea, here's the little
function I came up with for extracting the 3-char-lexeme tsvector, in
case anyone's interested:

CREATE OR REPLACE FUNCTION public.to_three_char_tsvector(str text)
  RETURNS tsvector AS
$BODY$declare
somerow record;
shortened_text text := '';
BEGIN
FOR somerow IN SELECT regexp_split_to_table(trim(both '\'' from
strip(to_tsvector('simple',str))::text),'\'') AS item LOOP
shortened_text := shortened_text || ' ' ||
COALESCE(substring(somerow.item for 3), '');
END LOOP;
RETURN strip(to_tsvector(shortened_text));
END;$BODY$
  LANGUAGE 'plpgsql' IMMUTABLE;

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] tsearch strategy for incremental search

2008-06-29 Thread Pierre Thibaudeau
I am trying to implement an incremental search engine.  The service
should start searching when the user has typed at least 3 characters.
I am thinking of using the following strategy:

a) Create a function string_to_three_char_tsvector(str text) that
would generate the tsvector composed of the three-letter lexemes that
begin all the admissible words within the parameter str.
b) Using this function, create an indexed tsvector column:  three_char_index.
c) Given the query string query_string (assume query_string containing
at least 3 characters):
SELECT *
FROM mytable, 
plainto_tsquery((string_to_three_char_tsvector(query_string))::text)
AS query
WHERE three_char_index @@ query
AND text_field LIKE '%' || str || '%';

Once I've narrowed the field of possibilities down to the correct
3-letter lexemes, there are fewer than 100 lines to search through
with LIKE.  I could even repeat the exercise with 4-letter lexemes if
these numbers were to grow or if I needed the extra boost in
performance.

So, two questions to postgres/tsearch experts:

1) Does that seem like a decent overall strategy?

2) About the function string_to_three_char_tsvector(text), I cannot
think of an elegant way of writing this.  Is it possible to do better
than the following:

str => cast to tsvector => cast to text => for each lexeme-string,
take first-three-char substring => concat back together => cast to
tsvector

Is there a nice way of performing the middle operation?  Like
splitting the string to an array...

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] 8.3: where's the replacement tsearch2 module?

2008-02-10 Thread Pierre Thibaudeau
> > My database uses tsearch2.  I was about to follow the conversions
> > instructions found at Appendix F31 (on the new tsearch module).
> > http://www.postgresql.org/docs/8.3/static/tsearch2.html
> The docs will need to be updated because tsearch2 is now in the core and
> should already be available, it's not an external module any more.

Appendix F31 (which I quoted previously) indeed acknowledges that fact
(that tsearch2 is in the core).  The instructions I focused on explain
how to "convert" an 8.2 database already using tsearch2, without going
through the trouble of renaming the tsearch2 functions.  Therefore, I
think the docs are fine.  The problem is the missing tsearch2.sql
script...

>You'd have to edit it to replace MODULE_PATHNAME with the right thing.
>A bigger problem is that the tsearch2 dll is probably omitted too :-(

Indeed, I cannot find tsearch2.dll anywhere...

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[GENERAL] 8.3: where's the replacement tsearch2 module?

2008-02-10 Thread Pierre Thibaudeau
I just downloaded the 8.3 Windows installation (binary with installer).

My database uses tsearch2.  I was about to follow the conversions
instructions found at Appendix F31 (on the new tsearch module).
http://www.postgresql.org/docs/8.3/static/tsearch2.html

However, I hit a problem when I get to step 2:  "In the new
installation, create empty database(s) and install the replacement
tsearch2 module into each database that will use text search."

I am assuming that the "replacement tsearch2 module" is some file
"tsearch2.sql" found in the folder share/contrib.  However, no such
file, or anything that looks remotely like it, in that folder or in
any folder around.

(If I download the sources, I can find a 16 Kb file entitled
"tsearch2.sql.in" inside contrib/tsearch2.  Is that the file I want?)

For reference, here is the full content of the share/contrib folder,
after the Windows installation:

_int.sql
adminpack.sql
autoinc.sql
btree_gist.sql
chkpass.sql
cube.sql
dblink.sql
earthdistance.sql
fuzzystrmatch.sql
hstore.sql
insert_username.sql
int_aggregate.sql
isn.sql
lo.sql
ltree.sql
moddatetime.sql
pageinspect.sql
pg_buffercache.sql
pg_freespacemap.sql
pg_trgm.sql
pgcrypto.sql
pgrowlocks.sql
pgstattuple.sql
pgxml.sql
pldbgapi.sql
refint.sql
seg.sql
sslinfo.sql
tablefunc.sql
timetravel.sql
uninstall__int.sql
uninstall_adminpack.sql
uninstall_btree_gist.sql
uninstall_chkpass.sql
uninstall_cube.sql
uninstall_dblink.sql
uninstall_earthdistance.sql
uninstall_fuzzystrmatch.sql
uninstall_hstore.sql
uninstall_int_aggregate.sql
uninstall_isn.sql
uninstall_lo.sql
uninstall_ltree.sql
uninstall_pageinspect.sql
uninstall_pg_buffercache.sql
uninstall_pg_freespacemap.sql
uninstall_pg_trgm.sql
uninstall_pgrowlocks.sql
uninstall_pgstattuple.sql
uninstall_pgxml.sql
uninstall_seg.sql
uninstall_sslinfo.sql
uninstall_tablefunc.sql

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL] varchar(n) VS text

2007-06-25 Thread Pierre Thibaudeau

Having read 
http://www.postgresql.org/docs/8.2/interactive/datatype-character.html
I am puzzling over this issue:

1) Is there ever ANY reason to prefer "varchar(n)" to "text" as a column type?

2) For instance, if I know that a character-type column will never
contain more than 300 characters, would I be better to define it as
"varchar(300)" or as "text"?

3) What if, in practice that same column usually contains strings of
no more than 10 characters (although I can't guarantee that it
wouldn't explode up to 300 in the one-in-a-million case)?

4) On the other hand, what if a column systematically contains strings
of 5 or fewer characters.  Is it better to define it as "varchar(5)"
or as "text"?


From my reading of the dataype documentation, the ONLY reason I can

think of for using "varchar(n)" would be in order to add an extra
data-type constraint to the column.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] "simple" dict with stop words in tsearch2

2007-01-29 Thread Pierre Thibaudeau

Brilliant!  Thank you!


> In tsearch2, I would like to use the "simple" dictionary along with my
> own list of stopwords.
> [...]

sure, just specify dict_initoption. For example,
test=# update pg_ts_dict set  dict_initoption='contrib/english.stop' where 
dict_name='simple';
UPDATE 1
test=# select lexize('simple', 'the');
  lexize

  {}
(1 row)


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] "simple" dict with stop words in tsearch2

2007-01-29 Thread Pierre Thibaudeau

In tsearch2, I would like to use the "simple" dictionary along with my
own list of stopwords.

In other words, once the text is parsed into tokens, no stemming
whatsoever, but stopwords are removed.

Is there an easy way to produce that result, using the standard
"simple" dictionary?

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Fwd: [GENERAL] convert(USING utf8_to_iso_8859_15) on Windows

2007-01-29 Thread Pierre Thibaudeau

Thanks, Tom, for the comment.  (Sorry for emailing directly to you:
pressed "send" too quickly!)

Although that raises further questions:

* Is there a text that documents all that is known about the encoding
issues between PostgreSQL and Windows?  Surely, this is likely to be a
"fairly" widespread issue!  So far, everything I've read had to do
with mysterious bad omens with never specific statements about what's
what, and what can (or cannot) be done to solve the situation
satisfactorily...

* Windows XP does support UTF8, yet it is not possible (as far as I
know) to define one's locale to have anything to do with UTF8
(presumably in the sense that UTF8 isn't an aspect of a specific
locale):  there is no en_US.UTF8 or fr_CA.UTF8 locales, for instance.
But why should this matter?  Say I am entering the data through a
piece of software that works with UTF8, via the ODBC driver.  Say
again that I output the data with another software that expects UTF8,
via the JDBC driver.  Why does it matter that my system should be
localized in another encoding?


2007/1/29, Tom Lane <[EMAIL PROTECTED]>:

"Pierre Thibaudeau" <[EMAIL PROTECTED]> writes:
> My setup is as follows:
> PostgreSQL 8.2.1 on WindowsXP
> The database has UTF8 encoding.
> SHOW lc_ctype; gives: "French_Canada.1252"

I'm not sure about any Windows-specific issues, but in general it's a
really bad idea to be using lc_collate or lc_ctype that is incompatible
with the database encoding.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


[GENERAL] convert(USING utf8_to_iso_8859_15) on Windows

2007-01-28 Thread Pierre Thibaudeau

Is this a documented phenomenon with the "convert" function?  The first
result is what's expected:

SELECT convert('Gregoire' USING utf8_to_iso_8859_15);
"Gregoire"

But I don't understand the next result, when I put an acute accent over the
first "e":

SELECT convert('Grégoire' USING utf8_to_iso_8859_15);
""

(The output is an empty string.)

Likewise, whenever I enter a string containing non-ASCII characters, the
convert function outputs an empty string.  Same results when I change the
conversion type from UTF8 to any other encoding which accepts those
non-ASCII characters...  (When I try a conversion to an encoding that
doesn't accept the characters, I get an error message, and that's normal.)

My setup is as follows:
PostgreSQL 8.2.1 on WindowsXP
The database has UTF8 encoding.
SHOW lc_ctype; gives: "French_Canada.1252"

Is my problem related to Windows' lack of UTF8 support?  I thought those
problems were solved with version 8.2...


Re: [GENERAL] tsearch2, locale, UTF-8 and Windows

2007-01-28 Thread Pierre Thibaudeau

Just to pinpoint the meaning of my dismay, let me add one comment to my
previous post.

In the What'sNew document for tsearch2 with 8.2
http://www.sai.msu.su/~megera/wiki/Tsearch2WhatsNew
we read:


Don't forget to initdb cluster with correct utf8-locale !
initdb -D /usr/local/pgsql-dev/data.el_utf8 --locale=el_GR.utf8


I have never seen any detail of what was meant by the "correct" locale.
Clearly if I was using a database with French content on a French system, I
would intuitively choose fr_FR.utf8 as the locale, but if my database
contains texts in several languages (see quoted post below), I don't know on
what factor to base my choice of initdb locale.

Suggestions?

On Jan 28, 3:28 am, [EMAIL PROTECTED] ("Pierre Thibaudeau") wrote:

[...] I cannot figure out what the latest status is concerning the
"default locale" on a Windows UTF-8 database under PostgreSQL 8.2.

[...] I have a UTF-8 database containing information in five
different European languages (English, French, Spanish, German and
Italian).  I am coding on a Windows system with locale French_Canada.1252.
The server that will soon run the database will likely have locale

en_US...


[GENERAL] tsearch2, locale, UTF-8 and Windows

2007-01-28 Thread Pierre Thibaudeau

Over the past few days, I have been reading everything I could about
tsearch2, but I cannot figure out what the latest status is concerning the
"default locale" on a Windows UTF-8 database under PostgreSQL 8.2.

More specifically, I have a UTF-8 database containing information in five
different European languages (English, French, Spanish, German and
Italian).  I am coding on a Windows system with locale French_Canada.1252.
The server that will soon run the database will likely have locale en_US...
I am at a loss concerning the locale with which I should initdb on my
system!!!

What am I getting myself into?  A lot of what I read on this matter was
pretty bleak;  has version 8.2 cleared all that?  What is the latest word on
this?