[SQL] Index problem
Hi! We're using postgres 6.5.3 for an application where we, among other things, store information about users. On two separate occasions on two different servers we've experienced problems with indices on the same table (out of over 100). The problem is that postgres refuses to find a certain row in the table when performing a simple select using the primary (unique) key, like this:

select * from elever where userid = 'walther';

When selecting all rows, the row in question is shown. When doing the following, I get the result I want:

select * from elever where userid like 'w%';     <-- OK
select * from elever where userid like 'wa%';    <-- OK
select * from elever where userid like 'wal%';   <-- OK
select * from elever where userid like 'walt%';  <-- OK

but not

select * from elever where userid like 'walth%'; <-- Not OK!

Dropping and recreating the index solves the problem, but that's not good enough for me since the problem has recurred on a different machine with a different database. vacuum and vacuum analyze do not report any problem with the table.

Both times the problem occurred with userids starting with a "w". Postgres is running with a Swedish locale, and on FreeBSD this means that "w" and "v" (among a number of other letters) are treated equally when collating/sorting. I suppose this could be part of the problem.

Is this a known problem? If so, is it fixed in 7.0.2? I've not seen this happen on any of our postgres 7.0.2 systems, but as I can't even reproduce it on 6.5.3, that's no guarantee it's fixed...

For the record... I know we shouldn't run 6.5.3 and we will upgrade to 7.x soon...

psql -c 'select version()'
PostgreSQL 6.5.3 on i386-unknown-freebsd4.1, compiled by cc

uname -a
FreeBSD 4.1-RELEASE #0: Tue Aug 29 15:31:01 CEST 2000

Regards,
Patrik Kudo

ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol
Does it seem unclear? Ask away!
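A minimal sketch of the drop-and-recreate workaround mentioned above; the index name elever_userid_key is an assumption, since the message does not name the primary-key index:

    -- Assumed index name; substitute the real unique index on elever.userid.
    DROP INDEX elever_userid_key;
    CREATE UNIQUE INDEX elever_userid_key ON elever (userid);

    -- Re-run the query that previously failed to find the row:
    SELECT * FROM elever WHERE userid = 'walther';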
Re: [SQL] FTI, paged, ranked searching and efficiency.
Paul,

I'm afraid that much of that was over my head. In fact, I'm keeping it as an example in case I ever need to do something similar. Forward your info and I'll include credit in the source :-)

In general terms, I've always depended on PHP to select a "page" of results, using the logic that the number of results on a page is a matter for the web application to handle (a display issue) rather than something to be handled on the back-end (a data issue). However, your point about not pulling out tons of data the user will never examine (i.e. 2nd and succeeding pages) is well-taken. Although, without pulling the entire data set, you can't display to the user how many results there are in total.

If it's a strong possibility that the users are really only ever going to want the top 20-40 rated results, then splitting it as you suggest ... first counting all the matches and then dragging in the rest of the data for the top X records ... makes a lot of sense.

Unfortunately, the only real option I can see for DB server-side row grabbing is: create the query(ies) as a temporary table or view using a function, then use Limit and Offset to grab one chunk of data at a time. This is, of course, a serious problem for multi-user performance since each user would need their own temp table or view.

From what I can tell, search engines (google, for example) grab the whole recordset and use the web script to parse it out 25 records at a time. Hopefully, someone on this list will have done that before and can provide less theoretical advice.

-Josh Berkus

P.S. I've also posted this to the pgsql-php list. I've quoted the full text of your question below my .sig for that reason.

--
______AGLIO DATABASE SOLUTIONS_______
Josh Berkus
Complete information technology      [EMAIL PROTECTED]
and data management solutions        (415) 436-9166
for law firms, small businesses      fax 436-0137
and non-profit organizations.        pager 338-4078
San Francisco

Paul wrote:
>
> Hello,
>
> This is going to be a bit long, I hope some of you will take the
> trouble to read it :)
>
> I am building a search engine for a section of a (PHP based) website.
> I wish the user to be able to enter a number of words in the search, and the
> search results to be ranked by the number of times words occur (both
> different words and the same word occurring multiple times are good).
>
> My (simplified) table structure is this:
> ==
>           Table "entry_fti"
>  Attribute |    Type     | Modifier
> -----------+-------------+----------
>  string    | varchar(25) |
>  id        | oid         |
> Index: entry_fti_string_idx
>
>                        Table "entry"
>        Attribute        |  Type   |                    Modifier
> ------------------------+---------+-------------------------------------------------
>  entry_id               | integer | not null default nextval('entry_id_seq'::text)
>  entry_name             | text    |
>  entry_description_html | text    |
>  entry_image_id         | integer | not null default 0
>  entry_tn_image_id      | integer | not null default 0
>  entry_live             | boolean | not null default 't'
> Index: entry_pkey
>
>                        Table "image"
>  Attribute  |    Type     |                    Modifier
> ------------+-------------+-------------------------------------------------
>  image_id   | integer     | not null default nextval('image_id_seq'::text)
>  image_name | varchar(32) |
>  height     | integer     | not null
>  width      | integer     | not null
> Indices: image_pkey
> ==
>
> And my (simplified) query looks like this:
> ==
> SELECT COUNT(entry_fti.id)          AS rating,
>        entry.entry_name             AS name,
>        entry.entry_id               AS id,
>        entry.entry_description_html AS description_html,
>        image.image_name             AS thumb1_name,
>        image.height                 AS thumb1_height,
>        image.width                  AS thumb1_width
> FROM   entry, entry_fti, image
> WHERE  entry_fti.id = entry.oid
> AND    entry.entry_tn_image_id = image.image_id
> AND    entry.entry_live = 't'::bool
> AND    (
>          entry_fti.string ~ '^word1'
>          OR
>          entry_fti.string ~ '^word2'
>          OR
>          .
>          .
>          OR
>          entry_fti.string ~ '^wordn'
>        )
> GROUP BY entry.entry_id,
>          entry.entry_name,
>          entry.entry_description_html,
>          image.image_name,
>          image.height,
>          image.width
> ORDER BY rating DESC
> ==
>
> Now this all works, which is good. My problem now is that I want to
> limit the number of results shown on a page to 20 and show the number
> of pages of extra results, much like you'd see on any search engine site.
> Naturally I immedia
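A minimal sketch of the LIMIT/OFFSET approach discussed above, against Paul's simplified schema; the page size of 20, the offset for "page 3", and the shortened word list are illustrative only:

    SELECT COUNT(entry_fti.id) AS rating,
           entry.entry_name    AS name,
           entry.entry_id      AS id
    FROM   entry, entry_fti
    WHERE  entry_fti.id = entry.oid
    AND    entry.entry_live = 't'::bool
    AND    (entry_fti.string ~ '^word1' OR entry_fti.string ~ '^word2')
    GROUP BY entry.entry_id, entry.entry_name
    ORDER BY rating DESC
    LIMIT  20 OFFSET 40;  -- rows 41-60, i.e. "page 3" at 20 rows per page

A separate count query over the same WHERE clause would still be needed to report the total number of matching entries for the page navigation.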
Re: [PHP] Re: [SQL] FTI, paged, ranked searching and efficiency.
Stephen,

> How come nobody's ever thought of cursors?
>
> DECLARE foo CURSOR FOR SELECT stuff FROM stuff WHERE foo ORDER BY
> something;
>
> Hop forward N rows?
> MOVE FORWARD $n IN foo
>
> Want M rows?
> FETCH FORWARD $m IN foo

I'm intrigued by this. How would I retrieve cursor rows into a web application? If we could output a cursor to a function result (we can't), it would be easy, but I'm not sure otherwise.

-Josh

--
______AGLIO DATABASE SOLUTIONS_______
Josh Berkus
Complete information technology      [EMAIL PROTECTED]
and data management solutions        (415) 436-9166
for law firms, small businesses      fax 436-0137
and non-profit organizations.        pager 338-4078
San Francisco
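A minimal sketch of the cursor approach Stephen describes, using the simplified entry table from earlier in the thread; the cursor name and the MOVE/FETCH counts are illustrative, and the whole sequence must run inside one transaction because the cursor only lives until COMMIT:

    BEGIN;

    DECLARE search_cur CURSOR FOR
        SELECT entry_id, entry_name
        FROM   entry
        WHERE  entry_live = 't'::bool
        ORDER  BY entry_name;

    MOVE FORWARD 40 IN search_cur;   -- skip the first two pages of 20
    FETCH FORWARD 20 IN search_cur;  -- return "page 3"

    CLOSE search_cur;
    COMMIT;

From a web script the FETCH is issued like any other query, but the BEGIN, DECLARE, and FETCH all have to happen on the same database connection, which is the awkward part for a stateless page-per-request application.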
Re: [SQL] Index problem
Patrik Kudo <[EMAIL PROTECTED]> writes:
> select * from elever where userid like 'walth%'; <-- Not OK!

> Dropping and recreating the index solves the problem, but that's not
> good enough for me since the problem has recurred on a different
> machine with a different database. vacuum and vacuum analyze do not
> report any problem with the table.

> Both times the problem occurred with userids starting with a
> "w". Postgres is running with a Swedish locale, and on FreeBSD this
> means that "w" and "v" (among a number of other letters) are treated
> equally when collating/sorting. I suppose this could be part of the
> problem.

> Is this a known problem? If so, is it fixed in 7.0.2? I've not seen
> this happen on any of our postgres 7.0.2 systems, but as I can't even
> reproduce it on 6.5.3, that's no guarantee it's fixed...

Hmm. I can think of two known issues that may be relevant.

First off, there are indeed known problems with LIKE index optimization in non-ASCII locales. I believe that 7.0 fixes the problems for locales that just have a non-ASCII sort order of individual characters, but we've since realized that it can still do the wrong thing in locales where there are special rules for handling multi-character sequences. I don't know the Swedish rules offhand, so I don't know if that's a problem for you.

However, a LIKE optimization problem would not be fixed by dropping and recreating the index. This makes me think you are being bit by the other issue: if you compile with LOCALE support then it is critical that the postmaster *always* be started with the same LOCALE settings (at least the same LC_COLLATE value). If LOCALE varies, that means your text sort ordering varies, which means that indexes on text columns may appear out-of-order, which causes index searches to miss entries they should have found. If you insert entries under different LOCALE settings then you may wind up with an index that is not consistent with *any* single LOCALE; the only cure for that is to drop and rebuild the index.

Unfortunately, it's way too easy to get bit by this bug. The most common error is to start the postmaster by hand from a shell account whose LOCALE environment is different from what's supplied when the postmaster is started from a boot-time script. Best bet is to set the correct LOCALE values in a wrapper script that you use in both cases.

We have talked about fixing this by saving the active LOCALE variables at initdb time, and having the postmaster adopt those values whenever it's started. But it hasn't got done yet. (Peter, would it be easy to make GUC handle this? There'd need to be some way to cause guc.c to do a putenv() ...)

			regards, tom lane
Re: [SQL] Index problem
Peter Eisentraut <[EMAIL PROTECTED]> writes:
>> But it hasn't got done yet. (Peter, would it be
>> easy to make GUC handle this? There'd need to be some way to cause
>> guc.c to do a putenv() ...)

> I thought pg_control was the place for this. Any idea how long a locale
> string can legally be? Doesn't look too hard beyond that.

That would work too, I guess.

> Then again, perhaps this should be a per-database thing, much like the
> encoding. The postmaster doesn't do any collating, so it should be safe
> to have different locale in different backends. -- Unless someone puts an
> index on pg_database.datname. :-)

There already is an index on pg_group.groname, so I think we'd better be conservative and require the same locale across the whole installation.

Hmm ... doesn't that mean that per-database encoding is a bogus concept too!?

			regards, tom lane