[SQL] Index problem
Hi! We're using postgres 6.5.3 for an application where we, among other things, store information about users. On two separate occasions on two different servers we've experienced problems with indices on the same table (out of over 100). The problem is that postgres refuses to find a certain row in the table when performing a simple select using the primary (unique) key, like this:

select * from elever where userid = 'walther';

When selecting all rows, the row in question is shown. When doing the following, I get the result I want:

select * from elever where userid like 'w%';     <-- OK
select * from elever where userid like 'wa%';    <-- OK
select * from elever where userid like 'wal%';   <-- OK
select * from elever where userid like 'walt%';  <-- OK

but not

select * from elever where userid like 'walth%'; <-- Not OK!

Dropping and recreating the index solves the problem, but that's not good enough for me since the problem has recurred on a different machine with a different database. vacuum and vacuum analyze do not report any problem with the table.

Both times the problem occurred with userids starting with a "w". Postgres is running with a Swedish locale, and on FreeBSD this means that "w" and "v" (among a number of other letters) are treated equally when collating/sorting. I suppose this could be part of the problem.

Is this a known problem? If so, is it fixed in 7.0.2? I've not seen this happen on any of our postgres 7.0.2 systems, but as I can't even reproduce it on 6.5.3, that's no guarantee it's fixed...

For the record... I know we shouldn't run 6.5.3 and we will upgrade to 7.x soon...

psql -c 'select version()'
PostgreSQL 6.5.3 on i386-unknown-freebsd4.1, compiled by cc

uname -a
FreeBSD 4.1-RELEASE #0: Tue Aug 29 15:31:01 CEST 2000

Regards,
Patrik Kudo

ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol
Does it seem unclear? Ask away!
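A minimal sketch of the drop-and-recreate workaround mentioned above; the index name elever_userid_key is an assumption, since the message does not name the primary-key index:

    -- Assumed index name; substitute the real unique index on elever.userid.
    DROP INDEX elever_userid_key;
    CREATE UNIQUE INDEX elever_userid_key ON elever (userid);

    -- Re-run the query that previously failed to find the row:
    SELECT * FROM elever WHERE userid = 'walther';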
Re: [SQL] FTI, paged, ranked searching and efficiency.
Paul,

I'm afraid that much of that was over my head. In fact, I'm keeping it as an example in case I ever need to do something similar. Forward your info and I'll include credit in the source :-)

In general terms, I've always depended on PHP to select a "page" of results, using the logic that the number of results on a page is a matter for the web application to handle (a display issue) rather than something to be handled on the back-end (a data issue). However, your point about not pulling out tons of data the user will never examine (i.e. 2nd and succeeding pages) is well-taken. Although, without pulling the entire data set, you can't display to the user how many results there are in total.

If it's a strong possibility that the users are really only ever going to want the top 20-40 rated results, then splitting it as you suggest ... first counting all the matches and then dragging in the rest of the data for the top X records ... makes a lot of sense.

Unfortunately, the only real option I can see for DB server-side row grabbing is: create the query(ies) as a temporary table or view using a function, then use Limit and Offset to grab one chunk of data at a time. This is, of course, a serious problem for multi-user performance since each user would need their own temp table or view.

From what I can tell, search engines (google, for example) grab the whole recordset and use the web script to parse it out 25 records at a time. Hopefully, someone on this list will have done that before and can provide less theoretical advice.

-Josh Berkus

P.S. I've also posted this to the pgsql-php list. I've quoted the full text of your question below my .sig for that reason.

--
______AGLIO DATABASE SOLUTIONS_______
Josh Berkus
Complete information technology      [EMAIL PROTECTED]
and data management solutions        (415) 436-9166
for law firms, small businesses      fax 436-0137
and non-profit organizations.        pager 338-4078
San Francisco

Paul wrote:
>
> Hello,
>
> This is going to be a bit long, I hope some of you will take the
> trouble to read it :)
>
> I am building a search engine for a section of a (PHP based) website.
> I wish the user to be able to enter a number of words in the search, and the
> search results to be ranked by the number of times words occur (both
> different words and the same word occurring multiple times are good).
>
> My (simplified) table structure is this:
> ==
>           Table "entry_fti"
>  Attribute |    Type     | Modifier
> -----------+-------------+----------
>  string    | varchar(25) |
>  id        | oid         |
> Index: entry_fti_string_idx
>
>                        Table "entry"
>        Attribute        |  Type   |                    Modifier
> ------------------------+---------+-------------------------------------------------
>  entry_id               | integer | not null default nextval('entry_id_seq'::text)
>  entry_name             | text    |
>  entry_description_html | text    |
>  entry_image_id         | integer | not null default 0
>  entry_tn_image_id      | integer | not null default 0
>  entry_live             | boolean | not null default 't'
> Index: entry_pkey
>
>                        Table "image"
>  Attribute  |    Type     |                    Modifier
> ------------+-------------+-------------------------------------------------
>  image_id   | integer     | not null default nextval('image_id_seq'::text)
>  image_name | varchar(32) |
>  height     | integer     | not null
>  width      | integer     | not null
> Indices: image_pkey
> ==
>
> And my (simplified) query looks like this:
> ==
> SELECT COUNT(entry_fti.id)          AS rating,
>        entry.entry_name             AS name,
>        entry.entry_id               AS id,
>        entry.entry_description_html AS description_html,
>        image.image_name             AS thumb1_name,
>        image.height                 AS thumb1_height,
>        image.width                  AS thumb1_width
> FROM   entry, entry_fti, image
> WHERE  entry_fti.id = entry.oid
> AND    entry.entry_tn_image_id = image.image_id
> AND    entry.entry_live = 't'::bool
> AND    (
>          entry_fti.string ~ '^word1'
>          OR
>          entry_fti.string ~ '^word2'
>          OR
>          .
>          .
>          OR
>          entry_fti.string ~ '^wordn'
>        )
> GROUP BY entry.entry_id,
>          entry.entry_name,
>          entry.entry_description_html,
>          image.image_name,
>          image.height,
>          image.width
> ORDER BY rating DESC
> ==
>
> Now this all works, which is good. My problem now is that I want to
> limit the number of results shown on a page to 20 and show the number
> of pages of extra results, much like you'd see on any search engine site.
> Naturally I immedia
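A minimal sketch of the LIMIT/OFFSET approach discussed above, against Paul's simplified schema; the page size of 20, the offset for "page 3", and the shortened word list are illustrative only:

    SELECT COUNT(entry_fti.id) AS rating,
           entry.entry_name    AS name,
           entry.entry_id      AS id
    FROM   entry, entry_fti
    WHERE  entry_fti.id = entry.oid
    AND    entry.entry_live = 't'::bool
    AND    (entry_fti.string ~ '^word1' OR entry_fti.string ~ '^word2')
    GROUP BY entry.entry_id, entry.entry_name
    ORDER BY rating DESC
    LIMIT  20 OFFSET 40;  -- rows 41-60, i.e. "page 3" at 20 rows per page

A separate count query over the same WHERE clause would still be needed to report the total number of matching entries for the page navigation.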
Re: [PHP] Re: [SQL] FTI, paged, ranked searching and efficiency.
Stephen,

> How come nobody's ever thought of cursors?
>
> DECLARE foo CURSOR FOR SELECT stuff FROM stuff WHERE foo ORDER BY
> something;
>
> Hop forward N rows?
> MOVE FORWARD $n IN foo
>
> Want M rows?
> FETCH FORWARD $m IN foo

I'm intrigued by this. How would I retrieve cursor rows into a web application? If we could output a cursor to a function result (we can't), it would be easy, but I'm not sure otherwise.

-Josh

--
______AGLIO DATABASE SOLUTIONS_______
Josh Berkus
Complete information technology      [EMAIL PROTECTED]
and data management solutions        (415) 436-9166
for law firms, small businesses      fax 436-0137
and non-profit organizations.        pager 338-4078
San Francisco
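A minimal sketch of the cursor approach Stephen describes, using the simplified entry table from earlier in the thread; the cursor name and the MOVE/FETCH counts are illustrative, and the whole sequence must run inside one transaction because the cursor only lives until COMMIT:

    BEGIN;

    DECLARE search_cur CURSOR FOR
        SELECT entry_id, entry_name
        FROM   entry
        WHERE  entry_live = 't'::bool
        ORDER  BY entry_name;

    MOVE FORWARD 40 IN search_cur;   -- skip the first two pages of 20
    FETCH FORWARD 20 IN search_cur;  -- return "page 3"

    CLOSE search_cur;
    COMMIT;

From a web script the FETCH is issued like any other query, but the BEGIN, DECLARE, and FETCH all have to happen on the same database connection, which is the awkward part for a stateless page-per-request application.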
Re: [SQL] Index problem
Patrik Kudo <[EMAIL PROTECTED]> writes:
> select * from elever where userid like 'walth%'; <-- Not OK!

> Dropping and recreating the index solves the problem, but that's not
> good enough for me since the problem has recurred on a different
> machine with a different database. vacuum and vacuum analyze do not
> report any problem with the table.

> Both times the problem occurred with userids starting with a
> "w". Postgres is running with a Swedish locale, and on FreeBSD this
> means that "w" and "v" (among a number of other letters) are treated
> equally when collating/sorting. I suppose this could be part of the
> problem.

> Is this a known problem? If so, is it fixed in 7.0.2? I've not seen
> this happen on any of our postgres 7.0.2 systems, but as I can't even
> reproduce it on 6.5.3, that's no guarantee it's fixed...

Hmm. I can think of two known issues that may be relevant.

First off, there are indeed known problems with LIKE index optimization in non-ASCII locales. I believe that 7.0 fixes the problems for locales that just have a non-ASCII sort order of individual characters, but we've since realized that it can still do the wrong thing in locales where there are special rules for handling multi-character sequences. I don't know the Swedish rules offhand, so I don't know if that's a problem for you.

However, a LIKE optimization problem would not be fixed by dropping and recreating the index. This makes me think you are being bit by the other issue: if you compile with LOCALE support then it is critical that the postmaster *always* be started with the same LOCALE settings (at least the same LC_COLLATE value). If LOCALE varies, that means your text sort ordering varies, which means that indexes on text columns may appear out-of-order, which causes index searches to miss entries they should have found. If you insert entries under different LOCALE settings then you may wind up with an index that is not consistent with *any* single LOCALE; the only cure for that is to drop and rebuild the index.

Unfortunately, it's way too easy to get bit by this bug. The most common error is to start the postmaster by hand from a shell account whose LOCALE environment is different from what's supplied when the postmaster is started from a boot-time script. Best bet is to set the correct LOCALE values in a wrapper script that you use in both cases.

We have talked about fixing this by saving the active LOCALE variables at initdb time, and having the postmaster adopt those values whenever it's started. But it hasn't got done yet. (Peter, would it be easy to make GUC handle this? There'd need to be some way to cause guc.c to do a putenv() ...)

			regards, tom lane
Re: [SQL] Index problem
Peter Eisentraut <[EMAIL PROTECTED]> writes:
>> But it hasn't got done yet. (Peter, would it be
>> easy to make GUC handle this? There'd need to be some way to cause
>> guc.c to do a putenv() ...)

> I thought pg_control was the place for this. Any idea how long a locale
> string can legally be? Doesn't look too hard beyond that.

That would work too, I guess.

> Then again, perhaps this should be a per-database thing, much like the
> encoding. The postmaster doesn't do any collating, so it should be safe
> to have different locale in different backends. -- Unless someone puts an
> index on pg_database.datname. :-)

There already is an index on pg_group.groname, so I think we'd better be conservative and require the same locale across the whole installation.

Hmm ... doesn't that mean that per-database encoding is a bogus concept too!?

			regards, tom lane