Re: [GENERAL] Full Text Search 101?

2009-11-12 Thread Aurynn Shaw

Hi Jonathan;


 I posted to this forum once before and was able to receive help.
 Thanks again!

 I'm trying to implement full text search capabilities.  Basically, I
 have a very simple data catalog type of website (http://
 gis.drcog.org/datacatalog), where the user can type in a word or words
 to search for records that match the criteria.  I have a table with a
 few fields, but I want to be able to search/index two fields -- the
 name of the dataset and the description.  (or more, if I can index
 fields from other tables too, that would be great).  I'd like to be
 able to use a full text search to rank the results in terms of
 relevance.

You can set up an index per table, but I've found that having a single 
vector table with bridge tables to the data I want to search works well.

This would be in the style of:

create table vectors (
 id serial primary key,
 vector tsvector not null
);

create table interesting_data (
 id serial primary key,
 textual text not null
);

CREATE TABLE vector_to_interesting (
 v_id int not null references vectors(id),
 i_id int not null references interesting_data(id)
);

And then

SELECT id.*
  FROM interesting_data itd,
   vectors v,
   vector_to_interesting itv
 WHERE v.id = itv.v_id
   AND itv.i_id = itd.id
   AND v.vector @@ to_tsquery('searchpattern');

Allowing for some procedures around that that return multiple row types, 
to the client software.



 What is the best way to handle this?  I've read through the PostgreSQL
 documentation and don't quite understand it although I'm trying to
 understand and am 'playing around' with this on a development server.
 Is the default text search configuration enough? I've tried setting my
 own configuration but get errors about not finding .dict dictionary
 files?  I have a default install of PostgreSQL 8.4.

 Also, I've created a tsvector column and created indexes, but it
 didn't seem to delete stop words from the indexes.  Should the stop
 words be indexed?

The default english configuration in PostgreSQL should have done this. 
Building your own configuration tends towards being a bit more advanced, 
and english should suit most needs.



 Also, I don't quite understand how to create indexes but rank certain
 words as being more important than others, for instance, maybe having
 words that come from the name column carrying more importance than
 words coming from the description column.

For part of this, there's ts_rank(tsvector, tsquery), as well as the 
relative weighting system in the textsearch modules. Unfortunately, I 
don't have much experience with the relative weighting, but, ts_rank() 
should get you partly there.


I also found an interesting page on relative weighting @
http://www.postgresonline.com/special_feature.php?sf_name=postgresql83tsearch_cheatsheetoutputformat=html

that might be useful for you.

 Finally, I'd like
 watersheds to come up when someone searches for water so I don't
 think I have this configured properly, because this record is not
 returned.

Well, there's two ways to achieve this.
The first is to manually the stopwords, which is getting into the custom 
configuration and a bit on the more advanced side.

The other is that in PG8.4, you can do

to_tsquery('water:*')

which will tell the search parser to do a partial match, which would 
return watershed, in this instance.
My testing has shown it to be a bit slower (30ms, vs 15ms for a 
non-partial search), but not egregiously slow.



 Is there a good tutorial or maybe documentation that is a bit easier
 to understand?  I know my database is far from complicated but I can't
 seem to find a configuration that works well.  When I try ranking my
 results, most of the results end up with the same rank, so something
 must be wrong?

 Can someone point me in the right direction?

 Thanks for the help.  I appreciate it.


Hope this helps. :)

Regards,
--
Aurynn Shaw

The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 ext 103
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

as...@commandprompt.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Dump all objects under a specific schema

2007-08-08 Thread Aurynn Shaw

Hi;

Hello List,

Is there a way that I can only dump all objects under a specific  
schema?


I'd like to dump all tables, views' definition and data under a  
specific schema.



pg_dump --help says that you're looking for the -n flag, so,

pg_dump -n your_schema -f output.sql

Hope that helps,

Aurynn Shaw

The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 ext 103
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

[EMAIL PROTECTED]



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Mac OS X

2007-07-12 Thread Aurynn Shaw
I tried to install postgres onto my macbook via 'fink' and don't  
like it all that much.
I decided to install from source, it's a fallback to my slackware  
days.


But fink already created a user postgres and I can't seem to find  
anything to change it's configuration settings for shell, home  
director...


If anyone has suggestions I would appreciate it.

Also, does anyone know of a more current installation write-up for  
Mac other than what Apple provides?  It's written around 7.4 and  
I'm not sure it's going to be optimal.




Personally, I use the PG package available on http://www.entropy.ch/ 
software/macosx/postgresql/. It worked flawlessly, set up initdb, and  
configured PG to start at boot time.


Thanks,
Aurynn Shaw

The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 ext 103
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

[EMAIL PROTECTED]



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] collision in serial numbers after INSERT?

2007-06-01 Thread Aurynn Shaw

Hi;


Thanks Aurynn, but then I have another question --

Even if I do a 'SELECT nextval('your_sequence');', how do I prevent  
an insert from happening between me selecting the next serial value  
and then actually inserting it?


It seems like I should lock the table if i want to be certain.


SELECT nextval('your_sequence') updates the sequence as well, so the  
next transaction that calls SELECT nextval('your_sequence') will get  
your_return_value + 1. Once you SELECT nextval('your_sequence'), no  
other call to nextval will get the sequence number you were just  
given, barring an act such as using setval().


The logic would be akin to:

SELECT nextval('your_sequence');

-- any amount of stuff can happen here, including other transactions  
that alter the sequence


INSERT INTO your_table (serial_field, data) VALUES  
(sequence_value_you_selected, 'some data');


You can read more about how sequences work
http://www.postgresql.org/docs/current/static/functions-sequence.html

Hope that helps,
Aurynn.



On 6/1/07, Aurynn Shaw [EMAIL PROTECTED]  wrote:
 Hello all -

 I'm working on a site with PHP and Postgres, coming from a MySQL
 background.

 I was looking for an equivalent to the mysql_insert_id()  
function, and

 a site recommended this:

 function postg_insert_id($tablename, $fieldname)
 {
  global connection_id;
  $result=pg_exec($connection_id, SELECT last_value FROM $ 
{tablename}_

 ${fieldname}_seq);
  $seq_array=pg_fetch_row($result, 0);
  return $seq_array[0];
 }

 It relies on pg's sequencing ability.

 However, I wondered, if I were in an environment where there were  
many
 concurrent inserts, would it be possible that I didn't get the  
serial

 number of the insert that *I* just did? That if I do an insert, and
 several inserts happen after mine, wouldn't I get the id of the  
latest

 row, which is several inserts after mine?

Everything that deals with sequences happens outside of transactions,
so this could theoretically happen.

The usual way to avoid this is to do:

SELECT nextval('your_sequence');

Then do your insert with that in the serial field.

Hope that helps,
Aurynn Shaw

The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 ext 103
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

[EMAIL PROTECTED]





--
Computers are useless. They can only give you answers
-- Pablo Picasso


Aurynn Shaw

The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 ext 103
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

[EMAIL PROTECTED]



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] On-line / off-line trace of SQL statements presented to the Postgres SQL engine

2007-05-30 Thread Aurynn Shaw



We are using a three-tier application with J2EE, JBoss, Hibernate and
a Postgres database.

It would be a nice thing to monitor or trace the actual SQL statements
processed by the DB. I do not really need the result set as I can get
this - if required - using SQL against the DB through the Pgsql
interface.


You can configure Postgres to give you this information in the  
logfiles, or before you execute any queries on your connection.


The configuration option in postgresql.conf is, I believe,  
log_statement.


Alternatively, you can run SET log_statement to 'all'; on your JDBC  
connection before your application begins to query the database.


http://www.postgresql.org/docs/8.2/static/runtime-config-logging.html

Hope that helps.

Aurynn Shaw

The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

[EMAIL PROTECTED]



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly