Re: [GENERAL] integration of fulltext search in bytea/docs

2009-07-30 Thread Radek Novotný
Is there possible to create pg trigger that runs shell script? 
-- 

www.publicstream.cz - vytvořili jsme produkt pro živé přenosy (online 
streaming) 
a videoarchív ve formátu FLASH.

Bc. Radek Novotný
jednatel Mediawork group s.r.o.

tel.: +420 724 020 361
email: radek.novo...@mediawork.cz
http://www.mediawork.cz

Dne St 29. července 2009 17:38:31 Pavel Stehule napsal(a):
 Hello

 Dne 29. červenec 2009 16:46 Radek Novotný radek.novo...@mediawork.cz

 napsal(a):
  Hello,
  is there in the roadmap of postgre integration of fulltext searching in
  documents saved in blobs (bytea)?

 What I know, no. PostgreSQL doesn't know about others binary formats,
 so it cannot do it.

  For example linux antiword can export fine text output that can be
  inserted into varchar field.

 I understand it well. Three years ago, we used antiword. We stored two
 values - bytea - original word doc, and text - antiword output. The
 overhead is minimal, and this solution worked very well.

  Would be very very nice (postgre users can be proud to be first) to save
  documents into bytea and search that field via to_tsvector, to_tsquery
  ...

 It's should be very slow. You have to do repeated transformation.

 Pavel Stehule

  --
 
  www.publicstream.cz - vytvořili jsme produkt pro živé přenosy (online
  streaming) a videoarchív ve formátu FLASH.
 
  Bc. Radek Novotný
  jednatel Mediawork group s.r.o.
 
  tel.: +420 724 020 361
  email: radek.novo...@mediawork.cz
  http://www.mediawork.cz


Re: [GENERAL] integration of fulltext search in bytea/docs

2009-07-30 Thread Sam Mason
On Thu, Jul 30, 2009 at 03:09:12PM +0200, Radek Novotnnn wrote:
 Is there possible to create pg trigger that runs shell script? 

Yes, pl/perl can do this.

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] integration of fulltext search in bytea/docs

2009-07-30 Thread Michael Glaesemann


On Jul 30, 2009, at 9:09 , Radek Novotný wrote:


Is there possible to create pg trigger that runs shell script?



[Please don't top post.]

Yes. You can use an untrusted language such as pl/perlu to run system  
commands.


Michael Glaesemann
grzm seespotcode net




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] integration of fulltext search in bytea/docs

2009-07-30 Thread A. Kretschmer
In response to Radek Novotný :
 Is there possible to create pg trigger that runs shell script?

Sure, use an untrusted language for the trigger-function.


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: - Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] integration of fulltext search in bytea/docs

2009-07-30 Thread Tom Lane
Michael Glaesemann g...@seespotcode.net writes:
 On Jul 30, 2009, at 9:09 , Radek Novotný wrote:
 Is there possible to create pg trigger that runs shell script?

 Yes. You can use an untrusted language such as pl/perlu to run system  
 commands.

The fact that you can do it doesn't make it a good idea ...

If you do this, you'll need to consider what happens if the database
transaction rolls back after calling your trigger.  The effects in the
filesystem are still there, but the effects in the database aren't.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] integration of fulltext search in bytea/docs

2009-07-29 Thread Sam Mason
On Wed, Jul 29, 2009 at 04:46:43PM +0200, Radek Novotnnn wrote:
 is there in the roadmap of postgre integration of fulltext searching in 
 documents saved in blobs (bytea)?

Do you mean bytea or large-objects?

 Would be very very nice (postgre users can be proud to be first) to save 
 documents into bytea and search that field via to_tsvector, to_tsquery ...

This seems easy; for large objects, just use lo_export() to dump the
blob out to the filesystem, and then use something like pl/perl to run
antiword on it, saving the results to another file and then returning
the file line-by-line as a SETOF TEXT (I think this is the best way of
handling things in case the resulting text file is enormous anyway).  If
this code was called runfilter we can use it like:

  UPDATE myfiles f SET tsidx = (
SELECT ts_accum(to_tsvector(t))
FROM runfilter(f.loid) t);

Where we've defined ts_accum to be:

  CREATE AGGREGATE ts_accum (tsvector) (
SFUNC = tsvector_concat,
STYPE = tsvector,
INITCOND = ''
  );

bytea is different because you know when the values has changed (i.e.
write a trigger) but you need to write more code to get the bytea value
out into the filesystem.

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] integration of fulltext search in bytea/docs

2009-07-29 Thread Pavel Stehule
Hello

Dne 29. červenec 2009 16:46 Radek Novotný radek.novo...@mediawork.cz
napsal(a):
 Hello,
 is there in the roadmap of postgre integration of fulltext searching in
 documents saved in blobs (bytea)?


What I know, no. PostgreSQL doesn't know about others binary formats,
so it cannot do it.

 For example linux antiword can export fine text output that can be inserted
 into varchar field.

I understand it well. Three years ago, we used antiword. We stored two
values - bytea - original word doc, and text - antiword output. The
overhead is minimal, and this solution worked very well.


 Would be very very nice (postgre users can be proud to be first) to save
 documents into bytea and search that field via to_tsvector, to_tsquery ...


It's should be very slow. You have to do repeated transformation.

Pavel Stehule

 --

 www.publicstream.cz - vytvořili jsme produkt pro živé přenosy (online
 streaming) a videoarchív ve formátu FLASH.

 Bc. Radek Novotný
 jednatel Mediawork group s.r.o.

 tel.: +420 724 020 361
 email: radek.novo...@mediawork.cz
 http://www.mediawork.cz



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general