I haven't tested it yet, but have a look at the manual... ;-)

---- cut http://www.aspseek.org/man/aspseek.conf.5.html ----
external converters

index(1) can deal with document types other than text/plain and text/html.
It does so with the help of external programs or scripts that convert from
some format to text/plain (or text/html), so you are able to index .ps,
.pdf, etc.

Converter from/type to/type command line
Specifies that, to convert documents with MIME type from/type to MIME type
to/type, the command given by command line will be used. Argument from/type
can be any type returned by the Web server. Argument to/type can be either
text/plain or text/html.

In the command line you usually specify the program or script to run,
together with its options. The program is expected to read from stdin and
write the converted document to stdout.

If your program can't deal with stdin/stdout streams, use the $in and $out
strings in the command line; they will be substituted with two file names in
the /tmp directory. index(1) will create files with unique names, write the
downloaded document to the first file (referenced as $in), run /bin/prog,
read the second file (referenced as $out) into memory, and then delete both
files.
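
(Not from the man page: a rough Python sketch of the sequence just described,
to make the $in/$out handling concrete. It only mimics the documented
behaviour; it is not ASPSeek's actual code, and the function name
run_converter is made up for illustration.)

```python
import os
import shlex
import subprocess
import tempfile


def run_converter(command_line: str, document: bytes) -> bytes:
    """Mimic the documented $in/$out handling (illustration only)."""
    # Two uniquely named temporary files (the default temp dir is usually /tmp).
    fd_in, in_path = tempfile.mkstemp(prefix="conv_in_")
    fd_out, out_path = tempfile.mkstemp(prefix="conv_out_")
    os.close(fd_out)
    try:
        # Write the downloaded document to the first file ($in).
        with os.fdopen(fd_in, "wb") as f:
            f.write(document)
        # Substitute the placeholders, then run the converter.
        argv = [
            arg.replace("$in", in_path).replace("$out", out_path)
            for arg in shlex.split(command_line)
        ]
        subprocess.run(argv, check=True)
        # Read the converted document back from the second file ($out).
        with open(out_path, "rb") as f:
            return f.read()
    finally:
        # Delete both files, whether or not the converter succeeded.
        os.remove(in_path)
        os.remove(out_path)
```

With this sketch, run_converter("ps2ascii $in $out", pdf_bytes) would
correspond to the PDF line in the examples of this man page section.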

You can also use $url in the command line; it will be substituted with the
actual URL of the downloaded document. You can use it in your own scripts to
distinguish between different document variations, or to write one script
that handles many different MIME types.

Please note that index(1) relies on a Content-Type header returned by a Web
server. Some Web-servers are misconfigured and give wrong info (for example,
return header Content-Type: audio/x-pn-realaudio-plugin for .rpm files).

Examples:
Converter application/postscript text/plain ps2ascii
# ps2ascii can't deal with PDF files from stdin
Converter application/pdf text/plain ps2ascii $in $out
---- cut ----
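
To tie $in, $out and $url together, here is an untested sketch of one wrapper
script serving several MIME types, as the manual suggests. Everything in it
is an assumption on my side: the script name and path (convert.py), the
choice of converters, and their command lines (ps2ascii with only an input
file, antiword, and unrtf --text are all assumed to print text to stdout, so
please check that against your installed versions).

```python
# Hypothetical aspseek.conf lines for this script:
#   Converter application/pdf    text/plain /usr/local/bin/convert.py $in $out $url
#   Converter application/msword text/plain /usr/local/bin/convert.py $in $out $url
#   Converter application/rtf    text/plain /usr/local/bin/convert.py $in $out $url
import posixpath
import subprocess
from urllib.parse import urlparse


def pick_command(url, in_path):
    """Choose a converter by the URL's file extension; None if unsupported."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    if ext == ".pdf":
        return ["ps2ascii", in_path]
    if ext == ".doc":
        return ["antiword", in_path]
    if ext == ".rtf":
        return ["unrtf", "--text", in_path]
    return None


def convert(in_path, out_path, url):
    """Run the chosen converter and store its stdout in out_path."""
    command = pick_command(url, in_path)
    if command is None:
        return 1  # no converter known for this document
    with open(out_path, "wb") as out:
        return subprocess.run(command, stdout=out).returncode
```

To use it as a script, call convert(*sys.argv[1:4]) from a __main__ guard and
pass the result to sys.exit().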


Best regards,

Markus Rietzler
* <rietzler_software/> | http://www.rietzler-software.de
* Wuppertal-Navigator | http://www.wuppertal-navigator.de
* eMail: [EMAIL PROTECTED]

Neue Nordstrasse 43
42105 Wuppertal

Phone: 0700.RIETZLER (0700.7438 9537)
       0202.420830
Fax: 0202.242 24 66

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On behalf of Diego Montalvo
Sent: Thursday, 21 February 2002 19:53
To: [EMAIL PROTECTED]
Subject: Re: AW: [aseek-users] ASPSeek - PDF / RTF


Kir or Markus,

What are the proper steps to get this type of
functionality?  I must first download the converters;
then how do I configure ASPSeek to use the external
converters?

Diego





--- Kir Kolyshkin <[EMAIL PROTECTED]> wrote:
> ASPSeek will also present a "text version" of beer.pdf to be viewed
> (in the place where the "cached" link usually is), much like Google
> does, so you can see the result of the conversion. Excerpts are also
> supported.
>
> > [EMAIL PROTECTED] wrote:
> >
> > no no,
> >
> > the external converter is started by aspseek during the index process
> > when aspseek finds a pdf file.
> > so in your case:
> >
> > when aspseek indexes www.crazy.com and finds beer.pdf it starts the
> > converter. the converter reads the pdf document and converts it to
> > text or html. aspseek then indexes this output.
> >
> > now your users can also search in pdf documents. so when "beer" is in
> > beer.pdf, aspseek will list the link to beer.pdf as a result and even
> > display a short extract. your users can then click on the link and
> > acrobat reader opens to display the pdf file.
> >
> > so external converter means a helper program that lets aspseek index
> > pdf documents.
> >
> > Markus Rietzler
> > * kommunikation & online service
> > * RZF NRW
> > * Tel: 0211.4572-130
> >
> > -----Original Message-----
> > From: Diego Montalvo [mailto:[EMAIL PROTECTED]]
> > Sent: Thursday, 21 February 2002 16:55
> > To: [EMAIL PROTECTED]
> > Subject: Re: [aseek-users] ASPSeek - PDF / RTF
> >
> > Kir,
> >
> > I am somewhat confused. So ASPSeek will crawl and index .PDF and
> > similar files, but will not present them as .html? Is that why I
> > need an external converter?
> >
> > Or does an external converter convert first, and then I run ASPSeek?
> >
> > Example: I want to index "www.crazy.com/beer.pdf". Do I simply use
> > ASPSeek to retrieve words from "beer.pdf", but then must use an
> > external program to view it in html?
> >
> > Do you have a link to a search engine that uses ASPSeek with
> > external converters?
> >
> > Diego
> >
> > --- Kir Kolyshkin <[EMAIL PROTECTED]> wrote:
> > > Diego Montalvo wrote:
> > > >
> > > > Hello,
> > > >
> > > > In the ASPSeek manual pages there is a mention that ASPSeek
> > > > understands PDF and RTF formats with the help of an external
> > > > program. What program is that? I would like to embed it into
> > > > ASPSeek.
> > >
> > > There's no need to embed. The manual talks about External
> > > Converters, described in
> > > http://www.aspseek.org/man/aspseek.conf.5.html#lbAM
> > > So as long as you have a program that can convert, say, pdf to html,
> > > you can index pdf documents with aspseek.
> > >
> > > A good ps to text (or html) converter is here:
> > > http://www.nzdl.org/html/prescript.html
> > > There are also links to other such tools.
> > >
> > > As for converters from rtf or doc format, I know of
> > > word2x:   http://word2x.alcom.co.uk/
> > > antiword: http://www.winfield.demon.nl/index.html
> > > unrtf:    http://www.geocities.com/tuorfa/unrtf.html
> > > --
> > > [EMAIL PROTECTED]  http://kir.vtx.ru/    ICQ 7551596
> > > Phone +7 903 6722750
> > > Hi, I'm a signature virus: copy me to your
> > > .signature to help me spread!
> > > --
> >
>
> --
> [EMAIL PROTECTED]  http://kir.vtx.ru/    ICQ 7551596
> Phone +7 903 6722750
> Hi, I'm a signature virus: copy me to your
> .signature to help me spread!
> --

