Re: Moving Verity collections from Win to Linux (PDF/DOC problem)

2004-02-17 Thread Jamie Jackson
Hi Dave,

While I had already read those links, they were indeed among the most
helpful. However, it's still hard to prethink pitfalls associated with
Lucene/CFMX spidering just going by these tutorials. Therefore, in
order to eliminate a lot of the unknown, I'm going to avoid Lucene for
the time being, and try to hack a Verity solution together.

Thanks,
Jamie

On Fri, 13 Feb 2004 18:14:09 -0500, in cf-talk you wrote:

 Perhaps these links might help in your quest?

 Searching with Lucene and MX:
 Part 1: http://www.sys-con.com/coldfusion/article.cfm?id=629
 Part 2: http://www.sys-con.com/coldfusion/article.cfm?id=639

 Extracting text from a PDF (from Matt Liotta's 1/12 blog entries):
 http://devilm.com/mt/mt-tb.cgi/60




Re: Moving Verity collections from Win to Linux (PDF/DOC problem)

2004-02-13 Thread Rob Rohan
It would appear you are going the free route; if that is not true, don't
forget about Google.
http://www.google.com/services/

On Fri, 2004-02-13 at 13:56, Jamie Jackson wrote:
 I've been tasked with estimating the LOE of making a CFMX/Linux site
 searchable. The site needs to be spidered (as opposed to a *regular*
 Verity index), and PDFs and DOCs need to be indexed as well.
 
 Issue: AFAIK, Verity still can't directly index DOCs and PDFs.
 
 The options as I see them, are:
 1. Copy site to a Win box (running CF5), and do the VK2
 spidering/indexing there, then move the collection to the CFMX/Linux
 box.
 2. Stick with _CFMX_/Linux/VK2, and run toText routines on problem
 file types.
 3. Go with Lucene.
 
 Seeing that MM/Verity isn't addressing the PDF/DOC issue (or are
 they?), it seems that the best long-term solution would be #3
 (Lucene), but it's a big unknown for me. I don't have much of a clue
 as to how long it would take me (a Java novice) to set up a
 spider/index/search for the first time, and what potential
 deficiencies I'd be left with once it had been set up.
 
 #2 seems okay, but it could get complicated when it comes to crawling
 to the text alternatives. I'm also unsure what becomes of metadata
 (i.e. titles) when doing these conversions.
 
 However, the solution that falls best within my current skillset is
 #1, as I've done several Win/VK2/CF5 spiders. Here's the question: Is
 this solution as straightforward as it seems? I know there are several
 steps, but having done the aforementioned spiders, I would guess it
 would take me two days to knock this out (leaving me with a somewhat
 less than automatic process for future updates... which I could
 automate later). Are there any GOTCHAs here?
 
 Thanks,
 Jamie





Re: Moving Verity collections from Win to Linux (PDF/DOC problem)

2004-02-13 Thread Dave Carabetta
 On Fri, 2004-02-13 at 13:56, Jamie Jackson wrote:
  I've been tasked with estimating the LOE of making a CFMX/Linux site
  searchable. The site needs to be spidered (as opposed to a *regular*
  Verity index), and PDFs and DOCs need to be indexed as well.
  
  Issue: AFAIK, Verity still can't directly index DOCs and PDFs.
  
  The options as I see them, are:
  1. Copy site to a Win box (running CF5), and do the VK2
  spidering/indexing there, then move the collection to the CFMX/Linux
  box.
  2. Stick with _CFMX_/Linux/VK2, and run toText routines on problem
  file types.
  3. Go with Lucene.
  
  Seeing that MM/Verity isn't addressing the PDF/DOC issue (or are
  they?), it seems that the best long-term solution would be #3
  (Lucene), but it's a big unknown for me. I don't have much of a clue
  as to how long it would take me (a Java novice) to set up a
  spider/index/search for the first time, and what potential
  deficiencies I'd be left with once it had been set up.
  
  #2 seems okay, but it could get complicated when it comes to crawling
  to the text alternatives. I'm also unsure what becomes of metadata
  (i.e. titles) when doing these conversions.
  
  However, the solution that falls best within my current skillset is
  #1, as I've done several Win/VK2/CF5 spiders. Here's the question: Is
  this solution as straightforward as it seems? I know there are several
  steps, but having done the aforementioned spiders, I would guess it
  would take me two days to knock this out (leaving me with a somewhat
  less than automatic process for future updates... which I could
  automate later). Are there any GOTCHAs here?
  

Perhaps these links might help in your quest?

Searching with Lucene and MX:
Part 1: http://www.sys-con.com/coldfusion/article.cfm?id=629
Part 2: http://www.sys-con.com/coldfusion/article.cfm?id=639

Extracting text from a PDF (from Matt Liotta's 1/12 blog entries):
http://devilm.com/mt/mt-tb.cgi/60
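
As a rough illustration of the text-extraction route (option 2 above), here is a minimal Python sketch that plans a sidecar .txt file for each PDF or DOC. The converter names (pdftotext, antiword), the sample paths, and the plan_conversions helper are assumptions for illustration only; none of them come from this thread.

```python
from pathlib import PurePosixPath

# Assumed external converters (not named in the thread): pdftotext for PDFs,
# antiword for Word DOCs. Swap in whatever toText tool is actually installed.
CONVERTERS = {
    ".pdf": lambda src, dst: ["pdftotext", src, dst],
    ".doc": lambda src, dst: ["antiword", src],  # antiword writes text to stdout
}

def plan_conversions(paths):
    """Map each PDF/DOC path to a sidecar .txt path and the command to build it."""
    plan = []
    for path in paths:
        p = PurePosixPath(path)
        make_cmd = CONVERTERS.get(p.suffix.lower())
        if make_cmd is None:
            continue  # other file types are left for the indexer as they are
        target = str(p.with_name(p.name + ".txt"))
        plan.append((path, target, make_cmd(path, target)))
    return plan

# Hypothetical site paths, for illustration only.
plan = plan_conversions(["/site/a.PDF", "/site/b.doc", "/site/index.cfm"])
```

Each planned command would still have to be run (for example with subprocess.run, redirecting stdout to the target file in the antiword case). This does nothing about titles or other metadata, which remains the open question raised above.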

Regards,
Dave.




Re: Moving Verity collections from Win to Linux (PDF/DOC problem)

2004-02-13 Thread Jamie Jackson
On 13 Feb 2004 14:46:18 -0800, in cf-talk you wrote:

 It would appear you are going the free route; if that is not true, don't
 forget about Google.
 http://www.google.com/services/

Hmm, I had forgotten about Google. If I can do what I need with
robots.txt (wrt filtering), this might be a viable solution for this
project.
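
One way to sanity-check robots.txt filtering before relying on it is a small Python sketch like the one below, which uses the standard library's robots.txt parser. The rules, the paths, and the allowed_paths helper are made up for illustration; they are not this site's actual layout.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the paths are placeholders, not a real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /drafts/
"""

def allowed_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the subset of paths a compliant crawler may fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if parser.can_fetch(user_agent, p)]

candidates = [
    "/index.cfm",
    "/docs/report.pdf",
    "/admin/login.cfm",
    "/drafts/memo.doc",
]
crawlable = allowed_paths(ROBOTS_TXT, candidates)
```

This only shows what a well-behaved crawler would skip; it says nothing about how a hosted search service actually treats the file.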

Does anyone have a ballpark figure for how often Google re-indexes a
site (i.e. how long a stale index might live)?

This is a noteworthy solution, but I'd still appreciate any information
anybody has on the other solutions I mentioned.

Thanks,
Jamie




RE: Moving Verity Collections

2000-08-09 Thread LISTS


Can you export the registry entries under the Allaire key that represent
the Verity collections, and then import them on the new machine?
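
A minimal Python sketch of that export/import step follows. The registry key path, the .reg file name, and the helper functions are assumptions for illustration (check the actual key on the old server first); the sketch only builds the regedit command lines and does not run them.

```python
# Assumed location of the collection entries; verify on the old server.
COLLECTIONS_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Allaire\ColdFusion"
    r"\CurrentVersion\Collections"
)

def export_command(reg_file, key=COLLECTIONS_KEY):
    """Command to run on the old server: export the key to a .reg file."""
    return ["regedit", "/e", reg_file, key]

def import_command(reg_file):
    """Command to run on the new server: silently import the .reg file."""
    return ["regedit", "/s", reg_file]

# Hypothetical file name, for illustration only.
export_cmd = export_command("collections.reg")
import_cmd = import_command("collections.reg")
```

The collection directories themselves presumably also need to be copied across, to the same paths the registry entries point at.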

John Cesta

http://www.cybersmarts.net
-
ColdFusion ASP and ActiveState PERL Hosting

www.serverautomationtools.com

-----Original Message-----
From: Morgan, Thomas J. [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 09, 2000 8:02 AM
To: '[EMAIL PROTECTED]'
Subject: Moving Verity Collections


I am upgrading our web server and need to move some Verity collections from
the old server to the new one.  Any suggestions on the procedure?  Thanks.

Thomas J. Morgan
Information Delivery Systems
Research Triangle Institute
3040 Cornwallis Road
RTP, NC  27709
(919)541-7414
[EMAIL PROTECTED]
http://ids.rti.org


--
Archives: http://www.mail-archive.com/cf-talk@houseoffusion.com/
To Unsubscribe visit
http://www.houseoffusion.com/index.cfm?sidebar=lists&body=lists/cf_talk or
send a message to [EMAIL PROTECTED] with 'unsubscribe' in
the body.
