Re: cfindex is taking forever - and one more question
I've optimized things as much as I could by building a number of collections and limiting each to a specific doc type.

Next question! I'm trying to return a few sentences from each doc with the search term highlighted, so I'm using "ContextPassages" in the search. However, #searchResults.context# very rarely gives me anything. Out of 30 returned documents, maybe only 3 return content for #searchResults.context#. Usually it's empty or null.

Suggestions?

~| Order the Adobe Coldfusion Anthology now! http://www.amazon.com/Adobe-Coldfusion-Anthology/dp/1430272155/?tag=houseoffusion
Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:360462
Subscription: http://www.houseoffusion.com/groups/cf-talk/subscribe.cfm
Unsubscribe: http://www.houseoffusion.com/groups/cf-talk/unsubscribe.cfm
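For reference, a minimal sketch of the kind of cfsearch call being described, with a fallback for rows where the context column comes back empty. The collection name "docs_pdf" and the form field "searchTerm" are made up for illustration, and falling back to the summary column is only a workaround, not an explanation of why context is empty:

```cfml
<!--- Hypothetical names: "docs_pdf" (collection) and form.searchTerm --->
<cfparam name="form.searchTerm" default="">

<cfsearch
    collection="docs_pdf"
    name="searchResults"
    criteria="#form.searchTerm#"
    contextPassages="3"
    contextBytes="500"
    contextHighlightBegin="<strong>"
    contextHighlightEnd="</strong>"
    maxRows="30">

<cfoutput query="searchResults">
    <p>
        <a href="#encodeForHTMLAttribute(searchResults.url)#">#encodeForHTML(searchResults.title)#</a><br>
        <cfif len(trim(searchResults.context))>
            <!--- context already contains the highlight markup, so it is output as-is --->
            #searchResults.context#
        <cfelse>
            <!--- Assumption: summary is populated when context is not --->
            #encodeForHTML(searchResults.summary)#
        </cfif>
    </p>
</cfoutput>
```

Dumping the whole result set with cfdump for a few test searches should show which columns are actually filled in for the documents that return no context.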
RE: cfindex is taking forever
This is not a CF solution, but it may at least reduce what the indexer has to trawl through, and it will help anything else that has to call or access the documents.

This is for PDF files, but you might consider converting Office files to PDF, at your discretion of course. A properly prepared PDF version of an Office document can be as small as a quarter of the file size of the source document, and print and view quality is not compromised.

I'm a big fan of PDF, but unfortunately it's a file format that suffers a lot from bad file preparation, and unnecessarily big files are one result. Try optimising all the PDFs to see if this reduces the size of some of the files - I suspect it might. You'll need to check that the output settings (e.g. print resolution, image resolution) are suitable for the end purpose of the document, but in my experience the default settings are usually quite good.

The good news is that you can automate this process over the entire file system with Acrobat Pro's batch feature.

Hope that helps in some way!

++ Kevin Parker ++

-Original Message-
From: Les Mizzell [mailto:lesm...@bellsouth.net]
Sent: Thursday, 9 April 2015 8:23 AM
To: cf-talk
Subject: cfindex is taking forever

I'm working on building a search interface for a "document depo" on a site. The document folder has files going all the way back to 2005, and includes a number of 10+ meg PDF files, a few that are over 20 megs, and countless Word and Excel files and PowerPoint presentations.

I don't have access to the CF Administrator, so I'm doing this in code.

The collection was created successfully as far as I can tell. However, indexing has been running (or at least the wheel on my browser is still turning) for almost 3 hours now. I'm going to forget about it and go mow my grass and see what's happening when I finish.

I'm thinking though ... too much stuff to index? Or is that amount of time not out of line for a very large collection of files?

Also, I've not been able to find a list of accepted file extensions. I might have something listed that's just going to cause it to crap out anyway.

Thoughts? Try something else? What exactly?

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:360439
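One way to keep a single long request from doing all the work is to index one document type per collection. The sketch below is only an illustration: the collection names, the "/documents" path, and the extension lists are all made up, and it assumes the collections already exist. It is not a list of the extensions the search engine supports.

```cfml
<!--- Give the request more time than the default; the value is in seconds --->
<cfsetting requestTimeout="3600">

<!--- Hypothetical document root and collection-to-extension mapping --->
<cfset docRoot = expandPath("/documents")>
<cfset batches = {
    "docs_pdf"   = ".pdf",
    "docs_word"  = ".doc,.docx",
    "docs_excel" = ".xls,.xlsx"
}>

<cfloop collection="#batches#" item="collectionName">
    <!--- Index only the files whose extension belongs to this collection --->
    <cfindex
        action="update"
        collection="#collectionName#"
        type="path"
        key="#docRoot#"
        extensions="#batches[collectionName]#"
        recurse="true">
</cfloop>
```

Running each collection in its own request (or scheduled task) instead of one loop would also make it easier to see which document type is the slow one.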
Re: cfindex is taking forever
Not in front of a computer right now, but there is an option in the cfcollection tag to list collections or get a collection's details (something like that). Pretty sure that gives you the record or document count, and maybe even the size. I think that is accessible while indexing is happening. You could possibly write a quick script to see how far along things are.

On Apr 8, 2015 6:51 PM, "Les Mizzell" wrote:
> > That doesn't actually sound unreasonable, but it might be useful to
> > come up with a document count more specific than "very large".
>
> Approx 3000 documents - around 3 gb of data
> ... it's still running from what I can tell.

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:360437
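A quick script along those lines might look like the sketch below. It uses cfcollection with action="list", which returns a query of the registered collections. The collection name "docs_pdf" is made up, and whether the counts change while an index run is still in progress is an assumption to test, not something confirmed here:

```cfml
<!--- List every collection the server knows about --->
<cfcollection action="list" name="allCollections">

<cfoutput query="allCollections">
    <!--- "docs_pdf" is a hypothetical collection name --->
    <cfif allCollections.name eq "docs_pdf">
        Collection: #encodeForHTML(allCollections.name)#<br>
        Documents indexed: #allCollections.doccount#<br>
        Index size: #allCollections.size#<br>
        Last modified: #allCollections.lastmodified#<br>
    </cfif>
</cfoutput>
```

Reloading this page in a separate browser tab while the indexing request runs would show whether the document count is climbing toward the roughly 3000 files in the folder.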
Re: cfindex is taking forever
> That doesn't actually sound unreasonable, but it might be useful to
> come up with a document count more specific than "very large".

Approx 3000 documents - around 3 gb of data
... it's still running from what I can tell.

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:360436
Re: cfindex is taking forever
You also have to take your disk IOPS into consideration. If you are on a VPS, you will get much slower disk performance, especially if it's not SSD, and actions like this can take a lot longer.

On Wed, Apr 8, 2015 at 11:32 PM, Les Mizzell wrote:
> > I'm going to forget about it and go mow my grass and see what's
> > happening when I finish.
>
> Well crap, somebody stole my lawnmower. This is why we can't have nice
> things

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:360435
Re: cfindex is taking forever
> The collection was created successfully as far as I can tell. However,
> indexing has been running (or at least the wheel on my browser is still
> turning) for almost 3 hours now. I'm going to forget about it and go mow
> my grass and see what's happening when I finish.
>
> I'm thinking though ... too much stuff to index? Or is amount of time
> not out of line for a very large collection of files?

That doesn't actually sound unreasonable, but it might be useful to come up with a document count more specific than "very large".

> Thoughts? Try something else? What exactly?

Have you considered Solr instead of Verity? Not that this would solve the problem of indexing a lot of files, specifically.

Dave Watts, CTO, Fig Leaf Software
1-202-527-9569
http://www.figleaf.com/
http://training.figleaf.com/

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:360433
Re: cfindex is taking forever
> I'm going to forget about it and go mow my grass and see what's happening when I finish.

Well crap, somebody stole my lawnmower. This is why we can't have nice things.

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:360434