Re: [SLUG] search engine for company network (OT)
Hi Sebastian,

One option worth looking at, if you have some time, is moving your content across to a Content Management System (CMS). I had never used a CMS until I was asked to review several of them for a government department. After seeing them in operation I was really impressed.

There are several open source ones available, including some excellent Australian ones like Mambo. Support an Australian CMS if you possibly can, as they are competitive with the rest. The CMS that I evaluated was Plone. It is a Python-based CMS and runs on Linux. It is quite easy to install and set up.

Very quickly, here is what they do and how they work. You can set up hierarchical folders and sub-folders to store your documents; you might, for example, set up folders for clients. For every document (file) you can add a description, comments and other information. The comments and descriptions you add are searchable. You should be able to save your AutoCAD files with no problem. A sophisticated search window lets you do easy searches. Most CMSs also allow you to attach previews in PNG or JPEG format, which may or may not be helpful when searching.

From what I have seen, a CMS like Plone or Mambo might be just what you are looking for. They really do offer incredible capabilities.

Sebastian Spiess wrote:
> I've heard of the various desktop search engines like beagle, tracker and google desktop but are there open source engines which can be run on a server so that many can connect to it and search?
>
> Sadly we are relying on MS office (2001), AutoCAD (R16 to 2008) and other proprietary software in our daily work so those kinds of files would need to be indexed.

--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
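The folder/description/comment model described in this message can be sketched in a few lines. This is only an illustration of the idea, not how Plone or Mambo store or search content; the `Document` class, the `search` function and the sample paths are all made up for the example. The point it shows is that a binary file such as a .dwg becomes findable through the text people attach to it.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """One stored file plus the searchable text a person attached to it."""
    path: str                                   # folder/sub-folder/file name
    description: str = ""
    comments: list[str] = field(default_factory=list)

    def searchable_text(self) -> str:
        # Only the metadata is searched; the file itself is never opened.
        return " ".join([self.path, self.description, *self.comments]).lower()


def search(documents: list[Document], query: str) -> list[Document]:
    """Return the documents whose metadata contains every word of the query."""
    words = query.lower().split()
    return [d for d in documents if all(w in d.searchable_text() for w in words)]


# Hypothetical sample data: one folder per client, as suggested above.
docs = [
    Document("clients/acme/pump-house/layout.dwg",
             description="Pump house general arrangement",
             comments=["Revised after site visit"]),
    Document("clients/acme/pump-house/quote.doc",
             description="Quotation for pump house works"),
]

for hit in search(docs, "pump arrangement"):
    print(hit.path)
```

A real CMS would add full-text indexing, permissions and versioning on top, but the search result is only ever as good as the descriptions people bother to type in.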
Re: [SLUG] search engine for company network (OT)
Sebastian Spiess wrote:
> I've heard of the various desktop search engines like beagle, tracker and google desktop but are there open source engines which can be run on a server so that many can connect to it and search?
>
> Sadly we are relying on MS office (2001), AutoCAD (R16 to 2008) and other proprietary software in our daily work so those kinds of files would need to be indexed.

Hi Sebastian,

I had a brief look through the replies so far and no one has mentioned IBM OmniFind Yahoo! Edition (http://omnifind.ibm.yahoo.net/index.php), probably because it's not open source :) However, it is free and looks set to remain free for some time. I think it might be right up your alley!

I was in the same position as you some time ago: a small company with lots and lots of docs, ranging from PDFs in a technical library to CAD files in differing formats to Word and OpenOffice documents, pictures etc. in project folders. We have a fairly stringent file system management plan in place, but when you're not quite sure what you're looking for, a decent indexed search goes a long, long way, especially when looking for that darn part number buried deep in a CAD drawing which you did 5 years ago :)

Note that it picks up .dwg files (even non-AutoCAD ones :) ) and a whole range of other file types. It has a limit of 200,000 files and a maximum of 5 collections, but this should cover most small businesses.

Some of the others I tried/looked at:

*Regain* - http://regain.sourceforge.net/ - actually the best after IBM OmniFind!
*Terrier* - http://ir.dcs.gla.ac.uk/terrier/
*Egothor* - http://www.egothor.org/
*Lucene* - http://lucene.apache.org/java/docs/index.html - IBM OmniFind uses this too!

Below is a quick note from my work diary, written when I was researching and trying solutions:

"This was by far the easiest and most advanced (in terms of development), and it provided the best results of all the software that I tested and looked at. The only issue when installing was a missing Java RHEL compatibility package; once this was yummed onto my test server the install went very smoothly. The software has a web interface for configuring and searching, served on its own port by its own Java servlet server (Jetty, I think). The download package includes its own Java runtime environment, which alleviates the pain of trying to get the right version (or, for that matter, a working version) of Java.

The crawling process is pretty resource hungry but seems very quick for what it is doing. The results are even more surprising: lots of results, and fairly relevant ones at that. Out of all the software that I tried, this picked up the most file types and searched the most files. Sometimes the crawler does not index every file, but that is something I am working on. I currently have it indexing over 200,000 files and it only results in an index size of 4-5 GB, and that's without caching the files...

The catch with this software? Well, it is not entirely open source. It uses the Lucene package but also incorporates some fairly heavy stuff from Yahoo and IBM. They have also stated that they do not plan to make this particular version a paid one. They have an enterprise version for more than 500,000 files. They seem to be trying to get a foot in the door of the search world by providing a free version to entice people/companies in. Probably not such a bad idea, g$$gle really needs some proper competition"

Feel free to contact me if you think I can help out - I would be happy to try! I can even send you a de-sensitized screenie of a typical search on our server!

--
Best Regards,
Gerard

"In God we trust, all others bring data"
-- Framed plaque from the '60s, hanging in the Mission Evaluation Room at Johnson Space Center, downstairs from Mission Control.
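Before trying a tool with a file limit like the one mentioned above, it may help to know how many files a share actually holds and of which types. The sketch below counts files per extension and scales the diary's figure of 4-5 GB for 200,000 files linearly to get a rough index size. The linear scaling is purely an assumption for illustration, and the function names and sample tree are made up; nothing here comes from OmniFind itself.

```python
import os
import tempfile
from collections import Counter

FILE_LIMIT = 200_000           # file limit quoted in the message
DIARY_FILES = 200_000          # number of files indexed in the diary note
DIARY_INDEX_GB = (4.0, 5.0)    # index size reported for that many files


def count_by_extension(root: str) -> Counter:
    """Walk `root` and count files per lower-cased extension."""
    counts: Counter = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
    return counts


def estimate_index_gb(total_files: int) -> tuple[float, float]:
    """Scale the diary's 4-5 GB per 200,000 files linearly (an assumption)."""
    low, high = DIARY_INDEX_GB
    return (total_files * low / DIARY_FILES, total_files * high / DIARY_FILES)


# Demo on a throwaway tree; point count_by_extension() at a real share instead.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "project-a", "drawings"))
    for parts in (("project-a", "drawings", "plan.DWG"),
                  ("project-a", "drawings", "detail.dwg"),
                  ("project-a", "spec.doc"),
                  ("project-a", "README")):
        open(os.path.join(root, *parts), "w").close()
    counts = count_by_extension(root)

total = sum(counts.values())
print(counts.most_common())
print("within limit:", total <= FILE_LIMIT)
print("estimated index size (GB):", estimate_index_gb(total))
```

The per-extension breakdown is also a quick way to see which file types a candidate search engine must be able to parse.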
Re: [SLUG] search engine for company network (OT)
Excerpts from Sebastian Spiess's message of Wed May 14 07:10:01 +1000 2008:
> I've heard of the various desktop search engines like beagle, tracker and
> google desktop but are there open source engines
> which can be run on a server so that many can connect to it and search?
>
> Does anyone have an idea, something I could investigate further? A software
> name?

Nutch/Solr might be a good fit. There's quite a good overview of both here:

http://www.danielnaber.de/publications/jazoon07_naber.pdf

rgh

--
+61 (0) 410 646 369
[EMAIL PROTECTED]

You're worried criminals will continue to penetrate into cyberspace, and I'm worried complexity, poor design and mismanagement will be there to meet them - Marcus Ranum
Re: [SLUG] search engine for company network (OT)
On 14/05/2008 11:58 AM, Mick Pollard wrote:
> Sounds like a document management system is what you might be better off with. This allows for searching of documents, complete with an ACL system to protect private documents.
>
> Have you had a look at http://www.knowledgetree.com/? This http://bitnami.org/stack/knowledgetree is a simple way of installing it all. I find Bitnami stacks are great for evaluating a package before deploying it the traditional way.

I'd like to second Mick's suggestion. We've been working with KnowledgeTree for the past 4 years or so and have no complaints whatsoever.

Nigel.
Re: [SLUG] search engine for company network (OT)
On Wed, 14 May 2008 07:10:01 +1000 Sebastian Spiess <[EMAIL PROTECTED]> wrote:
> I've heard of the various desktop search engines like beagle, tracker and
> google desktop but are there open source engines
> which can be run on a server so that many can connect to it and search?
>
> Does anyone have an idea, something I could investigate further? A software
> name?

Sounds like a document management system is what you might be better off with. This allows for searching of documents, complete with an ACL system to protect private documents.

Have you had a look at http://www.knowledgetree.com/? This http://bitnami.org/stack/knowledgetree is a simple way of installing it all. I find Bitnami stacks are great for evaluating a package before deploying it the traditional way.

--
Regards
Mick Pollard ( lunix )

BOFH Excuse of the day: Asynchronous Memory Expiry
Re: [SLUG] search engine for company network (OT)
Rev Simon Rumble wrote:
> However, nothing beats Google Desktop. It's changed my work life enormously! I'm forced to use that other, horrible OS and its even worse mail client, but by indexing the whole lot everything is just a very quick search away. Google Desktop can be pointed at network drives, so it'll index those for you too.

Google Desktop does have a network web interface, so people can search the files through a LAN web page thingo. It's an add-in; can't remember the name. Here we are: http://dnka.com/

dave
Re: [SLUG] search engine for company network (OT)
This one time, at band camp, Glen Turner wrote:
> It took me as long to set up consistent authentication between
> Samba, NFS and Apache as to do everything else. Your mileage
> may vary depending what mechanism you use for authentication.

This is the main advantage of the desktop solutions. The search engine indexes what you have access to, nothing more. With a centralised solution, you essentially have to overlay your authentication and permissions system(s) over the top of the search engine, and give the search engine access to everything. Or only allow it to index stuff that everyone has access to.

--
Rev Simon Rumble <[EMAIL PROTECTED]>
www.rumble.net

The Tourist Engineer
Just because you're on holiday, doesn't mean you're not a geek.
http://engineer.openguides.org/

"The music business is a cruel and shallow money trench, a long plastic hallway where thieves and pimps run free, and good men die like dogs. There's also a negative side."
- Source unknown, often erroneously attributed to Hunter S. Thompson
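The "overlay" described in this message amounts to filtering every result list against what the person searching is allowed to read. A minimal sketch of that idea follows, assuming the indexer recorded plain POSIX owner/group/mode for each file. It ignores ACLs, directory traversal rights and Samba share permissions, so treat it as an outline of the problem rather than a solution; `FileMeta`, `may_read` and `filter_hits` are names invented for the example.

```python
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Ownership and mode recorded by the indexer when it crawled the file."""
    path: str
    uid: int
    gid: int
    mode: int  # permission bits as found in os.stat().st_mode


def may_read(meta: FileMeta, uid: int, gids: set[int]) -> bool:
    """Apply the classic owner/group/other read check to an indexed file."""
    if uid == meta.uid:
        return bool(meta.mode & stat.S_IRUSR)
    if meta.gid in gids:
        return bool(meta.mode & stat.S_IRGRP)
    return bool(meta.mode & stat.S_IROTH)


def filter_hits(hits: list[FileMeta], uid: int, gids: set[int]) -> list[FileMeta]:
    """Drop every search hit the requesting user could not open anyway."""
    return [h for h in hits if may_read(h, uid, gids)]


# Hypothetical hits returned by a central index that can see everything.
hits = [
    FileMeta("projects/a/plan.dwg", uid=1000, gid=100, mode=0o640),
    FileMeta("hr/salaries.xls", uid=1000, gid=100, mode=0o600),
    FileMeta("public/handbook.pdf", uid=1000, gid=200, mode=0o644),
]

# User 1001 is a member of group 100 only.
for hit in filter_hits(hits, uid=1001, gids={100}):
    print(hit.path)
```

The hard part mentioned in the quoted text remains: the uid and group list have to come from the same authentication source that Samba, NFS and the web server use, otherwise the filter and the file server will disagree.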
Re: [SLUG] search engine for company network (OT)
Sebastian Spiess wrote:
> Does anyone have an idea, something I could investigate further? A software name?

I index my server's disks using htdig. There are backends for PDF, DOC, OpenDocument and so on, and it's not at all difficult to add support for other file formats. Basically you write a small program to spit out the text in the file. I wrote one to pull the ID3 tags from my music files; based on that I wouldn't expect any trouble writing one for DXF.

The way it works is that I present my server's disks via Samba, NFS and WebDAV. Reading WebDAV is just like reading a web server, so htdig will index it fine. When users search they use the web interface, and they pull the matching file over HTTP when they click on the link. Obviously you protect both htdig and the WebDAV share using HTTPS and authentication.

htdig isn't perfect. But it's a nice lightweight search engine, well worth the hassle of installing, and it will get you started enough that if you want something heavier you'll have a much better notion of your requirements.

It took me as long to set up consistent authentication between Samba, NFS and Apache as to do everything else. Your mileage may vary depending on what mechanism you use for authentication.

--
Glen Turner
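As an illustration of the "small program to spit out the text in the file" idea for DXF, here is a rough sketch. It relies on ASCII DXF being a sequence of group-code/value line pairs, with the string of a TEXT or MTEXT entity carried in group codes 1 and 3. It does not handle binary DXF or native .dwg files, and how the script would be registered with htdig (its external_parsers setting, as far as I know) is left out. The function names and the sample drawing are invented for the example.

```python
from typing import Iterable, Iterator

TEXT_ENTITIES = {"TEXT", "MTEXT"}


def dxf_pairs(lines: Iterable[str]) -> Iterator[tuple[int, str]]:
    """Yield (group code, value) pairs from the lines of an ASCII DXF file."""
    it = iter(lines)
    for code_line in it:
        value_line = next(it, None)
        if value_line is None:
            break                      # dangling code line at end of file
        try:
            code = int(code_line.strip())
        except ValueError:
            continue                   # skip a malformed pair, keep alignment
        yield code, value_line.rstrip("\r\n")


def extract_text(lines: Iterable[str]) -> list[str]:
    """Collect the strings of TEXT and MTEXT entities, in file order."""
    current = None                     # type of the entity being read
    found = []
    for code, value in dxf_pairs(lines):
        if code == 0:
            current = value.strip()    # group code 0 starts a new entity
        elif current in TEXT_ENTITIES and code in (1, 3):
            found.append(value.strip())
    return found


# A tiny hand-written fragment standing in for a real drawing.
SAMPLE = "\n".join([
    "0", "SECTION", "2", "ENTITIES",
    "0", "TEXT", "8", "NOTES", "1", "PART NO 4711-B",
    "0", "LINE", "8", "0",
    "0", "MTEXT", "3", "See detail", "1", "A-12",
    "0", "ENDSEC", "0", "EOF",
])

for text in extract_text(SAMPLE.splitlines()):
    print(text)
```

For a real file, something like `extract_text(open(path, errors="replace"))` would feed the lines in; the printed text is what the indexer would then see in place of the drawing.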
Re: [SLUG] search engine for company network (OT)
If you're talking about a network shared drive, then something simple like htdig gets you most of the way there. You can either crawl the disk like locate(8) does, or crawl the intranet web server if that's how you're accessing your docs; then you call the htdig CGI and get back your results.

Cheap, simple, open sourcey, and I'm sure the quality of the results you get back won't be very good :) But it's a central tool that everyone can enjoy, unlike the personal computer solutions like beagle and google desktop.

If you want to *spend* money, though, some companies sell network appliances that do all this for you.

2008/5/13 Sebastian Spiess <[EMAIL PROTECTED]>:
> I've heard of the various desktop search engines like beagle, tracker and
> google desktop but are there open source engines which can be run on a
> server so that many can connect to it and search?
>
> Does anyone have an idea, something I could investigate further? A software
> name?
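The "crawl the disk like locate(8) does" option can be sketched as a walk over the share that builds a word-to-paths index from file and folder names only. This is a toy to show the shape of the approach, not what htdig or locate actually do internally; `build_index`, `lookup` and the sample tree are made up for the example.

```python
import os
import re
import tempfile
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_index(root: str) -> dict[str, set[str]]:
    """Map each lower-cased word in a relative path to the paths containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            for word in WORD.findall(rel.lower()):
                index[word].add(rel)
    return index


def lookup(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the paths whose names contain every word of the query."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    matches = [index.get(w, set()) for w in words]
    return sorted(set.intersection(*matches))


# Demo on a throwaway tree standing in for the shared drive.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "acme", "pump-house"))
    for parts in (("acme", "pump-house", "layout.dwg"),
                  ("acme", "pump-house", "quote.doc"),
                  ("acme", "minutes.doc")):
        open(os.path.join(root, *parts), "w").close()
    index = build_index(root)

print(lookup(index, "pump layout"))
print(lookup(index, "acme doc"))
```

Because only names are indexed, this finds nothing inside the documents; that gap is what the per-format text extractors of a real engine fill.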
Re: [SLUG] search engine for company network (OT)
This one time, at band camp, Sebastian Spiess wrote:
> I've heard of the various desktop search engines like beagle, tracker and
> google desktop but are there open source engines which can be run on a
> server so that many can connect to it and search?

AFAIK Beagle and Tracker are free software.

However, nothing beats Google Desktop. It's changed my work life enormously! I'm forced to use that other, horrible OS and its even worse mail client, but by indexing the whole lot everything is just a very quick search away. Google Desktop can be pointed at network drives, so it'll index those for you too.

--
Rev Simon Rumble <[EMAIL PROTECTED]>
www.rumble.net

The Tourist Engineer
Because geeks travel too.
http://engineer.openguides.org/

"a skid mark on the bed sheet of Australian politics"
- John Howard, as described by Dean Mighell
Re: [SLUG] search engine for company network (OT)
On Wed, May 14, 2008 at 7:10 AM, Sebastian Spiess <[EMAIL PROTECTED]> wrote:
> I've heard of the various desktop search engines like beagle, tracker and
> google desktop but are there open source engines which can be run on a
> server so that many can connect to it and search?
>
> Does anyone have an idea, something I could investigate further? A software
> name?

I have absolutely no experience with it, but just last night I read about OpenKM (http://www.openkm.org/) and it sounds like something which might fit the bill. I'd try it for our company too if/when I get time.

I'd be interested to hear what you come up with, as we could use something like this too.

Cheers,

--Amos
[SLUG] search engine for company network (OT)
hi all,

I know this is not a 100% linux related question but it's open source baby :-)

On our company network we have a daily growing number of documents in lots of folders and stuff. Most of it is organised in project folders and has recurring folder structures and file names.

We are working hard on giving it more and clearer structure but sometimes it is still hard to find some files.

I want to suggest installing a search engine which will index our existing files so that employees can crawl quickly through a project's history.

I've heard of the various desktop search engines like beagle, tracker and google desktop but are there open source engines which can be run on a server so that many can connect to it and search?

Sadly we are relying on MS office (2001), AutoCAD (R16 to 2008) and other proprietary software in our daily work so those kinds of files would need to be indexed.

Does anyone have an idea, something I could investigate further? A software name?

cheers, seb