Re: [SLUG] search engine for company network (OT)

2008-05-19 Thread david . lyon


Hi Sebastian,

One option worth looking at if you have some time is transferring your  
content across to a Content Management System (CMS).


I had never used a CMS until I was asked to review several of them for  
a Government Department. After seeing them in operation I was really  
impressed.


There are several open source ones available, even including some  
excellent Australian ones like Mambo. Support an Australian CMS if you  
possibly can as they are competitive with the rest.


The CMS that I evaluated was Plone. It is a Python-based CMS and runs  
on Linux. It is quite easy to install and set up.


Very quickly, I will explain what they do and how they work.

You can set up hierarchical folders and sub-folders to store your  
documents. For example, you might set up a folder for each client.


For every document (file) you can add a description, comments and  
other information.


The comments and descriptions you add are searchable. You should be  
able to save your AutoCAD files without any problem.


A sophisticated search window will let you do easy searches.
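
To illustrate why searchable descriptions and comments help with files
whose contents are hard to index, here is a minimal sketch in Python. It
is not how Plone or Mambo implement search; the document names, the
"description" and "comments" fields, and the function names are all
assumptions made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word in a document's description/comments to document names.

    `documents` is a hypothetical dict: name -> {"description": ..., "comments": ...}.
    """
    index = defaultdict(set)
    for name, meta in documents.items():
        for field in ("description", "comments"):
            for word in tokenize(meta.get(field, "")):
                index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents whose metadata contains every query word."""
    words = tokenize(query)
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits
```

With this approach a drawing can be found through its description even
if nothing inside the file itself was ever indexed.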

Most CMSs allow you to attach previews in PNG or JPEG format, which  
may or may not be helpful when searching.


From what I have seen, a CMS like Plone or Mambo might be just  
what you are looking for.


They really do offer incredible capabilities.




--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html


Re: [SLUG] search engine for company network (OT)

2008-05-14 Thread Gerard Blacklock

Sebastian Spiess wrote:

hi all,

I know this is not a 100% linux related question but it's open source 
baby :-)


On our company network we have a daily growing number of documents in 
lots of folders and stuff. Most of it is organised in project folders 
and has reoccurring folder structures and file names.


We are working hard on giving it more and clearer structure but 
sometimes it is still hard to find some files.


I want to suggest to install a search engine which will index our 
existing files so that employees can crawl quickly though projects 
history.


I've heard of the various desktop search engines like beagle, tracker 
and google desktop but are there open source engines which can be run 
on a server so that many can connect to it and search?


Sadly we are relying on MS office (2001), AutoCAD (R16 to 2008) and 
other proprietary software in our daily work so those kind of files 
would need to be indexed.



Does anyone has a idea, something I could investigate further? a 
software name?



cheers, seb


Hi Sebastian,

I had a brief look through the replies so far and no one has mentioned IBM 
OmniFind Yahoo Edition (http://omnifind.ibm.yahoo.net/index.php), 
probably because it's not open source :) However, it is free and looks to 
remain free for some time. I think it might be right up your alley!


I was in the same position as you some time ago: a small company with lots 
and lots of docs ranging from PDFs in a technical library to CAD files 
of differing formats to Word and OpenOffice documents, pictures etc. in 
project folders. We have a fairly stringent file system management plan 
in place but when you're not quite sure what you're looking for, a decent 
indexed search goes a long, long way, especially when looking for that 
darn part number buried deep in a CAD drawing which you did 5 years ago 
:). Note it picks up .dwg files (even non-AutoCAD ones :) ) and a whole 
range of other file types. It has a limit of 200 000 files and a maximum 
of 5 collections but this should cover most small businesses.


Some of the others I tried/looked at:
*Regain* - http://regain.sourceforge.net/ - actually the best after IBM 
OmniFind!

*Terrier* - http://ir.dcs.gla.ac.uk/terrier/
*Egothor* - http://www.egothor.org/
*Lucene* - http://lucene.apache.org/java/docs/index.html - IBM 
OmniFind uses this as well!


Below is a quick note from my work diary when I was researching and 
trying solutions:


"This was by far the easiest, most advanced (in terms of 
development) and provided the best results of all the software 
that I tested and looked at. The only issue when installing was a 
missing Java RHEL compatibility package; once this was yummed on my test 
server the install went very smoothly.


The software has a web interface for configuring and searching and runs 
on a port of its own Java servlet server, Jetty, I think. The download 
package includes its own Java runtime environment, which alleviates the 
pain of trying to get the right version or, for that matter, a working 
version of Java.


The crawling process is pretty resource hungry but seems very quick for 
what it is doing. The results are even more surprising: lots of results 
and fairly relevant ones at that. Out of all the software that I tried, 
this picked up the most file types and searched the most files. Sometimes 
the crawler does not index every file but that is something I am working 
on. I currently have it indexing over 200,000 files and it only results 
in an index size of 4-5 GB, and that's without caching the files.


The catch with this software? Well, it is not entirely open source; it 
uses the Lucene package but also incorporates some fairly heavy stuff 
from Yahoo and IBM. They have also stated that they do not plan to make 
this particular version a paid one. They have an enterprise version for 
more than 500,000 files. They seem to be trying to get a foot in the 
searching world by providing a free version to entice people/companies in. 
Probably not such a bad idea, g$$gle really needs some proper competition"



Feel free to contact me if you think I can help out - I would be happy 
to try! I can even send you a de-sensitized screenie of a typical search 
on our server!


--
Best Regards,

Gerard

"In God we trust, all others bring data"
-- Framed plaque from the '60s, hanging in the Mission Evaluation Room at 
Johnson Space Center, downstairs from Mission Control.



Re: [SLUG] search engine for company network (OT)

2008-05-13 Thread Richard Heycock
Excerpts from Sebastian Spiess's message of Wed May 14 07:10:01 +1000 2008:
> hi all,
> 
> I know this is not a 100% linux related question but it's open source baby :-)
> 
> On our company network we have a daily growing number of documents in lots of
> folders and stuff. Most of it is organised in 
> project folders and has reoccurring folder structures and file names.
> We are working hard on giving it more and clearer structure but sometimes it 
> is
> still hard to find some files.
> 
> I want to suggest to install a search engine which will index our existing
> files so that employees can crawl quickly though 
> projects history.
> 
> I've heard of the various desktop search engines like beagle, tracker and
> google desktop but are there open source engines 
> which can be run on a server so that many can connect to it and search?
> 
> Sadly we are relying on MS office (2001), AutoCAD (R16 to 2008) and other
> proprietary software in our daily work so those 
> kind of files would need to be indexed.
> 
> 
> Does anyone has a idea, something I could investigate further? a software 
> name?

Nutch/Solr might be a good fit. There's quite a good overview of both here:

http://www.danielnaber.de/publications/jazoon07_naber.pdf

rgh

> 
> cheers, seb
> 

-- 
+61 (0) 410 646 369
[EMAIL PROTECTED]

You're worried criminals will continue to penetrate into cyberspace, and
I'm worried complexity, poor design and mismanagement will be there to meet
them - Marcus Ranum


Re: [SLUG] search engine for company network (OT)

2008-05-13 Thread Nigel Allen

On 14/05/2008 11:58 AM, Mick Pollard wrote:

On Wed, 14 May 2008 07:10:01 +1000
Sebastian Spiess <[EMAIL PROTECTED]> wrote:

  
On our company network we have a daily growing number of documents in lots of folders and stuff. Most of it is organised in 
project folders and has reoccurring folder structures and file names.


I want to suggest to install a search engine which will index our existing files so that employees can crawl quickly though 
projects history.


I've heard of the various desktop search engines like beagle, tracker and google desktop but are there open source engines 
which can be run on a server so that many can connect to it and search?


Sadly we are relying on MS office (2001), AutoCAD (R16 to 2008) and other proprietary software in our daily work so those 
kind of files would need to be indexed.



Does anyone has a idea, something I could investigate further? a software name?



Sounds like a document management system is what you might be better off with. 
This allows for searching of documents complete with an ACL system to protect 
private documents.
Have you had a look at http://www.knowledgetree.com/. 
This http://bitnami.org/stack/knowledgetree is a simple way of installing it all. I find Bitnami stacks are great for evaluating a package before deploying it the traditional way.


  
I'd like to second Mick's suggestion. We've been working with 
KnowledgeTree for the past 4 years or so and have no complaints whatsoever.


Nigel.



Re: [SLUG] search engine for company network (OT)

2008-05-13 Thread Mick Pollard
On Wed, 14 May 2008 07:10:01 +1000
Sebastian Spiess <[EMAIL PROTECTED]> wrote:

> On our company network we have a daily growing number of documents in lots of 
> folders and stuff. Most of it is organised in 
> project folders and has reoccurring folder structures and file names.
> 
> I want to suggest to install a search engine which will index our existing 
> files so that employees can crawl quickly though 
> projects history.
> 
> I've heard of the various desktop search engines like beagle, tracker and 
> google desktop but are there open source engines 
> which can be run on a server so that many can connect to it and search?
> 
> Sadly we are relying on MS office (2001), AutoCAD (R16 to 2008) and other 
> proprietary software in our daily work so those 
> kind of files would need to be indexed.
> 
> 
> Does anyone has a idea, something I could investigate further? a software 
> name?
> 
Sounds like a document management system is what you might be better off with. 
This allows for searching of documents complete with an ACL system to protect 
private documents.
Have you had a look at http://www.knowledgetree.com/? 
The stack at http://bitnami.org/stack/knowledgetree is a simple way of 
installing it all. I find Bitnami stacks are great for evaluating a package 
before deploying it the traditional way.

-- 
Regards
Mick Pollard ( lunix )

BOFH Excuse of the day:
Asynchronous Memory Expiry



Re: [SLUG] search engine for company network (OT)

2008-05-13 Thread David Kempe

Rev Simon Rumble wrote:
However, nothing beats Google Desktop.  It's changed my work life 
enormously!  I'm forced to use that other, horrible OS and its even 
worse mail client, but by indexing the whole lot everything is just a 
very quick search away.


Google Desktop can be pointed at network drives, so it'll index those 
for you too.
  


Google Desktop does have a network web interface, so people can search 
the files through a LAN web page.

It's an add-in; I couldn't remember the name, but here it is: http://dnka.com/

dave


Re: [SLUG] search engine for company network (OT)

2008-05-13 Thread Rev Simon Rumble
This one time, at band camp, Glen Turner wrote:

> It took me as long to set up consistent authentication between
> Samba, NFS and Apache as to do everything else.  Your mileage
> may vary depending what mechanism you use for authentication.

This is the main advantage of the desktop solutions.  The search engine 
indexes what you have access to, nothing more.  With a centralised 
solution, you essentially have to overlay your authentication and 
permissions system(s) over the top of the search engine, and give the 
search engine access to everything.  Or only allow it to index stuff 
that everyone has access to.
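
As a rough sketch of what "overlaying" permissions on a central search
engine could look like, the Python below filters a list of hits against
a per-folder access list before showing them to the user. The ACL
layout, group names and function names are hypothetical; a real setup
would take them from whatever Samba, NFS or Apache already use.

```python
def allowed(path, user_groups, acl):
    """Return True if the most specific ACL entry covering `path`
    grants access to at least one of the user's groups.

    `acl` is a hypothetical dict: folder prefix -> set of group names.
    Paths not covered by any entry are denied by default.
    """
    best = None
    for prefix in acl:
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        return False
    return bool(acl[best] & set(user_groups))


def filter_results(results, user_groups, acl):
    """Keep only the search hits the user is allowed to see."""
    return [path for path in results if allowed(path, user_groups, acl)]
```

The index itself still has to read everything; the filtering happens at
query time, which is the extra layer that desktop search tools avoid.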

-- 
Rev Simon Rumble <[EMAIL PROTECTED]>
www.rumble.net

The Tourist Engineer
Just because you're on holiday, doesn't mean you're not a geek.
http://engineer.openguides.org/

 "The music business is a cruel and shallow money trench, a long
  plastic hallway where thieves and pimps run free, and good men
  die like dogs. There's also a negative side."
- Source unknown, often erroneously attributed to Hunter S. Thompson


Re: [SLUG] search engine for company network (OT)

2008-05-13 Thread Glen Turner

Sebastian Spiess wrote:

Does anyone has a idea, something I could investigate further? a 
software name?


I index my server's disks using htdig. There are backends for .PDF,
.DOC, OpenDocument and so on and it's not at all difficult to add
support for other file formats (basically you write a small program
to spit out the text in the file. I wrote one to pull the ID3 tags
from my music files; based on that I wouldn't expect any trouble
writing one for DXF.)
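
To give an idea of how small such a program can be, here is a sketch in
Python that pulls the text strings out of an ASCII DXF file. It assumes
the usual ASCII DXF layout of alternating group-code and value lines,
where group code 0 names the entity and group code 1 carries the text of
TEXT and MTEXT entities. It only prints plain text; whatever output
format htdig expects from an external parser is not covered here, and
the function name is made up for the example.

```python
import sys

# Entity types whose group code 1 holds human-readable text.
TEXT_ENTITIES = {"TEXT", "MTEXT"}


def extract_dxf_text(lines):
    """Return the text strings of TEXT/MTEXT entities in an ASCII DXF.

    `lines` is any iterable of lines, read as (group code, value) pairs.
    """
    found = []
    current_entity = None
    it = iter(lines)
    for code_line in it:
        value_line = next(it, None)
        if value_line is None:
            break  # dangling group code at end of file
        code = code_line.strip()
        value = value_line.strip()
        if code == "0":
            current_entity = value  # start of a new entity or section marker
        elif code == "1" and current_entity in TEXT_ENTITIES:
            found.append(value)
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as handle:
        print("\n".join(extract_dxf_text(handle)))
```

Binary DXF and native .dwg files would need a different approach.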

The way it works is that I present my server's disks via Samba, NFS
and WebDAV. Reading WebDAV is just like reading a web server. So
htdig will index it fine and when users search they use the web
interface and pull the matching file using HTTP when they click
on the link.   Obviously you protect both htdig and the WebDAV
using HTTPS and authentication.

htdig isn't perfect. But it's a nice lightweight search engine,
well worth the hassle of installing, and it will get you started enough
so that if you want something heavier then you'll have a much
better notion of your requirements.

It took me as long to set up consistent authentication between
Samba, NFS and Apache as to do everything else.  Your mileage
may vary depending what mechanism you use for authentication.

--
 Glen Turner


Re: [SLUG] search engine for company network (OT)

2008-05-13 Thread Jamie Wilkinson
If you're talking about a network shared drive, then something simple like
htdig gets you most of the way there.  You can either crawl the disk like
locate(8) does, or crawl the intranet webserver if that's how you're
accessing your docs, then you call the htdig CGI and get back your results.
Cheap, simple, open sourcey, and I'm sure the quality of the results you get
back won't be very good :)  But it's a central tool that everyone can enjoy,
unlike the personal computer solutions like beagle and google desktop.
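
For a feel of what "crawl the disk like locate(8) does" amounts to, here
is a minimal Python sketch that walks a shared folder, records every
file path, and then searches the file names. It is not how locate or
htdig are implemented, it matches names only (not file contents), and
the function names are invented for the example.

```python
import os


def build_path_index(root):
    """Walk `root` and return a sorted list of every file path below it."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    paths.sort()
    return paths


def search_paths(index, term):
    """Return indexed paths whose file name contains `term`, ignoring case."""
    needle = term.lower()
    return [path for path in index if needle in os.path.basename(path).lower()]
```

In practice the index would be rebuilt on a schedule and saved to disk,
so that searches do not have to touch the file server at all.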

If you want to *spend* money, though, some companies sell network appliances
that do all this for you.



Re: [SLUG] search engine for company network (OT)

2008-05-13 Thread Rev Simon Rumble
This one time, at band camp, Sebastian Spiess wrote:

> I've heard of the various desktop search engines like beagle, tracker and 
> google desktop but are there open source engines which can be run on a 
> server so that many can connect to it and search?

AFAIK Beagle and Tracker are free software.

However, nothing beats Google Desktop.  It's changed my work life 
enormously!  I'm forced to use that other, horrible OS and its even 
worse mail client, but by indexing the whole lot everything is just a 
very quick search away.

Google Desktop can be pointed at network drives, so it'll index those 
for you too.

-- 
Rev Simon Rumble <[EMAIL PROTECTED]>
www.rumble.net

The Tourist Engineer
Because geeks travel too.
http://engineer.openguides.org/

"a skid mark on the bed sheet of Australian politics"
- John Howard, as described by Dean Mighell


Re: [SLUG] search engine for company network (OT)

2008-05-13 Thread Amos Shapira
On Wed, May 14, 2008 at 7:10 AM, Sebastian Spiess <
[EMAIL PROTECTED]> wrote:

> hi all,
>
> I know this is not a 100% linux related question but it's open source baby
> :-)
>
> On our company network we have a daily growing number of documents in lots
> of folders and stuff. Most of it is organised in project folders and has
> reoccurring folder structures and file names.
>
> We are working hard on giving it more and clearer structure but sometimes
> it is still hard to find some files.
>
> I want to suggest to install a search engine which will index our existing
> files so that employees can crawl quickly though projects history.
>
> I've heard of the various desktop search engines like beagle, tracker and
> google desktop but are there open source engines which can be run on a
> server so that many can connect to it and search?
>
> Sadly we are relying on MS office (2001), AutoCAD (R16 to 2008) and other
> proprietary software in our daily work so those kind of files would need to
> be indexed.
>
>
> Does anyone has a idea, something I could investigate further? a software
> name?


I have absolutely no experience with it, but just last night I read about
OpenKM (http://www.openkm.org/) and it sounds like something which might fit
the bill. I'd try it for our company too if/when I get time.

I'd be interested to hear what you come up with as we can use something like
this too.

Cheers,

--Amos


[SLUG] search engine for company network (OT)

2008-05-13 Thread Sebastian Spiess

hi all,

I know this is not a 100% Linux-related question but it's open source baby :-)

On our company network we have a daily growing number of documents in lots of
folders and stuff. Most of it is organised in project folders and has
recurring folder structures and file names.

We are working hard on giving it more and clearer structure but sometimes it is
still hard to find some files.

I want to suggest installing a search engine which will index our existing
files so that employees can crawl quickly through a project's history.

I've heard of the various desktop search engines like beagle, tracker and
google desktop but are there open source engines which can be run on a server
so that many can connect to it and search?

Sadly we are relying on MS office (2001), AutoCAD (R16 to 2008) and other
proprietary software in our daily work so those kinds of files would need to
be indexed.

Does anyone have an idea, something I could investigate further? A software name?


cheers, seb
