Hello Grant (cc-ing aperture-devel),

I am one of the Aperture admins, I can tell you a bit more about Aperture's mail facilities.

Short intro: Aperture is a framework for crawling and full-text and metadata extraction of a growing number of sources and file formats. We try to select the best-of-breed of the large number of open source libraries that tackle a specific source or format (e.g. PDFBox, Poi, JavaMail) and write some glue code around it so that they can be invoked in a uniform way. It's currently used in a number of desktop and enterprise search applications, both research and production systems.

At the moment we support a number of mail systems.

We can crawl IMAP mail boxes through JavaMail. In general it seems to work well, problems are usually caused by IMAP servers not conforming to the IMAP specs.

Some people have used the ImapCrawler to crawl MS Exchange as well. Some succeeded, some didn't. I don't really know whether the fault is in Aperture's code or in the Exchange configuration but I would be happy to take a look at it when someone runs into problems.

Outlook can also be crawled by connecting to a running Outlook process through jacob.dll. Others on aperture-devel can tell you more about its current status. Besides this crawler, I would also be very interested in having a crawler that directly processes .pst files, as to stay clear from communicating with other processes outside your own control.

People have been working on crawling Thunderbird mailboxes but I don't know what the current status is.

Ultimately, we try to support any major mail system. In practice, effort is usually dependent on knowledge and experience as well as customer demand.

We are happy to help you out with trying to get Aperture working in your domain and looking into the problems that you may encounter.


Kind regards,

Chris
--

Grant Ingersoll wrote:
Anyone have any recommendations on a decent, open (doesn't have to be Apache license, but would prefer non-GPL if possible), extractor for MS Exchange and/or PST files? The Zoe link on the FAQ [1] seems dead.

For mbox, I think mstor will suffice for me and I think tropo (from the FAQ should work for IMAP). Does anyone have experience with http://aperture.sourceforge.net/

[1] http://wiki.apache.org/lucene-java/LuceneFAQ#head-bcba2effabe224d5fb8c1761e4da1fedceb9800e

Cheers,
Grant

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to