Re: Indexing all versions of Microsoft Office Documents

2010-04-27 Thread Shashi Kant
If you are on Windows, try the Microsoft IFilter API; it supports the
current Office versions.
http://www.microsoft.com/downloads/details.aspx?FamilyId=60C92A37-719C-4077-B5C6-CAC34F4227CC&displaylang=en
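Whether the extraction is done by IFilter or by Tika, the newer formats asked about here (.docx, .xlsx) are Office Open XML packages: ZIP archives whose text lives in XML parts, `word/document.xml` for Word and `xl/sharedStrings.xml` for an Excel workbook's text cells. The following is a minimal Python sketch, using only the standard library, that illustrates the kind of work such an extractor does for these two formats. It is not the API of IFilter or Tika; the function names `docx_text` and `xlsx_strings` are hypothetical, and it deliberately skips headers, footers, comments, numeric cells and inline strings.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces of WordprocessingML (.docx) and SpreadsheetML (.xlsx).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
S = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def docx_text(source):
    """Return the body text of a .docx, one line per paragraph.

    `source` may be a file path or a binary file-like object.
    Headers, footers, footnotes and comments live in other parts and are skipped.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(W + "p"):
        # Word splits a paragraph's visible text across <w:t> elements in runs.
        lines.append("".join(t.text or "" for t in paragraph.iter(W + "t")))
    return "\n".join(lines)


def xlsx_strings(source):
    """Return the shared-string table of an .xlsx (the workbook's text cells).

    Numeric cells and inline strings are stored in the sheet parts and are skipped.
    """
    with zipfile.ZipFile(source) as package:
        if "xl/sharedStrings.xml" not in package.namelist():
            return []  # a workbook with no text cells may omit this part
        root = ET.fromstring(package.read("xl/sharedStrings.xml"))
    strings = []
    for item in root.iter(S + "si"):
        # An <si> holds either one plain <t> or several rich-text runs <r><t>;
        # phonetic hints (<rPh>) are left out on purpose.
        parts = item.findall(S + "t") + item.findall(S + "r/" + S + "t")
        strings.append("".join(t.text or "" for t in parts))
    return strings
```

For example, `print(docx_text("report.docx"))` (file name hypothetical) prints one line per paragraph; the resulting strings could then be indexed as ordinary text fields.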



On Tue, Apr 27, 2010 at 6:08 AM, Roland Villemoes wrote:
> Hi All,
>
> Does anyone have a running solution for indexing Microsoft Office documents,
> e.g. .docx, .xlsx, etc.?
>
> I can see a lot of examples using Tika for rich content extraction, but I
> still find nothing for the newer versions of Microsoft Office.
> What libraries should I use, if not Tika?
>
> med venlig hilsen/best regards
>
> Roland Villemoes
> Tel: (+45) 22 69 59 62
> E-Mail: r...@alpha-solutions.dk
>
> Alpha Solutions A/S
> Borgergade 2, 3.sal, 1300 København K
> Tel: (+45) 70 20 65 38
> Web: http://www.alpha-solutions.dk
>
> ** This message including any attachments may contain confidential and/or 
> privileged information intended only for the person or entity to which it is 
> addressed. If you are not the intended recipient you should delete this 
> message. Any printing, copying, distribution or other use of this message is 
> strictly prohibited. If you have received this message in error, please 
> notify the sender immediately by telephone, or e-mail and delete all copies 
> of this message and any attachments from your system. Thank you.
>
>


Re: Indexing all versions of Microsoft Office Documents

2010-04-27 Thread Otis Gospodnetic
Roland,

A better place to ask might in fact be the tika-user mailing list. Sorry, I
don't have an answer beyond this pointer.

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message -----
> From: Roland Villemoes 
> To: "solr-user@lucene.apache.org" 
> Sent: Tue, April 27, 2010 6:08:30 AM
> Subject: Indexing all versions of Microsoft Office Documents


Indexing all versions of Microsoft Office Documents

2010-04-27 Thread Roland Villemoes
Hi All,

Does anyone have a running solution for indexing Microsoft Office documents,
e.g. .docx, .xlsx, etc.?

I can see a lot of examples using Tika for rich content extraction, but I
still find nothing for the newer versions of Microsoft Office.
What libraries should I use, if not Tika?

med venlig hilsen/best regards

Roland Villemoes
Tel: (+45) 22 69 59 62
E-Mail: r...@alpha-solutions.dk

Alpha Solutions A/S
Borgergade 2, 3.sal, 1300 København K
Tel: (+45) 70 20 65 38
Web: http://www.alpha-solutions.dk
