Indexing the latest MS Office documents

2010-01-03 Thread Roland Villemoes
Hi All,

Does anyone know how to index the latest MS Office documents, like .docx and
.xlsx?

From searching, it seems like Tika only supports the earlier formats, .doc and
.xls.



Best regards

Roland Villemoes
Tel: (+45) 22 69 59 62
E-Mail: mailto:r...@alpha-solutions.dk



Re: Indexing the latest MS Office documents

2010-01-03 Thread Mattmann, Chris A (388J)
Hi Roland,

You probably want to send your email to tika-u...@lucene.apache.org.

Best of luck!

Cheers,
Chris





++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++




Re: Indexing the latest MS Office documents

2010-01-04 Thread Peter Wolanin
You must have been searching old documentation. I think Tika 0.3+ has
support for the new MS formats. But don't take my word for it - why
don't you build Tika and try it?

-Peter




-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia, Inc.
peter.wola...@acquia.com


Re: Indexing the latest MS Office documents

2010-01-05 Thread Jay Hill
The version of Tika in the 1.4 release definitely parses the most current
Office formats (.docx, .pptx, etc.) and they index as expected.

-Jay
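For background on why the two generations need different handling: the older .doc/.xls/.ppt files are OLE2 compound documents, while .docx/.xlsx/.pptx are ZIP packages in the Office Open XML format. The Java sketch below is not from this thread and is not Tika's API; it only checks a file's leading magic bytes to tell the two containers apart. The class name OfficeFormatSniffer is hypothetical, and a ZIP signature alone does not prove a file is OOXML (an .odt or plain .zip matches too). Tika's own detection goes further than this before handing the file to a parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: classifies a file by its first bytes only.
public class OfficeFormatSniffer {

    // OLE2 compound document signature used by the older binary formats (.doc, .xls, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature ("PK\3\4"); OOXML files (.docx, .xlsx, .pptx) are ZIP packages.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Returns "OLE2", "ZIP" (possibly OOXML) or "unknown" for the given leading bytes. */
    public static String sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return "OLE2";
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return "ZIP";
        }
        return "unknown";
    }

    // True if data begins with every byte of prefix.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path path = Paths.get(arg);
            byte[] header = new byte[8];
            int read;
            try (InputStream in = Files.newInputStream(path)) {
                // readNBytes (Java 9+) returns how many bytes were actually read, 0 at EOF.
                read = in.readNBytes(header, 0, header.length);
            }
            System.out.println(path + ": " + sniff(Arrays.copyOf(header, read)));
        }
    }
}
```

Run it as, for example, `java OfficeFormatSniffer report.docx budget.xls` (file names are placeholders); it prints one classification per file. Checking only the container signature keeps the sketch dependency-free, at the cost of not distinguishing .docx from .xlsx inside the ZIP.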

