This looks great. Thank you Thierry! ----- Original Message ----- From: "Thierry Ferrero" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Wednesday, November 10, 2004 12:23 PM Subject: Re: Indexing MS Files
> I used OpenOffice API to convert all Word and Excel version. > For me it's "the solution" for complex Word and Excel document. > http://api.openoffice.org/ > Good luck ! > > // UNO API > import com.sun.star.bridge.XUnoUrlResolver; > import com.sun.star.uno.XComponentContext; > import com.sun.star.uno.UnoRuntime; > import com.sun.star.frame.XComponentLoader; > import com.sun.star.frame.XStorable; > import com.sun.star.beans.PropertyValue; > import com.sun.star.beans.XPropertySet; > import com.sun.star.lang.XComponent; > import com.sun.star.lang.XMultiComponentFactory; > import com.sun.star.connection.NoConnectException; > import com.sun.star.io.IOException; > > > /** This class implements a http servlet in order to convert an incoming > document > * with help of a running OpenOffice.org and to push the converted file back > * to the client. > */ > public class DocConverter { > > private String stringHost; > private String stringPort; > private Xcontext xcontext; > private Xbase xbase; > > public DocConverter(Xbase xbase,Xcontext xcontext,ServletContext sc) { > > this.xbase=xbase; > this.xcontext=xcontext; > stringHost=ApplicationUtil.getParameter(sc,"openoffice.oohost"); > stringPort=ApplicationUtil.getParameter(sc,"openoffice.ooport"); > } > > public synchronized String convertToTxt(String namedoc, String pathdoc, > String stringConvertType, String stringExtension) { > > String stringConvertedFile = this.convertDocument(namedoc, pathdoc, > stringConvertType, stringExtension); > return stringConvertedFile; > } > > > /** This method converts a document to a given type by using a running > * OpenOffice.org and saves the converted document to the specified > * working directory. > * @param stringDocumentName The full path name of the file on the server to > be converted. > * @param stringConvertType Type to convert to. > * @param stringExtension This string will be appended to the file name of > the converted file. > * @return The full path name of the converted file will be returned. > * @see stringWorkingDirectory > */ > private String convertDocument(String namedoc, String pathdoc, String > stringConvertType, String stringExtension ) { > > String tagerr=""; > String stringUrl=""; > String stringConvertedFile = ""; > // Converting the document to the favoured type > try { > tagerr="0"; > // Composing the URL - suppression de l'extension > stringUrl = pathdoc+"/"+namedoc; > stringUrl=stringUrl.replace( '\\', '/' ); > /* Bootstraps a component context with the jurt base components > registered. Component context to be granted to a component for > running. > Arbitrary values can be retrieved from the context. */ > XComponentContext xcomponentcontext = > com.sun.star.comp.helper.Bootstrap.createInitialComponentContext( > null ); > > /* Gets the service manager instance to be used (or null). This method > has > been added for convenience, because the service manager is a often > used > object. */ > XMultiComponentFactory xmulticomponentfactory = > xcomponentcontext.getServiceManager(); > tagerr="2"; > /* Creates an instance of the component UnoUrlResolver which > supports the services specified by the factory. */ > Object objectUrlResolver = > xmulticomponentfactory.createInstanceWithContext( > "com.sun.star.bridge.UnoUrlResolver", xcomponentcontext ); > // Create a new url resolver > XUnoUrlResolver xurlresolver = ( XUnoUrlResolver ) > UnoRuntime.queryInterface( XUnoUrlResolver.class, > objectUrlResolver ); > // Resolves an object that is specified as follow: > // uno:<connection description>;<protocol description>;<initial object > name> > Object objectInitial = xurlresolver.resolve( > "uno:socket,host=" + stringHost + ",port=" + stringPort + > ";urp;StarOffice.ServiceManager" ); > > // Create a service manager from the initial object > xmulticomponentfactory = ( XMultiComponentFactory ) > UnoRuntime.queryInterface( XMultiComponentFactory.class, > objectInitial ); > // Query for the XPropertySet interface. > XPropertySet xpropertysetMultiComponentFactory = ( XPropertySet ) > UnoRuntime.queryInterface( XPropertySet.class, > xmulticomponentfactory ); > // Get the default context from the office server. > Object objectDefaultContext = > xpropertysetMultiComponentFactory.getPropertyValue( > "DefaultContext" ); > > // Query for the interface XComponentContext. > xcomponentcontext = ( XComponentContext ) UnoRuntime.queryInterface( > XComponentContext.class, objectDefaultContext ); > > /* A desktop environment contains tasks with one or more > frames in which components can be loaded. Desktop is the > environment for components which can instanciate within > frames. */ > XComponentLoader xcomponentloader = ( XComponentLoader ) > UnoRuntime.queryInterface( XComponentLoader.class, > xmulticomponentfactory.createInstanceWithContext( > "com.sun.star.frame.Desktop", xcomponentcontext ) ); > > // Preparing properties for loading the document > PropertyValue propertyvalue[] = new PropertyValue[ 1 ]; > // Setting the flag for hidding the open document > propertyvalue[ 0 ] = new PropertyValue(); > propertyvalue[ 0 ].Name = "Hidden"; > propertyvalue[ 0 ].Value = new Boolean(true); > > > // Loading the wanted document > Object objectDocumentToStore = > xcomponentloader.loadComponentFromURL( > stringUrl, "_blank", 0, propertyvalue ); > > // Getting an object that will offer a simple way to store a document > to a URL. > XStorable xstorable = > ( XStorable ) UnoRuntime.queryInterface( XStorable.class, > objectDocumentToStore ); > > // Preparing properties for converting the document > propertyvalue = new PropertyValue[2]; > // Setting the flag for overwriting > propertyvalue[0] = new PropertyValue(); > propertyvalue[0].Name = "Overwrite"; > propertyvalue[0].Value = new Boolean(true); > > // Setting the filter name > propertyvalue[1] = new PropertyValue(); > propertyvalue[1].Name = "FilterName"; > propertyvalue[1].Value = stringConvertType; > > // Appending the favoured extension to the origin document name > //if(stringUrl.lastIndexOf(".")!=0){ > //stringUrl=stringUrl.substring(0,stringUrl.lastIndexOf(".")); > //} > > if(namedoc.lastIndexOf(".")!=-1){ > namedoc=namedoc.substring(0,namedoc.lastIndexOf(".")); > } > > //stringConvertedFile = stringUrl + "." + stringExtension; > > > > stringConvertedFile=xbase.getAlias("local")+"/oo_tmp/"+namedoc+"."+stringExt > ension; > > stringConvertedFile=stringConvertedFile.replace( '\\', '/' ); > > // Storing and converting the document > xstorable.storeToURL( stringConvertedFile, propertyvalue ); > > // Getting the method dispose() for closing the document > XComponent xcomponent = > ( XComponent ) UnoRuntime.queryInterface( XComponent.class, > xstorable ); > > // Closing the converted document > xcomponent.dispose(); > } > > catch(NoConnectException ex ) { > return( "" ); > } > catch( IOException ex ) { > return( "" ); > } > catch( Exception ex ) { > return( "" ); > } > > > // Returning the name of the converted file > return( stringConvertedFile ); > } > > > ----- Original Message ----- > From: "Luke Shannon" <[EMAIL PROTECTED]> > To: "Lucene Users List" <[EMAIL PROTECTED]> > Sent: Wednesday, November 10, 2004 5:59 PM > Subject: Re: Indexing MS Files > > > > Thanks Otis. I am looking forward to this book. Any idea when it may be > > released? > > > > ----- Original Message ----- > > From: "Otis Gospodnetic" <[EMAIL PROTECTED]> > > To: "Lucene Users List" <[EMAIL PROTECTED]> > > Sent: Wednesday, November 10, 2004 11:54 AM > > Subject: Re: Indexing MS Files > > > > > > > That's one place to start. The other one would be textmining.org, at > > > least for Word files. > > > I used both POI and Textmining API in Lucene in Action, and the latter > > > was much simpler to use. You can also find some comments about both > > > libs in lucene-user archives. People tend to like Textmining API > > > better. > > > > > > Otis > > > > > > --- Luke Shannon <[EMAIL PROTECTED]> wrote: > > > > > > > I need to index Word, Excel and Power Point files. > > > > > > > > Is this the place to start? > > > > > > > > http://jakarta.apache.org/poi/ > > > > > > > > Is there something better? > > > > > > > > Thanks, > > > > > > > > Luke > > > > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]