On Feb 2, 2005, at 2:40 AM, jac jac wrote:
May I know whether Lucene currently supports indexing of xml documents?
That's a loaded question. Lucene "supports" it in the sense that it can
index text, sure. But Lucene does not include an XML parser or a facility
to automatically turn an XML file into a Lucene Document, nor would you
want one. For example, in my current project I'm parsing XML
documents and indexing pieces of them individually as Lucene Documents,
and I'm doing that in several different ways.
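To make "indexing pieces individually" concrete, here is a rough sketch of the parsing half using only the JDK's DOM parser. It is not the code from my project or from the book: the class name XmlPieces, the extract method, and the sample address-book markup are all made up for illustration. Each matching element becomes one map of field names to text, and that map is what you would turn into a Lucene Document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: splits one XML file into several "pieces",
// each of which could be indexed as its own Lucene Document.
public class XmlPieces {

    /**
     * Returns one field map per element named entryTag.
     * Every direct child element becomes a (tag name -> trimmed text) field.
     */
    public static List<Map<String, String>> extract(String xml, String entryTag)
            throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        org.w3c.dom.Document dom =
            builder.parse(new InputSource(new StringReader(xml)));

        List<Map<String, String>> pieces = new ArrayList<>();
        NodeList entries = dom.getElementsByTagName(entryTag);
        for (int i = 0; i < entries.getLength(); i++) {
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = entries.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    fields.put(child.getNodeName(), child.getTextContent().trim());
                }
            }
            // Assumption: this is the point where you would build a Lucene
            // Document from `fields` and add it to your index.
            pieces.add(fields);
        }
        return pieces;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data, not the addressbook.xml shipped with the book.
        String xml =
            "<address-book>"
            + "<contact><name>Jane Doe</name><city>Durham</city></contact>"
            + "<contact><name>John Roe</name><city>Raleigh</city></contact>"
            + "</address-book>";
        for (Map<String, String> piece : extract(xml, "contact")) {
            System.out.println(piece);
        }
    }
}
```

Which elements become documents and which become fields is exactly the decision that depends on your data, which is why a one-size-fits-all XML indexer would not be much use.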
The demo applications that you've tried are designed only as a very
basic demonstration of how to use Lucene. They were never intended to
be used as-is; they are code you can borrow and learn from to build
your own custom solutions.
If you want a quick jump on processing XML with Lucene, try out the
code that comes with Lucene in Action (grab it from
www.lucenebook.com). When you get the code, run this:
$ ant ExtensionFileHandler
Buildfile: build.xml
...
ExtensionFileHandler:
[echo]
[echo] This example demonstrates the file extension document handler.
[echo] Documents with extensions .xml, .rtf, .doc, .pdf, .html, and .txt are
[echo] all handled by the framework. The contents of the Lucene Document
[echo] built for the specified file is displayed.
[echo]
[input] Press return to continue...
[input] File: [src/lia/handlingtypes/data/HTML.html]
src/lia/handlingtypes/data/addressbook.xml
[echo] Running lia.handlingtypes.framework.ExtensionFileHandler...
[java] log4j:WARN No appenders could be found for logger (org.apache.commons.digester.Digester.sax).
[java] log4j:WARN Please initialize the log4j system properly.
[java] Document Keyword Keyword Keyword Keyword Keyword Keyword Keyword>
BUILD SUCCESSFUL
Total time: 18 seconds
Note that I typed in the path to an XML file where it asks for [input].
Now dig into the source tree and borrow what you need from
src/lia/handlingtypes.
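If it helps to see the general shape before reading the source, the idea of picking a handler by file extension can be sketched roughly like this. This is not the book's ExtensionFileHandler: the class ExtensionDispatch and its methods are hypothetical, and the handlers return a plain String where a real one would return a Lucene Document.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: route a file to a handler based on its extension.
public class ExtensionDispatch {

    // Maps a lower-case extension (without the dot) to its handler.
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    public void register(String extension, Function<String, String> handler) {
        handlers.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    /** Returns the lower-case extension of the file name, or "" if there is none. */
    public static String extensionOf(String path) {
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        int dot = path.lastIndexOf('.');
        // No dot in the file name, a leading dot only, or a trailing dot.
        if (dot <= slash + 1 || dot == path.length() - 1) {
            return "";
        }
        return path.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Runs the handler registered for the path's extension. */
    public String handle(String path) {
        Function<String, String> handler = handlers.get(extensionOf(path));
        if (handler == null) {
            throw new IllegalArgumentException("No handler registered for: " + path);
        }
        return handler.apply(path);
    }

    public static void main(String[] args) {
        ExtensionDispatch dispatch = new ExtensionDispatch();
        // Placeholder handlers; real ones would parse the file contents.
        dispatch.register("xml", path -> "parse as XML: " + path);
        dispatch.register("txt", path -> "read as plain text: " + path);
        System.out.println(dispatch.handle("src/lia/handlingtypes/data/addressbook.xml"));
    }
}
```

Keeping the lookup in a map means supporting another file type is one more register call rather than another branch in the indexing code.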
Erik
I tried building an index of all my directories in webapps
via:
java org.apache.lucene.demo.IndexFiles /homedir/tomcat/webapps
then I tried using the following command to search:
java org.apache.lucene.demo.SearchFiles
and I typed in my query. I was able to see the files, along with the
paths that hold my data.
However, when I do
java org.apache.lucene.demo.IndexHTML -create -index /homedir/index ..
and I went to my website, I realised it can't search for the data I
wanted.
I want to search data within XML documents... May I know if the
current demo version allows indexing of XML documents?
Why is it that after I do "java org.apache.lucene.demo.IndexHTML
-create -index /homedir/index .." then the data I wanted can't be
searched? Thanks a lot!
jac