Lucene comes with some "demo" applications that demonstrate how to use it. I translated one of them from Java into Jython. It still works pretty much the same way, but takes an extra ten seconds to start up. The code below is noticeably but not dramatically shorter than the Java version.
I wrote this because, for the Nth time, I had written up a list of desiderata for a full-text index of my mail, and I realized that Lucene pretty much had all the items on my list, so maybe I'd be better off biting the bullet and using Java and Lucene instead of writing another text indexer from scratch. Jython seems to make Java a *lot* easier to deal with. Just being able to interactively import a package or class, inspect its attributes, instantiate it, and so on, makes a big difference in my experience of using Java. (I wish it included a way to interactively inspect the signatures and doc comments of the things thus inspected.) And Java now works out of the box on Debian, thanks to `gij`, which is another big plus, and even Sun's Java is supposed to be free software now, although I haven't looked lately to see if they've finished that process. It's too bad my laptop is still too small and slow to run Eclipse, and for some reason my `gcj-4.1` documentation is missing.

    #!/usr/bin/env jython
    """A Jython version of org.apache.lucene.demo.IndexFiles, the Lucene demo.

    I haven't gotten this working in `jythonc` yet, because of what I
    think is a classpath problem.

    """

    # Because this is a modified version of IndexFiles.java from the
    # Lucene distribution, it carries the same licensing:

    # Copyright 2004 The Apache Software Foundation
    # Copyright 2008 Kragen Javier Sitaker

    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at

    #     http://www.apache.org/licenses/LICENSE-2.0

    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    import sys
    from java.util import Date
    from java.io import IOException, File, FileNotFoundException
    from org.apache.lucene.index import IndexWriter
    from org.apache.lucene.analysis.standard import StandardAnalyzer
    from org.apache.lucene.demo import FileDocument

    # Jython, being 2.1, doesn't have True and False.  Fortunately it does
    # seem to map 1 to Java "true" when appropriate, which is kind of
    # scary.
    True = 1

    def main(argv):
        usage = "jython %s <root_directory>" % argv[0]
        if len(argv) != 2:
            sys.stderr.write("Usage: %s\n" % usage)
            sys.exit(1)

        start = Date()
        try:
            writer = IndexWriter("index", StandardAnalyzer(), True)
            indexDocs(writer, File(argv[1]))
            writer.optimize()
            writer.close()
            print Date().getTime() - start.getTime(), "total milliseconds"
        except IOException, e:
            print " caught a", e.getClass()
            print " with message:", e.getMessage()

    def indexDocs(writer, file):
        # do not try to index files that cannot be read
        if not file.canRead(): return
        if file.isDirectory():
            # "or []" because an IO error could occur, it says
            for ii in file.list() or []:
                indexDocs(writer, File(file, ii))
        else:
            print "adding", file
            try:
                writer.addDocument(FileDocument.Document(file))
            except FileNotFoundException, fnfe:
                # at least on Windows, some temporary files raise this
                # exception with an "access denied" message, and checking
                # if the file can be read doesn't help, it says.
                pass

    if __name__ == '__main__': main(sys.argv)
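
For anyone who wants to poke at the traversal logic without a JVM handy, here is a rough sketch of what `indexDocs` does, written for plain Python 3 rather than Jython 2.1. It is not part of the Lucene demo: `index_docs` and its `add_document` callback are names I made up for illustration, with the callback standing in for `writer.addDocument(FileDocument.Document(file))`. I also sort the directory entries so the visiting order is predictable, which the Jython version does not do.

```python
import os


def index_docs(add_document, path):
    """Recursively visit path, calling add_document on each readable file.

    add_document is a hypothetical callback standing in for the Lucene
    writer; this sketch does no indexing of its own.
    """
    # Counterpart of file.canRead(): skip anything we cannot read,
    # which also covers paths that do not exist.
    if not os.access(path, os.R_OK):
        return
    if os.path.isdir(path):
        try:
            names = os.listdir(path)
        except OSError:
            # Plays the role of "or []" in the Jython version: treat a
            # directory we fail to list as empty.
            names = []
        # Sorting is an addition here, only to make the order deterministic.
        for name in sorted(names):
            index_docs(add_document, os.path.join(path, name))
    else:
        try:
            add_document(path)
        except FileNotFoundError:
            # Same policy as the Jython code: quietly skip such files.
            pass
```

Calling `index_docs(print, ".")` just prints every readable file under the current directory, which is enough to check that the recursion visits what you expect before wiring in a real indexer.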