Here is the English version of the article for those who are interested.

Lucene version 2.9 released
Content management systems like the ones powering the channels at AOL, social networks like LinkedIn, the Nebula cloud computing platform at NASA: hardly any application gets by without search functionality. In many cases Lucene is at work behind the search box. Initially written by Doug Cutting, the search engine has been developed for eight years by a large and diverse community as a project at the Apache Software Foundation. Version 2.9 was released on Friday of last week.

Lucene committer Uwe Schindler gave an overview of the new features of Lucene at the Apache Hadoop Get Together at newthinking store Berlin.

Especially interesting for current users are the many performance improvements: the new version brings per-segment searches and caches as well as near real-time search. Identifying and sorting hits becomes faster thanks to optimizations in the collector and scorer APIs. First benchmarks (http://www.lucidimagination.com/blog/2009/09/22/contrived-fieldcache-load-test-lucene-2-4-vs-lucene-2-9/) look very good.

Most notably, the refactoring towards per-segment search can yield performance improvements for a vast number of users. Per-segment search takes advantage of the fact that the majority of segments do not change when an index is modified. When an index is re-opened, only the segments that changed need to be reloaded, if any. This reduces the time to re-open an index, enables the reuse of internal cache structures and avoids creating large numbers of objects that would have to be garbage collected.

Besides runtime improvements and many changes to expert APIs, Lucene 2.9 introduces a new "TokenStream" API. The new API brings stronger typing and lets developers attach arbitrary attributes at the "Token" level within the analysis process. One example use case of this API is to annotate terms with type information that can be reused at later analysis stages or during indexing.
That way, terms that named-entity recognition software identifies as places, names or companies can be marked as such. The former "Token" API has been marked as "deprecated" and will be removed in Lucene's next major release. Custom Analyzers and TokenStreams based on older releases need to be refactored to use the new API, which is the major downside of this API change.

Geo and product searches in particular benefit from an improved, trie-based range query implementation: searches for products within a range of prices instead of exact prices will be substantially faster after the upgrade. In addition, highlighting of matching query terms in large documents has been optimized.

Besides the improvements to the Lucene core, version 2.9 introduces several new contrib modules. Lucene-spatial, formerly known as "Local Lucene", was donated to the Apache Lucene project earlier this year. This new module brings basic support for localized and distance search, as well as filtering and sorting of search results based on GeoHash and Cartesian tiers. The lucene-remote module moves Lucene's dependency on Java RMI from the core into an optional contrib module, which enables the new release to run on an "as is" basis on platforms with a reduced standard library, such as Android.

As a further addition, a new and more flexible query parser was introduced that is fully compatible with the existing core query parser syntax. Lucene-queryparser provides extended programmatic control for customization and cleanly separates the syntax from its internal representation. All in all, this is a great improvement for users aiming to build their own query syntax for Lucene.

The new Lucene version comes with new query types, enhanced payload queries and new multi-term queries that allow for wildcards. The components responsible for parsing indexed documents have been extended to support the Persian and Arabic languages. Support for Chinese document processing has been improved as well.
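To illustrate why the trie-based range queries mentioned above can be so much faster, here is a minimal sketch of the underlying idea. It is a simplified model written for this mail, not Lucene's actual implementation: the class TrieRangeSketch, the method cover and the example price range are all made up for illustration. The assumption is that every numeric value is indexed several times, each time with a few more low-order bits dropped, so that a range can be covered by a handful of coarse prefix terms in the middle plus a few exact terms at the edges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of trie-style range covering; not a Lucene class.
public class TrieRangeSketch {

    /** One term to look up: a value with its lowest `shift` bits dropped. */
    static final class PrefixTerm {
        final int shift;
        final long prefix;

        PrefixTerm(int shift, long prefix) {
            this.shift = shift;
            this.prefix = prefix;
        }

        /** Number of distinct values this single term stands for. */
        long covered() {
            return 1L << shift;
        }

        @Override
        public String toString() {
            return "shift=" + shift + " prefix=0x" + Long.toHexString(prefix);
        }
    }

    /**
     * Covers the inclusive range [lo, hi] with aligned blocks whose sizes are
     * powers of 2^step. Each block corresponds to one prefix term.
     */
    static List<PrefixTerm> cover(long lo, long hi, int step) {
        if (lo < 0 || hi > Integer.MAX_VALUE || step < 1) {
            throw new IllegalArgumentException("sketch handles 0..Integer.MAX_VALUE, step >= 1");
        }
        List<PrefixTerm> terms = new ArrayList<>();
        while (lo <= hi) {
            int shift = 0;
            // Grow the block while it stays aligned and does not overshoot hi.
            while (shift + step <= 31) {
                long size = 1L << (shift + step);
                boolean aligned = (lo & (size - 1)) == 0;
                boolean fits = lo + size - 1 <= hi;
                if (!aligned || !fits) {
                    break;
                }
                shift += step;
            }
            terms.add(new PrefixTerm(shift, lo >>> shift));
            lo += 1L << shift;
        }
        return terms;
    }

    public static void main(String[] args) {
        // Example: prices from 26 to 293 (say, in cents), 4 bits per trie level.
        List<PrefixTerm> terms = cover(26, 293, 4);
        for (PrefixTerm t : terms) {
            System.out.println(t);
        }
        System.out.println(terms.size() + " terms cover " + (293 - 26 + 1) + " values");
    }
}
```

In this example 28 terms stand in for 268 distinct values. The gap widens quickly for larger ranges, because the middle of the range is matched at coarse precision and only the edges need exact terms.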
But the release does not only bring lots of new features and performance improvements. In some cases the API has changed. So instead of dropping in the new version as a replacement for the old one, it is recommended to read the release notes and changes carefully and to recompile the code that uses Lucene in order to identify any incompatibilities early on.

Version 2.9 is the last release before 3.0, which is expected in the near future. Version 3.0 will remove any code currently marked as deprecated. In addition, it will be the first version to require Java 1.5 instead of 1.4.

A detailed analysis of the changes in Lucene 2.9 is available online as a webinar (http://www.lucidimagination.com/blog/2009/09/22/lucene29_webinar/). In early November, ApacheCon US (http://us.apachecon.com/c/acus2009) offers lots of room for discussions with developers in trainings, two Lucene conference tracks and several user meetups.

On Mon, Oct 5, 2009 at 5:11 PM, Simon Willnauer <simon.willna...@googlemail.com> wrote:
>
> Hey Lucene Users,
>
> Heise.de
> (http://www.heise.de/open/artikel/Such-Engine-Lucene-in-Version-2-9-erschienen-810377.html)
> has just published an article about the new 2.9 release.
> Unfortunately they only published the German version while we tried to get
> the English one too. Thanks to Isabel (http://blog.isabel-drost.de/) again
> for all the work.
>
> simon