Re: German article about Lucene 2.9

Simon Willnauer Mon, 05 Oct 2009 08:52:50 -0700

Here is the English version of the article for those who are interested.

Lucene version 2.9 released

Content management systems like the ones powering the channels at AOL,
social networks like LinkedIn, the Nebula cloud computing platform at
NASA: there is hardly an application that does not need search
functionality. In many cases Lucene is at work behind the search box.
Initially written by Doug Cutting, the search library has been
developed for eight years by a large and diverse community as a project
at the Apache Software Foundation.

Version 2.9 was released on Friday last week. Lucene committer Uwe
Schindler gave an overview of the new features at the Apache Hadoop Get
Together at the newthinking store in Berlin. Especially interesting for
current users are the many performance improvements: the new version
brings per-segment searches and caches as well as near real-time
search. Identifying and sorting hits becomes faster thanks to
optimizations in the collector and scorer APIs. Initial analyses
(http://www.lucidimagination.com/blog/2009/09/22/contrived-fieldcache-load-test-lucene-2-4-vs-lucene-2-9/)
look very good.

Most notably, the refactoring towards per-segment search can yield
performance improvements for a vast number of users. Per-segment
search takes advantage of the fact that the majority of segments do not
change when an index is modified. If a reload is necessary at all, only
some of the segments are reloaded. This reduces the time to reopen an
index, enables the reuse of internal cache structures and keeps a large
number of objects from becoming garbage that has to be collected.
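
A minimal Java sketch of the reuse idea described above. It does not use
Lucene; PerSegmentCacheSketch, SegmentData and the segment names are
hypothetical stand-ins, and "loading" a segment is reduced to creating
an object. It only shows that a reopen loads data for new segments and
keeps the cached data for unchanged ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerSegmentCacheSketch {
    /** Hypothetical stand-in for an expensive per-segment structure. */
    static final class SegmentData {
        final String segmentName;
        SegmentData(String segmentName) { this.segmentName = segmentName; }
    }

    private final Map<String, SegmentData> cache = new HashMap<>();
    private int loads = 0;

    /** Builds a view of the index, loading data only for segments not seen before. */
    List<SegmentData> reopen(List<String> currentSegments) {
        List<SegmentData> view = new ArrayList<>();
        for (String name : currentSegments) {
            SegmentData data = cache.get(name);
            if (data == null) {
                data = new SegmentData(name); // the expensive step in a real index
                cache.put(name, data);
                loads++;
            }
            view.add(data);
        }
        // Forget segments that no longer exist, e.g. because they were merged away.
        cache.keySet().retainAll(currentSegments);
        return view;
    }

    int loadCount() { return loads; }

    public static void main(String[] args) {
        PerSegmentCacheSketch index = new PerSegmentCacheSketch();
        List<SegmentData> before = index.reopen(Arrays.asList("_0", "_1", "_2"));
        List<SegmentData> after = index.reopen(Arrays.asList("_0", "_1", "_3"));
        // Three initial loads plus one for the new segment "_3".
        System.out.println("loads = " + index.loadCount());
        System.out.println("segment _0 reused = " + (before.get(0) == after.get(0)));
    }
}
```

With a single global cache, the second reopen would have rebuilt the
data for all three segments instead of one.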

Besides runtime improvements and many changes to expert APIs, Lucene
2.9 introduces a new "TokenStream" API. The new API introduces
stronger typing and enables developers to attach arbitrary attributes
to each token during the analysis process. One example use case
of this API is to annotate terms with type information that can be
reused at later analysis stages or during indexing. That way one can
use named-entity-recognition software to mark terms that represent
places, names or companies. The former "Token" API has been
marked as "deprecated" and will be removed in Lucene's next major
release. Custom Analyzers and TokenStreams written for older releases
should be refactored to use the new API; this migration effort is the
major downside of the API change.
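
To illustrate the idea of typed per-token attributes, here is a minimal
Java sketch. It does not use Lucene: AttributeSketch, AttributeBag,
TermAttr and TypeAttr are hypothetical names made up for this example,
and the set of place names is a toy substitute for a named-entity
recognizer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

public class AttributeSketch {
    /** Marker interface for per-token attributes (hypothetical, not Lucene's). */
    interface Attribute {}

    static final class TermAttr implements Attribute { String term = ""; }
    static final class TypeAttr implements Attribute { String type = "word"; }

    /** Holds one shared instance per attribute class, looked up by its type. */
    static final class AttributeBag {
        private final Map<Class<?>, Attribute> attrs = new HashMap<>();

        <T extends Attribute> T addAttribute(Class<T> cls, Supplier<T> factory) {
            Attribute attr = attrs.get(cls);
            if (attr == null) {
                attr = factory.get();
                attrs.put(cls, attr);
            }
            return cls.cast(attr); // typed access, no cast needed at the call site
        }
    }

    /** Runs a toy two-stage analysis and returns "term/type" pairs. */
    static String tag(String[] tokens, Set<String> places) {
        AttributeBag bag = new AttributeBag();
        TermAttr term = bag.addAttribute(TermAttr.class, TermAttr::new);
        TypeAttr type = bag.addAttribute(TypeAttr.class, TypeAttr::new);
        StringBuilder out = new StringBuilder();
        for (String token : tokens) {
            term.term = token;                                      // tokenizer stage
            type.type = places.contains(token) ? "place" : "word";  // tagging stage
            // A later stage asks for the same attribute and sees the tagger's value.
            TypeAttr seen = bag.addAttribute(TypeAttr.class, TypeAttr::new);
            out.append(term.term).append('/').append(seen.type).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        Set<String> places = new HashSet<>(Arrays.asList("Berlin"));
        System.out.println(tag(new String[] {"Berlin", "hosts", "Lucene"}, places));
    }
}
```

Because every stage shares the same attribute instances, a new kind of
annotation only needs a new attribute class, not a change to a fixed
token structure.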

Geo and product searches in particular benefit from an improved,
trie-based range query implementation: searches for products within a
range of prices instead of an exact price will be substantially faster
after the upgrade. In addition, highlighting of matched query terms in
large documents has been optimized.
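
Why a trie helps with ranges can be shown with a small Java sketch. It
is not Lucene's code: TrieRangeSketch, PrefixTerm and split are
hypothetical names, values are assumed to be non-negative longs (for
example prices in cents), and the step of 4 bits per trie level is an
arbitrary choice. The sketch assumes each value is indexed at several
precisions, so a range can be covered by a few coarse prefix terms in
the middle and fine-grained terms only at its edges.

```java
import java.util.ArrayList;
import java.util.List;

public class TrieRangeSketch {
    /** One prefix term: matches every value v with (v >>> shift) == prefix. */
    static final class PrefixTerm {
        final long prefix;
        final int shift;
        PrefixTerm(long prefix, int shift) { this.prefix = prefix; this.shift = shift; }
        long minValue() { return prefix << shift; }
        long maxValue() { return (prefix << shift) | ((1L << shift) - 1L); }
    }

    /** Splits the inclusive range [min, max] into prefix terms, 'step' bits per level. */
    static List<PrefixTerm> split(long min, long max, int step) {
        if (min < 0 || min > max || max >= (1L << 62) || step < 1 || step > 16) {
            throw new IllegalArgumentException("unsupported range or step");
        }
        List<PrefixTerm> terms = new ArrayList<>();
        for (int shift = 0; ; shift += step) {
            long diff = 1L << (shift + step);
            long mask = ((1L << step) - 1L) << shift;
            boolean hasLower = (min & mask) != 0L;  // lower edge not aligned to next level
            boolean hasUpper = (max & mask) != mask; // upper edge not aligned to next level
            long nextMin = (hasLower ? min + diff : min) & ~mask;
            long nextMax = (hasUpper ? max - diff : max) & ~mask;
            if (shift + step >= 63 || nextMin > nextMax) {
                addTerms(terms, min, max, shift);    // nothing left for a coarser level
                break;
            }
            if (hasLower) addTerms(terms, min, min | mask, shift);
            if (hasUpper) addTerms(terms, max & ~mask, max, shift);
            min = nextMin;
            max = nextMax;
        }
        return terms;
    }

    private static void addTerms(List<PrefixTerm> out, long min, long max, int shift) {
        for (long p = min >>> shift; p <= (max >>> shift); p++) {
            out.add(new PrefixTerm(p, shift));
        }
    }

    public static void main(String[] args) {
        List<PrefixTerm> terms = split(100L, 1000L, 4);
        // The range holds 901 distinct values but is covered by far fewer terms.
        System.out.println("terms needed: " + terms.size());
    }
}
```

For the range 100 to 1000 with a 4-bit step, the sketch produces 46
prefix terms where a term-per-value approach would have to visit up to
901 distinct values.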

Besides the improvements to the Lucene core, version 2.9 introduces
several new contrib modules. Lucene-spatial, formerly known as "Local
Lucene", was donated to the Apache Lucene project earlier this
year. This new module brings basic support for localized and distance
search as well as filtering and sorting of search results based
on GeoHash and Cartesian tiers. The lucene-remote module moves Lucene's
dependency on Java RMI from the core into an optional contrib module,
which enables the new release to run "as is" on platforms with a
reduced standard library such as Android. As a further
addition, a new and more flexible query parser was introduced that is
fully compatible with the existing core query parser syntax.
Lucene-queryparser provides extended programmatic control for
customization and cleanly separates the syntax from its internal
representation. All in all, this is a great improvement for users
aiming to build their own query syntax for Lucene.

The new Lucene version comes with new query types, enhanced payload
queries as well as new multi-term queries that allow for wildcards.
The components responsible for analyzing documents at indexing time
have been extended to support Persian and Arabic. Support for
Chinese document processing has been improved as well.

However, the release brings more than new features and
performance improvements: in some cases the API has changed. So
instead of treating the new version as a drop-in replacement, it is
recommended to carefully read the release notes and the list of changes
and to recompile the code that uses Lucene in order to identify any
incompatibilities early on.

Version 2.9 is the last release before 3.0, which is expected in the
near future. Version 3.0 will remove all code currently marked as
deprecated. In addition, it will be the first version to require Java
1.5 instead of 1.4.

A detailed analysis of the changes in Lucene 2.9 is available online
as a webinar at http://www.lucidimagination.com/blog/2009/09/22/lucene29_webinar/.
In early November, ApacheCon US (http://us.apachecon.com/c/acus2009)
offers plenty of room for discussions with developers in trainings, two
Lucene conference tracks and several user meetups.

On Mon, Oct 5, 2009 at 5:11 PM, Simon Willnauer
<[email protected]> wrote:
>
> Hey Lucene Users,
>
> Heise.de 
> (http://www.heise.de/open/artikel/Such-Engine-Lucene-in-Version-2-9-erschienen-810377.html)
>  has just published an article about the new 2.9 release.
> Unfortunately they only published the German version while we tried to get
> the English one too. Thanks to Isabel (http://blog.isabel-drost.de/) again
> for all the work.
>
> simon
