The moment has come :) Fantastic! Congratulations to everyone and thanks to all of you who committed your time to make the release happen.
Dawid On Fri, Oct 12, 2012 at 10:10 AM, Robert Muir <rm...@apache.org> wrote: > October 12 2012, Apache Luceneā 4.0 available. > The Lucene PMC is pleased to announce the release of Apache Lucene 4.0 > > Apache Lucene is a high-performance, full-featured text search engine > library written entirely in Java. It is a technology suitable for > nearly any application that requires full-text search, especially > cross-platform. > > This release contains numerous bug fixes, optimizations, and > improvements, some of which are highlighted below. The release is > available for immediate download at: > http://lucene.apache.org/core/mirrors-core-latest-redir.html > > See the CHANGES.txt file included with the release for a full list of details. > > Lucene 4.0 Release Highlights: > > * The index formats for terms, postings lists, stored fields, term > vectors, etc are pluggable via the Codec api. You can select from the > provided implementations or customize the index format with your own > Codec to meet your needs. > > * Similarity has been decoupled from the vector space model (TF/IDF). > Additional models such as BM25, Divergence from Randomness, Language > Models, and Information-based models are provided (see > http://www.lucidimagination.com/blog/2011/09/12/flexible-ranking-in-lucene-4). > > * The new doc values feature stores typed values per-document. It > can be used for custom scoring factors (accessible via Similarity), > for pre-sorted Sort values, and more. > > * IndexWriter now flushes segments to disk concurrently, when the > application uses multiple threads for indexing, resulting in > substantial performance improvements (see > http://blog.mikemccandless.com/2011/05/265-indexing-speedup-with-lucenes.html). > > * Per-document normalization factors ("norms") are no longer limited > to a single byte. Similarity implementations can use any DocValues > type to store norms. > > * New index statistics have been added, including the number of > tokens for a term or field, number of postings for a field, and number > of documents with a posting for a field. These support additional > scoring models (see > http://blog.mikemccandless.com/2012/03/new-index-statistics-in-lucene-40.html). > > * A new default term dictionary/index (BlockTree) indexes shared > prefixes instead of every n'th term. This is not only more time- and > space- efficient, but can avoid going to disk at all for terms that do > not exist in certain cases. Alternative term dictionary implementions > are provided and pluggable via the Codec api. > > * Indexed terms are no longer limited to UTF-16 char sequences; they > can now be any binary value encoded as byte arrays. By default, text > terms are encoded as UTF-8 bytes. Sort order of terms is defined by > their binary value, which is identical to UTF-8 (Unicode code point) > sort order. > > * Substantially faster performance when using a Filter during searching. > > * File-system based directories can rate-limit the IO (MB/sec) of > merge threads, to reduce IO contention between merging and searching > threads. > > * A number of alternative Codecs and components have been added: > "Appending" works with append-only filesystems (such as Hadoop DFS), > "Memory" writes the entire terms+postings as an FST read into RAM (see > http://blog.mikemccandless.com/2011/06/primary-key-lookups-are-28x-faster-with.html), > "Pulsing" inlines the postings for low-frequency terms into the term > dictionary (see > http://blog.mikemccandless.com/2010/06/lucenes-pulsingcodec-on-primary-key.html), > "SimpleText" writes all files in plain-text for easy > debugging/transparency (see > http://blog.mikemccandless.com/2010/10/lucenes-simpletext-codec.html), > "Bloom" uses a bloom filter to sometimes avoid disk seeks when looking > up terms, "Direct" holds all postings as simple byte[] and int[] for > very fast performance at the cost of very high RAM consumption, > "Block" use a new index layout and compression scheme for improved > performance, among others. > > * Term offsets can be optionally encoded into the postings lists and > retrieved per-position. > > * A new AutomatonQuery returns all documents containing any term > matching a provided finite-state automaton (see > http://www.slideshare.net/otisg/finite-state-queries-in-lucene). > > * FuzzyQuery is 100-200 times faster than in past releases (see > http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html). > > * A new spell checker, DirectSpellChecker, finds possible corrections > directly against the main search index without requiring a separate > index. > > * Various in-memory data structures such as the term dictionary and > FieldCache are represented more efficiently with less object overhead > (see > http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html). > > * All search logic is now required to work per segment, IndexReader > was therefore refactored to differentiate between atomic and composite > readers (see > http://blog.thetaphi.de/2012/02/is-your-indexreader-atomic-major.html). > > * Lucene 4.0 provides a modular API, consolidating components such as > Analyzers and Queries that were previously scattered across Lucene > core, contrib, and Solr. These modules also include additional > functionality such as UIMA analyzer integration and a completely > reworked spatial search implementation. > > Noteworthy changes since 4.0-BETA: > > * A new "Block" PostingsFormat offering improved search performance > and index compression. This will likely become the default format in a > future release. (see > http://blog.mikemccandless.com/2012/08/lucenes-new-blockpostingsformat-thanks.html). > > * All non-default codec implementations were moved to a separated > codecs module. Just add lucene-codecs-4.0.0.jar to your classpath to > test these out. > > * Payloads can be optionally stored on the term vectors. > > * Many bugfixes and optimizations. > > Please read CHANGES.txt and MIGRATE.txt for a full list of new > features and notes on upgrading. Particularly, the new apis are not > compatible with previous versions of Lucene, however, file format > backwards compatibility is provided for indexes from the 3.0 series > and the 4.0-alpha and -beta releases. > > Please report any feedback to the mailing lists > (http://lucene.apache.org/core/discussion.html) > > Note: The Apache Software Foundation uses an extensive mirroring > network for distributing releases. It is possible that the mirror you > are using may not have replicated the release yet. If that is the > case, please try another mirror. This also goes for Maven access. > > Happy searching, > > Apache Lucene/Solr Developers > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org