Re: [ANNOUNCE] Apache Lucene 4.0 released.

Chris Male Fri, 12 Oct 2012 01:59:01 -0700

A great day.

On Fri, Oct 12, 2012 at 9:34 PM, Uwe Schindler <u...@thetaphi.de> wrote:


> Thanks Robert for doing the hard work of managing this release!
>

Absolutely.  Thanks Robert.


>
> I am happy that the release finally came out, after a long time of
> development, code refactoring, and lots of non-finite beer-automatons!
>
> Uwe
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -----Original Message-----
> > From: Robert Muir [mailto:rm...@apache.org]
> > Sent: Friday, October 12, 2012 10:10 AM
> > To: dev@lucene.apache.org; Lucene mailing list; java-user; announce
> > Subject: [ANNOUNCE] Apache Lucene 4.0 released.
> >
> > October 12 2012, Apache Lucene 4.0 available.
> > The Lucene PMC is pleased to announce the release of Apache Lucene 4.0.
> >
> > Apache Lucene is a high-performance, full-featured text search engine
> library
> > written entirely in Java. It is a technology suitable for nearly any
> application
> > that requires full-text search, especially cross-platform.
> >
> > This release contains numerous bug fixes, optimizations, and
> improvements,
> > some of which are highlighted below.  The release is available for
> immediate
> > download at:
> >    http://lucene.apache.org/core/mirrors-core-latest-redir.html
> >
> > See the CHANGES.txt file included with the release for a full list of
> details.
> >
> > Lucene 4.0 Release Highlights:
> >
> >  * The index formats for terms, postings lists, stored fields, term
> vectors, etc
> > are pluggable via the Codec api. You can select from the provided
> > implementations or customize the index format with your own Codec to meet
> > your needs.
> >
> >  * Similarity has been decoupled from the vector space model (TF/IDF).
> > Additional models such as BM25, Divergence from Randomness, Language
> > Models, and Information-based models are provided (see
> > http://www.lucidimagination.com/blog/2011/09/12/flexible-ranking-in-lucene-4).
> >
> >  * The new doc values feature stores typed values per-document.  It can
> be
> > used for custom scoring factors (accessible via Similarity), for
> pre-sorted Sort
> > values, and more.
> >
> >  * IndexWriter now flushes segments to disk concurrently, when the
> application
> > uses multiple threads for indexing, resulting in substantial performance
> > improvements (see
> > http://blog.mikemccandless.com/2011/05/265-indexing-speedup-with-lucenes.html).
> >
> >  * Per-document normalization factors ("norms") are no longer limited to
> a
> > single byte. Similarity implementations can use any DocValues type to
> store
> > norms.
> >
> >  * New index statistics have been added, including the number of tokens
> for a
> > term or field, number of postings for a field, and number of documents
> with a
> > posting for a field.  These support additional scoring models (see
> > http://blog.mikemccandless.com/2012/03/new-index-statistics-in-lucene-40.html).
> >
> >  * A new default term dictionary/index (BlockTree) indexes shared
> prefixes
> > instead of every n'th term. This is not only more time- and
> > space-efficient, but in certain cases can also avoid going to disk at
> > all for terms that do not exist. Alternative term dictionary
> > implementations are provided and pluggable via the Codec api.
> >
> >  * Indexed terms are no longer limited to UTF-16 char sequences; they
> can now
> > be any binary value encoded as byte arrays. By default, text terms are
> encoded
> > as UTF-8 bytes. Sort order of terms is defined by their binary value,
> which is
> > identical to UTF-8 (Unicode code point) sort order.
> >
> >  * Substantially faster performance when using a Filter during searching.
> >
> >  * File-system based directories can rate-limit the IO (MB/sec) of merge
> > threads, to reduce IO contention between merging and searching threads.
> >
> >  * A number of alternative Codecs and components have been added:
> > "Appending" works with append-only filesystems (such as Hadoop DFS),
> > "Memory" writes the entire terms+postings as an FST read into RAM (see
> > http://blog.mikemccandless.com/2011/06/primary-key-lookups-are-28x-faster-with.html),
> > "Pulsing" inlines the postings for low-frequency terms into the term
> dictionary
> > (see
> > http://blog.mikemccandless.com/2010/06/lucenes-pulsingcodec-on-primary-key.html),
> > "SimpleText" writes all files in plain-text for easy
> debugging/transparency (see
> > http://blog.mikemccandless.com/2010/10/lucenes-simpletext-codec.html),
> > "Bloom" uses a bloom filter to sometimes avoid disk seeks when looking up
> > terms, "Direct" holds all postings as simple byte[] and int[] for very
> fast
> > performance at the cost of very high RAM consumption, "Block" uses a new
> > index layout and compression scheme for improved performance, among
> > others.
> >
> >  * Term offsets can be optionally encoded into the postings lists and
> retrieved
> > per-position.
> >
> >  * A new AutomatonQuery returns all documents containing any term
> matching
> > a provided finite-state automaton (see
> > http://www.slideshare.net/otisg/finite-state-queries-in-lucene).
> >
> >  * FuzzyQuery is 100-200 times faster than in past releases (see
> > http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html).
> >
> >  * A new spell checker, DirectSpellChecker, finds possible corrections
> directly
> > against the main search index without requiring a separate index.
> >
> >  * Various in-memory data structures such as the term dictionary and
> > FieldCache are represented more efficiently with less object overhead
> (see
> > http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html).
> >
> >  * All search logic is now required to work per segment; IndexReader was
> > therefore refactored to differentiate between atomic and composite
> readers
> > (see
> http://blog.thetaphi.de/2012/02/is-your-indexreader-atomic-major.html).
> >
> >  * Lucene 4.0 provides a modular API, consolidating components such as
> > Analyzers and Queries that were previously scattered across Lucene core,
> > contrib, and Solr. These modules also include additional functionality
> such as
> > UIMA analyzer integration and a completely reworked spatial search
> > implementation.
> >
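
A small illustration of the term sort order point above (terms sort by their binary value, which matches Unicode code point order for UTF-8). This is only a sketch using the plain JDK, not Lucene's own classes; the class name TermOrderDemo and its helper methods are made up for the example. It compares a BMP character (U+FF5E) with a supplementary character (U+1D11E): Java's String.compareTo works on UTF-16 code units and puts the supplementary character first, while unsigned byte comparison of the UTF-8 encodings puts it last, which is the code point order.

```java
import java.nio.charset.StandardCharsets;

public class TermOrderDemo {

    /** Lexicographic comparison of two byte arrays, treating bytes as unsigned. */
    static int compareUnsignedBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF; // mask to get the unsigned value
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        // A shorter array that is a prefix of the longer one sorts first.
        return a.length - b.length;
    }

    /** Compares two strings by the binary value of their UTF-8 encodings. */
    static int compareAsUtf8(String s, String t) {
        return compareUnsignedBytes(
                s.getBytes(StandardCharsets.UTF_8),
                t.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";                                    // U+FF5E, one UTF-16 code unit
        String supplementary = new String(Character.toChars(0x1D11E)); // U+1D11E, a surrogate pair

        // UTF-16 code unit order: 0xFF5E > 0xD834 (the high surrogate), so this prints 1.
        System.out.println("UTF-16 order: " + Integer.signum(bmp.compareTo(supplementary)));

        // UTF-8 byte order: 0xEF < 0xF0 in the first byte, so this prints -1.
        System.out.println("UTF-8 order:  " + Integer.signum(compareAsUtf8(bmp, supplementary)));
    }
}
```

So code that relied on String.compareTo ordering of terms may see a different order for terms containing supplementary characters.
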
> > Noteworthy changes since 4.0-BETA:
> >
> >  * A new "Block" PostingsFormat offering improved search performance and
> > index compression. This will likely become the default format in a future
> > release (see
> > http://blog.mikemccandless.com/2012/08/lucenes-new-blockpostingsformat-thanks.html).
> >
> >  * All non-default codec implementations were moved to a separate codecs
> > module. Just add lucene-codecs-4.0.0.jar to your classpath to test these
> out.
> >
> >  * Payloads can be optionally stored on the term vectors.
> >
> >  * Many bugfixes and optimizations.
> >
> > Please read CHANGES.txt and MIGRATE.txt for a full list of new features
> > and notes on upgrading. In particular, the new APIs are not compatible
> > with previous versions of Lucene; however, file format backwards
> > compatibility is provided for indexes from the 3.0 series and the
> > 4.0-alpha and -beta releases.
> >
> > Please report any feedback to the mailing lists
> > (http://lucene.apache.org/core/discussion.html)
> >
> > Note: The Apache Software Foundation uses an extensive mirroring network
> for
> > distributing releases.  It is possible that the mirror you are using may
> not have
> > replicated the release yet.  If that is the case, please try another
> mirror.  This
> > also goes for Maven access.
> >
> > Happy searching,
> >
> > Apache Lucene/Solr Developers


-- 
Chris Male | Open Source Search Developer | elasticsearch |
www.elasticsearch.com
