Re: [VOTE] Release PyLucene 8.8.1

2021-03-08 Thread Michael McCandless
+1 I ran my usual smoke test: install JCC, PyLucene, then index and optimize the first 100K documents from a Wikipedia English snapshot, and run a couple queries. Sorry for being late to the party too! Mike McCandless http://blog.mikemccandless.com On Mon, Mar 1, 2021 at 9:35 PM Andi Vajda

Re: [VOTE] Release PyLucene 8.6.1

2020-08-25 Thread Michael McCandless
+1 to release. I ran my usual smoke test to index, forceMerge and search the first 100K documents from English Wikipedia export, on Arch Linux, Java 1.11.06, Python 3.8.1 -- test ran fine! Thanks Andi. Mike McCandless http://blog.mikemccandless.com On Mon, Aug 24, 2020 at 7:56 PM Andi Vajda

Re: Memory usage

2019-11-07 Thread Michael McCandless
Hi Siddharth, Your understanding of MMapDirectory is correct -- only give your JVM enough heap to not spend too much CPU on GC, and then let the OS use all available remaining RAM to cache hot pages from your index. There are some structures Lucene loads into JVM heap, but even those are being

Re: [VOTE] Release PyLucene 7.6.0 (rc1)

2019-01-07 Thread Michael McCandless
+1 to release! I ran my usual simple test indexing the first 100K docs from an old wikipedia export, force merging, and running a few searches. Thank you for continuing to release PyLucene Andi! Mike McCandless http://blog.mikemccandless.com On Fri, Jan 4, 2019 at 4:59 PM Andi Vajda wrote:

Re: [VOTE] Release PyLucene 6.5.0 (rc1) (now with Python 3 support)

2017-03-29 Thread Michael McCandless
+1 to release. I tested on Ubuntu 16.04 with Python 3.5.2 and Java 1.8.0_121. I ran my usual smoke test of indexing first 100K docs from Wikipedia English export and running a few searches. But first I had to run 2to3 on this ancient script! I had to apply Ruediger's patch to JCC's setup.py

Re: [VOTE] Release PyLucene 6.4.1 (rc1)

2017-02-12 Thread Michael McCandless
+1 to release. I ran my usual smoke test: indexing first 100K docs from English Wikipedia export, optimizing, running a couple searches, on Ubuntu 16.04, Java 1.8.0_101, Python 2.7.12. Mike McCandless http://blog.mikemccandless.com On Sun, Feb 12, 2017 at 5:25 AM, Michael McCandless <

Re: [VOTE] Release PyLucene 6.4.1 (rc1)

2017-02-12 Thread Michael McCandless
Sorry, I will have a look! Mike McCandless http://blog.mikemccandless.com On Sat, Feb 11, 2017 at 5:23 PM, Andi Vajda wrote: > > Ping ? > Two more PMC votes are needed before this release can happen. > Thanks ! > > Andi.. > >> On Feb 6, 2017, at 13:38, Andi Vajda

Re: Doing Range/NUmber queries

2016-08-09 Thread Michael McCandless
No, you must replace the entire document: the old one is removed, and the new one is indexed in its place. The one exception to this is updatable doc values (e.g. see IndexWriter.updateNumericDocValue). Mike McCandless http://blog.mikemccandless.com On Tue, Aug 9, 2016 at 2:49 PM, lukes

Re: Doing Range/NUmber queries

2016-08-09 Thread Michael McCandless
For 1), you need to copy it yourself, i.e. add another Field to the Lucene Document you are about to index, with the same (string, numeric, etc.) value from the first field. For 2), it's best to use points (IntPoint, etc.) for range filtering. For 3), to search a boolean value, just map your

[ANNOUNCE] Apache Lucene 5.5.0 released

2016-02-23 Thread Michael McCandless
23 February 2016, Apache Lucene™ 5.5.0 available The Lucene PMC is pleased to announce the release of Apache Lucene 5.5.0, expected to be the last 5.x feature release before Lucene 6.0.0. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It

[ANNOUNCE] Apache Solr 5.5.0 released

2016-02-23 Thread Michael McCandless
23 February 2016, Apache Solr™ 5.5.0 available Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document

[ANNOUNCE] Apache Lucene 4.10.4 released

2015-03-05 Thread Michael McCandless
March 2015, Apache Lucene™ 4.10.4 available The Lucene PMC is pleased to announce the release of Apache Lucene 4.10.4 Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires

[ANNOUNCE] Apache Solr 4.10.4 released

2015-03-05 Thread Michael McCandless
March 2015, Apache Solr™ 4.10.4 available The Lucene PMC is pleased to announce the release of Apache Solr 4.10.4 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted

Re: [ANNOUNCE] Apache Lucene 4.10.4 released

2015-03-05 Thread Michael McCandless
Correction: the download link for Lucene 4.10.4 is: http://www.apache.org/dyn/closer.cgi/lucene/java/4.10.4 Mike McCandless http://blog.mikemccandless.com On Thu, Mar 5, 2015 at 10:26 AM, Michael McCandless luc...@mikemccandless.com wrote: March 2015, Apache Lucene™ 4.10.4 available

Re: How can I make better project than Lucene?

2014-11-18 Thread Michael McCandless
On Tue, Nov 18, 2014 at 1:16 PM, Marvin Humphrey mar...@rectangular.com wrote: On Sat, Nov 15, 2014 at 3:22 AM, Michael McCandless luc...@mikemccandless.com wrote: The analysis chain (attributes) is overly complex. If you were to start from scratch, what would the analysis chain look like

Re: How can I make better project than Lucene?

2014-11-15 Thread Michael McCandless
Actually I think competing projects is very healthy for open source development. There are many things you could explore to contrast with Lucene, e.g. write your new search engine in Go not Java: Java has many problems, maybe Go fixes them. Go also has a low-latency garbage collector in

Re: How can I make better project than Lucene?

2014-11-15 Thread Michael McCandless
; which implies an existing license whose terms he is willing to break. Not a good first step.;-) will -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Saturday, November 15, 2014 6:22 AM To: general@lucene.apache.org Subject: Re: How can I make

Re: How can I make better project than Lucene?

2014-11-15 Thread Michael McCandless
Yes it does. Mike McCandless http://blog.mikemccandless.com On Sat, Nov 15, 2014 at 8:53 AM, Will Martin wmartin...@gmail.com wrote: Um, doesn't the Apache license require inclusion of the license? Just sayin' -Original Message- From: Michael McCandless [mailto:luc

[ANNOUNCE] Apache Lucene 4.10.2 released

2014-10-31 Thread Michael McCandless
October 2014, Apache Lucene™ 4.10.2 available The Lucene PMC is pleased to announce the release of Apache Lucene 4.10.2 Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires

[ANNOUNCE] Apache Solr 4.10.2 released

2014-10-31 Thread Michael McCandless
October 2014, Apache Solr™ 4.10.2 available The Lucene PMC is pleased to announce the release of Apache Solr 4.10.2 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted

Re: [VOTE] Release PyLucene 4.10.1-1

2014-10-03 Thread Michael McCandless
+1 to release. I ran my usual smoke test: indexing, optimizing, and searching the first 100K Wikipedia English docs... Mike McCandless http://blog.mikemccandless.com On Wed, Oct 1, 2014 at 7:13 PM, Andi Vajda va...@apache.org wrote: The PyLucene 4.10.1-1 release tracking the recent release of

[ANNOUNCE] Apache Lucene 4.10.1 released

2014-09-29 Thread Michael McCandless
September 2014, Apache Lucene™ 4.10.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 4.10.1 Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires

[ANNOUNCE] Apache Solr 4.10.1 released

2014-09-29 Thread Michael McCandless
September 2014, Apache Solr™ 4.10.1 available The Lucene PMC is pleased to announce the release of Apache Solr 4.10.1 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting,

[ANNOUNCE] Apache Lucene 4.9.1 released

2014-09-22 Thread Michael McCandless
September 2014, Apache Lucene™ 4.9.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 4.9.1 Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires

[ANNOUNCE] Apache Solr 4.9.1 released

2014-09-22 Thread Michael McCandless
September 2014, Apache Solr™ 4.9.1 available The Lucene PMC is pleased to announce the release of Apache Solr 4.9.1 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted

Re: [VOTE] Release PyLucene 4.9.0-0

2014-07-14 Thread Michael McCandless
+1 I ran my usual smoke test: index first 100K docs from Wikipedia (en), do a few searches, run forceMerge. Mike McCandless http://blog.mikemccandless.com On Mon, Jul 7, 2014 at 11:14 AM, Andi Vajda va...@apache.org wrote: The PyLucene 4.9.0-0 release tracking the recent release of Apache

Re: Near real time reader using ControlledRealTimeReopenThread

2014-06-25 Thread Michael McCandless
Don't call IndexWriter.commit with each added document. Call it only when you need to ensure durability (all index changes are written to stable storage). You spawn CRTRT, passing it your SearcherManager and IndexWriter, and it periodically reopens for you, with methods to wait for a specific

Re: [VOTE] Release PyLucene 4.8.0-1

2014-05-03 Thread Michael McCandless
+1 to release. I ran my usual smoke test: index first 100K Wikipedia docs, forceMerge, run a few searches. Mike McCandless http://blog.mikemccandless.com On Wed, Apr 30, 2014 at 5:07 PM, Andi Vajda va...@apache.org wrote: The PyLucene 4.8.0-1 release tracking the recent release of Apache

Re: [VOTE] Release PyLucene 4.6.1-0

2014-02-07 Thread Michael McCandless
Hmm I see many ._* files in the .tar.gz, e.g.: mike@vine:~/src/pylucene-4.6.1-0/jcc$ tar tzf pylucene-4.6.1-0-src.tar.gz | head ./._pylucene-4.6.1-0 pylucene-4.6.1-0/ pylucene-4.6.1-0/._CHANGES pylucene-4.6.1-0/CHANGES pylucene-4.6.1-0/._CREDITS pylucene-4.6.1-0/CREDITS
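A check like the one above can be scripted. The following is a minimal sketch, assuming the stray entries are AppleDouble-style members whose base name starts with "._"; the function name is hypothetical and not part of any PyLucene tooling.

```python
import posixpath
import tarfile


def appledouble_members(tar):
    """Return names of archive members whose base name starts with '._'."""
    return [
        member.name
        for member in tar.getmembers()
        if posixpath.basename(member.name.rstrip("/")).startswith("._")
    ]
```

Typical use would be to open the release tarball with tarfile.open(...) and fail the release check if appledouble_members(tar) returns a non-empty list.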

Re: [nag] [VOTE] Release PyLucene 4.5.0-2

2013-10-18 Thread Michael McCandless
+1 to wait for 4.5.1 instead? Mike McCandless http://blog.mikemccandless.com On Thu, Oct 17, 2013 at 10:43 PM, Andi Vajda va...@apache.org wrote: One more PMC vote is needed to finalize this release. Then, we could wait for Lucene 4.5.1 to happen instead ? Andi.. -- Forwarded

Re: [VOTE] Release PyLucene 4.3.1-1

2013-06-30 Thread Michael McCandless
Hmm I see two test failures, on Linux, Python 2.7.3, Java 1.7.0_07: ERROR: testCachingWorks (__main__.CachingWrapperFilterTestCase) -- Traceback (most recent call last): File "test/test_CachingWrapperFilter.py", line 53, in

Re: Re[2]: minFuzzyLength in FuzzySuggester behaves differently for English and Russian

2013-06-05 Thread Michael McCandless
On Wed, Jun 5, 2013 at 2:51 AM, Artem Lukanin ice...@mail.ru wrote: OK, I will try to do it myself. Thank you! As I understand I have to clone lucene_solr_4_3 from https://github.com/apache/lucene-solr.git and upload a patch to the issue for review? I'm not a git user, but that sounds

Re: IndexWriter.commit() performance

2013-06-05 Thread Michael McCandless
On Tue, Jun 4, 2013 at 7:31 PM, Renata Vaccaro ren...@emailtopia.com wrote: Thanks. I need the documents to be searchable as soon as they are added. I also need the documents added to survive a machine crash. Soft commits and NRT gets might work, but from what I've read they are only

Re: minFuzzyLength in FuzzySuggester behaves differently for English and Russian

2013-06-03 Thread Michael McCandless
This unfortunately is a limitation of the current FuzzySuggester implementation: it computes edits in UTF-8 space instead of Unicode character (code point) space. This should be fixable: we'd need to fix TokenStreamToAutomaton to work in Unicode character space, then fix FuzzySuggester to do the
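To make the UTF-8 versus code point distinction concrete, here is a small sketch in plain Python (not Lucene code). The helper names and sample words are illustrative assumptions, and the position-wise substitution count is a simplified stand-in for a true edit distance.

```python
def utf8_length(text):
    """Length of text measured in UTF-8 bytes."""
    return len(text.encode("utf-8"))


def substitutions(a, b):
    """Count positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def compare(word, variant):
    """Return (code point length, byte length, code point subs, byte subs)."""
    return (
        len(word),
        utf8_length(word),
        substitutions(word, variant),
        substitutions(word.encode("utf-8"), variant.encode("utf-8")),
    )


if __name__ == "__main__":
    # ASCII: one byte per character, so both views agree.
    print(compare("cat", "rat"))
    # Cyrillic: two bytes per character, and one changed letter can
    # change both of its bytes.
    print(compare("кот", "рот"))
```

For the Cyrillic pair the byte view sees a length of 6 and two substitutions for what is a single-letter change, which is consistent with length and edit thresholds triggering at different points for English and Russian input.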

Re: minFuzzyLength in FuzzySuggester behaves differently for English and Russian

2013-06-03 Thread Michael McCandless
Thanks Artem. If you have time/energy to work out a patch that would be great :) Mike McCandless http://blog.mikemccandless.com On Mon, Jun 3, 2013 at 7:17 AM, Artem Lukanin ice...@mail.ru wrote: I have opened an issue: https://issues.apache.org/jira/browse/LUCENE-5030 -- View this

Re: How to convert TermDocs and TermEnum ??

2013-05-24 Thread Michael McCandless
Hi, Have a look at MIGRATE.txt? Mike McCandless http://blog.mikemccandless.com On Mon, May 20, 2013 at 10:54 AM, A. Lotfi majidna...@yahoo.com wrote: Hi, I found some difficulties converting from old API to the newest one : import org.apache.lucene.index.TermDocs; // does not exist

Re: [VOTE] Release PyLucene 4.3.0-1

2013-05-08 Thread Michael McCandless
+1 to release! Exciting to finally have a PyLucene 4.x :) I ran my usual smoke test (index first 100K Wikipedia docs and run a couple searches) and it looks great! Only strangeness was ... I set JDK['linux2'] to my install location (Oracle JDK), and normally this works fine, but this time

Re: [VOTE] Release PyLucene 4.2.1

2013-04-15 Thread Michael McCandless
I'm having trouble on an Ubuntu 12.10 box, using Java 1.7_07 and Python 2.7.3. I was able to build and install both JCC and PyLucene, apparently successfully. I can import lucene in Python and print lucene.VERSION and confirm it's 4.2.1. lucene.initVM(lucene.CLASSPATH) succeeds. Yet, there are

Re: Welcome Tommaso Teofili to the PMC

2013-03-17 Thread Michael McCandless
Welcome Tommaso! Mike McCandless http://blog.mikemccandless.com On Sun, Mar 17, 2013 at 11:04 AM, Steve Rowe sar...@gmail.com wrote: I'm pleased to announce that Tommaso Teofili has accepted the PMC's invitation to join. Welcome Tommaso! - Steve

Re: different result for 'OR'

2013-01-21 Thread Michael McCandless
That is odd. Can you print the Query.toString of the actual two queries you are running? (I think the OR must be capitalized to be parsed by the classic QueryParser?). Mike McCandless http://blog.mikemccandless.com On Mon, Jan 21, 2013 at 7:34 AM, Jeroen Venderbosch j...@woodwing.com wrote:

Re: Is there a way to clear lucene's cache?

2013-01-07 Thread Michael McCandless
Lucene itself doesn't do any caching. Maybe you are thinking of Solr? The OS also does caching, so if you want a cold test you'll have to tell the OS to flush its IO cache in between tests. E.g. on Linux, as root: sync; echo 3 > /proc/sys/vm/drop_caches (or echo 3 | sudo tee /proc/sys/vm/drop_caches, since the redirect in "sudo echo 3 > ..." runs without root). Mike McCandless http://blog.mikemccandless.com
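For a benchmark harness, the cache flush can be done from the test script itself. This is a minimal sketch, assuming Linux and a process running as root; the function name is hypothetical, and the path parameter exists only so the function can be pointed at another file when trying it out.

```python
import os


def drop_os_caches(path="/proc/sys/vm/drop_caches"):
    """Flush dirty pages, then ask the kernel to drop its clean caches."""
    os.sync()  # write dirty pages out first so they become droppable
    with open(path, "w") as f:
        f.write("3\n")  # 3 = page cache plus dentries and inodes
```

Calling drop_os_caches() between runs should give each run a cold OS cache; without root the open() call raises PermissionError.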

Re: Welcome Sami Siren to the PMC

2012-12-12 Thread Michael McCandless
Welcome Sami! Mike McCandless http://blog.mikemccandless.com On Wed, Dec 12, 2012 at 3:17 PM, Mark Miller markrmil...@gmail.com wrote: I'm pleased to announce that Sami Siren has accepted the PMC's invitation to join. Welcome Sami! - Mark

Re: Welcome Alan Woodward as Lucene/Solr committer

2012-10-17 Thread Michael McCandless
Welcome aboard Alan! Happy Coding, Mike McCandless http://blog.mikemccandless.com On Wed, Oct 17, 2012 at 1:36 AM, Robert Muir rcm...@gmail.com wrote: I'm pleased to announce that the Lucene PMC has voted Alan as a Lucene/Solr committer. Alan has been contributing patches on various tricky

Re: Can Lucene be used where each entity to be ranked is a set of documents?

2012-08-22 Thread Michael McCandless
On Wed, Aug 22, 2012 at 10:36 AM, Robert Muir rcm...@gmail.com wrote: On Tue, Aug 21, 2012 at 7:42 AM, shashank shashank91.b...@gmail.com wrote: Hello, I am working on a project wherein each entity to be ranked is not a single document but in fact a group of documents. So, the ranking not

Re: Is query-time Join actually in Lucene 3.6?

2012-08-07 Thread Michael McCandless
Query-time join lives under Lucene's contrib/join in 3.6: http://lucene.apache.org/core/3_6_1/lucene-contrib/index.html#join Mike McCandless http://blog.mikemccandless.com On Tue, Aug 7, 2012 at 11:41 AM, Homer Nabble homernab...@gmail.com wrote: This page states New query-time joining is more

Re: Is it possible to Lucene with a database managed by external application ?

2012-05-29 Thread Michael McCandless
That should be fine. You just have to separately pull the added/updated rows from the DB and index them into your Lucene index. Mike McCandless http://blog.mikemccandless.com On Tue, May 29, 2012 at 3:09 AM, Ievgen Krapyva ykrap...@gmail.com wrote: Hi everybody, I've just started reading

Re: How to construct the term frequency vector of all words in dictionary?

2012-05-15 Thread Michael McCandless
You can get a TermEnum (IndexReader.terms()) and then keep calling .next() to advance to the next term, and then .docFreq() to get the document frequency (how many documents have the term) for that term... Mike McCandless http://blog.mikemccandless.com On Tue, May 15, 2012 at 1:24 PM, Aoi
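The idea of walking every term and reading its document frequency can be illustrated without Lucene. The sketch below is plain Python over a toy corpus, not the TermEnum API; the function names and the whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter


def document_frequencies(docs):
    """Map each term to the number of documents that contain it."""
    df = Counter()
    for text in docs:
        # set() so a term repeated within one document counts once
        df.update(set(text.lower().split()))
    return df


def iterate_terms(df):
    """Yield (term, doc_freq) pairs in sorted term order."""
    for term in sorted(df):
        yield term, df[term]
```

Note that this yields document frequency (how many documents contain the term), not the number of occurrences within a document.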

Re: Lucene index directory on disk: (i) do I need to keep it and (ii) how do I handle encryption?

2012-04-24 Thread Michael McCandless
FSDirectory won't load the index into RAM. But RAMDirectory can: eg, you can init a RAMDirectory, passing your FSDir to its ctor, to copy all files into RAM. Then you can delete the FSDir, but realize this means once your app shuts down you've lost the index. I think you can handle your

Re: Welcome Jan Høydahl to the PMC

2012-02-13 Thread Michael McCandless
Welcome Jan! Mike McCandless http://blog.mikemccandless.com On Mon, Feb 13, 2012 at 9:50 AM, Robert Muir rcm...@gmail.com wrote: Hello, I'm pleased to announce that Jan has accepted the PMC's invitation to join. Congratulations Jan! -- lucidimagination.com

[ANNOUNCE] Apache Lucene 3.4.0 released

2011-09-14 Thread Michael McCandless
September 14 2011, Apache Lucene™ 3.4.0 available The Lucene PMC is pleased to announce the release of Apache Lucene 3.4.0. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires

[ANNOUNCE] Apache Solr 3.4.0 released

2011-09-14 Thread Michael McCandless
September 14 2011, Apache Solr™ 3.4.0 available The Lucene PMC is pleased to announce the release of Apache Solr 3.4.0. Apache Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit

Re: Caused by: java.io.IOException: read past EOF

2011-09-09 Thread Michael McCandless
Can you post the traceback/exception? Are you overriding the default LockFactory for your Directory? Mike McCandless http://blog.mikemccandless.com On Fri, Sep 9, 2011 at 6:07 AM, Java_dev abde...@hotmail.com wrote: Hi Michael, Thx for taking time to help me out. We are using Lucene to

Re: [VOTE] Release PyLucene 3.3 (rc3)

2011-07-21 Thread Michael McCandless
+1 to release! Smoke test passed and I see grouping module classes are visible by default! Thanks Andi :) Mike McCandless http://blog.mikemccandless.com On Thu, Jul 21, 2011 at 12:47 PM, Andi Vajda va...@apache.org wrote: A problem was found with rc2. Please, vote on rc3, thanks :-) The

Re: [VOTE] Release PyLucene 3.3.0

2011-07-03 Thread Michael McCandless
Everything looks good -- I was able to compile, run all tests successfully, and run my usual smoke test (indexing, optimizing, and searching the first 100K Wikipedia docs), but... I then tried to enable the grouping module (lucene/contrib/grouping), by adding a GROUPING_JAR matching all the other

Re: [VOTE] Release PyLucene 3.2.0

2011-06-07 Thread Michael McCandless
+1 I built on OS X 10.6.6, passed all tests (I think? No overall summary in the end, but I didn't see any obvious problem), and ran my usual smoke test indexing first 100K docs from a line file from Wikipedia, and running a few searches. Mike McCandless http://blog.mikemccandless.com On Mon,

Re: Lucene: is it possible to search with an error in one letter?

2011-05-30 Thread Michael McCandless
If you want to allow for any single character change, you can use FuzzyQuery. EG, pencil~1 allows for 1 character change, pencil~2 allows for 2. Note that FuzzyQuery is very costly in 3.x, but is substantially (eg factor of 100 times) faster in trunk / 4.0. Mike http://blog.mikemccandless.com
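What "pencil~1" versus "pencil~2" means can be sketched with a plain Levenshtein distance. This is an illustration only, assuming simple insert/delete/substitute edits; it is not how FuzzyQuery is implemented, and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def fuzzy_matches(term, candidates, max_edits):
    """Return the candidates within max_edits of term, in input order."""
    return [c for c in candidates if levenshtein(term, c) <= max_edits]
```

With the candidates "pencil", "pencils", "penci", "stencil" and "pen", a limit of 1 keeps the first three, and a limit of 2 also admits "stencil".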

Re: Welcome Chris Male & Andi Vajda as full Solr / Lucene Committers

2011-05-23 Thread Michael McCandless
Welcome! Mike http://blog.mikemccandless.com On Mon, May 23, 2011 at 12:39 PM, Simon Willnauer simon.willna...@googlemail.com wrote: Hi folks, I am happy to announce that the Lucene PMC has accepted Chris Male and Andi Vajda as Lucene/Solr committers. Congratulations & Welcome on board,

Re: Special Board Report for May 2011

2011-05-07 Thread Michael McCandless
just know others are dying to file board reports on a quarterly basis! More inline below... On May 5, 2011, at 8:27 AM, Michael McCandless wrote: On Wed, May 4, 2011 at 6:40 PM, Grant Ingersoll gsing...@apache.org wrote: 2. I think we need to prioritize getting patch

Re: Special Board Report for May 2011

2011-05-05 Thread Michael McCandless
On Wed, May 4, 2011 at 6:40 PM, Grant Ingersoll gsing...@apache.org wrote: At our core, this means we are supporting a set of libraries that can be used for search and related capabilities across a lot of different applications ranging in size and shape, as well as a server that makes those

Re: Special Board Report for May 2011

2011-05-05 Thread Michael McCandless
On Wed, May 4, 2011 at 7:26 PM, Ted Dunning ted.dunn...@gmail.com wrote: The amazing thing to me is that Lucene of all projects is having problems like this.  Lucene has always been my primary example of Open Source Done Right. I think with passion comes blowups. I think it's natural, and,

Re: IndexFiles cmd runs, even when IndexFiles.java is deleted

2011-05-02 Thread Michael McCandless
Likely the .class file is still present? Javac compiles .java files into .class files, and then java executes from .class files. Mike http://blog.mikemccandless.com On Mon, May 2, 2011 at 8:13 AM, daniel daniel_pfis...@msn.com wrote: I'm new to Lucene and Java, I'm trying to modify the

Re: [VOTE] Create Solr TLP - bigger picture

2011-04-27 Thread Michael McCandless
, Apr 27, 2011 at 9:21 AM, Shane Curcuru a...@shanecurcuru.org wrote: Michael McCandless luc...@mikemccandless.com wrote: ...snip... While I agree, out of context, Robert's use of a veto/revert wars is inappropriate, and is not how things should be done in a healthy Apache project Lucene

Re: Welcome Dawid Weiss and Stanislaw Osinski as Lucene/Solr committers

2011-02-08 Thread Michael McCandless
Welcome Dawid and Stanislaw! Mike On Tue, Feb 8, 2011 at 1:13 PM, Robert Muir rcm...@gmail.com wrote: I'm pleased to announce that the PMC has voted in Dawid Weiss and Stanislaw Osinski as Lucene/Solr committers! Welcome!

Re: [VOTE] Release PyLucene 2.9.4-1 and 3.0.3-1

2010-12-05 Thread Michael McCandless
+1 to both. I installed both on Linux (Fedora 13) and ran my test python script that indexes first 100K line docs from wikipedia and runs a few searches. No problems! Mike On Sun, Dec 5, 2010 at 1:50 AM, Andi Vajda va...@apache.org wrote: With the recent releases of Lucene Java 2.9.4 and

Re: [VOTE] Release of Apache Lucene 3.0.3 and 2.9.4 artifacts (take 2)

2010-12-01 Thread Michael McCandless
On Wed, Dec 1, 2010 at 3:38 AM, Uwe Schindler u...@thetaphi.de wrote: Hi, Thanks to the PMC for voting on the Lucene 3.0.3 and 2.9.4 artifacts. The vote has passed with 3 positive votes: - Robert Muir - Andi Vajda - Uwe Schindler Excellent! Thanks everyone :) I will start to publish the

Re: PMC Additions

2010-11-28 Thread Michael McCandless
Welcome Simon and Koji! Mike On Sun, Nov 28, 2010 at 7:30 AM, Grant Ingersoll gsing...@apache.org wrote: I'm pleased to announce the addition of Simon Willnauer and Koji Sekiguchi to the Lucene PMC.  Both Simon and Koji have been long time contributors/committers to both Lucene and Solr.

Re: [VOTE] Rename Lucene Java to be Lucene Core

2010-11-10 Thread Michael McCandless
+1 Mike On Tue, Nov 9, 2010 at 3:57 PM, Grant Ingersoll gsing...@apache.org wrote: Per the discuss thread and the fact that Java is TM Oracle, I would like us to change Lucene Java to now be referred to as Lucene Core.  The primary change is on the website where the Java tab will now be the

Re: [DISCUSS] Lucene Java -> Lucene Core

2010-11-08 Thread Michael McCandless
+1 Seems prudent given the current Java climate. Mike On Mon, Nov 8, 2010 at 10:57 AM, Grant Ingersoll gsing...@apache.org wrote: Hi Luceneers, esp. PMC and Committers, I'm in the process of reviewing our branding per the Trademarks committee sending out requirements.   So, expect to see

Re: Welcome Steven Rowe as Lucene/Solr committer!

2010-09-22 Thread Michael McCandless
Welcome Steven!! Mike On Wed, Sep 22, 2010 at 9:19 AM, Robert Muir rcm...@gmail.com wrote: I'm pleased to announce that the PMC has accepted Steven Rowe as Lucene/Solr committer! Welcome Steven! -- Robert Muir rcm...@gmail.com

Re: Welcome Robert Muir to the Lucene PMC

2010-07-07 Thread Michael McCandless
Congrats! Mike On Wed, Jul 7, 2010 at 2:12 PM, Grant Ingersoll gsing...@apache.org wrote: In recognition of Robert's continuing contributions to Lucene and Solr, I'm happy to announce Robert has accepted our invitation to join the Lucene PMC. Cheers, Grant Ingersoll Lucene PMC Chair

Re: [VOTE] [Take 2] Release PyLucene 2.9.3-1 and 3.0.2-1

2010-06-30 Thread Michael McCandless
+1 Mike On Tue, Jun 29, 2010 at 7:47 AM, Andi Vajda va...@apache.org wrote: The first vote started on June 18th received two PMC votes and one user vote. A couple of bugs got fixed in the meantime so I'd like to call for another vote hoping for three PMC votes to make this release

Re: [PMC] [DISCUSS] Lucy

2010-06-13 Thread Michael McCandless
Technically, it's clear that Lucy is taking an innovative and well-thought-out approach, building a search engine that folds in what's been learned from all the painful experiences of those before it. Marvin gets to chuckle whenever we have one of our massive back compat discussions... When it

Re: [VOTE] #2 Apache Lucene Java 2.9.3 and 3.0.2 artifacts to be released

2010-06-12 Thread Michael McCandless
On Fri, Jun 11, 2010 at 11:58 AM, Uwe Schindler u...@thetaphi.de wrote: Hi all, It is not yet quite clear if we should release take2 or take1 of the artifacts. Both are on my people account, please vote: [1] Release

Re: [VOTE] Apache Lucene Java 2.9.3 and 3.0.2 artifacts to be released

2010-06-11 Thread Michael McCandless
I would argue my 3 cases were borderline bugs -- they weren't just pure perf improvements. 2135 acts like a mem leak, in that we retain [often very large] memory for longer than we should. 2161 is nasty choke point in NRT (getting a new NRT reader syncs the old one thus blocking any searches,

Re: [VOTE] Apache Lucene Java 2.9.3 and 3.0.2 artifacts to be released

2010-06-08 Thread Michael McCandless
This looks like something new to me (doesn't ring a bell). It looks odd -- the assertion that's tripping would seem to indicate that a file that we are copying into a CFS file (after flushing) is still changing while we are copying, which is not good. All files should be closed before we build

Re: [VOTE] Apache Lucene Java 2.9.3 and 3.0.2 artifacts to be released

2010-06-08 Thread Michael McCandless
: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Tuesday, June 08, 2010 2:36 PM To: general@lucene.apache.org Subject: Re: [VOTE] Apache Lucene Java 2.9.3 and 3.0.2 artifacts to be released This looks like something new to me (doesn't ring a bell). It looks odd -- the assertion

Re: [VOTE] Apache Lucene Java 2.9.3 and 3.0.2 artifacts to be released

2010-06-07 Thread Michael McCandless
+1 to release. ant test passes for both -src.tar.gz downloads, and .asc's check out, and Lucene in Action 2nd Edition's tests all pass w/ 3.0.2 dropped in. Mike On Mon, Jun 7, 2010 at 4:32 PM, Andi Vajda va...@apache.org wrote: On Mon, 7 Jun 2010, Uwe Schindler wrote: I have posted a

Re: Welcome Uwe Schindler to the Lucene PMC

2010-04-01 Thread Michael McCandless
Welcome Uwe!! Mike On Thu, Apr 1, 2010 at 7:05 AM, Grant Ingersoll gsing...@apache.org wrote: I'm pleased to announce that the Lucene PMC has voted to add Uwe Schindler to the PMC.  Uwe has been doing a lot of work in Lucene and Solr, including several of the last releases in Lucene.

Re: java.io.IOException: read past EOF

2010-03-24 Thread Michael McCandless
Your index is in serious trouble -- you have 2 segments_N files, both of which are 0 length. This won't be easy to recover (CheckIndex won't be able to). Any idea how this happened? Was this index created using 2.4.x? Mike On Tue, Mar 23, 2010 at 5:36 PM, Jean-Michel RAMSEYER

Re: java.io.IOException: read past EOF

2010-03-24 Thread Michael McCandless
%20Naming) makes it seem that the cfs files could be used to recover most of the information from the index.  Is that not so? On Tue, Mar 23, 2010 at 11:30 PM, Michael McCandless luc...@mikemccandless.com wrote: Your index is in serious trouble -- you have 2 segments_N files, both of which

Re: Less drastic ways

2010-03-15 Thread Michael McCandless
On Sun, Mar 14, 2010 at 4:29 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Even if we merge Lucene/Solr and we treat Solr as just another Lucene contrib/module, say, contributors who care only about Solr will still patch against Solr and Lucene developers or those people who have the

Re: Less drastic ways

2010-03-14 Thread Michael McCandless
Hm, again I'm confused. If this is how it worked in Solr/Lucene land, then there wouldn't be pieces in Solr that we now want to refactor and move into Lucene core or modules. A list of about 4-5 such pieces of functionality in Solr has already been listed. That's really my main question.

Re: [VOTE] merge lucene/solr development (take 3)

2010-03-09 Thread Michael McCandless
On Tue, Mar 9, 2010 at 5:10 AM, Andrzej Bialecki a...@getopt.org wrote: Re: Nutch components - those that are reusable in Lucene or Solr contexts eventually find their way to respective projects, witness e.g. CommonGrams. In fact I think this is a great example -- as far as I can tell,

Re: [VOTE] merge lucene/solr development (take 3)

2010-03-09 Thread Michael McCandless
, we should do both ;) Mike On Tue, Mar 9, 2010 at 8:49 AM, Grant Ingersoll gsing...@apache.org wrote: On Mar 9, 2010, at 8:21 AM, Michael McCandless wrote: On Tue, Mar 9, 2010 at 7:21 AM, Grant Ingersoll gsing...@apache.org wrote: If we had that freedom (poaching is perfectly fine

Re: Composing posts for both JIRA and email (was a JIRA post)

2010-03-05 Thread Michael McCandless
Great guidelines Marvin! I agree w/ most of this, except, I do use Jira's markup (bq., {quote}) when adding comments. I'm torn on how important the first read (via the email Jira sends) is vs. the "I click through to the issue and read it" case. Typically I just click through to the issue unless

Re: [VOTE] merge lucene/solr development

2010-03-04 Thread Michael McCandless
On Thu, Mar 4, 2010 at 12:41 PM, Chris Hostetter hossman_luc...@fucit.org wrote: Why don't we just start by attempting to have a common dev list and merging committers, in the hopes that it will promote better communication about features up and down the stack, and better bug

[VOTE] Merge the development of Solr/Lucene (take 2)

2010-03-04 Thread Michael McCandless
A new vote, that slightly changes proposal from last vote (adding only that Lucene can cut a release even if Solr doesn't): * Merging the dev lists into a single list. * Merging committers. * When any change is committed (to a module that belongs to Solr or to Lucene), all tests must

Re: [VOTE] Merge the development of Solr/Lucene (take 2)

2010-03-04 Thread Michael McCandless
I forgot my vote: +1 Mike On Thu, Mar 4, 2010 at 4:33 PM, Michael McCandless luc...@mikemccandless.com wrote: A new vote, that slightly changes proposal from last vote (adding only that Lucene can cut a release even if Solr doesn't):  * Merging the dev lists into a single list.  * Merging

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Michael McCandless
If we don't somehow first address the code duplication across the 2 projects, making Solr a TLP will make things worse. I started here with analysis because I think that's the biggest pain point: it seemed like an obvious first step to fixing the code duplication and thus the most likely to reach

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Michael McCandless
On Mon, Mar 1, 2010 at 12:58 PM, Marvin Humphrey mar...@rectangular.com wrote: On Mon, Mar 01, 2010 at 12:44:02PM -0500, Michael McCandless wrote: But it goes beyond analyzers: I'd like to see other modules, now in Solr, eventually moved to Lucene, because they really are core functionality

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Michael McCandless
being a TLP affect the creation of a separate project/module for Analyzers any more so than it not being a TLP? Both Lucene-java and Solr (as a TLP) could depend on the newly created refactored Analysis project. Chris On 3/1/10 10:44 AM, Michael McCandless luc...@mikemccandless.com wrote

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Michael McCandless
independent of synchronizing our development. Mike On Mon, Mar 1, 2010 at 1:03 PM, Michael McCandless luc...@mikemccandless.com wrote: On Mon, Mar 1, 2010 at 12:58 PM, Marvin Humphrey mar...@rectangular.com wrote: On Mon, Mar 01, 2010 at 12:44:02PM -0500, Michael McCandless wrote

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Michael McCandless
/contributions welcome. Cheers, Chris On 3/1/10 11:25 AM, Michael McCandless luc...@mikemccandless.com wrote: Because the code dup with analyzers is only one of the problems to solve.  In fact, it's the easiest of the problems to solve (that's why I proposed it, only, first). A more differentiating

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-03-01 Thread Michael McCandless
(including me). How can we achieve these goals without making releases more difficult?  Michael On 3/1/10 9:44 AM, Michael McCandless wrote: If we don't somehow first address the code duplication across the 2 projects, making Solr a TLP will make things worse. I started here with analysis

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-02-28 Thread Michael McCandless
To make this more concrete, I think this is roughly what's being proposed: * Merging the dev lists into a single list. * Merging committers. * When a change is committed to Lucene, it must pass all Solr tests. * Release both at once. These things would not change: * Most

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?

2010-02-26 Thread Michael McCandless
I think this is a good idea! LuSolr ;) (kidding) I agree with all of your points Yonik. What do other people think...? Mike On Wed, Feb 24, 2010 at 2:20 PM, Yonik Seeley yo...@apache.org wrote: I've started to think that a merge of Solr and Lucene would be in the best interest of both

Re: Stale NFS file handle Exception

2010-01-14 Thread Michael McCandless
This is a known limitation of Lucene over NFS. It's because NFS makes no effort to protect open files from deletion. Other filesystems prevent (or delay) deletion of still-open files, e.g. on Unix "delete on last close" semantics are used, on Windows the file cannot be deleted until no process
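The "delete on last close" behaviour mentioned above can be seen with a small sketch. This is only an illustration on a local POSIX filesystem, not Lucene code; the function name read_after_unlink is made up for the example. A file is unlinked while a reader still holds it open, and the reader can still read the bytes. Over NFS, a reader on another machine gets no such protection, which is where the stale file handle error comes from.

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Write payload to a temp file, delete the file while a reader
    still has it open, then read it back through the open handle.

    Assumes a local POSIX filesystem (hypothetical helper, for
    illustration only)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        reader = open(path, "rb")  # second handle, plays the "searcher"
    finally:
        os.close(fd)
    try:
        os.unlink(path)  # the "writer" deletes the file
        # The name is gone from the directory...
        assert not os.path.exists(path)
        # ...but the data stays readable until the last handle closes.
        return reader.read()
    finally:
        reader.close()
```

On Windows the unlink step would typically fail instead, since the file is still open, which matches the other case described in the message.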

Re: Lucene PMC += Mark Miller

2010-01-14 Thread Michael McCandless
Welcome! Mike On Thu, Jan 14, 2010 at 10:37 AM, Grant Ingersoll gsing...@apache.org wrote: I'm pleased to announce the Lucene PMC has elected to add Mark Miller to its ranks in recognition of his longstanding contributions to the Lucene community as a committer on both Lucene Java and Solr.

Re: [spatial] Cartesian Tiers nomenclature

2009-12-30 Thread Michael McCandless
Right, NRQ is able to translate any requested range into the union (OR) of brackets (from the trie) created during indexing. Can spatial do the same thing, just with 2D instead of 1D? Ie, reconstruct any expressible shape (created at query time) as the union of some number of grids/tiers, at
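A rough sketch of the idea of turning a requested range into a union of brackets follows. It is a simplified illustration for non-negative integers, not Lucene's actual NumericRangeQuery implementation; the names split_range and covered, and the step and bits parameters, are assumptions made for the example. Each bracket is a (shift, prefix) pair standing for all values whose bits above shift equal prefix, i.e. one term that would have been indexed at that precision.

```python
def split_range(lo, hi, step=2, bits=16):
    """Cover the inclusive range [lo, hi] with aligned brackets.

    Greedy sketch: at each position take the coarsest bracket
    (largest shift, in multiples of step) that starts exactly at
    lo and does not run past hi."""
    brackets = []
    while lo <= hi:
        shift = 0
        while shift + step <= bits:
            size = 1 << (shift + step)
            # Stop growing if the bigger block is misaligned or overshoots.
            if lo % size != 0 or lo + size - 1 > hi:
                break
            shift += step
        brackets.append((shift, lo >> shift))
        lo += 1 << shift
    return brackets


def covered(brackets):
    """Expand brackets back into the individual values they cover."""
    values = []
    for shift, prefix in brackets:
        start = prefix << shift
        values.extend(range(start, start + (1 << shift)))
    return values
```

The query is then the OR of those few brackets rather than of every individual value in the range. A 2D variant would presumably do the same thing with grid cells at several resolutions, which is the question being asked in the message.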

Re: [spatial] Cartesian Tiers nomenclature

2009-12-29 Thread Michael McCandless
It's great that there's such a sudden burst of energy to improve spatial in both Solr and Lucene! Isn't this concept the same as trie (for Lucene's numeric fields), but in 2D not 1D? If so, I think tiles doesn't convey that they recursively subdivide. Also: why does this notion even need naming
