[ANNOUNCE] Apache PyLucene 2.9.4 and 3.0.3

2010-12-16 Thread Andi Vajda


I am pleased to announce the availability of the Apache PyLucene 2.9.4 and 
3.0.3 releases.


Apache PyLucene, a subproject of Apache Lucene, is a Python extension for
accessing Java Lucene. Its goal is to allow you to use Lucene's text
indexing and searching capabilities from Python. It is API compatible with
the latest versions of Java Lucene, 2.9.4 and 3.0.3.

This release contains a number of bug fixes and improvements. Details can be 
found in the changes files:


http://svn.apache.org/repos/asf/lucene/pylucene/tags/pylucene_2_9_4/CHANGES
http://svn.apache.org/repos/asf/lucene/pylucene/tags/pylucene_3_0_3/CHANGES
http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/CHANGES

Apache PyLucene is available from the following download pages:
http://www.apache.org/dyn/closer.cgi/lucene/pylucene/pylucene-2.9.4-1-src.tar.gz
http://www.apache.org/dyn/closer.cgi/lucene/pylucene/pylucene-3.0.3-1-src.tar.gz

When downloading from a mirror site, please remember to verify the downloads 
using signatures found on the Apache site:

http://www.apache.org/dist/lucene/pylucene/KEYS

For more information on Apache PyLucene, visit the project home page:
http://lucene.apache.org/pylucene

Andi..


Solr-trunk - Build # 1344 - Failure

2010-12-16 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Solr-trunk/1344/

All tests passed

Build Log (for compile errors):
[...truncated 20199 lines...]





LogMergePolicy.setUseCompoundFile/DocStore

2010-12-16 Thread Shai Erera
Hi

I find it very annoying that I need to set true/false on these methods
whenever I want to control compound file creation. Is it really necessary
to allow writing doc stores in non-compound files vs. the other index files
in a compound file? Does somebody know if this feature is used anywhere?

If it's crucial to keep the two methods, then how about introducing a
setCompoundMode(true/false) to turn both on/off at once? IndexWriter used to
have it before we switched to IndexWriterConfig, and I think it was very
useful.
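
Roughly, the convenience method I have in mind (hypothetical; it does not
exist today) would just delegate to the two existing setters:

    // hypothetical convenience on LogMergePolicy: flip both flags together
    public void setCompoundMode(boolean compound) {
        setUseCompoundFile(compound);
        setUseCompoundDocStore(compound);
    }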

Shai


Re: LogMergePolicy.setUseCompoundFile/DocStore

2010-12-16 Thread Earwin Burrfoot
Incoming LUCENE-2814 drops setUseCompoundDocStore()

-- 
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Phone: +7 (495) 683-567-4
ICQ: 104465785




Re: strange problem of PForDelta decoder

2010-12-16 Thread Li Li
hi Michael,
   Lucene 4 has so many changes that I don't know how to index and
search with a specified codec. Could you please give me some code
snippets that use the PFor codec, so I can trace the code?
   In your blog
http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
you said "The AND query, curiously, got faster; I think this is
because I modified its scoring to first try seeking within the block
of decoded ints."
   I am also curious about that result, because vInt only needs to decode
part of the doc list while PFor needs to decode the whole block. But I
think that with conjunction queries, most of the time is spent searching
the skip list. I haven't read your code yet, but I guess the skip lists
for vInt and for PFor are different.
   e.g. Lucene 2.9's default skipInterval is 16, so the skip list looks like
   level 1                                   256
   level 0  16 32 48 64 80 96 112 128 ...    256
   When we need skipTo(60) we must read 0 16 32 48 64 in level 0.
   But when using blocks, e.g. a block size of 128, my implementation of
the skip list is
   level 1  256
   level 0  128 256
   so for skipTo(60) we only read 2 items in level 0 and decode the first
block, which contains 128 docIDs.

   How do you implement bulk read?
   I did it like this: I decode a block and cache it in an int array. I
think I can buffer up to 100K docIDs and tfs for disjunction queries (it
costs less than 1MB of memory per term).
   SegmentTermDocs.read(final int[] docs, final int[] freqs)
   ...
        while (i < length && count < df) {
            if (curBlockIdx >= curBlockSize) { // often false; we may optimize it,
                                               // but JVM hotspot caches hot code paths, so ...
                int idBlockBytes = freqStream.readVInt();
                curBlockIdx = 0;
                for (int k = 0; k < idBlockBytes; k++) {
                    buffer[k] = freqStream.readInt();
                }

                blockIds = code.decode(buffer, idBlockBytes);
                curBlockSize = blockIds.length;

                int tfBlockBytes = freqStream.readVInt();
                for (int k = 0; k < tfBlockBytes; k++) {
                    buffer[k] = freqStream.readInt();
                }
                blockTfs = code.decode(buffer, tfBlockBytes);
                assert curBlockSize == blockTfs.length;
            }
            freq = blockTfs[curBlockIdx];
            doc += blockIds[curBlockIdx++];

            count++;

            if (deletedDocs == null || !deletedDocs.get(doc)) {
                docs[i] = doc;
                freqs[i] = freq;
                ++i;
            }
        }



2010/12/15 Michael McCandless luc...@mikemccandless.com:
 Hi Li Li,

 That issue has such a big patch, and enough of us are now iterating on
 it, that we cut a dedicated branch for it.

 But note that this branch is off of trunk (to be 4.0).

 You should be able to do this:

  svn checkout 
 https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings

 And then run things in there.  I just committed FOR/PFOR prototype
 codecs from LUCENE-1410 onto that branch, so e.g. you can run unit tests
 using those codecs by running "ant test -Dtests.codec=PatchedFrameOfRef".

 Please post patches back if you improve things!  We need all the help
 we can get :)

 Mike

 On Wed, Dec 15, 2010 at 5:54 AM, Li Li fancye...@gmail.com wrote:
 hi Michael
    you posted a patch here https://issues.apache.org/jira/browse/LUCENE-2723
    I am not familiar with patches. Do I need to download
 LUCENE-2723.patch (there are many patches with this name -- do I need
 the latest one?) and LUCENE-2723_termscorer.patch, and apply them
 (patch -p1 < LUCENE-2723.patch)? I just checked out the latest source
 code from http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene


 2010/12/14 Michael McCandless luc...@mikemccandless.com:
 Likely you are seeing the startup cost of hotspot compiling the PFOR code?

 Ie, does your test first warmup the JRE and then do the real test?

 I've also found that running -Xbatch produces more consistent results
 from run to run, however, those results may not be as fast as running
 w/o -Xbatch.

 Also, it's better to test on actual data (ie a Lucene index's
 postings), and in the full context of searching, because then we get a
 sense of what speedups a real app will see... micro-benching is nearly
 

Re: LogMergePolicy.setUseCompoundFile/DocStore

2010-12-16 Thread Shai Erera
Ok, perfect!

Shai




[jira] Updated: (SOLR-1993) SolrJ binary update error when commitWithin is set.

2010-12-16 Thread Maxim Valyanskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Valyanskiy updated SOLR-1993:
---

Attachment: SOLR-1993-1.4.patch

Patch for Solr 1.4 branch

 SolrJ binary update error when commitWithin is set.
 --

 Key: SOLR-1993
 URL: https://issues.apache.org/jira/browse/SOLR-1993
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 1.4, 1.4.1
Reporter: Phil Bingley
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: SOLR-1993-1.4.patch, SOLR-1993.patch, SOLR-1993.patch, 
 SolrExampleBinaryTest.java


 Solr server is unable to unmarshall a binary update request where the 
 commitWithin property is set on the UpdateRequest class.
 The client marshalls the request with the following code:
 if (updateRequest.getCommitWithin() != -1) {
   params.add("commitWithin", updateRequest.getCommitWithin());
 }
 The property is an int, and when the server unmarshalls it, the following error 
 happens (can't cast to List<String> due to an Integer element):
 SEVERE: java.lang.ClassCastException: java.lang.Integer cannot be cast to 
 java.util.List
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.namedListToSolrParams(JavaBinUpdateRequestCodec.java:213)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.access$100(JavaBinUpdateRequestCodec.java:40)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:131)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readIterator(JavaBinUpdateRequestCodec.java:126)
 at 
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:210)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readNamedList(JavaBinUpdateRequestCodec.java:112)
 at 
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:175)
 at 
 org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:101)
 at 
 org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:141)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:68)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:46)
 at 
 org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:55)
 at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at 
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:567)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
 at 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at 
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
 at java.lang.Thread.run(Thread.java:619)
 Workaround is to set the parameter manually as a string value instead of 
 setting it via the property on the UpdateRequest class.
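 A minimal sketch of that client-side workaround (assuming SolrJ's 
 UpdateRequest exposes a setParam(String, String) -- the exact setter may 
 vary by version):
{code}
UpdateRequest req = new UpdateRequest();
req.add(doc);
// pass commitWithin as a String param instead of calling setCommitWithin(int),
// so the server-side codec never sees a bare Integer in the params
req.setParam("commitWithin", "5000");
req.process(server);
{code}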




[jira] Commented: (SOLR-1993) SolrJ binary update error when commitWithin is set.

2010-12-16 Thread Maxim Valyanskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972036#action_12972036
 ] 

Maxim Valyanskiy commented on SOLR-1993:


I ported this patch to the 1.4 branch and tested it in my application. A 5 min 
test passed without any problems; the commitWithin parameter works as expected.

Is it possible to include this patch in 1.4.2?


Re: strange problem of PForDelta decoder

2010-12-16 Thread Michael McCandless
On the bulkpostings branch you can do something like this:

  CodecProvider cp = new CodecProvider();
  cp.register(new PatchedFrameOfRefCodec());
  cp.setDefaultFieldCodec("PatchedFrameOfRef");

Then whenever you create an IW or IR, use the advanced method that
accepts a CodecProvider.  Then the index will always use PForDelta to
write/read.
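
For instance, after the registration above (a sketch; the exact IW/IR
overloads and setter names may differ on the branch):

    // writer side: hand the provider to IndexWriterConfig
    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_40, analyzer);
    conf.setCodecProvider(cp);
    IndexWriter writer = new IndexWriter(dir, conf);

    // reader side: use the open(...) variant that accepts a CodecProvider
    IndexReader reader = IndexReader.open(dir, null, true,
        IndexReader.DEFAULT_TERMS_INDEX_DIVISOR, cp);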

I suspect conjunction queries got faster because we no longer skip if
the docID we seek is already in the current buffer (currently sized
64).  Ie, skip is very costly when the target isn't far.  This was
sort of an accidental byproduct of forcing even conjunction queries
using Standard (vInt) codec to work on block buffers, but I think it's
an important opto that we should more generally apply.

Skipping for block codecs and Standard/vInt are done w/ the same class
now.  It's just that the block codec must store the long filePointer
where the block starts *and* the int offset into the block, vs
Standard codec that just stores a filePointer.
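
Illustratively (these are not the actual classes, just the state each skip
entry must carry):

    // Standard/vInt codec: a file pointer is enough to resume decoding
    class StandardSkipEntry {
        int doc;              // docID at this skip point
        long freqFilePointer; // seek here and continue reading vInts
    }

    // block codec: must also say how far into the decoded block the doc is
    class BlockSkipEntry {
        int doc;
        long blockFilePointer; // start of the enclosing encoded block
        int offsetInBlock;     // index of this doc within the decoded block
    }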

On "how do we implement bulk read": this is the core change on the
bulkpostings branch -- we have a new API to separately bulk-read
docDeltas, freqs, and positionDeltas.  But we are rapidly iterating on
improving this (and getting to a clean PFor/For impl) now...
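
The rough shape of it (method names approximate -- this is still moving):

    BulkPostingsEnum postings = termsEnum.bulkPostings(null, true, false);
    BulkPostingsEnum.BlockReader docDeltas = postings.getDocDeltasReader();
    int[] buffer = docDeltas.getBuffer(); // reused across fills
    int count = docDeltas.fill();         // decode the next chunk of doc deltas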

Mike


[jira] Commented: (LUCENE-2815) MultiFields not thread safe

2010-12-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972057#action_12972057
 ] 

Michael McCandless commented on LUCENE-2815:


Ugh, nice finds Yonik!  We should fix these.

Maybe MultiFields should just pre-build its Map<String,Terms> on init?

You're right, we do reuse MultiFields today (we stuff the instance of 
MultiFields onto the IndexReader with IndexReader.store/retrieveFields), but I 
wonder whether we really should?  (In fact I thought at one point we decided to 
stop doing that... yet, we still are... can't remember the details; maybe perf 
hit was too high eg for MTQs/Solr facets/etc.).

What do we need to do to make the publication safe?  Is making 
IR.store/retrieveFields sync'd sufficient?

Aside: Java concurrency is a *mess*.  I understand why JMM is needed, to get 
good perf on modern CPUs, but allowing the low level CPU cache coherency 
requirements to bubble all the way up to complex requirements in the language 
itself, is a disaster.

 MultiFields not thread safe
 ---

 Key: LUCENE-2815
 URL: https://issues.apache.org/jira/browse/LUCENE-2815
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Yonik Seeley

 MultiFields looks like it has thread safety issues




[jira] Updated: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-16 Thread Earwin Burrfoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Earwin Burrfoot updated LUCENE-2814:


Attachment: LUCENE-2814.patch

First iteration.

Passes all tests except TestNRTThreads. Something to do with numDocsInStore and 
numDocsInRam being merged together?
Lots of non-critical nocommits (just markers for places I'd like to recheck).
DW.docStoreEnabled and *.closeDocStore() have to go before committing.

 stop writing shared doc stores across segments
 --

 Key: LUCENE-2814
 URL: https://issues.apache.org/jira/browse/LUCENE-2814
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 3.1, 4.0
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-2814.patch


 Shared doc stores enables the files for stored fields and term vectors to be 
 shared across multiple segments.  We've had this optimization since 2.1 I 
 think.
 It works best against a new index, where you open an IW, add lots of docs, 
 and then close it.  In that case all of the written segments will reference 
 slices of a single shared doc store segment.
 This was a good optimization because it means we never need to merge these 
 files.  But, when you open another IW on that index, it writes a new set of 
 doc stores, and then whenever merges take place across doc stores, they must 
 now be merged.
 However, since we switched to shared doc stores, there have been two 
 optimizations for merging the stores.  First, we now bulk-copy the bytes in 
 these files if the field name/number assignment is congruent.  Second, we 
 now force congruent field name/number mapping in IndexWriter.  This means 
 this optimization is much less potent than it used to be.
 Furthermore, the optimization adds *a lot* of hair to 
 IndexWriter/DocumentsWriter; this has been the source of sneaky bugs over 
 time, and causes odd behavior like a merge possibly forcing a flush when it 
 starts.  Finally, with DWPT (LUCENE-2324), which gets us truly concurrent 
 flushing, we can no longer share doc stores.
 So, I think we should turn off the write-side of shared doc stores to pave 
 the path for DWPT to land on trunk and simplify IW/DW.  We still must support 
 reading them (until 5.0), but the read side is far less hairy.




[jira] Updated: (SOLR-2259) Improve analyzer/version handling in Solr

2010-12-16 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-2259:
--

Attachment: SOLR-2259part2.patch

Here is a patch for branch_3x for part 2.

It warns if you are missing the luceneMatchVersion param in your config,
informing you that it's emulating Lucene 2.4, that this emulation is
deprecated, and that this parameter will be mandatory in 4.0.


 Improve analyzer/version handling in Solr
 -

 Key: SOLR-2259
 URL: https://issues.apache.org/jira/browse/SOLR-2259
 Project: Solr
  Issue Type: Task
Reporter: Robert Muir
 Fix For: 3.1, 4.0

 Attachments: SOLR-2259.patch, SOLR-2259.patch, SOLR-2259part2.patch


 We added Version for backwards compatibility support in Lucene.
 We use this to fire deprecated code to emulate old versions to ensure index 
 backwards compat.
 Related: we deprecate old analysis components and eventually remove them.
 To hook into Solr, at first it defaulted to Version 2.4 emulation everywhere, 
 with the example having the latest.
 If you don't specify a version in your solrconfig, it defaults to 2.4 though.
 However, as of LUCENE-2781, 2.4 is removed: but users with old configs that 
 don't specify a version should not be silently upgraded to the Version 3.0 
 emulation... this is bad.
 Additionally, when users are using deprecated emulation or using deprecated 
 factories they might not know it, and it might come as a surprise if they 
 upgrade, especially if they aren't looking at java apis or java code.
 I propose:
 # in trunk: we make the solrconfig luceneMatchVersion mandatory. 
 This is simple: Uwe already has a method that will error out if it's not 
 present, we just use that. 
 # in 3.x: we warn if you don't specify luceneMatchVersion in solrconfig: 
 telling you that it's going to be required in 4.0 and that you are defaulting 
 to 2.4 emulation.
 For example: "Warning: luceneMatchVersion is not specified in solrconfig.xml. 
 Defaulting to 2.4 emulation. You should at some point declare and reindex to 
 at least 3.0, because 2.4 emulation is deprecated in 3.x and will be removed 
 in 4.0. This parameter will be mandatory in 4.0."
 # in 3.x,trunk: we warn if you are using a deprecated matchVersion constant 
 somewhere in general, even for a specific tokenizer, telling you that you 
 need to at some point reindex with a current version before you can move to 
 the next release.
 For example: "Warning: you are using 2.4 emulation, at some point you need to 
 bump and reindex to at least 3.0, because 2.4 emulation is deprecated in 3.x 
 and will be removed in 4.0."
 # in 3.x,trunk: we warn if you are using a deprecated TokenStreamFactory so 
 that you know it's going to be removed.
 For example: "Warning: the ISOLatin1FilterFactory is deprecated and will be 
 removed in the next release. You should migrate to ASCIIFoldingFilterFactory."




[jira] Updated: (SOLR-2288) clean up compiler warnings

2010-12-16 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-2288:
--

Attachment: SOLR-2288_namedlist.patch

Hi Hoss Man, thanks for starting this issue.

I looked at your patch, and personally I think NamedList should really be 
type-safe.
If users want to use it in a type-unsafe way, that's fine, but the container 
itself shouldn't be List<Object>.

Here's an initial patch (all tests pass)... it also removes the deprecated 
methods.
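
To make the difference concrete (a sketch using NamedList's existing add/get):

{code}
// type-safe: the container knows its element type, no casts at call sites
NamedList<Integer> counts = new NamedList<Integer>();
counts.add("numFound", 42);
int n = counts.get("numFound");

// users who want the type-unsafe view can still opt into it explicitly
NamedList<Object> raw = new NamedList<Object>();
raw.add("numFound", 42);
int m = (Integer) raw.get("numFound"); // the cast burden is on the caller
{code}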


 clean up compiler warnings
 --

 Key: SOLR-2288
 URL: https://issues.apache.org/jira/browse/SOLR-2288
 Project: Solr
  Issue Type: Improvement
Reporter: Hoss Man
 Attachments: SOLR-2288_namedlist.patch, warning.cleanup.patch


 there's a ton of compiler warnings in the solr tree, and it's high time we 
 cleaned them up, or annotated them to be suppressed, so we can start making a 
 bigger stink when/if code is added to the tree that produces warnings (we'll 
 never do a good job of noticing new warnings when we have ~175 existing ones).
 Using this issue to track related commits.
 The goal of this issue should not be to change any functionality or APIs, 
 just deal with each warning in the most appropriate way:
 * fix generic declarations
 * add a SuppressWarnings annotation if it's safe in context




[jira] Commented: (LUCENE-2815) MultiFields not thread safe

2010-12-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972096#action_12972096
 ] 

Yonik Seeley commented on LUCENE-2815:
--

bq. but I wonder whether we really should? (In fact I thought at one point we 
decided to stop doing that... yet, we still are... can't remember the details; 
maybe perf hit was too high eg for MTQs/Solr facets/etc.).

It wouldn't be solr facets... that code asks for fields() once up front (per 
facet request) and the rest of the work will dwarf that.
I think there probably are a lot of random places that use it where the 
overhead could be significant.  For example IndexReader.deleteDocuments(), 
ParallelReader, FuzzyLikeThisQuery, and anyone else that uses any of the static 
methods on MultiFields on a non-segment reader.

bq. What do we need to do to make the publication safe? Is making 
IR.store/retrieveFields sync'd sufficient?

More than sufficient.  A volatile would also work fine provided that a race 
shouldn't matter (i.e. more than one MultiFields object could be constructed).
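
I.e., something like this sketch (field and method names are illustrative, 
not the actual ones):

{code}
private volatile Fields multiFields; // volatile write = safe publication

Fields getFields() {
  Fields f = multiFields;
  if (f == null) {
    // benign race: two threads may both build; either result is fine
    f = buildMultiFields();
    multiFields = f; // publishes the fully-constructed instance
  }
  return f;
}
{code}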

bq. Maybe MultiFields should just pre-build its Map<String,Terms> on init?

Ouch... those folks with 1000s of fields wouldn't be happy about that.




[jira] Commented: (SOLR-2288) clean up compiler warnings

2010-12-16 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972107#action_12972107
 ] 

Hoss Man commented on SOLR-2288:


Robert: as mentioned, I'm trying to keep a narrow focus on this issue: dealing 
with warnings that can be cleaned up w/o changing functionality...

bq. The goal of this issue should not be to change any functionality or APIs, 
just deal with each warning

...can we please confine discussions of changing the implementation of NamedList 
(or any other classes) to distinct issues?  Like SOLR-912?





[jira] Commented: (SOLR-2288) clean up compiler warnings

2010-12-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972108#action_12972108
 ] 

Robert Muir commented on SOLR-2288:
---

bq. Robert: as mentioned, i'm trying to keep a narrow focus on this issue: 
dealing with warnings that can be cleaned up w/o changing functionality...

Ok, but I didn't change the functionality? The functionality is the same; just 
the implementation is different.

This is the root cause of most of the compiler warnings, let's not dodge the 
issue.







[jira] Commented: (SOLR-2288) clean up compiler warnings

2010-12-16 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972114#action_12972114
 ] 

Hoss Man commented on SOLR-2288:


bq. just the implementation is different.

fair enough -- I meant I was trying to avoid changes to either the APIs or the 
internals, just focusing on the quick wins that were easy to review at a glance 
and shouldn't affect the bytecode (Collection<Object> instead of Collection, 
etc...)

I don't expect that *all* compiler warnings can be dealt with using trivial 
patches, but that's what I was trying to focus on in this issue.

Changes to the internals of specific classes seem like they should be tracked 
in distinct issues with more visibility.




[jira] Commented: (SOLR-2288) clean up compiler warnings

2010-12-16 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972116#action_12972116
 ] 

Ryan McKinley commented on SOLR-2288:
-

For compiler warnings... without changing the API, can we just use 
NamedList<?> rather than binding it explicitly to Object?
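
E.g. (sketch):

{code}
// accepts NamedList<String>, NamedList<Integer>, etc. without warnings;
// reads come back as Object, so callers see no API change
void dump(NamedList<?> args) {
  for (int i = 0; i < args.size(); i++) {
    System.out.println(args.getName(i) + " = " + args.getVal(i));
  }
}
{code}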




[jira] Commented: (LUCENE-2723) Speed up Lucene's low level bulk postings read API

2010-12-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972119#action_12972119
 ] 

Yonik Seeley commented on LUCENE-2723:
--

I tested the optimized index with Mike's latest patches (since that's per 
segment on both branch and trunk).  Things are much more in line now, with 
the branch being anywhere from 2.3% to 5.4% slower, depending on the exact 
field tested.

 Speed up Lucene's low level bulk postings read API
 --

 Key: LUCENE-2723
 URL: https://issues.apache.org/jira/browse/LUCENE-2723
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2723-termscorer.patch, 
 LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, 
 LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, 
 LUCENE-2723.patch, LUCENE-2723_termscorer.patch


 Spinoff from LUCENE-1410.
 The flex DocsEnum has a simple bulk-read API that reads the next chunk
 of docs/freqs.  But it's a poor fit for intblock codecs like FOR/PFOR
 (from LUCENE-1410).  This is not unlike sucking coffee through those
 tiny plastic coffee stirrers they hand out airplanes that,
 surprisingly, also happen to function as a straw.
 As a result we see no perf gain from using FOR/PFOR.
 I had hacked up a fix for this, described at in my blog post at
 http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
 I'm opening this issue to get that work to a committable point.
 So... I've worked out a new bulk-read API to address performance
 bottleneck.  It has some big changes over the current bulk-read API:
   * You can now also bulk-read positions (but not payloads), but, I
  have yet to cutover positional queries.
   * The buffer contains doc deltas, not absolute values, for docIDs
 and positions (freqs are absolute).
   * Deleted docs are not filtered out.
   * The doc  freq buffers need not be aligned.  For fixed intblock
 codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16,
 Group varint, etc.) they won't be.
 It's still a work in progress...
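
A toy illustration of the delta convention above (not the patch's actual API):

{code}
// docIDs arrive as deltas and must be accumulated; freqs are absolute.
// Note: deleted docs are NOT pre-filtered, so the consumer must check them.
int doc = 0;
for (int i = 0; i < count; i++) {
  doc += docDeltas[i];
  int freq = freqs[i];
  // ... score(doc, freq), skipping doc if it is deleted
}
{code}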




[jira] Commented: (SOLR-2288) clean up compiler warnings

2010-12-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972121#action_12972121
 ] 

Robert Muir commented on SOLR-2288:
---

Separately, i just want to say the following about NamedList:

All uses of this API should really be reviewed. I'm quite aware that it warns 
you about the fact that it's slow for certain operations,
but in my opinion these slow operations, such as get(String, int), should be 
deprecated and removed.

Any users that are using NamedList in this way, especially in loops, are very 
likely using the wrong data structure.
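
The problem pattern, concretely (get(String) is a linear scan of the list):

{code}
// O(n^2): each get() walks the NamedList from the start
for (String name : names) {
  Object v = response.get(name);
}

// better: copy once into a Map, then do O(1) lookups
Map<String,Object> byName = new HashMap<String,Object>();
for (int i = 0; i < response.size(); i++) {
  byName.put(response.getName(i), response.getVal(i));
}
{code}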





[jira] Updated: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass

2010-12-16 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2694:


Attachment: LUCENE-2694.patch

Attaching current state - all tests pass for me and luceneutils brings 
consistent results with trunk.

{code}
                            Query   QPS trunk   QPS termstate   Pct diff
                         unit~2.0       14.70           14.39      -2.1%
                       united~2.0        6.91            6.83      -1.1%
                       united~1.0        7.42            7.38      -0.6%
                       unit state       12.31           12.37       0.5%
                         unit~1.0       15.41           15.49       0.5%
                             uni*        7.18            7.22       0.6%
                             un*d        7.97            8.04       0.9%
                            unit*       12.89           13.09       1.6%
                     +unit +state       28.16           28.64       1.7%
                 +nebraska +state       81.26           82.67       1.7%
spanNear([unit, state], 10, true)       11.60           11.83       2.0%
                            state       40.50           41.47       2.4%
               spanFirst(unit, 5)       47.65           48.84       2.5%
                     "unit state"       17.72           18.19       2.7%
                              u*d        4.27            4.48       5.0%
{code}
Those are the results I have for now. Fuzzy only expands to 50 terms, so that 
might not be very meaningful. I re-added the TermCache for this patch though... 
Will attach more info tomorrow.

 MTQ rewrite + weight/scorer init should be single pass
 --

 Key: LUCENE-2694
 URL: https://issues.apache.org/jira/browse/LUCENE-2694
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch


 Spinoff of LUCENE-2690 (see the hacked patch on that issue)...
 Once we fix MTQ rewrite to be per-segment, we should take it further and make 
 weight/scorer init also run in the same single pass as rewrite.




[jira] Commented: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass

2010-12-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972127#action_12972127
 ] 

Robert Muir commented on LUCENE-2694:
-

We shouldn't lose the clone() optimization in StandardPostingsReader... 
the class is final, so it should use 'copy' instead of calling super.clone().
This is important for -client.
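
I.e. the pattern should look something like this (illustrative class, not the 
actual code):

{code}
// final class: construct the copy directly instead of going through
// Object.clone(), which is comparatively slow under -client
final class ExampleTermState implements Cloneable {
  long freqStart;
  long proxStart;

  @Override
  public ExampleTermState clone() {
    ExampleTermState copy = new ExampleTermState();
    copy.freqStart = freqStart;
    copy.proxStart = proxStart;
    return copy;
  }
}
{code}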





Example documents and geospatial

2010-12-16 Thread Erick Erickson
The example docs we distribute have a bunch of stores that have the exact
same location. That led to some head-scratching about why changing the
distance in the example queries seemed to make no difference in the number
of returned results, until all of a sudden it reduced the number of hits
drastically.

Any objections to a patch that adds an arbitrary distance (say 1/4 mile or
so) to all of the stores in the example docs that have the same location?
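
Something as simple as this would do it (sketch; one degree of latitude is
roughly 69 miles):

    double quarterMile = 0.25 / 69.0; // ~0.0036 degrees of latitude
    // give the i-th co-located store its own small offset
    double lat = baseLat + i * quarterMile;
    double lon = baseLon + i * quarterMile;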

If not, I'll put up a JIRA and attach a patch.

Erick


[jira] Commented: (SOLR-2259) Improve analyzer/version handling in Solr

2010-12-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972150#action_12972150
 ] 

Robert Muir commented on SOLR-2259:
---

I committed part 2 in revision 1050064.




[jira] Updated: (LUCENE-2815) MultiFields not thread safe

2010-12-16 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2815:
---

Fix Version/s: 4.0




[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972173#action_12972173
 ] 

Michael McCandless commented on LUCENE-2814:


OK I dug here... the reason TestNRTThreads fails is that you moved the 
numDocsInRAM++ out of DW.getThreadState into WaitQueue.writeDocument.

When we buffer del terms in DW.deleteTerm/Terms/Query/Queries, we grab the 
current numDocsInRAM as the docID upto, recording, for when it comes time to 
apply the delete, which docID we must stop at.

But with your change, this value is now an undercount, since numDocsInRAM is 
now acting like numDocsInStore.

One way to fix this would be to change the delete methods to use nextDocID 
instead of numDocsInRAM?

But I think I'd prefer to put back numDocsInRAM++ in getThreadState...




[jira] Commented: (LUCENE-2611) IntelliJ IDEA setup

2010-12-16 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972176#action_12972176
 ] 

David Smiley commented on LUCENE-2611:
--

It turns out that IntelliJ was rewriting my $MODULE_DIR$/../../   paths to 
paths relative to a path variable I defined on my system, and that is intended 
behavior according to JetBrains.  I removed the path variable... I can live 
without it after all, and that problem doesn't exist anymore.  

 IntelliJ IDEA setup
 ---

 Key: LUCENE-2611
 URL: https://issues.apache.org/jira/browse/LUCENE-2611
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611_mkdir.patch, LUCENE-2611_test.patch, 
 LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, 
 LUCENE-2611_test_2.patch


 Setting up Lucene/Solr in IntelliJ IDEA can be time-consuming.
 The attached patch adds a new top level directory {{dev-tools/}} with sub-dir 
 {{idea/}} containing basic setup files for trunk, as well as a top-level ant 
 target named {{idea}} that copies these files into the proper locations.  This 
 arrangement avoids the messiness attendant to in-place project configuration 
 files directly checked into source control.
 The IDEA configuration includes modules for Lucene and Solr, each Lucene and 
 Solr contrib, and each analysis module.  A JUnit test run per module is 
 included.
 Once {{ant idea}} has been run, the only configuration that must be performed 
 manually is configuring the project-level JDK.
 If this patch is committed, Subversion svn:ignore properties should be 
 added/modified to ignore the destination module files (*.iml) in each 
 module's directory.
 Iam Jambour has written up on the Lucene wiki a detailed set of instructions 
 for applying the 3.X branch patch: 
 http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Example documents and geospatial

2010-12-16 Thread Grant Ingersoll
I don't think they are all the same, at least not in trunk.  I believe there 
are a few near San Fran, some near Buffalo, MN (my hometown ;-) ), and some in 
Oklahoma.  You can see this when you hit the /browse URL.


On Dec 16, 2010, at 11:32 AM, Erick Erickson wrote:

 The example docs we distribute have a bunch of stores that have the exact 
 same location. That led to some head scratching about why changing the 
 distance in the example queries seemed to make no difference in the number of 
 returned results, until all of a sudden it reduced the number of hits 
 drastically.
 
 Any objections to a patch that adds an arbitrary distance (say 1/4 mile or 
 so) to all of the stores in the example docs that have the same location?
 
 If not, I'll put up a JIRA and attach a patch.
 
 Erick



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Example documents and geospatial

2010-12-16 Thread Yonik Seeley
On Thu, Dec 16, 2010 at 11:32 AM, Erick Erickson
erickerick...@gmail.com wrote:
 The example docs we distribute have a bunch of stores that have the exact
 same location. That led to some head scratching about why changing the
 distance in the example queries seemed to make no difference in the number
 of returned results, until all of a sudden it reduced the number of hits
 drastically.
 Any objections to a patch that adds an arbitrary distance (say 1/4 mile or
 so) to all of the stores in the example docs that have the same location?
 If not, I'll put up a JIRA and attach a patch.
 Erick

+1
Try not to put 'em in a lake or something though ;-)

-Yonik
http://www.lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1410) PFOR implementation

2010-12-16 Thread hao yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hao yan updated LUCENE-1410:


Attachment: LUCENE-1410.patch

This patch adds codec support for the PForDelta compression algorithm.


Changes by Hao Yan (hyan2...@gmail.com)

In summary, I added five files to support and test the codec.

In Src,
1.  org.apache.lucene.index.codecs.pfordelta.PForDelta.java
2.  org.apache.lucene.index.codecs.pfordelta.Simple16.java
3.  org.apache.lucene.index.codecs.PForDeltaFixedBlockCodec.java
4.  
org.apache.lucene.index.codecs.intblock.FixedIntBlockIndexOutputWithGetElementNum.java

In Test,
5.  
org.apache.lucene.index.codecs.intblock.TestPForDeltaFixedIntBlockCodec.java

1)  In particular, the first class, PForDelta, is the core implementation
of the PForDelta algorithm; it compresses exceptions using Simple16,
which is implemented in the second class, Simple16.
2)  The third class, PForDeltaFixedBlockCodec, is similar to
org.apache.lucene.index.codecs.mockintblock.MockFixedIntBlockCodec in
Test, except that it uses PForDelta to encode the data in the buffer.
3)  The fourth class is almost the same as
org.apache.lucene.index.codecs.intblock.FixedIntBlockIndexOutput,
except that it provides an additional public method to retrieve the
value of the upto field, which is a private field in
FixedIntBlockIndexOutput. I added this method because the number of
elements in a block that hold meaningful values is not always equal to
the blockSize or the buffer size: the last block/buffer of a stream of
data usually contains fewer values. In that case, I fill all elements after
the meaningful ones with 0s, so we always compress one entire block.

4)  The last class is the unit test for PForDeltaFixedIntBlockCodec,
which is very similar to
org.apache.lucene.index.codecs.intblock.TestIntBlockCodec.

I also changed the LuceneTestCase class to add the new
PForDeltaFixedIntBlockCodec.

The unit tests and all Lucene tests pass.
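
To make the padding concrete, here is a minimal, self-contained sketch of the
fixed-block encode path. BlockCompressor/compressAll are made-up names for
illustration, not this patch's API, and the stand-in compressor just copies
its input where the real codec would apply PForDelta with Simple16-coded
exceptions:

import java.util.Arrays;

public class FixedBlockPaddingSketch {

  // Stand-in for the real per-block compressor.
  interface BlockCompressor {
    int[] compress(int[] block);  // always receives exactly blockSize ints
  }

  static int[][] compressAll(int[] data, int blockSize, BlockCompressor compressor) {
    int numBlocks = (data.length + blockSize - 1) / blockSize;
    int[][] compressed = new int[numBlocks][];
    for (int b = 0; b < numBlocks; b++) {
      int[] block = new int[blockSize];              // zero-filled by default
      int start = b * blockSize;
      int len = Math.min(blockSize, data.length - start);
      System.arraycopy(data, start, block, 0, len);  // trailing 0s pad the last block
      compressed[b] = compressor.compress(block);
    }
    return compressed;
  }

  public static void main(String[] args) {
    int[] values = {3, 1, 4, 1, 5, 9, 2, 6, 5};      // 9 values, blockSize 4: last block padded
    int[][] blocks = compressAll(values, 4, new BlockCompressor() {
      public int[] compress(int[] block) {
        return block.clone();                        // identity stand-in
      }
    });
    System.out.println(Arrays.deepToString(blocks));
  }
}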


 PFOR implementation
 ---

 Key: LUCENE-1410
 URL: https://issues.apache.org/jira/browse/LUCENE-1410
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Reporter: Paul Elschot
Priority: Minor
 Fix For: Bulk Postings branch

 Attachments: autogen.tgz, for-summary.txt, 
 LUCENE-1410-codecs.tar.bz2, LUCENE-1410.patch, LUCENE-1410.patch, 
 LUCENE-1410.patch, LUCENE-1410b.patch, LUCENE-1410c.patch, 
 LUCENE-1410d.patch, LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java, 
 TestPFor2.java, TestPFor2.java

   Original Estimate: 21840h
  Remaining Estimate: 21840h

 Implementation of Patched Frame of Reference.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-16 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972227#action_12972227
 ] 

Michael Busch commented on LUCENE-2814:
---

The shared doc stores are actually already completely removed in the realtime 
branch (part of LUCENE-2324).

Does someone want to help with the merge? Then we could land the realtime branch 
(which is pretty much only DWPT and removing doc stores) in trunk sometime soon.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2815) MultiFields not thread safe

2010-12-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972258#action_12972258
 ] 

Yonik Seeley commented on LUCENE-2815:
--

bq. It looks like MultiReaderBits also has issues with safe object publication. 

Actually, it looks like this one is OK with most of our current code.
SegmentReader.getDeletedDocs() returns an object stored in a volatile, so that 
counts as a safe publish.  Other implementations seem to either throw an 
exception or directly call a segment reader.  One exception is 
InstantiatedIndex (I think).

We can't call getDeletedDocs() just once up-front, because an IndexReader may 
still be used to delete documents.
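
As a sketch of the safe-publication pattern in question (illustrative names,
not the actual SegmentReader source): the bits object is replaced rather than
mutated, and the swap happens through a volatile field, so any thread reading
the field is guaranteed to see a fully constructed object:

public class SafePublishSketch {
  interface Bits { boolean get(int index); }

  private volatile Bits deletedDocs;  // volatile write/read = safe publication

  public Bits getDeletedDocs() {
    return deletedDocs;  // re-read per use, since deletes may still be applied
  }

  // Deletes are serialized by the writer's own locking elsewhere; only the
  // publish itself relies on the volatile.
  void deleteDocument(final int docID) {
    final Bits old = deletedDocs;
    deletedDocs = new Bits() {  // replace the object rather than mutating it
      public boolean get(int index) {
        return index == docID || (old != null && old.get(index));
      }
    };
  }
}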

 MultiFields not thread safe
 ---

 Key: LUCENE-2815
 URL: https://issues.apache.org/jira/browse/LUCENE-2815
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Yonik Seeley
 Fix For: 4.0


 MultiFields looks like it has thread safety issues

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-16 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972275#action_12972275
 ] 

Michael Busch commented on LUCENE-2814:
---

Well I need to merge with the recent changes in trunk (especially LUCENE-2680).
The merge is pretty hard, but I'm planning to spend most of my weekend on it. 

If I can get most tests to pass again (most were passing before the merge), 
then I think LUCENE-2573 is the only outstanding thing before we can land it 
in trunk.




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2815) MultiFields not thread safe

2010-12-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972281#action_12972281
 ] 

Yonik Seeley commented on LUCENE-2815:
--

I was going to fix InstantiatedIndex, but while I was in there, I saw a lot of 
non-threadsafe code.  I think that really deserves its own issue.
What range of docs is InstantiatedIndex faster for, and is it something we want 
to continue to maintain?


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972286#action_12972286
 ] 

Michael McCandless commented on LUCENE-2814:


I think taking things one step at a time would be good here?

I.e., remove doc stores from trunk, let that bake on trunk for a while, then 
merge to RT?  So that what then remains on RT is DWPT / tiered flushing?  
Otherwise RT is a monster change?


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Example documents and geospatial

2010-12-16 Thread Erick Erickson
Grant:

Yep, there were fewer than I remembered, but still half a dozen or so...
though I remember way more than that (somehow 16 comes to mind), so obviously
some gnome has been in there already. Not all of the ones I remember were
identical, but enough were that it was puzzling.

Yonik:
Lake? Why should I care about a lake? Actually, I never even thought about
it; glad you pointed it out.

OK, I'll put up a patch today or tomorrow, along the lines of the sketch
below. Anybody want to apply the patch for 2275 (whitespace in the mm
parameter causes a parse exception)?
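
Something like this, roughly (a hypothetical helper, not from any existing
patch; one degree of latitude is ~69 miles, so a quarter mile is about 0.0036
degrees, and longitude degrees shrink with latitude):

public class StoreJitterSketch {
  // ~0.25 miles expressed in degrees of latitude (1 degree ~= 69 miles)
  private static final double OFFSET_DEG = 0.25 / 69.0;

  // Returns a "lat,lon" string for the i-th duplicate store, nudged about a
  // quarter mile around a small circle so radius queries can tell them apart.
  static String jitter(double lat, double lon, int i) {
    double angle = i;  // i radians: any spread of angles works for a handful of docs
    double newLat = lat + OFFSET_DEG * Math.cos(angle);
    double newLon = lon + OFFSET_DEG * Math.sin(angle) / Math.cos(Math.toRadians(lat));
    return String.format("%.6f,%.6f", newLat, newLon);
  }

  public static void main(String[] args) {
    // e.g. four stores that currently share one (made-up) location:
    for (int i = 0; i < 4; i++) {
      System.out.println(jitter(45.17614, -93.87341, i));
    }
  }
}

The offsets are small enough to eyeball on a map first, lakes and all.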

Erick


On Thu, Dec 16, 2010 at 4:37 PM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Thu, Dec 16, 2010 at 11:32 AM, Erick Erickson
 erickerick...@gmail.com wrote:
  The example docs we distribute have a bunch of stores that have the exact
  same location. That led to some head scratching about why changing the
  distance in the example queries seemed to make no difference in the number
  of returned results, until all of a sudden it reduced the number of hits
  drastically.
  Any objections to a patch that adds an arbitrary distance (say 1/4 mile or
  so) to all of the stores in the example docs that have the same location?
  If not, I'll put up a JIRA and attach a patch.
  Erick

 +1
 Try not to put 'em in a lake or something though ;-)

 -Yonik
 http://www.lucidimagination.com





[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-16 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972288#action_12972288
 ] 

Michael Busch commented on LUCENE-2814:
---

bq. I think taking things one step at a time would be good here?

Probably still a smaller change than flex indexing ;)

But yeah, in general I agree that we should do things more incrementally.  I 
think that's a mistake I've made with the RT branch so far.  In this particular 
case it's just a bit sad to redo all this work now, because I think I got the 
removal of doc stores right in RT, with all related tests passing.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-16 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972298#action_12972298
 ] 

Earwin Burrfoot commented on LUCENE-2814:
-

So, what's the plan?


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] [Take 3] Release PyLucene 2.9.4-1 and 3.0.3-1

2010-12-16 Thread Andi Vajda


On Sun, 12 Dec 2010, Andi Vajda wrote:


A patch that improves the finding of jni.h on Mac OS X was integrated.
It made it worth blocking this release and preparing new release artifacts.
No one voted on the [Take 2] artifacts and I hope this is not inconveniencing 
anyone.


I also hope that this is it for PyLucene 2.9.4/3.0.3 :-)

So please vote to release the artifacts available from
  http://people.apache.org/~vajda/staging_area/
as PyLucene 2.9.4 and PyLucene 3.0.3.

Here is my +1


This vote has now passed.
Thank you to all who voted !

The releases should be announced shortly.

Andi..


[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-16 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972302#action_12972302
 ] 

Michael Busch commented on LUCENE-2814:
---

bq. So, what's the plan?

I can't really work on this much before Saturday.  But during the weekend I can 
work on the RT merge and maybe try to pull out the docstore removal changes and 
create a separate patch.  Have to see how hard that is.  If it's not too 
difficult I'll post a separate patch, otherwise I'll commit the merge to RT and 
maybe convince you guys to help a bit with getting the RT branch ready for 
landing in trunk? :)



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-16 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972316#action_12972316
 ] 

Earwin Burrfoot commented on LUCENE-2814:
-

Instead of you pulling out the docstore removal, I can finish that patch. But 
then merging's gonna be an even greater bitch. Probably. But maybe not.
Do you do IRC? It can be faster to discuss in realtime, and you could also tell 
me what help you need with the branch.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2815) MultiFields not thread safe

2010-12-16 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated LUCENE-2815:
-

Attachment: LUCENE-2815.patch

Here's a patch that uses a ConcurrentHashMap for the Terms cache, and makes 
IndexReader.fields volatile.

That IndexReader.fields variable is exactly the type of thing that could be 
stored in a generic cache on the IndexReader, if/when we get something like 
that.
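
In outline the approach looks like this (a simplified sketch with placeholder
types and names, not the actual MultiFields code): cache lookups race
benignly, putIfAbsent picks a single winner, and double-checked locking with
the volatile handles the lazily built fields instance:

import java.util.concurrent.ConcurrentHashMap;

public class MultiFieldsSketch {
  // thread-safe cache: no external synchronization needed for get/putIfAbsent
  private final ConcurrentHashMap<String, Object> termsCache =
      new ConcurrentHashMap<String, Object>();

  private volatile Object fields;  // lazily built, safely published via volatile

  Object terms(String field) {
    Object t = termsCache.get(field);
    if (t == null) {
      t = buildTerms(field);                           // two threads may both build...
      Object prev = termsCache.putIfAbsent(field, t);  // ...but only one value is kept
      if (prev != null) {
        t = prev;
      }
    }
    return t;
  }

  Object fields() {
    Object f = fields;
    if (f == null) {
      synchronized (this) {      // build at most once
        if (fields == null) {
          fields = buildFields();
        }
        f = fields;
      }
    }
    return f;
  }

  private Object buildTerms(String field) {
    return new Object();  // stand-in for constructing the merged Terms
  }

  private Object buildFields() {
    return new Object();  // stand-in for constructing the merged Fields
  }
}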


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-16 Thread Earwin Burrfoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Earwin Burrfoot updated LUCENE-2814:


Attachment: LUCENE-2814.patch

Patch updated to trunk, no nocommits, no *.closeDocStore(), tests pass.

SegmentWriteState vs DocumentsWriter bothers me: we track flushed files in 
both, and we inconsistently get the current segment from both of them.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-trunk - Build # 1397 - Failure

2010-12-16 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1397/

All tests passed

Build Log (for compile errors):
[...truncated 18399 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org