[jira] Resolved: (LUCENE-486) Core Test should not have dependencies on the Demo code
[ https://issues.apache.org/jira/browse/LUCENE-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch resolved LUCENE-486.
----------------------------------
    Resolution: Fixed

Committed revision 822139.

> Core Test should not have dependencies on the Demo code
> --------------------------------------------------------
>
>                 Key: LUCENE-486
>                 URL: https://issues.apache.org/jira/browse/LUCENE-486
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: Build
>    Affects Versions: 1.4
>            Reporter: Grant Ingersoll
>            Assignee: Michael Busch
>            Priority: Trivial
>             Fix For: 3.0
>         Attachments: FileDocument.java, lucene-486.patch, testdoc.patch
>
> The TestDoc.java test file has a dependency on the Demo FileDocument code. Some of us don't keep the Demo code around after downloading, so this breaks the build. Patch will be along shortly.
[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762573#action_12762573 ]

Michael McCandless commented on LUCENE-1458:
--------------------------------------------

Whoa, thanks for the sudden sprint Mark!

bq. Come on old man, stop clinging to emacs

Hey! I'm not so old :) But yeah, I still cling to emacs. Hey, I know people who still cling to vi!

{quote}
I didn't really look at the code, but some stuff I noticed:

java 6 in pfor (Arrays.copy)

skiplist stuff in codecs still have package of index - not sure what is going on there - changed them

in IndexWriter:
+ // Mark: read twice?
segmentInfos.read(directory);
+ segmentInfos.read(directory, codecs);
{quote}

Excellent catches! All of these are not right.

bq. (since you don't include contrib in the tar)

Gak, sorry. I have a bunch of mods there, cutting over to the flex API.

bq. You left getEnum(IndexReader reader) in the MultiTerm queries, but not in PrefixQuery - just checkin'.

Woops, for back-compat I think we need to leave it in (it's a protected method), deprecated. I'll put it back if you haven't.

bq. I guess TestBackwardsCompatibility.java has been removed from trunk or something? kept it here for now.

Eek, it shouldn't be -- indeed it is. When did that happen? We should fix this (separately from this issue!).

Do you have more fixes coming? If so, I'll let you sprint some more; else, I'll merge in, add the contrib back-compat branch, and post a new patch! Thanks :)

> Further steps towards flexible indexing
> ----------------------------------------
>
>                 Key: LUCENE-1458
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1458
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.9
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1458-back-compat.patch, LUCENE-1458-back-compat.patch, LUCENE-1458-back-compat.patch, LUCENE-1458-back-compat.patch, LUCENE-1458-back-compat.patch, LUCENE-1458-back-compat.patch, LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2
>
> I attached a very rough checkpoint of my current patch, to get early feedback. All tests pass, though back-compat tests don't pass due to changes to package-private APIs plus certain bugs in tests that happened to work (eg call TermPositions.nextPosition() too many times, which the new API asserts against).
>
> [Aside: I think, when we commit changes to package-private APIs such that back-compat tests don't pass, we could go back, make a branch on the back-compat tag, commit changes to the tests to use the new package-private APIs on that branch, then fix the nightly build to use the tip of that branch?]
>
> There's still plenty to do before this is committable! This is a rather large change:
>
> * Switches to a new, more efficient terms dict format. This still uses tii/tis files, but the tii only stores term & long offset (not a TermInfo). At seek points, tis encodes term & freq/prox offsets absolutely instead of with deltas delta'd against the last term. Also, tis/tii are structured by field, so we don't have to record the field number in every term.
>   - On the first 1 M docs of Wikipedia, the tii file is 36% smaller (0.99 MB -> 0.64 MB) and the tis file is 9% smaller (75.5 MB -> 68.5 MB).
>   - RAM usage when loading the terms dict index is significantly less, since we only load an array of offsets and an array of String (no more TermInfo array). It should be faster to init too.
>   - This part is basically done.
> * Introduces a modular reader codec that strongly decouples the terms dict from the docs/positions readers. EG there is no more TermInfo used when reading the new format.
>   - There's nice symmetry now between reading & writing in the codec chain -- the current docs/prox format is captured in:
> {code}
> FormatPostingsTermsDictWriter/Reader
> FormatPostingsDocsWriter/Reader (.frq file)
> FormatPostingsPositionsWriter/Reader (.prx file)
> {code}
>   - This part is basically done.
> * Introduces a new flex API for iterating through the fields, terms, docs and positions:
> {code}
> FieldProducer -> TermsEnum -> DocsEnum -> PostingsEnum
> {code}
>   This replaces TermEnum/Docs/Positions. SegmentReader emulates the old API on top of the new API to keep back-compat.
>
> Next steps:
>
> * Plug in new codecs (pulsing, pfor) to exercise the modularity / fix any hidden assumptions.
> * Expose new API out of IndexReader, deprecate old API but emulate old API on top of new one, switch all core/contrib users to the new API.
> * Maybe switch to AttributeSources as the base class for TermsEnum, DocsEnum, PostingsEnum -- this would give readers API flexibility (not just index-file-format flexibility). EG if someone wanted to store payload at the term-doc level instead of term-doc-position level, you could just add a new attribute.
> * Test performance & iterate.
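[Editor's note: for orientation, the chain above replaces the old TermEnum/TermDocs/TermPositions walk level-for-level. A minimal, self-contained sketch of that old (pre-flex, still-current in 2.9) iteration -- this is not part of the patch, just the stable API it emulates:]

{code}
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.index.TermPositions;
import org.apache.lucene.store.Directory;

// Walk every term, matching doc, and position with the old API that the
// new FieldProducer -> TermsEnum -> DocsEnum -> PostingsEnum chain replaces.
void walkOldApi(Directory dir) throws IOException {
  IndexReader reader = IndexReader.open(dir, true); // read-only reader
  try {
    TermEnum terms = reader.terms();                // all terms, all fields
    while (terms.next()) {
      Term term = terms.term();
      TermPositions tp = reader.termPositions(term);
      while (tp.next()) {                           // each doc containing the term
        int doc = tp.doc();
        for (int i = 0; i < tp.freq(); i++) {
          int position = tp.nextPosition();         // each position within the doc
        }
      }
      tp.close();
    }
    terms.close();
  } finally {
    reader.close();
  }
}
{code}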
[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762576#action_12762576 ]

Michael McCandless commented on LUCENE-1458:
--------------------------------------------

bq. One of the common statistics one needs to implement some more advanced scoring approaches is the average document length. Is this patch far enough along that I could take a look at it and think about how one might do this?

Well, thinking through how you'd do this... likely you'd want to store the avg length (in tokens), eg as a single float per field per segment, right? The natural place to store this would be in the FieldInfos, I think. Unfortunately, this patch doesn't yet add extensibility to FieldInfos.

And you'd need a small customization to the indexing chain to compute this when indexing new docs, which is already doable today (though, package-private). But then on merging segments, you'd need an extension point, which we don't have today, to recompute the avg. Hmm: how would you handle deleted docs? Would you want to go back to the field length for every doc & recompute the average? (Which'd mean you'd need to keep per-doc, per-field lengths, not just the averages.)

Unfortunately, this patch doesn't yet address things like customizing what's stored in FieldInfo or SegmentInfo, nor customizing what happens during merging (though it takes us a big step closer to this). I think we need both of these to finish flexible indexing, but I'm thinking at this point that these should really be tackled in follow-on issue(s). This issue is already ridiculously massive.
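[Editor's note: to make the bookkeeping concrete, a rough sketch of the per-segment stats described above. This is not a Lucene API -- the class and method names here are hypothetical -- it only shows the single float per field per segment and the weighted re-average a merge would need (ignoring the open question of deleted docs):]

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: accumulates token counts per
// field while indexing one segment.
public class FieldLengthStats {
  // field name -> {sum of token counts, number of docs seen}
  private final Map<String, long[]> perField = new HashMap<String, long[]>();

  /** Call once per (document, field) with that field's token count. */
  public void addDoc(String field, int numTokens) {
    long[] stats = perField.get(field);
    if (stats == null) {
      stats = new long[2];
      perField.put(field, stats);
    }
    stats[0] += numTokens;
    stats[1]++;
  }

  /** The single float per field per segment that could live in FieldInfos. */
  public float averageLength(String field) {
    long[] stats = perField.get(field);
    return (stats == null || stats[1] == 0) ? 0f : (float) stats[0] / stats[1];
  }

  /** Merging two segments: a weighted mean of their stored averages. */
  public static float mergedAverage(float avgA, long docsA, float avgB, long docsB) {
    return (float) ((avgA * (double) docsA + avgB * (double) docsB) / (docsA + docsB));
  }
}
{code}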
Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing
On Tue, Oct 6, 2009 at 5:54 AM, Michael McCandless (JIRA) j...@apache.org wrote:

> bq. I guess TestBackwardsCompatibility.java has been removed from trunk or something? kept it here for now.
>
> Eek, it shouldn't be -- indeed it is. When did that happen? We should fix this (separately from this issue!).

I'm working on restoring TestBackCompat on trunk...

Mike
RE: svn commit: r822203 - /lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java
Sorry, I think this was one test too much to remove :-)

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: mikemcc...@apache.org [mailto:mikemcc...@apache.org]
Sent: Tuesday, October 06, 2009 12:33 PM
To: java-comm...@lucene.apache.org
Subject: svn commit: r822203 - /lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java

Author: mikemccand
Date: Tue Oct 6 10:32:43 2009
New Revision: 822203

URL: http://svn.apache.org/viewvc?rev=822203&view=rev
Log: restore TestBackwardsCompatibility

Added: lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java (with props)

Added: lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java?rev=822203&view=auto
==============================================================================
--- lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java (added)
+++ lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java Tue Oct 6 10:32:43 2009
@@ -0,0 +1,530 @@
+package org.apache.lucene.index;
+
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import java.io.BufferedOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.Arrays;
+import java.util.Enumeration;
+import java.util.List;
+import java.util.zip.ZipEntry;
+import java.util.zip.ZipFile;
+
+import org.apache.lucene.analysis.WhitespaceAnalyzer;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.Field;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.ScoreDoc;
+import org.apache.lucene.search.TermQuery;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.FSDirectory;
+import org.apache.lucene.util.LuceneTestCase;
+import org.apache.lucene.util._TestUtil;
+
+/*
+  Verify we can read the pre-2.1 file format, do searches
+  against it, and add documents to it.
+*/
+
+public class TestBackwardsCompatibility extends LuceneTestCase
+{
+
+  // Uncomment these cases & run them on an older Lucene
+  // version, to generate an index to test backwards
+  // compatibility. Then, cd to build/test/index.cfs and
+  // run "zip index.VERSION.cfs.zip *"; cd to
+  // build/test/index.nocfs and run "zip
+  // index.VERSION.nocfs.zip *". Then move those 2 zip
+  // files to your trunk checkout and add them to the
+  // oldNames array.
+
+  /*
+  public void testCreatePreLocklessCFS() throws IOException {
+    createIndex("index.cfs", true);
+  }
+
+  public void testCreatePreLocklessNoCFS() throws IOException {
+    createIndex("index.nocfs", false);
+  }
+  */
+
+  /* Unzips dirName + ".zip" --> dirName, removing dirName
+     first */
+  public void unzip(String zipName, String destDirName) throws IOException {
+
+    Enumeration entries;
+    ZipFile zipFile;
+    zipFile = new ZipFile(zipName + ".zip");
+
+    entries = zipFile.entries();
+
+    String dirName = fullDir(destDirName);
+
+    File fileDir = new File(dirName);
+    rmDir(destDirName);
+
+    fileDir.mkdir();
+
+    while (entries.hasMoreElements()) {
+      ZipEntry entry = (ZipEntry) entries.nextElement();
+
+      InputStream in = zipFile.getInputStream(entry);
+      OutputStream out = new BufferedOutputStream(new FileOutputStream(new File(fileDir, entry.getName())));
+
+      byte[] buffer = new byte[8192];
+      int len;
+      while ((len = in.read(buffer)) >= 0) {
+        out.write(buffer, 0, len);
+      }
+
+      in.close();
+      out.close();
+    }
+
+    zipFile.close();
+  }
+
+  public void testCreateCFS() throws IOException {
+    String dirName = "testindex.cfs";
+    createIndex(dirName, true);
+    rmDir(dirName);
+  }
+
+  public void testCreateNoCFS() throws IOException {
+    String dirName = "testindex.nocfs";
Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing
Merge away - still sleeping over here. Would love to look more again but don't know when, so no use waiting on me.

- Mark

http://www.lucidimagination.com (mobile)
[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762590#action_12762590 ]

Uwe Schindler commented on LUCENE-1458:
---------------------------------------

{quote}
bq. I guess TestBackwardsCompatibility.java has been removed from trunk or something? kept it here for now.

Eek, it shouldn't be - indeed it is. When did that happen? We should fix this (separately from this issue!).
{quote}

My fault, I removed it during the "remove backwards tests" on Saturday. If we do not remove DateTools/DateField for 3.0 (we may need to leave them in for index compatibility), I will restore these tests, too. It's easy with TortoiseSVN, and you can also preserve the history (using the svn:mergeinfo prop). I have this on my list when going forward with removing the old TokenStream API.
[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762592#action_12762592 ]

Michael McCandless commented on LUCENE-1458:
--------------------------------------------

bq. It's easy with TortoiseSVN and you can also preserve the history (using svn:mergeinfo prop).

Ahh -- can you do this for TestBackwardsCompatibility? I restored it, but lost all history. Thanks.
[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762600#action_12762600 ]

Uwe Schindler commented on LUCENE-1458:
---------------------------------------

Done. I also did it for the BW branch, but didn't create a tag yet. The next tag creation, for the next bigger patch, is enough (no need to do it now). What I have done: svn copy from the older revision to the same path :-)
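[Editor's note: the restore-with-history trick amounts to an svn copy from the last revision that still contained the file back onto the same path. A hypothetical invocation -- the revision number here is illustrative, not the one actually used:]

{code}
# Copy the file as it existed at (illustrative) revision 822202 back to
# its own path; svn records the copy source, so "svn log" history survives.
svn copy -m "restore TestBackwardsCompatibility with history" \
  https://svn.apache.org/repos/asf/lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java@822202 \
  https://svn.apache.org/repos/asf/lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java
{code}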
[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762633#action_12762633 ]

Michael McCandless commented on LUCENE-1458:
--------------------------------------------

bq. What I have done: svn copy from the older revision to the same path

Excellent, thanks! It had a few problems (it was still trying to use deprecated APIs, some of which were gone) -- I just committed fixes.
RE: svn commit: r822284 - /lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java
Can you add this patch to backwards, too? I forgot that some of the backwards-changes also applied to BW; for completeness, not sure if a tag is also needed.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: mikemcc...@apache.org [mailto:mikemcc...@apache.org]
Sent: Tuesday, October 06, 2009 4:13 PM
To: java-comm...@lucene.apache.org
Subject: svn commit: r822284 - /lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java

Author: mikemccand
Date: Tue Oct 6 14:12:46 2009
New Revision: 822284

URL: http://svn.apache.org/viewvc?rev=822284&view=rev
Log: fix TestBackwardsCompatibility to not use deprecated APIs

Modified: lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java?rev=822284&r1=822283&r2=822284&view=diff
==============================================================================
--- lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java (original)
+++ lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java Tue Oct 6 14:12:46 2009
@@ -158,11 +158,7 @@
     for(int i=0;i<oldNames.length;i++) {
       String dirName = "src/test/org/apache/lucene/index/index." + oldNames[i];
       unzip(dirName, oldNames[i]);
-      changeIndexNoAdds(oldNames[i], true);
-      rmDir(oldNames[i]);
-
-      unzip(dirName, oldNames[i]);
-      changeIndexNoAdds(oldNames[i], false);
+      changeIndexNoAdds(oldNames[i]);
       rmDir(oldNames[i]);
     }
   }
@@ -171,11 +167,7 @@
     for(int i=0;i<oldNames.length;i++) {
       String dirName = "src/test/org/apache/lucene/index/index." + oldNames[i];
       unzip(dirName, oldNames[i]);
-      changeIndexWithAdds(oldNames[i], true);
-      rmDir(oldNames[i]);
-
-      unzip(dirName, oldNames[i]);
-      changeIndexWithAdds(oldNames[i], false);
+      changeIndexWithAdds(oldNames[i]);
       rmDir(oldNames[i]);
     }
   }
@@ -196,7 +188,7 @@
     dirName = fullDir(dirName);
     Directory dir = FSDirectory.open(new File(dirName));
-    IndexSearcher searcher = new IndexSearcher(dir);
+    IndexSearcher searcher = new IndexSearcher(dir, true);
     IndexReader reader = searcher.getIndexReader();
     _TestUtil.checkIndex(dir);
@@ -267,14 +259,14 @@
   /* Open pre-lockless index, add docs, do a delete &
    * setNorm, and search */
-  public void changeIndexWithAdds(String dirName, boolean autoCommit) throws IOException {
+  public void changeIndexWithAdds(String dirName) throws IOException {
     String origDirName = dirName;
     dirName = fullDir(dirName);
     Directory dir = FSDirectory.open(new File(dirName));
     // open writer
-    IndexWriter writer = new IndexWriter(dir, autoCommit, new WhitespaceAnalyzer(), false);
+    IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), false, IndexWriter.MaxFieldLength.UNLIMITED);
     // add 10 docs
     for(int i=0;i<10;i++) {
@@ -288,11 +280,11 @@
     } else {
       expected = 46;
     }
-    assertEquals("wrong doc count", expected, writer.docCount());
+    assertEquals("wrong doc count", expected, writer.maxDoc());
     writer.close();
     // make sure searching sees right # hits
-    IndexSearcher searcher = new IndexSearcher(dir);
+    IndexSearcher searcher = new IndexSearcher(dir, true);
     ScoreDoc[] hits = searcher.search(new TermQuery(new Term("content", "aaa")), null, 1000).scoreDocs;
     Document d = searcher.doc(hits[0].doc);
     assertEquals("wrong first document", "21", d.get("id"));
@@ -301,7 +293,7 @@
     // make sure we can do delete & setNorm against this
     // pre-lockless segment:
-    IndexReader reader = IndexReader.open(dir);
+    IndexReader reader = IndexReader.open(dir, false);
     Term searchTerm = new Term("id", "6");
     int delCount = reader.deleteDocuments(searchTerm);
     assertEquals("wrong delete count", 1, delCount);
@@ -309,7 +301,7 @@
     reader.close();
     // make sure they took:
-    searcher = new IndexSearcher(dir);
+    searcher = new IndexSearcher(dir, true);
     hits = searcher.search(new TermQuery(new Term("content", "aaa")), null, 1000).scoreDocs;
     assertEquals("wrong number of hits", 43, hits.length);
     d = searcher.doc(hits[0].doc);
@@ -318,11 +310,11 @@
     searcher.close();
     // optimize
-    writer = new IndexWriter(dir, autoCommit, new WhitespaceAnalyzer(), false);
+    writer = new IndexWriter(dir, new WhitespaceAnalyzer(), false, IndexWriter.MaxFieldLength.UNLIMITED);
     writer.optimize();
     writer.close();
-    searcher = new IndexSearcher(dir);
+    searcher = new IndexSearcher(dir, true);
Re: svn commit: r822284 - /lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java
OK will do.

Mike

On Tue, Oct 6, 2009 at 10:23 AM, Uwe Schindler u...@thetaphi.de wrote:
> Can you add this patch to backwards, too? I forgot that some of the backwards-changes also applied to BW; for completeness, not sure if a tag is also needed.
RE: svn commit: r822284 - /lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java
Thanks, sorry for extra work! I missed to do this after the svn copy :(

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
Re: svn commit: r822284 - /lucene/java/trunk/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java
No problem! It's exciting :)

Mike

On Tue, Oct 6, 2009 at 10:40 AM, Uwe Schindler u...@thetaphi.de wrote:
> Thanks, sorry for extra work! I missed to do this after the svn copy :(
[jira] Updated: (LUCENE-1458) Further steps towards flexible indexing
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1458:
---------------------------------------
    Attachment: LUCENE-1458.patch

Uber-patch attached: started from Mark's patch (thanks!), added my contrib back-compat branch changes. All tests pass.

Also, I removed pfor from this issue. I'll attach the pfor codec to LUCENE-1410.

Note that I didn't use svn move in generating the patch, so that the patch can be applied cleanly. When it [finally] comes time to commit for real, I'll svn move so we preserve history.
Next steps: * Plug in new codecs (pulsing, pfor) to exercise the modularity / fix any hidden assumptions. * Expose new API out of IndexReader, deprecate old API but emulate old API on top of new one, switch all core/contrib users to the new API. * Maybe switch to AttributeSources as the base class for TermsEnum, DocsEnum, PostingsEnum -- this would give readers API flexibility (not just index-file-format flexibility). EG if someone wanted to store payload at the term-doc level instead of term-doc-position level, you could just add a new attribute. * Test performance iterate. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Updated: (LUCENE-1458) Further steps towards flexible indexing
Michael McCandless (JIRA) wrote:
> Uber-patch attached: started from Mark's patch (thanks!), [...]

Anytime! Grunt work and I go together like Michael Bay and Uwe Boll. Hope I can actually make a meaningful contribution to flexible indexing at some point.

--
- Mark

http://www.lucidimagination.com
De-basing / re-basing docIDs, or how to effectively pass calculated values from a Scorer or Filter up to (Solr's) QueryComponent.process
In the code I'm working with, I generate a cache of calculated values as a by-product within a Filter.getDocIdSet implementation (and within a Query-ized version of the filter and its Scorer method). These values are keyed off the IndexReader's docID values, since that's all that's accessible at that level. Ultimately, however, I need to be able to access these values much higher up in the stack (Solr's QueryComponent.process method), so that I can inject the dynamic values into the response as a fake field. The IDs available here, however, are for the entire index and not just relative to the current IndexReader. I'm still fairly new to Lucene and I've been scratching my head a bit trying to find a reliable way to map these values into the same space, without having to hack up too many base classes. I noticed that there was a related discussion at: http://issues.apache.org/jira/browse/LUCENE-1821?focusedCommentId=12745041#action_12745041 ... but also a bit of disagreement on the suggested strategies. Ideally, I'm also hoping there's a strategy that won't require me to hack up too much of the core product; subclassing IndexSearcher in the way suggested would basically require me to change all of the various SearchComponents I use in Solr, and that sounds like it'd end up a real maintenance nightmare. I was looking at the Collector class as a possible solution, since it has knowledge of the docBase, but it looks like I'd then need to change every derived collector that the code ultimately uses, including the various anonymous Collectors in Solr, and that also looks like it'd be a fairly ghoulish solution. I suppose I'm being wishful, or lazy, but is there a reasonable and reliable way to do this without having to fork the core code? If not, any suggestions on the best strategy to accomplish this, without adding too much overhead every time I want to up-rev the core Lucene and/or Solr code to the latest version? Thanks a ton, Aaron
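For concreteness, here is a minimal sketch of the re-basing arithmetic involved, assuming Lucene 2.9's IndexReader.getSequentialSubReaders() (which returns null for atomic readers); the class and method names here are illustrative only, not part of any committed API. Each sub-reader's documents occupy a contiguous range of the top-level docID space, so a per-segment docID re-bases by adding the sub-reader's start offset:

{code}
import org.apache.lucene.index.IndexReader;

// Sketch: compute each sub-reader's docBase so per-segment docIDs gathered in
// a Filter/Scorer can be re-based into the top-level docID space that
// QueryComponent.process sees.
public class DocBases {
  /** Returns the global docID offset (docBase) of each sub-reader. */
  public static int[] docStarts(IndexReader top) {
    IndexReader[] subs = top.getSequentialSubReaders();
    if (subs == null) {                 // atomic reader: single base of 0
      return new int[] { 0 };
    }
    int[] starts = new int[subs.length];
    int base = 0;
    for (int i = 0; i < subs.length; i++) {
      starts[i] = base;                 // globalDoc = segmentDoc + starts[i]
      base += subs[i].maxDoc();
    }
    return starts;
  }
}
{code}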
Re: De-basing / re-basing docIDs, or how to effectively pass calculated values from a Scorer or Filter up to (Solr's) QueryComponent.process
Aaron, could you move this to solr-user?

-Yonik
http://www.lucidimagination.com

On Tue, Oct 6, 2009 at 11:22 AM, Aaron McKee ucbmc...@gmail.com wrote:
> In the code I'm working with, I generate a cache of calculated values as a by-product within a Filter.getDocIdSet implementation [...]
Re: De-basing / re-basing docIDs, or how to effectively pass calculated values from a Scorer or Filter up to (Solr's) QueryComponent.process
Might still be a lucene-ish issue. We already have getSequentialSubReaders() on IR; in my patched version I augmented this with a public readerIndex() and getSubReaderStarts(). Pretty much impossible to do some postprocessing on gathered hits without at least one of these.

On Tue, Oct 6, 2009 at 19:50, Yonik Seeley yo...@lucidimagination.com wrote:
> Aaron, could you move this to solr-user? [...]

--
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785
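A hedged sketch of what a readerIndex() helper like the one described could look like (readerIndex and getSubReaderStarts are names from the patched version above, not stock 2.9): binary-search the docStarts array from the earlier sketch to find which sub-reader owns a global docID.

{code}
// Hypothetical helper mirroring the described readerIndex(): given the
// ascending docStarts array (first element 0), find the sub-reader whose
// range [docStarts[i], docStarts[i+1]) contains the global docID.
public static int readerIndex(int globalDoc, int[] docStarts) {
  int lo = 0, hi = docStarts.length - 1;
  while (lo <= hi) {
    int mid = (lo + hi) >>> 1;
    if (globalDoc < docStarts[mid]) {
      hi = mid - 1;                     // strictly left of this range
    } else if (mid < docStarts.length - 1 && globalDoc >= docStarts[mid + 1]) {
      lo = mid + 1;                     // strictly right of this range
    } else {
      return mid;                       // docStarts[mid] <= globalDoc < next
    }
  }
  throw new IllegalArgumentException("docID out of range: " + globalDoc);
}
{code}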
[jira] Updated: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1410: --- Attachment: LUCENE-1410-codecs.tar.bz2

Attaching sep, intblock and pfordelta codecs, spun out of the last patch on LUCENE-1458. Once LUCENE-1458 is in, we should finish the pfordelta codec to make it a real choice. I actually think some combination of pulsing, standard, pfordelta and simple bit packing (in order of increasing term docFreq), within a single codec, may be best. Ie, rare terms (only in a doc or two) could be inlined into the terms dict. Slightly more common terms can use the more CPU-intensive standard codec. Common terms can use CPU-friendly-yet-still-decent-compression pfordelta. Obscenely common terms can use bit packing for the fastest decode.

PFOR implementation
---
Key: LUCENE-1410
URL: https://issues.apache.org/jira/browse/LUCENE-1410
Project: Lucene - Java
Issue Type: New Feature
Components: Other
Reporter: Paul Elschot
Priority: Minor
Attachments: autogen.tgz, LUCENE-1410-codecs.tar.bz2, LUCENE-1410b.patch, LUCENE-1410c.patch, LUCENE-1410d.patch, LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java, TestPFor2.java, TestPFor2.java
Original Estimate: 21840h
Remaining Estimate: 21840h

Implementation of Patched Frame of Reference.
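To make the hybrid-codec idea above concrete, a hedged sketch of per-term format selection keyed on docFreq; the enum, method, and thresholds are hypothetical illustrations, not anything in the attached codecs:

{code}
// Hypothetical per-term format selection for a hybrid codec; the crossover
// thresholds are illustrative guesses, not measured values.
public class CodecChooser {
  enum PostingsFormat { PULSED, STANDARD, PFOR_DELTA, BIT_PACKED }

  static PostingsFormat choose(int docFreq, int maxDoc) {
    if (docFreq <= 2) {
      return PostingsFormat.PULSED;       // inline postings into the terms dict
    } else if (docFreq < 128) {
      return PostingsFormat.STANDARD;     // vInt deltas, CPU-heavier decode
    } else if (docFreq < maxDoc / 10) {
      return PostingsFormat.PFOR_DELTA;   // block-decoded, CPU-friendly
    } else {
      return PostingsFormat.BIT_PACKED;   // fastest decode for very dense terms
    }
  }
}
{code}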
[jira] Created: (LUCENE-1949) FilterManager uses unsafe keys for its filter cache
FilterManager uses unsafe keys for its filter cache
---
Key: LUCENE-1949
URL: https://issues.apache.org/jira/browse/LUCENE-1949
Project: Lucene - Java
Issue Type: Bug
Components: Search
Affects Versions: 2.9
Reporter: Aaron McKee
Priority: Minor

re: FilterManager.getFilter(Filter filter)

FilterManager is using the filter's hash code as the key to its filter cache; however, hash codes are intrinsically not guaranteed to be distinct, and different filters may hash to the same value. Although the chance of a conflict is hopefully low, given reasonable implementations of hashCode, it's certainly not impossible. When a conflict does occur, an unintended filter may be returned. I'm unaware to what extent this class is actively being used, but I noticed the issue during a code browse and thought I'd at least mention it.
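A small self-contained demonstration of why hashCode keys are unsafe (this uses Strings rather than Filters, since "Aa" and "BB" are well-known distinct Java Strings with identical hashCode 2112; the fix sketched here, keying on the object itself, assumes sensible equals()/hashCode() implementations):

{code}
import java.util.HashMap;
import java.util.Map;

public class UnsafeKeyDemo {
  public static void main(String[] args) {
    // Cache keyed on hashCode: the second put silently evicts the first.
    Map<Integer, String> byHash = new HashMap<Integer, String>();
    byHash.put("Aa".hashCode(), "filter for Aa");
    byHash.put("BB".hashCode(), "filter for BB");    // same key: 2112
    System.out.println(byHash.get("Aa".hashCode())); // prints: filter for BB

    // Cache keyed on the object itself: distinct keys unless truly equal.
    Map<String, String> byKey = new HashMap<String, String>();
    byKey.put("Aa", "filter for Aa");
    byKey.put("BB", "filter for BB");
    System.out.println(byKey.get("Aa"));             // prints: filter for Aa
  }
}
{code}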
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762742#action_12762742 ] Eks Dev commented on LUCENE-1410: -

Mike, that is definitely the way to go: distribution-dependent encoding, where every term gets individual treatment. Take, for example, a simple but not all that rare case where the index gets sorted on some of the indexed fields (we use it really extensively, e.g. a presorted doc collection on user_rights/zip/city, all indexed). There you get perfectly compressible postings by simply managing intervals of set bits. Updates distort this picture, but we rebuild the index periodically and all gets good again. At the moment we load them into RAM as Filters in IntervalSets. If that were possible in Lucene, we wouldn't bother with Filters (VInt decoding on such super-dense fields was killing us, even in RAMDirectory)...

Thinking about your comments, isn't pulsing somewhat orthogonal to the packing method? For example, if you load an index into RAMDirectory, one could avoid one indirection level and inline all postings.

Flex indexing rocks; that is going to be the most important addition to Lucene since it started (imo)... I would even bet on double search speed in a first attempt for average queries :)

Cheers, eks
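To illustrate the "intervals of set bits" point: once the index is sorted on the field, a term's posting list collapses into a handful of dense runs, and each run needs only a start and a length. A minimal sketch (the class and method are illustrative, not a Lucene API):

{code}
import java.util.ArrayList;
import java.util.List;

// Sketch of interval encoding for a posting list whose set docIDs form dense
// runs after index sorting: each maximal run becomes one (start, length) pair.
public class IntervalEncoder {
  /** Encode a sorted docID list as [start, length] runs. */
  public static List<int[]> encode(int[] sortedDocs) {
    List<int[]> runs = new ArrayList<int[]>();
    int i = 0;
    while (i < sortedDocs.length) {
      int start = sortedDocs[i], len = 1;
      while (i + len < sortedDocs.length && sortedDocs[i + len] == start + len) {
        len++;                       // extend the current consecutive run
      }
      runs.add(new int[] { start, len });
      i += len;
    }
    return runs;                     // a 40M-doc contiguous run is one pair
  }
}
{code}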
Re: [jira] Commented: (LUCENE-1410) PFOR implementation
Eks,

> That is definitely the way to go, distribution-dependent encoding, where every term gets individual treatment. [...] At the moment we load them into RAM as Filters in IntervalSets. If that were possible in Lucene, we wouldn't bother with Filters (VInt decoding on such super-dense fields was killing us, even in RAMDirectory)...

You could try switching the Filter to OpenBitSet when that takes fewer bytes than SortedVIntList.

Regards,
Paul Elschot
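A rough sketch of the size comparison behind that suggestion; the per-entry cost of SortedVIntList varies from 1 to 5 bytes per set bit depending on the deltas, so the threshold below is only a lower bound, not a precise crossover:

{code}
// Back-of-envelope chooser: OpenBitSet costs ~maxDoc/8 bytes regardless of
// density; SortedVIntList costs at least 1 byte per set bit (vInt deltas).
static boolean openBitSetIsSmaller(int numSetBits, int maxDoc) {
  long bitSetBytes = (maxDoc + 7L) / 8;   // fixed cost, density-independent
  long vIntBytesMin = numSetBits;         // >= 1 byte per doc, often more
  return bitSetBytes <= vIntBytesMin;     // dense filters favor the bit set
}
{code}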
Re: [jira] Commented: (LUCENE-1410) PFOR implementation
Paul, the point I was trying to make with this example was extreme, but realistic. Imagine 100Mio docs, sorted on field user_rights, where a term user_rights:XX selects 40Mio of them (user rights...). To encode this, you need a format with just two integers (for more of such intervals you would need slightly more, but nevertheless much less than for OpenBitSet, VInts, PFor...). Strictly speaking this term is dense, but it is highly compressible and could be inlined with the pulsing trick...

cheers, eks

From: Paul Elschot paul.elsc...@xs4all.nl
To: java-dev@lucene.apache.org
Sent: Tuesday, 6 October, 2009 23:33:03
Subject: Re: [jira] Commented: (LUCENE-1410) PFOR implementation
> You could try switching the Filter to OpenBitSet when that takes fewer bytes than SortedVIntList. [...]
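Working the numbers above through makes the gap vivid (a sketch; the vInt figure assumes the best case of 1 byte per delta of 1):

{code}
// Worked example for 100M docs with one term matching a contiguous 40M-doc
// run after index sorting:
//   OpenBitSet:      100,000,000 bits  = 12.5 MB (fixed, density-independent)
//   vInt deltas:     ~40,000,000 bytes = ~40 MB  (>= 1 byte per posting)
//   (start, length): two ints          = 8 bytes
public class RangeCost {
  public static void main(String[] args) {
    long maxDoc = 100000000L, setBits = 40000000L;
    System.out.println("OpenBitSet bytes:   " + (maxDoc / 8));
    System.out.println("vInt-delta bytes:  ~" + setBits);
    System.out.println("(start,len) bytes:  " + (2 * 4));
  }
}
{code}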
Re: [jira] Commented: (LUCENE-1410) PFOR implementation
If you drive this example further, in combination with flex indexing permitting a per-term postings format, I could imagine some nice tools for optimizeHard(): normal index construction works with defaults as planned for the solid mixed-performance case, and at the end you run optimizeHard(), where postings get resorted on such fields (basically enabling RLE encoding to work) and at the same time all other terms get the optimal encoding format for their postings... perfect for read-only indexes where you want to max performance and reduce index size.

From: eks dev eks...@yahoo.co.uk
To: java-dev@lucene.apache.org
Sent: Tuesday, 6 October, 2009 23:59:12
Subject: Re: [jira] Commented: (LUCENE-1410) PFOR implementation
> Paul, the point I was trying to make with this example was extreme, but realistic. [...]
Re: [jira] Commented: (LUCENE-1410) PFOR implementation
On Tuesday 06 October 2009 23:59:12 eks dev wrote:
> Paul, the point I was trying to make with this example was extreme, but realistic. Imagine 100Mio docs, sorted on field user_rights, where a term user_rights:XX selects 40Mio of them. [...]

Well, I've been considering adding compressed consecutive ranges to SortedVIntList, but I did not get further than considering. This sounds like the perfect use case for that.

Regards,
Paul Elschot
[jira] Updated: (LUCENE-1856) Remove Hits
[ https://issues.apache.org/jira/browse/LUCENE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch updated LUCENE-1856: -- Attachment: lucene-1856.patch

Removes Hits from core and all contribs. All core and contrib tests pass. I'll have to commit some changes also to the bw-compat branch.

Remove Hits
---
Key: LUCENE-1856
URL: https://issues.apache.org/jira/browse/LUCENE-1856
Project: Lucene - Java
Issue Type: Task
Components: Search
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
Fix For: 3.0
Attachments: lucene-1856.patch

LUCENE-1290 removed all references to Hits from core. Most work to be done here is to remove all references from the contrib modules and some new ones that crept into core after 1290.
Removing deprecated classes
Hi all, I've attached a patch to LUCENE-1856, which removes Hits. I'm not sure if someone has uncommitted big 3.0 patches that I'll mess up if I commit 1856? While working on 1856 I realized how tedious this stuff is! So Uwe, Mark & Co, let me know if you want me to wait with committing my patch!

Michael
[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762828#action_12762828 ] Mark Miller commented on LUCENE-1458: -

bq. Hey! I'm not so old :) But yeah I still cling to emacs.

Can you say both of those things in the same breath? Just how long did it take to get that PhD... I'd look it up and guesstimate your age, but I think MIT still has my IP blocked from back when I was applying to colleges. So I'm going with the "uses emacs" guesstimate.

bq. Hey, I know people who still cling to vi!

vi is the only one I can halfway use - I know 3 commands - edit mode, leave edit mode, and save. And every now and then I accidentally delete a whole line. When I make a change that I don't want to save, I have to kill the power.

The patch is in a bit of an unpatchable state ;) I think I know what editor to blame... Pico! Our old friend, the $id, is messing up WildcardTermEnum - no problem, I can fix that... But also, NumericUtils is unpatched, and Codec is missing, along with most of the classes from the codecs packages! This looks like my work :) My only conclusion is that you're one of those guys that can write the whole program once without even running it - and then it works perfectly on the first go. That's the only way I can explain those classes in the wrong package previously as well :) No bug hunting tonight :(
[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762831#action_12762831 ] Mark Miller commented on LUCENE-1458: -

Nope - something else - looking through the patch I see the files I want - a second attempt at patching has gone over better. A couple errors still, but stuff I think I can fix so that I can at least look it over. False alarm. My patcher wonked out or something. I can resolve the few errors that popped up this time. Sweet.
[jira] Issue Comment Edited: (LUCENE-1458) Further steps towards flexible indexing
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762831#action_12762831 ] Mark Miller edited comment on LUCENE-1458 at 10/6/09 6:23 PM: --

Nope - something else - looking through the patch I see the files I want - a second attempt at patching has gone over better. A couple errors still, but stuff I think I can fix so that I can at least look it over. False alarm. My patcher wonked out or something. I can resolve the few errors that popped up this time. Sweet.

*edit* Just for reference - not sure what happened the first time - my patch preview looked the same both times (it was only complaining about the $id), but it completely failed on attempt one and worked on attempt two. The only issue now appears to be that you have switched deletedDocs from BitVector to Bits - but only halfway, so it's broken in a dozen places. Not sure what you are doing about size() and whatnot, so I'm just gonna read around.

was (Author: markrmil...@gmail.com): [previous revision of this comment]
[jira] Issue Comment Edited: (LUCENE-1458) Further steps towards flexible indexing
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762831#action_12762831 ] Mark Miller edited comment on LUCENE-1458 at 10/6/09 6:34 PM: --

Nope - something else - looking through the patch I see the files I want - a second attempt at patching has gone over better. A couple errors still, but stuff I think I can fix so that I can at least look it over. False alarm. My patcher wonked out or something. I can resolve the few errors that popped up this time. Sweet.

*edit* Just for reference - not sure what happened the first time - my patch preview looked the same both times (it was only complaining about the $id), but it completely failed on attempt one and worked on attempt two. The only issue now appears to be that you have switched deletedDocs from BitVector to Bits - but only halfway, so it's broken in a dozen places. Not sure what you are doing about size() and whatnot, so I'm just gonna read around.

*edit* Yes - I found it - BitVector was supposed to implement Bits - which was in the patch ... this patch just did not want to apply. I guess it was right, but Eclipse just did not want it to take ...

was (Author: markrmil...@gmail.com): [previous revision of this comment]
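For readers following along, a minimal sketch of the abstraction being discussed; the exact flex-branch signatures may differ, so the shapes below are assumed from context rather than taken from the patch:

{code}
// Assumed shape of the flex deletedDocs abstraction: a minimal random-access
// bits interface that BitVector implements, so readers can consume deleted
// docs through Bits without depending on BitVector directly.
public interface Bits {
  boolean get(int index);
}

// Simplified BitVector exposing its packed bytes through Bits:
class BitVector implements Bits {
  private final byte[] bits;
  BitVector(int n) { bits = new byte[(n >> 3) + 1]; }
  public void set(int bit) { bits[bit >> 3] |= 1 << (bit & 7); }
  public boolean get(int bit) { return (bits[bit >> 3] & (1 << (bit & 7))) != 0; }
}
{code}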
[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762842#action_12762842 ] Mark Miller commented on LUCENE-1458: -

Bah - all this huffing and puffing over the patch and I'm too sick to stay up late anyway. Have you started benching at all? I'm seeing like a 40-50% drop in same-reader search benches with standard, sep, and pulsing. Like 80% with intblock.
RE: Removing deprecated classes
Hi Mark, no problem, go forward. I am on a trip until Saturday evening, so no problems.

Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Michael Busch [mailto:busch...@gmail.com]
Sent: Wednesday, October 07, 2009 2:17 AM
To: java-dev@lucene.apache.org
Subject: Removing deprecated classes

> Hi all, I've attached a patch to LUCENE-1856, which removes Hits. [...]
wildcardquery rewrite()
Someone asked this question on the user list: http://www.lucidimagination.com/search/document/6f38de391b242102/prefixquery_vs_wildcardquery

It made me look at the wildcard rewrite(), where I see this:

if (!termContainsWildcard) return new TermQuery(getTerm());

Is it a problem that the boost is not preserved in this special case? Is it also a problem that if the user sets the default MultiTermQuery rewriteMethod to, say, CONSTANT_SCORE_FILTER_REWRITE, this rewritten TermQuery isn't wrapped with a constant score? Sorry if it seems a bit nitpicky; really the issue is that I want to do the right thing for a more complex query I am working on, but I don't want to overkill either.

--
Robert Muir rcm...@gmail.com
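A hedged sketch of what a boost-preserving version of that special case could look like; this is a fragment meant to live inside WildcardQuery (termContainsWildcard, getTerm(), getBoost() are its existing members), not the committed fix:

{code}
// Inside WildcardQuery (sketch): preserve the user's boost when the pattern
// degenerates to a plain term. Whether this case should also respect the
// configured rewrite method (e.g. constant-score wrapping) is the open
// question raised above.
public Query rewrite(IndexReader reader) throws IOException {
  if (!termContainsWildcard) {
    Query result = new TermQuery(getTerm());
    result.setBoost(getBoost());   // carry the boost over
    return result;
  }
  return super.rewrite(reader);
}
{code}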
Re: wildcardquery rewrite()
Separately, perhaps we should consider doing the prefixquery rewrite here for wildcardquery. For example, SolrQueryParser will emit these 'wildcardqueries that should be prefixqueries' if you are using the new reverse stuff for leading wildcards: WildcardQuery(*foobar) -> WildcardQuery(U+0001raboof*). I don't think the prefix enumeration is really that much faster than the wildcard one, but still thought I would mention it.

On Tue, Oct 6, 2009 at 10:22 PM, Robert Muir rcm...@gmail.com wrote:
> Someone asked this question on the user list: [...]

--
Robert Muir rcm...@gmail.com
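A sketch of the detection this suggests (the helper class and method are illustrative, not committed code): a pattern whose only wildcard is a single trailing '*' is really a prefix, so it can be handed to PrefixQuery's enumeration instead.

{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.WildcardQuery;

public class WildcardRewriteUtil {
  /** Rewrite a wildcard term to a PrefixQuery when only a trailing '*' is present. */
  public static Query maybePrefix(Term t) {
    String text = t.text();
    int star = text.indexOf('*');
    // first '*' is the last char and there is no '?': it's really a prefix
    if (text.length() > 0 && star == text.length() - 1 && text.indexOf('?') == -1) {
      return new PrefixQuery(new Term(t.field(), text.substring(0, star)));
    }
    return new WildcardQuery(t);
  }
}
{code}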
Re: Removing deprecated classes
Cool, I'll commit 1856 soon then. Thanks!

Michael

On 10/6/09 7:12 PM, Uwe Schindler wrote:
> Hi Mark, no problem, go forward. I am on a trip until Saturday evening, so no problems. [...]
[jira] Updated: (LUCENE-1085) search.function should support all capabilities of Solr's search.function
[ https://issues.apache.org/jira/browse/LUCENE-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch updated LUCENE-1085: -- Fix Version/s: (was: 3.0) 3.1

search.function should support all capabilities of Solr's search.function
-
Key: LUCENE-1085
URL: https://issues.apache.org/jira/browse/LUCENE-1085
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Doron Cohen
Priority: Minor
Fix For: 3.1

Lucene's search.function currently lacks capabilities that Solr needs, so Solr maintains its own version of this package. Enhance Lucene's search.function so that Solr can move to use it, and avoid this redundancy.
[jira] Resolved: (LUCENE-1856) Remove Hits
[ https://issues.apache.org/jira/browse/LUCENE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch resolved LUCENE-1856. --- Resolution: Fixed

Committed revision 822587.