date:20071203

Re: [jira] Resolved: (LUCENE-935) Improve maven artifacts

2007-12-03 Thread Michael Busch

Karl Wettin wrote:
> 
> 
> Sorry if I've missed any discussion about it, but the snapshots still seem to depend on Clover?
> 

Hi Karl,

No, that has been fixed. There should not be any Clover instrumentation
in the latest snapshots anymore.

-Michael

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-1039) Bayesian classifiers using Lucene as data store

2007-12-03 Thread Otis Gospodnetic (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548159 ]

Otis Gospodnetic commented on LUCENE-1039:
--

Skimmed this very quickly - looks nice and clean to me!
Why is this not in contrib yet?  I didn't spot any dependencies; are there any?


> Bayesian classifiers using Lucene as data store
> ---
>
> Key: LUCENE-1039
> URL: https://issues.apache.org/jira/browse/LUCENE-1039
> Project: Lucene - Java
>  Issue Type: New Feature
>Reporter: Karl Wettin
>Priority: Minor
> Attachments: LUCENE-1039.txt
>
>
> Bayesian classifiers using Lucene as data store. Based on the Naive Bayes and 
> Fisher method algorithms as described by Toby Segaran in "Programming 
> Collective Intelligence", ISBN 978-0-596-52932-1. 
> Have fun.
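For readers new to the technique, here is a minimal, self-contained sketch of a Naive Bayes text classifier. It is not the code attached to LUCENE-1039 and does not use Lucene at all: it assumes plain whitespace-separated words as features (the patch uses n-grams stored in a Lucene index), add-one (Laplace) smoothing and a few toy training sentences. The class and method names are hypothetical, and the weighting may differ from the one described in Segaran's book.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the LUCENE-1039 classes: multinomial Naive Bayes
// over lower-cased whitespace-separated words with add-one smoothing.
class NaiveBayesSketch {
  private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // class -> word -> count
  private final Map<String, Integer> wordTotals = new HashMap<>();              // class -> number of words
  private final Map<String, Integer> docCounts = new HashMap<>();               // class -> training docs
  private final Set<String> vocabulary = new HashSet<>();
  private int totalDocs = 0;

  private static String[] features(String text) {
    return text.toLowerCase().trim().split("\\s+");
  }

  void train(String text, String label) {
    totalDocs++;
    docCounts.merge(label, 1, Integer::sum);
    Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
    for (String word : features(text)) {
      counts.merge(word, 1, Integer::sum);
      wordTotals.merge(label, 1, Integer::sum);
      vocabulary.add(word);
    }
  }

  // Picks the label maximizing log P(class) + sum over words of log P(word | class).
  String classify(String text) {
    String best = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (Map.Entry<String, Integer> e : docCounts.entrySet()) {
      String label = e.getKey();
      double score = Math.log((double) e.getValue() / totalDocs);
      Map<String, Integer> counts = wordCounts.get(label);
      int total = wordTotals.getOrDefault(label, 0);
      for (String word : features(text)) {
        int c = counts.getOrDefault(word, 0);
        score += Math.log((c + 1.0) / (total + vocabulary.size()));
      }
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    NaiveBayesSketch nb = new NaiveBayesSketch();
    nb.train("nobody owns the water", "good");
    nb.train("the quick rabbit jumps fences", "good");
    nb.train("buy pharmaceuticals now", "bad");
    nb.train("make quick money at the online casino", "bad");
    nb.train("the quick brown fox jumps", "good");
    System.out.println(nb.classify("quick rabbit")); // good
    System.out.println(nb.classify("quick money"));  // bad
  }
}
```

Log probabilities are summed rather than multiplying raw probabilities, which avoids floating-point underflow on longer texts.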
> The javadocs are poor, but the TestCase shows how to use it:
> {code:java}
> public class TestClassifier extends TestCase {
>   public void test() throws Exception {
>     InstanceFactory instanceFactory = new InstanceFactory() {
>       public Document factory(String text, String _class) {
>         Document doc = new Document();
>         doc.add(new Field("class", _class, Field.Store.YES, Field.Index.NO_NORMS));
>         doc.add(new Field("text", text, Field.Store.YES, Field.Index.NO, Field.TermVector.NO));
>         doc.add(new Field("text/ngrams/start", text, Field.Store.NO, Field.Index.TOKENIZED, Field.TermVector.YES));
>         doc.add(new Field("text/ngrams/inner", text, Field.Store.NO, Field.Index.TOKENIZED, Field.TermVector.YES));
>         doc.add(new Field("text/ngrams/end", text, Field.Store.NO, Field.Index.TOKENIZED, Field.TermVector.YES));
>         return doc;
>       }
>       Analyzer analyzer = new Analyzer() {
>         private int minGram = 2;
>         private int maxGram = 3;
>         public TokenStream tokenStream(String fieldName, Reader reader) {
>           TokenStream ts = new StandardTokenizer(reader);
>           ts = new LowerCaseFilter(ts);
>           if (fieldName.endsWith("/ngrams/start")) {
>             ts = new EdgeNGramTokenFilter(ts, EdgeNGramTokenFilter.Side.FRONT, minGram, maxGram);
>           } else if (fieldName.endsWith("/ngrams/inner")) {
>             ts = new NGramTokenFilter(ts, minGram, maxGram);
>           } else if (fieldName.endsWith("/ngrams/end")) {
>             ts = new EdgeNGramTokenFilter(ts, EdgeNGramTokenFilter.Side.BACK, minGram, maxGram);
>           }
>           return ts;
>         }
>       };
>       public Analyzer getAnalyzer() {
>         return analyzer;
>       }
>     };
>     Directory dir = new RAMDirectory();
>     new IndexWriter(dir, null, true).close();
>     Instances instances = new Instances(dir, instanceFactory, "class");
>     instances.addInstance("hello world", "en");
>     instances.addInstance("hallå världen", "sv");
>     instances.addInstance("this is london calling", "en");
>     instances.addInstance("detta är london som ringer", "sv");
>     instances.addInstance("john has a long mustache", "en");
>     instances.addInstance("john har en lång mustache", "sv");
>     instances.addInstance("all work and no play makes jack a dull boy", "en");
>     instances.addInstance("att bara arbeta och aldrig leka gör jack en trist gosse", "sv");
>     instances.addInstance("shrimp sandwich", "en");
>     instances.addInstance("räksmörgås", "sv");
>     instances.addInstance("it's now or never", "en");
>     instances.addInstance("det är nu eller aldrig", "sv");
>     instances.addInstance("to tie up at a landing-stage", "en");
>     instances.addInstance("att angöra en brygga", "sv");
>     instances.addInstance("it's now time for the children's television shows", "en");
>     instances.addInstance("nu är det dags för barnprogram", "sv");
>     instances.flush();
>     testClassifier(instances, new NaiveBayesClassifier());
>     testClassifier(instances, new FishersMethodClassifier());
>     instances.close();
>   }
>   private void testClassifier(Instances instances, BayesianClassifier classifier) throws IOException {
>     assertEquals("sv", classifier.classify(instances, "detta blir ett test")[0].getClassification());
>     assertEquals("en", classifier.classify(instances, "this will be a test")[0].getClassification());
>     // test training data instances. all ought to match!
>     for (int documentNumber = 0; documentNumber < instances.getIndexReader().maxDoc(); documentNumber++) {
>       if (!instances.getIndexReader().isDeleted(documentNumber)) {
>         Map features = instances.extractFeatures(instances.getIndexReader(), documentNumber, classifier.isNormalized());
>         Document document = instances.getIndexReader().document(documentNumber);
>         assertEquals(document.get("class"), classifier.c

losing mails from user list?

2007-12-03 Thread Doron Cohen


Hi, do others also lose emails from the user list?

I hadn't received a reply to a message I sent to the user list, so I checked
the online archive and found that Michael had already answered yesterday, but
I never got that reply, nor a few others. The dev list seems to work fine for me.

Do others see similar issues with the user list?

Thanks,
Doron



[jira] Reopened: (LUCENE-1072) NullPointerException during indexing in DocumentsWriter$ThreadState$FieldData.addPosition

2007-12-03 Thread Michael Busch (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch reopened LUCENE-1072:
---


I'm seeing a similar issue when TokenStream.next() throws an
IOException (or a RuntimeException). The DocumentsWriter is
thereafter no longer usable, i.e. subsequent calls to
addDocument() fail with a NullPointerException.

I added this test to TestIndexWriter which shows the problem:
{code:java}
  public void testExceptionFromTokenStream() throws IOException {
    RAMDirectory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir, new Analyzer() {

      public TokenStream tokenStream(String fieldName, Reader reader) {
        return new TokenFilter(new StandardTokenizer(reader)) {
          private int count = 0;

          public Token next() throws IOException {
            if (count++ == 5) {
              throw new IOException();
            }
            return input.next();
          }
        };
      }

    }, true);

    Document doc = new Document();
    String contents = "aa bb cc dd ee ff gg hh ii jj kk";
    doc.add(new Field("content", contents, Field.Store.NO,
        Field.Index.TOKENIZED));
    try {
      writer.addDocument(doc);
      fail("did not hit expected exception");
    } catch (Exception e) {
    }

    // Make sure we can add another normal document
    doc = new Document();
    doc.add(new Field("content", "aa bb cc dd", Field.Store.NO,
        Field.Index.TOKENIZED));
    writer.addDocument(doc);

    // Make sure we can add another normal document
    doc = new Document();
    doc.add(new Field("content", "aa bb cc dd", Field.Store.NO,
        Field.Index.TOKENIZED));
    writer.addDocument(doc);

    writer.close();
  }

{code}
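The failure mode this test exercises can be illustrated without Lucene. The sketch below is a hypothetical toy writer, not DocumentsWriter: it keeps per-document state while consuming tokens, and the finally block is what keeps the writer usable when the token source throws halfway through a document. Without that reset, the next addDocument() call would fail on leftovers from the failed one, which is the kind of symptom described above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy writer illustrating exception safety; not Lucene's DocumentsWriter.
class ToyDocumentWriter {
  // Hypothetical token source: returns null at end of stream and may throw mid-document.
  interface TokenSource {
    String next() throws IOException;
  }

  private final List<List<String>> docs = new ArrayList<>();
  private List<String> pending; // tokens of the document being added; null when idle

  void addDocument(TokenSource tokens) throws IOException {
    if (pending != null) {
      // Without the reset in the finally block, an earlier failure would land us here.
      throw new IllegalStateException("leftover state from a failed document");
    }
    pending = new ArrayList<>();
    try {
      String token;
      while ((token = tokens.next()) != null) {
        pending.add(token);
      }
      docs.add(pending); // only complete documents are kept
    } finally {
      pending = null;    // abort path: discard partial state on success or failure
    }
  }

  int numDocs() {
    return docs.size();
  }

  // Test helper: yields the words of text and throws once failAfter tokens
  // have been returned (failAfter < 0 means never fail).
  static TokenSource tokens(String text, int failAfter) {
    String[] words = text.split(" ");
    int[] i = {0};
    return () -> {
      if (i[0] == failAfter) {
        throw new IOException("simulated tokenizer failure");
      }
      return i[0] < words.length ? words[i[0]++] : null;
    };
  }

  public static void main(String[] args) throws IOException {
    ToyDocumentWriter writer = new ToyDocumentWriter();
    try {
      writer.addDocument(tokens("aa bb cc dd ee ff gg hh ii jj kk", 5));
    } catch (IOException expected) {
      // mirrors the expected exception in the test above
    }
    writer.addDocument(tokens("aa bb cc dd", -1));
    writer.addDocument(tokens("aa bb cc dd", -1));
    System.out.println(writer.numDocs()); // 2
  }
}
```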

> NullPointerException during indexing in 
> DocumentsWriter$ThreadState$FieldData.addPosition
> -
>
> Key: LUCENE-1072
> URL: https://issues.apache.org/jira/browse/LUCENE-1072
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.3
> Environment: Linux CentOS 5 x86_64 running on 2-core Pentium D, Java 
> HotSpot(TM) 64-Bit Server VM (build 1.6.0_01-b06, mixed mode), using 
> lucene-core-2007-11-29_02-49-31
>Reporter: Alexei Dets
>Assignee: Michael McCandless
> Fix For: 2.3
>
> Attachments: LUCENE-1072.patch
>
>
> In my case, documents with unusually large "words" (text-encoded images, in fact)
> sometimes appear during indexing.
> An attempt to add a document containing a field with such a token produces this exception:
> java.lang.IllegalArgumentException: term length 37944 exceeds max term length 16383
>     at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.addPosition(DocumentsWriter.java:1492)
>     at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1321)
>     at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1247)
>     at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:972)
>     at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2202)
>     at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2186)
>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1432)
>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1411)
> This is expected; the exception is caught and ignored. The problem is that afterwards
> the IndexWriter becomes somewhat corrupted, and subsequent attempts to add
> documents to the index fail as well, this time with an NPE:
> java.lang.NullPointerException
>     at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.addPosition(DocumentsWriter.java:1497)
>     at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1321)
>     at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1247)
>     at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:972)
>     at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2202)
>     at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2186)
>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1432)
>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1411)
> This is 100% reproducible.
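Until the writer recovers cleanly from this, one defensive workaround is to drop oversized tokens before they ever reach IndexWriter. The following is a hypothetical helper that works on a plain list of token strings; the 16383 limit is taken from the exception message above, and measuring length in Java chars is an assumption. In a real analysis chain the same check would live in a TokenFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class LongTokenGuard {
  // Limit quoted in the exception message above; it may differ in other versions.
  static final int MAX_TERM_LENGTH = 16383;

  // Keeps only tokens whose length (in Java chars) does not exceed maxLength,
  // so a single huge "word" such as a text-encoded image is skipped, not indexed.
  static List<String> dropLongTokens(List<String> tokens, int maxLength) {
    List<String> kept = new ArrayList<>();
    for (String token : tokens) {
      if (token.length() <= maxLength) {
        kept.add(token);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    StringBuilder huge = new StringBuilder();
    for (int i = 0; i < 37944; i++) {
      huge.append('x');
    }
    List<String> tokens = Arrays.asList("hello", huge.toString(), "world");
    System.out.println(dropLongTokens(tokens, MAX_TERM_LENGTH)); // [hello, world]
  }
}
```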

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---

[jira] Commented: (LUCENE-1045) SortField.AUTO doesn't work with long

2007-12-03 Thread Grant Ingersoll (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548088 ]

Grant Ingersoll commented on LUCENE-1045:
-

Yes.  True.  Here you and Doug finally had me convinced and now... :-)

> SortField.AUTO doesn't work with long
> -
>
> Key: LUCENE-1045
> URL: https://issues.apache.org/jira/browse/LUCENE-1045
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.2
>Reporter: Daniel Naber
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 2.3
>
> Attachments: auto-long-sorting.diff, LUCENE-1045.patch, 
> LUCENE-1045.patch, TestDateSort.java
>
>
> This is actually the same as LUCENE-463 but I cannot find a way to re-open 
> that issue. I'm attaching a test case by dragon-fly999 at hotmail com that 
> shows the problem and a patch that seems to fix it.
> The problem is that a long (as used for dates) cannot be parsed as an 
> integer, and the next step is then to parse it as a float, which works but 
> which is not correct. With the patch the following parsers are used in this 
> order: int, long, float.
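A minimal sketch of the detection order described above, assuming the type is guessed from a single sample term; the class and enum are hypothetical and this is not the actual SortField/FieldCache code. The last lines show why falling back to float is wrong for date-like longs: a float carries only 24 significant bits, so near 2×10^13 adjacent representable values are 2,097,152 apart, and distinct dates can parse to the same float and lose their order.

```java
// Hypothetical sketch of the int -> long -> float detection order; not Lucene code.
class AutoSortTypeSketch {
  enum Kind { INT, LONG, FLOAT, STRING }

  // Guesses a sort type from one sample term, trying the narrowest parser first.
  static Kind detect(String sampleTerm) {
    try {
      Integer.parseInt(sampleTerm);
      return Kind.INT;
    } catch (NumberFormatException notInt) {
      // fall through to the next parser
    }
    try {
      Long.parseLong(sampleTerm);
      return Kind.LONG;
    } catch (NumberFormatException notLong) {
      // fall through to the next parser
    }
    try {
      Float.parseFloat(sampleTerm);
      return Kind.FLOAT;
    } catch (NumberFormatException notFloat) {
      return Kind.STRING;
    }
  }

  public static void main(String[] args) {
    System.out.println(detect("42"));             // INT
    System.out.println(detect("20071203000000")); // LONG: too large for an int
    System.out.println(detect("3.5"));            // FLOAT
    // Without the long step, these two distinct values collapse to one float:
    float a = Float.parseFloat("20071203000000");
    float b = Float.parseFloat("20071203000001");
    System.out.println(a == b);                   // true
  }
}
```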


[jira] Commented: (LUCENE-1045) SortField.AUTO doesn't work with long

2007-12-03 Thread Yonik Seeley (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548079 ]

Yonik Seeley commented on LUCENE-1045:
--

> With this latest patch, they will still be able to do that.

Only if they recompile.  Simply dropping in a new lucene jar would break their 
existing FieldCache usage.

> SortField.AUTO doesn't work with long
> -
>
> Key: LUCENE-1045
> URL: https://issues.apache.org/jira/browse/LUCENE-1045
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.2
>Reporter: Daniel Naber
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 2.3
>
> Attachments: auto-long-sorting.diff, LUCENE-1045.patch, 
> LUCENE-1045.patch, TestDateSort.java
>
>
> This is actually the same as LUCENE-463 but I cannot find a way to re-open 
> that issue. I'm attaching a test case by dragon-fly999 at hotmail com that 
> shows the problem and a patch that seems to fix it.
> The problem is that a long (as used for dates) cannot be parsed as an 
> integer, and the next step is then to parse it as a float, which works but 
> which is not correct. With the patch the following parsers are used in this 
> order: int, long, float.


[jira] Commented: (LUCENE-1045) SortField.AUTO doesn't work with long

2007-12-03 Thread Grant Ingersoll (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548078 ]

Grant Ingersoll commented on LUCENE-1045:
-

With this latest patch, they will still be able to do that.  I made FieldCache a 
full-blown public class and deleted FieldCacheImpl.

So far, only one user has responded to my request for people who have 
implemented FieldCache: 
http://www.gossamer-threads.com/lists/lucene/java-user/55402

That user already says it isn't a big deal for us to change it.

> SortField.AUTO doesn't work with long
> -
>
> Key: LUCENE-1045
> URL: https://issues.apache.org/jira/browse/LUCENE-1045
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.2
>Reporter: Daniel Naber
>Priority: Minor
> Fix For: 2.3
>
> Attachments: auto-long-sorting.diff, LUCENE-1045.patch, 
> LUCENE-1045.patch, TestDateSort.java
>
>
> This is actually the same as LUCENE-463 but I cannot find a way to re-open 
> that issue. I'm attaching a test case by dragon-fly999 at hotmail com that 
> shows the problem and a patch that seems to fix it.
> The problem is that a long (as used for dates) cannot be parsed as an 
> integer, and the next step is then to parse it as a float, which works but 
> which is not correct. With the patch the following parsers are used in this 
> order: int, long, float.


[jira] Assigned: (LUCENE-1045) SortField.AUTO doesn't work with long

2007-12-03 Thread Grant Ingersoll (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned LUCENE-1045:
---

Assignee: Grant Ingersoll

> SortField.AUTO doesn't work with long
> -
>
> Key: LUCENE-1045
> URL: https://issues.apache.org/jira/browse/LUCENE-1045
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.2
>Reporter: Daniel Naber
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 2.3
>
> Attachments: auto-long-sorting.diff, LUCENE-1045.patch, 
> LUCENE-1045.patch, TestDateSort.java
>
>
> This is actually the same as LUCENE-463 but I cannot find a way to re-open 
> that issue. I'm attaching a test case by dragon-fly999 at hotmail com that 
> shows the problem and a patch that seems to fix it.
> The problem is that a long (as used for dates) cannot be parsed as an 
> integer, and the next step is then to parse it as a float, which works but 
> which is not correct. With the patch the following parsers are used in this 
> order: int, long, float.


[jira] Commented: (LUCENE-1045) SortField.AUTO doesn't work with long

2007-12-03 Thread Yonik Seeley (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548053 ]

Yonik Seeley commented on LUCENE-1045:
--

Actually, I'm not sure we should change it to an abstract class now... that's 
not a backward-compatible change for normal users, right?

People very likely access the current FieldCache via 
FieldCache.DEFAULT.get...() or
FieldCache f = FieldCache.DEFAULT

So as long as no one has any custom implementations, we can at least add new 
methods to the FieldCache interface and implement them in FieldCacheImpl.
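To make the compatibility argument concrete, here is a heavily simplified, hypothetical stand-in for the pattern described above (the real FieldCache methods take an IndexReader and a field name, not a String array). Callers only touch the DEFAULT constant, so adding getLongs to the interface and to its single implementation changes nothing for existing calls; only third-party classes implementing the interface would need the new method.

```java
// Hypothetical, simplified stand-in for FieldCache; not Lucene's API.
interface SimpleFieldCache {
  // Callers typically go through this shared instance.
  SimpleFieldCache DEFAULT = new SimpleFieldCacheImpl();

  int[] getInts(String[] terms);

  // Newly added method: callers of DEFAULT are unaffected, but any third-party
  // class implementing SimpleFieldCache would have to add it.
  long[] getLongs(String[] terms);
}

class SimpleFieldCacheImpl implements SimpleFieldCache {
  public int[] getInts(String[] terms) {
    int[] values = new int[terms.length];
    for (int i = 0; i < terms.length; i++) {
      values[i] = Integer.parseInt(terms[i]);
    }
    return values;
  }

  public long[] getLongs(String[] terms) {
    long[] values = new long[terms.length];
    for (int i = 0; i < terms.length; i++) {
      values[i] = Long.parseLong(terms[i]);
    }
    return values;
  }
}

// Caller code in the style described above: the getInts line compiles and
// behaves the same before and after getLongs is added.
class FieldCacheCaller {
  public static void main(String[] args) {
    SimpleFieldCache f = SimpleFieldCache.DEFAULT;
    int[] ints = f.getInts(new String[] {"3", "1"});
    long[] longs = f.getLongs(new String[] {"20071203000000"});
    System.out.println(ints[0] + " " + longs[0]); // 3 20071203000000
  }
}
```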

> SortField.AUTO doesn't work with long
> -
>
> Key: LUCENE-1045
> URL: https://issues.apache.org/jira/browse/LUCENE-1045
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.2
>Reporter: Daniel Naber
>Priority: Minor
> Fix For: 2.3
>
> Attachments: auto-long-sorting.diff, LUCENE-1045.patch, 
> LUCENE-1045.patch, TestDateSort.java
>
>
> This is actually the same as LUCENE-463 but I cannot find a way to re-open 
> that issue. I'm attaching a test case by dragon-fly999 at hotmail com that 
> shows the problem and a patch that seems to fix it.
> The problem is that a long (as used for dates) cannot be parsed as an 
> integer, and the next step is then to parse it as a float, which works but 
> which is not correct. With the patch the following parsers are used in this 
> order: int, long, float.


[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-12-03 Thread Paul Elschot (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548030 ]

Paul Elschot commented on LUCENE-584:
-

In case there is a better name than Matcher for a Scorer without a score() 
method (and maybe without an explain() method), I'm all ears. Names are 
important, and at this point they can still be changed very easily.

For Matcher I'd rather have a method to estimate the number of matching docs 
than a size() method. This estimate would be useful in implementing 
conjunctions, as the Matchers with the lowest estimates could be used first. 
However, this is another issue.
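A minimal sketch of how such an estimate could be used, assuming a hypothetical DocMatcher interface (not the Matcher class from the patches): the clause with the lowest estimate leads the conjunction, and the other clauses are advanced with skipTo only to the candidates it proposes, so fewer candidates are generated overall.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical "Matcher": matching doc ids in increasing order, no score(),
// plus an estimate of the number of matches instead of an exact size().
interface DocMatcher {
  boolean next();             // advance to the next match; false when exhausted
  boolean skipTo(int target); // advance to the first match >= target; never moves backwards
  int doc();                  // current doc id, valid after next()/skipTo() returned true
  int estimatedMatches();     // rough cost estimate used to order a conjunction
}

class ArrayDocMatcher implements DocMatcher {
  private final int[] docs; // sorted, distinct doc ids
  private int pos = -1;

  ArrayDocMatcher(int... docs) {
    this.docs = docs;
  }

  public boolean next() {
    return ++pos < docs.length;
  }

  public boolean skipTo(int target) {
    if (pos < 0) {
      pos = 0;
    }
    while (pos < docs.length && docs[pos] < target) {
      pos++;
    }
    return pos < docs.length;
  }

  public int doc() {
    return docs[pos];
  }

  public int estimatedMatches() {
    return docs.length;
  }
}

class ConjunctionSketch {
  // Leapfrog intersection led by the clause with the lowest estimate.
  static List<Integer> intersect(List<DocMatcher> matchers) {
    List<DocMatcher> ordered = new ArrayList<>(matchers);
    ordered.sort(Comparator.comparingInt(DocMatcher::estimatedMatches));
    List<Integer> hits = new ArrayList<>();
    DocMatcher lead = ordered.get(0);
    if (!lead.next()) {
      return hits;
    }
    int candidate = lead.doc();
    while (true) {
      boolean allMatch = true;
      for (int i = 1; i < ordered.size(); i++) {
        DocMatcher m = ordered.get(i);
        if (!m.skipTo(candidate)) {
          return hits; // one clause is exhausted: no further hits possible
        }
        if (m.doc() > candidate) { // overshoot: move the lead forward and retry
          if (!lead.skipTo(m.doc())) {
            return hits;
          }
          candidate = lead.doc();
          allMatch = false;
          break;
        }
      }
      if (allMatch) {
        hits.add(candidate);
        if (!lead.next()) {
          return hits;
        }
        candidate = lead.doc();
      }
    }
  }

  public static void main(String[] args) {
    List<DocMatcher> clauses = new ArrayList<>();
    clauses.add(new ArrayDocMatcher(1, 3, 5, 7, 9, 11));
    clauses.add(new ArrayDocMatcher(3, 4, 5, 11, 12));
    clauses.add(new ArrayDocMatcher(5, 11));
    System.out.println(intersect(clauses)); // [5, 11]
  }
}
```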


> Decouple Filter from BitSet
> ---
>
> Key: LUCENE-584
> URL: https://issues.apache.org/jira/browse/LUCENE-584
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.0.1
>Reporter: Peter Schäfer
>Assignee: Michael Busch
>Priority: Minor
> Attachments: bench-diff.txt, bench-diff.txt, lucene-584-take2.patch, 
> lucene-584.patch, Matcher-20070905-2default.patch, 
> Matcher-20070905-3core.patch, Matcher-20071122-1ground.patch, Some 
> Matchers.zip
>
>
> {code}
> package org.apache.lucene.search;
> public abstract class Filter implements java.io.Serializable 
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> public interface AbstractBitSet 
> {
>   public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract 
> interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's 
> privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of 
> memory. It would be desirable to have an alternative BitSet implementation 
> with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was 
> obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation 
> could still delegate to =java.util.BitSet=.
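A sketch of the proposal outside any Lucene package: the AbstractBitSet interface as written above, a default implementation delegating to java.util.BitSet, and a hypothetical sparse implementation backed by a sorted int[]. The sparse form costs 32 bits per visible document instead of one bit per document in the index, so it only saves memory when fewer than roughly 1 in 32 documents are visible.

```java
import java.util.Arrays;
import java.util.BitSet;

// The interface proposed in the issue, declared here in the default package.
interface AbstractBitSet {
  boolean get(int index);
}

// Default implementation that simply delegates to java.util.BitSet.
class JavaUtilBitSetAdapter implements AbstractBitSet {
  private final BitSet bits;

  JavaUtilBitSetAdapter(BitSet bits) {
    this.bits = bits;
  }

  public boolean get(int index) {
    return bits.get(index);
  }
}

// Hypothetical sparse implementation: stores only the set (visible) doc ids.
class SortedIntBitSet implements AbstractBitSet {
  private final int[] sortedDocs;

  SortedIntBitSet(int... docs) {
    this.sortedDocs = docs.clone();
    Arrays.sort(this.sortedDocs);
  }

  // O(log n) membership test via binary search.
  public boolean get(int index) {
    return Arrays.binarySearch(sortedDocs, index) >= 0;
  }

  public static void main(String[] args) {
    AbstractBitSet visible = new SortedIntBitSet(42, 7, 1_000_000);
    System.out.println(visible.get(7) + " " + visible.get(8)); // true false
  }
}
```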


[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-12-03 Thread Mark Harwood (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547988 ]

Mark Harwood commented on LUCENE-584:
-

I'm getting lost as to which patches we're considering here. I was looking at 
the lucene-584-take2 patch.

MatcherProvider in the earlier patch does look like something that will help 
with caching.

>>Would those be a good starting point?

Overall I feel uncomfortable with a lot of the class names. I think the use of 
"Matcher" says more about what you want to do with the class in this particular 
case than about what _it_ does generally. I have other uses in mind for these 
classes that are outside of filtering search results. For me, these classes can 
be thought of much more simply as utility classes in the same mould as the Java 
Collections API. Fundamentally, they are efficient implementations of 
sets/lists of integers with support for iterators. The whole thing would be a 
lot cleaner if classes were named around this scheme.
"MatcherProvider", for example, is essentially a DocIdSet which creates forms of 
DocIdSetIterators (Matchers) and could also usefully have a size() method.
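The naming scheme suggested here can be sketched in a few lines. The interfaces below are hypothetical and only illustrate the shape described (a set of doc ids that hands out iterators and knows its size); they are not a statement about any API Lucene ships. Because iterator() returns a fresh iterator on each call, a single cached set can serve many searches, which is the caching property MatcherProvider was meant to provide.

```java
import java.util.Arrays;

// Hypothetical iterator over doc ids in increasing order.
interface DocIdSetIterator {
  boolean next(); // advance; returns false when exhausted
  int doc();      // current doc id, valid after next() returned true
}

// Hypothetical set of doc ids, following the suggested naming.
interface DocIdSet {
  DocIdSetIterator iterator(); // a fresh, independent iterator on each call
  int size();                  // number of doc ids in the set
}

// Sorted int[] implementation; assumes the ids are distinct.
class SortedIntDocIdSet implements DocIdSet {
  private final int[] docs;

  SortedIntDocIdSet(int... docs) {
    this.docs = docs.clone();
    Arrays.sort(this.docs);
  }

  public int size() {
    return docs.length;
  }

  public DocIdSetIterator iterator() {
    return new DocIdSetIterator() {
      private int pos = -1;

      public boolean next() {
        return ++pos < docs.length;
      }

      public int doc() {
        return docs[pos];
      }
    };
  }

  public static void main(String[] args) {
    DocIdSet set = new SortedIntDocIdSet(7, 3, 5);
    DocIdSetIterator it = set.iterator();
    while (it.next()) {
      System.out.print(it.doc() + " ");
    }
    System.out.println("(size " + set.size() + ")"); // prints: 3 5 7 (size 3)
  }
}
```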



> Decouple Filter from BitSet
> ---
>
> Key: LUCENE-584
> URL: https://issues.apache.org/jira/browse/LUCENE-584
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.0.1
>Reporter: Peter Schäfer
>Assignee: Michael Busch
>Priority: Minor
> Attachments: bench-diff.txt, bench-diff.txt, lucene-584-take2.patch, 
> lucene-584.patch, Matcher-20070905-2default.patch, 
> Matcher-20070905-3core.patch, Matcher-20071122-1ground.patch, Some 
> Matchers.zip
>
>
> {code}
> package org.apache.lucene.search;
> public abstract class Filter implements java.io.Serializable 
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> public interface AbstractBitSet 
> {
>   public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract 
> interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's 
> privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of 
> memory. It would be desirable to have an alternative BitSet implementation 
> with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was 
> obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation 
> could still delegate to =java.util.BitSet=.


[jira] Commented: (LUCENE-935) Improve maven artifacts

2007-12-03 Thread Grant Ingersoll (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547959 ]

Grant Ingersoll commented on LUCENE-935:


My cron job copies from the Hudson dir to p.a.o., whereas the Hudson script 
runs under a different account.

I realize this isn't great, but we asked infrastructure for a headless account for 
Hudson on p.a.o and it was denied.

I think for now, we can just leave it as is.

> Improve maven artifacts
> ---
>
> Key: LUCENE-935
> URL: https://issues.apache.org/jira/browse/LUCENE-935
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Attachments: lucene-935-new.patch, lucene-935-remote-repos.patch, 
> lucene-935-rename-poms.patch, lucene-935.patch
>
>
> There are a couple of things we can improve for the next release:
> - "*pom.xml" files should be renamed to "*pom.xml.template"
> - artifacts "lucene-parent" should extend "apache-parent"
> - add source jars as artifacts
> - update  task to work with latest version of 
> maven-ant-tasks.jar
> - metadata filenames should not contain "local"


[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-12-03 Thread Paul Elschot (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547958 ]

Paul Elschot commented on LUCENE-584:
-

Mark, in the latest Matcher-2default.patch there is the 
org.apache.lucene.MatcherProvider interface with this javadoc:

/** To be used in a cache to implement caching for a MatchFilter. */

This interface has only one method:

public Matcher getMatcher();


There is also a cache for filters in the class CachingWrapperFilter in 
Matcher3core.patch.

Would those be a good starting point?
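As a generic illustration of the caching side, here is a hypothetical per-reader cache, similar in spirit to what a caching filter wrapper does but not taken from either patch: the expensive doc-id set is computed once per reader key and reused. Keys are held in a WeakHashMap, so an entry can be reclaimed once its reader is no longer referenced elsewhere; that only works if the cached value does not itself hold a strong reference to the reader.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical per-reader cache; R stands in for an index reader type and
// S for whatever doc-id set a filter computes for that reader.
class PerReaderCache<R, S> {
  private final Map<R, S> cache = new WeakHashMap<>(); // weak keys: entries can go away with their reader
  private final Function<R, S> compute;
  private int computations = 0;

  PerReaderCache(Function<R, S> compute) {
    this.compute = compute;
  }

  // Returns the cached set for this reader, computing it on first use.
  synchronized S get(R reader) {
    S cached = cache.get(reader);
    if (cached == null) {
      cached = compute.apply(reader);
      cache.put(reader, cached);
      computations++;
    }
    return cached;
  }

  synchronized int computations() {
    return computations;
  }

  public static void main(String[] args) {
    PerReaderCache<Object, int[]> cache = new PerReaderCache<>(r -> new int[] {1, 4, 9});
    Object reader = new Object();
    cache.get(reader);
    cache.get(reader);
    System.out.println(cache.computations()); // 1
  }
}
```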


> Decouple Filter from BitSet
> ---
>
> Key: LUCENE-584
> URL: https://issues.apache.org/jira/browse/LUCENE-584
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.0.1
>Reporter: Peter Schäfer
>Assignee: Michael Busch
>Priority: Minor
> Attachments: bench-diff.txt, bench-diff.txt, lucene-584-take2.patch, 
> lucene-584.patch, Matcher-20070905-2default.patch, 
> Matcher-20070905-3core.patch, Matcher-20071122-1ground.patch, Some 
> Matchers.zip
>
>
> {code}
> package org.apache.lucene.search;
> public abstract class Filter implements java.io.Serializable 
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> public interface AbstractBitSet 
> {
>   public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract 
> interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's 
> privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of 
> memory. It would be desirable to have an alternative BitSet implementation 
> with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was 
> obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation 
> could still delegate to =java.util.BitSet=.


[jira] Commented: (LUCENE-935) Improve maven artifacts

2007-12-03 Thread Michael Busch (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547950 ]

Michael Busch commented on LUCENE-935:
--

generate-maven-artifacts still works fine if no m2.* properties are overridden.
It then deploys locally to /maven as before.

So if the account doesn't have access to p.a.o, how does it copy the artifacts
then? Do we currently use different accounts to run nightly.sh and the cron job?

> Improve maven artifacts
> ---
>
> Key: LUCENE-935
> URL: https://issues.apache.org/jira/browse/LUCENE-935
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Attachments: lucene-935-new.patch, lucene-935-remote-repos.patch, 
> lucene-935-rename-poms.patch, lucene-935.patch
>
>
> There are a couple of things we can improve for the next release:
> - "*pom.xml" files should be renamed to "*pom.xml.template"
> - artifacts "lucene-parent" should extend "apache-parent"
> - add source jars as artifacts
> - update  task to work with latest version of 
> maven-ant-tasks.jar
> - metadata filenames should not contain "local"


Re: setRAMBufferSizeMB and DEFAULT_RAM_BUFFER_SIZE_MB missing from IndexWriter !

2007-12-03 Thread Doug Cutting


Grant Ingersoll wrote:
> http://www.gossamer-threads.com/lists/lucene/java-dev/42616?search_string=javadocs%20nightly%20build
> contains the discussion on this from way back when.


In that discussion I said that "links to nightly builds should be 
confined to 'developer' portions of the site".  That's not yet the case 
today.


> I think we pretty clearly 
> mark the javadocs as being nightly build, but I suppose we could make 
> the Documentation->Javadoc link explicitly say it, something like 
> Javadoc Nightly Build.  


Links to unreleased software and documentation must be in a separate 
"developer" section of the website.  Describing these as "nightly" 
doesn't help; it just elevates "nightly" to an officially available 
release to the general public, which it must not be.


> Otherwise, we could replace the link there with 
> a page that provides links to the various released versions and the 
> nightly build.


Again, nightly artifacts must not be available from the same section of the 
website as official releases.  We should have a separate tab or menu 
section for developer resources.


Doug


Re: setRAMBufferSizeMB and DEFAULT_RAM_BUFFER_SIZE_MB missing from IndexWriter !

2007-12-03 Thread Grant Ingersoll



On Dec 3, 2007, at 1:31 PM, Doug Cutting wrote:


> Grant Ingersoll wrote:
>> Right, the javadocs are for the nightly build.   See the Site  
>> Versions section of http://lucene.apache.org/java/docs/index.html  
>> for releases.
>
>
> Unreleased stuff should only be linked to in a "developer" section  
> of the website.  Right now the primary javadoc links on the website  
> are to unreleased documentation.  We should fix this.
>
>
> Unreleased versions of software and documentation are not meant to  
> be published under the Apache license.  This permits us a review of  
> what's released, so that if something were to enter subversion that  
> should not be released, we have an opportunity to remove it.  Thus  
> we must maintain a clear line between what we expect the general  
> public to download and view under the license, and the draft  
> versions that we, the developer community, share amongst ourselves.   
> Publishing unreleased things in a way that could be perceived as an  
> official publication for end-users could weaken our control of what  
> is licensed.


http://www.gossamer-threads.com/lists/lucene/java-dev/42616?search_string=javadocs%20nightly%20build 
 contains the discussion on this from way back when.  I think we  
pretty clearly mark the javadocs as being nightly build, but I suppose  
we could make the Documentation->Javadoc link explicitly say it,  
something like Javadoc Nightly Build.  Otherwise, we could replace the  
link there with a page that provides links to the various released  
versions and the nightly build.


-Grant



Re: setRAMBufferSizeMB and DEFAULT_RAM_BUFFER_SIZE_MB missing from IndexWriter !

2007-12-03 Thread Doug Cutting


Grant Ingersoll wrote:
Right, the javadocs are for the nightly build.   See the Site Versions 
section of http://lucene.apache.org/java/docs/index.html for releases.


Unreleased stuff should only be linked to in a "developer" section of 
the website.  Right now the primary javadoc links on the website are to 
unreleased documentation.  We should fix this.


Unreleased versions of software and documentation are not meant to be 
published under the Apache license.  This permits us to review what's 
released, so that if something were to enter subversion that should not 
be released, we have an opportunity to remove it.  Thus we must maintain 
a clear line between what we expect the general public to download and 
view under the license, and the draft versions that we, the developer 
community, share amongst ourselves.  Publishing unreleased things in a 
way that could be perceived as an official publication for end-users 
could weaken our control of what is licensed.


Doug


[jira] Issue Comment Edited: (LUCENE-935) Improve maven artifacts

2007-12-03 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547913
 ] 

gsingers edited comment on LUCENE-935 at 12/3/07 9:38 AM:
-

OK, now we just need to figure out how best to incorporate it into nightly.sh.  
Part of the problem is Hudson is the account running nightly.sh and it doesn't 
have an account on p.a.o.  Thus, the need for Hudson to access a private key of 
one of us w/ a zones account.  I don't really like this idea.



  was (Author: gsingers):
OK, now we just need to figure out how best to incorporate it into 
nightly.sh.  Part of the problem is Hudson is the account running nightly.sh 
and it doesn't have an account on p.a.o.  Thus, the need for Hudson to access a 
private key of one of us w/ a zones account.  I don't really like this idea.

Does generate-maven-artifacts fail if the remote repo stuff is not specified?  
That is, can it still just generate the artifacts, but not publish them?  
Perhaps a better approach would be to separate out generation from publication. 
 This would allow us at release time to publish to the main repo, but still use 
the cron job for nightly.
  
> Improve maven artifacts
> ---
>
> Key: LUCENE-935
> URL: https://issues.apache.org/jira/browse/LUCENE-935
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Attachments: lucene-935-new.patch, lucene-935-remote-repos.patch, 
> lucene-935-rename-poms.patch, lucene-935.patch
>
>
> There are a couple of things we can improve for the next release:
> - "*pom.xml" files should be renamed to "*pom.xml.template"
> - artifacts "lucene-parent" should extend "apache-parent"
> - add source jars as artifacts
> - update  task to work with latest version of 
> maven-ant-tasks.jar
> - metadata filenames should not contain "local"


[jira] Commented: (LUCENE-935) Improve maven artifacts

2007-12-03 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547913
 ] 

Grant Ingersoll commented on LUCENE-935:


OK, now we just need to figure out how best to incorporate it into nightly.sh.  
Part of the problem is Hudson is the account running nightly.sh and it doesn't 
have an account on p.a.o.  Thus, the need for Hudson to access a private key of 
one of us w/ a zones account.  I don't really like this idea.

Does generate-maven-artifacts fail if the remote repo stuff is not specified?  
That is, can it still just generate the artifacts, but not publish them?  
Perhaps a better approach would be to separate out generation from publication. 
 This would allow us at release time to publish to the main repo, but still use 
the cron job for nightly.

> Improve maven artifacts
> ---
>
> Key: LUCENE-935
> URL: https://issues.apache.org/jira/browse/LUCENE-935
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Attachments: lucene-935-new.patch, lucene-935-remote-repos.patch, 
> lucene-935-rename-poms.patch, lucene-935.patch
>
>
> There are a couple of things we can improve for the next release:
> - "*pom.xml" files should be renamed to "*pom.xml.template"
> - artifacts "lucene-parent" should extend "apache-parent"
> - add source jars as artifacts
> - update  task to work with latest version of 
> maven-ant-tasks.jar
> - metadata filenames should not contain "local"


[jira] Updated: (LUCENE-1045) SortField.AUTO doesn't work with long

2007-12-03 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-1045:


Attachment: LUCENE-1045.patch

Drops ExtendedFieldCache, puts everything into FieldCache, adds support to 
SortField and FieldSortedHitQueue for sorting on bytes and longs.  Drops 
FieldCacheImpl as it doesn't really serve any purpose once you make FieldCache 
a class.

Note this breaks the back-compat contract on the FieldCache interface.


> SortField.AUTO doesn't work with long
> -
>
> Key: LUCENE-1045
> URL: https://issues.apache.org/jira/browse/LUCENE-1045
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.2
>Reporter: Daniel Naber
>Priority: Minor
> Fix For: 2.3
>
> Attachments: auto-long-sorting.diff, LUCENE-1045.patch, 
> LUCENE-1045.patch, TestDateSort.java
>
>
> This is actually the same as LUCENE-463 but I cannot find a way to re-open 
> that issue. I'm attaching a test case by dragon-fly999 at hotmail com that 
> shows the problem and a patch that seems to fix it.
> The problem is that a long (as used for dates) cannot be parsed as an 
> integer, and the next step is then to parse it as a float, which works but 
> which is not correct. With the patch the following parsers are used in this 
> order: int, long, float.
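A minimal sketch of the detection order described above, assuming AUTO picks the sort type by trying each parser on a term and keeping the first one that succeeds. The class and method names are hypothetical, not Lucene's FieldCache code. The second helper shows why the old float fallback is wrong for dates stored as longs: a float has only a 24-bit significand, so millisecond timestamps do not survive the round trip.

```java
/**
 * Hypothetical sketch of SortField.AUTO-style type detection using the
 * order described in LUCENE-1045: int, then long, then float.
 * Not Lucene's actual FieldCache code.
 */
final class AutoTypeDetector {
    enum Kind { INT, LONG, FLOAT, STRING }

    static Kind detect(String term) {
        try {
            Integer.parseInt(term);
            return Kind.INT;
        } catch (NumberFormatException notAnInt) {
            // out of int range or not integral: try the next parser
        }
        try {
            Long.parseLong(term);
            return Kind.LONG; // e.g. dates stored as milliseconds
        } catch (NumberFormatException notALong) {
            // not integral: try float
        }
        try {
            Float.parseFloat(term);
            return Kind.FLOAT;
        } catch (NumberFormatException notAFloat) {
            return Kind.STRING; // fall back to string sorting
        }
    }

    /** True if the value is unchanged after a round trip through float. */
    static boolean survivesFloatRoundTrip(long value) {
        return (long) (float) value == value;
    }
}
```

For example, `detect("1196640000000")` yields `LONG` in this sketch rather than `FLOAT`, and `survivesFloatRoundTrip(1196640000001L)` is false.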


[jira] Commented: (LUCENE-935) Improve maven artifacts

2007-12-03 Thread Michael Busch (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547909
 ] 

Michael Busch commented on LUCENE-935:
--

Oops, I forgot to remove the lines that generate the checksums for the
artifacts.  does this automatically. I committed the fix.

I tried it out on zones, the build succeeds now. 

> Improve maven artifacts
> ---
>
> Key: LUCENE-935
> URL: https://issues.apache.org/jira/browse/LUCENE-935
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Attachments: lucene-935-new.patch, lucene-935-remote-repos.patch, 
> lucene-935-rename-poms.patch, lucene-935.patch
>
>
> There are a couple of things we can improve for the next release:
> - "*pom.xml" files should be renamed to "*pom.xml.template"
> - artifacts "lucene-parent" should extend "apache-parent"
> - add source jars as artifacts
> - update  task to work with latest version of 
> maven-ant-tasks.jar
> - metadata filenames should not contain "local"


Re: Question about aborting background merges

2007-12-03 Thread Michael Busch

Michael McCandless wrote:
> 
> Well that certainly sounds spooky!  You mean this test is using
> one of IndexWriter's ctors that relies on IndexReader.indexExists() to
> decide whether to pass create=true?
> 

Not sure, but I'll try to find out today.

-Michael


[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-12-03 Thread Mark Harwood (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547901
 ] 

Mark Harwood commented on LUCENE-584:
-

To go back to post #1 on this topic:

   _"Sparsely populated =java.util.BitSet=s are not efficient and waste lots of 
memory. It would be desirable to have an alternative BitSet implementation with 
smaller memory footprint."_

Given the motivation to move to more memory-efficient structures, why is the 
only attempt at caching dedicated exclusively to caching the very structures we 
were trying to move away from?

   _"I deprecated also CachingWrapperFilter and RemoteCachingWrapperFilter 
and added corresponding CachingBitSetFilter and RemoteCachingBitSetFilter"_

Does this suggest we are to have type-specific CachingXFilters and 
RemoteCachingXFilters created for every new filter type? Why not provide a 
single caching mechanism that works for all those other, new, more 
memory-efficient structures? I believe the reason this hasn't been done is the 
issue I highlighted earlier - the cachable artefacts (what I chose to 
call "DocIdSet" here: [#action_12518642] ) are not modelled in a way which 
promotes re-use. That's why we would end up needing a specialised caching 
implementation for each type. 

If we are to move forward from the existing Lucene implementation it's 
important to note the change:

* Filters currently produce, at great cost, BitSets. BitSets provide both a 
cachable data structure and a thread-safe, reusable means of iterating across 
the contents.

* By replacing BitSets with Matchers this proposal has removed an important 
aspect of the existing design -  the visibility (and therefore cachability) of 
these expensive-to-recreate data structures. Matchers are single-use, 
non-threadsafe objects and hide the data structure over which they iterate. 
With this change, if I want to implement a caching mechanism in my application, I 
need to know the Filter type and what sort of data structure it returns, and get 
that structure from it directly:
  if (myFilter instanceof BitSetFilter)          wrap specific data structure using CachingBitSetFilter
  else if (myFilter instanceof OpenBitSetFilter) wrap specific data structure using CachingXFilter
  else ...

...looks like an anti-pattern to me. Worse, this ties the choice of 
data structure to the type of Filter that produces it. Why can't my RangeFilter 
be free to create a SortedVIntList or a BitSet depending on the sparseness of 
matches for a particular set of criteria?

I'm not saying "let's just stick with BitSets"; I'm saying we should consider 
caching more in the design. Post [#action_12518642] lays out how this could be 
modelled with the introduction of DocIdSet and DocIdSetIterator as separate 
responsibilities (whereas Matcher currently combines them both).
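A rough sketch of the separation proposed above, in which a cachable DocIdSet hands out single-use DocIdSetIterators, so that one caching wrapper works for any representation. Everything here is hypothetical: the interfaces only echo the names used in this comment, the index reader is replaced by an opaque Object key to keep the example self-contained, and the method names (`next()`, `doc()`, `getDocIdSet`) are assumptions, not an existing Lucene API.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical interfaces echoing the DocIdSet / DocIdSetIterator split
// proposed above; not an existing Lucene API.
interface DocIdSetIterator {
    /** Advances to the next matching doc id; false once exhausted. */
    boolean next();

    /** Current doc id; valid only after next() returned true. */
    int doc();
}

interface DocIdSet {
    /** Returns a fresh single-use iterator; the set itself is reusable. */
    DocIdSetIterator iterator();
}

/** Dense matches, backed by java.util.BitSet. */
final class BitSetDocIdSet implements DocIdSet {
    private final BitSet bits;

    BitSetDocIdSet(BitSet bits) { this.bits = bits; }

    public DocIdSetIterator iterator() {
        return new DocIdSetIterator() {
            private int current = -1;
            private boolean exhausted = false;

            public boolean next() {
                if (exhausted) return false;
                current = bits.nextSetBit(current + 1); // -1 when no more bits
                exhausted = current < 0;
                return !exhausted;
            }

            public int doc() { return current; }
        };
    }
}

/** Sparse matches, stored as a sorted array of doc ids. */
final class SortedIntDocIdSet implements DocIdSet {
    private final int[] docs;

    SortedIntDocIdSet(int[] sortedDocs) { this.docs = sortedDocs.clone(); }

    public DocIdSetIterator iterator() {
        return new DocIdSetIterator() {
            private int pos = -1;

            public boolean next() {
                if (pos < docs.length) pos++;
                return pos < docs.length;
            }

            public int doc() { return docs[pos]; }
        };
    }
}

/** Hypothetical filter; the index reader is an opaque key here. */
abstract class SketchFilter {
    abstract DocIdSet getDocIdSet(Object reader);
}

/** One caching wrapper for every filter, whatever DocIdSet it returns. */
final class CachingSketchFilter extends SketchFilter {
    private final SketchFilter inner;
    private final Map<Object, DocIdSet> cache = new WeakHashMap<Object, DocIdSet>();

    CachingSketchFilter(SketchFilter inner) { this.inner = inner; }

    synchronized DocIdSet getDocIdSet(Object reader) {
        DocIdSet cached = cache.get(reader);
        if (cached == null) {
            cached = inner.getDocIdSet(reader);
            cache.put(reader, cached);
        }
        return cached;
    }
}
```

The point of the design is that CachingSketchFilter never inspects which DocIdSet it caches, so a filter would be free to return a sparse array for rare matches and a BitSet for dense ones.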

Cheers
Mark

> Decouple Filter from BitSet
> ---
>
> Key: LUCENE-584
> URL: https://issues.apache.org/jira/browse/LUCENE-584
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.0.1
>Reporter: Peter Schäfer
>Assignee: Michael Busch
>Priority: Minor
> Attachments: bench-diff.txt, bench-diff.txt, lucene-584-take2.patch, 
> lucene-584.patch, Matcher-20070905-2default.patch, 
> Matcher-20070905-3core.patch, Matcher-20071122-1ground.patch, Some 
> Matchers.zip
>
>
> {code}
> package org.apache.lucene.search;
> public abstract class Filter implements java.io.Serializable 
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> public interface AbstractBitSet 
> {
>   public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract 
> interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's 
> privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of 
> memory. It would be desirable to have an alternative BitSet implementation 
> with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was 
> obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation 
> could still delegate to =java.util.BitSet=.


[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2007-12-03 Thread Paul Elschot (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547895
 ] 

Paul Elschot commented on LUCENE-584:
-

A few remarks on the lucene-584-take2 patch:

In the @deprecated javadoc at Filter.bits() a reference to BitSetFilter could 
be added.

As long as Filter.bits() is only deprecated (not yet removed), one could also use 
the BitSet in IndexSearcher in case this turns out to be performance-sensitive; 
see also my remark of 28 November.

A few complete (test) classes are deprecated; it might be good to add the target 
release for removal there.

For the rest this patch looks good to me. Did you also run ant test-contrib?

> Decouple Filter from BitSet
> ---
>
> Key: LUCENE-584
> URL: https://issues.apache.org/jira/browse/LUCENE-584
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.0.1
>Reporter: Peter Schäfer
>Assignee: Michael Busch
>Priority: Minor
> Attachments: bench-diff.txt, bench-diff.txt, lucene-584-take2.patch, 
> lucene-584.patch, Matcher-20070905-2default.patch, 
> Matcher-20070905-3core.patch, Matcher-20071122-1ground.patch, Some 
> Matchers.zip
>
>
> {code}
> package org.apache.lucene.search;
> public abstract class Filter implements java.io.Serializable 
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> public interface AbstractBitSet 
> {
>   public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract 
> interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's 
> privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of 
> memory. It would be desirable to have an alternative BitSet implementation 
> with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was 
> obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation 
> could still delegate to =java.util.BitSet=.


[jira] Updated: (LUCENE-1075) Possible thread hazard in IndexWriter.close(false)

2007-12-03 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1075:
---

Attachment: LUCENE-1075.patch

Attached patch.  I'll commit in 1 or 2 days.

> Possible thread hazard in IndexWriter.close(false)
> --
>
> Key: LUCENE-1075
> URL: https://issues.apache.org/jira/browse/LUCENE-1075
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.3
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.3
>
> Attachments: LUCENE-1075.patch
>
>
> Spinoff from this thread:
>   http://www.gossamer-threads.com/lists/lucene/java-dev/55391
> On reviewing the code I found one case where an aborted merge (from
> calling close(false)) could write to files that a newly opened
> IndexWriter would also try to write to.
> I strengthened an existing test case in TestConcurrentMergeScheduler
> to tickle this case, and also modified MockRAMDirectory to throw an
> IOException if ever a file besides segments.gen is overwritten.
> However, strangely, I can't get an unhandled exception to occur during
> the test and I'm not sure why.  Still I think this is a good defensive
> check so we should commit it.


[jira] Created: (LUCENE-1075) Possible thread hazard in IndexWriter.close(false)

2007-12-03 Thread Michael McCandless (JIRA)

Possible thread hazard in IndexWriter.close(false)
--

 Key: LUCENE-1075
 URL: https://issues.apache.org/jira/browse/LUCENE-1075
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.3
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.3


Spinoff from this thread:

  http://www.gossamer-threads.com/lists/lucene/java-dev/55391

On reviewing the code I found one case where an aborted merge (from
calling close(false)) could write to files that a newly opened
IndexWriter would also try to write to.

I strengthened an existing test case in TestConcurrentMergeScheduler
to tickle this case, and also modified MockRAMDirectory to throw an
IOException if ever a file besides segments.gen is overwritten.

However, strangely, I can't get an unhandled exception to occur during
the test and I'm not sure why.  Still I think this is a good defensive
check so we should commit it.
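A simplified sketch of the defensive check described above, assuming a directory that keeps its files in memory. The class name and the createFile/fileExists methods are hypothetical stand-ins, not MockRAMDirectory's actual API; only the rule itself (overwriting any file other than segments.gen is an error) comes from the issue.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory directory enforcing the LUCENE-1075 rule: writing a
 * file that already exists is an error, except for segments.gen, which is
 * legitimately rewritten. A simplified stand-in, not MockRAMDirectory's API.
 */
final class OverwriteCheckingDirectory {
    private static final String SEGMENTS_GEN = "segments.gen";
    private final Map<String, byte[]> files = new HashMap<String, byte[]>();

    synchronized void createFile(String name, byte[] contents) throws IOException {
        if (files.containsKey(name) && !SEGMENTS_GEN.equals(name)) {
            // Would suggest two writers (e.g. an aborted merge and a newly
            // opened IndexWriter) are using the same file name.
            throw new IOException("file \"" + name + "\" already exists; refusing to overwrite");
        }
        files.put(name, contents.clone());
    }

    synchronized boolean fileExists(String name) {
        return files.containsKey(name);
    }
}
```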



Re: [jira] Resolved: (LUCENE-935) Improve maven artifacts

2007-12-03 Thread Karl Wettin



On 3 Dec 2007, at 08:35, Michael Busch (JIRA) wrote:


we're deploying nightly snapshots
to the m2 snapshot repository.



Sorry if I've missed any discussion about it,

but the snapshots still seem to depend on Clover?


--
karl


[jira] Issue Comment Edited: (LUCENE-935) Improve maven artifacts

2007-12-03 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547799
 ] 

gsingers edited comment on LUCENE-935 at 12/3/07 3:49 AM:
-

{quote}
can you try out this patch
{quote}


BUILD FAILED
/lucene/java/lucene935/build.xml:459: Specify at least one source - a file 
or a resource collection.

This happened after lucene-xml-query-parser.  It seems like most everything 
went through up to that point and I don't see any other errors.

  was (Author: gsingers):
{quote}
can you try out this patch
{quote}


BUILD FAILED
/lucene/java/lucene935/build.xml:459: Specify at least one source - a file 
or a resource collection.


Also, note, you have Zones access
  
> Improve maven artifacts
> ---
>
> Key: LUCENE-935
> URL: https://issues.apache.org/jira/browse/LUCENE-935
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Attachments: lucene-935-new.patch, lucene-935-remote-repos.patch, 
> lucene-935-rename-poms.patch, lucene-935.patch
>
>
> There are a couple of things we can improve for the next release:
> - "*pom.xml" files should be renamed to "*pom.xml.template"
> - artifacts "lucene-parent" should extend "apache-parent"
> - add source jars as artifacts
> - update  task to work with latest version of 
> maven-ant-tasks.jar
> - metadata filenames should not contain "local"


[jira] Commented: (LUCENE-1045) SortField.AUTO doesn't work with long

2007-12-03 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547803
 ] 

Grant Ingersoll commented on LUCENE-1045:
-

{quote}
short support
{quote}
FieldCache already has short support, so no reason not to add it and bytes to 
SortField.

I will work up a patch for all of this.


> SortField.AUTO doesn't work with long
> -
>
> Key: LUCENE-1045
> URL: https://issues.apache.org/jira/browse/LUCENE-1045
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.2
>Reporter: Daniel Naber
>Priority: Minor
> Fix For: 2.3
>
> Attachments: auto-long-sorting.diff, LUCENE-1045.patch, 
> TestDateSort.java
>
>
> This is actually the same as LUCENE-463 but I cannot find a way to re-open 
> that issue. I'm attaching a test case by dragon-fly999 at hotmail com that 
> shows the problem and a patch that seems to fix it.
> The problem is that a long (as used for dates) cannot be parsed as an 
> integer, and the next step is then to parse it as a float, which works but 
> which is not correct. With the patch the following parsers are used in this 
> order: int, long, float.


[jira] Commented: (LUCENE-935) Improve maven artifacts

2007-12-03 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547799
 ] 

Grant Ingersoll commented on LUCENE-935:


{quote}
can you try out this patch
{quote}


BUILD FAILED
/lucene/java/lucene935/build.xml:459: Specify at least one source - a file 
or a resource collection.


Also, note, you have Zones access

> Improve maven artifacts
> ---
>
> Key: LUCENE-935
> URL: https://issues.apache.org/jira/browse/LUCENE-935
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Attachments: lucene-935-new.patch, lucene-935-remote-repos.patch, 
> lucene-935-rename-poms.patch, lucene-935.patch
>
>
> There are a couple of things we can improve for the next release:
> - "*pom.xml" files should be renamed to "*pom.xml.template"
> - artifacts "lucene-parent" should extend "apache-parent"
> - add source jars as artifacts
> - update  task to work with latest version of 
> maven-ant-tasks.jar
> - metadata filenames should not contain "local"


Re: Question about aborting background merges

2007-12-03 Thread Michael McCandless


"Michael Busch" <[EMAIL PROTECTED]> wrote:
> Michael McCandless wrote:
> 
> > Hmm ... looking at the code, I think we should also check whether a
> > merge was aborted in mergeInit.  It looks like there is a window from
> > when a merge is handed out until when it is inited such that if the
> > writer is closed in that window it could result in incorrect re-use of a
> > segment name.  Michael are you seeing such a case?
> > 
> 
> Thanks for the explanation! I'm not seeing this particular issue.
> However, a very weird thing happened in one of our stress tests. I
> believe in that test a script would push a bunch of docs into the
> IndexWriter, then close(false) the writer and open a new one immediately
> afterwards. This test ran for quite a while without any problems (up to
> 8M docs). Then the weird thing happened: suddenly the index was totally
> wiped out, meaning only the segments.gen and segments_1 were in the
> index directory. There was no IOException in the log.
> 
> I must say that I didn't write the stress test and haven't seen it yet
> either; I'm planning to take a look next week if I can. It looks like it
> might be an application bug, e.g. the application might open an IndexWriter
> with create=true, because the segments file has the initial generation 1.
> 
> I started looking into whether the constructors of IndexWriter that
> automatically try to determine whether the index has to be created might
> have a problem, maybe even in combination with aborted but still-running
> background merge threads. I think the question is whether
> SegmentInfos.getCurrentSegmentGeneration() could ever falsely return -1,
> but it doesn't look like this is possible.
> 
> I'm just really guessing here, it might not be a Lucene problem at all,
> but at this moment I can't rule this possibility out.

Well that certainly sounds spooky!  You mean this test is using
one of IndexWriter's ctors that relies on IndexReader.indexExists() to
decide whether to pass create=true?

Keep us posted!

Mike


Re: Question about aborting background merges

2007-12-03 Thread Michael Busch

Michael McCandless wrote:

> Hmm ... looking at the code, I think we should also check whether a
> merge was aborted in mergeInit.  It looks like there is a window from
> when a merge is handed out until when it is inited such that if the
> writer is closed in that window it could result in incorrect re-use of a
> segment name.  Michael are you seeing such a case?
> 

Thanks for the explanation! I'm not seeing this particular issue.
However, a very weird thing happened in one of our stress tests. I
believe in that test a script would push a bunch of docs into the
IndexWriter, then close(false) the writer and open a new one immediately
afterwards. This test ran for quite a while without any problems (up to
8M docs). Then the weird thing happened: suddenly the index was totally
wiped out, meaning only the segments.gen and segments_1 were in the
index directory. There was no IOException in the log.

I must say that I didn't write the stress test and haven't seen it yet
either; I'm planning to take a look next week if I can. It looks like it
might be an application bug, e.g. the application might open an IndexWriter
with create=true, because the segments file has the initial generation 1.

I started looking into whether the constructors of IndexWriter that
automatically try to determine whether the index has to be created might
have a problem, maybe even in combination with aborted but still-running
background merge threads. I think the question is whether
SegmentInfos.getCurrentSegmentGeneration() could ever falsely return -1,
but it doesn't look like this is possible.
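To make the question concrete, here is a hypothetical sketch of how a "create if missing" constructor might derive create=true from a directory listing. It assumes segments_N names carry the generation in base 36, that a bare "segments" file means generation 0, and that the highest generation wins, with -1 meaning no index. It is a simplification, not SegmentInfos.getCurrentSegmentGeneration() itself. A spurious -1 from such a check would be consistent with the symptom described: an existing index replaced by a fresh one at the initial generation.

```java
/**
 * Hypothetical sketch: find the highest segments_N generation in a directory
 * listing, or -1 if there is none. Assumes N is written in base 36 and that a
 * bare "segments" file is generation 0. Simplified; not Lucene's code.
 */
final class SegmentGenerations {
    private static final String PREFIX = "segments";

    static long currentGeneration(String[] files) {
        long max = -1;
        for (String file : files) {
            if (file.equals(PREFIX)) {
                max = Math.max(max, 0L);
            } else if (file.startsWith(PREFIX + "_")) {
                String suffix = file.substring(PREFIX.length() + 1);
                try {
                    max = Math.max(max, Long.parseLong(suffix, Character.MAX_RADIX));
                } catch (NumberFormatException ignored) {
                    // not a segments_N file; skip it
                }
            }
            // anything else (including segments.gen) is ignored
        }
        return max;
    }

    /** A falsely returned -1 here would make the writer use create=true. */
    static boolean shouldCreate(String[] files) {
        return currentGeneration(files) == -1;
    }
}
```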

I'm just really guessing here, it might not be a Lucene problem at all,
but at this moment I can't rule this possibility out.

-Michael


[jira] Resolved: (LUCENE-1072) NullPointerException during indexing in DocumentsWriter$ThreadState$FieldData.addPosition

2007-12-03 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1072.


   Resolution: Fixed
Fix Version/s: 2.3

I just committed this.  Thanks for reporting it Alexei!

> NullPointerException during indexing in 
> DocumentsWriter$ThreadState$FieldData.addPosition
> -
>
> Key: LUCENE-1072
> URL: https://issues.apache.org/jira/browse/LUCENE-1072
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.3
> Environment: Linux CentOS 5 x86_64 running on 2-core Pentium D, Java 
> HotSpot(TM) 64-Bit Server VM (build 1.6.0_01-b06, mixed mode), using 
> lucene-core-2007-11-29_02-49-31
>Reporter: Alexei Dets
>Assignee: Michael McCandless
> Fix For: 2.3
>
> Attachments: LUCENE-1072.patch
>
>
> In my case, documents with unusually large "words" (text-encoded images, in 
> fact) sometimes appear during indexing.
> An attempt to add a document that contains a field with such a token 
> produces java.lang.IllegalArgumentException:
> java.lang.IllegalArgumentException: term length 37944 exceeds max term length 16383
> at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.addPosition(DocumentsWriter.java:1492)
> at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1321)
> at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1247)
> at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:972)
> at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2202)
> at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2186)
> at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1432)
> at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1411)
> This is expected; the exception is caught and ignored. The problem is that 
> after this the IndexWriter becomes somewhat corrupted, and subsequent attempts 
> to add documents to the index fail as well, this time with an NPE:
> java.lang.NullPointerException
> at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.addPosition(DocumentsWriter.java:1497)
> at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1321)
> at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1247)
> at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:972)
> at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2202)
> at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2186)
> at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1432)
> at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1411)
> This is 100% reproducible.
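One application-side workaround, sketched under assumptions: drop tokens longer than the limit quoted in the exception above (16383, assumed here to count chars) before building the document, so a text-encoded image never reaches addDocument. The class and method names are hypothetical, and this is not the fix that was committed for this issue.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical application-side guard: keep only tokens within the term
 * length limit reported by the indexer (assumed to be measured in chars).
 */
final class TermLengthGuard {
    static final int MAX_TERM_LENGTH = 16383; // from the exception message

    static List<String> dropOverlongTokens(List<String> tokens) {
        List<String> kept = new ArrayList<String>(tokens.size());
        for (String token : tokens) {
            if (token.length() <= MAX_TERM_LENGTH) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```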


Re: Question about aborting background merges

2007-12-03 Thread Michael McCandless

"Michael Busch" <[EMAIL PROTECTED]> wrote:

> I have a question about IndexWriter.close(false) and background
> merges. I was going to take a look at the code, but I'm sure that Mike
> knows the answer :-). Let's assume that a long background merge is
> going on and close(false) is called. Then the merges are marked as
> aborted and IndexWriter.close() returns after flushing
> DocumentsWriter's buffers. The background merge threads keep going.
> Now a new IndexWriter is opened and optimize() is called. Can it
> happen that optimize() tries to create a segment with the same name
> the background threads are still working on? Then the new IndexWriter
> would probably hit an IOException? Or would the new IndexWriter use
> different file names for the merged segments?

We should be fine here.  When a merge kicks off, it gets a segment name
by calling newSegmentName().  That method gives the merge the next
segment name, and marks commitPending, for exactly this reason (actually
there's a comment in that method explaining this).

When you then call close(false), the segments_N that's flushed records the
fact that this name is "in use", so a writer you next open on the index
will not re-assign that name.
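To make the mechanism concrete, here is a minimal sketch in Java. It is a hypothetical, single-threaded model, not Lucene's IndexWriter: the classes `SegmentNameModel`, `CommitPoint` and `Writer`, the `markCommitPending` switch, and the `"_"` + base-36 naming scheme are assumptions for illustration. It shows that a handed-out name survives close(false) only because handing it out sets commitPending, which forces the bumped counter into the new segments_N.

```java
// Hypothetical, simplified model of segment naming; this is NOT Lucene's
// IndexWriter, only an illustration of the counter + commitPending idea.
public class SegmentNameModel {

    // Stand-in for the segment counter recorded in segments_N.
    static final class CommitPoint {
        final int counter;
        CommitPoint(int counter) { this.counter = counter; }
    }

    static final class Writer {
        private final boolean markCommitPending; // false reproduces the old bug
        private CommitPoint lastCommit;
        private int counter;            // next segment number to hand out
        private boolean commitPending;  // unsaved changes exist

        Writer(CommitPoint lastCommit, boolean markCommitPending) {
            this.lastCommit = lastCommit;
            this.counter = lastCommit.counter;
            this.markCommitPending = markCommitPending;
        }

        // Hands out the next name; with the fix, this also marks the writer
        // dirty so that close() is forced to write the bumped counter.
        String newSegmentName() {
            if (markCommitPending) {
                commitPending = true;
            }
            // Assumed naming scheme: "_" + counter in base 36.
            return "_" + Integer.toString(counter++, Character.MAX_RADIX);
        }

        // Models close(false): running merges are abandoned, and a new
        // commit point is written only if something is pending.
        CommitPoint close() {
            if (commitPending) {
                lastCommit = new CommitPoint(counter);
                commitPending = false;
            }
            return lastCommit;
        }
    }

    // Returns {name taken by the background merge, name taken by the next writer}.
    static String[] run(boolean markCommitPending) {
        Writer first = new Writer(new CommitPoint(0), markCommitPending);
        String mergeName = first.newSegmentName();     // the background merge's name
        CommitPoint saved = first.close();             // close(false)
        Writer second = new Writer(saved, markCommitPending);
        String optimizeName = second.newSegmentName(); // e.g. from optimize()
        return new String[] { mergeName, optimizeName };
    }

    public static void main(String[] args) {
        String[] fixed = run(true);
        String[] buggy = run(false);
        System.out.println("with commitPending:    " + fixed[0] + " vs " + fixed[1]);
        System.out.println("without commitPending: " + buggy[0] + " vs " + buggy[1]);
    }
}
```

With the flag turned off, close(false) writes no new commit point, so the second writer hands out the same name again; that is the shape of the earlier bug.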

This was in fact a bug at one point (failing to mark commitPending when
handing out a new segment name), which one of the unit tests exposed.

Hmm ... looking at the code, I think mergeInit should also check whether
the merge was aborted.  It looks like there is a window from when a merge
is handed out until when it is inited, such that if the writer is closed in
that window it could result in incorrect re-use of a segment name.
Michael, are you seeing such a case?
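The window can be replayed in a similar hypothetical model (again not Lucene code; `MergeInitWindow`, `Merge`, `nameReused` and the `checkAborted` parameter of `mergeInit` are made up for illustration, and the `"_"` + base-36 naming is assumed). In the real writer the merge threads run concurrently with close(false); this sketch just performs the problematic interleaving in order and shows how an aborted-check in mergeInit keeps an aborted merge from taking a name that the final commit never records.

```java
// Hypothetical model of the window between handing out a merge and
// initializing it; not Lucene code.
public class MergeInitWindow {

    static final class Merge {
        volatile boolean aborted;   // set by close(false)
        String segmentName;         // assigned lazily in mergeInit
    }

    static final class Writer {
        private int counter;            // next segment number to hand out
        private int committedCounter;   // what segments_N would record
        private boolean commitPending;

        Writer(int committedCounter) {
            this.counter = committedCounter;
            this.committedCounter = committedCounter;
        }

        synchronized String newSegmentName() {
            commitPending = true;
            return "_" + Integer.toString(counter++, Character.MAX_RADIX);
        }

        // close(false): abort outstanding merges, then commit the counter.
        synchronized int close(Merge... outstanding) {
            for (Merge m : outstanding) {
                m.aborted = true;
            }
            if (commitPending) {
                committedCounter = counter;
                commitPending = false;
            }
            return committedCounter;
        }

        // With checkAborted, an aborted merge never takes a name, since any
        // counter bump after close() would not reach segments_N.
        synchronized boolean mergeInit(Merge merge, boolean checkAborted) {
            if (checkAborted && merge.aborted) {
                return false;
            }
            merge.segmentName = newSegmentName();
            return true;
        }
    }

    // Merge handed out, writer closed inside the window, then mergeInit runs.
    // Returns true if the next writer would re-use the merge's segment name.
    static boolean nameReused(boolean checkAborted) {
        Writer first = new Writer(0);
        Merge merge = new Merge();                // handed out, not yet inited
        int saved = first.close(merge);           // close(false) in the window
        boolean inited = first.mergeInit(merge, checkAborted);
        String next = new Writer(saved).newSegmentName();
        return inited && next.equals(merge.segmentName);
    }

    public static void main(String[] args) {
        System.out.println("without check, name reused: " + nameReused(false));
        System.out.println("with check, name reused:    " + nameReused(true));
    }
}
```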

Mike


[jira] Commented: (LUCENE-1045) SortField.AUTO doesn't work with long

2007-12-03 Thread Otis Gospodnetic (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547754
 ] 

Otis Gospodnetic commented on LUCENE-1045:
--

Grant, any chance of you throwing in short support in there?


> SortField.AUTO doesn't work with long
> -
>
> Key: LUCENE-1045
> URL: https://issues.apache.org/jira/browse/LUCENE-1045
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.2
>Reporter: Daniel Naber
>Priority: Minor
> Fix For: 2.3
>
> Attachments: auto-long-sorting.diff, LUCENE-1045.patch, 
> TestDateSort.java
>
>
> This is actually the same as LUCENE-463 but I cannot find a way to re-open 
> that issue. I'm attaching a test case by dragon-fly999 at hotmail com that 
> shows the problem and a patch that seems to fix it.
> The problem is that a long (as used for dates) cannot be parsed as an 
> integer, and the next step is then to parse it as a float, which succeeds but 
> loses precision, so the sort order is not correct. With the patch the following 
> parsers are used in this order: int, long, float.
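A minimal sketch of the detection order described above, assuming the term text is tried against one parser at a time; `AutoSortGuess`, its `Guess` enum and the `STRING` fallback are hypothetical, and this is not the code from the attached patch. The second printed line illustrates why float is the wrong fallback for millisecond dates: two timestamps one millisecond apart parse to the same float.

```java
// Hypothetical sketch of the detection order described in LUCENE-1045
// (int, then long, then float); not the actual Lucene source. The STRING
// fallback at the end is an assumption added for completeness.
public class AutoSortGuess {

    enum Guess { INT, LONG, FLOAT, STRING }

    static Guess guess(String termText) {
        try {
            Integer.parseInt(termText);
            return Guess.INT;
        } catch (NumberFormatException e) {
            // too large or not an integer: fall through
        }
        try {
            Long.parseLong(termText);
            return Guess.LONG;   // the step the patch adds before float
        } catch (NumberFormatException e) {
            // not a long either: fall through
        }
        try {
            Float.parseFloat(termText);
            return Guess.FLOAT;
        } catch (NumberFormatException e) {
            return Guess.STRING;
        }
    }

    public static void main(String[] args) {
        String t1 = "1196640000000";   // a date as epoch milliseconds
        String t2 = "1196640000001";   // one millisecond later
        System.out.println(guess("42") + " " + guess(t1) + " " + guess("3.5"));
        // Without the long step both timestamps become floats and collapse
        // to the same value, so their relative order is lost.
        System.out.println(Float.parseFloat(t1) == Float.parseFloat(t2));
    }
}
```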

