[jira] [Created] (LUCENE-3983) HTMLCharacterEntities.jflex uses String.toUpperCase without Locale

2012-04-14 Thread Uwe Schindler (Created) (JIRA)
HTMLCharacterEntities.jflex uses String.toUpperCase without Locale
--

 Key: LUCENE-3983
 URL: https://issues.apache.org/jira/browse/LUCENE-3983
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Steven Rowe


Is this expected?

{code:java}
  "xi", "\u03BE", "yacute", "\u00FD", "yen", "\u00A5", "yuml", "\u00FF",
  "zeta", "\u03B6", "zwj", "\u200D", "zwnj", "\u200C"
};
for (int i = 0 ; i < entities.length ; i += 2) {
  Character value = entities[i + 1].charAt(0);
  entityValues.put(entities[i], value);
  if (upperCaseVariantsAccepted.contains(entities[i])) {
    entityValues.put(entities[i].toUpperCase(), value);
  }
}
{code}

In my opinion, this should look like:

{code:java}
  "xi", "\u03BE", "yacute", "\u00FD", "yen", "\u00A5", "yuml", "\u00FF",
  "zeta", "\u03B6", "zwj", "\u200D", "zwnj", "\u200C"
};
for (int i = 0 ; i < entities.length ; i += 2) {
  Character value = entities[i + 1].charAt(0);
  entityValues.put(entities[i], value);
  if (upperCaseVariantsAccepted.contains(entities[i])) {
    entityValues.put(entities[i].toUpperCase(Locale.ENGLISH), value);
  }
}
{code}

(otherwise in the Turkish locale, entities containing "i" (like "xi" -> 
'\u03BE') will not be detected correctly, because "i" is upper-cased to the 
dotted capital "İ" (U+0130) there, so "XI" is never registered).
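
The locale dependence can be reproduced with a few lines. This is only a minimal sketch: the class name TurkishUpperCaseDemo and its helper method are made up for illustration and are not part of the Lucene sources.

```java
import java.util.Locale;

public class TurkishUpperCaseDemo {
  /** Upper-cases s with an explicit locale instead of the JVM default. */
  public static String upper(String s, Locale locale) {
    return s.toUpperCase(locale);
  }

  public static void main(String[] args) {
    Locale turkish = Locale.forLanguageTag("tr-TR");
    // In Turkish, 'i' maps to U+0130 (capital I with dot above), not to 'I'.
    System.out.println(upper("xi", turkish).equals("X\u0130"));
    // With a fixed locale the result no longer depends on the default locale.
    System.out.println(upper("xi", Locale.ENGLISH).equals("XI"));
  }
}
```

Passing the locale explicitly keeps the generated map keys identical on every machine, whatever the default locale is.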

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3962) Fix incorrect/missing CHANGES.txt entries

2012-04-06 Thread Uwe Schindler (Created) (JIRA)
Fix incorrect/missing CHANGES.txt entries
-

 Key: LUCENE-3962
 URL: https://issues.apache.org/jira/browse/LUCENE-3962
 Project: Lucene - Java
  Issue Type: Bug
  Components: general/build
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 3.6, 4.0


While reviewing the release artifacts I found several issues with the 
CHANGES.txt files in Lucene and Solr. Attached is an easy patch:

- we no longer JARJAR commons-csv
- Apache Ivy changes were missing in both CHANGES files
- The restructuring of the build system by Steven was not mentioned by Solr. 
This is important as it affects people working with the Solr source code.




[jira] [Created] (LUCENE-3949) Fix license headers in all Java files to not be in Javadocs /** format

2012-04-03 Thread Uwe Schindler (Created) (JIRA)
Fix license headers in all Java files to not be in Javadocs /** format
--

 Key: LUCENE-3949
 URL: https://issues.apache.org/jira/browse/LUCENE-3949
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
 Fix For: 4.0


Our current license headers in all .java files are (for a reason I don't know) 
in Javadoc format. This means that when a class has no Javadocs, the license 
header is used as its Javadocs.

I reviewed lots of other Apache projects; most of them use the correct /* 
header, but some (including Lucene+Solr) use the Javadoc one. We should change 
this.
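
A rough sketch of how affected files could be detected. The class LicenseHeaderCheck and its method are hypothetical helpers, not an existing build task, and the header text in main is shortened placeholder text.

```java
public class LicenseHeaderCheck {
  /**
   * Returns true if the source text starts with a Javadoc comment,
   * which is the header style this issue proposes to replace.
   */
  public static boolean startsWithJavadoc(String source) {
    String s = source.trim();
    // An empty block comment also begins with slash-star-star, so exclude it.
    return s.startsWith("/**") && !s.startsWith("/**/");
  }

  public static void main(String[] args) {
    String javadocStyle = "/**\n * Licensed to the ASF ...\n */\npackage demo;";
    String blockStyle = "/*\n * Licensed to the ASF ...\n */\npackage demo;";
    System.out.println(startsWithJavadoc(javadocStyle));
    System.out.println(startsWithJavadoc(blockStyle));
  }
}
```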




[jira] [Created] (LUCENE-3937) Workaround the XERCES-J bug in Benchmark

2012-03-29 Thread Uwe Schindler (Created) (JIRA)
Workaround the XERCES-J bug in Benchmark


 Key: LUCENE-3937
 URL: https://issues.apache.org/jira/browse/LUCENE-3937
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Uwe Schindler


In benchmark we have a patched version of XERCES which is hard to compile from 
source. Looking at the patched code and the source of EnwikiContentSource, a 
simple workaround is to provide the XML parser with a Reader instead of an 
InputStream, so the broken code is not triggered. This assumes that the 
XML file is always UTF-8. If not, it will no longer work (because the XML 
parser cannot switch encoding if it only has a Reader).
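
The idea can be sketched with the JDK's SAX API alone. The class ReaderXmlParseDemo and its counting handler are illustrative assumptions and are not taken from EnwikiContentSource.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ReaderXmlParseDemo {
  /** Counts start tags; the bytes are always decoded as UTF-8 before parsing. */
  public static int countElements(byte[] utf8Xml) {
    final int[] count = {0};
    DefaultHandler handler = new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName, String qName, Attributes atts) {
        count[0]++;
      }
    };
    try {
      // The parser receives characters, so it never does its own byte decoding.
      Reader reader = new InputStreamReader(new ByteArrayInputStream(utf8Xml), StandardCharsets.UTF_8);
      SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(reader), handler);
    } catch (Exception e) {
      throw new IllegalStateException("XML could not be parsed", e);
    }
    return count[0];
  }

  public static void main(String[] args) {
    byte[] xml = "<page><title>\u00e4</title></page>".getBytes(StandardCharsets.UTF_8);
    System.out.println(countElements(xml));
  }
}
```

The trade-off is the one named above: the charset is fixed by the caller, so a file in another encoding would be decoded incorrectly.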




[jira] [Created] (LUCENE-3926) Improve Javadocs of RAMDirectory to document its limitations

2012-03-26 Thread Uwe Schindler (Created) (JIRA)
Improve Javadocs of RAMDirectory to document its limitations


 Key: LUCENE-3926
 URL: https://issues.apache.org/jira/browse/LUCENE-3926
 Project: Lucene - Java
  Issue Type: Sub-task
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0
 Attachments: LUCENE-3659.patch

Spinoff from several dev@lao issues:
- [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
- issue LUCENE-3653

The use cases for RAMDirectory are very limited and to prevent users from using 
it for e.g. loading a 50 Gigabyte index from a file on disk, we should improve 
the javadocs.




[jira] [Created] (LUCENE-3924) Optimize buffer size handling in RAMDirectory to make it more GC friendly

2012-03-26 Thread Uwe Schindler (Created) (JIRA)
Optimize buffer size handling in RAMDirectory to make it more GC friendly
-

 Key: LUCENE-3924
 URL: https://issues.apache.org/jira/browse/LUCENE-3924
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


RAMDirectory currently uses a fixed buffer size of 1024 bytes to allocate 
memory. This is very wasteful for large indexes. Improvements may be:
- per file buffer sizes based on IOContext and maximum segment size
- allocate only one buffer for files that are copied from another directory
- dynamically increase buffer size when files grow (makes seek() complicated)
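
To get a feeling for the numbers, here is a back-of-the-envelope sketch. BufferCountDemo is a made-up helper, and it assumes that every buffer is a separate byte[] allocation.

```java
public class BufferCountDemo {
  /** Number of fixed-size buffers needed to hold fileLength bytes (ceiling division). */
  public static long buffersNeeded(long fileLength, int bufferSize) {
    return (fileLength + bufferSize - 1) / bufferSize;
  }

  public static void main(String[] args) {
    long oneGiB = 1L << 30;
    // Assumption: each 1024-byte buffer is its own byte[] that the GC has to track.
    System.out.println(buffersNeeded(oneGiB, 1024));
  }
}
```

Under that assumption a single 1 GiB file is spread over more than a million small arrays, which is the overhead the improvements listed above try to reduce.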




[jira] [Created] (LUCENE-3886) MemoryIndex memory estimation in toString inconsistent with getMemorySize()

2012-03-19 Thread Uwe Schindler (Created) (JIRA)
MemoryIndex memory estimation in toString inconsistent with getMemorySize()
---

 Key: LUCENE-3886
 URL: https://issues.apache.org/jira/browse/LUCENE-3886
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Uwe Schindler


After LUCENE-3867 was committed, there are some more minor problems with 
MemoryIndex's estimates. This patch will fix those and also add verbose test 
output of RAM needed for MemoryIndex vs. RAMDirectory.

Interestingly, the RAMDirectory always takes (according to estimates, so even 
with buffer overheads) only 2/3 of the MemoryIndex (excluding IndexReaders).




[jira] [Created] (LUCENE-3866) Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List

2012-03-13 Thread Uwe Schindler (Created) (JIRA)
Make CompositeReader.getSequentialSubReaders() and the corresponding 
IndexReaderContext methods return unmodifiable List
--

 Key: LUCENE-3866
 URL: https://issues.apache.org/jira/browse/LUCENE-3866
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


Since Lucene 2.9 we have the method getSequentialSubReaders() returning 
IndexReader[]. Based on hard-to-debug errors in users' code, Robert and I 
realized that returning an array from a public API is an anti-pattern. If the 
array is intended to be modifiable (like byte[] in terms,...), it is fine to 
use arrays in public APIs, but not if the array must be protected from 
modification. As IndexReaders are 100% unmodifiable in trunk code (no 
deletions,...), the only possibility to corrupt the reader is by modifying the 
array returned by getSequentialSubReaders(). We should prevent this.

The same theoretically applies to FieldCache, too - but the party that is 
afraid of performance problems is too big to fight against that :(

For getSequentialSubReaders there is no performance problem at all. The binary 
search of reader-ids inside BaseCompositeReader can still be done with the 
internal protected array, but public APIs should expose only an unmodifiable 
List. The same applies to leaves() and children() in IndexReaderContext. This 
change to List would also allow making CompositeReader and 
CompositeReaderContext Iterable, so some loops would 
look nicer.
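
A minimal sketch of the pattern using plain JDK collections. The class UnmodifiableViewDemo and the use of String as element type are assumptions for illustration. This is not the BaseCompositeReader code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class UnmodifiableViewDemo {
  // Internal array, still usable for fast index-based lookups inside the class.
  private final String[] subReaders;
  // Read-only view handed out to callers; created once, no copying per call.
  private final List<String> subReadersView;

  public UnmodifiableViewDemo(String... subReaders) {
    this.subReaders = subReaders.clone();
    this.subReadersView = Collections.unmodifiableList(Arrays.asList(this.subReaders));
  }

  public List<String> getSequentialSubReaders() {
    return subReadersView;
  }

  /** Returns true if an attempt to replace an element is rejected. */
  public static boolean rejectsModification(List<String> list) {
    try {
      list.set(0, "tampered");
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    UnmodifiableViewDemo r = new UnmodifiableViewDemo("seg0", "seg1");
    for (String sub : r.getSequentialSubReaders()) {
      System.out.println(sub);
    }
    System.out.println(rejectsModification(r.getSequentialSubReaders()));
  }
}
```

The view wraps the existing array instead of copying it, so handing it out costs nothing per call.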




[jira] [Created] (SOLR-3213) Upgrade to commons-csv once it is released

2012-03-07 Thread Uwe Schindler (Created) (JIRA)
Upgrade to commons-csv once it is released
--

 Key: SOLR-3213
 URL: https://issues.apache.org/jira/browse/SOLR-3213
 Project: Solr
  Issue Type: Task
  Components: Build
Reporter: Uwe Schindler


Since SOLR-3159 we have a jarjar'ed apache-solr-commons-csv-SNAPSHOT.jar file 
in lib folder. Once version 1.0 of commons-csv is officially released, we 
should upgrade that to this version, remove maven publishing and change the 
import statements to the official package name in java files.




[jira] [Created] (LUCENE-3852) Rename BaseMultiReader class to BaseCompositeReader and make public

2012-03-06 Thread Uwe Schindler (Created) (JIRA)
Rename BaseMultiReader class to BaseCompositeReader and make public
---

 Key: LUCENE-3852
 URL: https://issues.apache.org/jira/browse/LUCENE-3852
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


Currently the abstract DirectoryReader, MultiReader, and 
ParallelCompositeReader extend a package-private class. Users who want to 
implement a composite reader should be able to subclass this pkg-private 
class, as it implements lots of abstract methods that are useful for custom 
implementations. In fact, MultiReader is a shallow subclass that only 
implements correct closing&refCounting.

By making it public after the rename, the generics problems (type parameter R 
is not correctly displayed) in the JavaDocs are solved, too.




[jira] [Created] (LUCENE-3850) Fix rawtypes warnings for Java 7 compiler

2012-03-05 Thread Uwe Schindler (Created) (JIRA)
Fix rawtypes warnings for Java 7 compiler
-

 Key: LUCENE-3850
 URL: https://issues.apache.org/jira/browse/LUCENE-3850
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0


Java 7 changed the warnings a little bit:
- Java 6 only knew the "unchecked" warning type, applying to all types of 
generics violations, like missing generics (raw types)
- Java 7 still knows "unchecked" but only emits the warning if the call is 
really unchecked. Declaring variables/fields or constructing instances without 
a type param now emits a "rawtypes" warning.

The changes above cause the Java 7 compiler to emit lots of "rawtypes" 
warnings where Java 6 is silent. The easy fix is to suppress both warning 
types: @SuppressWarnings({"unchecked","rawtypes"}) for all those places. 
Changes are easy to do, will provide patch later!
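
A small sketch of what such a place looks like. RawTypesDemo is an invented example class and not code from the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class RawTypesDemo {
  /** Deliberately uses a raw List to show where each warning type originates. */
  @SuppressWarnings({"unchecked", "rawtypes"})
  public static int addToRawList(Object value) {
    List raw = new ArrayList(); // raw declaration and construction: "rawtypes" on Java 7
    raw.add(value);             // call through a raw type: "unchecked"
    return raw.size();
  }

  public static void main(String[] args) {
    System.out.println(addToRawList("x"));
  }
}
```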




[jira] [Created] (LUCENE-3844) Deprecate Token class and remove in trunk

2012-03-03 Thread Uwe Schindler (Created) (JIRA)
Deprecate Token class and remove in trunk
-

 Key: LUCENE-3844
 URL: https://issues.apache.org/jira/browse/LUCENE-3844
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler


After issues like LUCENE-3843, introducing new attributes, we should remove 
the Token class in trunk, as it leads to code that ignores those new attributes 
(like PositionLengthAttribute, ScriptAttribute, KeywordAttribute,...). If you 
want a holder for all Attributes of a TokenStream, use TS.cloneAttributes().




[jira] [Created] (LUCENE-3823) Add a field-filtering FilterAtomicReader to 4.0 so ParallelReaders can be better tested (in LTC.maybeWrapReader)

2012-02-26 Thread Uwe Schindler (Created) (JIRA)
Add a field-filtering FilterAtomicReader to 4.0 so ParallelReaders can be 
better tested (in LTC.maybeWrapReader)


 Key: LUCENE-3823
 URL: https://issues.apache.org/jira/browse/LUCENE-3823
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index, general/test
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


In addition to the filters in contrib/misc for horizontally filtering (by 
doc-id) AtomicReader, it would be good to have the same vertically (by field). 
For now I will add this implementation to test-framework, as it cannot stay in 
contrib/misc, because LTC will need it for maybeWrapReader.

LTC will use this FilterAtomicReader to construct a ParallelAtomicReader out of 
two (or maybe more) FieldFilterAtomicReaders.




[jira] [Created] (LUCENE-3822) Inner classes of FilterAtomicReader (trunk) / FilterIndexReader (3.x) do not override all methods to be filtered

2012-02-26 Thread Uwe Schindler (Created) (JIRA)
Inner classes of FilterAtomicReader (trunk) / FilterIndexReader (3.x) do not 
override all methods to be filtered


 Key: LUCENE-3822
 URL: https://issues.apache.org/jira/browse/LUCENE-3822
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0


This issue adds missing checks in the FilterReader test to also check 
overridden methods in the enum implementations (inner classes), similar to the 
checks added by Shai Erera.




[jira] [Created] (LUCENE-3800) Readers wrapping other readers don't prevent usage if any of their subreaders was closed

2012-02-19 Thread Uwe Schindler (Created) (JIRA)
Readers wrapping other readers don't prevent usage if any of their subreaders 
was closed


 Key: LUCENE-3800
 URL: https://issues.apache.org/jira/browse/LUCENE-3800
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


On recent trunk test we got this problem:
org.apache.lucene.index.TestReaderClosed.test
fails because the inner reader is closed but the wrapped outer ones are still 
open.

I fixed the issue partially for SlowCompositeReaderWrapper and 
ParallelAtomicReader but it failed again. The cool thing with this test is the 
following:

The test opens a DirectoryReader, creates a searcher, closes the reader, and 
then executes a search. This is not an issue if the closed reader is the one 
the search is running on. This test uses LTC.newSearcher(wrap=true), which 
randomly wraps the passed Reader with SlowComposite or ParallelReader - or with 
both!!! If you then close the original inner reader, the close is not detected 
when executing the search. This can cause SIGSEGV when MMAP is used.

The problem (in Slow* and Parallel*) is that both have their own Fields 
instances that are kept alive until the reader itself is closed. If the child 
reader is closed, the wrapping reader does not know and still uses its own 
Fields instance that delegates to the inner readers. At this step no more 
ensureOpen checks are done, causing the failures.

The first fix done in Slow and Parallel was to call ensureOpen() on the 
subReader, too, when requesting fields(). This works fine until you wrap two 
times: 
ParallelAtomicReader(SlowCompositeReaderWrapper(StandardDirectoryReader(segments_1:3:nrt _0(4.0):C42)))

One solution would be to make ensureOpen also check all subreaders, but that 
would do the volatile checks way too often (where n is the total number of 
subreaders and m is the number of hierarchical instances, this is n^m) - we 
cannot do this. Currently we only have n*m, which is fine.

The proposal for solving this (closing subreaders under the hood of parent 
readers) is to use the readerClosedListeners. Whenever a composite or slow 
reader wraps other readers, it registers itself as interested in readerClosed 
events. When a subreader is then forcefully closed (e.g. by a programming error 
or this crazy test), we automatically close the parents, too.
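
The listener idea can be sketched as follows. ClosePropagationDemo and its nested Reader class are hypothetical stand-ins, not the IndexReader API, and the sketch leaves out refCounting completely.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ClosePropagationDemo {
  /** Hypothetical stand-in for a reader; not the Lucene IndexReader API. */
  public static class Reader {
    private final List<Runnable> closedListeners = new CopyOnWriteArrayList<Runnable>();
    private volatile boolean closed;

    public Reader(Reader... children) {
      for (Reader child : children) {
        // The wrapper asks each child to notify it when the child closes.
        child.closedListeners.add(new Runnable() {
          @Override
          public void run() {
            Reader.this.close();
          }
        });
      }
    }

    public boolean isClosed() {
      return closed;
    }

    public void close() {
      if (closed) {
        return;
      }
      closed = true;
      // Notify every wrapper, which in turn notifies its own wrappers.
      for (Runnable listener : closedListeners) {
        listener.run();
      }
    }
  }

  public static void main(String[] args) {
    Reader inner = new Reader();
    Reader outer = new Reader(new Reader(inner)); // wrapped two times
    inner.close();
    System.out.println(outer.isClosed());
  }
}
```

With this approach an ensureOpen-style check only has to look at the reader's own closed flag, however deep the wrapping goes.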

We should also fix this in 3.x, if we have similar problems there (needs 
investigation).




[jira] [Created] (LUCENE-3771) Rename some remaining tests for new IndexReader class hierarchy

2012-02-11 Thread Uwe Schindler (Created) (JIRA)
Rename some remaining tests for new IndexReader class hierarchy
---

 Key: LUCENE-3771
 URL: https://issues.apache.org/jira/browse/LUCENE-3771
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: general/test
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0







[jira] [Created] (LUCENE-3770) Rename FilterIndexReader to FilterAtomicReader

2012-02-11 Thread Uwe Schindler (Created) (JIRA)
Rename FilterIndexReader to FilterAtomicReader
--

 Key: LUCENE-3770
 URL: https://issues.apache.org/jira/browse/LUCENE-3770
 Project: Lucene - Java
  Issue Type: Sub-task
Reporter: Uwe Schindler
Assignee: Uwe Schindler







[jira] [Created] (LUCENE-3764) Remove oal.util.MapBackedSet (Java 6 offers Collections.newSetFromMap())

2012-02-09 Thread Uwe Schindler (Created) (JIRA)
Remove oal.util.MapBackedSet (Java 6 offers Collections.newSetFromMap())
-

 Key: LUCENE-3764
 URL: https://issues.apache.org/jira/browse/LUCENE-3764
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
Assignee: Uwe Schindler


Easy search and replace job. In 3.x we still need the class, as Java 5 does not 
have Collections.newSetFromMap().
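
For reference, a short sketch of the JDK replacement. SetFromMapDemo is an invented class, and the identity-based map is only one possible backing map.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class SetFromMapDemo {
  /** Builds an identity-based set on top of a map, using only JDK classes. */
  public static <T> Set<T> newIdentitySet() {
    return Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());
  }

  public static void main(String[] args) {
    Set<String> set = newIdentitySet();
    String a = new String("x");
    String b = new String("x"); // equal to a, but a different object
    set.add(a);
    set.add(b);
    System.out.println(set.size());
  }
}
```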




[jira] [Created] (LUCENE-3757) Change AtomicReaderContext.leaves() to return itself as the only leaf to simplify code and remove an otherwise unneeded ReaderUtil method

2012-02-07 Thread Uwe Schindler (Created) (JIRA)
Change AtomicReaderContext.leaves() to return itself as the only leaf to 
simplify code and remove an otherwise unneeded ReaderUtil method
---

 Key: LUCENE-3757
 URL: https://issues.apache.org/jira/browse/LUCENE-3757
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Uwe Schindler


The documentation of IndexReaderContext.leaves() states that it returns (for 
convenience) all leaf nodes if the context is top-level (directly got from 
IndexReader), and otherwise returns null. This is not correct for 
AtomicReaderContext, where it always returns null.

To make it consistent, the convenience method should simply return itself as 
the only leaf for atomic contexts. This makes the utility method 
ReaderUtil.leaves() obsolete and simplifies code.
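
A sketch of the proposed behaviour with invented types. LeavesDemo, Context, LeafContext and CompositeContext are illustrative names that do not mirror the real Lucene classes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LeavesDemo {
  /** Hypothetical context base type. */
  public abstract static class Context {
    public abstract List<LeafContext> leaves();
  }

  public static final class LeafContext extends Context {
    @Override
    public List<LeafContext> leaves() {
      // An atomic context is its own single leaf, so callers never see null.
      return Collections.singletonList(this);
    }
  }

  public static final class CompositeContext extends Context {
    private final List<LeafContext> leaves;

    public CompositeContext(List<LeafContext> leaves) {
      this.leaves = Collections.unmodifiableList(leaves);
    }

    @Override
    public List<LeafContext> leaves() {
      return leaves;
    }
  }

  /** Works for both kinds of context without a null check or helper method. */
  public static int countLeaves(Context ctx) {
    return ctx.leaves().size();
  }

  public static void main(String[] args) {
    LeafContext leaf = new LeafContext();
    System.out.println(countLeaves(leaf));
    System.out.println(countLeaves(new CompositeContext(Arrays.asList(leaf, new LeafContext()))));
  }
}
```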




[jira] [Created] (LUCENE-3736) ParallelReader is now atomic, add convenience methods to wrap CompositeReaders in either "slow atomic" or "fast composite" way

2012-01-31 Thread Uwe Schindler (Created) (JIRA)
ParallelReader is now atomic, add convenience methods to wrap CompositeReaders 
in either "slow atomic" or "fast composite" way
--

 Key: LUCENE-3736
 URL: https://issues.apache.org/jira/browse/LUCENE-3736
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/index
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


ParallelReader is now atomic. We should add a sugar wrapper method to allow 
synchronized composite readers (with same segment sizes) to be aligned with 
MultiReaders or wrapped by Slow:
- one ParallelReader with Slow wrapped parallel readers, they only need same 
maxDoc() (and deletions)
- a MultiReader containing all sub-ParallelReaders. This needs CompositeReaders 
with same docStarts[]




[jira] [Created] (LUCENE-3735) Fix PayloadProcessorProvider to no longer use Directory for lookup, instead AtomicReader

2012-01-31 Thread Uwe Schindler (Created) (JIRA)
Fix PayloadProcessorProvider to no longer use Directory for lookup, instead 
AtomicReader


 Key: LUCENE-3735
 URL: https://issues.apache.org/jira/browse/LUCENE-3735
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/index
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


The PayloadProcessorProvider has a broken API; this should be fixed. The 
current patch mimics the old behaviour, but not 100%.

The PayloadProcessorProvider API should return a PayloadProcessor based on the 
AtomicReader instance that gets merged. As AtomicReaders no longer know the 
directory they reside in (they could be e.g. FilterIndexReaders, 
MemoryIndexes,...), a selection by Directory is no longer possible.

The current code in Lucene trunk mimics the old behavior by doing an instanceof 
SegmentReader check and then asking for a DirProvider. If something else is 
merged in, Payload processing is not supported. This should be changed. The old 
API could be kept backwards compatible by moving the instanceof check into a 
"convenience class" DirPayloadProcessorProvider, extending 
PayloadProcessorProvider.




[jira] [Created] (LUCENE-3734) Allow customizing/subclassing of DirectoryReader

2012-01-31 Thread Uwe Schindler (Created) (JIRA)
Allow customizing/subclassing of DirectoryReader


 Key: LUCENE-3734
 URL: https://issues.apache.org/jira/browse/LUCENE-3734
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/index
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


DirectoryReader is final and has only static factory methods. It is not 
possible to subclass it in any way. The problem is mainly Solr, as Solr 
accesses directory(), IndexCommits,... and therefore cannot work on abstract 
IndexReader anymore. This should be changed, by e.g. handling reopening in the 
IRFactory, also versions, commits,... Currently it's not possible to implement 
any other IRFactory that returns something else.




[jira] [Created] (LUCENE-3733) Remaining TODOs of LUCENE-2858: Finalize AtomicReader/CompositeReader API

2012-01-31 Thread Uwe Schindler (Created) (JIRA)
Remaining TODOs of LUCENE-2858: Finalize AtomicReader/CompositeReader API
-

 Key: LUCENE-3733
 URL: https://issues.apache.org/jira/browse/LUCENE-3733
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
 Fix For: 4.0


This issue will handle the remaining issues in the commit last night 
(LUCENE-2858). A new branch will be created and several problems handled in 
sub-tasks.




[jira] [Created] (LUCENE-3716) Discussion topic: Move all Commit/Version&Reopen stuff from abstract IR to DirectoryReader

2012-01-22 Thread Uwe Schindler (Created) (JIRA)
Discussion topic: Move all Commit/Version&Reopen stuff from abstract IR to 
DirectoryReader
--

 Key: LUCENE-3716
 URL: https://issues.apache.org/jira/browse/LUCENE-3716
 Project: Lucene - Java
  Issue Type: Sub-task
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler


When implementing the parent issue, I noticed a lot of other stuff in 
IndexReader that is only implemented in DirectoryReader/SegmentReader and is 
not really related to IndexReader at all:

- getVersion (maybe also isCurrent) only affects DirectoryReaders, because of 
the commit-stuff there is no easy way for e.g. MultiReader to implement this
- reopen/openIfChanged cannot be implemented easily by most AtomicIndexReaders, 
but also CompositeIndexReader is the wrong place to define those methods

In the parent issue, I already let IndexReader.open() return DirectoryReader 
and made this class public. We should move the whole stuff (including 
IR.open) to DirectoryReader. Reopening outside DirectoryReader is not really 
needed.

Some people may think it should stay abstract (there are ways for other 
readers to implement it, but it is certainly not specific to IRs in general). 
In that case I would declare an interface that DirectoryReader implements. Code 
like SearcherManager/Solr could then do an instanceof check on the IR instance 
to find out if reopening/version checking is worthwhile.
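
A minimal sketch of the interface variant, to make the instanceof idea concrete. All names here (Reader, Reopenable, DirReader, CompositeReader, maybeReopen) are hypothetical stand-ins and not Lucene classes; the "latest commit" is simulated with a plain long.

```java
public class ReopenSketch {
    /** Hypothetical stand-in for the abstract IndexReader (not Lucene's class). */
    public abstract static class Reader {}

    /** Hypothetical interface for readers that are tied to index commits. */
    public interface Reopenable {
        long getVersion();

        /** Returns a newer reader, or null if nothing changed. */
        Reader openIfChanged();
    }

    /** Plays the role of DirectoryReader: it knows about versions. */
    public static final class DirReader extends Reader implements Reopenable {
        private final long version;
        private final long latestCommit; // simulated newest commit version

        public DirReader(long version, long latestCommit) {
            this.version = version;
            this.latestCommit = latestCommit;
        }

        @Override
        public long getVersion() {
            return version;
        }

        @Override
        public Reader openIfChanged() {
            return version == latestCommit ? null : new DirReader(latestCommit, latestCommit);
        }
    }

    /** Plays the role of a reader without commit semantics, e.g. MultiReader. */
    public static final class CompositeReader extends Reader {}

    /** What SearcherManager-like code could do with an arbitrary reader. */
    public static Reader maybeReopen(Reader r) {
        if (r instanceof Reopenable) {
            Reader newer = ((Reopenable) r).openIfChanged();
            return newer != null ? newer : r;
        }
        return r; // nothing to reopen or version-check
    }
}
```

With this shape, readers that have no commit semantics simply do not implement the interface, so callers never hit an unsupported method.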




[jira] [Created] (LUCENE-3712) Remove unused (and untested) methods from ReaderUtil that are also veeeeery ineffective

2012-01-21 Thread Uwe Schindler (Created) (JIRA)
Remove unused (and untested) methods from ReaderUtil that are also very 
ineffective
---

 Key: LUCENE-3712
 URL: https://issues.apache.org/jira/browse/LUCENE-3712
 Project: Lucene - Java
  Issue Type: Task
  Components: core/other
Reporter: Uwe Schindler
Assignee: Uwe Schindler


ReaderUtil contains two methods that are used nowhere and are not even tested. 
Additionally, they are implemented with useless List->array copying and an 
inefficient docStart calculation for a later binary search, instead of directly 
returning the reader while scanning -- and I am not sure if they really work as 
expected. As ReaderUtil is @lucene.internal, we should remove them in 3.x and 
trunk; alternatively, the useless array copy / docStarts handling should be 
removed and tests added:

{code:java}
public static IndexReader subReader(int doc, IndexReader reader)
public static IndexReader subReader(IndexReader reader, int subIndex)
{code}





[jira] [Created] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations

2011-12-20 Thread Uwe Schindler (Created) (JIRA)
Improve Javadocs of RAMDirectory to document its limitations


 Key: LUCENE-3659
 URL: https://issues.apache.org/jira/browse/LUCENE-3659
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0


Spinoff from several dev@lao issues:
- [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/browser]
- issue LUCENE-3653

The use cases for RAMDirectory are very limited and to prevent users from using 
it for e.g. loading a 50 Gigabyte index from a file on disk, we should improve 
the javadocs.




[jira] [Created] (LUCENE-3656) IndexReader's add/removeCloseListener should not use ConcurrentHashMap, just a synchronized set

2011-12-19 Thread Uwe Schindler (Created) (JIRA)
IndexReader's add/removeCloseListener should not use ConcurrentHashMap, just a 
synchronized set
---

 Key: LUCENE-3656
 URL: https://issues.apache.org/jira/browse/LUCENE-3656
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Minor


The use case for ConcurrentHashMap is when many threads are reading and few are 
writing to the structure. Here this is just funny: the only reader is close(). 
Here you can just use a synchronized HashSet. The complexity of CHM makes 
this just a joke :-)
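
A minimal sketch of what the synchronized-set variant could look like. The class and listener names are hypothetical and do not mirror Lucene's actual IndexReader API; a LinkedHashSet is used instead of a plain HashSet only to keep notification order predictable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CloseListenerSketch {
    /** Hypothetical listener type; Lucene's real interface may differ. */
    public interface CloseListener {
        void onClose(CloseListenerSketch reader);
    }

    // Writes (add/remove) are rare and the only reader is close(),
    // so a synchronized set is sufficient.
    private final Set<CloseListener> listeners =
        Collections.synchronizedSet(new LinkedHashSet<CloseListener>());

    public void addCloseListener(CloseListener listener) {
        listeners.add(listener);
    }

    public void removeCloseListener(CloseListener listener) {
        listeners.remove(listener);
    }

    public void close() {
        final List<CloseListener> snapshot;
        // Iterating over a synchronizedSet requires holding its lock manually.
        synchronized (listeners) {
            snapshot = new ArrayList<CloseListener>(listeners);
        }
        // Notify outside the lock so a listener may remove itself safely.
        for (CloseListener listener : snapshot) {
            listener.onClose(this);
        }
    }
}
```

Copying to a snapshot before notifying is a design choice of this sketch; iterating directly inside the synchronized block would also work as long as listeners do not modify the set.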




[jira] [Created] (LUCENE-3652) Move org.apache.lucene.messages to QueryParser module in Lucene trunk (maybe also in 3.x)

2011-12-16 Thread Uwe Schindler (Created) (JIRA)
Move org.apache.lucene.messages to QueryParser module in Lucene trunk (maybe 
also in 3.x)
-

 Key: LUCENE-3652
 URL: https://issues.apache.org/jira/browse/LUCENE-3652
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler


The package org.apache.lucene.messages was introduced by the flexible 
QueryParser but is not used by any code in core. It should move to this 
module / this contrib (maybe even in 3.x).




[jira] [Created] (LUCENE-3643) Improve FilteredQuery to shortcut on wrapped MatchAllDocsQuery, null Query or null Filter

2011-12-12 Thread Uwe Schindler (Created) (JIRA)
Improve FilteredQuery to shortcut on wrapped MatchAllDocsQuery, null Query or 
null Filter
-

 Key: LUCENE-3643
 URL: https://issues.apache.org/jira/browse/LUCENE-3643
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


Since the rewrite of Lucene trunk to delegate all Filter logic to 
FilteredQuery, by simply wrapping in IndexSearcher.wrapFilter(), we can do more 
short circuits and improve query execution. A common use case is to pass 
MatchAllDocsQuery as the query to IndexSearcher together with a filter. For the 
underlying hit collection this is stupid and slow, as MatchAllDocsQuery simply 
increments the docID and checks acceptDocs. If the filter is sparse, this is a 
big waste. This patch changes FilteredQuery.rewrite() to short circuit and 
return ConstantScoreQuery if the query is null or MatchAllDocs. The same 
happens for filter==null; in this case FilteredQuery rewrites itself to the 
inner query with modified boost.




[jira] [Created] (LUCENE-3641) MultiReader does not propagate readerFinishedListeners to clones/reopened readers

2011-12-11 Thread Uwe Schindler (Created) (JIRA)
MultiReader does not propagate readerFinishedListeners to clones/reopened 
readers
-

 Key: LUCENE-3641
 URL: https://issues.apache.org/jira/browse/LUCENE-3641
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.5
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0


While working on refactoring MultiReader/DirectoryReader in trunk, I found out 
that MultiReader does not correctly pass readerFinishedListeners to its clones 
and reopened readers.




[jira] [Created] (LUCENE-3633) Remove code duplication in MultiReader/DirectoryReader, make everything inside final

2011-12-10 Thread Uwe Schindler (Created) (JIRA)
Remove code duplication in MultiReader/DirectoryReader, make everything inside 
final


 Key: LUCENE-3633
 URL: https://issues.apache.org/jira/browse/LUCENE-3633
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


After making IndexReader read-only (LUCENE-3606) there is no need to have 
completely different DirectoryReader and MultiReader implementations; the 
current code contains heavy code duplication and violates finalness patterns. 
There are only a few differences in reopen and things like 
isCurrent/getDirectory/...

This issue will clean this up by introducing a hidden package-private base 
class for both and only handling reopen and incRef/decRef differently. 
DirectoryReader is now final and all fields in BaseMultiReader, MultiReader and 
DirectoryReader are final now. DirectoryReader now has only static factories 
and no public ctor anymore.




[jira] [Created] (LUCENE-3632) Fully support doOpenIfChanged(boolean readOnly)/clone(boolean readOnly) in MultiReader and ParallelReader

2011-12-10 Thread Uwe Schindler (Created) (JIRA)
Fully support doOpenIfChanged(boolean readOnly)/clone(boolean readOnly) in 
MultiReader and ParallelReader
-

 Key: LUCENE-3632
 URL: https://issues.apache.org/jira/browse/LUCENE-3632
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler


Followup from LUCENE-3630:
doOpenIfChanged is behaving incorrectly if you pass a boolean to 
openIfChanged/clone. A partial fix is in LUCENE-3630, but it's not complete.
This issue fully supports doOpenIfChanged/clone by conditionally passing the 
boolean down to the subreaders.




[jira] [Created] (LUCENE-3631) Remove write access from SegmentReader and possibly move to separate class or IndexWriter/BufferedDeletes/...

2011-12-09 Thread Uwe Schindler (Created) (JIRA)
Remove write access from SegmentReader and possibly move to separate class or 
IndexWriter/BufferedDeletes/...
-

 Key: LUCENE-3631
 URL: https://issues.apache.org/jira/browse/LUCENE-3631
 Project: Lucene - Java
  Issue Type: Task
  Components: core/index
Affects Versions: 4.0
Reporter: Uwe Schindler


After LUCENE-3606 is finished, there are some TODOs:

SegmentReader still contains (package-private) all delete logic including crazy 
copyOnWrite for validDocs Bits. It would be good if SegmentReader itself 
could be read-only like all other IndexReaders.

There are two possibilities to do this:
# the simple one: Subclass SegmentReader and make a RWSegmentReader that is 
only used by IndexWriter/BufferedDeletes/... DirectoryReader will only use the 
read-only SegmentReader. This would move all TODOs to a separate class. Its 
reopen/clone method would always create a RO-SegmentReader (for NRT).
# Remove all write and commit stuff from SegmentReader completely and move it 
to IndexWriter's readerPool (it must be in readerPool as deletions need a 
not-changing view on an index snapshot).

Unfortunately the code is so complicated, and I have no real experience in 
those internals of IndexWriter, so I did not want to do it with LUCENE-3606; I 
just separated the code in SegmentReader and marked it with TODO. Maybe Mike 
McCandless can help :-)




[jira] [Created] (LUCENE-3630) MultiReader and ParallelReader accidently override doOpenIfChanged(boolean readOnly) with doOpenIfChanged(boolean doClone)

2011-12-09 Thread Uwe Schindler (Created) (JIRA)
MultiReader and ParallelReader accidently override doOpenIfChanged(boolean 
readOnly) with doOpenIfChanged(boolean doClone)
--

 Key: LUCENE-3630
 URL: https://issues.apache.org/jira/browse/LUCENE-3630
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.5
Reporter: Uwe Schindler
 Fix For: 3.6


I found this while adding deprecations for RW access in LUCENE-3606:

The base class defines doOpenIfChanged(boolean readOnly), but MultiReader and 
ParallelReader "override" this method with a signature doOpenIfChanged(doClone) 
and a missing @Override. This makes consumers calling IR.openIfChanged(boolean 
readOnly) do the wrong thing. Instead they should get UOE like for the other 
unimplemented doOpenIfChanged methods in MR and PR.

The easy fix is to rename and hide this internal "reopen" method, like 
DirectoryReader,...
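
The failure mode can be reproduced in a few lines. This is a simplified sketch with hypothetical class names (BaseReader, BuggyMultiReader, FixedMultiReader, doReopen), not the actual Lucene code: a helper whose name and parameter type match the base method silently becomes an override, so the readOnly flag is misread as doClone.

```java
public class AccidentalOverrideSketch {
    public static class BaseReader {
        public final BaseReader openIfChanged(boolean readOnly) {
            return doOpenIfChanged(readOnly);
        }

        // Default: readers that do not support this variant throw UOE.
        protected BaseReader doOpenIfChanged(boolean readOnly) {
            throw new UnsupportedOperationException("not implemented by this reader");
        }
    }

    public static class BuggyMultiReader extends BaseReader {
        public boolean wasCloned;

        // Intended as an internal helper, but name and parameter type match the
        // base method, so it silently overrides it (note the missing @Override).
        protected BaseReader doOpenIfChanged(boolean doClone) {
            BuggyMultiReader r = new BuggyMultiReader();
            r.wasCloned = doClone;
            return r;
        }
    }

    public static class FixedMultiReader extends BaseReader {
        public boolean wasCloned;

        public FixedMultiReader copy() {
            return doReopen(true);
        }

        // Renamed and private: it can no longer collide with the base method.
        private FixedMultiReader doReopen(boolean doClone) {
            FixedMultiReader r = new FixedMultiReader();
            r.wasCloned = doClone;
            return r;
        }
    }
}
```

In the buggy variant, openIfChanged(true) returns a clone instead of throwing; in the fixed variant the base implementation is reached and the caller gets the UnsupportedOperationException.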




[jira] [Created] (LUCENE-3626) Make PKIndexSplitter and MultiPassIndexSplitter work per segment

2011-12-07 Thread Uwe Schindler (Created) (JIRA)
Make PKIndexSplitter and MultiPassIndexSplitter work per segment


 Key: LUCENE-3626
 URL: https://issues.apache.org/jira/browse/LUCENE-3626
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/other
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


Spinoff from LUCENE-3624: DocValues merger throws exception on 
IW.addIndexes(SlowMultiReaderWrapper) as string-index like docvalues cannot 
provide asSortedSource.




[jira] [Created] (LUCENE-3614) Add a JUL/SLF4J example InfoStream implementation so IndexWriter can log to JUL/SLF4J

2011-12-01 Thread Uwe Schindler (Created) (JIRA)
Add a JUL/SLF4J example InfoStream implementation so IndexWriter can log to 
JUL/SLF4J
-

 Key: LUCENE-3614
 URL: https://issues.apache.org/jira/browse/LUCENE-3614
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


Followup to LUCENE-3598: Hoss suggested adding a default JUL/SLF4J 
implementation to contrib/misc (that can also be used by SOLR to log 
IndexWriter verbose messages to its logging framework).




[jira] [Created] (LUCENE-3606) Make IndexReader really read-only in Lucene 4.0

2011-11-28 Thread Uwe Schindler (Created) (JIRA)
Make IndexReader really read-only in Lucene 4.0
---

 Key: LUCENE-3606
 URL: https://issues.apache.org/jira/browse/LUCENE-3606
 Project: Lucene - Java
  Issue Type: Task
  Components: core/index
Affects Versions: 4.0
Reporter: Uwe Schindler


As we change the API completely in Lucene 4.0, we are also free to remove 
read-write access and commits from IndexReader. This code is so hairy and buggy 
(as investigated by Robert and Mike today) when you work on SegmentReader level 
but forget to flush in the DirectoryReader, so it's better to really make 
IndexReaders read-only.

Currently with IndexReader you can do things like:
- delete/undelete Documents -> can be done with IndexWriter, too (using 
deleteByQuery)
- change norms -> this is a bad idea in general, but when we remove norms 
entirely and replace them with DocValues this is obsolete already. Changing 
DocValues should also be done using IndexWriter in trunk (once it is ready)




[jira] [Created] (LUCENE-3598) Improve InfoStream class in trunk to be more consistent with logging-frameworks like slf4j/log4j/commons-logging

2011-11-26 Thread Uwe Schindler (Created) (JIRA)
Improve InfoStream class in trunk to be more consistent with logging-frameworks 
like slf4j/log4j/commons-logging


 Key: LUCENE-3598
 URL: https://issues.apache.org/jira/browse/LUCENE-3598
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Uwe Schindler


Followup on a [thread by Shai Erera on 
java-dev@lao|http://lucene.472066.n3.nabble.com/IndexWriter-infoStream-is-final-td3537485.html]:
 I already discussed with Robert that there is one thing missing. 
Currently the IW only checks if infoStream!=null and then passes the 
message to the method, which *may* ignore it. For your requirement, this is 
enabled or disabled dynamically. Unfortunately, if the construction of the 
message is heavy, then this wastes resources.

I would like to add another method to this class: abstract boolean 
isMessageEnabled() that can also be implemented. I would then replace all null 
checks in IW by this method. The default config in IW would be changed to use a 
NoOutputInfoStream that returns false here and ignores the message.

A simple logger wrapper for e.g. log4j / slf4j then could look like (ignoring 
component, could be enabled):

Logger log = YourLoggingFramework.getLogger(IndexWriter.class);

{code:java}
public void message(String component, String message) {
  log.debug(component + ": " + message);
}

public boolean isMessageEnabled(String component) {
  return log.isDebugEnabled();
}
{code}

Using this you could enable/disable logging live by e.g. the log4j management 
console of your app server by enabling/disabling IndexWriter.class logging.

The changes are really simple:
- PrintStreamInfoStream always returns true; maybe make it possible to 
enable/disable it dynamically to allow Shai's request
- infoStream.getDefault() is never null and can never be set to null. Instead 
the default is a singleton NoOutputInfoStream that returns false from 
isMessageEnabled().
- All null checks on infoStream should be replaced by 
infoStream.isMessageEnabled(component); this is possible as it is always != 
null. There are no slowdowns from this - it's like Collections.emptyList() 
instead of stupid null checks.
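
The proposal above can be sketched end to end as follows. This is an illustration under assumptions, not the final Lucene API: the nested InfoStream class, the NO_OUTPUT singleton, JulInfoStream, flush() and the buildCount counter are hypothetical names, and java.util.logging is used as the example framework.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class InfoStreamSketch {
    /** Hypothetical shape of the proposed abstract class. */
    public abstract static class InfoStream {
        public abstract void message(String component, String message);

        public abstract boolean isMessageEnabled(String component);

        /** Null-object default: never enabled, ignores every message. */
        public static final InfoStream NO_OUTPUT = new InfoStream() {
            @Override
            public void message(String component, String message) {
                // intentionally empty
            }

            @Override
            public boolean isMessageEnabled(String component) {
                return false;
            }
        };
    }

    /** Example wrapper that forwards to java.util.logging at FINE level. */
    public static final class JulInfoStream extends InfoStream {
        private final Logger log;

        public JulInfoStream(Logger log) {
            this.log = log;
        }

        @Override
        public void message(String component, String message) {
            log.fine(component + ": " + message);
        }

        @Override
        public boolean isMessageEnabled(String component) {
            return log.isLoggable(Level.FINE);
        }
    }

    /** Counts how often the (pretend) expensive message was built. */
    public static int buildCount = 0;

    private static String expensiveMessage() {
        buildCount++;
        return "flushed segment";
    }

    /** Caller side: no null check, and the message is built only if enabled. */
    public static void flush(InfoStream infoStream) {
        if (infoStream.isMessageEnabled("IW")) {
            infoStream.message("IW", expensiveMessage());
        }
    }
}
```

Because the enabled check is delegated to the logger, switching the logger's level at runtime turns message construction on or off without touching the writer.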




[jira] [Created] (LUCENE-3595) Refactor FieldCacheRangeFilter.FieldCacheDocIdSet to be separate class and fix the dangerous matchDoc() throws AIOOBE requirement

2011-11-24 Thread Uwe Schindler (Created) (JIRA)
Refactor FieldCacheRangeFilter.FieldCacheDocIdSet to be separate class and fix 
the dangerous matchDoc() throws AIOOBE requirement
-

 Key: LUCENE-3595
 URL: https://issues.apache.org/jira/browse/LUCENE-3595
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
Assignee: Uwe Schindler


Followup from LUCENE-3593:
The FieldCacheRangeFilter.FieldCacheDocIdSet class has a strange requirement on 
the abstract matchDoc(): It should throw AIOOBE if the docId is > maxDoc. This 
check should be done by caller as especially on trunk, e.g. 
FieldCacheTermsFilter does not seem to always throw this exception correctly 
(getOrd() is a method and no array in TermsIndex cache).

Also in 3.x the Filter does not correctly respect deletions when a FieldCache 
based on a reopened reader is used.

This issue will refactor this, fix the bugs, and move the docId check up to 
the iterator.
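
A rough sketch of what moving the bounds check into the iterator means. DocMatcher and MatchingIterator are hypothetical names rather than the actual FieldCacheDocIdSet code, and the sketch assumes valid doc IDs run from 0 to maxDoc-1.

```java
public class DocIdIteratorSketch {
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical per-document predicate; it is only called with valid IDs. */
    public interface DocMatcher {
        boolean matchDoc(int doc);
    }

    public static final class MatchingIterator {
        private final DocMatcher matcher;
        private final int maxDoc;
        private int doc = -1;

        public MatchingIterator(DocMatcher matcher, int maxDoc) {
            this.matcher = matcher;
            this.maxDoc = maxDoc;
        }

        public int nextDoc() {
            if (doc == NO_MORE_DOCS) {
                return doc; // already exhausted
            }
            // The iterator owns the bounds check, so matchDoc() never has to
            // signal the end by throwing ArrayIndexOutOfBoundsException.
            while (++doc < maxDoc) {
                if (matcher.matchDoc(doc)) {
                    return doc;
                }
            }
            return doc = NO_MORE_DOCS;
        }
    }
}
```

With the check in one place, a matcher backed by a method call rather than an array behaves the same way as one backed by an array.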




[jira] [Created] (LUCENE-3594) Backport FieldCacheTermsFilter code duplication removal to 3.x

2011-11-24 Thread Uwe Schindler (Created) (JIRA)
Backport FieldCacheTermsFilter code duplication removal to 3.x
--

 Key: LUCENE-3594
 URL: https://issues.apache.org/jira/browse/LUCENE-3594
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.5
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6


In trunk I already cleaned up FieldCacheTermsFilter to not duplicate code of 
FieldCacheRangeFilter. This issue simply backports this.




[jira] [Created] (LUCENE-3588) Try harder to prevent SIGSEGV on cloned MMapIndexInputs

2011-11-22 Thread Uwe Schindler (Created) (JIRA)
Try harder to prevent SIGSEGV on cloned MMapIndexInputs
---

 Key: LUCENE-3588
 URL: https://issues.apache.org/jira/browse/LUCENE-3588
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Affects Versions: 3.4, 3.5
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0


We are unmapping mmapped byte buffers which is disallowed by the JDK, because 
it has the risk of SIGSEGV when you access the mapped byte buffer after 
unmapping.

We currently prevent this for the main IndexInput by setting its buffer to 
null, so we NPE if somebody tries to access the underlying buffer. I recently 
also fixed the stupid curBuf (LUCENE-3200) by setting it to null.

The big problem is cloned IndexInputs, which are generally not closed. Those 
still contain references to the unmapped ByteBuffer, which easily leads to 
SIGSEGV. The patch from Mike in LUCENE-3439 prevents most of this in Lucene 
3.5, but it's still not 100% safe (as it uses non-volatiles).

This patch will fix the remaining issues by also setting the buffers of clones 
to null when the original is closed. The trick is to record weak references of 
all clones created and close them together with the original. This uses a 
ConcurrentHashMap,?> as store with the logic 
borrowed from WeakHashMap to cleanup the GCed references (using ReferenceQueue).
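
The clone-tracking trick could be sketched roughly like this. It is an illustration under assumptions, not the actual patch: Input stands in for MMapIndexInput, a heap ByteBuffer replaces the mapped buffer, and an IllegalStateException is thrown where the real code would NPE.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;

public class CloneTrackingSketch {
    public static final class Input {
        private volatile ByteBuffer buffer;
        private final boolean isClone;
        // Shared between the original and all of its clones.
        private final ConcurrentHashMap<WeakReference<Input>, Boolean> clones;
        private final ReferenceQueue<Input> queue;

        public Input(ByteBuffer buffer) {
            this(buffer, false,
                 new ConcurrentHashMap<WeakReference<Input>, Boolean>(),
                 new ReferenceQueue<Input>());
        }

        private Input(ByteBuffer buffer, boolean isClone,
                      ConcurrentHashMap<WeakReference<Input>, Boolean> clones,
                      ReferenceQueue<Input> queue) {
            this.buffer = buffer;
            this.isClone = isClone;
            this.clones = clones;
            this.queue = queue;
        }

        public Input cloneInput() {
            ByteBuffer b = buffer;
            if (b == null) {
                throw new IllegalStateException("already closed");
            }
            Input clone = new Input(b.duplicate(), true, clones, queue);
            purgeCollected();
            // Weak reference: tracking a clone must not keep it alive.
            clones.put(new WeakReference<Input>(clone, queue), Boolean.TRUE);
            return clone;
        }

        /** Drops map entries whose clones were garbage collected. */
        private void purgeCollected() {
            Reference<? extends Input> ref;
            while ((ref = queue.poll()) != null) {
                clones.remove(ref);
            }
        }

        public byte readByte() {
            ByteBuffer b = buffer;
            if (b == null) {
                throw new IllegalStateException("already closed");
            }
            return b.get();
        }

        public void close() {
            buffer = null;
            if (!isClone) {
                // Closing the original invalidates every clone still alive.
                for (WeakReference<Input> ref : clones.keySet()) {
                    Input clone = ref.get();
                    if (clone != null) {
                        clone.buffer = null;
                    }
                }
                clones.clear();
            }
        }
    }
}
```

Note that this sketch only narrows the window: a thread that has already read the buffer field can still race with a concurrent unmap, so it should not be taken as a complete safety guarantee.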

If we respin 3.5, we should maybe also get this in.




[jira] [Created] (LUCENE-3583) benchmark tests always fail on windows because directory cannot be removed

2011-11-19 Thread Uwe Schindler (Created) (JIRA)
benchmark tests always fail on windows because directory cannot be removed
--

 Key: LUCENE-3583
 URL: https://issues.apache.org/jira/browse/LUCENE-3583
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Uwe Schindler


This seems to be a recently introduced bug. I have no idea what's wrong. 
Attached is a log file; it reproduces every time.






[jira] [Created] (LUCENE-3574) Add some more constants for newer Java versions to Constants.class, remove outdated ones.

2011-11-14 Thread Uwe Schindler (Created) (JIRA)
Add some more constants for newer Java versions to Constants.class, remove 
outdated ones.
-

 Key: LUCENE-3574
 URL: https://issues.apache.org/jira/browse/LUCENE-3574
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/other
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.5, 4.0


Preparation for LUCENE-3235:
This adds constants to quickly detect Java6 and Java7 to Constants.java. It 
also deprecates and removes the outdated historical Java versions.




[jira] [Created] (LUCENE-3561) Maven xxx-src.jar files do not contain resources

2011-11-05 Thread Uwe Schindler (Created) (JIRA)
Maven xxx-src.jar files do not contain resources


 Key: LUCENE-3561
 URL: https://issues.apache.org/jira/browse/LUCENE-3561
 Project: Lucene - Java
  Issue Type: Bug
  Components: general/build
Affects Versions: 3.4, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.5, 4.0


When building src.jar files for maven deploy, they miss resources, so 
analyzers-common-src.jar is useless.

The attached patch will fix this. The only downside is: The globmapper hack 
does not work with , so I used the new ANT 1.7.1 attribute 
erroronmissingdir="no" on the s

I also updated BUILD.txt, which did not even mention Java 1.6 in trunk!




[jira] [Created] (LUCENE-3540) In 3.x branch (starting with 3.4) the IndexFormatTooOldException was backported, but the error message was not modified for 3.x

2011-10-28 Thread Uwe Schindler (Created) (JIRA)
In 3.x branch (starting with 3.4) the IndexFormatTooOldException was 
backported, but the error message was not modified for 3.x
---

 Key: LUCENE-3540
 URL: https://issues.apache.org/jira/browse/LUCENE-3540
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.4
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.5


In 3.x branch (starting with 3.4) the IndexFormatTooOldException was 
backported, but the error message was not modified for 3.x:

bq. This version of Lucene only supports indexes created with release 3.0 and 
later.

In 3.x it must be:

bq. This version of Lucene only supports indexes created with release 1.9 and 
later.

Indexes before 1.9 will throw this exception on reading SegmentInfos 
(LUCENE-3255).




[jira] [Created] (LUCENE-3537) Add note about Java 7u1 and 6u29 to Lucene/Solr sites

2011-10-26 Thread Uwe Schindler (Created) (JIRA)
Add note about Java 7u1 and 6u29 to Lucene/Solr sites
-

 Key: LUCENE-3537
 URL: https://issues.apache.org/jira/browse/LUCENE-3537
 Project: Lucene - Java
  Issue Type: Task
  Components: general/website
Reporter: Uwe Schindler


Oracle confirmed that the bugs leading to index corruption and SIGSEGV are 
fixed in Java 7u1 and 6u29. We should post a message to the news sections 
revising the previous WARNING (LUCENE-3349). I prepared something, please 
comment before I commit:

{quote}
26 October 2011 - Java 7u1 fixes index corruption and crash 
bugs in Apache Lucene Core and Apache Solr
Oracle released Java 7u1 
(http://www.oracle.com/technetwork/java/javase/7u1-relnotes-507962.html) 
on October 19.
According to the release notes and tests done by the Lucene committers, all 
bugs reported on July 28 are fixed in this release,
so code using Porter stemmer no longer crashes with SIGSEGV. We 
were not able to experience any index corruption anymore,
so it is safe to use Java 7u1 with Lucene Core and Solr.
On the same day, Oracle released Java 6u29 
(http://www.oracle.com/technetwork/java/javase/6u29-relnotes-507960.html)
fixing the same problems occurring with Java 6, if the JVM switches 
-XX:+AggressiveOpts
or -XX:+OptimizeStringConcat were used. Of course, you should 
not use experimental JVM options like
-XX:+AggressiveOpts in production environments! We recommend 
everybody to upgrade to this latest version 6u29.
In case you upgrade to Java 7, remember that you may have to reindex, as the 
Unicode
version shipped with Java 7 changed and tokenization behaves differently
(e.g. lowercasing). For more information, read 
JRE_VERSION_MIGRATION.txt
in your distribution package!

{quote}

I plan to commit this later this afternoon.




[jira] [Created] (LUCENE-3534) Backport FilteredQuery/IndexSearcher changes to 3.x: Remove filter logic from IndexSearcher and delegate to FilteredQuery

2011-10-25 Thread Uwe Schindler (Created) (JIRA)
Backport FilteredQuery/IndexSearcher changes to 3.x: Remove filter logic from 
IndexSearcher and delegate to FilteredQuery
-

 Key: LUCENE-3534
 URL: https://issues.apache.org/jira/browse/LUCENE-3534
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Uwe Schindler


Spinoff from LUCENE-1536: We simplified the code in IndexSearcher so that it no 
longer does the filtering itself; instead it wraps every Query with a 
FilteredQuery if a non-null filter is given. The conjunction code then exists 
only in FilteredQuery, which makes it easier to maintain. Currently the two 
implementations differ in 3.x; in trunk we used the more optimized 
IndexSearcher variant with the addition of simplified in-order conjunction code.

This issue will backport those changes (without random access bits).
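
The delegation described above can be sketched roughly as follows. This is a minimal illustration only: Query, Filter, FilteredQuery and Searcher here are simplified stand-in types invented for the example (they are not the Lucene classes), and the brute-force loop over doc IDs merely stands in for real scoring.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: all nested types are hypothetical stand-ins, NOT Lucene classes.
public class FilterDelegationSketch {

    /** Stand-in for a query: decides whether a doc ID matches. */
    interface Query {
        boolean matches(int doc);
    }

    /** Stand-in for a filter: decides whether a doc ID is allowed. */
    interface Filter {
        boolean accepts(int doc);
    }

    /** The only place where the query/filter conjunction is implemented. */
    static final class FilteredQuery implements Query {
        private final Query query;
        private final Filter filter;

        FilteredQuery(Query query, Filter filter) {
            this.query = query;
            this.filter = filter;
        }

        @Override
        public boolean matches(int doc) {
            return filter.accepts(doc) && query.matches(doc);
        }
    }

    /** Stand-in searcher that contains no filter logic of its own. */
    static final class Searcher {
        private final int maxDoc;

        Searcher(int maxDoc) {
            this.maxDoc = maxDoc;
        }

        /** Wraps the query only if a non-null filter is given. */
        static Query wrapFilter(Query query, Filter filter) {
            return filter == null ? query : new FilteredQuery(query, filter);
        }

        List<Integer> search(Query query, Filter filter) {
            return search(wrapFilter(query, filter));
        }

        List<Integer> search(Query query) {
            List<Integer> hits = new ArrayList<>();
            for (int doc = 0; doc < maxDoc; doc++) {
                if (query.matches(doc)) {
                    hits.add(doc);
                }
            }
            return hits;
        }
    }

    public static void main(String[] args) {
        Searcher searcher = new Searcher(10);
        Query even = doc -> doc % 2 == 0;
        Filter small = doc -> doc < 5;
        System.out.println(searcher.search(even, null));  // [0, 2, 4, 6, 8]
        System.out.println(searcher.search(even, small)); // [0, 2, 4]
    }
}
```

The point of the design is that the searcher has a single code path: a filtered search is just a search over a different Query, so any conjunction optimization only has to be maintained in one class.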




[jira] [Created] (LUCENE-3533) Nuke SpanFilters and CachingSpanFilter (maybe move to sandbox)

2011-10-25 Thread Uwe Schindler (Created) (JIRA)
Nuke SpanFilters and CachingSpanFilter (maybe move to sandbox)
--

 Key: LUCENE-3533
 URL: https://issues.apache.org/jira/browse/LUCENE-3533
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
Assignee: Uwe Schindler


SpanFilters are inefficient and OOM easily (they don't scale at all: they create 
large Lists of Objects for every match, and filtering deleted docs is a pain). 
After some talks with Grant at Eurocon, and given that caching of them is 
still broken in 3.x (but fixed on trunk), I assume nobody uses them, so let's 
nuke them. They are also in the wrong package, so the standard statement 
applies: "Die, SpanFilters, die!"




[jira] [Created] (LUCENE-3532) Improve Weight.scorer() API to enforce consistent topScorer/outOfOrder parameters across segments

2011-10-25 Thread Uwe Schindler (Created) (JIRA)
Improve Weight.scorer() API to enforce consistent topScorer/outOfOrder 
parameters across segments
-

 Key: LUCENE-3532
 URL: https://issues.apache.org/jira/browse/LUCENE-3532
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 4.0
Reporter: Uwe Schindler


Spinoff from LUCENE-1536: In the past, when filters were applied, all scorers 
were forced to score in order. With random access DocIdSets, this is no longer 
needed. Some Weights (BooleanWeight) unfortunately return different scorers for 
in-order/out-of-order, leading to incompatible scores between segments.

For now we enforce in-order execution of scorers for FilteredQuery (as we do in 
3.x), but once we fix BooleanWeight or have some other good way to produce 
compatible scores, we can reenable random access. Maybe we should nuke 
BooleanScorer2... - Robert and Mike have some ideas on how to do that :)




[jira] [Created] (LUCENE-3531) Improve CachingWrapperFilter to optionally also cache acceptDocs, if identical to liveDocs

2011-10-25 Thread Uwe Schindler (Created) (JIRA)
Improve CachingWrapperFilter to optionally also cache acceptDocs, if identical 
to liveDocs
--

 Key: LUCENE-3531
 URL: https://issues.apache.org/jira/browse/LUCENE-3531
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Affects Versions: 4.0
Reporter: Uwe Schindler


Spinoff from LUCENE-1536: That issue removed the different cache modes 
completely and always applies the acceptDocs using BitsFilteredDocIdSet.wrap(); 
the cache contains only raw DocIdSets without any deletions/acceptDocs applied. 
For IndexReaders that are seldom reopened, this might not be as performant as 
it could be. If acceptDocs==IR.liveDocs, those DocIdSets could also be cached 
with liveDocs applied.
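
A rough sketch of the proposed optional second cache follows. Everything in it is an assumption made for illustration: Reader, Filter and CachingFilter are simplified stand-ins (not the Lucene classes), java.util.BitSet replaces both DocIdSet and Bits, and the plain HashMap ignores the reader-lifecycle handling a real cache would need.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Illustration only: all nested types are hypothetical stand-ins, NOT Lucene classes.
public class CachingFilterSketch {

    /** Stand-in for a segment reader; a set bit in liveDocs means "not deleted". */
    static final class Reader {
        final BitSet liveDocs;

        Reader(BitSet liveDocs) {
            this.liveDocs = liveDocs;
        }
    }

    /** Stand-in for the wrapped filter; returns matches ignoring deletions. */
    interface Filter {
        BitSet rawDocIdSet(Reader reader);
    }

    static final class CachingFilter {
        private final Filter delegate;
        // Raw sets, without any deletions/acceptDocs applied.
        private final Map<Reader, BitSet> rawCache = new HashMap<>();
        // Optional second cache: raw sets with the reader's liveDocs already applied.
        private final Map<Reader, BitSet> liveCache = new HashMap<>();
        int intersections = 0; // counts how often acceptDocs had to be applied

        CachingFilter(Filter delegate) {
            this.delegate = delegate;
        }

        /** Callers must treat the returned set as read-only, since it may be cached. */
        BitSet getDocIdSet(Reader reader, BitSet acceptDocs) {
            BitSet raw = rawCache.computeIfAbsent(reader, delegate::rawDocIdSet);
            if (acceptDocs == null) {
                return raw;
            }
            if (acceptDocs == reader.liveDocs) {
                // Identity check: the common case can reuse a pre-intersected set.
                return liveCache.computeIfAbsent(reader, r -> intersect(raw, acceptDocs));
            }
            return intersect(raw, acceptDocs);
        }

        private BitSet intersect(BitSet raw, BitSet acceptDocs) {
            intersections++;
            BitSet result = (BitSet) raw.clone();
            result.and(acceptDocs);
            return result;
        }
    }

    public static void main(String[] args) {
        BitSet live = new BitSet();
        live.set(0, 8);
        live.clear(3); // doc 3 is deleted
        Reader reader = new Reader(live);

        BitSet matches = new BitSet();
        matches.set(1);
        matches.set(3);
        matches.set(5);
        CachingFilter filter = new CachingFilter(r -> matches);

        System.out.println(filter.getDocIdSet(reader, reader.liveDocs)); // {1, 5}
        filter.getDocIdSet(reader, reader.liveDocs);                     // served from liveCache
        System.out.println(filter.intersections);                        // 1
    }
}
```

The trade-off is memory for CPU: the second cache roughly doubles the cached data per reader, which only pays off when the reader (and therefore its liveDocs) stays open across many searches.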




[jira] [Created] (LUCENE-3530) Remove deprecated methods in CompoundTokenFilters

2011-10-25 Thread Uwe Schindler (Created) (JIRA)
Remove deprecated methods in CompoundTokenFilters
-

 Key: LUCENE-3530
 URL: https://issues.apache.org/jira/browse/LUCENE-3530
 Project: Lucene - Java
  Issue Type: Sub-task
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0







[jira] [Created] (LUCENE-3489) Refactor test classes that use assumeFalse(codec != SimpleText, Memory) to use new annotation and move the expensive methods to separate classes

2011-10-05 Thread Uwe Schindler (Created) (JIRA)
Refactor test classes that use assumeFalse(codec != SimpleText, Memory) to use 
new annotation and move the expensive methods to separate classes


 Key: LUCENE-3489
 URL: https://issues.apache.org/jira/browse/LUCENE-3489
 Project: Lucene - Java
  Issue Type: Test
Reporter: Uwe Schindler


Follow-up for LUCENE-3463.

TODO:
- Move test-methods that need the new @UseNoMemoryExpensiveCodec annotation to 
separate classes
- Eliminate the assumeFalse-calls that check the current codec and disable the 
test if SimpleText or Memory is used
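
The general idea behind replacing the assumeFalse checks with an annotation can be sketched as below. This is only an assumption-laden illustration: the annotation is redeclared locally under the same name, pickCodec is a hypothetical helper rather than the test framework's real mechanism, and the codec list is a placeholder apart from SimpleText and Memory.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Illustration only: a local re-declaration and a hypothetical codec picker,
// not the actual Lucene test framework code.
public class CodecAnnotationSketch {

    /** Marks a test class that must not run with memory-expensive codecs. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface UseNoMemoryExpensiveCodec {}

    // Placeholder codec names; only SimpleText and Memory come from the issue text.
    static final List<String> ALL_CODECS =
        Arrays.asList("Standard", "Pulsing", "SimpleText", "Memory");
    static final List<String> EXPENSIVE_CODECS =
        Arrays.asList("SimpleText", "Memory");

    /** Picks a random codec, excluding the expensive ones for annotated classes. */
    static String pickCodec(Class<?> testClass, Random random) {
        List<String> candidates = new ArrayList<>(ALL_CODECS);
        if (testClass.isAnnotationPresent(UseNoMemoryExpensiveCodec.class)) {
            candidates.removeAll(EXPENSIVE_CODECS);
        }
        return candidates.get(random.nextInt(candidates.size()));
    }

    /** Cheap tests: any codec is acceptable. */
    static class CheapTests {}

    /** Expensive test methods moved to their own class, which carries the annotation. */
    @UseNoMemoryExpensiveCodec
    static class ExpensiveTests {}

    public static void main(String[] args) {
        Random random = new Random();
        System.out.println("cheap tests run with:     " + pickCodec(CheapTests.class, random));
        System.out.println("expensive tests run with: " + pickCodec(ExpensiveTests.class, random));
    }
}
```

Because the annotation applies to a whole class, the expensive methods have to live in their own classes; in exchange they always run with a suitable codec instead of being silently skipped whenever the random codec happens to be SimpleText or Memory.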
