Re: IndexFileNames
Doug Cutting wrote:
> [EMAIL PROTECTED] wrote:
>> --- lucene/java/trunk/src/java/org/apache/lucene/store/FSDirectory.java (original)
>> +++ lucene/java/trunk/src/java/org/apache/lucene/store/FSDirectory.java Mon Jun 6 10:52:12 2005
>> @@ -52,8 +52,8 @@
>>        if (name.endsWith("."+IndexReader.FILENAME_EXTENSIONS[i])) return true;
>>      }
>> -    if (name.equals("deletable")) return true;
>> -    else if (name.equals("segments")) return true;
>> +    if (name.equals(Constants.INDEX_DELETABLE_FILENAME)) return true;
>> +    else if (name.equals(Constants.INDEX_SEGMENTS_FILENAME)) return true;
>>      else if (name.matches(".+\\.f\\d+")) return true;
>>      return false;
>
> This really belongs in the index package. That way, when we change the set of files in an index, the changes will be localized. So this, LuceneFileFilter, Constants.INDEX_* and IndexReader.FILENAME_EXTENSIONS, should all be moved to a common home in the index package, like org.apache.lucene.index.IndexFileNames.
>
> Thoughts?

Yes, this makes sense. I will try to pack all the scattered filenames and extensions together in a new class, org.apache.lucene.index.IndexFileNames. Then we have everything in one place, and it will be much easier to maintain or even to change.

Bernhard
Re: IndexFileNames
Bernhard Messer wrote:
> Doug Cutting wrote:
>> [EMAIL PROTECTED] wrote:
>>> --- lucene/java/trunk/src/java/org/apache/lucene/store/FSDirectory.java (original)
>>> +++ lucene/java/trunk/src/java/org/apache/lucene/store/FSDirectory.java Mon Jun 6 10:52:12 2005
>>> @@ -52,8 +52,8 @@
>>>        if (name.endsWith("."+IndexReader.FILENAME_EXTENSIONS[i])) return true;
>>>      }
>>> -    if (name.equals("deletable")) return true;
>>> -    else if (name.equals("segments")) return true;
>>> +    if (name.equals(Constants.INDEX_DELETABLE_FILENAME)) return true;
>>> +    else if (name.equals(Constants.INDEX_SEGMENTS_FILENAME)) return true;
>>>      else if (name.matches(".+\\.f\\d+")) return true;
>>>      return false;
>>
>> This really belongs in the index package. That way, when we change the set of files in an index, the changes will be localized. So this, LuceneFileFilter, Constants.INDEX_* and IndexReader.FILENAME_EXTENSIONS, should all be moved to a common home in the index package, like org.apache.lucene.index.IndexFileNames.
>>
>> Thoughts?

Sorry for the confusion. At first glance, I thought the new class IndexFileNames, containing the necessary constant values, fit perfectly into org.apache.lucene.index. After a more detailed look, I get the feeling that it would be much better to place the new class in org.apache.lucene.store. That way, we can avoid any dependency from FSDirectory on org.apache.lucene.index, which is very clean.

Why not create a new public final class org.apache.lucene.store.IndexFileNames and move LuceneFileFilter, Constants.INDEX_*, SegmentMerger.COMPOUND_EXTENSIONS, SegmentMerger.VECTOR_EXTENSIONS and IndexReader.FILENAME_EXTENSIONS to it? Does that sound OK?

Bernhard
Re: IndexFileNames
Bernhard Messer wrote:
> Sorry for the confusion. At first glance, I thought the new class IndexFileNames, containing the necessary constant values, fit perfectly into org.apache.lucene.index. After a more detailed look, I get the feeling that it would be much better to place the new class in org.apache.lucene.store. That way, we can avoid any dependency from FSDirectory on org.apache.lucene.index, which is very clean.

I think that's an illusion: the store package would actually become more dependent on the index package. If someone changes the set of files in an index, then the changes will not be localized to the index package. Nothing outside of the index package should know anything about the internal structure of an index. If instead the index package exposes a public API that permits other packages to inquire whether particular file names belong to an index, then only a small dependency on what should be a stable API is exposed. Changes to index structure can be made without changing anything outside of the index package.

> Why not create a new public final class org.apache.lucene.store.IndexFileNames and move LuceneFileFilter, Constants.INDEX_*, SegmentMerger.COMPOUND_EXTENSIONS, SegmentMerger.VECTOR_EXTENSIONS and IndexReader.FILENAME_EXTENSIONS to it?

I still think this class should be in the index package. I'm not convinced that anything other than the FileFilter needs to be public.

Doug
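Doug's argument (only a filter is public, while the file names stay private) could look roughly like the sketch below. This is an illustration under assumptions, not code from the Lucene source: the class name IndexFileNameFilterSketch is hypothetical, the two file names and the regex come from the quoted diff, and the extension list is borrowed from Chris's ExtensionMatcher example elsewhere in this thread.

```java
import java.io.File;
import java.io.FilenameFilter;

/**
 * Hypothetical sketch: the filter is the only thing other packages can use.
 * The names and extensions are private, so they can change without touching
 * any caller.
 */
final class IndexFileNameFilterSketch implements FilenameFilter {

    // File names taken from the diff quoted in this thread.
    private static final String SEGMENTS = "segments";
    private static final String DELETABLE = "deletable";

    // Assumed extension list, copied from the ExtensionMatcher example.
    private static final String[] EXTENSIONS = {
        "cfs", "fnm", "fdx", "fdt", "tii", "tis", "frq",
        "prx", "del", "tvx", "tvd", "tvf", "tvp"
    };

    @Override
    public boolean accept(File dir, String name) {
        for (String ext : EXTENSIONS) {
            if (name.endsWith("." + ext)) return true;
        }
        if (name.equals(DELETABLE)) return true;
        if (name.equals(SEGMENTS)) return true;
        // Numbered ".fN" files, matched with the same regex as in the diff.
        return name.matches(".+\\.f\\d+");
    }

    public static void main(String[] args) {
        FilenameFilter filter = new IndexFileNameFilterSketch();
        String[] candidates = {"_1.cfs", "segments", "_2.f3", "notes.txt"};
        for (String name : candidates) {
            System.out.println(name + " -> " + filter.accept(null, name));
        }
    }
}
```

With this shape, a directory implementation would only call accept(...) and would never see the constants, which is the small, stable dependency described above.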
Re: svn-commit: 168449 FSDirectory
On Tue, 07 Jun 2005 09:13:10 +0200, Bernhard Messer wrote:
> Exactly, that's what I had in mind. I know that we have to allocate a new string object for every call, but this must be much cheaper than the current implementation, which has to walk through the whole array every time the method is called.

If you're still concerned about performance you can use a trie data structure, which takes a little effort to build but can match very quickly (with no allocations). I've attached an example implementation that works for filename extensions which only use the letters 'a'-'z' (minimally tested):

    private static final ExtensionMatcher MATCHER = new ExtensionMatcher(new String[]{
        "cfs", "fnm", "fdx", "fdt", "tii", "tis", "frq", "prx", "del", "tvx", "tvd", "tvf", "tvp",
    });

    MATCHER.match("foo.cfs"); // returns true

BTW, I think it is a bad idea to have FILENAME_EXTENSIONS as a public array. Being final does not prevent code from changing the contents of the array. If the extensions must be public, I would recommend either an accessor that returns a copy of the array, or an unmodifiable collection/set/list.
Chris

import java.util.*;

public class ExtensionMatcher {
  // Trie over reversed extensions: slots 0-25 hold child nodes for 'a'-'z',
  // slot 26 is non-null when a complete extension ends at this node.
  private Object[] tree;

  public ExtensionMatcher(String[] exts) {
    tree = create(Arrays.asList(exts), 0);
  }

  // Builds the node for the character at position 'index' from the end.
  private static Object[] create(List exts, int index) {
    Object[] array = new Object[27];
    Map byLetter = new HashMap();
    for (Iterator it = exts.iterator(); it.hasNext();) {
      String ext = (String)it.next();
      int length = ext.length();
      if (length > index) {
        Character c = new Character(ext.charAt(length - 1 - index));
        List list = (List)byLetter.get(c);
        if (list == null)
          byLetter.put(c, list = new ArrayList());
        list.add(ext);
      } else if (length == index) {
        array[26] = Boolean.TRUE;
      }
    }
    for (Iterator it = byLetter.keySet().iterator(); it.hasNext();) {
      Character c = (Character)it.next();
      char val = c.charValue();
      if (val < 'a' || val > 'z')
        throw new IllegalArgumentException("Extension must be between 'a' and 'z'");
      array[val - 'a'] = create((List)byLetter.get(c), index + 1);
    }
    return array;
  }

  // Walks the file name backwards from its last character until a '.' is
  // reached at a node that marks the end of a known extension.
  public boolean match(String file) {
    int index = 0;
    int length = file.length();
    Object[] array = tree;
    for (;;) {
      if (index >= length)
        return false;
      char val = file.charAt(length - 1 - index);
      if (val == '.' && array[26] != null)
        return true;
      if (val < 'a' || val > 'z')
        return false;
      array = (Object[])array[val - 'a'];
      if (array == null)
        return false;
      index++;
    }
  }
}
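The two alternatives to a public array mentioned in this message, together with the allocate-a-substring lookup discussed in the quoted text, can be sketched as follows. This is only an illustration: the class IndexExtensionsSketch and its method names are hypothetical, and reading the quoted approach as "take the substring after the last dot and look it up in a hash set" is an assumption about what was meant.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the Lucene source. */
final class IndexExtensionsSketch {

    // Private array: callers never receive a reference to it.
    private static final String[] EXTENSIONS = {
        "cfs", "fnm", "fdx", "fdt", "tii", "tis", "frq",
        "prx", "del", "tvx", "tvd", "tvf", "tvp"
    };

    // Read-only view; add() and remove() throw UnsupportedOperationException.
    private static final Set<String> EXTENSION_SET =
        Collections.unmodifiableSet(new HashSet<String>(Arrays.asList(EXTENSIONS)));

    private IndexExtensionsSketch() {}

    /** Option 1: an accessor that returns a copy of the array. */
    static String[] getExtensions() {
        return EXTENSIONS.clone();
    }

    /** Option 2: an unmodifiable set. */
    static Set<String> extensionSet() {
        return EXTENSION_SET;
    }

    /** Allocates one substring per call, then does a single hash lookup. */
    static boolean hasIndexExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) return false;
        return EXTENSION_SET.contains(fileName.substring(dot + 1));
    }

    public static void main(String[] args) {
        String[] copy = getExtensions();
        copy[0] = "changed"; // modifies only the caller's copy
        System.out.println(getExtensions()[0]);
        System.out.println(hasIndexExtension("foo.cfs"));
    }
}
```

The copy costs an allocation on every call to getExtensions(), while the unmodifiable set is created once; which one fits better depends on how callers use the list.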
FAQ entry about deletions
Hi,

the FAQ contains this sentence:

  Document that are deleted really are in deleted (???)

What is that supposed to mean? Could someone rephrase it?

Regards
 Daniel

--
http://www.danielnaber.de
Re: svn commit: r180010 - in /lucene/java/trunk/src: java/org/apache/lucene/index/IndexReader.java java/org/apache/lucene/index/SegmentInfos.java test/org/apache/lucene/index/TestIndexReader.java
On Monday 06 June 2005 18:23, Bernhard Messer wrote:
> I would suggest removing the deprecated flags from the methods getCurrentVersion() and lastModified().

Okay, I just did that and rephrased the javadoc to say that lastModified should not be used to check whether a reader is up-to-date.

Regards
 Daniel

--
http://www.danielnaber.de