[Lucene.Net] [jira] [Updated] (LUCENENET-444) Snowball stemmers (Portuguese, Hungarian, Romanian, Turkish)
[ https://issues.apache.org/jira/browse/LUCENENET-444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-444: --- Fix Version/s: (was: Lucene.Net 3.x) Lucene.Net 2.9.4g Snowball stemmers (Portuguese, Hungarian, Romanian, Turkish) Key: LUCENENET-444 URL: https://issues.apache.org/jira/browse/LUCENENET-444 Project: Lucene.Net Issue Type: New Feature Reporter: Digy Priority: Trivial Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Some missing stemmers + a modified portuguese stemmer. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Updated] (LUCENENET-444) Snowball stemmers (Portuguese, Hungarian, Romanian, Turkish)
[ https://issues.apache.org/jira/browse/LUCENENET-444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-444: --- Attachment: PortugueseStemmer.cs TurkishStemmer.cs RomanianStemmer.cs HungarianStemmer.cs Snowball stemmers (Portuguese, Hungarian, Romanian, Turkish) Key: LUCENENET-444 URL: https://issues.apache.org/jira/browse/LUCENENET-444 Project: Lucene.Net Issue Type: New Feature Reporter: Digy Priority: Trivial Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: HungarianStemmer.cs, PortugueseStemmer.cs, RomanianStemmer.cs, TurkishStemmer.cs Some missing stemmers + a modified portuguese stemmer. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Resolved] (LUCENENET-444) Snowball stemmers (Portuguese, Hungarian, Romanian, Turkish)
[ https://issues.apache.org/jira/browse/LUCENENET-444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy resolved LUCENENET-444. Resolution: Fixed Snowball stemmers (Portuguese, Hungarian, Romanian, Turkish) Key: LUCENENET-444 URL: https://issues.apache.org/jira/browse/LUCENENET-444 Project: Lucene.Net Issue Type: New Feature Reporter: Digy Priority: Trivial Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: HungarianStemmer.cs, PortugueseStemmer.cs, RomanianStemmer.cs, TurkishStemmer.cs Some missing stemmers + a modified portuguese stemmer. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-444) Snowball stemmers (Portuguese, Hungarian, Romanian, Turkish)
[ https://issues.apache.org/jira/browse/LUCENENET-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105418#comment-13105418 ] Digy commented on LUCENENET-444: Committed to trunk and the 2.9.4g branch. Snowball stemmers (Portuguese, Hungarian, Romanian, Turkish) Key: LUCENENET-444 URL: https://issues.apache.org/jira/browse/LUCENENET-444 Project: Lucene.Net Issue Type: New Feature Reporter: Digy Priority: Trivial Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: HungarianStemmer.cs, PortugueseStemmer.cs, RomanianStemmer.cs, TurkishStemmer.cs Some missing stemmers + a modified Portuguese stemmer.
[Lucene.Net] [jira] [Commented] (LUCENENET-443) SpellChecker finaliser calls close regardless of if closed already
[ https://issues.apache.org/jira/browse/LUCENENET-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105478#comment-13105478 ] Digy commented on LUCENENET-443: +1 for IDisposable in 2.9.4g (since Analyzers, Searchers, Directories, IndexReader and IndexWriter already implement it). SpellChecker finaliser calls close regardless of if closed already -- Key: LUCENENET-443 URL: https://issues.apache.org/jira/browse/LUCENENET-443 Project: Lucene.Net Issue Type: Improvement Components: Lucene.Net Contrib Affects Versions: Lucene.Net 2.9.2 Reporter: Stuart Robinson Labels: lucene, spellcheck, spellchecker The SpellChecker class currently has no publicly visible way of accessing the closed field. It also calls Close in the finaliser; because Close can throw an exception, this kills the host process when the object is garbage collected. I propose two changes: Change the already existing method IsClosed() to public: public bool IsClosed() { return closed; } and add a check on this in the finaliser: ~SpellChecker() { if (!IsClosed()) this.Close(); } Ideally this class should implement IDisposable, but I think this would be a bigger job than this two-line change.
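For readers following LUCENENET-443, the IDisposable route mentioned in the comment could look roughly like the sketch below. It is only an illustration under stated assumptions: `ClosableResource` is a hypothetical stand-in for SpellChecker, `ReleaseCount` exists purely to make the behaviour observable, and none of this is the code that was committed.

```csharp
using System;

// Hypothetical stand-in for SpellChecker; only the close/dispose logic is shown.
public class ClosableResource : IDisposable
{
    private bool closed;

    // Counts how often the underlying resource was really released (illustration only).
    public int ReleaseCount { get; private set; }

    public bool IsClosed()
    {
        return closed;
    }

    public void Close()
    {
        Dispose();
    }

    public void Dispose()
    {
        Dispose(true);
        // Once disposed explicitly, the finaliser has nothing left to do.
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (closed) return;   // closing twice is a no-op
        closed = true;
        ReleaseCount++;
    }

    ~ClosableResource()
    {
        // An exception escaping a finaliser terminates the process, so swallow it here.
        try { Dispose(false); }
        catch (Exception) { }
    }
}
```

With this shape, calling Close() and then Dispose() releases the resource only once, and the finaliser can never bring the process down.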
[Lucene.Net] [jira] [Resolved] (LUCENENET-414) The definition of CharArraySet is dangerously confusing and leads to bugs when used.
[ https://issues.apache.org/jira/browse/LUCENENET-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy resolved LUCENENET-414. Resolution: Fixed Fixed. DIGY The definition of CharArraySet is dangerously confusing and leads to bugs when used. Key: LUCENENET-414 URL: https://issues.apache.org/jira/browse/LUCENENET-414 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.2 Environment: Irrelevant Reporter: Vincent Van Den Berghe Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g

Right now, CharArraySet derives from System.Collections.Hashtable, but doesn't actually use this base type for storing elements. However, the StandardAnalyzer.STOP_WORDS_SET is exposed as a System.Collections.Hashtable. The trivial code to build your own stopword set using the StandardAnalyzer.STOP_WORDS_SET and adding your own set of stopwords like this:

CharArraySet myStopWords = new CharArraySet(StandardAnalyzer.STOP_WORDS_SET, ignoreCase: false);
foreach (string domainSpecificStopWord in DomainSpecificStopWords) myStopWords.Add(domainSpecificStopWord);

... will fail because the CharArraySet accepts an ICollection, which will be passed the Hashtable instance of STOP_WORDS_SET: the resulting myStopWords will only contain the DomainSpecificStopWords, and not those from STOP_WORDS_SET. One workaround would be to replace the first line with this:

CharArraySet stopWords = new CharArraySet(StandardAnalyzer.STOP_WORDS_SET.Count + DomainSpecificStopWords.Length, ignoreCase: false);
foreach (string domainSpecificStopWord in (CharArraySet)StandardAnalyzer.STOP_WORDS_SET) stopWords.Add(domainSpecificStopWord);

... but this makes use of an implementation detail (the STOP_WORDS_SET is really an UnmodifiableCharArraySet, which is itself a CharArraySet). It works because it forces the foreach() to use the correct CharArraySet.GetEnumerator(), which is defined as a new method (this has a bad code smell to it).

At least 2 possibilities exist to solve this problem:
- Make CharArraySet use the Hashtable instance and a custom comparer, instead of its own implementation.
- Make CharArraySet use HashSet<char[]>, defined in .NET 4.0.
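The pitfall described in LUCENENET-414 can be reproduced in miniature. The sketch below is an assumption-laden illustration, not CharArraySet itself: `HidingSet` and `HidingDemo` are hypothetical names, and the set stores strings in a plain list. It shows why enumerating through the base type sees nothing while enumerating through the derived type sees the elements.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical miniature of the problem: a set that derives from Hashtable
// but keeps its elements in its own storage.
public class HidingSet : Hashtable
{
    private readonly List<string> items = new List<string>();

    public void AddItem(string s) { items.Add(s); }

    // 'new' hides Hashtable.GetEnumerator instead of overriding it, so this
    // method is only chosen when the static type is HidingSet.
    public new IEnumerator GetEnumerator() { return items.GetEnumerator(); }
}

public static class HidingDemo
{
    // Enumerates through the interface: Hashtable's own (empty) storage is used.
    public static int CountVia(IEnumerable source)
    {
        int n = 0;
        foreach (object o in source) n++;
        return n;
    }

    // Enumerates through the derived type: the hiding GetEnumerator is used.
    public static int CountDirect(HidingSet source)
    {
        int n = 0;
        foreach (object o in source) n++;
        return n;
    }
}
```

The cast to CharArraySet in the workaround plays the same role as `CountDirect` here: it changes the static type so that the hiding enumerator is selected.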
[Lucene.Net] [jira] [Updated] (LUCENENET-442) ParallelMultiSearcher threads don't handle all exceptions
[ https://issues.apache.org/jira/browse/LUCENENET-442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-442: --- Attachment: LUCENENET-442.patch Thanks Andy. Nice catch. I prepared a patch for 2.9.4g and will commit to the 2.9.4g branch and trunk soon. DIGY ParallelMultiSearcher threads don't handle all exceptions - Key: LUCENENET-442 URL: https://issues.apache.org/jira/browse/LUCENENET-442 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.2 Reporter: Andy Twidle Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: LUCENENET-442.patch The ParallelMultiSearcher doesn't allow exceptions other than IOException to be handled by the calling application. LUCENENET-388 worked around one specific example of this, but any genuine Lucene exception (e.g. BooleanQuery.TooManyClauses) will also fall foul of this pattern. In our specific instance we could treat the symptoms and raise the max clause count, but I'm sure there will be more. Could the System.IOException be generalised to System.Exception? Or would that be too much deviation from the Java code base? -- Example stack trace of an exception thrown by a Searcher executed: Framework Version: v4.0.30319 Description: The process was terminated due to an unhandled exception. 
Exception Info: Lucene.Net.Search.BooleanQuery+TooManyClauses Stack: at Lucene.Net.Search.BooleanQuery.Add(Lucene.Net.Search.BooleanClause) at Lucene.Net.Search.BooleanQuery.Add(Lucene.Net.Search.Query, Occur) at Lucene.Net.Search.PrefixQuery.Rewrite(Lucene.Net.Index.IndexReader) at Lucene.Net.Search.BooleanQuery.Rewrite(Lucene.Net.Index.IndexReader) at Lucene.Net.Search.IndexSearcher.Rewrite(Lucene.Net.Search.Query) at Lucene.Net.Search.Query.Weight(Lucene.Net.Search.Searcher) at Lucene.Net.Search.Searcher.CreateWeight(Lucene.Net.Search.Query) at Lucene.Net.Search.Searcher.Search(Lucene.Net.Search.Query, Lucene.Net.Search.Filter, Lucene.Net.Search.HitCollector) at Lucene.Net.Search.Searcher.Search(Lucene.Net.Search.Query, Lucene.Net.Search.HitCollector) at Lucene.Net.Search.QueryWrapperFilter.Bits(Lucene.Net.Index.IndexReader) at Lucene.Net.Search.CachingWrapperFilter.Bits(Lucene.Net.Index.IndexReader) at Lucene.Net.Search.IndexSearcher.Search(Lucene.Net.Search.Weight, Lucene.Net.Search.Filter, Lucene.Net.Search.HitCollector) at Lucene.Net.Search.IndexSearcher.Search(Lucene.Net.Search.Weight, Lucene.Net.Search.Filter, Int32) at Lucene.Net.Search.MultiSearcherThread.Run() at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) at System.Threading.ThreadHelper.ThreadStart() -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
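The generalisation asked for in LUCENENET-442 can be sketched as follows. This is not the attached patch: `CapturingWorker` and `JoinAndRethrow` are hypothetical names, and the choice to wrap the failure in an InvalidOperationException is an assumption made for the example. The point is only that the worker thread catches every exception and hands it back to the caller instead of letting it terminate the process.

```csharp
using System;
using System.Threading;

// Hypothetical worker wrapper: runs a delegate on its own thread and keeps
// any exception so the caller can observe it after the thread has finished.
public class CapturingWorker
{
    private readonly Action work;
    private readonly Thread thread;
    private Exception error;

    public CapturingWorker(Action work)
    {
        this.work = work;
        this.thread = new Thread(Run);
    }

    private void Run()
    {
        try { work(); }
        catch (Exception e) { error = e; }   // catch everything, not just IOException
    }

    public void Start() { thread.Start(); }

    // Waits for the thread and rethrows on the calling thread if the work failed.
    public void JoinAndRethrow()
    {
        thread.Join();
        if (error != null)
            throw new InvalidOperationException("Search thread failed", error);
    }
}
```

The original exception stays available as InnerException, so a caller can still react specifically to something like TooManyClauses.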
[Lucene.Net] [jira] [Commented] (LUCENENET-358) CloseableThreadLocal memory leak in LocalDataStoreSlot (with workaround)
[ https://issues.apache.org/jira/browse/LUCENENET-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092541#comment-13092541 ] Digy commented on LUCENENET-358: New CloseableThreadLocal implementation and its test case committed to trunk. DIGY CloseableThreadLocal memory leak in LocalDataStoreSlot (with workaround) - Key: LUCENENET-358 URL: https://issues.apache.org/jira/browse/LUCENENET-358 Project: Lucene.Net Issue Type: Bug Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 2.9.4g Environment: Microsoft Windows Server 2008 Enterprise x64. SP2. .NET Framework 4.0 Reporter: Rezgar Cadro Assignee: Digy Priority: Critical Labels: memory, CloseableThreadLocal, LocalDataStoreSlot, leak Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: CloseableThreadLocal MemoryLeak.patch, CloseableThreadLocal.diff, CloseableThreadLocal.diff, CloseableThreadLocal.patch, TestMemLeakage.zip

Recently we have been suffering from a severe memory leak when executing intense open/close operations on IndexSearcher and IndexModifier. Memory profiling showed that memory is being held by LocalDataStore[] objects. After some digging, the root of the problem was found in the CloseableThreadLocal class: private System.LocalDataStoreSlot t = System.Threading.Thread.AllocateDataSlot(); What we see is that every instantiated CloseableThreadLocal object causes a new data slot to be allocated for every thread. Thread.AllocateDataSlot() does not simply allocate a new slot that replaces an old one; it enlarges an existing per-thread buffer by appending to the end of the internal LocalDataStore[] collection, which causes a severe memory leak.

Since the t variable is instantiated on every object creation, and (in the current class implementation) every object is used by a single thread, replacing private System.LocalDataStoreSlot t = System.Threading.Thread.AllocateDataSlot(); with a simple private object dataSlot; and removing the hardRefs Dictionary solves the problem and prevents the memory leak. We have tried to implement the expected behavior by using the [ThreadStatic] attribute instead of LocalDataStoreSlot, but the attempt failed because of unexpected exceptions being thrown. Patch can be found at Lucene.Net repository under
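One way to avoid allocating a data slot per instance on .NET 4.0 is `System.Threading.ThreadLocal<T>`, which can be disposed. The sketch below is an assumption, not the implementation that was committed to trunk: `DisposableThreadLocal<T>` is a hypothetical name and the class only shows the get/set/dispose shape.

```csharp
using System;
using System.Threading;

// Hypothetical replacement sketch, not the class committed to trunk.
public class DisposableThreadLocal<T> : IDisposable where T : class
{
    private ThreadLocal<T> slot = new ThreadLocal<T>();

    // Returns the calling thread's value, or null if none was set or the object is disposed.
    public T Get()
    {
        ThreadLocal<T> s = slot;
        return s == null ? null : s.Value;
    }

    public void Set(T value)
    {
        ThreadLocal<T> s = slot;
        if (s == null) throw new ObjectDisposedException("DisposableThreadLocal");
        s.Value = value;
    }

    // Releases the per-thread storage instead of leaving a slot allocated forever.
    public void Dispose()
    {
        ThreadLocal<T> s = slot;
        slot = null;
        if (s != null) s.Dispose();
    }
}
```

Each thread sees only its own value, and disposing the wrapper releases the storage that AllocateDataSlot() would have kept alive.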
[Lucene.Net] [jira] [Updated] (LUCENENET-358) CloseableThreadLocal memory leak in LocalDataStoreSlot (with workaround)
[ https://issues.apache.org/jira/browse/LUCENENET-358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-358: --- Attachment: TestMemLeakage.zip TestMemLeakage.zip shows that bug. CloseableThreadLocal memory leak in LocalDataStoreSlot (with workaround) - Key: LUCENENET-358 URL: https://issues.apache.org/jira/browse/LUCENENET-358 Project: Lucene.Net Issue Type: Bug Environment: Microsoft Windows Server 2008 Enterprise x64. SP2. .NET Framework 4.0 Reporter: Rezgar Cadro Assignee: Digy Priority: Critical Labels: memory, CloseableThreadLocal, LocalDataStoreSlot, leak Attachments: CloseableThreadLocal MemoryLeak.patch, CloseableThreadLocal.diff, CloseableThreadLocal.diff, CloseableThreadLocal.patch, TestMemLeakage.zip
[Lucene.Net] [jira] [Commented] (LUCENENET-441) Encountered: EOF after : \\\\ during parsing a query
[ https://issues.apache.org/jira/browse/LUCENENET-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13091175#comment-13091175 ] Digy commented on LUCENENET-441: What does your query look like? What is your question? Encountered: EOF after : during parsing a query -- Key: LUCENENET-441 URL: https://issues.apache.org/jira/browse/LUCENENET-441 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.2 Environment: .Net Framework 4.0 Reporter: Maverick904 Cannot parse '\': Lexical error at line 1, column 4. Encountered: EOF after : |at Lucene.Net.QueryParsers.QueryParser.Parse(String query) at Lucene.Net.QueryParsers.MultiFieldQueryParser.Parse(Version matchVersion, String query, String[] fields, Occur[] flags, Analyzer analyzer) at Lucene.Net.QueryParsers.MultiFieldQueryParser.Parse(String query, String[] fields, Occur[] flags, Analyzer analyzer) Lexical error at line 1, column 4. Encountered: EOF after : | at Lucene.Net.QueryParsers.QueryParserTokenManager.GetNextToken() at Lucene.Net.QueryParsers.QueryParser.Jj_ntk() at Lucene.Net.QueryParsers.QueryParser.Modifiers() at Lucene.Net.QueryParsers.QueryParser.Query(String field) at Lucene.Net.QueryParsers.QueryParser.Parse(String query) | -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Issue Comment Edited] (LUCENENET-437) Port Contrib.Shingle from Java
[ https://issues.apache.org/jira/browse/LUCENENET-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067389#comment-13067389 ] Digy edited comment on LUCENENET-437 at 7/18/11 11:28 PM: -- bq. It ensures equality, but does not ensure inequality. Sorry but I must object again. It ensures inequality, but doesn't ensure equality. (If the hash codes are not equal, the objects are not equal either, but having the same hash code doesn't say anything about equality.) was (Author: digydigy): bq. It ensures equality, but does not ensure inequality. Sorry but I must object again. It ensures inequality, but doesn't ensure equality. Port Contrib.Shingle from Java -- Key: LUCENENET-437 URL: https://issues.apache.org/jira/browse/LUCENENET-437 Project: Lucene.Net Issue Type: Task Components: Lucene.Net Contrib, Lucene.Net Test Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Reporter: Troy Howard Assignee: Troy Howard Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Port Contrib.Shingle from Java
[Lucene.Net] [jira] [Commented] (LUCENENET-437) Port Contrib.Shingle from Java
[ https://issues.apache.org/jira/browse/LUCENENET-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067409#comment-13067409 ] Digy commented on LUCENENET-437: Already fixed in 2.9.4g Port Contrib.Shingle from Java -- Key: LUCENENET-437 URL: https://issues.apache.org/jira/browse/LUCENENET-437 Project: Lucene.Net Issue Type: Task Components: Lucene.Net Contrib, Lucene.Net Test Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Reporter: Troy Howard Assignee: Troy Howard Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Port Contrib.Shingle from Java -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Updated] (LUCENENET-437) Port Contrib.Shingle from Java
[ https://issues.apache.org/jira/browse/LUCENENET-437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-437: --- Fix Version/s: Lucene.Net 2.9.4g Port Contrib.Shingle from Java -- Key: LUCENENET-437 URL: https://issues.apache.org/jira/browse/LUCENENET-437 Project: Lucene.Net Issue Type: Task Components: Lucene.Net Contrib, Lucene.Net Test Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Reporter: Troy Howard Assignee: Troy Howard Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Port Contrib.Shingle from Java -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-437) Port Contrib.Shingle from Java
[ https://issues.apache.org/jira/browse/LUCENENET-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067361#comment-13067361 ] Digy commented on LUCENENET-437: The Javadoc says: public boolean equals(Object o) Compares the specified object with this list for equality. Returns true if and only if the specified object is also a list, both lists have the same size, and *all corresponding pairs of elements in the two lists are equal* [No reference for Hashcode - DIGY]. (Two elements e1 and e2 are equal if (e1==null ? e2==null : e1.equals(e2)).) In other words, two lists are defined to be equal if they contain the same elements in the same order. This definition ensures that the equals method works properly across different implementations of the List interface.

Yes, the sample was from Eric Lippert's blog to show why GetHashCode should not be used for comparing objects. bq. The issue you're describing is more of a problem with the .NET implementation of GetHashcode() rather than the correctness of using hashcode for comparison. No, the problem is not in the implementation of GetHashCode. In any implementation, you may have some unexpected collisions (since it is a 4-byte number). GetHashCode isn't meant for uniqueness or object identification. It's meant to provide a random distribution. Therefore the problem really lies in using it for equality comparison. DIGY

Port Contrib.Shingle from Java -- Key: LUCENENET-437 URL: https://issues.apache.org/jira/browse/LUCENENET-437 Project: Lucene.Net Issue Type: Task Components: Lucene.Net Contrib, Lucene.Net Test Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Reporter: Troy Howard Assignee: Troy Howard Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Port Contrib.Shingle from Java
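The list-equality definition quoted from the Javadoc translates directly into an element-wise comparison. The following is a minimal sketch of that definition only; `ListEquality` is a hypothetical helper name and is not the EquatableList code in Lucene.Net.Support.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the name ListEquality is not part of Lucene.Net.
public static class ListEquality
{
    // Two lists are equal when they have the same size and every corresponding
    // pair of elements is equal. Hash codes are never consulted.
    public static bool AreEqual<T>(IList<T> a, IList<T> b)
    {
        if (object.ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        if (a.Count != b.Count) return false;

        EqualityComparer<T> cmp = EqualityComparer<T>.Default;
        for (int i = 0; i < a.Count; i++)
        {
            if (!cmp.Equals(a[i], b[i])) return false;
        }
        return true;
    }
}
```

A hash code can still serve as a cheap first check (different hash codes imply different lists), but the element-wise loop is what decides equality.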
[Lucene.Net] [jira] [Commented] (LUCENENET-437) Port Contrib.Shingle from Java
[ https://issues.apache.org/jira/browse/LUCENENET-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067366#comment-13067366 ] Digy commented on LUCENENET-437: See: even with the worst possible GetHashCode implementation, the Equals method should still work.
{code}
///
private void Form1_Load(object sender, EventArgs e)
{
    Hashtable h = new Hashtable();
    MyClass m1 = new MyClass() { I = 1 };
    MyClass m2 = new MyClass() { I = 2 };
    h.Add(m1, 1);
    h.Add(m2, 2);
    System.Diagnostics.Debug.Assert(h[m2].Equals(2));
}

public class MyClass
{
    public int I;

    // Every instance collides on purpose; lookups still work because Equals decides.
    public override int GetHashCode() { return 1; }
}
{code}
Port Contrib.Shingle from Java -- Key: LUCENENET-437 URL: https://issues.apache.org/jira/browse/LUCENENET-437 Project: Lucene.Net Issue Type: Task Components: Lucene.Net Contrib, Lucene.Net Test Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Reporter: Troy Howard Assignee: Troy Howard Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Port Contrib.Shingle from Java
[Lucene.Net] [jira] [Commented] (LUCENENET-437) Port Contrib.Shingle from Java
[ https://issues.apache.org/jira/browse/LUCENENET-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067375#comment-13067375 ] Digy commented on LUCENENET-437: bq. This ensures that list1.equals(list2) implies that list1.hashCode()==list2.hashCode() for any two lists, list1 and list2, as required by the general contract of Object.hashCode. but it doesn't ensure that if list1.hashCode()==list2.hashCode() then list1.equals(list2) should be true, as I showed using Eric Lippert's sample. Port Contrib.Shingle from Java -- Key: LUCENENET-437 URL: https://issues.apache.org/jira/browse/LUCENENET-437 Project: Lucene.Net Issue Type: Task Components: Lucene.Net Contrib, Lucene.Net Test Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Reporter: Troy Howard Assignee: Troy Howard Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Port Contrib.Shingle from Java -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-437) Port Contrib.Shingle from Java
[ https://issues.apache.org/jira/browse/LUCENENET-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067389#comment-13067389 ] Digy commented on LUCENENET-437: bq. It ensures equality, but does not ensure inequality. Sorry but I must object again. It ensures inequality, but doesn't ensure equality. Port Contrib.Shingle from Java -- Key: LUCENENET-437 URL: https://issues.apache.org/jira/browse/LUCENENET-437 Project: Lucene.Net Issue Type: Task Components: Lucene.Net Contrib, Lucene.Net Test Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Reporter: Troy Howard Assignee: Troy Howard Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Port Contrib.Shingle from Java -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Updated] (LUCENENET-437) Port Contrib.Shingle from Java
[ https://issues.apache.org/jira/browse/LUCENENET-437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-437: --- Affects Version/s: Lucene.Net 2.9.4g Fix Version/s: Lucene.Net 2.9.4g Port Contrib.Shingle from Java -- Key: LUCENENET-437 URL: https://issues.apache.org/jira/browse/LUCENENET-437 Project: Lucene.Net Issue Type: Task Components: Lucene.Net Contrib, Lucene.Net Test Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Reporter: Troy Howard Assignee: Troy Howard Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Port Contrib.Shingle from Java -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-437) Port Contrib.Shingle from Java
[ https://issues.apache.org/jira/browse/LUCENENET-437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066717#comment-13066717 ] Digy commented on LUCENENET-437: Using HashCode for equality comparison is not a good idea.
{code}
///
ListComparer<string> comp = new ListComparer<string>();
bool b = comp.Equals(new List<string>() { "\uA0A2\uA0A2" }, new List<string>() { });
System.Diagnostics.Debug.Assert(b == false);

b = comp.Equals(new List<string>() { "\uA0A2\uA0A2" }, new List<string>() { "\uA0A2\uA0A2\uA0A2\uA0A2" });
System.Diagnostics.Debug.Assert(b == false);

b = new Lucene.Net.Support.EquatableList<string>() { "\uA0A2\uA0A2" }.Equals(new Lucene.Net.Support.EquatableList<string>() { });
System.Diagnostics.Debug.Assert(b == false);

b = new Lucene.Net.Support.EquatableList<string>() { "\uA0A2\uA0A2" }.Equals(new Lucene.Net.Support.EquatableList<string>() { "\uA0A2\uA0A2\uA0A2\uA0A2" });
System.Diagnostics.Debug.Assert(b == false);
///
{code}
DIGY Port Contrib.Shingle from Java -- Key: LUCENENET-437 URL: https://issues.apache.org/jira/browse/LUCENENET-437 Project: Lucene.Net Issue Type: Task Components: Lucene.Net Contrib, Lucene.Net Test Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Reporter: Troy Howard Assignee: Troy Howard Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Port Contrib.Shingle from Java
[Lucene.Net] [jira] [Reopened] (LUCENENET-437) Port Contrib.Shingle from Java
[ https://issues.apache.org/jira/browse/LUCENENET-437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy reopened LUCENENET-437: Port Contrib.Shingle from Java -- Key: LUCENENET-437 URL: https://issues.apache.org/jira/browse/LUCENENET-437 Project: Lucene.Net Issue Type: Task Components: Lucene.Net Contrib, Lucene.Net Test Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Reporter: Troy Howard Assignee: Troy Howard Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Port Contrib.Shingle from Java -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-434) Remove AnonymousXXXX classes to increase readability
[ https://issues.apache.org/jira/browse/LUCENENET-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062351#comment-13062351 ] Digy commented on LUCENENET-434: very nice. Remove Anonymous classes to increase readability --- Key: LUCENENET-434 URL: https://issues.apache.org/jira/browse/LUCENENET-434 Project: Lucene.Net Issue Type: Improvement Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.4g Reporter: Scott Lombard Assignee: Scott Lombard Fix For: Lucene.Net 2.9.4g Attachments: TeeSinkTokenFilter.patch Original Estimate: 168h Time Spent: 5h Remaining Estimate: 163h Replace the anonymous classes inherited from JLCA, which make the code impossible to read. Follow Digy's template to replace the single abstract method with Func or Action, like in FilterCache<T>, from: protected abstract object MergeDeletes(IndexReader reader, object value); to: Func<IndexReader, object, object> MergeDeletes; Determine a solution for the classes with more than 1 abstract method without diverging much from Java.
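The abstract-method-to-delegate template described in LUCENENET-434 might look like the sketch below. The names are assumptions made so the example compiles on its own: `FakeReader` stands in for IndexReader and `DelegateFilterCache` for FilterCache<T>; the real classes carry far more logic.

```csharp
using System;

// Hypothetical stand-in for IndexReader so the sketch compiles on its own.
public class FakeReader
{
    public bool HasDeletions;
}

public class DelegateFilterCache
{
    // Replaces: protected abstract object MergeDeletes(IndexReader reader, object value);
    private readonly Func<FakeReader, object, object> mergeDeletes;

    public DelegateFilterCache(Func<FakeReader, object, object> mergeDeletes)
    {
        if (mergeDeletes == null) throw new ArgumentNullException("mergeDeletes");
        this.mergeDeletes = mergeDeletes;
    }

    // Callers no longer need a subclass; the behaviour is passed in as a lambda.
    public object Get(FakeReader reader, object cached)
    {
        return mergeDeletes(reader, cached);
    }
}
```

A caller then writes something like `new DelegateFilterCache((r, v) => r.HasDeletions ? null : v)` where the JLCA output needed a generated AnonymousClass subclass.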
[Lucene.Net] [jira] [Resolved] (LUCENENET-432) Concurrency issues in SegmentInfo.Files() (LUCENE-2584)
[ https://issues.apache.org/jira/browse/LUCENENET-432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy resolved LUCENENET-432. Resolution: Fixed Fix Version/s: Lucene.Net 2.9.4 Lucene.Net 2.9.2 Assignee: Digy Patch committed to trunk and the 2.9.4g branch. Concurrency issues in SegmentInfo.Files() (LUCENE-2584) --- Key: LUCENENET-432 URL: https://issues.apache.org/jira/browse/LUCENENET-432 Project: Lucene.Net Issue Type: Bug Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Reporter: Digy Assignee: Digy Fix For: Lucene.Net 2.9.2, Lucene.Net 2.9.4 Attachments: SegmentInfo.patch Calling files() in SegmentInfo from multiple threads could lead to a ConcurrentModificationException if one thread has not yet finished adding to the ArrayList (files) while another thread has already obtained it as cached. https://issues.apache.org/jira/browse/LUCENE-2584
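The race described in LUCENENET-432 comes from publishing a cached list before it is fully populated. A common remedy, sketched here as an assumption rather than as the content of SegmentInfo.patch, is to build the list in a local variable and assign the field only when it is complete. `CachedFileList` and the file extensions used are purely illustrative.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical miniature of a lazily cached file list; not the SegmentInfo code.
public class CachedFileList
{
    private readonly object sync = new object();
    private readonly string name;
    private List<string> files;   // assigned only once the list is fully built

    public CachedFileList(string name) { this.name = name; }

    public IList<string> Files()
    {
        lock (sync)
        {
            if (files == null)
            {
                // Build into a local list so no other thread can observe it half-filled.
                List<string> built = new List<string>();
                built.Add(name + ".fnm");
                built.Add(name + ".fdt");
                built.Add(name + ".fdx");
                files = built;
            }
            // Hand out a read-only view so callers cannot modify the cache.
            return files.AsReadOnly();
        }
    }
}
```

Every caller, whichever thread it runs on, sees either no cached list or the complete one.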
[Lucene.Net] [jira] [Resolved] (LUCENENET-430) Contrib.ChainedFilter
[ https://issues.apache.org/jira/browse/LUCENENET-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy resolved LUCENENET-430. Resolution: Fixed Instead of creating a small project, I put it into Contrib.Analyzers. Contrib.ChainedFilter - Key: LUCENENET-430 URL: https://issues.apache.org/jira/browse/LUCENENET-430 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4g Reporter: Digy Priority: Minor Fix For: Lucene.Net 2.9.4g Attachments: ChainedFilter.cs, ChainedFilterTest.cs Port of Lucene Java 3.0.3's ChainedFilter and its test cases. See the StackOverflow question: How to combine multiple filters within one search? http://stackoverflow.com/questions/6570477/multiple-filters-in-lucene-net
[Lucene.Net] [jira] [Commented] (LUCENENET-418) LuceneTestCase should not have a static method that could throw exceptions.
[ https://issues.apache.org/jira/browse/LUCENENET-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061125#comment-13061125 ] Digy commented on LUCENENET-418: It works! Thanks. DIGY LuceneTestCase should not have a static method that could throw exceptions. Key: LUCENENET-418 URL: https://issues.apache.org/jira/browse/LUCENENET-418 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Test Affects Versions: Lucene.Net 3.x Environment: Linux, OSX, etc Reporter: michael herndon Assignee: michael herndon Labels: test Fix For: Lucene.Net 2.9.4g Original Estimate: 2m Remaining Estimate: 2m Throwing an exception from a static method in the base class of 90% of the tests makes the issue hard to debug in NUnit. The test results came back saying that TestFixtureSetup was causing an issue even though it was the static constructor causing problems, and this then propagates to all the tests that derive from LuceneTestCase. The TEMP_DIR needs to be moved to a static util class as a property or even a mixin method. This cost me hours to debug and figure out the real issue, as the underlying exception never bubbled up.
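The suggestion in LUCENENET-418 to move TEMP_DIR into a static utility class as a property could be sketched like this. `TestPaths`, `TempDir` and the directory name are assumptions made for the example; this is not the change that was committed.

```csharp
using System;
using System.IO;

// Hypothetical utility class; the name TestPaths is an assumption.
public static class TestPaths
{
    private static readonly object sync = new object();
    private static string tempDir;

    // Resolved on first use, so a failure surfaces as an ordinary exception in the
    // calling test rather than as a type-initialisation failure of the base class.
    public static string TempDir
    {
        get
        {
            lock (sync)
            {
                if (tempDir == null)
                {
                    string dir = Path.Combine(Path.GetTempPath(), "lucene-net-tests");
                    Directory.CreateDirectory(dir);
                    tempDir = dir;
                }
                return tempDir;
            }
        }
    }
}
```

Because nothing that can fail runs in a static constructor, an inaccessible temp directory shows up with its real exception and stack trace in the test that first asks for the path.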
[Lucene.Net] [jira] [Created] (LUCENENET-433) AttributeSource can have an invalid computed state (LUCENE-3042)
AttributeSource can have an invalid computed state (LUCENE-3042) Key: LUCENENET-433 URL: https://issues.apache.org/jira/browse/LUCENENET-433 Project: Lucene.Net Issue Type: Bug Reporter: Digy Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g If you work with a tokenstream, consume it, then reuse it and add an attribute to it, the computed state is wrong. Thus, for example, clearAttributes() will not actually clear the attribute added. So in some situations, addAttribute is not actually clearing the computed state when it should. https://issues.apache.org/jira/browse/LUCENE-3042
[Lucene.Net] [jira] [Commented] (LUCENENET-433) AttributeSource can have an invalid computed state (LUCENE-3042)
[ https://issues.apache.org/jira/browse/LUCENENET-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061214#comment-13061214 ] Digy commented on LUCENENET-433: Here is the test case
{code}
[Test]
public void Test_LUCENE_3042_LUCENENET_433()
{
    String testString = "t";
    Analyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer();
    TokenStream stream = analyzer.ReusableTokenStream("dummy", new System.IO.StringReader(testString));
    stream.Reset();
    while (stream.IncrementToken())
    {
        // consume
    }
    stream.End();
    stream.Close();
    AssertAnalyzesToReuse(analyzer, testString, new String[] { "t" });
}
{code}
AttributeSource can have an invalid computed state (LUCENE-3042) Key: LUCENENET-433 URL: https://issues.apache.org/jira/browse/LUCENENET-433 Project: Lucene.Net Issue Type: Bug Reporter: Digy Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g If you work with a tokenstream, consume it, then reuse it and add an attribute to it, the computed state is wrong. Thus, for example, clearAttributes() will not actually clear the attribute added. So in some situations, addAttribute is not actually clearing the computed state when it should. https://issues.apache.org/jira/browse/LUCENE-3042
[Lucene.Net] [jira] [Resolved] (LUCENENET-172) This patch fixes the unexceptional exceptions ecountered in FastCharStream and SupportClass
[ https://issues.apache.org/jira/browse/LUCENENET-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy resolved LUCENENET-172. Resolution: Fixed Assignee: Digy (was: Scott Lombard) Fixed in 2.9.4g. No fix for 2.9.4 This patch fixes the unexceptional exceptions ecountered in FastCharStream and SupportClass --- Key: LUCENENET-172 URL: https://issues.apache.org/jira/browse/LUCENENET-172 Project: Lucene.Net Issue Type: Improvement Components: Lucene.Net Core Affects Versions: Lucene.Net 2.3.1, Lucene.Net 2.3.2 Reporter: Ben Martz Assignee: Digy Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: lucene_2.3.1_exceptions_fix.patch, lucene_2.9.4g_exceptions_fix The Java version of Lucene handles end-of-file in FastCharStream by throwing an exception. This behavior has been ported to .NET, but it carries an unacceptable cost in the .NET environment. This patch is based on the prior work in LUCENENET-8 and LUCENENET-11, which I gratefully acknowledge for the solution. While I understand that this patch is outside of the current project specification in that it deviates from the pure nature of the port, I believe that it is very important to make the patch available to any developer looking to leverage Lucene.Net in their project. Thanks for your consideration.
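The trade-off described in LUCENENET-172 can be illustrated with a small stand-alone sketch. It does not reproduce FastCharStream or the attached patch; the class `EofSketch` and its method names are hypothetical, and the only assumption is that end of input can be reported either by throwing or by an ordinary return value.

```csharp
using System;
using System.IO;

// Hypothetical illustration only; not the code from the attached patch.
public static class EofSketch
{
    // Exception style: end of input is signalled by throwing,
    // which is the pattern the issue describes as costly in .NET.
    public static char ReadCharOrThrow(TextReader reader)
    {
        int c = reader.Read();
        if (c == -1) throw new IOException("read past eof");
        return (char)c;
    }

    // Sentinel style: end of input is a normal return value, no exception is raised.
    public static bool TryReadChar(TextReader reader, out char ch)
    {
        int c = reader.Read();
        ch = c == -1 ? '\0' : (char)c;
        return c != -1;
    }

    // Counts characters, relying on the exception to leave the loop.
    public static int CountWithExceptions(string text)
    {
        int count = 0;
        using (var reader = new StringReader(text))
        {
            try
            {
                while (true) { ReadCharOrThrow(reader); count++; }
            }
            catch (IOException)
            {
                // end of input reached
            }
        }
        return count;
    }

    // Counts characters, leaving the loop when the sentinel is returned.
    public static int CountWithSentinel(string text)
    {
        int count = 0;
        char ch;
        using (var reader = new StringReader(text))
        {
            while (TryReadChar(reader, out ch)) count++;
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountWithExceptions("lucene"));
        Console.WriteLine(CountWithSentinel("lucene"));
    }
}
```

Both methods return the same count; the difference is only that the sentinel variant never constructs or unwinds an exception on the normal end-of-input path.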
[Lucene.Net] [jira] [Commented] (LUCENENET-433) AttributeSource can have an invalid computed state (LUCENE-3042)
[ https://issues.apache.org/jira/browse/LUCENENET-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061304#comment-13061304 ] Digy commented on LUCENENET-433: Committed to 2.9.4g branch AttributeSource can have an invalid computed state (LUCENE-3042) Key: LUCENENET-433 URL: https://issues.apache.org/jira/browse/LUCENENET-433 Project: Lucene.Net Issue Type: Bug Reporter: Digy Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: LUCENENET-433.patch If you work with a tokenstream, consume it, then reuse it and add an attribute to it, the computed state is wrong. Thus, for example, clearAttributes() will not actually clear the attribute added. So in some situations, addAttribute is not actually clearing the computed state when it should. https://issues.apache.org/jira/browse/LUCENE-3042
[Lucene.Net] [jira] [Commented] (LUCENENET-172) This patch fixes the unexceptional exceptions ecountered in FastCharStream and SupportClass
[ https://issues.apache.org/jira/browse/LUCENENET-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061595#comment-13061595 ] Digy commented on LUCENENET-172: Already fixed for 2.9.4g This patch fixes the unexceptional exceptions ecountered in FastCharStream and SupportClass --- Key: LUCENENET-172 URL: https://issues.apache.org/jira/browse/LUCENENET-172 Project: Lucene.Net Issue Type: Improvement Components: Lucene.Net Core Affects Versions: Lucene.Net 2.3.1, Lucene.Net 2.3.2 Reporter: Ben Martz Assignee: Scott Lombard Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: lucene_2.3.1_exceptions_fix.patch, lucene_2.9.4g_exceptions_fix The Java version of Lucene handles end-of-file in FastCharStream by throwing an exception. This behavior has been ported to .NET, but it carries an unacceptable cost in the .NET environment. This patch is based on the prior work in LUCENENET-8 and LUCENENET-11, which I gratefully acknowledge for the solution. While I understand that this patch is outside of the current project specification in that it deviates from the pure nature of the port, I believe that it is very important to make the patch available to any developer looking to leverage Lucene.Net in their project. Thanks for your consideration.
[Lucene.Net] [jira] [Updated] (LUCENENET-172) This patch fixes the unexceptional exceptions ecountered in FastCharStream and SupportClass
[ https://issues.apache.org/jira/browse/LUCENENET-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-172: --- Fix Version/s: Lucene.Net 2.9.4g This patch fixes the unexceptional exceptions ecountered in FastCharStream and SupportClass --- Key: LUCENENET-172 URL: https://issues.apache.org/jira/browse/LUCENENET-172 Project: Lucene.Net Issue Type: Improvement Components: Lucene.Net Core Affects Versions: Lucene.Net 2.3.1, Lucene.Net 2.3.2 Reporter: Ben Martz Assignee: Scott Lombard Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: lucene_2.3.1_exceptions_fix.patch, lucene_2.9.4g_exceptions_fix The Java version of Lucene handles end-of-file in FastCharStream by throwing an exception. This behavior has been ported to .NET, but it carries an unacceptable cost in the .NET environment. This patch is based on the prior work in LUCENENET-8 and LUCENENET-11, which I gratefully acknowledge for the solution. While I understand that this patch is outside of the current project specification in that it deviates from the pure nature of the port, I believe that it is very important to make the patch available to any developer looking to leverage Lucene.Net in their project. Thanks for your consideration.
[Lucene.Net] [jira] [Commented] (LUCENENET-431) Spatial.Net Cartesian won't find docs in radius in certain cases
[ https://issues.apache.org/jira/browse/LUCENENET-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060709#comment-13060709 ] Digy commented on LUCENENET-431: Thanks Olle and Matt, I committed the LUCENE-1930 patch to the 2.9.4g branch (+ added Olle's test case). (Another divergence from lucene.java, since this patch is still waiting to be applied there.) DIGY Spatial.Net Cartesian won't find docs in radius in certain cases Key: LUCENENET-431 URL: https://issues.apache.org/jira/browse/LUCENENET-431 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Contrib Affects Versions: Lucene.Net 2.9.4 Environment: Windows 7 x64 Reporter: Olle Jacobsen Labels: spatialsearch To replicate, change Lucene.Net.Contrib.Spatial.Test.TestCartesian to the following, which should return 3 results.
Line 42: private double _lat = 55.6880508001;
43: private double _lng = 13.5871808352; // This passes: 13.6271808352
73: AddPoint(writer, "Within radius", 55.6880508001, 13.5717346673);
74: AddPoint(writer, "Within radius", 55.6821978456, 13.6076183965);
75: AddPoint(writer, "Within radius", 55.673251569, 13.5946697607);
76: AddPoint(writer, "Close but not in radius", 55.8634157297, 13.5497731987);
77: AddPoint(writer, "Faar away", 40.7137578228, -74.0126901936);
130: const double miles = 5.0;
156: Console.WriteLine("Distances should be 3 " + distances.Count);
157: Console.WriteLine("Results should be 3 " + results);
159: Assert.AreEqual(3, distances.Count); // fixed a store of only needed distances
160: Assert.AreEqual(3, results);
[Lucene.Net] [jira] [Resolved] (LUCENENET-431) Spatial.Net Cartesian won't find docs in radius in certain cases
[ https://issues.apache.org/jira/browse/LUCENENET-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy resolved LUCENENET-431. Resolution: Fixed Fix Version/s: Lucene.Net 2.9.4g Assignee: Digy Spatial.Net Cartesian won't find docs in radius in certain cases Key: LUCENENET-431 URL: https://issues.apache.org/jira/browse/LUCENENET-431 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Contrib Affects Versions: Lucene.Net 2.9.4 Environment: Windows 7 x64 Reporter: Olle Jacobsen Assignee: Digy Labels: spatialsearch Fix For: Lucene.Net 2.9.4g To replicate, change Lucene.Net.Contrib.Spatial.Test.TestCartesian to the following, which should return 3 results.
Line 42: private double _lat = 55.6880508001;
43: private double _lng = 13.5871808352; // This passes: 13.6271808352
73: AddPoint(writer, "Within radius", 55.6880508001, 13.5717346673);
74: AddPoint(writer, "Within radius", 55.6821978456, 13.6076183965);
75: AddPoint(writer, "Within radius", 55.673251569, 13.5946697607);
76: AddPoint(writer, "Close but not in radius", 55.8634157297, 13.5497731987);
77: AddPoint(writer, "Faar away", 40.7137578228, -74.0126901936);
130: const double miles = 5.0;
156: Console.WriteLine("Distances should be 3 " + distances.Count);
157: Console.WriteLine("Results should be 3 " + results);
159: Assert.AreEqual(3, distances.Count); // fixed a store of only needed distances
160: Assert.AreEqual(3, results);
[Lucene.Net] [jira] [Commented] (LUCENENET-418) LuceneTestCase should not have a static method could throw exceptions.
[ https://issues.apache.org/jira/browse/LUCENENET-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059955#comment-13059955 ] Digy commented on LUCENENET-418: It fails in both builds. LuceneTestCase should not have a static method could throw exceptions. Key: LUCENENET-418 URL: https://issues.apache.org/jira/browse/LUCENENET-418 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Test Affects Versions: Lucene.Net 3.x Environment: Linux, OSX, etc Reporter: michael herndon Assignee: michael herndon Labels: test Fix For: Lucene.Net 2.9.4g Original Estimate: 2m Remaining Estimate: 2m Throwing an exception from a static method in a base class used by 90% of the tests makes the issue hard to debug in NUnit. The test results reported that TestFixtureSetup was causing the failure even though the static constructor was the real cause, and the failure then propagates to all the tests that derive from LuceneTestCase. TEMP_DIR needs to be moved to a static util class as a property, or even a mixin method. It took me hours to debug and find the real issue because the underlying exception never bubbled up.
[Lucene.Net] [jira] [Updated] (LUCENENET-430) Contrib.ChainedFilter
[ https://issues.apache.org/jira/browse/LUCENENET-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-430: --- Attachment: ChainedFilterTest.cs ChainedFilter.cs Contrib.ChainedFilter - Key: LUCENENET-430 URL: https://issues.apache.org/jira/browse/LUCENENET-430 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4g Reporter: Digy Priority: Minor Fix For: Lucene.Net 2.9.4g Attachments: ChainedFilter.cs, ChainedFilterTest.cs Port of lucene.Java 3.0.3's ChainedFilter and its test cases. See the StackOverflow question: How to combine multiple filters within one search? http://stackoverflow.com/questions/6570477/multiple-filters-in-lucene-net
[Lucene.Net] [jira] [Created] (LUCENENET-430) Contrib.ChainedFilter
Contrib.ChainedFilter - Key: LUCENENET-430 URL: https://issues.apache.org/jira/browse/LUCENENET-430 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4g Reporter: Digy Priority: Minor Fix For: Lucene.Net 2.9.4g Attachments: ChainedFilter.cs, ChainedFilterTest.cs Port of lucene.Java 3.0.3's ChainedFilter and its test cases. See the StackOverflow question: How to combine multiple filters within one search? http://stackoverflow.com/questions/6570477/multiple-filters-in-lucene-net
[Lucene.Net] [jira] [Closed] (LUCENENET-428) How to do that the results are displayed in the first original tokens and them with synonyms?
[ https://issues.apache.org/jira/browse/LUCENENET-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy closed LUCENENET-428. -- Resolution: Invalid Please post questions to the mailing list, not in JIRA How to do that the results are displayed in the first original tokens and them with synonyms? - Key: LUCENENET-428 URL: https://issues.apache.org/jira/browse/LUCENENET-428 Project: Lucene.Net Issue Type: Wish Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.4 Environment: .net 4.0 Reporter: Vladimir How can I make the results show matches on the original tokens first, and then matches on synonyms?
My Analyzer (part):
public override TokenStream TokenStream(string fieldName, TextReader reader)
{
    TokenStream result = new StandardTokenizer(reader);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, stoptable);
    result = new SynonymFilter(result, synonymEngine);
    result = new ExtendedRussianStemFilter(result, charset);
    return result;
}
My SynonymFilter:
internal class SynonymFilter : TokenFilter
{
    private readonly ISynonymEngine engine;
    // Synonyms waiting to be emitted after the original token.
    private readonly Queue<Token> synonymTokenQueue = new Queue<Token>();

    public SynonymFilter(TokenStream tokenStream, ISynonymEngine engine) : base(tokenStream)
    {
        this.engine = engine;
    }

    public override Token Next()
    {
        if (synonymTokenQueue.Count > 0)
        {
            return synonymTokenQueue.Dequeue();
        }
        Token t = input.Next();
        if (t == null) return null;
        if (t.Type() == "SYNONYM") return t;
        IEnumerable<string> synonyms = engine.GetSynonyms(t.TermText());
        if (synonyms == null)
        {
            return t;
        }
        foreach (string syn in synonyms)
        {
            if (!t.TermText().Equals(syn))
            {
                // Same offsets and position as the original token.
                var synToken = new Token(syn, t.StartOffset(), t.EndOffset(), "SYNONYM");
                synToken.SetPositionIncrement(0);
                synonymTokenQueue.Enqueue(synToken);
            }
        }
        return t;
    }
}
Thanks!
[Lucene.Net] [jira] [Created] (LUCENENET-429) Corrupted segment file not detected and wipes index contents (LUCENE-3255)
Corrupted segment file not detected and wipes index contents (LUCENE-3255) -- Key: LUCENENET-429 URL: https://issues.apache.org/jira/browse/LUCENENET-429 Project: Lucene.Net Issue Type: New Feature Reporter: Digy Priority: Minor Fix For: Lucene.Net 2.9.4g https://issues.apache.org/jira/browse/LUCENE-3255
[Lucene.Net] [jira] [Updated] (LUCENENET-429) Corrupted segment file not detected and wipes index contents (LUCENE-3255)
[ https://issues.apache.org/jira/browse/LUCENENET-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-429: --- Attachment: LUCENENET-429.patch Corrupted segment file not detected and wipes index contents (LUCENE-3255) -- Key: LUCENENET-429 URL: https://issues.apache.org/jira/browse/LUCENENET-429 Project: Lucene.Net Issue Type: New Feature Reporter: Digy Priority: Minor Fix For: Lucene.Net 2.9.4g Attachments: LUCENENET-429.patch https://issues.apache.org/jira/browse/LUCENE-3255
[Lucene.Net] [jira] [Updated] (LUCENENET-427) Provide limit on phrase analysis in FastVectorHighlighter (LUCENE-3234)
[ https://issues.apache.org/jira/browse/LUCENENET-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-427: --- Attachment: FastVectorHighlighter.patch Provide limit on phrase analysis in FastVectorHighlighter (LUCENE-3234) --- Key: LUCENENET-427 URL: https://issues.apache.org/jira/browse/LUCENENET-427 Project: Lucene.Net Issue Type: Improvement Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 2.9.4g Reporter: Digy Priority: Minor Fix For: Lucene.Net 2.9.4g Attachments: FastVectorHighlighter.patch https://issues.apache.org/jira/browse/LUCENE-3234
[Lucene.Net] [jira] [Resolved] (LUCENENET-427) Provide limit on phrase analysis in FastVectorHighlighter (LUCENE-3234)
[ https://issues.apache.org/jira/browse/LUCENENET-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy resolved LUCENENET-427. Resolution: Fixed Committed Provide limit on phrase analysis in FastVectorHighlighter (LUCENE-3234) --- Key: LUCENENET-427 URL: https://issues.apache.org/jira/browse/LUCENENET-427 Project: Lucene.Net Issue Type: Improvement Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 2.9.4g Reporter: Digy Priority: Minor Fix For: Lucene.Net 2.9.4g Attachments: FastVectorHighlighter.patch https://issues.apache.org/jira/browse/LUCENE-3234
[jira] [Commented] (LUCENE-3234) Provide limit on phrase analysis in FastVectorHighlighter
[ https://issues.apache.org/jira/browse/LUCENE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054935#comment-13054935 ] Digy commented on LUCENE-3234: -- I am not sure how closely it is related to this issue, but there was a similar issue in Lucene.Net. https://issues.apache.org/jira/browse/LUCENENET-350 Provide limit on phrase analysis in FastVectorHighlighter - Key: LUCENE-3234 URL: https://issues.apache.org/jira/browse/LUCENE-3234 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3 Reporter: Mike Sokolov Assignee: Koji Sekiguchi Fix For: 3.4, 4.0 Attachments: LUCENE-3234.patch, LUCENE-3234.patch, LUCENE-3234.patch With larger documents, FVH can spend a lot of time trying to find the best-scoring snippet as it examines every possible phrase formed from matching terms in the document. If one is willing to accept less-than-perfect scoring by limiting the number of phrases that are examined, substantial speedups are possible. This is analogous to the Highlighter limit on the number of characters to analyze. The patch includes an artificial test case that shows a 1000x speedup. In a more normal test environment, with English documents and random queries, I am seeing speedups of around 3-10x when setting phraseLimit=1, which has the effect of selecting the first possible snippet in the document. Most of our sites operate in this way (just show the first snippet), so this would be a big win for us. With phraseLimit = -1, you get the existing FVH behavior. At larger values of phraseLimit, you may not get substantial speedup in the normal case, but you do get the benefit of protection against blow-up in pathological cases.
[Lucene.Net] [jira] [Commented] (LUCENENET-426) Mark BaseFragmentsBuilder methods as virtual
[ https://issues.apache.org/jira/browse/LUCENENET-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053722#comment-13053722 ] Digy commented on LUCENENET-426: 10 min. work done. DIGY Mark BaseFragmentsBuilder methods as virtual Key: LUCENENET-426 URL: https://issues.apache.org/jira/browse/LUCENENET-426 Project: Lucene.Net Issue Type: Improvement Components: Lucene.Net Contrib Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 3.x, Lucene.Net 2.9.4g Reporter: Itamar Syn-Hershko Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: fvh.patch Without marking methods in BaseFragmentsBuilder as virtual, it is meaningless to have FragmentsBuilder deriving from a class named Base, since most of its functionality cannot be overridden. Attached is a patch for marking the important methods virtual.
[Lucene.Net] [jira] [Resolved] (LUCENENET-426) Mark BaseFragmentsBuilder methods as virtual
[ https://issues.apache.org/jira/browse/LUCENENET-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy resolved LUCENENET-426. Resolution: Fixed Fix Version/s: Lucene.Net 2.9.4g Lucene.Net 2.9.4 Thanks Itamar. Fixed in trunk and the 2.9.4g branch. DIGY Mark BaseFragmentsBuilder methods as virtual Key: LUCENENET-426 URL: https://issues.apache.org/jira/browse/LUCENENET-426 Project: Lucene.Net Issue Type: Improvement Components: Lucene.Net Contrib Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 3.x, Lucene.Net 2.9.4g Reporter: Itamar Syn-Hershko Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: fvh.patch Without marking methods in BaseFragmentsBuilder as virtual, it is meaningless to have FragmentsBuilder deriving from a class named Base, since most of its functionality cannot be overridden. Attached is a patch for marking the important methods virtual.
[Lucene.Net] [jira] [Commented] (LUCENENET-417) implement streams as field values
[ https://issues.apache.org/jira/browse/LUCENENET-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049937#comment-13049937 ] Digy commented on LUCENENET-417: Maybe something like this: doc.Add(new Field(name,- doc.Add(new Field(metadata,- doc.Add(new Field(content,part1- doc.Add(new Field(content,part2- doc.Add(new Field(content,partN- DIGY implement streams as field values - Key: LUCENENET-417 URL: https://issues.apache.org/jira/browse/LUCENENET-417 Project: Lucene.Net Issue Type: New Feature Components: Lucene.Net Core Reporter: Christopher Currens Attachments: StreamValues.patch Adding binary values to a field is an expensive operation, as the whole binary data must be loaded into memory and then written to the index. Adding the ability to use a stream instead of a byte array could not only speed up the indexing process, but reduce the memory footprint as well. -Java lucene has the ability to use a TextReader to both analyze and store text in the index.- Lucene.NET lacks the ability to store string data in the index via streams. This should be a feature added into Lucene .NET as well. My thoughts are to add another Field constructor, that is Field(string name, System.IO.Stream stream, System.Text.Encoding encoding), that will allow the text to be analyzed and stored into the index. Comments about this approach are greatly appreciated.
[Lucene.Net] [jira] [Commented] (LUCENENET-425) MMapDirectory implementation
[ https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049322#comment-13049322 ] Digy commented on LUCENENET-425: I would like to compare search speeds of FSDirectory and MMapDirectory (if possible, with a big index)
1- IndexReader.Open( FSDirectory.Open(new System.IO.FileInfo(INDEX)) ,true );
2- IndexReader.Open( new MMapDirectory(new System.IO.FileInfo(INDEX)) ,true );
Something like
{code}
public class TestSearchSpeed
{
    string INDEX = @"Path to Index";
    string[] _words = new string[]{}; //Some words to search
    Directory _dir;

    public long TestFSDir()
    {
        _dir = FSDirectory.Open(new System.IO.FileInfo(INDEX));
        return Test();
    }

    public long TestMMapDir()
    {
        _dir = new MMapDirectory(new System.IO.FileInfo(INDEX));
        return Test();
    }

    long Test()
    {
        IndexReader reader = IndexReader.Open(_dir, true);
        Search(reader, "sometext"); // warm-up search, not timed
        Stopwatch sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < 5; i++)
        {
            Parallel.For(0, 50, j =>
            {
                Search(reader, _words[j % _words.Length]);
            });
        }
        long dur = sw.ElapsedMilliseconds;
        sw.Stop();
        reader.Close();
        return dur;
    }

    void Search(IndexReader reader, string criteria)
    {
        IndexSearcher src = new IndexSearcher(reader);
        Query q = new QueryParser("field", new WhitespaceAnalyzer()).Parse(criteria);
        TopDocs hits = src.Search(q, 100);
        for (int i = 0; i < hits.ScoreDocs.Length; i++)
        {
            // Load each hit so that stored-field access is part of the measurement.
            Document doc = reader.Document(hits.ScoreDocs[i].doc);
            string s = doc.GetField("field").StringValue();
        }
    }
}
{code}
DIGY MMapDirectory implementation Key: LUCENENET-425 URL: https://issues.apache.org/jira/browse/LUCENENET-425 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4g Reporter: Digy Priority: Trivial Fix For: Lucene.Net 2.9.4g Attachments: MMapDirectory.patch Since this is not a direct port of MMapDirectory.java, I'll put it under Support and implement MMapDirectory as {code} public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory { } {code} If a Mem-Map cannot be created (for example, if the file is too big to fit in the 32-bit address range), it will default to FSDirectory.FSIndexInput. In my tests, I didn't see any performance gain in a 32-bit environment, and I consider it better than nothing. I would be happy if someone could send test results on a 64-bit platform. DIGY
[Lucene.Net] [jira] [Commented] (LUCENENET-423) QueryParser differences between Java and .NET
[ https://issues.apache.org/jira/browse/LUCENENET-423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049377#comment-13049377 ] Digy commented on LUCENENET-423: I don't think there is an inconsistency between the Java version and .NET. If you know that the field is indexed as a date, then you should give your date string (while searching) in a form the language can parse. (And the UIs of both languages return datetime strings parseable by other libraries. It is not common for the user to type the datetime string into a textbox.) DIGY QueryParser differences between Java and .NET - Key: LUCENENET-423 URL: https://issues.apache.org/jira/browse/LUCENENET-423 Project: Lucene.Net Issue Type: Bug Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 2.9.4g Reporter: Christopher Currens When trying to do a RangeQuery that uses dates in a certain format, .NET behaves differently from its Java counterpart. The code is the same between them, but as far as I can tell, the difference is in the way Java parses dates vs. how .NET parses dates. To reproduce: {code:java} var queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "FullText", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29)); var query = queryParser.Parse("Field:[2001-01-17 TO 2001-01-20]"); {code} You'll notice that the query looks like the old DateField format (e.g. 0g1d64542). If you do the same query in Java (or Luke), you'll notice the query gets parsed as if it were a RangeQuery of strings. AFAIK, Java cannot parse a string formatted in that way. If you change the string to use / instead of - in Java, you'll get one that uses DateResolutions and DateTools.DateToString(). It seems an appropriate fix for this, if we wanted to keep this behavior similar to Java, would be to write our own DateTime parser that behaves the same way as Java's parser.
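The LUCENENET-423 discussion comes down to which input strings a date parser accepts. The sketch below only illustrates that idea in plain .NET; it is not QueryParser code. The class `DateParseSketch` is hypothetical, and the slash-only pattern `M/d/yyyy` is an assumed stand-in for a stricter parser, not the format Java's parser actually uses.

```csharp
using System;
using System.Globalization;

// Hypothetical illustration only; not taken from Lucene.Net's QueryParser.
public static class DateParseSketch
{
    // Assumed stand-in for a stricter, slash-only date format.
    private static readonly string[] StrictFormats = { "M/d/yyyy" };

    // Lenient: whatever DateTime.TryParse accepts under the invariant culture.
    public static bool ParsesLeniently(string text)
    {
        DateTime ignored;
        return DateTime.TryParse(text, CultureInfo.InvariantCulture,
                                 DateTimeStyles.None, out ignored);
    }

    // Strict: only the explicitly listed formats are accepted.
    public static bool ParsesStrictly(string text)
    {
        DateTime ignored;
        return DateTime.TryParseExact(text, StrictFormats, CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, out ignored);
    }

    public static void Main()
    {
        foreach (string s in new[] { "2001-01-17", "1/17/2001" })
        {
            Console.WriteLine("{0}: lenient={1}, strict={2}",
                              s, ParsesLeniently(s), ParsesStrictly(s));
        }
    }
}
```

With a lenient parser the hyphenated range endpoint is treated as a date, while a strict parser would reject it and leave the term to be handled as a plain string, which is the kind of divergence the issue reports.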
[Lucene.Net] [jira] [Commented] (LUCENENET-417) implement streams as field values
[ https://issues.apache.org/jira/browse/LUCENENET-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049350#comment-13049350 ] Digy commented on LUCENENET-417: Maybe this is a stupid question, but what is the reason to index a very large doc? If I indexed a whole book as a single document, it would appear in the result set of almost every kind of search. search computer --> this book. search sport --> this book. search politics --> this book. DIGY implement streams as field values - Key: LUCENENET-417 URL: https://issues.apache.org/jira/browse/LUCENENET-417 Project: Lucene.Net Issue Type: New Feature Components: Lucene.Net Core Reporter: Christopher Currens Attachments: StreamValues.patch Adding binary values to a field is an expensive operation, as the whole binary data must be loaded into memory and then written to the index. Adding the ability to use a stream instead of a byte array could not only speed up the indexing process, but reduce the memory footprint as well. -Java lucene has the ability to use a TextReader to both analyze and store text in the index.- Lucene.NET lacks the ability to store string data in the index via streams. This should be a feature added into Lucene .NET as well. My thoughts are to add another Field constructor, that is Field(string name, System.IO.Stream stream, System.Text.Encoding encoding), that will allow the text to be analyzed and stored into the index. Comments about this approach are greatly appreciated.
[Lucene.Net] [jira] [Commented] (LUCENENET-425) MMapDirectory implementation
[ https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049473#comment-13049473 ] Digy commented on LUCENENET-425: OK, I think it will be better to mark MMapDirectory as unimplemented, like NIOFSDirectory. DIGY MMapDirectory implementation Key: LUCENENET-425 URL: https://issues.apache.org/jira/browse/LUCENENET-425 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4g Reporter: Digy Priority: Trivial Fix For: Lucene.Net 2.9.4g Attachments: MMapDirectory.patch Since this is not a direct port of MMapDirectory.java, I'll put it under Support and implement MMapDirectory as {code} public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory { } {code} If a Mem-Map cannot be created (for example, if the file is too big to fit in the 32-bit address range), it will default to FSDirectory.FSIndexInput. In my tests, I didn't see any performance gain in a 32-bit environment, and I consider it better than nothing. I would be happy if someone could send test results on a 64-bit platform. DIGY
[Lucene.Net] [jira] [Closed] (LUCENENET-425) MMapDirectory implementation
[ https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy closed LUCENENET-425. -- Resolution: Won't Fix MMapDirectory implementation Key: LUCENENET-425 URL: https://issues.apache.org/jira/browse/LUCENENET-425 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4g Reporter: Digy Priority: Trivial Fix For: Lucene.Net 2.9.4g Attachments: MMapDirectory.patch Since this is not a direct port of MMapDirectory.java, I'll put it under Support and implement MMapDirectory as {code} public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory { } {code} If a Mem-Map can not be created(for ex, if the file is too big to fit in 32 bit address range), it will default to FSDirectory.FSIndexInput In my tests, I didn't see any performance gain in 32bit environment and I consider it as better then nothing. I would be happy if someone could send test results on 64bit platform. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Updated] (LUCENENET-425) MMapDirectory implementation
[ https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-425: --- Description: Since this is not a direct port of MMapDirectory.java, I'll put it under Support and implement MMapDirectory as {code} public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory { } {code} If a Mem-Map can not be created(for ex, if the file is too big to fit in 32 bit address range), it will default to FSDirectory.FSIndexInput In my tests, I didn't see any performance gain in 32bit environment and I consider it as better then nothing. I would be happy if someone could send test results on 64bit platform. DIGY was: Since this is not a direct port of MMapDirectory.java, I'll put it under Support and implement MMapDirectory as {code} public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory:Lucene.Net.Support.MemoryMappedDirectory { } {code} If a Mem-Map can not be created(for ex, if the file is too big to fit in 32 bit address range), it will default to FSDirectory.FSIndexInput In my tests, I didn't see any performance gain in 32bit environment and I consider it as better then nothing. I would be happy if someone could send test results on 64bit platform. DIGY MMapDirectory implementation Key: LUCENENET-425 URL: https://issues.apache.org/jira/browse/LUCENENET-425 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4g Reporter: Digy Priority: Trivial Fix For: Lucene.Net 2.9.4g Attachments: MMapDirectory.patch Since this is not a direct port of MMapDirectory.java, I'll put it under Support and implement MMapDirectory as {code} public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory { } {code} If a Mem-Map can not be created(for ex, if the file is too big to fit in 32 bit address range), it will default to FSDirectory.FSIndexInput In my tests, I didn't see any performance gain in 32bit environment and I consider it as better then nothing. 
I would be happy if someone could send test results on 64bit platform. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Updated] (LUCENENET-425) MMapDirectory implementation
[ https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-425: --- Attachment: MMapDirectory.patch MMapDirectory implementation Key: LUCENENET-425 URL: https://issues.apache.org/jira/browse/LUCENENET-425 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4g Reporter: Digy Priority: Trivial Fix For: Lucene.Net 2.9.4g Attachments: MMapDirectory.patch Since this is not a direct port of MMapDirectory.java, I'll put it under Support and implement MMapDirectory as {code} public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory:Lucene.Net.Support.MemoryMappedDirectory { } {code} If a Mem-Map can not be created(for ex, if the file is too big to fit in 32 bit address range), it will default to FSDirectory.FSIndexInput In my tests, I didn't see any performance gain in 32bit environment and I consider it as better then nothing. I would be happy if someone could send test results on 64bit platform. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Created] (LUCENENET-425) MMapDirectory implementation
MMapDirectory implementation Key: LUCENENET-425 URL: https://issues.apache.org/jira/browse/LUCENENET-425 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4g Reporter: Digy Priority: Trivial Fix For: Lucene.Net 2.9.4g Attachments: MMapDirectory.patch Since this is not a direct port of MMapDirectory.java, I'll put it under Support and implement MMapDirectory as {code} public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory:Lucene.Net.Support.MemoryMappedDirectory { } {code} If a Mem-Map can not be created(for ex, if the file is too big to fit in 32 bit address range), it will default to FSDirectory.FSIndexInput In my tests, I didn't see any performance gain in 32bit environment and I consider it as better then nothing. I would be happy if someone could send test results on 64bit platform. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Updated] (LUCENENET-424) IsolatedStorage Support for Windows Phone 7
[ https://issues.apache.org/jira/browse/LUCENENET-424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-424: --- Attachment: TestIsolatedStorageDirectory.cs Test cases for IsolatedStorageDirectory. (Doesn't IsolatedStorageDirectory have a public constructor?) IsolatedStorage Support for Windows Phone 7 --- Key: LUCENENET-424 URL: https://issues.apache.org/jira/browse/LUCENENET-424 Project: Lucene.Net Issue Type: Task Components: Lucene.Net Contrib, Lucene.Net Test Reporter: Prescott Nasser Assignee: Prescott Nasser Priority: Minor Labels: wp7 Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: TestIsolatedStorageDirectory.cs Create IsolatedStorage Store to support windows phone 7 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-423) QueryParser differences between Java and .NET
[ https://issues.apache.org/jira/browse/LUCENENET-423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047089#comment-13047089 ] Digy commented on LUCENENET-423: Maybe I am missing something, but I ran your code in both .NET and Java (not Luke) and printed query.ToString(). Same result (in base36). DIGY QueryParser differences between Java and .NET - Key: LUCENENET-423 URL: https://issues.apache.org/jira/browse/LUCENENET-423 Project: Lucene.Net Issue Type: Bug Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 2.9.4g Reporter: Christopher Currens When trying to do a RangeQuery that uses dates in a certain format, .NET behaves differently from its Java counterpart. The code is the same between them, but as far as I can tell, it appears that it is a difference in the way Java parses dates vs how .NET parses dates. To reproduce: {code:java} var queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, FullText, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29)); var query = queryParser.Parse("Field:[2001-01-17 TO 2001-01-20]"); {code} You'll notice that query looks like the old DateField format (e.g. 0g1d64542). If you do the same query in Java (or Luke), you'll notice the query gets parsed as if it were a RangeQuery of string. AFAIK, Java cannot parse a string formatted in that way. If you change the string to use / instead of - in Java, you'll get one that uses DateResolutions and DateTools.DateToString(). It seems an appropriate fix for this, if we wanted to keep this behavior similar to Java, would be to write our own DateTime parser that behaves the same way as Java's parser. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-423) QueryParser differences between Java and .NET
[ https://issues.apache.org/jira/browse/LUCENENET-423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047179#comment-13047179 ] Digy commented on LUCENENET-423: You are right, I used a different date string. .NET seems to parse date strings better. I would leave it as is. DIGY QueryParser differences between Java and .NET - Key: LUCENENET-423 URL: https://issues.apache.org/jira/browse/LUCENENET-423 Project: Lucene.Net Issue Type: Bug Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 2.9.4g Reporter: Christopher Currens When trying to do a RangeQuery that uses dates in a certain format, .NET behaves differently from its Java counterpart. The code is the same between them, but as far as I can tell, it appears that it is a difference in the way Java parses dates vs how .NET parses dates. To reproduce: {code:java} var queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, FullText, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29)); var query = queryParser.Parse("Field:[2001-01-17 TO 2001-01-20]"); {code} You'll notice that query looks like the old DateField format (e.g. 0g1d64542). If you do the same query in Java (or Luke), you'll notice the query gets parsed as if it were a RangeQuery of string. AFAIK, Java cannot parse a string formatted in that way. If you change the string to use / instead of - in Java, you'll get one that uses DateResolutions and DateTools.DateToString(). It seems an appropriate fix for this, if we wanted to keep this behavior similar to Java, would be to write our own DateTime parser that behaves the same way as Java's parser. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
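The .NET half of the difference discussed in LUCENENET-423 can be checked in isolation. This is only a minimal sketch: the class and method names are made up for illustration, and it assumes the invariant culture, since the report does not say which culture the QueryParser parses with. It shows that `DateTime.TryParse` accepts the hyphenated form; it says nothing about what Java does.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of Lucene.Net.
public static class DateParseCheck
{
    // Returns true when .NET can read the text as a date under the invariant culture.
    public static bool LooksLikeDate(string text)
    {
        DateTime ignored;
        return DateTime.TryParse(text, CultureInfo.InvariantCulture,
                                 DateTimeStyles.None, out ignored);
    }

    public static void Main()
    {
        Console.WriteLine(LooksLikeDate("2001-01-17"));  // hyphenated form from the report
        Console.WriteLine(LooksLikeDate("2001/01/17"));  // slash form
        Console.WriteLine(LooksLikeDate("not a date"));
    }
}
```

If a range bound passes a check like this, a parser that tries dates first will treat it as a date rather than as a plain string term, which would match the behaviour the reporter observed.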
[Lucene.Net] [jira] [Closed] (LUCENENET-421) Segment files ocasionaly disappearing making index corrupted
[ https://issues.apache.org/jira/browse/LUCENENET-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy closed LUCENENET-421. -- Resolution: Invalid Seems like reporter isn't interested any more in this issue. DIGY Segment files ocasionaly disappearing making index corrupted Key: LUCENENET-421 URL: https://issues.apache.org/jira/browse/LUCENENET-421 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.2 Environment: Media Chase ECF50 in the MastermindToys.com online toy store, IIS 7 under Win 2008 R2, index on RAID 1 Reporter: Fedor Taiakin IIS 7 under Win 2008 R2, index located on RAID 1 The only operations Add Document and Delete Document, optimize = false. Ocasionally the segment files disappear, corrupting index. No other exceptions prior to inability to open index: 'C:\Projects\MMT\ECF50\main\src\PublicLayer\SearchIndex\eCommerceFramework\CatalogEntryIndexer\_b6k.cfs'. --- System.IO.FileNotFoundException: Could not find file 'C:\Projects\MMT\ECF50\main\src\PublicLayer\SearchIndex\eCommerceFramework\CatalogEntryIndexer\_b6k.cfs'. File name: 'C:\Projects\MMT\ECF50\main\src\PublicLayer\SearchIndex\eCommerceFramework\CatalogEntryIndexer\_b6k.cfs' at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run() at Lucene.Net.Index.IndexReader.Open(Directory directory) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-422) Custom tokenizers may fail when indexing due to ReusableStringReader not implement some method of TextReader
[ https://issues.apache.org/jira/browse/LUCENENET-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044524#comment-13044524 ] Digy commented on LUCENENET-422: percyboy, Thanks for your bug fix. I committed the fix to trunk (2.9.4) and to the 2.9.4g branch. PS: None of TextReader's methods such as ReadBlock, ReadLine, Peek and ReadToEnd were implemented in ReusableStringReader, and calling them returned only empty strings without giving any info to the user. Therefore I added these NotImplementedExceptions in LUCENENET-150 and implemented just ReadToEnd (the only method I used in my custom analyzer). DIGY Custom tokenizers may fail when indexing due to ReusableStringReader not implement some method of TextReader Key: LUCENENET-422 URL: https://issues.apache.org/jira/browse/LUCENENET-422 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.2 Environment: from Lucene 2.3.x to current trunk Reporter: percyboy Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: ReusableStringReader.cs Lucene.Net.Index.ReusableStringReader is inherited from TextReader, but marks some methods as Not Implemented. Some custom tokenizers who call these unfinished methods will meet an error. It is, somewhat, like a trap. LUCENENET-150 is a similar issue to this one. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
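For readers wondering what a fully usable reader needs, here is a minimal sketch. It is not the attached ReusableStringReader.cs; `SimpleStringReader` and its `Init` method are hypothetical names. It relies on the fact that the base `TextReader` builds `ReadLine`, `ReadToEnd` and `ReadBlock` on top of `Peek` and the two `Read` overloads, so overriding those three is enough for the others to work.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for illustration; not the Lucene.Net ReusableStringReader.
public sealed class SimpleStringReader : TextReader
{
    private string text = string.Empty;
    private int position;

    // Point the reader at a new string so that one instance can be reused.
    public void Init(string value)
    {
        text = value ?? string.Empty;
        position = 0;
    }

    // Next character without consuming it, or -1 at the end.
    public override int Peek()
    {
        return position < text.Length ? text[position] : -1;
    }

    // Next character, consuming it, or -1 at the end.
    public override int Read()
    {
        return position < text.Length ? text[position++] : -1;
    }

    // Copies up to 'count' characters into 'buffer'; returns 0 at the end.
    public override int Read(char[] buffer, int index, int count)
    {
        int n = Math.Min(count, text.Length - position);
        text.CopyTo(position, buffer, index, n);
        position += n;
        return n;
    }
}
```

With these overrides in place, a custom tokenizer can call `ReadLine()` or `ReadToEnd()` on the reader and get the inherited behaviour instead of an exception or an empty string.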
[Lucene.Net] [jira] [Resolved] (LUCENENET-422) Custom tokenizers may fail when indexing due to ReusableStringReader not implement some method of TextReader
[ https://issues.apache.org/jira/browse/LUCENENET-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy resolved LUCENENET-422. Resolution: Fixed Custom tokenizers may fail when indexing due to ReusableStringReader not implement some method of TextReader Key: LUCENENET-422 URL: https://issues.apache.org/jira/browse/LUCENENET-422 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.2 Environment: from Lucene 2.3.x to current trunk Reporter: percyboy Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: ReusableStringReader.cs Lucene.Net.Index.ReusableStringReader is inherited from TextReader, but marks some methods as Not Implemented. Some custom tokenizers who call these unfinished methods will meet an error. It is, somewhat, like a trap. LUCENENET-150 is a similar issue to this one. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Updated] (LUCENENET-414) The definition of CharArraySet is dangerously confusing and leads to bugs when used.
[ https://issues.apache.org/jira/browse/LUCENENET-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-414: --- Fix Version/s: (was: Lucene.Net 2.9.2) Lucene.Net 2.9.4g Lucene.Net 2.9.4 The definition of CharArraySet is dangerously confusing and leads to bugs when used. Key: LUCENENET-414 URL: https://issues.apache.org/jira/browse/LUCENENET-414 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.2 Environment: Irrelevant Reporter: Vincent Van Den Berghe Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Right now, CharArraySet derives from System.Collections.Hashtable, but doesn't actually use this base type for storing elements. However, the StandardAnalyzer.STOP_WORDS_SET is exposed as a System.Collections.Hashtable. The trivial code to build your own stopword set using the StandardAnalyzer.STOP_WORDS_SET and adding your own set of stopwords like this: CharArraySet stopWords = new CharArraySet(StandardAnalyzer.STOP_WORDS_SET, ignoreCase: false); foreach (string domainSpecificStopWord in DomainSpecificStopWords) stopWords.Add(domainSpecificStopWord); ... will fail because the CharArraySet accepts an ICollection, which will be passed the Hashtable instance of STOP_WORDS_SET: the resulting stopWords will only contain the DomainSpecificStopWords, and not those from STOP_WORDS_SET. One workaround would be to replace the first line with this: CharArraySet stopWords = new CharArraySet(StandardAnalyzer.STOP_WORDS_SET.Count + DomainSpecificStopWords.Length, ignoreCase: false); foreach (string domainSpecificStopWord in (CharArraySet)StandardAnalyzer.STOP_WORDS_SET) stopWords.Add(domainSpecificStopWord); ... but this makes use of an implementation detail (the STOP_WORDS_SET is really an UnmodifiableCharArraySet, which is itself a CharArraySet). 
It works because it forces the foreach() to use the correct CharArraySet.GetEnumerator(), which is defined as a new method (this has a bad code smell to it). At least 2 possibilities exist to solve this problem: - Make CharArraySet use the Hashtable instance and a custom comparator, instead of its own implementation. - Make CharArraySet use HashSet<char[]>, defined in .NET 4.0. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
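The failure mode described in LUCENENET-414 can be reproduced without Lucene.Net. The sketch below is a hypothetical miniature of that design, not the real CharArraySet: `HidingSet` derives from `Hashtable`, keeps its elements in a private list, and hides `Count` and `GetEnumerator` with `new`. Code that sees the object through its static type finds the elements; code that sees it as an `ICollection` gets the empty `Hashtable` base.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical miniature of the design in the report; not the Lucene.Net CharArraySet.
public class HidingSet : Hashtable
{
    private readonly List<string> items = new List<string>();

    // Elements go into a private list; the Hashtable base stays empty.
    public void Add(string item) { items.Add(item); }

    // 'new' hides the base members instead of overriding them.
    public new int Count { get { return items.Count; } }
    public new IEnumerator GetEnumerator() { return items.GetEnumerator(); }
}

public static class HidingDemo
{
    // Counts elements the way a constructor taking an ICollection would see them.
    public static int CountVia(ICollection collection)
    {
        int n = 0;
        foreach (object ignored in collection) n++;
        return n;
    }

    public static void Main()
    {
        var set = new HidingSet();
        set.Add("foo");
        set.Add("bar");

        int direct = 0;
        foreach (object ignored in set) direct++;  // binds to HidingSet.GetEnumerator

        Console.WriteLine(direct);         // 2
        Console.WriteLine(CountVia(set));  // 0: the interface call reaches the empty Hashtable
    }
}
```

This is why the cast to the concrete type in the workaround changes the result: it changes which `GetEnumerator` the `foreach` binds to.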
[Lucene.Net] [jira] [Updated] (LUCENENET-422) Custom tokenizers may fail when indexing due to ReusableStringReader not implement some method of TextReader
[ https://issues.apache.org/jira/browse/LUCENENET-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-422: --- Fix Version/s: Lucene.Net 2.9.4g Lucene.Net 2.9.4 Custom tokenizers may fail when indexing due to ReusableStringReader not implement some method of TextReader Key: LUCENENET-422 URL: https://issues.apache.org/jira/browse/LUCENENET-422 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.2 Environment: from Lucene 2.3.x to current trunk Reporter: percyboy Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: ReusableStringReader.cs Lucene.Net.Index.ReusableStringReader is inherited from TextReader, but marks some methods as Not Implemented. Some custom tokenizers who call these unfinished methods will meet an error. It is, somewhat, like a trap. LUCENENET-150 is a similar issue to this one. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042963#comment-13042963 ] Digy commented on LUCENENET-415: According to my last tests, SFS searches cost only an additional 60-80 ms compared to standard searches(~3GB index, 1M docs, 342 facets). (Assuming that the same # of documents are read from the index). Some other features like - Faceting by query (can SFS be named as Faceting by field?) - Range faceting (e.g., monthly facets on fields like 20110602) (again correct terminology?) - Disk cache for large # of BitSets etc. can be added in the future. I think this is enough for *Simple*FacetedSearch. I will commit it to trunk. DIGY Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: PerformanceTest.cs, PerformanceTest.cs, PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, facet performance.xls, facet performance.xls, facet performance.xls Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Resolved] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy resolved LUCENENET-415. Resolution: Fixed Assignee: Digy Committed to trunk and the 2.9.4g branch. Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Assignee: Digy Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: PerformanceTest.cs, PerformanceTest.cs, PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, facet performance.xls, facet performance.xls, facet performance.xls Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043011#comment-13043011 ] Digy commented on LUCENENET-415: Thanks M.Herndon for this wiki page https://cwiki.apache.org/confluence/display/LUCENENET/Simple+Faceted+Search DIGY Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Assignee: Digy Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: PerformanceTest.cs, PerformanceTest.cs, PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, facet performance.xls, facet performance.xls, facet performance.xls Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-421) Segment files ocasionaly disappearing making index corrupted
[ https://issues.apache.org/jira/browse/LUCENENET-421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041604#comment-13041604 ] Digy commented on LUCENENET-421: This may happen when two processes/threads access the same index *simultaneously* for writing . IndexWriter doesn't allow it by default but can be bypassed with IndexWriter.Unlock. Also, might there be other processes accessing the index such as virus scanners etc.? DIGY Segment files ocasionaly disappearing making index corrupted Key: LUCENENET-421 URL: https://issues.apache.org/jira/browse/LUCENENET-421 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.2 Environment: Media Chase ECF50 in the MastermindToys.com online toy store, IIS 7 under Win 2008 R2, index on RAID 1 Reporter: Fedor Taiakin IIS 7 under Win 2008 R2, index located on RAID 1 The only operations Add Document and Delete Document, optimize = false. Ocasionally the segment files disappear, corrupting index. No other exceptions prior to inability to open index: 'C:\Projects\MMT\ECF50\main\src\PublicLayer\SearchIndex\eCommerceFramework\CatalogEntryIndexer\_b6k.cfs'. --- System.IO.FileNotFoundException: Could not find file 'C:\Projects\MMT\ECF50\main\src\PublicLayer\SearchIndex\eCommerceFramework\CatalogEntryIndexer\_b6k.cfs'. File name: 'C:\Projects\MMT\ECF50\main\src\PublicLayer\SearchIndex\eCommerceFramework\CatalogEntryIndexer\_b6k.cfs' at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run() at Lucene.Net.Index.IndexReader.Open(Directory directory) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Closed] (LUCENENET-416) IndexWriter.Init may orphan its write lock in case of exception
[ https://issues.apache.org/jira/browse/LUCENENET-416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy closed LUCENENET-416. -- Resolution: Not A Problem Fix Version/s: Lucene.Net 2.9.4 Fixed in 2.9.4 and 2.9.4g DIGY IndexWriter.Init may orphan its write lock in case of exception --- Key: LUCENENET-416 URL: https://issues.apache.org/jira/browse/LUCENENET-416 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.2 Environment: .NET 4 Reporter: Håkan Lindqvist Fix For: Lucene.Net 2.9.4 In IndexWriter.Init, if an exception other than IOException is thrown after the write lock has been acquired, the lock is not released. (See Index\IndexWriter.cs:1922 for a starting point.) Specifically, the exception we have seen occurring is UnauthorizedAccessException, e.g. Access to the path 'C:\foo\bar\segments.gen' is denied. Stack trace from the UnauthorizedAccessException as mentioned above: at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath) at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath) at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share) at Lucene.Net.Store.SimpleFSDirectory.SimpleFSIndexInput.Descriptor..ctor(FileInfo file, FileAccess mode) at Lucene.Net.Store.SimpleFSDirectory.OpenInput(String name, Int32 bufferSize) at Lucene.Net.Store.FSDirectory.OpenInput(String name) at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run(IndexCommit commit) at Lucene.Net.Index.SegmentInfos.Read(Directory directory) at Lucene.Net.Index.IndexWriter.Init(Directory d, Analyzer a, Boolean create, Boolean closeDir, IndexDeletionPolicy deletionPolicy, Boolean autoCommit, Int32 maxFieldLength, IndexingChain indexingChain, IndexCommit commit) at 
Lucene.Net.Index.IndexWriter..ctor(Directory d, Analyzer a, Boolean create, MaxFieldLength mfl) I do not know under what circumstances that initial exception occurred, but after this has happened all subsequent attempts at accessing the index will fail. It seems that changing the catch statement to release the write lock regardless of exception type should solve this. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
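The fix proposed in LUCENENET-416 (release the write lock whatever the exception type) can be sketched with a success flag and a `finally` block. This is not the actual IndexWriter code: `DemoLock` and `InitWithLock` are hypothetical names, and the real fix in 2.9.4 may be shaped differently.

```csharp
using System;

// Hypothetical lock type for illustration; Lucene.Net has its own Lock class.
public sealed class DemoLock
{
    public bool Held { get; private set; }
    public void Obtain() { Held = true; }
    public void Release() { Held = false; }
}

public static class InitSketch
{
    // Runs 'init' while holding the lock. If init fails for any reason,
    // not only with an IOException, the lock is released before the
    // exception propagates. On success the caller keeps the lock.
    public static void InitWithLock(DemoLock writeLock, Action init)
    {
        writeLock.Obtain();
        bool success = false;
        try
        {
            init();
            success = true;
        }
        finally
        {
            if (!success) writeLock.Release();
        }
    }
}
```

Using `finally` with a flag rather than `catch (IOException)` means an UnauthorizedAccessException like the one in the stack trace can no longer leave the lock orphaned.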
[Lucene.Net] [jira] [Commented] (LUCENENET-421) Segment files ocasionaly disappearing making index corrupted
[ https://issues.apache.org/jira/browse/LUCENENET-421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041634#comment-13041634 ] Digy commented on LUCENENET-421: Could you try 2.9.4 in https://svn.apache.org/repos/asf/incubator/lucene.net/trunk or 2.9.4g in https://svn.apache.org/repos/asf/incubator/lucene.net/branches/Lucene.Net_2_9_4g ? Maybe, it is a bug like LUCENENET-416 fixed in these versions. DIGY Segment files ocasionaly disappearing making index corrupted Key: LUCENENET-421 URL: https://issues.apache.org/jira/browse/LUCENENET-421 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.2 Environment: Media Chase ECF50 in the MastermindToys.com online toy store, IIS 7 under Win 2008 R2, index on RAID 1 Reporter: Fedor Taiakin IIS 7 under Win 2008 R2, index located on RAID 1 The only operations Add Document and Delete Document, optimize = false. Ocasionally the segment files disappear, corrupting index. No other exceptions prior to inability to open index: 'C:\Projects\MMT\ECF50\main\src\PublicLayer\SearchIndex\eCommerceFramework\CatalogEntryIndexer\_b6k.cfs'. --- System.IO.FileNotFoundException: Could not find file 'C:\Projects\MMT\ECF50\main\src\PublicLayer\SearchIndex\eCommerceFramework\CatalogEntryIndexer\_b6k.cfs'. File name: 'C:\Projects\MMT\ECF50\main\src\PublicLayer\SearchIndex\eCommerceFramework\CatalogEntryIndexer\_b6k.cfs' at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run() at Lucene.Net.Index.IndexReader.Open(Directory directory) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-420) String.StartsWith has culture in it.
[ https://issues.apache.org/jira/browse/LUCENENET-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040701#comment-13040701 ] Digy commented on LUCENENET-420: And are you using a WildcardQuery like sometext*? DIGY String.StartsWith has culture in it. Key: LUCENENET-420 URL: https://issues.apache.org/jira/browse/LUCENENET-420 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 3.x Environment: .NET under (at least) da-DK culture Reporter: Niels Kühnel Fix For: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 3.x Original Estimate: 4h Remaining Estimate: 4h I've been hunting a weird bug for a long time. I finally found its cause. I'm Danish, thus my .NET culture is da-DK. In this culture "Gaard" doesn't start with "Ga" because it thinks that "aa" is "å" (in Danish it was before 1948). That gives some unexpected results when doing prefix queries. The solution is to add StringComparison.InvariantCulture in all StartsWith comparisons. To verify my claim, try running: Thread.CurrentThread.CurrentCulture = CultureInfo.GetCultureInfo("da-DK"); Assert.IsFalse("Gaard".StartsWith("Ga")); Assert.IsTrue("Gaard".StartsWith("Ga", StringComparison.InvariantCulture)); Cheers, Niels Kühnel -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-420) String.StartsWith has culture in it.
[ https://issues.apache.org/jira/browse/LUCENENET-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040712#comment-13040712 ] Digy commented on LUCENENET-420: "because it thinks that aa is å": in my test case, "Gaard".StartsWith("Gå") also returns false. I am still not sure whether it is a Lucene.Net bug or something that should be handled by the user. I'll think about it. DIGY String.StartsWith has culture in it. Key: LUCENENET-420 URL: https://issues.apache.org/jira/browse/LUCENENET-420 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 3.x Environment: .NET under (at least) da-DK culture Reporter: Niels Kühnel Fix For: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 3.x Original Estimate: 4h Remaining Estimate: 4h I've been hunting a weird bug for a long time. I finally found its cause. I'm Danish, thus my .NET culture is da-DK. In this culture "Gaard" doesn't start with "Ga" because it thinks that "aa" is "å" (in Danish it was before 1948). That gives some unexpected results when doing prefix queries. The solution is to add StringComparison.InvariantCulture in all StartsWith comparisons. To verify my claim, try running: Thread.CurrentThread.CurrentCulture = CultureInfo.GetCultureInfo("da-DK"); Assert.IsFalse("Gaard".StartsWith("Ga")); Assert.IsTrue("Gaard".StartsWith("Ga", StringComparison.InvariantCulture)); Cheers, Niels Kühnel -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
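A small console program makes it easy to compare the variants from LUCENENET-420 on a given machine. The report proposes `StringComparison.InvariantCulture`; `StringComparison.Ordinal` is included here as an additional option that nobody in the thread suggested, on the assumption that index terms should be compared character by character. `PrefixCheck` is a hypothetical name, and no expected output is given for the culture-sensitive call because the result depends on the runtime and its culture data.

```csharp
using System;
using System.Globalization;
using System.Threading;

// Hypothetical helper for illustration; not part of Lucene.Net.
public static class PrefixCheck
{
    // Ordinal comparison looks only at the UTF-16 code units and ignores the current culture.
    public static bool StartsWithOrdinal(string text, string prefix)
    {
        return text.StartsWith(prefix, StringComparison.Ordinal);
    }

    public static void Main()
    {
        Thread.CurrentThread.CurrentCulture = CultureInfo.GetCultureInfo("da-DK");

        // Culture-sensitive: uses the current (da-DK) culture.
        Console.WriteLine("Gaard".StartsWith("Ga"));
        // The fix proposed in the report.
        Console.WriteLine("Gaard".StartsWith("Ga", StringComparison.InvariantCulture));
        // Culture-independent alternative (an assumption of this sketch).
        Console.WriteLine(StartsWithOrdinal("Gaard", "Ga"));
    }
}
```

Whichever comparison is chosen, passing it explicitly at every `StartsWith` call removes the dependency on the thread's current culture.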
[Lucene.Net] [jira] [Updated] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-415: --- Attachment: TestSimpleFacetedSearch2.cs SimpleFacetedSearch2.cs Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: PerformanceTest.cs, PerformanceTest.cs, PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, facet performance.xls, facet performance.xls, facet performance.xls Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13039259#comment-13039259 ] Digy commented on LUCENENET-415: With the increasing number of attached files, it is getting hard to trace the changes. I created a contrib project(SimpleFacetedSearch) under 2.9.4g branch https://svn.apache.org/repos/asf/incubator/lucene.net/branches/Lucene.Net_2_9_4g. DIGY Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: PerformanceTest.cs, PerformanceTest.cs, PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, facet performance.xls, facet performance.xls, facet performance.xls Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13039272#comment-13039272 ] Digy commented on LUCENENET-415: Hi Ben, Do you think we still need IndexSearcher UseCache? DIGY Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: PerformanceTest.cs, PerformanceTest.cs, PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, facet performance.xls, facet performance.xls, facet performance.xls Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039289#comment-13039289 ] Digy commented on LUCENENET-415:

I'll wait a few days before closing this issue & committing to 2.9.4. Here are the sources:

Source: https://svn.apache.org/repos/asf/incubator/lucene.net/branches/Lucene.Net_2_9_4g/src/contrib/SimpleFacetedSearch/
Readme: https://svn.apache.org/repos/asf/incubator/lucene.net/branches/Lucene.Net_2_9_4g/src/contrib/SimpleFacetedSearch/README.txt
Test & Usage: https://svn.apache.org/repos/asf/incubator/lucene.net/branches/Lucene.Net_2_9_4g/test/contrib/SimpleFacetedSearch

Any comments on class/variable names, APIs, etc.? I've never been good at them.

DIGY

Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: PerformanceTest.cs, PerformanceTest.cs, PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, facet performance.xls, facet performance.xls, facet performance.xls Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY
[Lucene.Net] [jira] [Updated] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-415: --- Attachment: TestSimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs

I took it one step further: multi-field faceting. It still needs a lot of code cleanup, but it works. See SimpleFacetedSearch2.

DIGY

Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: PerformanceTest.cs, PerformanceTest.cs, PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch2.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch2.cs, facet performance.xls, facet performance.xls, facet performance.xls Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY
[Lucene.Net] [jira] [Updated] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-415: --- Attachment: TestSimpleFacetedSearch2.cs SimpleFacetedSearch2.cs Some comments. Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: PerformanceTest.cs, PerformanceTest.cs, PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch2.cs, SimpleFacetedSearch2.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch2.cs, TestSimpleFacetedSearch2.cs, facet performance.xls, facet performance.xls, facet performance.xls Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038063#comment-13038063 ] Digy commented on LUCENENET-415:

Hi Ben,

Thanks for your comments & test code.

{code}
sfs = new SimpleFacetedSearch(reader, category);
sfs.Search(query); // + fetch
{code}

is roughly equal to

{code}
foreach (string cat in GetGroups(category))
{
    BooleanQuery bq = new BooleanQuery();
    bq.Add(query, Lucene.Net.Search.BooleanClause.Occur.MUST);
    bq.Add(queryParser.Parse("category:" + cat), Lucene.Net.Search.BooleanClause.Occur.MUST);
    indexSearcher.Search(bq); // + fetch
}
{code}

It would be good to compare these two pieces of code too.

DIGY

Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: PerformanceTest.cs, PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, facet performance.xls Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY
[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038228#comment-13038228 ] Digy commented on LUCENENET-415:

But BitSet+Caching is still faster than BooleanQuery, if I don't misinterpret your numbers.

DIGY

Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: PerformanceTest.cs, PerformanceTest.cs, PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, facet performance.xls, facet performance.xls Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY
[Lucene.Net] [jira] [Updated] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-415: --- Attachment: TestSimpleFacetedSearch.cs SimpleFacetedSearch.cs Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Updated] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-415: --- Attachment: SimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs

Hi Ben,

There is a maxItemPerGroup parameter in the constructor, but it would be better to move it to the Search method.

Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY
[Lucene.Net] [jira] [Updated] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-415: --- Attachment: TestSimpleFacetedSearch.cs SimpleFacetedSearch.cs some performance improvements. Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Updated] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-415: --- Attachment: (was: TestSimpleFacetedSearch.cs) Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Updated] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-415: --- Attachment: (was: SimpleFacetedSearch.cs) Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Updated] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-415: --- Attachment: (was: SimpleFacetedSearch.cs) Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Updated] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-415: --- Attachment: (was: TestSimpleFacetedSearch.cs) Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: PerformanceTest.cs, SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Updated] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-415: --- Attachment: (was: TestSimpleFacetedSearch.cs) Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: PerformanceTest.cs, SimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Updated] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-415: --- Attachment: (was: TestSimpleFacetedSearch.cs) Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: PerformanceTest.cs, SimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Updated] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-415: --- Attachment: (was: SimpleFacetedSearch.cs) Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: PerformanceTest.cs, SimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037478#comment-13037478 ] Digy commented on LUCENENET-415:

Hi Ben,

About the performance test:

- One of the costly operations in this faceted search is the creation of SimpleFacetedSearch. It creates the bit sets for all of the group members. Since it only needs to be created once, when a new IndexReader is opened (i.e. after documents are added or deleted), its creation time should be excluded from the test.

- Another costly operation is fetching data from the index. After each search, some data has to be read, and this duration should be included in the test. E.g.

{code}
TopDocs hits = sfs.Search(q, 100);
for (int j = 0; j < hits.ScoreDocs.Length; j++)
{
    Document doc = reader.Document(hits.ScoreDocs[j].doc);
    Fieldable f = doc.GetField("title");
}

SimpleFacetedSearch.Hits hits = sfs.Search(q, maxDocPerGroup);
foreach (var h in hits.HitsPerGroup)
{
    foreach (Document doc in h.Documents)
    {
        Fieldable f = doc.GetField("title");
    }
}
{code}

- Hits is a deprecated class, and it repeats the search every N (AFAIK 100) document accesses. It is not a normal search and should be excluded from the test.

Thanks,
DIGY

Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: PerformanceTest.cs, SimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY
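The split between a costly one-time setup and a cheap per-query step, as described in the comment above, can be sketched with plain BCL types. This is a minimal illustration under assumptions and not the SimpleFacetedSearch source: the FacetBitSets class, its constructor argument (one group value per document id) and the Count method are all hypothetical, and System.Collections.BitArray stands in for whatever bit set type the contrib project actually uses.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical illustration only; this is not the SimpleFacetedSearch implementation.
class FacetBitSets
{
    private readonly Dictionary<string, BitArray> groupBits = new Dictionary<string, BitArray>();

    // Costly step, done once per index snapshot: build one bit set per group value.
    // groupOfDoc[i] is the group value of document i.
    public FacetBitSets(string[] groupOfDoc)
    {
        for (int doc = 0; doc < groupOfDoc.Length; doc++)
        {
            BitArray bits;
            if (!groupBits.TryGetValue(groupOfDoc[doc], out bits))
            {
                bits = new BitArray(groupOfDoc.Length);
                groupBits[groupOfDoc[doc]] = bits;
            }
            bits.Set(doc, true);
        }
    }

    // Cheap step, done per query: AND the query's matches with each cached group bit set.
    // queryMatches must have one bit per document, like the cached sets.
    public Dictionary<string, int> Count(BitArray queryMatches)
    {
        var counts = new Dictionary<string, int>();
        foreach (KeyValuePair<string, BitArray> pair in groupBits)
        {
            // Copy first, because BitArray.And modifies the instance it is called on.
            BitArray both = new BitArray(pair.Value).And(queryMatches);
            int n = 0;
            foreach (bool bit in both)
            {
                if (bit) n++;
            }
            counts[pair.Key] = n;
        }
        return counts;
    }

    static void Main()
    {
        var facets = new FacetBitSets(new[] { "book", "film", "book", "music" });
        var matches = new BitArray(new[] { true, true, true, false });
        foreach (KeyValuePair<string, int> pair in facets.Count(matches))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

In a benchmark shaped like this, the constructor would sit outside the timed region and only Count plus the document fetch would be measured, which matches the first two points of the comment.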
[Lucene.Net] [jira] [Updated] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-415: --- Attachment: SimpleFacetedSearch.cs Of course, Ben. DIGY Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: SimpleFacetedSearch.cs, SimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Created] (LUCENENET-415) Contrib/Faceted Search
Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Updated] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-415: --- Attachment: TestSimpleFacetedSearch.cs SimpleFacetedSearch.cs Just a draft. Needs your contribution. DIGY Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: SimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-415) Contrib/Faceted Search
[ https://issues.apache.org/jira/browse/LUCENENET-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037061#comment-13037061 ] Digy commented on LUCENENET-415:

Here is the documentation of the code :)

{code}
SimpleFacetedSearch sfs = new SimpleFacetedSearch(_Reader, "cat");
Query query = new QueryParser("text", new StandardAnalyzer()).Parse("block*");
SimpleFacetedSearch.Hits hits = sfs.Search(query);
long totalHits = hits.TotalHitCount;
foreach (SimpleFacetedSearch.HitsPerGroup hpg in hits.HitsPerGroup)
{
    long hitCountPerGroup = hpg.HitCount;
    foreach (Document doc in hpg)
    {
        string text = doc.GetField("text").StringValue();
    }
}
{code}

DIGY

Contrib/Faceted Search -- Key: LUCENENET-415 URL: https://issues.apache.org/jira/browse/LUCENENET-415 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Attachments: SimpleFacetedSearch.cs, TestSimpleFacetedSearch.cs Since I see a lot of questions about faceted search in these days, I plan to add a Faceted-Search project to contrib. DIGY
[Lucene.Net] [jira] [Commented] (LUCENENET-412) Replacing ArrayLists, Hashtables etc. with appropriate Generics.
[ https://issues.apache.org/jira/browse/LUCENENET-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035795#comment-13035795 ] Digy commented on LUCENENET-412:

Hi All,

Lucene.Net 2.9.4g is almost ready for testing & feedback. While injecting generics & doing some code cleanup, I tried to stay as close to lucene 3.0.3 as possible. Therefore its position is somewhere between lucene.Java 2.9.4 & 3.0.3.

DIGY

PS: For those who might want to try this version: it probably won't be a drop-in replacement, since there are a few API changes like
- StopAnalyzer(List<string> stopWords)
- Query.ExtractTerms(ICollection<string>)
- TopDocs.*TotalHits*, TopDocs.*ScoreDocs*
and some removed methods/classes like
- Filter.Bits
- JustCompileSearch
- Contrib/Similarity.Net

Replacing ArrayLists, Hashtables etc. with appropriate Generics. Key: LUCENENET-412 URL: https://issues.apache.org/jira/browse/LUCENENET-412 Project: Lucene.Net Issue Type: Improvement Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Fix For: Lucene.Net 2.9.4 Attachments: IEquatable for QuerySubclasses.patch, LUCENENET-412.patch, lucene_2.9.4g_exceptions_fix This will move Lucene.Net.2.9.4 closer to lucene.3.0.3 and allow some performance gains.
[Lucene.Net] [jira] [Commented] (LUCENENET-414) The definition of CharArraySet is dangerously confusing and leads to bugs when used.
[ https://issues.apache.org/jira/browse/LUCENENET-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035817#comment-13035817 ] Digy commented on LUCENENET-414:

Fixed in 2.9.4g

DIGY

The definition of CharArraySet is dangerously confusing and leads to bugs when used. Key: LUCENENET-414 URL: https://issues.apache.org/jira/browse/LUCENENET-414 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.2 Environment: Irrelevant Reporter: Vincent Van Den Berghe Priority: Minor Fix For: Lucene.Net 2.9.2

Right now, CharArraySet derives from System.Collections.Hashtable, but doesn't actually use this base type for storing elements. However, StandardAnalyzer.STOP_WORDS_SET is exposed as a System.Collections.Hashtable. The trivial code to build your own stopword set from StandardAnalyzer.STOP_WORDS_SET plus your own set of stopwords, like this:

CharArraySet myStopWords = new CharArraySet(StandardAnalyzer.STOP_WORDS_SET, ignoreCase: false);
foreach (string domainSpecificStopWord in DomainSpecificStopWords) myStopWords.Add(domainSpecificStopWord);

... will fail, because the CharArraySet constructor accepts an ICollection, which will be passed the Hashtable instance of STOP_WORDS_SET: the resulting myStopWords will only contain the DomainSpecificStopWords, and not those from STOP_WORDS_SET. One workaround would be to replace the first line with this:

CharArraySet myStopWords = new CharArraySet(StandardAnalyzer.STOP_WORDS_SET.Count + DomainSpecificStopWords.Length, ignoreCase: false);
foreach (string stopWord in (CharArraySet)StandardAnalyzer.STOP_WORDS_SET) myStopWords.Add(stopWord);

... but this relies on an implementation detail (STOP_WORDS_SET is really an UnmodifiableCharArraySet, which is itself a CharArraySet). It works because it forces the foreach() to use the correct CharArraySet.GetEnumerator(), which is defined as a new method (this has a bad code smell to it).

At least 2 possibilities exist to solve this problem:
- Make CharArraySet use the Hashtable instance and a custom comparator, instead of its own implementation.
- Make CharArraySet use HashSet<char[]>, defined in .NET 4.0.
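The pitfall described in this report, a class that derives from Hashtable, keeps its elements elsewhere and hides GetEnumerator with `new`, can be reproduced without Lucene.Net. The following is a minimal sketch under that assumption; HidingSet and HidingDemo are hypothetical stand-ins and are not the real CharArraySet.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for the pattern described above; not the real CharArraySet.
class HidingSet : Hashtable
{
    // The real storage; the inherited Hashtable storage stays empty.
    private readonly List<string> items = new List<string>();

    public void AddItem(string s) { items.Add(s); }

    // 'new' hides Hashtable.GetEnumerator instead of overriding it,
    // so it is only used when the static type is HidingSet.
    public new IEnumerator GetEnumerator() { return items.GetEnumerator(); }
}

static class HidingDemo
{
    // Enumerates through the base type: binds to Hashtable.GetEnumerator.
    public static int CountAsHashtable(Hashtable set)
    {
        int n = 0;
        foreach (object o in set) n++;
        return n;
    }

    // Enumerates through the derived type: binds to the hiding method.
    public static int CountAsHidingSet(HidingSet set)
    {
        int n = 0;
        foreach (object o in set) n++;
        return n;
    }

    static void Main()
    {
        var set = new HidingSet();
        set.AddItem("the");
        set.AddItem("and");
        Console.WriteLine(CountAsHidingSet(set));  // prints 2
        Console.WriteLine(CountAsHashtable(set));  // prints 0
    }
}
```

Any consumer that only sees a Hashtable or an ICollection takes the second path, which is why copying STOP_WORDS_SET through its exposed type yields an empty set in the scenario reported above.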
[Lucene.Net] [jira] [Commented] (LUCENENET-412) Replacing ArrayLists, Hashtables etc. with appropriate Generics.
[ https://issues.apache.org/jira/browse/LUCENENET-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035092#comment-13035092 ] Digy commented on LUCENENET-412:

One more sample:

{code}
From:

class AnonymousFilterCache : FilterCache
{
    class AnonymousFilteredDocIdSet : FilteredDocIdSet
    {
        IndexReader r;
        public AnonymousFilteredDocIdSet(DocIdSet innerSet, IndexReader r) : base(innerSet)
        {
            this.r = r;
        }
        public override bool Match(int docid)
        {
            return !r.IsDeleted(docid);
        }
    }
    public AnonymousFilterCache(DeletesMode deletesMode) : base(deletesMode) { }
    protected override object MergeDeletes(IndexReader reader, object docIdSet)
    {
        return new AnonymousFilteredDocIdSet((DocIdSet)docIdSet, reader);
    }
}
...
cache = new AnonymousFilterCache(deletesMode);

To:

cache = new FilterCache<DocIdSet>(deletesMode, (reader, docIdSet) =>
{
    return new FilteredDocIdSet((DocIdSet)docIdSet, (docid) =>
    {
        return !reader.IsDeleted(docid);
    });
});
{code}

DIGY

Replacing ArrayLists, Hashtables etc. with appropriate Generics. Key: LUCENENET-412 URL: https://issues.apache.org/jira/browse/LUCENENET-412 Project: Lucene.Net Issue Type: Improvement Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Fix For: Lucene.Net 2.9.4 Attachments: IEquatable for QuerySubclasses.patch, LUCENENET-412.patch, lucene_2.9.4g_exceptions_fix This will move Lucene.Net.2.9.4 closer to lucene.3.0.3 and allow some performance gains.
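The shape of the refactoring in the sample above, replacing a one-off subclass with a class that takes a delegate, can be shown with BCL types alone. This is a sketch under assumptions: DocFilter, NotDeletedFilter and DelegateDocFilter are hypothetical names that only mirror the pattern, and a HashSet<int> of deleted ids stands in for IndexReader.IsDeleted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration; they only mirror the shape of the refactoring above.
abstract class DocFilter
{
    public abstract bool Match(int docId);
}

// "From" style: a named subclass whose only job is to carry one method and its state.
class NotDeletedFilter : DocFilter
{
    private readonly HashSet<int> deleted;
    public NotDeletedFilter(HashSet<int> deleted) { this.deleted = deleted; }
    public override bool Match(int docId) { return !deleted.Contains(docId); }
}

// "To" style: one reusable class that receives the behaviour as a delegate.
class DelegateDocFilter : DocFilter
{
    private readonly Func<int, bool> match;
    public DelegateDocFilter(Func<int, bool> match) { this.match = match; }
    public override bool Match(int docId) { return match(docId); }
}

static class FilterDemo
{
    static void Main()
    {
        var deleted = new HashSet<int> { 3 };

        DocFilter viaSubclass = new NotDeletedFilter(deleted);
        // The lambda captures 'deleted', so no extra class or field is needed.
        DocFilter viaDelegate = new DelegateDocFilter(id => !deleted.Contains(id));

        Console.WriteLine(viaSubclass.Match(3) == viaDelegate.Match(3));  // prints True
        Console.WriteLine(viaSubclass.Match(4) == viaDelegate.Match(4));  // prints True
    }
}
```

The delegate version trades a small indirection per call for not having to write a class per use site; whether that trade is worthwhile inside a hot loop is something a benchmark would have to show.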
[Lucene.Net] [jira] [Resolved] (LUCENENET-405) Port: contrib/Analysis.NGram
[ https://issues.apache.org/jira/browse/LUCENENET-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy resolved LUCENENET-405. Resolution: Fixed

Committed to trunk & the 2.9.4g branch.

Port: contrib/Analysis.NGram Key: LUCENENET-405 URL: https://issues.apache.org/jira/browse/LUCENENET-405 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4 Reporter: Digy Assignee: Digy Priority: Trivial Attachments: NGram.patch

NGramTokenizer & EdgeNGramTokenizer + Test cases.

DIGY
[Lucene.Net] [jira] [Commented] (LUCENENET-414) The definition of CharArraySet is dangerously confusing and leads to bugs when used.
[ https://issues.apache.org/jira/browse/LUCENENET-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032982#comment-13032982 ] Digy commented on LUCENENET-414: Hi Vincent, I changed the CharArraySet implementation. Can you take a look at the 2.9.4g branch? ( https://svn.apache.org/repos/asf/incubator/lucene.net/branches/Lucene.Net_2_9_4g ) DIGY
The definition of CharArraySet is dangerously confusing and leads to bugs when used. Key: LUCENENET-414 URL: https://issues.apache.org/jira/browse/LUCENENET-414 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Core Affects Versions: Lucene.Net 2.9.2 Environment: Irrelevant Reporter: Vincent Van Den Berghe Priority: Minor Fix For: Lucene.Net 2.9.2
Right now, CharArraySet derives from System.Collections.Hashtable, but doesn't actually use this base type for storing elements. However, StandardAnalyzer.STOP_WORDS_SET is exposed as a System.Collections.Hashtable. The obvious way to build your own stop word set, starting from StandardAnalyzer.STOP_WORDS_SET and adding your own stop words, looks like this:
    CharArraySet stopWords = new CharArraySet(StandardAnalyzer.STOP_WORDS_SET, ignoreCase: false);
    foreach (string domainSpecificStopWord in DomainSpecificStopWords) stopWords.Add(domainSpecificStopWord);
This fails because the CharArraySet constructor accepts an ICollection and is passed STOP_WORDS_SET as a Hashtable: the resulting stopWords contains only the DomainSpecificStopWords, and not those from STOP_WORDS_SET. One workaround is to replace the first line with this:
    CharArraySet stopWords = new CharArraySet(StandardAnalyzer.STOP_WORDS_SET.Count + DomainSpecificStopWords.Length, ignoreCase: false);
    foreach (string standardStopWord in (CharArraySet)StandardAnalyzer.STOP_WORDS_SET) stopWords.Add(standardStopWord);
but this relies on an implementation detail (STOP_WORDS_SET is really an UnmodifiableCharArraySet, which is itself a CharArraySet). It works because the cast forces the foreach to use the correct CharArraySet.GetEnumerator(), which is defined as a new method (this has a bad code smell to it).
At least 2 possibilities exist to solve this problem:
- Make CharArraySet use the Hashtable instance and a custom comparator, instead of its own implementation.
- Make CharArraySet use HashSet<char[]>, defined in .NET 4.0.
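The failure mode reported in LUCENENET-414 can be reproduced in isolation. The following is only a minimal sketch: HidingSet and HidingDemo are hypothetical stand-ins written for this illustration, not the actual Lucene.Net CharArraySet code. It assumes, as the report describes, a class that derives from Hashtable, stores its elements elsewhere, and declares GetEnumerator() with `new`.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for the reported design: derives from Hashtable
// but keeps its elements in a private list, so the base table stays empty.
public class HidingSet : Hashtable
{
    private readonly List<string> items = new List<string>();

    // Overload taking one argument; Hashtable.Add(object, object) is untouched.
    public void Add(string s)
    {
        items.Add(s);
    }

    // 'new' hides Hashtable.GetEnumerator() instead of overriding it, so this
    // method is only chosen when the static type of the variable is HidingSet.
    public new IEnumerator GetEnumerator()
    {
        return items.GetEnumerator();
    }
}

public static class HidingDemo
{
    // Enumerates through the interface: this dispatches to Hashtable's own
    // enumerator, which walks the (empty) base table.
    public static int CountVia(IEnumerable source)
    {
        int n = 0;
        foreach (object o in source) n++;
        return n;
    }

    // Enumerates through the concrete type: the compiler binds foreach to the
    // hiding GetEnumerator() and the private list is walked.
    public static int CountDirect(HidingSet source)
    {
        int n = 0;
        foreach (object o in source) n++;
        return n;
    }
}
```

With two elements added, CountVia sees none of them while CountDirect sees both, which mirrors why a constructor that receives the set as an ICollection copies nothing and why the explicit cast in the workaround makes the loop behave.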
[Lucene.Net] [jira] [Commented] (LUCENENET-266) Putting support classes in separate files and in a separate directory
[ https://issues.apache.org/jira/browse/LUCENENET-266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030356#comment-13030356 ] Digy commented on LUCENENET-266: Hi Prescott, Thank you for refactoring the SupportClass. Nice work, no failing tests. DIGY Putting support classes in separate files and in a separate directory Key: LUCENENET-266 URL: https://issues.apache.org/jira/browse/LUCENENET-266 Project: Lucene.Net Issue Type: Improvement Components: Lucene.Net Core Reporter: Andrei Iliev Assignee: Prescott Nasser Labels: refactoring Fix For: Lucene.Net 2.9.4 I am going to add some new classes (nio support, IdentityHashMap, ...). What is the best place to put them in? SupportClass is getting bigger and bigger. I think it is time to put all support classes in separate files and in a separate directory (e.g. JavaSupport). Any comments?
[Lucene.Net] [jira] [Updated] (LUCENENET-412) Replacing ArrayLists, Hashtables etc. with appropriate Generics.
[ https://issues.apache.org/jira/browse/LUCENENET-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-412: --- Attachment: IEquatable for QuerySubclasses.patch I am not sure about committing this IEquatable patch. To gain a slight performance improvement, all of the Equals code is duplicated. Here is the list of affected files:
ConstantScoreQuery.cs
DisjunctionMaxQuery.cs
FilteredQuery.cs
Function/CustomScoreQuery.cs
Function/ValueSourceQuery.cs
MatchAllDocsQuery.cs
MultiPhraseQuery.cs
MultiTermQuery.cs
Payloads/PayloadNearQuery.cs
Payloads/PayloadTermQuery.cs
PhraseQuery.cs
RangeQuery.cs
Spans/SpanFirstQuery.cs
Spans/SpanNearQuery.cs
Spans/SpanNotQuery.cs
Spans/SpanOrQuery.cs
Spans/SpanTermQuery.cs
TermQuery.cs
DIGY
Replacing ArrayLists, Hashtables etc. with appropriate Generics. Key: LUCENENET-412 URL: https://issues.apache.org/jira/browse/LUCENENET-412 Project: Lucene.Net Issue Type: Improvement Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Fix For: Lucene.Net 2.9.4 Attachments: IEquatable for QuerySubclasses.patch, LUCENENET-412.patch, lucene_2.9.4g_exceptions_fix This will move Lucene.Net 2.9.4 closer to Lucene 3.0.3 and allow some performance gains.
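The concern about duplicated Equals code can be illustrated with a small sketch. This is not taken from the attached patch, and SimpleTermQuery is a hypothetical class invented for the example, not a Lucene.Net type. It shows one common way to implement IEquatable<T> so that the comparison logic exists only once: the typed overload does the work and Equals(object) delegates to it.

```csharp
using System;

// Hypothetical query-like class used only for illustration.
// Sealed, so an 'as' cast is enough to establish exact type equality.
public sealed class SimpleTermQuery : IEquatable<SimpleTermQuery>
{
    private readonly string field;
    private readonly string text;
    private readonly float boost;

    public SimpleTermQuery(string field, string text, float boost)
    {
        this.field = field;
        this.text = text;
        this.boost = boost;
    }

    // Typed comparison: the only copy of the comparison logic.
    // Generic collections call this overload directly, avoiding a cast.
    public bool Equals(SimpleTermQuery other)
    {
        if (ReferenceEquals(other, null)) return false;
        if (ReferenceEquals(this, other)) return true;
        return boost == other.boost
            && string.Equals(field, other.field, StringComparison.Ordinal)
            && string.Equals(text, other.text, StringComparison.Ordinal);
    }

    // Untyped comparison delegates instead of repeating the field checks.
    public override bool Equals(object obj)
    {
        return Equals(obj as SimpleTermQuery);
    }

    // Must agree with Equals: equal objects produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            int h = 17;
            h = h * 31 + (field == null ? 0 : field.GetHashCode());
            h = h * 31 + (text == null ? 0 : text.GetHashCode());
            h = h * 31 + boost.GetHashCode();
            return h;
        }
    }
}
```

The delegation costs one extra call and an `as` cast on the untyped path, so whether it is acceptable depends on how much of the reported performance gain comes from the typed overload alone.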
[Lucene.Net] [jira] [Updated] (LUCENENET-413) Medium trust security issue
[ https://issues.apache.org/jira/browse/LUCENENET-413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-413: --- Attachment: MediumTrust.2.9.4.patch The constants.cs fix has been added to the patch. Medium trust security issue Key: LUCENENET-413 URL: https://issues.apache.org/jira/browse/LUCENENET-413 Project: Lucene.Net Issue Type: Improvement Affects Versions: Lucene.Net 2.9.4 Environment: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, .NET 4.0 Reporter: Digy Priority: Minor Fix For: Lucene.Net 2.9.4 Attachments: MediumTrust.2.9.4.patch, MediumTrust.2.9.4.patch, MediumTrust.2.9.4g.patch On behalf of Richard Wilde: Exceptions in Medium Trust (.NET 4.0)
[Lucene.Net] [jira] [Created] (LUCENENET-413) Medium trust security issue
Medium trust security issue Key: LUCENENET-413 URL: https://issues.apache.org/jira/browse/LUCENENET-413 Project: Lucene.Net Issue Type: Improvement Affects Versions: Lucene.Net 2.9.4 Environment: Lucene.Net 2.9.4, Lucene.Net 2.9.4g , .Net 4.0 Reporter: Digy Priority: Minor Fix For: Lucene.Net 2.9.4 On behalf of Richard Wilde: Exceptions in Medium Trust(.NET 4.0)