[jira] [Created] (SOLR-2516) Solr should not cache Searchers
Solr should not cache Searchers
-------------------------------

Key: SOLR-2516
URL: https://issues.apache.org/jira/browse/SOLR-2516
Project: Solr
Issue Type: Bug
Components: search
Reporter: John Wang

Only IndexReaders should be cached, since that is where the data resides. A Searcher is a thin execution wrapper around a reader and thus should not be cached.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
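The distinction the report draws can be sketched with a toy cache (all class names here are hypothetical stand-ins, not Solr's actual API): the expensive, data-bearing object is opened once and cached, while the cheap wrapper is built per request.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the proposal: cache the heavy, data-bearing object (the
// "reader") and build the thin execution wrapper (the "searcher") on
// demand. Class names and the index path are illustrative only.
public class ReaderCacheSketch {
    static class Reader {               // stands in for an IndexReader: expensive to open
        final String indexPath;
        Reader(String indexPath) { this.indexPath = indexPath; }
    }

    static class Searcher {             // stands in for a Searcher: a thin wrapper
        final Reader reader;
        Searcher(Reader reader) { this.reader = reader; }
    }

    private final Map<String, Reader> cache = new HashMap<>();

    // The reader is cached per index; a fresh searcher is created per request.
    public Searcher acquire(String indexPath) {
        Reader r = cache.computeIfAbsent(indexPath, Reader::new);
        return new Searcher(r);
    }

    public static void main(String[] args) {
        ReaderCacheSketch core = new ReaderCacheSketch();
        Searcher s1 = core.acquire("/var/index");
        Searcher s2 = core.acquire("/var/index");
        System.out.println(s1.reader == s2.reader);  // true: reader is reused
        System.out.println(s1 == s2);                // false: wrapper is per-request
    }
}
```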
[jira] [Created] (SOLR-2515) Custom written Similarity class does not read solr parameter values from schema.xml
Custom written Similarity class does not read solr parameter values from schema.xml
-----------------------------------------------------------------------------------

Key: SOLR-2515
URL: https://issues.apache.org/jira/browse/SOLR-2515
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Reporter: Pradeep
Priority: Minor

A custom Similarity class extending DefaultSimilarity does not have its parameter values from schema.xml set.
[jira] [Created] (LUCENE-3096) MultiSearcher does not work correctly with Not on NumericRange
MultiSearcher does not work correctly with Not on NumericRange
--------------------------------------------------------------

Key: LUCENE-3096
URL: https://issues.apache.org/jira/browse/LUCENE-3096
Project: Lucene - Java
Issue Type: Bug
Components: Search
Affects Versions: 3.0.2
Reporter: John Wang

Hi, Keith

My colleague Xiaoyang and I just confirmed that this is actually due to a Lucene bug in MultiSearcher. In particular, if we search with Not on NumericRange using MultiSearcher, we get wrong search results (whereas if we use IndexSearcher, the results are correct). Basically, the Not on the NumericRange has no effect under MultiSearcher. We suspect the cause is the createWeight() function in MultiSearcher, and we hope you can help us fix this Lucene bug. I attached code to reproduce the case; please check it out.

In the attached code there are two separate functions:

(1) testNumericRangeSingleSearcher(Query query), where I create 6 documents with a field called "id" = 1, 2, 3, 4, 5, 6 respectively. I then search with the query +MatchAllDocs -NumericRange(3,3). The expected result is 5 hits, since document 3 is MUST_NOT.

(2) testNumericRangeMultiSearcher(Query query), where I create 2 RAMDirectory() instances, one holding documents 1, 2, 3 and the other 4, 5, 6. I then search with the same query as above using a MultiSearcher. The expected result should also be 5 hits.

However, from (1) we get 5 hits (the expected result), while from (2) we get 6 hits (not the expected result). We also verified this with our zoie/bobo open-source tools and got the same results, because our multi-bobo-browser is built on Lucene's MultiSearcher. I already emailed the Lucene community group; hopefully we can get some feedback soon. If you have any further concern, please let me know! Thank you very much!
Code: (based on Lucene 3.0.x; unused imports removed)

import java.io.IOException;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.store.RAMDirectory;

public class TestNumericRange {

    public static void main(String[] args) {
        try {
            BooleanQuery query = new BooleanQuery();
            query.add(NumericRangeQuery.newIntRange("numId", 3, 3, true, true), Occur.MUST_NOT);
            query.add(new MatchAllDocsQuery(), Occur.MUST);
            testNumericRangeSingleSearcher(query);
            testNumericRangeMultiSearcher(query);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void addDocs(IndexWriter writer, String[] ids) throws IOException {
        for (String id : ids) {
            Document doc = new Document();
            doc.add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new NumericField("numId").setIntValue(Integer.valueOf(id)));
            writer.addDocument(doc);
        }
    }

    public static void testNumericRangeSingleSearcher(Query query)
            throws CorruptIndexException, LockObtainFailedException, IOException {
        Directory directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(),
                IndexWriter.MaxFieldLength.UNLIMITED);
        addDocs(writer, new String[] {"1", "2", "3", "4", "5", "6"});
        writer.close();
        IndexSearcher searcher = new IndexSearcher(directory);
        TopDocs docs = searcher.search(query, 10);
        System.out.println("SingleSearcher: testNumericRange: hitNum: " + docs.totalHits);
        for (ScoreDoc doc : docs.scoreDocs) {
            System.out.println(searcher.explain(query, doc.doc));
        }
        searcher.close();
        directory.close();
    }

    // The original message was truncated partway through this method; the
    // remainder is reconstructed from the description above (two 3-document
    // indexes searched through a single MultiSearcher).
    public static void testNumericRangeMultiSearcher(Query query)
            throws CorruptIndexException, LockObtainFailedException, IOException {
        Directory directory1 = new RAMDirectory();
        IndexWriter writer1 = new IndexWriter(directory1, new WhitespaceAnalyzer(),
                IndexWriter.MaxFieldLength.UNLIMITED);
        addDocs(writer1, new String[] {"1", "2", "3"});
        writer1.close();

        Directory directory2 = new RAMDirectory();
        IndexWriter writer2 = new IndexWriter(directory2, new WhitespaceAnalyzer(),
                IndexWriter.MaxFieldLength.UNLIMITED);
        addDocs(writer2, new String[] {"4", "5", "6"});
        writer2.close();

        MultiSearcher searcher = new MultiSearcher(new Searchable[] {
                new IndexSearcher(directory1), new IndexSearcher(directory2)});
        TopDocs docs = searcher.search(query, 10);
        System.out.println("MultiSearcher: testNumericRange: hitNum: " + docs.totalHits);
        searcher.close();
        directory1.close();
        directory2.close();
    }
}
Re: 3.2.0 (or 3.1.1)
+1 for 3.2!

And also, we should adopt that approach going forward (no more bug-fix releases for the stable branch, except for the last release before 4.0 is out). That means updating the release TODO with, e.g., not creating a branch for 3.2.x, only tagging it. When 4.0 is out, we branch 3.x.y from the last 3.x tag.

Shai

On Saturday, May 14, 2011, Ryan McKinley wrote:
> On Fri, May 13, 2011 at 6:40 PM, Grant Ingersoll wrote:
>> It's been just over 1 month since the last release. We've all said we want
>> to get to about a 3 month release cycle (if not more often). I think this
>> means we should start shooting for a next release sometime in June. Which,
>> in my mind, means we should start working on wrapping up issues now, IMO.
>>
>> Here's what's open for 3.2 against:
>> Lucene: https://issues.apache.org/jira/browse/LUCENE/fixforversion/12316070
>> Solr: https://issues.apache.org/jira/browse/SOLR/fixforversion/12316172
>>
>> Thoughts?
>
> +1 for 3.2 with a new feature freeze pretty soon
[jira] [Updated] (SOLR-2480) Text extraction of password protected files
[ https://issues.apache.org/jira/browse/SOLR-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated SOLR-2480:
---------------------------------
Attachment: password-is-solrcell.docx
            SOLR-2480.patch

Attached the next patch along with the password-protected Word file used for the test. I added test cases for the ignoreTikaException=true|false cases. I think this is ready to commit.

> Text extraction of password protected files
> -------------------------------------------
>
> Key: SOLR-2480
> URL: https://issues.apache.org/jira/browse/SOLR-2480
> Project: Solr
> Issue Type: Improvement
> Components: contrib - Solr Cell (Tika extraction)
> Affects Versions: 1.4.1, 3.1
> Reporter: Shinichiro Abe
> Assignee: Koji Sekiguchi
> Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: SOLR-2480-idea1.patch, SOLR-2480.patch, SOLR-2480.patch, password-is-solrcell.docx
>
>
> Proposal:
> There are password-protected files. PDF, Office documents in 2007 format/97 format.
> These files are posted using SolrCell.
> We do not have to read these files if we do not know the reading password of files.
> So, these files may not be extracted text.
> My requirement is that these files should be processed normally without
> extracting text, and without throwing exception.
> This background:
> Now, when you post a password-protected file, solr returns 500 server error.
> Solr catches the error in ExtractingDocumentLoader and throws TikException.
> I use ManifoldCF.
> If the solr server responds 500, ManifoldCF judge is that "this
> document should be retried because I have absolutely no idea what
> happened".
> And it attempts to retry posting many times without getting the password.
> In the other case, my customer posts the files with embedded images.
> Sometimes it seems that solr throws TikaException of unknown cause.
> He wants to post just metadata without extracting text, but makes him stop
> posting by the exception.
[jira] [Updated] (SOLR-2480) Text extraction of password protected files
[ https://issues.apache.org/jira/browse/SOLR-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-2480: - Attachment: SOLR-2480.patch A patch that introduces ignoreTikaException flag. > Text extraction of password protected files > --- > > Key: SOLR-2480 > URL: https://issues.apache.org/jira/browse/SOLR-2480 > Project: Solr > Issue Type: Improvement > Components: contrib - Solr Cell (Tika extraction) >Affects Versions: 1.4.1, 3.1 >Reporter: Shinichiro Abe >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: SOLR-2480-idea1.patch, SOLR-2480.patch > > > Proposal: > There are password-protected files. PDF, Office documents in 2007 format/97 > format. > These files are posted using SolrCell. > We do not have to read these files if we do not know the reading password of > files. > So, these files may not be extracted text. > My requirement is that these files should be processed normally without > extracting text, and without throwing exception. > This background: > Now, when you post a password-protected file, solr returns 500 server error. > Solr catches the error in ExtractingDocumentLoader and throws TikException. > I use ManifoldCF. > If the solr server responds 500, ManifoldCF judge is that "this > document should be retried because I have absolutely no idea what > happened". > And it attempts to retry posting many times without getting the password. > In the other case, my customer posts the files with embedded images. > Sometimes it seems that solr throws TikaException of unknown cause. > He wants to post just metadata without extracting text, but makes him stop > posting by the exception. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
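As a rough sketch of how a client might use the new flag, assuming it is exposed as a request parameter named ignoreTikaException (the name used in the patch's test cases); the host, port, and literal.id value below are placeholders:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Builds a Solr Cell extract request URL carrying the proposed flag.
// Host, core, and the literal.id value are placeholders; the parameter
// name ignoreTikaException comes from the patch's tests.
public class ExtractRequestSketch {
    static String buildUrl(String docId) throws UnsupportedEncodingException {
        return "http://localhost:8983/solr/update/extract"
                + "?literal.id=" + URLEncoder.encode(docId, "UTF-8")
                + "&ignoreTikaException=true";  // index metadata even if text extraction fails
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(buildUrl("doc1"));
        // http://localhost:8983/solr/update/extract?literal.id=doc1&ignoreTikaException=true
    }
}
```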
Re: GSoC: LUCENE-2308: Separately specify a field's type
2011/5/14 Nikola Tanković > 2011/5/12 Michael McCandless > >> 2011/5/9 Nikola Tanković : >> >> >> >> > Introduction of a FieldType class that will hold all the extra >> >> > properties >> >> > now stored inside Field instance other than field value itself. >> >> >> >> Seems like this is an easy first baby step -- leave current Field >> >> class, but break out the "type" details into a separate class that can >> >> be shared across Field instances. >> > >> > Yes, I agree, this could be a good first step. Mike submitted a patch on >> > issue #2308. I think it's a solid base for this. >> >> Make that Chris. >> > > Ouch, sorry! > > >> >> >> > New FieldTypeAttribute interface will be added to handle extension >> with >> >> > new >> >> > field properties inspired by IndexWriterConfig. >> >> >> >> How would this work? What's an example compelling usage? An app >> >> could use this for extensibility, and then make a matching codec that >> >> picks up this attr? EG, say, maybe for marking that a field is a >> >> "primary key field" and then codec could optimize accordingly...? >> > >> > Well, that could be a very interesting scenario. It didn't ring a bell for >> me >> > as a possible codec usage, but it seems very reasonable. Attributes >> otherwise >> > don't make much sense, unless properly used in custom codecs. >> > >> > How will we ensure attribute and codec compatibility? >> >> I'm just thinking we should have concrete reasons in mind for cutting >> over to attributes here... I'd rather see a fixed, well thought out >> concrete FieldType hierarchy first... >> > > Yes, I couldn't agree more, and I also think Chris has some great ideas on >> this field, given his work on Spatial indexing, which tends to make use of >> these additional attributes. > I think Attributes should be used sparingly, but I do think they make sense. I use a similar idea in some spatial work where different fields have different requirements but need to work with the same set of strategies.
I feel this is metadata and doesn't belong in an extension to Field. But equally it's not 'core' to FieldType either, which is why I added the FieldTypeAttribute idea. In the end I feel we should provide maximum flexibility here, especially if we are going to move over to a more minimal API for the indexer. We need to allow custom extensions to FieldType, and I'm not sure that having 'instanceof' statements every time I need to do something specific to a subtype is the best way to go. > > >> >> >> > Refactoring and dividing of settings for term frequency and >> positioning >> >> > can >> >> > also be done (LUCENE-2048) >> >> >> >> Ahh great! So we can omit-positions-but-not-TF. >> >> >> >> > Discuss possible effects of completion of LUCENE-2310 on this project >> >> >> >> This one is badly needed... but we should keep your project focused. >> > >> > >> > We'll tackle this one afterwards. >> >> Good. >> >> >> >> > Adequate Factory class for easier configuration of new Field >> instances >> >> > together with manually added new FieldTypeAttributes >> >> > FieldType, once instantiated, is read-only. Only the field's value can be >> >> > changed. >> >> >> >> OK. >> >> >> >> > Simple hierarchy of Field classes with core properties logically >> >> > predefaulted. E.g.: >> >> > >> >> > NumberField, >> >> >> >> Can't this just be our existing NumericField? >> > >> > Yes, this is classic NumericField with changes proposed in LUCENE-2310. >> Tim >> > Smith mentioned that Fieldable class should be kept for custom >> > implementations to reduce the number of setters (for defaults). >> > Chris Male suggested a new CoreFieldTypeAttribute interface, so maybe it >> > should be implemented instead of Fieldable for custom implementations, >> so >> > both Fieldable and AbstractField are not needed anymore. >> > In my opinion Field should become abstract, extended by the others.
>> > Another proposal: how about keeping only Field (with no hierarchy) and >> move >> > hierarchy to FieldType, such as NumericFieldType, StringFieldType since >> this >> > hierarchy concerns type information only? >> >> I think hierarchy of both types and the "value containers" that hold >> the corresponding values could make sense? >> > > Hmm, I think we should get more opinions on this one also. > I'm unsure about this. What information would a StringFieldType have over a NumericFieldType? I can imagine NumericFieldType maybe having precision step. Couldn't that be an Attribute? I can see the benefit of a StringField though, and a NumericField, since they are providing different implementations of the same fundamental needs of a Field; its name, its value, its type and its tokenstream. I think we should use hierarchies sparingly as well, since really we want to make this as simple as possible. But we should also keep our eye on those fundamental needs of the indexer. > > >> >> > e.g. Usage: >> > FieldType number = new NumericFieldType(); >> > Field price = new Field(); >> > price.setType(number); >
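The usage proposed at the end of the thread (a read-only FieldType shared by many Field instances, set via setType) can be sketched in plain Java. This is a hypothetical illustration of the separation under discussion, not the committed Lucene API:

```java
// Sketch of the proposed split (hypothetical API, not the final design):
// an immutable FieldType carries the shared configuration, while each
// Field instance carries only its name and value plus a type reference.
public class FieldTypeSketch {
    static class FieldType {
        final boolean stored;
        final boolean indexed;
        FieldType(boolean stored, boolean indexed) {  // read-only once constructed
            this.stored = stored;
            this.indexed = indexed;
        }
    }

    static class Field {
        final String name;
        final FieldType type;
        String value;                                  // only the value is mutable
        Field(String name, FieldType type, String value) {
            this.name = name;
            this.type = type;
            this.value = value;
        }
    }

    public static void main(String[] args) {
        FieldType keyword = new FieldType(true, true); // one type instance ...
        Field id = new Field("id", keyword, "1");
        Field sku = new Field("sku", keyword, "A-7");  // ... shared by many fields
        System.out.println(id.type == sku.type);       // true: the type is shared
    }
}
```

The design point being debated is exactly this sharing: properties live once per type, not once per field, so changing a type's configuration cannot silently diverge between documents.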
[jira] [Commented] (SOLR-2480) Text extraction of password protected files
[ https://issues.apache.org/jira/browse/SOLR-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033429#comment-13033429 ]

Koji Sekiguchi commented on SOLR-2480:
--------------------------------------

bq. And I think SOLR-445 can resolve improvement ideas(2).

No. You should consider the difference between this issue and SOLR-445 (see my comment above).

As I understand it, the requirement described in the Description is quite similar to SOLR-2512, which has been resolved, so I'll try a patch that has an ignoreErrors flag for TikaException. In SOLR-2512 I added the ability to ignore exceptions when trying to extract metadata from text, i.e. Solr indexed the text but gave up on the metadata. The ignore flag in this ticket, on the other hand, is for giving up the text but indexing the metadata. It cannot be resolved by SOLR-445.

> Text extraction of password protected files
> -------------------------------------------
>
> Key: SOLR-2480
> URL: https://issues.apache.org/jira/browse/SOLR-2480
> Project: Solr
> Issue Type: Improvement
> Components: contrib - Solr Cell (Tika extraction)
> Affects Versions: 1.4.1, 3.1
> Reporter: Shinichiro Abe
> Assignee: Koji Sekiguchi
> Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: SOLR-2480-idea1.patch
>
>
> Proposal:
> There are password-protected files. PDF, Office documents in 2007 format/97 format.
> These files are posted using SolrCell.
> We do not have to read these files if we do not know the reading password of files.
> So, these files may not be extracted text.
> My requirement is that these files should be processed normally without
> extracting text, and without throwing exception.
> This background:
> Now, when you post a password-protected file, solr returns 500 server error.
> Solr catches the error in ExtractingDocumentLoader and throws TikException.
> I use ManifoldCF.
> If the solr server responds 500, ManifoldCF judge is that "this
> document should be retried because I have absolutely no idea what
> happened".
> And it attempts to retry posting many times without getting the password.
> In the other case, my customer posts the files with embedded images.
> Sometimes it seems that solr throws TikaException of unknown cause.
> He wants to post just metadata without extracting text, but makes him stop
> posting by the exception.
[jira] [Updated] (SOLR-2480) Text extraction of password protected files
[ https://issues.apache.org/jira/browse/SOLR-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-2480: - Affects Version/s: 1.4.1 Fix Version/s: 4.0 3.2 > Text extraction of password protected files > --- > > Key: SOLR-2480 > URL: https://issues.apache.org/jira/browse/SOLR-2480 > Project: Solr > Issue Type: Improvement > Components: contrib - Solr Cell (Tika extraction) >Affects Versions: 1.4.1, 3.1 >Reporter: Shinichiro Abe >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: SOLR-2480-idea1.patch > > > Proposal: > There are password-protected files. PDF, Office documents in 2007 format/97 > format. > These files are posted using SolrCell. > We do not have to read these files if we do not know the reading password of > files. > So, these files may not be extracted text. > My requirement is that these files should be processed normally without > extracting text, and without throwing exception. > This background: > Now, when you post a password-protected file, solr returns 500 server error. > Solr catches the error in ExtractingDocumentLoader and throws TikException. > I use ManifoldCF. > If the solr server responds 500, ManifoldCF judge is that "this > document should be retried because I have absolutely no idea what > happened". > And it attempts to retry posting many times without getting the password. > In the other case, my customer posts the files with embedded images. > Sometimes it seems that solr throws TikaException of unknown cause. > He wants to post just metadata without extracting text, but makes him stop > posting by the exception. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (SOLR-2113) Create TermsQParser that deals with toInternal() conversion of external terms
[ https://issues.apache.org/jira/browse/SOLR-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man reopened SOLR-2113:
----------------------------
Assignee: Hoss Man

> Create TermsQParser that deals with toInternal() conversion of external terms
> -----------------------------------------------------------------------------
>
> Key: SOLR-2113
> URL: https://issues.apache.org/jira/browse/SOLR-2113
> Project: Solr
> Issue Type: New Feature
> Components: search
> Reporter: Hoss Man
> Assignee: Hoss Man
> Fix For: 3.2, 4.0
>
> Attachments: SOLR-2113.patch
>
>
> For converting facet.field response constraints into filter queries, it would
> be helpful to have a QParser that generated a TermQuery using the
> toInternal() converted result of the raw "q" param
[jira] [Resolved] (SOLR-2113) Create TermsQParser that deals with toInternal() conversion of external terms
[ https://issues.apache.org/jira/browse/SOLR-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved SOLR-2113.
----------------------------
Resolution: Fixed
Fix Version/s: 3.2

Committed revision 1102922. - 3x backport

> Create TermsQParser that deals with toInternal() conversion of external terms
> -----------------------------------------------------------------------------
>
> Key: SOLR-2113
> URL: https://issues.apache.org/jira/browse/SOLR-2113
> Project: Solr
> Issue Type: New Feature
> Components: search
> Reporter: Hoss Man
> Assignee: Hoss Man
> Fix For: 3.2, 4.0
>
> Attachments: SOLR-2113.patch
>
>
> For converting facet.field response constraints into filter queries, it would
> be helpful to have a QParser that generated a TermQuery using the
> toInternal() converted result of the raw "q" param
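The intended use, per the issue description, is echoing a raw facet.field constraint back as a filter query whose value the parser runs through toInternal(). Assuming the parser is registered under the local-params name term (an assumption, as is the field/value pair below), a client might build the fq parameter like this:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Builds a filter query that hands the *raw* external value to the
// QParser, which applies toInternal() on the server side. The parser
// name "term" and the field/value are illustrative assumptions.
public class TermFilterSketch {
    static String termFilter(String field, String rawFacetValue) throws UnsupportedEncodingException {
        // Local-params syntax: {!term f=<field>}<raw value>, URL-encoded for the fq param
        return URLEncoder.encode("{!term f=" + field + "}" + rawFacetValue, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // e.g. a constraint string returned verbatim by facet.field=weight
        System.out.println("fq=" + termFilter("weight", "1.5"));
    }
}
```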
Re: 3.2.0 (or 3.1.1)
On Fri, May 13, 2011 at 6:40 PM, Grant Ingersoll wrote:
> It's been just over 1 month since the last release. We've all said we want
> to get to about a 3 month release cycle (if not more often). I think this
> means we should start shooting for a next release sometime in June. Which,
> in my mind, means we should start working on wrapping up issues now, IMO.
>
> Here's what's open for 3.2 against:
> Lucene: https://issues.apache.org/jira/browse/LUCENE/fixforversion/12316070
> Solr: https://issues.apache.org/jira/browse/SOLR/fixforversion/12316172
>
> Thoughts?

+1 for 3.2 with a new feature freeze pretty soon
Re: 3.2.0 (or 3.1.1)
On Fri, May 13, 2011 at 6:40 PM, Grant Ingersoll wrote:
> It's been just over 1 month since the last release. We've all said we want
> to get to about a 3 month release cycle (if not more often). I think this
> means we should start shooting for a next release sometime in June. Which,
> in my mind, means we should start working on wrapping up issues now, IMO.
>
> Here's what's open for 3.2 against:
> Lucene: https://issues.apache.org/jira/browse/LUCENE/fixforversion/12316070
> Solr: https://issues.apache.org/jira/browse/SOLR/fixforversion/12316172
>
> Thoughts?
>
> -Grant

My vote would be to just spend our time on 3.2. People get bug fixes, better test coverage, and a couple of new features and optimizations, too. Is it really going to be harder to release 3.2 than to release 3.1.1? We could just announce in advance that we'd like to feature freeze 3.2 on ?
[jira] [Updated] (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-139: --- Fix Version/s: (was: 3.2) > Support updateable/modifiable documents > --- > > Key: SOLR-139 > URL: https://issues.apache.org/jira/browse/SOLR-139 > Project: Solr > Issue Type: New Feature > Components: update >Reporter: Ryan McKinley > Attachments: Eriks-ModifiableDocument.patch, > Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, > Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, > Eriks-ModifiableDocument.patch, SOLR-139-IndexDocumentCommand.patch, > SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, > SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, > SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, > SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, > SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, > SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, > SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, > SOLR-139-XmlUpdater.patch, > SOLR-269+139-ModifiableDocumentUpdateProcessor.patch, getStoredFields.patch, > getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, > getStoredFields.patch > > > It would be nice to be able to update some fields on a document without > having to insert the entire document. > Given the way lucene is structured, (for now) one can only modify stored > fields. > While we are at it, we can support incrementing an existing value - I think > this only makes sense for numbers. > for background, see: > http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293 -- This message is automatically generated by JIRA. 
3.2.0 (or 3.1.1)
It's been just over 1 month since the last release. We've all said we want to get to about a 3 month release cycle (if not more often). I think this means we should start shooting for a next release sometime in June. Which, in my mind, means we should start working on wrapping up issues now, IMO.

Here's what's open for 3.2 against:
Lucene: https://issues.apache.org/jira/browse/LUCENE/fixforversion/12316070
Solr: https://issues.apache.org/jira/browse/SOLR/fixforversion/12316172

Thoughts?

-Grant
[jira] [Resolved] (SOLR-2451) Enhance SolrTestCaseJ4 to allow tests to account for small deltas when comparing floats/doubles
[ https://issues.apache.org/jira/browse/SOLR-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-2451. Resolution: Fixed Fix Version/s: 3.2 Committed revision 1102910. - 3x > Enhance SolrTestCaseJ4 to allow tests to account for small deltas when > comparing floats/doubles > --- > > Key: SOLR-2451 > URL: https://issues.apache.org/jira/browse/SOLR-2451 > Project: Solr > Issue Type: Improvement >Reporter: David Smiley >Assignee: Hoss Man >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: SOLR-2451.patch, SOLR-2451.patch, > SOLR-2451_assertQScore.patch > > > Attached is a patch that adds the following method to SolrTestCaseJ4: (just > javadoc & signature shown) > {code:java} > /** >* Validates that the document at the specified index in the results has > the specified score, within 0.0001. >*/ > public static void assertQScore(SolrQueryRequest req, int docIdx, float > targetScore) { > {code} > This is especially useful for geospatial in which slightly different > precision deltas might occur when trying different geospatial indexing > strategies are used, assuming the score is some geospatial distance. This > patch makes a simple modification to DistanceFunctionTest to use it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (SOLR-2451) Enhance SolrTestCaseJ4 to allow tests to account for small deltas when comparing floats/doubles
[ https://issues.apache.org/jira/browse/SOLR-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man reopened SOLR-2451: forgot i was in the middle of backporting > Enhance SolrTestCaseJ4 to allow tests to account for small deltas when > comparing floats/doubles > --- > > Key: SOLR-2451 > URL: https://issues.apache.org/jira/browse/SOLR-2451 > Project: Solr > Issue Type: Improvement >Reporter: David Smiley >Assignee: Hoss Man >Priority: Minor > Fix For: 4.0 > > Attachments: SOLR-2451.patch, SOLR-2451.patch, > SOLR-2451_assertQScore.patch > > > Attached is a patch that adds the following method to SolrTestCaseJ4: (just > javadoc & signature shown) > {code:java} > /** >* Validates that the document at the specified index in the results has > the specified score, within 0.0001. >*/ > public static void assertQScore(SolrQueryRequest req, int docIdx, float > targetScore) { > {code} > This is especially useful for geospatial in which slightly different > precision deltas might occur when trying different geospatial indexing > strategies are used, assuming the score is some geospatial distance. This > patch makes a simple modification to DistanceFunctionTest to use it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2510) Proximity search is not symmetric
[ https://issues.apache.org/jira/browse/SOLR-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved SOLR-2510.
----------------------------
Resolution: Not A Problem

This is the expected behavior for phrase queries. "slop" is specified as an edit distance...

http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/PhraseQuery.html#setSlop%28int%29

These two queries are not equivalent...

{noformat}
"WORD_D WORD_G"~3
"WORD_G WORD_D"~3
{noformat}

The order of the terms as specified in the PhraseQuery matters for determining the edit distance.

> Proximity search is not symmetric
> ---------------------------------
>
> Key: SOLR-2510
> URL: https://issues.apache.org/jira/browse/SOLR-2510
> Project: Solr
> Issue Type: Bug
> Components: search, web gui
> Affects Versions: 3.1
> Environment: Ubuntu 10.04
> Reporter: mark risher
>
> The proximity search is incorrect on words occurring *before* the matching
> term. It matches documents that are _less-than_ N words before and
> _less-than-or-equal-to_ N words after.
> For example, use the following document:
> {{WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}}
> *Expected result:* Both of the following queries should match:
> 1) {{"WORD_D WORD_G"~3}}
> 2) {{"WORD_G WORD_D"~3}}
> *Actual result:* Only #1 matches. For some reason, it thinks the distance
> from D to G is 3, but from G to D is 4.
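The asymmetry can be checked with a little arithmetic. For a two-term phrase, the second query term is expected immediately after the position where the first term matched, and the slop is how many positions the actual occurrence must move to get there. A simplified model of that edit distance (valid for the two-term case; PhraseQuery's general algorithm is more involved), using 0-based positions in the document above:

```java
// Simplified slop calculation for a two-term phrase against the document
// "WORD_A ... WORD_G" (positions 0..6). This models the edit distance for
// the two-term case only; PhraseQuery's general matching is more involved.
public class SlopSketch {
    // Moves needed so the second query term lands right after the first.
    static int neededSlop(int firstTermPos, int secondTermPos) {
        return Math.abs(secondTermPos - (firstTermPos + 1));
    }

    public static void main(String[] args) {
        int posD = 3, posG = 6;                      // WORD_D and WORD_G, 0-based
        System.out.println(neededSlop(posD, posG));  // "WORD_D WORD_G": 2 moves, so ~3 matches
        System.out.println(neededSlop(posG, posD));  // "WORD_G WORD_D": 4 moves, so ~3 does not
    }
}
```

With the query terms in document order, G is 2 positions past where the phrase expects it; with the terms reversed, D would have to move 4 positions forward, which exceeds a slop of 3. That is exactly the match/no-match split reported in the issue.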
[jira] [Updated] (SOLR-2451) Enhance SolrTestCaseJ4 to allow tests to account for small deltas when comparing floats/doubles
[ https://issues.apache.org/jira/browse/SOLR-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-2451: --- Affects Version/s: (was: 3.2) Fix Version/s: 4.0 Assignee: Hoss Man Summary: Enhance SolrTestCaseJ4 to allow tests to account for small deltas when comparing floats/doubles (was: Add assertQScore() to SolrTestCaseJ4 to account for small deltas ) Committed revision 1102907. > Enhance SolrTestCaseJ4 to allow tests to account for small deltas when > comparing floats/doubles > --- > > Key: SOLR-2451 > URL: https://issues.apache.org/jira/browse/SOLR-2451 > Project: Solr > Issue Type: Improvement >Reporter: David Smiley >Assignee: Hoss Man >Priority: Minor > Fix For: 4.0 > > Attachments: SOLR-2451.patch, SOLR-2451.patch, > SOLR-2451_assertQScore.patch > > > Attached is a patch that adds the following method to SolrTestCaseJ4: (just > javadoc & signature shown) > {code:java} > /** >* Validates that the document at the specified index in the results has > the specified score, within 0.0001. >*/ > public static void assertQScore(SolrQueryRequest req, int docIdx, float > targetScore) { > {code} > This is especially useful for geospatial in which slightly different > precision deltas might occur when trying different geospatial indexing > strategies are used, assuming the score is some geospatial distance. This > patch makes a simple modification to DistanceFunctionTest to use it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2451) Enhance SolrTestCaseJ4 to allow tests to account for small deltas when comparing floats/doubles
[ https://issues.apache.org/jira/browse/SOLR-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-2451. Resolution: Fixed thanks for bringing this up david > Enhance SolrTestCaseJ4 to allow tests to account for small deltas when > comparing floats/doubles > --- > > Key: SOLR-2451 > URL: https://issues.apache.org/jira/browse/SOLR-2451 > Project: Solr > Issue Type: Improvement >Reporter: David Smiley >Assignee: Hoss Man >Priority: Minor > Fix For: 4.0 > > Attachments: SOLR-2451.patch, SOLR-2451.patch, > SOLR-2451_assertQScore.patch > > > Attached is a patch that adds the following method to SolrTestCaseJ4: (just > javadoc & signature shown) > {code:java} > /** >* Validates that the document at the specified index in the results has > the specified score, within 0.0001. >*/ > public static void assertQScore(SolrQueryRequest req, int docIdx, float > targetScore) { > {code} > This is especially useful for geospatial in which slightly different > precision deltas might occur when trying different geospatial indexing > strategies are used, assuming the score is some geospatial distance. This > patch makes a simple modification to DistanceFunctionTest to use it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
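[Editor's note] The fixed-tolerance comparison the patch describes reduces to a simple delta check. A minimal standalone sketch follows; the method name and the 0.0001 delta mirror the javadoc quoted above, but this is not the committed Solr code (the real assertQScore pulls the score out of a SolrQueryRequest):

```java
// Standalone sketch of a score assertion with a fixed tolerance, in the
// spirit of SolrTestCaseJ4.assertQScore described above. Illustrative
// only; the committed implementation takes a SolrQueryRequest and docIdx.
public class ScoreAssert {
    static final float DELTA = 0.0001f;

    // Throws AssertionError when the scores differ by more than DELTA.
    static void assertScore(float expected, float actual) {
        if (Math.abs(expected - actual) > DELTA) {
            throw new AssertionError(
                "expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        assertScore(1.0f, 1.00005f);  // passes: within 0.0001
        try {
            assertScore(1.0f, 1.01f); // off by 0.01: should throw
            throw new IllegalStateException("should have thrown");
        } catch (AssertionError expected) {
            System.out.println("caught: " + expected.getMessage());
        }
    }
}
```

This is exactly the kind of tolerance that absorbs the small precision deltas between different geospatial indexing strategies mentioned in the issue.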
[jira] [Updated] (LUCENE-3095) TestIndexWriter#testThreadInterruptDeadlock fails with OOM
[ https://issues.apache.org/jira/browse/LUCENE-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3095: --- Attachment: LUCENE-3095.patch Nice catch selckin! I was finally able to repro the OOME. I think the attached patch should fix it. > TestIndexWriter#testThreadInterruptDeadlock fails with OOM > --- > > Key: LUCENE-3095 > URL: https://issues.apache.org/jira/browse/LUCENE-3095 > Project: Lucene - Java > Issue Type: Bug > Components: Index, Tests >Affects Versions: 4.0 >Reporter: Simon Willnauer >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3095.patch > > > Selckin reported a repeatedly failing test that throws OOM Exceptions. > According to the heapdump the MockDirectoryWrapper#createdFiles HashSet takes > about 400MB heapspace containing 4194304 entries. Seems kind of way too many > though :) > {noformat} > [junit] java.lang.OutOfMemoryError: Java heap space > [junit] Dumping heap to /tmp/java_pid25990.hprof ... > [junit] Heap dump file created [520807744 bytes in 4.250 secs] > [junit] Testsuite: org.apache.lucene.index.TestIndexWriter > [junit] Testcase: > testThreadInterruptDeadlock(org.apache.lucene.index.TestIndexWriter): FAILED > [junit] > [junit] junit.framework.AssertionFailedError: > [junit] at > org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock(TestIndexWriter.java:2249) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211) > [junit] > [junit] > [junit] Testcase: > testThreadInterruptDeadlock(org.apache.lucene.index.TestIndexWriter): FAILED > [junit] Some threads threw uncaught exceptions! > [junit] junit.framework.AssertionFailedError: Some threads threw uncaught > exceptions! 
> [junit] at > org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:557) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211) > [junit] > [junit] > [junit] Tests run: 67, Failures: 2, Errors: 0, Time elapsed: 3,254.884 sec > [junit] > [junit] - Standard Output --- > [junit] FAILED; unexpected exception > [junit] java.lang.OutOfMemoryError: Java heap space > [junit] at org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:85) > [junit] at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:58) > [junit] at > org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:132) > [junit] at > org.apache.lucene.store.RAMOutputStream.copyBytes(RAMOutputStream.java:171) > [junit] at > org.apache.lucene.store.MockIndexOutputWrapper.copyBytes(MockIndexOutputWrapper.java:155) > [junit] at > org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:223) > [junit] at > org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:189) > [junit] at > org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:138) > [junit] at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3344) > [junit] at > org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2959) > [junit] at > org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37) > [junit] at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1763) > [junit] at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1758) > [junit] at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1754) > [junit] at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1373) > [junit] at > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1230) > [junit] at > 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1211) > [junit] at > org.apache.lucene.index.TestIndexWriter$IndexerThreadInterrupt.run(TestIndexWriter.java:2154) > [junit] - --- > [junit] - Standard Error - > [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter > -Dtestmethod=testThreadInterruptDeadlock > -Dtests.seed=7183538093651149:3431510331342554160 > [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter > -Dtestmethod=testThreadInterruptDeadlock > -Dtests.seed=7183538093651149:3431510331342554160 > [ju
[jira] [Assigned] (LUCENE-3095) TestIndexWriter#testThreadInterruptDeadlock fails with OOM
[ https://issues.apache.org/jira/browse/LUCENE-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-3095: -- Assignee: Michael McCandless > TestIndexWriter#testThreadInterruptDeadlock fails with OOM > --- > > Key: LUCENE-3095 > URL: https://issues.apache.org/jira/browse/LUCENE-3095 > Project: Lucene - Java > Issue Type: Bug > Components: Index, Tests >Affects Versions: 4.0 >Reporter: Simon Willnauer >Assignee: Michael McCandless > Fix For: 4.0 > > > Selckin reported a repeatedly failing test that throws OOM Exceptions. > According to the heapdump the MockDirectoryWrapper#createdFiles HashSet takes > about 400MB heapspace containing 4194304 entries. Seems kind of way too many > though :) > {noformat} > [junit] java.lang.OutOfMemoryError: Java heap space > [junit] Dumping heap to /tmp/java_pid25990.hprof ... > [junit] Heap dump file created [520807744 bytes in 4.250 secs] > [junit] Testsuite: org.apache.lucene.index.TestIndexWriter > [junit] Testcase: > testThreadInterruptDeadlock(org.apache.lucene.index.TestIndexWriter): FAILED > [junit] > [junit] junit.framework.AssertionFailedError: > [junit] at > org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock(TestIndexWriter.java:2249) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211) > [junit] > [junit] > [junit] Testcase: > testThreadInterruptDeadlock(org.apache.lucene.index.TestIndexWriter): FAILED > [junit] Some threads threw uncaught exceptions! > [junit] junit.framework.AssertionFailedError: Some threads threw uncaught > exceptions! 
> [junit] at > org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:557) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211) > [junit] > [junit] > [junit] Tests run: 67, Failures: 2, Errors: 0, Time elapsed: 3,254.884 sec > [junit] > [junit] - Standard Output --- > [junit] FAILED; unexpected exception > [junit] java.lang.OutOfMemoryError: Java heap space > [junit] at org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:85) > [junit] at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:58) > [junit] at > org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:132) > [junit] at > org.apache.lucene.store.RAMOutputStream.copyBytes(RAMOutputStream.java:171) > [junit] at > org.apache.lucene.store.MockIndexOutputWrapper.copyBytes(MockIndexOutputWrapper.java:155) > [junit] at > org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:223) > [junit] at > org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:189) > [junit] at > org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:138) > [junit] at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3344) > [junit] at > org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2959) > [junit] at > org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37) > [junit] at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1763) > [junit] at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1758) > [junit] at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1754) > [junit] at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1373) > [junit] at > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1230) > [junit] at > 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1211) > [junit] at > org.apache.lucene.index.TestIndexWriter$IndexerThreadInterrupt.run(TestIndexWriter.java:2154) > [junit] - --- > [junit] - Standard Error - > [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter > -Dtestmethod=testThreadInterruptDeadlock > -Dtests.seed=7183538093651149:3431510331342554160 > [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter > -Dtestmethod=testThreadInterruptDeadlock > -Dtests.seed=7183538093651149:3431510331342554160 > [junit] The following exceptions were thrown by threads: > [junit] *** Thread: Thread-379 *** > [junit] java.lang.RuntimeException: Mock
[jira] [Resolved] (LUCENE-3058) FST should allow more than one output for the same input
[ https://issues.apache.org/jira/browse/LUCENE-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3058. Resolution: Fixed > FST should allow more than one output for the same input > > > Key: LUCENE-3058 > URL: https://issues.apache.org/jira/browse/LUCENE-3058 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3058.patch, LUCENE-3058.patch, LUCENE-3058.patch > > > For the block tree terms dict, it turns out I need this case. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3058) FST should allow more than one output for the same input
[ https://issues.apache.org/jira/browse/LUCENE-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033314#comment-13033314 ] Michael McCandless commented on LUCENE-3058: OK thanks Uwe... I'll commit. > FST should allow more than one output for the same input > > > Key: LUCENE-3058 > URL: https://issues.apache.org/jira/browse/LUCENE-3058 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3058.patch, LUCENE-3058.patch, LUCENE-3058.patch > > > For the block tree terms dict, it turns out I need this case. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3058) FST should allow more than one output for the same input
[ https://issues.apache.org/jira/browse/LUCENE-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033309#comment-13033309 ] Uwe Schindler commented on LUCENE-3058: --- After reviewing, this seems the only solution. The cast is guarded by the instanceof check, but compiler does not know this. Only the (Object) cast in second param is not needed: {code} @SuppressWarnings("unchecked") final Builder b = (Builder) builder; b.add(pair.input, _outputs.get(twoLongs.first)); b.add(pair.input, _outputs.get(twoLongs.second)); {code} > FST should allow more than one output for the same input > > > Key: LUCENE-3058 > URL: https://issues.apache.org/jira/browse/LUCENE-3058 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3058.patch, LUCENE-3058.patch, LUCENE-3058.patch > > > For the block tree terms dict, it turns out I need this case. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
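[Editor's note] The pattern Uwe describes — an unchecked cast that is safe at runtime because an instanceof check guards it, even though the compiler cannot prove it — looks like this in isolation (a generic illustration, not the actual TestFSTs code):

```java
import java.util.List;

// Standalone illustration of the pattern discussed above: a cast that is
// provably safe at runtime (guarded by instanceof) but whose element type
// the compiler cannot verify, so the warning is suppressed at the
// narrowest possible scope. Not the actual TestFSTs/Builder code.
public class UncheckedCastSketch {
    static int sizeOfStringList(Object o) {
        if (o instanceof List) {
            // The instanceof check guards this cast, but erasure means
            // the compiler cannot check <String>, hence the suppression
            // on just this local declaration.
            @SuppressWarnings("unchecked")
            final List<String> list = (List<String>) o;
            return list.size();
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> words = java.util.Arrays.asList("fst", "output");
        System.out.println(sizeOfStringList(words));       // 2
        System.out.println(sizeOfStringList("not a list")); // -1
    }
}
```

Annotating the local variable, as in Uwe's snippet, keeps the suppression from hiding other unchecked warnings elsewhere in the method.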
[jira] [Commented] (LUCENE-3095) TestIndexWriter#testThreadInterruptDeadlock fails with OOM
[ https://issues.apache.org/jira/browse/LUCENE-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033296#comment-13033296 ] selckin commented on LUCENE-3095: - I believe the test is wrong, and it can get back into the thread's while(true) before setting the finish flag and after the last interrupt and therefore never end; the inner while(true) should probably be a while(!finish) as well. > TestIndexWriter#testThreadInterruptDeadlock fails with OOM > --- > > Key: LUCENE-3095 > URL: https://issues.apache.org/jira/browse/LUCENE-3095 > Project: Lucene - Java > Issue Type: Bug > Components: Index, Tests >Affects Versions: 4.0 >Reporter: Simon Willnauer > Fix For: 4.0 > > > Selckin reported a repeatedly failing test that throws OOM Exceptions. > According to the heapdump the MockDirectoryWrapper#createdFiles HashSet takes > about 400MB heapspace containing 4194304 entries. Seems kind of way too many > though :) > {noformat} > [junit] java.lang.OutOfMemoryError: Java heap space > [junit] Dumping heap to /tmp/java_pid25990.hprof ... > [junit] Heap dump file created [520807744 bytes in 4.250 secs] > [junit] Testsuite: org.apache.lucene.index.TestIndexWriter > [junit] Testcase: > testThreadInterruptDeadlock(org.apache.lucene.index.TestIndexWriter): FAILED > [junit] > [junit] junit.framework.AssertionFailedError: > [junit] at > org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock(TestIndexWriter.java:2249) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211) > [junit] > [junit] > [junit] Testcase: > testThreadInterruptDeadlock(org.apache.lucene.index.TestIndexWriter): FAILED > [junit] Some threads threw uncaught exceptions! > [junit] junit.framework.AssertionFailedError: Some threads threw uncaught > exceptions! 
> [junit] at > org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:557) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211) > [junit] > [junit] > [junit] Tests run: 67, Failures: 2, Errors: 0, Time elapsed: 3,254.884 sec > [junit] > [junit] - Standard Output --- > [junit] FAILED; unexpected exception > [junit] java.lang.OutOfMemoryError: Java heap space > [junit] at org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:85) > [junit] at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:58) > [junit] at > org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:132) > [junit] at > org.apache.lucene.store.RAMOutputStream.copyBytes(RAMOutputStream.java:171) > [junit] at > org.apache.lucene.store.MockIndexOutputWrapper.copyBytes(MockIndexOutputWrapper.java:155) > [junit] at > org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:223) > [junit] at > org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:189) > [junit] at > org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:138) > [junit] at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3344) > [junit] at > org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2959) > [junit] at > org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37) > [junit] at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1763) > [junit] at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1758) > [junit] at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1754) > [junit] at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1373) > [junit] at > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1230) > [junit] at > 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1211) > [junit] at > org.apache.lucene.index.TestIndexWriter$IndexerThreadInterrupt.run(TestIndexWriter.java:2154) > [junit] - --- > [junit] - Standard Error - > [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter > -Dtestmethod=testThreadInterruptDeadlock > -Dtests.seed=7183538093651149:3431510331342554160 > [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter > -Dtestmethod=testThreadInterruptDeadlock > -Dtests.seed=71
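[Editor's note] selckin's diagnosis — an inner while(true) that should re-check the finish flag — is a common shape of test-thread bug. A minimal standalone sketch of the corrected pattern (illustrative only, not the actual TestIndexWriter code):

```java
// Minimal sketch of the loop-termination pattern discussed above: a
// worker whose loop condition re-checks a volatile finish flag, so it
// cannot re-enter the loop after the controlling thread asks it to stop.
// Illustrative only; not the actual TestIndexWriter code.
public class FinishFlagSketch {
    static volatile boolean finish = false;

    // Starts a worker whose loop condition checks the flag, asks it to
    // stop, and reports whether it actually terminated.
    static boolean stopsCleanly() {
        finish = false;
        Thread worker = new Thread(() -> {
            // The fix: while (!finish), not while (true). A plain
            // while (true) could spin forever once the last interrupt
            // has already been delivered.
            while (!finish) {
                Thread.onSpinWait(); // simulate a unit of indexing work
            }
        });
        worker.start();
        finish = true; // ask the worker to stop
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + stopsCleanly()); // true
    }
}
```

Because the flag is volatile, the write by the controlling thread is guaranteed visible to the worker's next loop-condition check.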
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8027 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8027/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033279#comment-13033279 ] Earwin Burrfoot commented on LUCENE-2793: - As mentioned @LUCENE-3092, it would be nice not to include the OneMerge, but some meaningful value like 'expectedSize', 'expectedSegmentSize' or whatnot, that would work both for merges *and* flushes, and also won't introduce needless dependency on MergePolicy. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless >Assignee: Simon Willnauer > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
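[Editor's note] Earwin's suggestion — carry an expected size in the context rather than the OneMerge itself, so the same type serves merges and flushes without a MergePolicy dependency — could look roughly like this. All names here are hypothetical, not the committed IOContext API:

```java
// Hypothetical sketch of an IOContext carrying an expected segment size
// instead of a OneMerge, per the comment above. Class, field, and hint
// names are illustrative; the actual committed API may differ.
public class IOContextSketch {
    enum Hint { DEFAULT, SEQUENTIAL, DIRECT } // e.g. bypass OS cache

    final Hint hint;
    final long expectedSegmentSizeBytes; // meaningful for merges *and* flushes

    IOContextSketch(Hint hint, long expectedSegmentSizeBytes) {
        this.hint = hint;
        this.expectedSegmentSizeBytes = expectedSegmentSizeBytes;
    }

    public static void main(String[] args) {
        // A merge expected to write roughly a 64 MB segment, sequentially:
        IOContextSketch ctx =
            new IOContextSketch(Hint.SEQUENTIAL, 64L << 20);
        System.out.println(ctx.expectedSegmentSizeBytes); // 67108864
    }
}
```

A Directory implementation can then size its read/write buffers (or choose DIRECT I/O) from the hint and expected size alone, without ever seeing merge-policy types.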
[jira] [Updated] (LUCENE-3058) FST should allow more than one output for the same input
[ https://issues.apache.org/jira/browse/LUCENE-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3058: --- Attachment: LUCENE-3058.patch I think we should just suppress the warning? > FST should allow more than one output for the same input > > > Key: LUCENE-3058 > URL: https://issues.apache.org/jira/browse/LUCENE-3058 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3058.patch, LUCENE-3058.patch, LUCENE-3058.patch > > > For the block tree terms dict, it turns out I need this case. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8030 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8030/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (LUCENE-3058) FST should allow more than one output for the same input
[ https://issues.apache.org/jira/browse/LUCENE-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened LUCENE-3058: Reopening -- this commit has generics violations in TestFSTs. > FST should allow more than one output for the same input > > > Key: LUCENE-3058 > URL: https://issues.apache.org/jira/browse/LUCENE-3058 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3058.patch, LUCENE-3058.patch > > > For the block tree terms dict, it turns out I need this case. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8026 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8026/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-docvalues-branch - Build # 1113 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-docvalues-branch/1113/ No tests ran. Build Log (for compile errors): [...truncated 26 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8029 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8029/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8025 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8025/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2514) Upgrade velocity-tools to released version
Upgrade velocity-tools to released version -- Key: SOLR-2514 URL: https://issues.apache.org/jira/browse/SOLR-2514 Project: Solr Issue Type: Task Components: web gui Affects Versions: 3.1 Environment: JBoss 6.0.0.Final on FreeBSD 8.2 Reporter: Craig Lewis Priority: Minor I'm deploying Solr 3.1.0 in JBoss 6.0.0.Final In JBoss, I'm trying to deploy apache-solr-3.1.0/example/webapps/solr.war as a Web Application. During deployment, JBoss returns an error: Deployment "vfs:///usr/local/jboss-6.0.0.Final/server/default/deploy/solr.war" is in error due to the following reason(s): org.xml.sax.SAXException: Element type "tlibversion" must be declared. @ vfs:///usr/local/jboss-6.0.0.Final/server/default/deploy/solr.war/WEB-INF/lib/velocity-tools-2.0-beta3.jar/META-INF/velocity-view.tld[22,16] at org.rhq.plugins.jbossas5.util.DeploymentUtils.deployArchive(DeploymentUtils.java:146) [jopr-jboss-as-5-plugin-3.0.0.jar:3.0.0] at org.rhq.plugins.jbossas5.deploy.AbstractDeployer.deploy(AbstractDeployer.java:119) [jopr-jboss-as-5-plugin-3.0.0.jar:3.0.0] at org.rhq.plugins.jbossas5.helper.CreateChildResourceFacetDelegate.createContentBasedResource(CreateChildResourceFacetDelegate.java:124) [jopr-jboss-as-5-plugin-3.0.0.jar:3.0.0] at org.rhq.plugins.jbossas5.helper.CreateChildResourceFacetDelegate.createResource(CreateChildResourceFacetDelegate.java:56) [jopr-jboss-as-5-plugin-3.0.0.jar:3.0.0] at org.rhq.plugins.jbossas5.ApplicationServerComponent.createResource(ApplicationServerComponent.java:304) [jopr-jboss-as-5-plugin-3.0.0.jar:3.0.0] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [:1.6.0] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [:1.6.0] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [:1.6.0] at java.lang.reflect.Method.invoke(Method.java:616) [:1.6.0] at org.rhq.core.pc.inventory.ResourceContainer$ComponentInvocationThread.call(ResourceContainer.java:525) [:3.0.0] at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [:1.6.0] at java.util.concurrent.FutureTask.run(FutureTask.java:166) [:1.6.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [:1.6.0] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [:1.6.0] at java.lang.Thread.run(Thread.java:679) [:1.6.0] After a bit of digging, I found that there was a bug in solr.war/WEB-INF/lib/velocity-tools-2.0-beta3.jar/META-INF/velocity-view.tld, [https://issues.apache.org/jira/browse/VELTOOLS-120] The latest version of velocity-tools, velocity-tools-2.0.jar (available at [http://velocity.apache.org/download.cgi] ), includes this bugfix. To test, I unzipped solr.war, deleted solr.war/WEB-INF/lib/velocity-tools-2.0-beta3.jar, added solr.war/WEB-INF/lib/velocity-tools-2.0.jar, and re-zipped solr.war. I am able to deploy this new .war file. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8028 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8028/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3092) NRTCachingDirectory, to buffer small segments in a RAMDir
[ https://issues.apache.org/jira/browse/LUCENE-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033240#comment-13033240 ] Michael McCandless commented on LUCENE-3092: {quote} IOCtx should have a value 'expectedSize', or 'priority', or something similar. This does not introduce a transitive dependency of Directory from MergePolicy (to please you once more - a true WTF), {quote} Ahh, good point. So, for this dir impl I want to say "if net seg size is < X MB, cache it in RAM", so I guess we could have something like "expectedSizeOfSegmentMB" (covers all files that will be flushed for this segment, hmm minus the doc stores) in the IOCtx. > NRTCachingDirectory, to buffer small segments in a RAMDir > - > > Key: LUCENE-3092 > URL: https://issues.apache.org/jira/browse/LUCENE-3092 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3092-listener.patch, LUCENE-3092.patch > > > I created this simply Directory impl, whose goal is reduce IO > contention in a frequent reopen NRT use case. > The idea is, when reopening quickly, but not indexing that much > content, you wind up with many small files created with time, that can > possibly stress the IO system eg if merges, searching are also > fighting for IO. > So, NRTCachingDirectory puts these newly created files into a RAMDir, > and only when they are merged into a too-large segment, does it then > write-through to the real (delegate) directory. > This lets you spend some RAM to reduce I0. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
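[Editor's note] The caching rule Michael describes ("if net seg size is < X MB, cache it in RAM") reduces to a threshold check at write time. A hedged sketch of that decision; the class and method names are illustrative, not the NRTCachingDirectory API:

```java
// Sketch of the write-caching decision described above: small newly
// flushed segments are buffered in a RAMDirectory, large ones write
// through to the real (delegate) directory. Names are illustrative,
// not the actual NRTCachingDirectory API.
public class NRTCacheSketch {
    final long maxCachedSegmentBytes;

    NRTCacheSketch(long maxCachedSegmentMB) {
        this.maxCachedSegmentBytes = maxCachedSegmentMB << 20;
    }

    // True when the new segment is small enough to keep in RAM, trading
    // some heap for fewer small-file writes under frequent NRT reopens.
    boolean cacheInRam(long expectedSegmentSizeBytes) {
        return expectedSegmentSizeBytes < maxCachedSegmentBytes;
    }

    public static void main(String[] args) {
        NRTCacheSketch dir = new NRTCacheSketch(5); // cache segments < 5 MB
        System.out.println(dir.cacheInRam(1L << 20));  // true: 1 MB segment
        System.out.println(dir.cacheInRam(64L << 20)); // false: 64 MB merge
    }
}
```

This is also where the expectedSegmentSize value discussed on LUCENE-2793 would plug in: the directory needs only an estimated size, not the merge itself.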
[Lucene.Net] [jira] [Commented] (LUCENENET-412) Replacing ArrayLists, Hashtables etc. with appropriate Generics.
[ https://issues.apache.org/jira/browse/LUCENENET-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033235#comment-13033235 ] Digy commented on LUCENENET-412: Some samples to show the diffs of 2.9.4 & 2.9.4g on Readability. {code} From: ((System.Collections.IList) ((System.Collections.ArrayList) segmentInfos).GetRange(start, start + merge.segments.Count - start)).Clear(); To: segmentInfos.RemoveRange(start, start + merge.segments.Count - start); - From: System.Collections.IEnumerator it = ((System.Collections.ICollection) readerToFields[reader]).GetEnumerator(); while (it.MoveNext()) { if (fieldSelector.Accept((System.String) it.Current) != FieldSelectorResult.NO_LOAD) { include = true; break; } } To: foreach (string x in readerToFields[reader]) { if (fieldSelector.Accept(x) != FieldSelectorResult.NO_LOAD) { include = true; break; } } - From: for (System.Collections.IEnumerator iter = weights.GetEnumerator(); iter.MoveNext(); ) { ((Weight) iter.Current).Normalize(norm); } To: foreach(Weight w in weights) { w.Normalize(norm); } - From: public virtual System.Collections.IList GetTermArrays() { return (System.Collections.IList) System.Collections.ArrayList.ReadOnly(new System.Collections.ArrayList(termArrays)); } To: public virtual List GetTermArrays() { return new List(termArrays); } - From: System.Collections.ArrayList results = new System.Collections.ArrayList(); return (TermFreqVector[]) results.ToArray(typeof(TermFreqVector)); To: List results = new List(); ... return results.ToArray(); {code} DIGY > Replacing ArrayLists, Hashtables etc. with appropriate Generics. 
> > > Key: LUCENENET-412 > URL: https://issues.apache.org/jira/browse/LUCENENET-412 > Project: Lucene.Net > Issue Type: Improvement >Affects Versions: Lucene.Net 2.9.4 >Reporter: Digy >Priority: Minor > Fix For: Lucene.Net 2.9.4 > > Attachments: IEquatable for Query&Subclasses.patch, > LUCENENET-412.patch, lucene_2.9.4g_exceptions_fix > > > This will move Lucene.Net.2.9.4 closer to lucene.3.0.3 and allow some > performance gains.
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8024 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8024/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[jira] [Resolved] (LUCENE-3094) optimize lev automata construction
[ https://issues.apache.org/jira/browse/LUCENE-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3094. - Resolution: Fixed Committed revision 1102875. > optimize lev automata construction > > Key: LUCENE-3094 > URL: https://issues.apache.org/jira/browse/LUCENE-3094 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 4.0 > > Attachments: LUCENE-3094.patch, after.png, before.png > > > in our lev automata algorithm, we compute an upper bound of the maximum > possible states (not the true number), and > create some "useless" unconnected states "floating around". > this isn't harmful; in the original impl the Automaton was simply a > pointer to the initial state, and all algorithms > traversed this list, so effectively the useless states were dropped > immediately. But recently we changed Automaton to > cache its numberedStates, and we set them here, so these useless states are > being kept around. > it has no impact on performance, but can be really confusing if you are > debugging (e.g. toString). Thanks to Dawid Weiss > for noticing this. > at the same time, forcing an extra traversal is a bit scary, so I did some > benchmarking with really long strings and found > that it is actually helpful to reduce() the number of transitions (typically > cuts them in half) for these long strings, as it > speeds up some later algorithms. > we won't see any speedup for short terms, but I think it's easier to work with > these simpler automata anyway, and it eliminates > the confusion of seeing the redundant states without slowing anything down.
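The cleanup the issue describes, dropping states not connected to the initial state, amounts to a reachability pass. A minimal language-neutral sketch in Python (the helper name and the dict-of-lists transition encoding are illustrative, not Lucene's Automaton API):

```python
from collections import deque


def reachable_states(initial, transitions):
    """Keep only states reachable from `initial`.

    `transitions` maps a state to an iterable of successor states;
    anything not discovered by this BFS is one of the 'useless'
    unconnected states the issue talks about."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        for t in transitions.get(s, ()):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen
```

Caching numbered states from only this reachable set is what removes the floating, unconnected states from toString() and debugging output.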
[jira] [Commented] (LUCENE-3094) optimize lev automata construction
[ https://issues.apache.org/jira/browse/LUCENE-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033229#comment-13033229 ] Robert Muir commented on LUCENE-3094: - Thanks guys, I'll add a test for this and commit. > optimize lev automata construction > > Key: LUCENE-3094 > URL: https://issues.apache.org/jira/browse/LUCENE-3094
[jira] [Commented] (SOLR-2513) Allow to subclass org.apache.solr.response.XMLWriter
[ https://issues.apache.org/jira/browse/SOLR-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033227#comment-13033227 ] Gabriele Kahlout commented on SOLR-2513: +1 for @lucene.internal. I'd use language features such as final only if there was a 'technical' reason (@see http://download.oracle.com/javase/tutorial/java/IandI/final.html). > Allow to subclass org.apache.solr.response.XMLWriter > > Key: SOLR-2513 > URL: https://issues.apache.org/jira/browse/SOLR-2513 > Project: Solr > Issue Type: Improvement > Components: Response Writers >Reporter: Gabriele Kahlout >Assignee: Ryan McKinley >Priority: Trivial > Attachments: SOLR-2513.patch > > Original Estimate: 1m > Remaining Estimate: 1m > > Hacking/debugging/extending Solr with one's own ResponseWriter one might want > to inherit functionality from XMLWriter. A trivial example is overriding > writeDate(..) to use a different calendar/format. > I asked about why it's made final on the mailing list[1]. > [1] > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201105.mbox/%3cbanlktin4mxiybzw3ck-k4gwq4o6nnc2...@mail.gmail.com%3E
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8027 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8027/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[jira] [Commented] (SOLR-2513) Allow to subclass org.apache.solr.response.XMLWriter
[ https://issues.apache.org/jira/browse/SOLR-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033215#comment-13033215 ] Ryan McKinley commented on SOLR-2513: - ResponseWriters in general are pretty ugly; maybe just mark @lucene.internal and let people subclass at their own risk. The likelihood of this getting a real cleanup soon is pretty low. > Allow to subclass org.apache.solr.response.XMLWriter > > Key: SOLR-2513 > URL: https://issues.apache.org/jira/browse/SOLR-2513
[jira] [Commented] (LUCENE-3094) optimize lev automata construction
[ https://issues.apache.org/jira/browse/LUCENE-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033210#comment-13033210 ] Michael McCandless commented on LUCENE-3094: +1 -- we shouldn't create these scary states. > optimize lev automata construction > > Key: LUCENE-3094 > URL: https://issues.apache.org/jira/browse/LUCENE-3094
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8023 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8023/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[jira] [Commented] (LUCENE-3094) optimize lev automata construction
[ https://issues.apache.org/jira/browse/LUCENE-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033209#comment-13033209 ] Dawid Weiss commented on LUCENE-3094: - This looks good to me. And even if it doesn't affect performance it definitely should help those poor souls wishing to actually understand this algorithm :) > optimize lev automata construction > > Key: LUCENE-3094 > URL: https://issues.apache.org/jira/browse/LUCENE-3094
[jira] [Commented] (SOLR-2513) Allow to subclass org.apache.solr.response.XMLWriter
[ https://issues.apache.org/jira/browse/SOLR-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033207#comment-13033207 ] Hoss Man commented on SOLR-2513: There was some discussion about this in the past. The crux of the concern was that the API is really ugly, wide open, and never really intended for general use (I think the class was initially a static private inner class of the XmlResponseWriter before some refactoring), and it should probably be cleaned up before encouraging general use from people who write plugins. > Allow to subclass org.apache.solr.response.XMLWriter > > Key: SOLR-2513 > URL: https://issues.apache.org/jira/browse/SOLR-2513
[jira] [Updated] (SOLR-2513) Allow to subclass org.apache.solr.response.XMLWriter
[ https://issues.apache.org/jira/browse/SOLR-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-2513: Attachment: SOLR-2513.patch trivial patch (but added some cleanup while we are at it) > Allow to subclass org.apache.solr.response.XMLWriter > > Key: SOLR-2513 > URL: https://issues.apache.org/jira/browse/SOLR-2513
[jira] [Updated] (SOLR-2513) Allow to subclass org.apache.solr.response.XMLWriter
[ https://issues.apache.org/jira/browse/SOLR-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-2513: Assignee: Ryan McKinley any objections? > Allow to subclass org.apache.solr.response.XMLWriter > > Key: SOLR-2513 > URL: https://issues.apache.org/jira/browse/SOLR-2513
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8026 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8026/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[jira] [Created] (SOLR-2513) Allow to subclass org.apache.solr.response.XMLWriter
Allow to subclass org.apache.solr.response.XMLWriter - Key: SOLR-2513 URL: https://issues.apache.org/jira/browse/SOLR-2513 Project: Solr Issue Type: Improvement Components: Response Writers Reporter: Gabriele Kahlout Priority: Trivial Hacking/debugging/extending Solr with one's own ResponseWriter one might want to inherit functionality from XMLWriter. A trivial example is overriding writeDate(..) to use a different calendar/format. I asked about why it's made final on the mailing list[1]. [1] http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201105.mbox/%3cbanlktin4mxiybzw3ck-k4gwq4o6nnc2...@mail.gmail.com%3E
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8022 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8022/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8025 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8025/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8021 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8021/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8024 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8024/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
Re: GSoC: LUCENE-2308: Separately specify a field's type
2011/5/12 Michael McCandless > 2011/5/9 Nikola Tanković : > > >> > Introduction of a FieldType class that will hold all the extra > >> > properties > >> > now stored inside a Field instance other than the field value itself. > >> > >> Seems like this is an easy first baby step -- leave current Field > >> class, but break out the "type" details into a separate class that can > >> be shared across Field instances. > > > > Yes, I agree, this could be a good first step. Mike submitted a patch on > > issue #2308. I think it's a solid base for this. > > Make that Chris. > Ouch, sorry! > > >> > A new FieldTypeAttribute interface will be added to handle extension with > >> > new > >> > field properties, inspired by IndexWriterConfig. > >> > >> How would this work? What's an example compelling usage? An app > >> could use this for extensibility, and then make a matching codec that > >> picks up this attr? EG, say, maybe for marking that a field is a > >> "primary key field" and then codec could optimize accordingly...? > > > > Well, that could be a very interesting scenario. It didn't ring a bell for me > > as a possible codec usage, but it seems very reasonable. Attributes otherwise > > don't make much sense, unless properly used in custom codecs. > > > > How will we ensure attribute and codec compatibility? > > I'm just thinking we should have concrete reasons in mind for cutting > over to attributes here... I'd rather see a fixed, well thought out > concrete FieldType hierarchy first... > Yes, I couldn't agree more, and I also think Chris has some great ideas in this area, given his work on Spatial indexing, which tends to make use of these additional attributes. > > >> > Refactoring and dividing of settings for term frequency and positioning > >> > can > >> > also be done (LUCENE-2048) > >> > >> Ahh great! So we can omit-positions-but-not-TF. > >> > >> > Discuss possible effects of completion of LUCENE-2310 on this project > >> > >> This one is badly needed...
but we should keep your project focused. > > > > We'll tackle this one afterwards. > > Good. > > >> > Adequate Factory class for easier configuration of new Field instances > >> > together with manually added new FieldTypeAttributes > >> > FieldType, once instantiated, is read-only. Only the field's value can be > >> > changed. > >> > >> OK. > >> > >> > Simple hierarchy of Field classes with core properties logically > >> > predefaulted. E.g.: > >> > > >> > NumberField, > >> > >> Can't this just be our existing NumericField? > > > > Yes, this is classic NumericField with changes proposed in LUCENE-2310. Tim > > Smith mentioned that the Fieldable class should be kept for custom > > implementations to reduce the number of setters (for defaults). > > Chris Male suggested a new CoreFieldTypeAttribute interface, so maybe it > > should be implemented instead of Fieldable for custom implementations, so > > both Fieldable and AbstractField are not needed anymore. > > In my opinion Field should become abstract and be extended by others. > > Another proposal: how about keeping only Field (with no hierarchy) and moving > > the hierarchy to FieldType, such as NumericFieldType, StringFieldType, since this > > hierarchy concerns type information only? > > I think a hierarchy of both types and the "value containers" that hold > the corresponding values could make sense? > Hmm, I think we should get more opinions on this one also. > > > e.g. Usage: > > FieldType number = new NumericFieldType(); > > Field price = new Field(); > > price.setType(number); > > // but this is much cleaner... > > Field price = new NumericField(); > > so maybe we should have parallel XYZField with XYZFieldType... > > Am I complicating? > >> > >> > StringField, > >> > >> This would be like NOT_ANALYZED? > > > > Yes, strings are often one word only. Or maybe we can name it NameField, > > NonAnalyzedField or something. > > StringField sounds good actually... > > >> > TextField, > >> > >> This would be ANALYZED? > > > > Yes.
> > > > OK. > > >> > What is the best way to break this into small baby steps? > >> > >> Hopefully this becomes clearer as we iterate. > > > > Well, we know the first step: moving type details into a FieldType class. > > Yes! > > Somehow tying into this as well is a stronger decoupling of the > indexer from analysis/document. Ie, what the indexer needs of a document > is very minimal -- just an iterable over indexed & stored values. > Separately we can still provide a "full featured" Document class w/ > add, get, remove, etc., but that's "outside" of the indexer. > I'll get back to this one after additional research. Maybe we should do a couple more iterations, then I'll summarize the conclusions. > > Mike > > http://blog.mikemccandless.com Nikola
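The split the thread converges on, an immutable FieldType shared across Field instances that hold only a name and a value, can be sketched concretely. This is a hypothetical mock-up of the design being discussed, not the committed Lucene API; all names are illustrative:

```python
class FieldType:
    """Immutable bundle of per-field index settings, shared across fields."""

    def __init__(self, indexed=True, stored=False, tokenized=True):
        self.indexed = indexed
        self.stored = stored
        self.tokenized = tokenized


class Field:
    """Holds only name + value; every other property lives in the type."""

    def __init__(self, name, value, ftype):
        self.name = name
        self.value = value
        self.type = ftype


# one type instance can be shared by many fields
TEXT = FieldType(indexed=True, stored=True, tokenized=True)       # ANALYZED
STRING = FieldType(indexed=True, stored=True, tokenized=False)    # NOT_ANALYZED
```

The "parallel XYZField with XYZFieldType" question then becomes whether subclasses like `TextField` simply pre-wire one of these shared type instances, which is the lighter of the two hierarchies proposed above.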
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8020 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8020/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[jira] [Resolved] (LUCENE-3040) analysis consumers should use reusable tokenstreams
[ https://issues.apache.org/jira/browse/LUCENE-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3040. - Resolution: Fixed Committed revision 1102817, 1102820 > analysis consumers should use reusable tokenstreams > > > Key: LUCENE-3040 > URL: https://issues.apache.org/jira/browse/LUCENE-3040 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3040.patch, LUCENE-3040.patch > > > Some analysis consumers (highlighter, more like this, memory index, contrib > queryparser, ...) are using Analyzer.tokenStream but should be using > Analyzer.reusableTokenStream instead for better performance.
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8023 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8023/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[jira] [Resolved] (LUCENE-3064) add checks to MockTokenizer to enforce proper consumption
[ https://issues.apache.org/jira/browse/LUCENE-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3064. - Resolution: Fixed backported to 3.x in revision 1102812 > add checks to MockTokenizer to enforce proper consumption > > > Key: LUCENE-3064 > URL: https://issues.apache.org/jira/browse/LUCENE-3064 > Project: Lucene - Java > Issue Type: Test >Reporter: Robert Muir > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3064.patch, LUCENE-3064.patch, LUCENE-3064.patch, > LUCENE-3064.patch > > > we can enforce that a consumer properly iterates through the TokenStream > lifecycle > via MockTokenizer. this could catch bugs in consumers that don't call > reset(), etc.
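The lifecycle checks described above, catching consumers that skip reset() or call methods out of order, can be modeled as a small state machine. A hedged Python sketch of the idea; the class and method names are invented for illustration and are not MockTokenizer's actual API:

```python
class LifecycleCheckingStream:
    """Enforce a TokenStream-style contract: reset() before
    incrementToken(), and end() only after the stream is exhausted."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.state = "created"
        self.pos = 0

    def reset(self):
        self.state = "reset"
        self.pos = 0

    def increment_token(self):
        if self.state not in ("reset", "incrementing"):
            raise AssertionError("incrementToken() called before reset()")
        self.state = "incrementing"
        if self.pos < len(self.tokens):
            self.pos += 1
            return True
        self.state = "exhausted"  # returned False: consumer may call end()
        return False

    def end(self):
        if self.state != "exhausted":
            raise AssertionError("end() called before stream was exhausted")
        self.state = "ended"
```

A consumer that forgets reset(), or calls end() early, trips an assertion immediately instead of silently producing wrong tokens, which is exactly the class of bug the issue says this caught.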
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8019 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8019/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8022 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8022/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8018 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8018/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[JENKINS] Lucene-Solr-tests-only-docvalues-branch - Build # 1112 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-docvalues-branch/1112/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8021 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8021/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8017 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8017/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8020 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8020/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[jira] [Commented] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush
[ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033080#comment-13033080 ] Simon Willnauer commented on LUCENE-3090: - bq. But shouldn't stallControl kick in in that case? Ie, we stall all indexing if the number of flush-pending DWPTs is >= the number of active DWPTs, I think? Right, so let's say we have two active thread states: 1. thread 1 starts indexing (max ram is 16M); it indexes n docs and has 15.9 MB ram used. Now doc n+1 comes in at 5MB (active mem = 20.9M, flush mem = 0M) 2. take it out for flush (active mem = 0M, flush mem = 20.9M) 3. thread 2 starts indexing and fills ram quickly, ending up with 18M memory (active mem = 18M, flush mem = 20.9M) 4. take thread 2 out for flush (active mem = 0M, flush mem = 38.9M) 5. thread 3 has already started indexing and reaches the RAM threshold (16M), so we have: (active mem = 16M, flush mem = 38.9M) 6. take it out for flushing (now we stall, currently) (active mem = 0M, flush mem = 54.9M) - this is more than 3x the max ram buffer. We currently stall when the number of flush-pending DWPTs is > (num active DWPTs + 1); we can reduce that, but maybe we should swap back to RAM-based stalling? > DWFlushControl does not take active DWPT out of the loop on fullFlush > > > Key: LUCENE-3090 > URL: https://issues.apache.org/jira/browse/LUCENE-3090 > Project: Lucene - Java > Issue Type: Bug > Components: Index >Affects Versions: 4.0 >Reporter: Simon Willnauer >Assignee: Simon Willnauer >Priority: Critical > Fix For: 4.0 > > Attachments: LUCENE-3090.patch, LUCENE-3090.patch > > > We have seen several OOMs on TestNRTThreads, and all of them are caused by > DWFlushControl missing DWPTs that are set as flushPending but can't flush due > to a full flush going on. Yet that means that those DWPTs are filling up in > the background while they should actually be checked out and blocked until > the full flush finishes. Even further, we currently stall on > maxNumThreadStates while we should stall on the number of active thread states. > I will attach a patch tomorrow.
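The memory accounting in Simon's six-step walk-through can be checked directly. A quick sketch, with the numbers taken from the comment above:

```python
max_ram = 16.0   # configured IndexWriter RAM buffer, in MB
flush_mem = 0.0  # memory held by DWPTs checked out for flushing

# thread 1: 15.9 MB indexed, then a 5 MB document arrives
active = 15.9 + 5.0        # 20.9 MB active
flush_mem += active        # step 2: checked out for flush -> 20.9 MB
active = 0.0

# thread 2 fills up to 18 MB before it is taken out (steps 3-4)
flush_mem += 18.0          # 38.9 MB flushing

# thread 3 reaches the 16 MB threshold before stalling kicks in (steps 5-6)
flush_mem += 16.0          # 54.9 MB flushing

# more than 3x the configured buffer is tied up before we stall
assert flush_mem > 3 * max_ram
```

This is the argument for RAM-based stalling: counting pending DWPTs lets flushing memory grow to a multiple of the configured buffer, while a byte threshold would cap it directly.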
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8016 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8016/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[jira] [Resolved] (SOLR-2512) uima: add an ability to skip runtime error in AnalysisEngine
[ https://issues.apache.org/jira/browse/SOLR-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-2512. -- Resolution: Fixed trunk: Committed revision 1102785. 3x: Committed revision 1102789. > uima: add an ability to skip runtime error in AnalysisEngine > > > Key: SOLR-2512 > URL: https://issues.apache.org/jira/browse/SOLR-2512 > Project: Solr > Issue Type: Improvement > Affects Versions: 3.1 > Reporter: Koji Sekiguchi > Assignee: Koji Sekiguchi > Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: SOLR-2512.patch, SOLR-2512.patch, SOLR-2512.patch, SOLR-2512.patch, SOLR-2512.patch > > > Currently, if AnalysisEngine throws an exception while processing a text, the whole add-documents operation fails. Because online NLP services are error-prone, users should be able to choose whether Solr skips the text processing for that document (the source text can still be indexed) or throws a runtime exception so that Solr stops adding documents entirely.
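The skip-vs-fail behavior SOLR-2512 describes can be pictured as one flag around the per-document engine call. The sketch below is hypothetical Java: the names are invented and this is not the actual patch or the Solr/UIMA API, only the decision the issue describes.

```java
// Hypothetical sketch of the behavior SOLR-2512 describes; invented names,
// not the actual patch or the Solr/UIMA API. One flag chooses between
// skipping NLP enrichment for the failing document (its source text still
// gets indexed) and failing the whole add.
import java.util.List;

class SkippingAnalysisRunner {
    private final boolean ignoreErrors;

    SkippingAnalysisRunner(boolean ignoreErrors) {
        this.ignoreErrors = ignoreErrors;
    }

    /** Returns how many documents were successfully enriched. */
    int process(List<String> docs) {
        int enriched = 0;
        for (String text : docs) {
            try {
                analyze(text); // stand-in for the AnalysisEngine call
                enriched++;
            } catch (RuntimeException e) {
                if (!ignoreErrors) {
                    throw e; // abort the whole add, the pre-patch behavior
                }
                // otherwise skip enrichment for this doc and keep going
            }
        }
        return enriched;
    }

    // Fake "error-prone online NLP service": fails on empty input.
    private void analyze(String text) {
        if (text.isEmpty()) {
            throw new RuntimeException("engine error");
        }
    }
}
```

With `ignoreErrors` set, a batch containing one bad document still yields the other documents enriched; without it, the first failure propagates and the whole add fails.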
[jira] [Issue Comment Edited] (LUCENE-3094) optimize lev automata construction
[ https://issues.apache.org/jira/browse/LUCENE-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033072#comment-13033072 ] Simon Willnauer edited comment on LUCENE-3094 at 5/13/11 3:19 PM: -- Display the attached images before: !before.png! after: !after.png! was (Author: simonw): Display the attached images before: !before.png|thumbnail! after: !after.png|thumbnail! > optimize lev automata construction > -- > > Key: LUCENE-3094 > URL: https://issues.apache.org/jira/browse/LUCENE-3094 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 4.0 > > Attachments: LUCENE-3094.patch, after.png, before.png > > > in our lev automata algorithm, we compute an upperbound of the maximum > possible states (not the true number), and > create some "useless" unconnected states "floating around". > this isn't harmful, in the original impl we did the Automaton is simply a > pointer to the initial state, and all algorithms > traverse this list, so effectively the useless states were dropped > immediately. But recently we changed automaton to > cache its numberedStates, and we set them here, so these useless states are > being kept around. > it has no impact on performance, but can be really confusing if you are > debugging (e.g. toString). Thanks to Dawid Weiss > for noticing this. > at the same time, forcing an extra traversal is a bit scary, so i did some > benchmarking with really long strings and found > that actually its helpful to reduce() the number of transitions (typically > cuts them in half) for these long strings, as it > speeds up some later algorithms. > won't see any speedup for short terms, but I think its easier to work with > these simpler automata anyway, and it eliminates > the confusion of seeing the redundant states without slowing anything down. -- This message is automatically generated by JIRA. 
[jira] [Commented] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush
[ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033073#comment-13033073 ] Michael McCandless commented on LUCENE-3090: {quote} bq. Could we add an assert that net flushPending + active RAM never exceeds some multiplier (2X?) of the configured max RAM? net flush pending means? we only distinguish between flushing RAM and active RAM, so flushing RAM can easily get above such a limit if IO is slow... {quote} But shouldn't stallControl kick in in that case? Ie, we stall all indexing if the number of flush-pending DWPTs is >= the number of active DWPTs, I think? > DWFlushControl does not take active DWPT out of the loop on fullFlush > - > > Key: LUCENE-3090 > URL: https://issues.apache.org/jira/browse/LUCENE-3090 > Project: Lucene - Java > Issue Type: Bug > Components: Index > Affects Versions: 4.0 > Reporter: Simon Willnauer > Assignee: Simon Willnauer > Priority: Critical > Fix For: 4.0 > > Attachments: LUCENE-3090.patch, LUCENE-3090.patch > > > We have seen several OOMs on TestNRTThreads, and all of them are caused by DWFlushControl missing DWPTs that are set as flushPending but can't flush due to a full flush going on. That means those DWPTs are filling up in the background while they should actually be checked out and blocked until the full flush finishes. Further, we currently stall on maxNumThreadStates while we should stall on the number of active thread states. I will attach a patch tomorrow.
[jira] [Issue Comment Edited] (LUCENE-3094) optimize lev automata construction
[ https://issues.apache.org/jira/browse/LUCENE-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033072#comment-13033072 ] Simon Willnauer edited comment on LUCENE-3094 at 5/13/11 3:19 PM: -- Display the attached images before: !before.png|thumbnail! after: !after.png|thumbnail! was (Author: simonw): Display the attached images before: !before.png! after: !after.png! > optimize lev automata construction > -- > > Key: LUCENE-3094 > URL: https://issues.apache.org/jira/browse/LUCENE-3094 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 4.0 > > Attachments: LUCENE-3094.patch, after.png, before.png > > > in our lev automata algorithm, we compute an upperbound of the maximum > possible states (not the true number), and > create some "useless" unconnected states "floating around". > this isn't harmful, in the original impl we did the Automaton is simply a > pointer to the initial state, and all algorithms > traverse this list, so effectively the useless states were dropped > immediately. But recently we changed automaton to > cache its numberedStates, and we set them here, so these useless states are > being kept around. > it has no impact on performance, but can be really confusing if you are > debugging (e.g. toString). Thanks to Dawid Weiss > for noticing this. > at the same time, forcing an extra traversal is a bit scary, so i did some > benchmarking with really long strings and found > that actually its helpful to reduce() the number of transitions (typically > cuts them in half) for these long strings, as it > speeds up some later algorithms. > won't see any speedup for short terms, but I think its easier to work with > these simpler automata anyway, and it eliminates > the confusion of seeing the redundant states without slowing anything down. -- This message is automatically generated by JIRA. 
[jira] [Created] (LUCENE-3095) TestIndexWriter#testThreadInterruptDeadlock fails with OOM
TestIndexWriter#testThreadInterruptDeadlock fails with OOM
---
Key: LUCENE-3095
URL: https://issues.apache.org/jira/browse/LUCENE-3095
Project: Lucene - Java
Issue Type: Bug
Components: Index, Tests
Affects Versions: 4.0
Reporter: Simon Willnauer
Fix For: 4.0

Selckin reported a repeatedly failing test that throws OOM exceptions. According to the heap dump, the MockDirectoryWrapper#createdFiles HashSet takes about 400MB of heap space, containing 4194304 entries. Seems kind of way too many though :)
{noformat}
[junit] java.lang.OutOfMemoryError: Java heap space
[junit] Dumping heap to /tmp/java_pid25990.hprof ...
[junit] Heap dump file created [520807744 bytes in 4.250 secs]
[junit] Testsuite: org.apache.lucene.index.TestIndexWriter
[junit] Testcase: testThreadInterruptDeadlock(org.apache.lucene.index.TestIndexWriter): FAILED
[junit]
[junit] junit.framework.AssertionFailedError:
[junit] at org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock(TestIndexWriter.java:2249)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211)
[junit]
[junit]
[junit] Testcase: testThreadInterruptDeadlock(org.apache.lucene.index.TestIndexWriter): FAILED
[junit] Some threads threw uncaught exceptions!
[junit] junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
[junit] at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:557)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211)
[junit]
[junit]
[junit] Tests run: 67, Failures: 2, Errors: 0, Time elapsed: 3,254.884 sec
[junit]
[junit] - Standard Output ---
[junit] FAILED; unexpected exception
[junit] java.lang.OutOfMemoryError: Java heap space
[junit] at org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:85)
[junit] at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:58)
[junit] at org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:132)
[junit] at org.apache.lucene.store.RAMOutputStream.copyBytes(RAMOutputStream.java:171)
[junit] at org.apache.lucene.store.MockIndexOutputWrapper.copyBytes(MockIndexOutputWrapper.java:155)
[junit] at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:223)
[junit] at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:189)
[junit] at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:138)
[junit] at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3344)
[junit] at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2959)
[junit] at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
[junit] at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1763)
[junit] at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1758)
[junit] at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1754)
[junit] at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1373)
[junit] at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1230)
[junit] at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1211)
[junit] at org.apache.lucene.index.TestIndexWriter$IndexerThreadInterrupt.run(TestIndexWriter.java:2154)
[junit] - ---
[junit] - Standard Error -
[junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testThreadInterruptDeadlock -Dtests.seed=7183538093651149:3431510331342554160
[junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testThreadInterruptDeadlock -Dtests.seed=7183538093651149:3431510331342554160
[junit] The following exceptions were thrown by threads:
[junit] *** Thread: Thread-379 ***
[junit] java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still open files: {_3r1n_0.tib=1, _3r1n_0.frq=1, _3r1n_0.pos=1, _3r1m.cfs=1, _3r1n_0.doc=1, _3r1n.tvf=1, _3r1n.tvd=1, _3r1n.tvx=1, _3r1n.fdx=1, _3r1n.fdt=1, _3r1q.cfs=1, _3r1o.cfs=1, _3r1n_0.skp=1, _3r1n_0.pyl=1}
[junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:448)
[junit] at org.apache.lucene.index.TestIndexWriter$In
[jira] [Commented] (LUCENE-3094) optimize lev automata construction
[ https://issues.apache.org/jira/browse/LUCENE-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033072#comment-13033072 ] Simon Willnauer commented on LUCENE-3094: - Display the attached images before: !before.png! after: !after.png! > optimize lev automata construction > -- > > Key: LUCENE-3094 > URL: https://issues.apache.org/jira/browse/LUCENE-3094 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 4.0 > > Attachments: LUCENE-3094.patch, after.png, before.png > > > in our lev automata algorithm, we compute an upperbound of the maximum > possible states (not the true number), and > create some "useless" unconnected states "floating around". > this isn't harmful, in the original impl we did the Automaton is simply a > pointer to the initial state, and all algorithms > traverse this list, so effectively the useless states were dropped > immediately. But recently we changed automaton to > cache its numberedStates, and we set them here, so these useless states are > being kept around. > it has no impact on performance, but can be really confusing if you are > debugging (e.g. toString). Thanks to Dawid Weiss > for noticing this. > at the same time, forcing an extra traversal is a bit scary, so i did some > benchmarking with really long strings and found > that actually its helpful to reduce() the number of transitions (typically > cuts them in half) for these long strings, as it > speeds up some later algorithms. > won't see any speedup for short terms, but I think its easier to work with > these simpler automata anyway, and it eliminates > the confusion of seeing the redundant states without slowing anything down.
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8019 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8019/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8015 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8015/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[jira] [Updated] (LUCENE-3094) optimize lev automata construction
[ https://issues.apache.org/jira/browse/LUCENE-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3094: Attachment: after.png before.png > optimize lev automata construction > -- > > Key: LUCENE-3094 > URL: https://issues.apache.org/jira/browse/LUCENE-3094 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 4.0 > > Attachments: LUCENE-3094.patch, after.png, before.png > > > in our lev automata algorithm, we compute an upperbound of the maximum > possible states (not the true number), and > create some "useless" unconnected states "floating around". > this isn't harmful, in the original impl we did the Automaton is simply a > pointer to the initial state, and all algorithms > traverse this list, so effectively the useless states were dropped > immediately. But recently we changed automaton to > cache its numberedStates, and we set them here, so these useless states are > being kept around. > it has no impact on performance, but can be really confusing if you are > debugging (e.g. toString). Thanks to Dawid Weiss > for noticing this. > at the same time, forcing an extra traversal is a bit scary, so i did some > benchmarking with really long strings and found > that actually its helpful to reduce() the number of transitions (typically > cuts them in half) for these long strings, as it > speeds up some later algorithms. > won't see any speedup for short terms, but I think its easier to work with > these simpler automata anyway, and it eliminates > the confusion of seeing the redundant states without slowing anything down.
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8018 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8018/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[jira] [Resolved] (SOLR-2511) Make it easier to override SolrContentHandler newDocument
[ https://issues.apache.org/jira/browse/SOLR-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved SOLR-2511. --- Resolution: Fixed Fix Version/s: 4.0 3.2 > Make it easier to override SolrContentHandler newDocument > - > > Key: SOLR-2511 > URL: https://issues.apache.org/jira/browse/SOLR-2511 > Project: Solr > Issue Type: Improvement >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: SOLR-2511.patch > > > The SolrContentHandler's newDocument method does a variety of things: adds > metadata, literals, content and captured content. We could split this out > into protected methods for each, which makes it easier to override.
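The refactor SOLR-2511 describes is the classic template-method split. The sketch below is a guess at its shape with invented names (the real SolrContentHandler builds a SolrInputDocument; this is not the actual patch): newDocument() stays the entry point, while each piece becomes a protected hook a subclass can override on its own.

```java
// Hypothetical shape of the SOLR-2511 refactor; invented names, not the
// actual Solr API. newDocument() is the template method; the four hooks are
// protected so a subclass can replace exactly one of them.
import java.util.HashMap;
import java.util.Map;

class ContentHandlerSketch {
    final Map<String, Object> doc = new HashMap<>();

    final Map<String, Object> newDocument() {
        addMetadata();
        addLiterals();
        addContent();
        addCapturedContent();
        return doc;
    }

    protected void addMetadata() { doc.put("metadata", "defaults"); }
    protected void addLiterals() { doc.put("literals", "defaults"); }
    protected void addContent() { doc.put("content", "defaults"); }
    protected void addCapturedContent() { doc.put("captured", "defaults"); }
}
```

A subclass that only wants different content handling overrides addContent() and inherits the other three steps unchanged, which is exactly the ease-of-override the issue asks for.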
[jira] [Commented] (SOLR-2512) uima: add an ability to skip runtime error in AnalysisEngine
[ https://issues.apache.org/jira/browse/SOLR-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033060#comment-13033060 ] Tommaso Teofili commented on SOLR-2512: --- +1 > uima: add an ability to skip runtime error in AnalysisEngine > > > Key: SOLR-2512 > URL: https://issues.apache.org/jira/browse/SOLR-2512 > Project: Solr > Issue Type: Improvement >Affects Versions: 3.1 >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: SOLR-2512.patch, SOLR-2512.patch, SOLR-2512.patch, > SOLR-2512.patch, SOLR-2512.patch > > > Currently, if AnalysisEngine throws an exception during processing a text, > whole adding docs go fail. Because online NLP services are error-prone, users > should be able to choose whether solr skips the text processing (but source > text can be indexed) for the document or throws a runtime exception so that > solr can stop adding documents entirely.
Re: [Lucene.Net] [jira] [Commented] (LUCENENET-414) The definition of CharArraySet is dangerously confusing and leads to bugs when used.
Hi Vincent,
My first goal was to replace ArrayList, Hashtables, Enumerators etc. as quickly as possible. Applying best practices could wait until the code was cleaner. The purpose of Support.Set was to have a collection that can be accessed with an indexer and also implements the method "Contains". It was a quick solution to the problem. Similarly, Support.Dictionary was just to be able to return null when a collection didn't contain the item (without throwing an exception). Changing zillions of lines with if(coll.ContainsKey(...)) seemed too hard to me at the time (forgetting one results in weird effects at runtime, not at compile time).
DIGY

On Fri, May 13, 2011 at 4:22 PM, Van Den Berghe, Vincent (JIRA) < j...@apache.org> wrote: > >[ > https://issues.apache.org/jira/browse/LUCENENET-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033031#comment-13033031] > > Van Den Berghe, Vincent commented on LUCENENET-414: > --- > > Hello Digy, > > Thanks for your response. > I don't want to sound overly pedantic (but please tell me if I do), but > this changed implementation solves only part of the problem. > Now, CharArraySet derives from Set, which itself derives from List. > Items are now stored both in this base class, as in the private > HashSet _Set. > However, because List doesn't define its modifiers Add(T), Clear() and > Remove(T) as virtual, the derived implementation defines them as "new". > This violates a variant of the Liskov substitution principle: an operation > on the derived type has not the same effect as the same operation on the > base type. > In this case, it means that the following code will cause the items in the > List base type and in the _Set to be desynchronized: > >CharArraySet set=... >List same=set; >same.Add("whatever"); >// at this point, same.Contains("whatever")==true but > set.Contains("whatever")==false even though it's the same instance.
> You might rightfully retort that this never happens and I should mind my own business, but I know at least one poor soul who did just that: me :-(.
>
> On a completely unrelated matter, the new implementation has 2 methods:
>
>    public void Add(System.Collections.Generic.IList<T> items)
>    public void Add(Support.Set<T> items)
>
> .. which can be collapsed into one, since the only thing used in both cases is the enumerator:
>
>    public void Add(IEnumerable<T> items)
>
> I don't recall the design rule, but it's something like "to increase reuse, make your function parameters as general as possible, but their return value as specific as possible".
> I am unable to get 2.9.4g to investigate further, but if you are moving towards the Generic collections in Lucene, the following implementation should be a drop-in replacement, without suffering from the aforementioned quirks:
>
>    [Serializable]
>    public class Set<T> : ICollection<T>
>    {
>        private readonly System.Collections.Generic.HashSet<T> _Set = new System.Collections.Generic.HashSet<T>();
>        bool _ReadOnly = false;
>
>        public Set()
>        {
>        }
>
>        public Set(bool readOnly)
>        {
>            this._ReadOnly = readOnly;
>        }
>
>        public bool ReadOnly
>        {
>            set
>            {
>                _ReadOnly = value;
>            }
>            get
>            {
>                return _ReadOnly;
>            }
>        }
>
>        public virtual void Add(T item)
>        {
>            if (_ReadOnly) throw new NotSupportedException();
>            if (_Set.Contains(item)) return;
>            _Set.Add(item);
>        }
>
>        public void Add(IEnumerable<T> items)
>        {
>            if (_ReadOnly) throw new NotSupportedException();
>            foreach (T item in items)
>            {
>                if (_Set.Contains(item)) continue;
>                _Set.Add(item);
>            }
>        }
>
>        public void Clear()
>        {
>            if (_ReadOnly) throw new NotSupportedException();
>            _Set.Clear();
>        }
>
>        public b
[jira] [Commented] (SOLR-2512) uima: add an ability to skip runtime error in AnalysisEngine
[ https://issues.apache.org/jira/browse/SOLR-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033058#comment-13033058 ] Koji Sekiguchi commented on SOLR-2512: -- I'll commit soon. > uima: add an ability to skip runtime error in AnalysisEngine > > > Key: SOLR-2512 > URL: https://issues.apache.org/jira/browse/SOLR-2512 > Project: Solr > Issue Type: Improvement >Affects Versions: 3.1 >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: SOLR-2512.patch, SOLR-2512.patch, SOLR-2512.patch, > SOLR-2512.patch, SOLR-2512.patch > > > Currently, if AnalysisEngine throws an exception during processing a text, > whole adding docs go fail. Because online NLP services are error-prone, users > should be able to choose whether solr skips the text processing (but source > text can be indexed) for the document or throws a runtime exception so that > solr can stop adding documents entirely.
[jira] [Assigned] (SOLR-2512) uima: add an ability to skip runtime error in AnalysisEngine
[ https://issues.apache.org/jira/browse/SOLR-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned SOLR-2512: Assignee: Koji Sekiguchi > uima: add an ability to skip runtime error in AnalysisEngine > > > Key: SOLR-2512 > URL: https://issues.apache.org/jira/browse/SOLR-2512 > Project: Solr > Issue Type: Improvement >Affects Versions: 3.1 >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: SOLR-2512.patch, SOLR-2512.patch, SOLR-2512.patch, > SOLR-2512.patch, SOLR-2512.patch > > > Currently, if AnalysisEngine throws an exception during processing a text, > whole adding docs go fail. Because online NLP services are error-prone, users > should be able to choose whether solr skips the text processing (but source > text can be indexed) for the document or throws a runtime exception so that > solr can stop adding documents entirely.
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8014 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8014/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[jira] [Updated] (LUCENE-3094) optimize lev automata construction
[ https://issues.apache.org/jira/browse/LUCENE-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3094: Attachment: LUCENE-3094.patch > optimize lev automata construction > -- > > Key: LUCENE-3094 > URL: https://issues.apache.org/jira/browse/LUCENE-3094 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 4.0 > > Attachments: LUCENE-3094.patch > > > in our lev automata algorithm, we compute an upperbound of the maximum > possible states (not the true number), and > create some "useless" unconnected states "floating around". > this isn't harmful, in the original impl we did the Automaton is simply a > pointer to the initial state, and all algorithms > traverse this list, so effectively the useless states were dropped > immediately. But recently we changed automaton to > cache its numberedStates, and we set them here, so these useless states are > being kept around. > it has no impact on performance, but can be really confusing if you are > debugging (e.g. toString). Thanks to Dawid Weiss > for noticing this. > at the same time, forcing an extra traversal is a bit scary, so i did some > benchmarking with really long strings and found > that actually its helpful to reduce() the number of transitions (typically > cuts them in half) for these long strings, as it > speeds up some later algorithms. > won't see any speedup for short terms, but I think its easier to work with > these simpler automata anyway, and it eliminates > the confusion of seeing the redundant states without slowing anything down.
[jira] [Created] (LUCENE-3094) optimize lev automata construction
optimize lev automata construction
--
Key: LUCENE-3094
URL: https://issues.apache.org/jira/browse/LUCENE-3094
Project: Lucene - Java
Issue Type: Improvement
Reporter: Robert Muir
Fix For: 4.0

In our lev automata algorithm, we compute an upper bound on the maximum possible number of states (not the true number), and create some "useless" unconnected states "floating around". This isn't harmful: in the original impl the Automaton was simply a pointer to the initial state, and all algorithms traversed from there, so effectively the useless states were dropped immediately. But recently we changed Automaton to cache its numberedStates, and we set them here, so these useless states are being kept around. It has no impact on performance, but can be really confusing if you are debugging (e.g. toString). Thanks to Dawid Weiss for noticing this. At the same time, forcing an extra traversal is a bit scary, so I did some benchmarking with really long strings and found that it's actually helpful to reduce() the number of transitions (typically cuts them in half) for these long strings, as it speeds up some later algorithms. We won't see any speedup for short terms, but I think it's easier to work with these simpler automata anyway, and it eliminates the confusion of seeing the redundant states without slowing anything down.
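The "useless floating states" the issue describes are exactly the states a traversal never sees. The sketch below is not Lucene's Automaton API, just a minimal reachability pass over an invented adjacency-map representation, showing why the old pointer-based representation dropped the floaters implicitly.

```java
// Minimal illustration of the issue's point; not Lucene's Automaton API.
// When states are allocated from an upper bound, some are never connected to
// the initial state. A traversal from the initial state only ever visits the
// reachable ones, so the unconnected "floaters" are invisible to it.
import java.util.*;

class Reachability {
    static Set<Integer> reachable(int initial, Map<Integer, List<Integer>> transitions) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(initial);
        while (!stack.isEmpty()) {
            int state = stack.pop();
            if (!seen.add(state)) {
                continue; // already visited
            }
            for (int next : transitions.getOrDefault(state, Collections.emptyList())) {
                stack.push(next);
            }
        }
        return seen;
    }
}
```

For example, with five allocated states 0..4 but only the edges 0 to 1 and 1 to 2, a traversal from state 0 sees {0, 1, 2}; states 3 and 4 are the floaters, and a cached numbered-states list is the only thing that keeps them visible.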
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8017 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8017/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #119: POMs out of sync
Build: https://builds.apache.org/hudson/job/Lucene-Solr-Maven-trunk/119/ No tests ran. Build Log (for compile errors): [...truncated 40 lines...]
[jira] [Commented] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush
[ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033043#comment-13033043 ] Robert Muir commented on LUCENE-3090: - bq. net flush pending means? we only differ between flushing ram and active ram so flushing ram can easily get above such a limit if IO is slow... I/O or just "O"? Should we add a ThrottledIndexInput too? :) > DWFlushControl does not take active DWPT out of the loop on fullFlush > - > > Key: LUCENE-3090 > URL: https://issues.apache.org/jira/browse/LUCENE-3090 > Project: Lucene - Java > Issue Type: Bug > Components: Index >Affects Versions: 4.0 >Reporter: Simon Willnauer >Assignee: Simon Willnauer >Priority: Critical > Fix For: 4.0 > > Attachments: LUCENE-3090.patch, LUCENE-3090.patch > > > We have seen several OOM on TestNRTThreads and all of them are caused by > DWFlushControl missing DWPT that are set as flushPending but can't full due > to a full flush going on. Yet that means that those DWPT are filling up in > the background while they should actually be checked out and blocked until > the full flush finishes. Even further we currently stall on the > maxNumThreadStates while we should stall on the num of active thread states. > I will attach a patch tomorrow.
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8013 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8013/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8016 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8016/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8012 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8012/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[Lucene.Net] [jira] [Commented] (LUCENENET-414) The definition of CharArraySet is dangerously confusing and leads to bugs when used.
[ https://issues.apache.org/jira/browse/LUCENENET-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033031#comment-13033031 ]

Van Den Berghe, Vincent commented on LUCENENET-414:
---------------------------------------------------

Hello Digy,

Thanks for your response. I don't want to sound overly pedantic (but please tell me if I do), but this changed implementation solves only part of the problem. Now, CharArraySet derives from Set<string>, which itself derives from List<string>. Items are now stored both in this base class and in the private HashSet<string> _Set. However, because List<T> doesn't declare its modifiers Add(T), Clear() and Remove(T) as virtual, the derived implementation defines them as "new". This violates a variant of the Liskov substitution principle: an operation on the derived type does not have the same effect as the same operation on the base type. In this case, it means that the following code will cause the items in the List<string> base type and in the _Set to become desynchronized:

CharArraySet set = ...;
List<string> same = set;
same.Add("whatever");
// At this point, same.Contains("whatever") == true but
// set.Contains("whatever") == false, even though it's the same instance.

You might rightfully retort that this never happens and I should mind my own business, but I know at least one poor soul who did just that: me :-(.

On a completely unrelated matter, the new implementation has 2 methods:

public void Add(System.Collections.Generic.IList<string> items)
public void Add(Support.Set<string> items)

which can be collapsed into one, since the only thing used in both cases is the enumerator:

public void Add(IEnumerable<string> items)

I don't recall the exact design rule, but it's something like "to increase reuse, make your function parameters as general as possible, but their return values as specific as possible".
I am unable to get 2.9.4g to investigate further, but if you are moving towards the Generic collections in Lucene, the following implementation should be a drop-in replacement, without suffering from the aforementioned quirks:

[Serializable]
public class Set<T> : ICollection<T>
{
    private readonly System.Collections.Generic.HashSet<T> _Set = new System.Collections.Generic.HashSet<T>();
    bool _ReadOnly = false;

    public Set() { }

    public Set(bool readOnly)
    {
        this._ReadOnly = readOnly;
    }

    public bool ReadOnly
    {
        set { _ReadOnly = value; }
        get { return _ReadOnly; }
    }

    public virtual void Add(T item)
    {
        if (_ReadOnly) throw new NotSupportedException();
        if (_Set.Contains(item)) return;
        _Set.Add(item);
    }

    public void Add(IEnumerable<T> items)
    {
        if (_ReadOnly) throw new NotSupportedException();
        foreach (T item in items)
        {
            if (_Set.Contains(item)) continue;
            _Set.Add(item);
        }
    }

    public void Clear()
    {
        if (_ReadOnly) throw new NotSupportedException();
        _Set.Clear();
    }

    public bool Contains(T item)
    {
        return _Set.Contains(item);
    }

    public void CopyTo(T[] array, int arrayIndex)
    {
        _Set.CopyTo(array, arrayIndex);
    }

    public int Count
    {
        get { return _Set.Count; }
    }

    public bool IsReadOnly
    {
        get { return _ReadOnly; }
    }

    public bool Remove(T item)
    {
        if (_ReadOnly) throw new NotSupportedException();
        return _Set.Remove(item);
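For readers following along on the Java side, the composition-over-inheritance design Vincent proposes can be sketched as follows. This is a hypothetical illustration (the class name `DelegatingSet` and its members are made up, not Lucene.Net or Lucene code): by delegating to an internal `HashSet` instead of inheriting from a list type, there are no hidden base-class modifiers that can fall out of sync with the internal set.

```java
import java.util.HashSet;
import java.util.Iterator;

// Hypothetical Java analog of the proposed drop-in Set<T>: all mutation goes
// through one internal HashSet, so no base-class storage can desynchronize.
class DelegatingSet<T> implements Iterable<T> {
    private final HashSet<T> set = new HashSet<>();
    private boolean readOnly;

    DelegatingSet() { this(false); }
    DelegatingSet(boolean readOnly) { this.readOnly = readOnly; }

    boolean isReadOnly() { return readOnly; }
    void setReadOnly(boolean readOnly) { this.readOnly = readOnly; }

    void add(T item) {
        if (readOnly) throw new UnsupportedOperationException();
        set.add(item); // HashSet.add is already a no-op for duplicates
    }

    void addAll(Iterable<? extends T> items) {
        if (readOnly) throw new UnsupportedOperationException();
        for (T item : items) set.add(item); // one overload suffices: only the enumerator is used
    }

    void clear() {
        if (readOnly) throw new UnsupportedOperationException();
        set.clear();
    }

    boolean contains(T item) { return set.contains(item); }

    boolean remove(T item) {
        if (readOnly) throw new UnsupportedOperationException();
        return set.remove(item);
    }

    int size() { return set.size(); }

    @Override public Iterator<T> iterator() { return set.iterator(); }
}
```

Because there is only one backing store, every view of the object agrees on its contents, which is exactly the property the `new`-modifier hiding breaks.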
[jira] [Updated] (SOLR-2511) Make it easier to override SolrContentHandler newDocument
[ https://issues.apache.org/jira/browse/SOLR-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-2511:
----------------------------------

    Attachment: SOLR-2511.patch

Going to commit this

> Make it easier to override SolrContentHandler newDocument
> ---------------------------------------------------------
>
>          Key: SOLR-2511
>          URL: https://issues.apache.org/jira/browse/SOLR-2511
>      Project: Solr
>   Issue Type: Improvement
>     Reporter: Grant Ingersoll
>     Assignee: Grant Ingersoll
>     Priority: Minor
>  Attachments: SOLR-2511.patch
>
> The SolrContentHandler's newDocument method does a variety of things: it adds
> metadata, literals, content and captured content. We could split this out
> into protected methods for each, making it easier to override.
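The refactor described in the issue is a classic template-method split, which can be sketched roughly as below. This is a simplified stand-in, not the actual SolrContentHandler: the hook names (`addMetadata`, `addLiterals`, `addContent`, `addCapturedContent`) are guessed from the issue description, and the document is modeled as a plain map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: newDocument() becomes a template method delegating to
// protected hooks, so a subclass can override one step instead of the whole method.
class ContentHandlerSketch {
    protected final Map<String, String> doc = new LinkedHashMap<>();

    public Map<String, String> newDocument() {
        addMetadata();
        addLiterals();
        addContent();
        addCapturedContent();
        return doc;
    }

    protected void addMetadata() { doc.put("stream_name", "example.pdf"); }
    protected void addLiterals() { doc.put("literal_id", "doc-1"); }
    protected void addContent() { doc.put("content", "extracted text"); }
    protected void addCapturedContent() { /* no-op by default */ }
}

// A subclass now only overrides the step it cares about.
class CustomHandler extends ContentHandlerSketch {
    @Override protected void addContent() {
        doc.put("content", "extracted text".toUpperCase());
    }
}
```

The win is exactly the one the issue asks for: overriding `addContent` no longer forces a copy of the metadata, literal, and captured-content logic.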
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8015 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8015/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[jira] [Commented] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush
[ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033024#comment-13033024 ]

Simon Willnauer commented on LUCENE-3090:
-----------------------------------------

bq. Could we add an assert that net flushPending + active RAM never exceeds some multiplier (2X?) of the configured max RAM?

net flush pending means? we only differentiate between flushing RAM and active RAM, so flushing RAM can easily get above such a limit if IO is slow...

> DWFlushControl does not take active DWPT out of the loop on fullFlush
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-3090
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3090
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>            Priority: Critical
>             Fix For: 4.0
>
>         Attachments: LUCENE-3090.patch, LUCENE-3090.patch
>
> We have seen several OOMs on TestNRTThreads, and all of them are caused by
> DWFlushControl missing DWPTs that are marked flushPending but can't flush due
> to a full flush going on. That means those DWPTs keep filling up in the
> background while they should actually be checked out and blocked until the
> full flush finishes. Furthermore, we currently stall on maxNumThreadStates
> while we should stall on the number of active thread states. I will attach a
> patch tomorrow.
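Simon's point about flushing RAM exceeding any fixed multiple of the limit can be illustrated with a toy accounting model. This is not the real DocumentsWriterFlushControl (all names below are hypothetical); it only demonstrates the two buckets discussed in the thread: "active" bytes held by indexing threads, which are capped, and "flushing" bytes held by DWPTs whose flush I/O has not yet completed, which are not.

```java
// Toy model of the RAM accounting discussed in LUCENE-3090. Active RAM is
// bounded by maxRamBytes; flushing RAM only shrinks when flush I/O finishes,
// so with slow I/O the flushing bucket can grow past any fixed multiplier.
class FlushAccounting {
    private final long maxRamBytes;
    private long activeBytes;    // RAM held by documents not yet flushing
    private long flushingBytes;  // RAM held by flushes whose I/O is pending

    FlushAccounting(long maxRamBytes) { this.maxRamBytes = maxRamBytes; }

    /** Called after a document is indexed; may trigger a flush. */
    void afterDocument(long bytesUsed) {
        activeBytes += bytesUsed;
        if (activeBytes >= maxRamBytes) {
            // Flush triggered: the active RAM moves to the flushing bucket
            // and is only released once the (possibly slow) I/O completes.
            flushingBytes += activeBytes;
            activeBytes = 0;
        }
    }

    /** Called when a flush's I/O finishes and its buffers are freed. */
    void flushFinished(long bytesFreed) { flushingBytes -= bytesFreed; }

    long activeBytes() { return activeBytes; }
    long flushingBytes() { return flushingBytes; }
    long totalBytes() { return activeBytes + flushingBytes; }
}
```

With `maxRamBytes = 100`, two flushes triggered before the first one's I/O completes leave more than 200 bytes in the flushing bucket, which is why a blanket "never exceeds 2X the configured max" assertion would fire under slow I/O.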
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8011 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8011/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]