[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753391#comment-13753391 ]

Robert Muir commented on LUCENE-5189:
-------------------------------------

In the case of old codecs, what we do for testing is pretty tricky:
* We officially make them read-only for the user (so that new segments are written in the latest format, but old segments can still be read).
* This has the additional caveat that they are not purely read-only, because we actually allow liveDocs updates (deletes) against the old formats. So they are mostly read-only.
* Tests have read-write versions (like PreFlexRWCodec in branch4x). These let tests override the read-only-ness, write segments the way the old formats did, and read them back in transparently.
* Of course they cannot support the newest features with this impersonator testing we do, but in general we get a lot more test coverage than if we relied solely upon TestBackwardsCompatibility.

Numeric DocValues Updates
-------------------------

Key: LUCENE-5189
URL: https://issues.apache.org/jira/browse/LUCENE-5189
Project: Lucene - Core
Issue Type: New Feature
Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch

In LUCENE-4258 we started to work on incremental field updates, however the amount of changes is immense and hard to follow/consume. The reason is that we targeted postings, stored fields, DV etc., all from the get go.
I'd like to start afresh here, with numeric-dv-field updates only. There are a couple of reasons for that:
* NumericDV fields should be easier to update, if e.g. we write all the values of all the documents in a segment for the updated field (similar to how livedocs work, and previously norms).
* It's a fairly contained issue, attempting to handle just one data type to update, yet it requires many changes to core code which will also be useful for updating other data types.
* It has value in and of itself, and we don't need to allow updating all the data types in Lucene at once ... we can do that gradually.
I have some working patch already which I'll upload next, explaining the changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5201) UIMAUpdateRequestProcessor should cache the AnalysisEngine
Tommaso Teofili created SOLR-5201:
-------------------------------------

Summary: UIMAUpdateRequestProcessor should cache the AnalysisEngine
Key: SOLR-5201
URL: https://issues.apache.org/jira/browse/SOLR-5201
Project: Solr
Issue Type: Improvement
Components: contrib - UIMA
Affects Versions: 4.4
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Fix For: 4.5, 5.0

As reported in http://markmail.org/thread/2psiyl4ukaejl4fx, UIMAUpdateRequestProcessor instantiates an AnalysisEngine for each request, which is bad for performance; it should therefore be cached in the URP.
[jira] [Updated] (SOLR-5201) UIMAUpdateRequestProcessor should reuse the AnalysisEngine
[ https://issues.apache.org/jira/browse/SOLR-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tommaso Teofili updated SOLR-5201:
----------------------------------

Description: As reported in http://markmail.org/thread/2psiyl4ukaejl4fx, UIMAUpdateRequestProcessor instantiates an AnalysisEngine for each request, which is bad for performance; it'd therefore be nice if such AEs could be reused whenever that's possible. (was: As reported in http://markmail.org/thread/2psiyl4ukaejl4fx UIMAUpdateRequestProcessor instantiates an AnalysisEngine for each request which is bad for performance therefore that should be cached in the URP.)
Summary: UIMAUpdateRequestProcessor should reuse the AnalysisEngine (was: UIMAUpdateRequestProcessor should cache the AnalysisEngine)
[jira] [Commented] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
[ https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753455#comment-13753455 ]

Uwe Schindler commented on LUCENE-5191:
---------------------------------------

We have a variant of this code, recently added by Robert Muir to PostingsHighlighter's DefaultPassageFormatter. It escapes a few more chars, with a reference to OWASP: [https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.231_-_HTML_Escape_Before_Inserting_Untrusted_Data_into_HTML_Element_Content] and [https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.232_-_Attribute_Escape_Before_Inserting_Untrusted_Data_into_HTML_Common_Attributes]

The code used here escapes any chars between 127 and 255 according to the second rule, which is not needed here, because the escaped data is not inserted into HTML attributes, which may be unquoted. So only the first rule applies, under which it is enough to escape the 4 well-known characters plus the forward slash and the single quote ('). The latter two do not need to be escaped if used in text, but for safety we could include them.

In any case I would like to unify the different approaches to HTML escaping. As we are not working in unquoted attributes (we just encode floating HTML text), I would use Robert's code without the extra numeric escapes.

The official HTML4 spec (I used HTML4, the passage is the same for other HTML, see [http://www.w3.org/TR/REC-html40/charset.html#h-5.3.2]):

{quote}
Four character entity references deserve special mention since they are frequently used to escape special characters:
* "&lt;" represents the < sign.
* "&gt;" represents the > sign.
* "&amp;" represents the & sign.
* "&quot;" represents the " mark.
Authors wishing to put the "<" character in text should use "&lt;" (ASCII decimal 60) to avoid possible confusion with the beginning of a tag (start tag open delimiter). Similarly, authors should use "&gt;" (ASCII decimal 62) in text instead of ">" to avoid problems with older user agents that incorrectly perceive this as the end of a tag (tag close delimiter) when it appears in quoted attribute values.
Authors should use "&amp;" (ASCII decimal 38) instead of "&" to avoid confusion with the beginning of a character reference (entity reference open delimiter). Authors should also use "&amp;" in attribute values since character references are allowed within CDATA attribute values.
Some authors use the character entity reference "&quot;" to encode instances of the double quote mark (") since that character may be used to delimit attribute values.
{quote}

Any comments?

SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
------------------------------------------------------------------

Key: LUCENE-5191
URL: https://issues.apache.org/jira/browse/LUCENE-5191
Project: Lucene - Core
Issue Type: Bug
Components: modules/highlighter
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 5.0, 4.5
Attachments: LUCENE-5191.patch

The highlighter provides a function to escape HTML, which does too much. To create valid HTML only <, >, & and " must be escaped; everything else can be kept unescaped. Unfortunately the escaper additionally escapes everything > 127, which is unneeded if your web site has the correct encoding. It also produces huge amounts of HTML entities if used with eastern languages.
This would not be a bug if the escaping were correct, but it isn't. It escapes like this:
{{result.append("&#").append((int)ch).append(";");}}
So it does not escape the Unicode code point (as HTML needs); instead it escapes the UTF-16 char, which is incorrect. E.g. our all-time favourite Deseret, U+10400 (deseret capital letter long i), would be escaped as {{&#55297;&#56320;}} and not as {{&#66560;}}.
So we should remove the stupid encoding of chars > 127 which is simply useless :-)
See also: https://github.com/elasticsearch/elasticsearch/issues/3587
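The difference between escaping per UTF-16 char and per code point, as described in the issue above, can be illustrated with a small standalone sketch. The class and method names here are made up for illustration and are not the Highlighter's actual code; `escapeMinimal` is just one way to escape only the four characters the HTML4 passage names.

```java
// Illustrative sketch only: these names are hypothetical, not Lucene APIs.
final class HtmlEscapeSketch {

    /** Escapes only <, >, & and "; everything else, including surrogate pairs, passes through. */
    public static String escapeMinimal(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            switch (ch) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(ch); // surrogate halves are copied as-is, so pairs stay intact
            }
        }
        return out.toString();
    }

    /** The problematic pattern: one numeric reference per UTF-16 char above 127. */
    public static String escapePerChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch > 127) {
                out.append("&#").append((int) ch).append(";");
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    /** If numeric references are wanted at all, they must be built per code point. */
    public static String escapePerCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp > 127) {
                out.append("&#").append(cp).append(";");
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String deseret = new String(Character.toChars(0x10400));
        System.out.println(escapePerChar(deseret));      // &#55297;&#56320;
        System.out.println(escapePerCodePoint(deseret)); // &#66560;
    }
}
```

With the minimal variant no numeric references are produced at all, so the surrogate-pair problem cannot occur as long as the page is served in an encoding that can represent the text.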
[jira] [Updated] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-5189:
-------------------------------

Attachment: LUCENE-5189.patch

Patch adds some nocommits and tests that expose some problems:

+Problem 1+
If you run the test with {{-Dtests.method=testSegmentMerges -Dtests.seed=7651E2AEEBC55BDF}}, you'll hit an exception:
{noformat}
NOTE: reproduce with: ant test -Dtestcase=TestNumericDocValuesUpdates -Dtests.method=testSegmentMerges -Dtests.seed=7651E2AEEBC55BDF -Dtests.locale=en_AU -Dtests.timezone=Etc/GMT+11 -Dtests.file.encoding=UTF-8
Aug 29, 2013 11:57:35 AM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
WARNING: Will linger awaiting termination of 1 leaked thread(s).
Aug 29, 2013 11:57:35 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
WARNING: Uncaught exception in thread: Thread[Lucene Merge Thread #0,6,TGRP-TestNumericDocValuesUpdates]
org.apache.lucene.index.MergePolicy$MergeException: java.lang.AssertionError: formatName=Lucene45 prevValue=Memory
    at __randomizedtesting.SeedInfo.seed([7651E2AEEBC55BDF]:0)
    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
Caused by: java.lang.AssertionError: formatName=Lucene45 prevValue=Memory
    at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.getInstance(PerFieldDocValuesFormat.java:133)
    at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addNumericField(PerFieldDocValuesFormat.java:105)
    at org.apache.lucene.index.ReadersAndLiveDocs.writeLiveDocs(ReadersAndLiveDocs.java:389)
    at org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:178)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3732)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3401)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
{noformat}
What happens is that the test uses RandomCodec and picks MemoryDVF for writing that field. Later, when ReadersAndLiveDocs applies updates to that field, it uses SI.codec, which is not RandomCodec anymore but Lucene45Codec (or in this case Facet45Codec, based on Codec.forName("Lucene45")), and its DVF returns Lucene45DVF for that field, because Lucene45Codec always returns that.
The way it works during search is that PerFieldDVF.FieldsReader does not rely on the Codec at all, but rather looks up an attribute in FieldInfo which tells it the DVFormat.name, and then it calls DVF.forName. But for writing, it relies on the Codec.
I am not sure how to resolve this. I don't think ReadersAndLiveDocs is doing anything wrong: per-field is not exposed on the Codec API, therefore it shouldn't assume it should do any per-field stuff. But on the other hand, Lucene45Codec instances return a per-field DVF based on what the instance says, and don't look at the FieldInfo attributes as PerFieldDVF.FieldsReader does.
Any ideas?

+Problem 2+
Robert thought of this usecase: if you have a sparse DocValue field 'f', such that say in segment 1 only doc1 has a value, but in segment 2 none of the documents have values, you cannot really update documents in segment 2, because the FieldInfos for that segment won't list the field as having DocValues at all. For now, I catch that case in ReadersAndLiveDocs and throw an exception. The workaround is to make sure you always have values for a field in a segment, by e.g. always setting some default value. But this is ugly and exposes internal stuff (e.g. segments) to users. Also, it's bad because e.g. if segments 1+2 are merged, you suddenly *can* update documents that were in segment 2 before.
A way to solve it is to gen FieldInfos as well. That will allow us to additionally support adding new fields through field updates, though that's optional and we can still choose to forbid it. If we gen FieldInfos though, the changes I've done to SegmentInfos (recording per-field dvGen) need to be reverted. So it's important that we come to a resolution about this in this issue. This is somewhat of a corner case (sparse fields), but I don't like the fact that users can trip on exceptions that depend on whether or not the segment was merged...

+Problem 3+
FieldInfos.Builder neglects to update the globalFieldNumbers.docValuesType map if it updates a FieldInfo's DocValueType. It's an easy fix, and I added a test to numeric updates. If someone has an idea how to reproduce this outside of the numeric updates scope, I'll be happy to handle this in a separate issue.

The
[jira] [Commented] (SOLR-5201) UIMAUpdateRequestProcessor should reuse the AnalysisEngine
[ https://issues.apache.org/jira/browse/SOLR-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753475#comment-13753475 ]

Tommaso Teofili commented on SOLR-5201:
---------------------------------------

OK, now I recall why the caching logic was put in the AEProvider. Basically an UpdateRequestProcessor is instantiated on each update request (it's not reused), and therefore caching the AnalysisEngine locally in the processor wouldn't help.
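The point above, that a per-request object cannot usefully hold the cache, can be sketched as follows. Everything in this block is a hypothetical stand-in (`Engine`, `EngineProvider`, `Processor` are not UIMA or Solr classes); it only shows the cache living in a long-lived provider that each short-lived processor borrows from.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: none of these types are real UIMA or Solr classes.
final class EngineCacheSketch {

    /** Hypothetical stand-in for an engine that is expensive to construct. */
    static final class Engine {
        final String descriptor;
        Engine(String descriptor) { this.descriptor = descriptor; }
    }

    /** Long-lived provider: owns the cache, so it survives across update requests. */
    static final class EngineProvider {
        private final Map<String, Engine> cache = new ConcurrentHashMap<>();

        Engine get(String descriptor) {
            // Builds the engine once per descriptor, then reuses it.
            return cache.computeIfAbsent(descriptor, Engine::new);
        }
    }

    /** Short-lived processor: created for every update request, so it keeps no cache itself. */
    static final class Processor {
        private final Engine engine;

        Processor(EngineProvider provider, String descriptor) {
            this.engine = provider.get(descriptor);
        }

        Engine engine() { return engine; }
    }
}
```

Whether a single engine instance may be used by concurrent requests is a separate question that this sketch does not address.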
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753476#comment-13753476 ]

Shai Erera commented on LUCENE-5189:
------------------------------------

Regarding problem 1, I don't know if it's a valid solution, but maybe if we recorded a per-field-format map for each SegInfo, Lucene45Codec could initialize its dvFormat accordingly? This is not generic though ... it's like we need to have a Codec.serialize() method which dumps stuff to SegInfo (or returns a BytesRef/String from which it can later initialize itself). We'd then not need the attributes on FieldInfo.

When updating existing segments, we have to somehow employ in PerFieldDVF.FieldsWriter the same logic we use in PerFieldDVF.FieldsReader. Whatever solution we choose here will help us when we come to implement field updates for postings.
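One way to read "employ the same logic as FieldsReader in FieldsWriter" is sketched below: when updating an existing segment, prefer the format name already recorded for the field and only fall back to the codec's choice for fields that have none. This is a hedged illustration, not the patch; the attribute key, the map-based field attributes and the method names are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch only: hypothetical names, not the PerFieldDocValuesFormat implementation.
final class PerFieldFormatSketch {

    /** Hypothetical attribute key under which the format name was recorded at write time. */
    static final String FORMAT_KEY = "dvFormat";

    /** Read side: resolve purely from the recorded attribute, ignoring the codec. */
    public static String formatForRead(Map<String, String> fieldAttributes) {
        return fieldAttributes.get(FORMAT_KEY);
    }

    /** Write side as described in Problem 1: ask the codec, ignoring any recorded attribute. */
    public static String formatForWrite(String field, Function<String, String> codecChoice) {
        return codecChoice.apply(field);
    }

    /** Update side, one possible fix: recorded attribute first, codec only as a fallback. */
    public static String formatForUpdate(String field,
                                         Map<String, String> fieldAttributes,
                                         Function<String, String> codecChoice) {
        String recorded = fieldAttributes.get(FORMAT_KEY);
        return recorded != null ? recorded : codecChoice.apply(field);
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put(FORMAT_KEY, "Memory");                   // what RandomCodec picked originally
        Function<String, String> codec = f -> "Lucene45";  // what the segment's codec says now
        System.out.println(formatForWrite("f", codec));           // the mismatching choice
        System.out.println(formatForUpdate("f", attrs, codec));   // stays on the recorded format
    }
}
```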
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753482#comment-13753482 ]

Shai Erera commented on LUCENE-5189:
------------------------------------

BTW, it may generally not be a bad idea to let the Codec serialize some stuff which is later given to it in Codec.init(BytesRef). E.g. if a Codec is initialized with some parameters that are also important during search (e.g. FacetsCodec can be initialized with FacetIndexingParams, which get lost during search because the Codec is initialized with the default ctor), this could be a way for it to serialize/deserialize itself. The name will be used for the newInstance(), the rest to initialize the Codec.
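The serialize/init round trip proposed above might look roughly like this. It is only a sketch of the idea under discussion: `ConfigurableCodec`, `ParamCodec`, `newInstance` and `restore` are invented for the example, a plain `byte[]` stands in for BytesRef, and no such methods exist on Lucene's Codec class.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: hypothetical types, not Lucene's Codec API.
final class CodecStateSketch {

    /** Hypothetical codec contract: a name for lookup plus opaque state for initialization. */
    interface ConfigurableCodec {
        String name();
        byte[] serialize();      // state to be stored alongside the segment
        void init(byte[] state); // called after default construction at read time
    }

    /** Hypothetical codec with one parameter that would otherwise be lost at search time. */
    static final class ParamCodec implements ConfigurableCodec {
        String param = "default";

        public String name() { return "ParamCodec"; }
        public byte[] serialize() { return param.getBytes(StandardCharsets.UTF_8); }
        public void init(byte[] state) { param = new String(state, StandardCharsets.UTF_8); }
    }

    /** Stand-in for name-based lookup: always constructs with the default constructor. */
    static ConfigurableCodec newInstance(String name) {
        if ("ParamCodec".equals(name)) {
            return new ParamCodec();
        }
        throw new IllegalArgumentException("unknown codec: " + name);
    }

    /** The name selects the class; the serialized state restores its parameters. */
    public static ConfigurableCodec restore(String name, byte[] state) {
        ConfigurableCodec codec = newInstance(name);
        codec.init(state);
        return codec;
    }
}
```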
solr performance testing
Hello,

afaik http://code.google.com/a/apache-extras.org/p/luceneutil/ is used for testing Lucene performance. What about Solr? Is it also supported, or is there a separate well-known facility?

Thanks in advance

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753500#comment-13753500 ]

Shai Erera commented on LUCENE-5189:
------------------------------------

Regarding problem 3, Mike helped me construct a simple test which reproduces the bug. I opened LUCENE-5192 to fix it.
[jira] [Created] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances
Shai Erera created LUCENE-5192:
----------------------------------

Summary: FieldInfos.Builder failed to catch adding field with different DV type under some circumstances
Key: LUCENE-5192
URL: https://issues.apache.org/jira/browse/LUCENE-5192
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
Fix For: 5.0, 4.5

I found it while working on LUCENE-5189. I'll attach a patch with a simple testcase which reproduces the problem and a fix.
[jira] [Updated] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances
[ https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-5192:
-------------------------------

Attachment: LUCENE-5192.patch

Patch adds a testcase and fixes the bug. The bug only happens if you add the same field name as indexable and DV, and then in another segment change its DV type. I'll commit it shortly.
[jira] [Created] (SOLR-5202) Support easier overrides of Carrot2 clustering attributes via XML data sets exported from the Workbench.
Dawid Weiss created SOLR-5202:
---------------------------------

Summary: Support easier overrides of Carrot2 clustering attributes via XML data sets exported from the Workbench.
Key: SOLR-5202
URL: https://issues.apache.org/jira/browse/SOLR-5202
Project: Solr
Issue Type: New Feature
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Fix For: 4.5, 5.0
[jira] [Commented] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances
[ https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753510#comment-13753510 ]

Michael McCandless commented on LUCENE-5192:
--------------------------------------------

+1, sneaky!
[jira] [Updated] (LUCENE-5123) invert the codec postings API
[ https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-5123:
---------------------------------------

Attachment: LUCENE-5123.patch

New patch, adding a test case that exercises this API a bit...

invert the codec postings API
-----------------------------

Key: LUCENE-5123
URL: https://issues.apache.org/jira/browse/LUCENE-5123
Project: Lucene - Core
Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch

Currently FieldsConsumer/PostingsConsumer/etc is a push-oriented API, e.g. FreqProxTermsWriter streams the postings at flush, and the default merge() takes the incoming codec API, filters out deleted docs, and pushes via the same API (but that can be overridden).
It could be cleaner if we allowed for a pull model instead (like DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of itself and just pass this to the codec consumer.
This would give the codec more flexibility to e.g. do multiple passes if it wanted to do things like encode high-frequency terms more efficiently with a bitset-like encoding or other things...
A codec can try to do things like this to some extent today, but it's very difficult (look at buffering in Pulsing). We made this change with DV and it made a lot of interesting optimizations easy to implement...
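The push versus pull distinction in the description above can be sketched in a few lines. The interfaces below are hypothetical and much simpler than FieldsConsumer or Terms; the term-to-docFreq map and the "at least half of the maximum docFreq" rule are assumptions chosen only to show why a consumer that can iterate twice has an easier time picking a per-term encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative sketch only: hypothetical interfaces, not Lucene's codec API.
final class PushVsPullSketch {

    /** Push style: the indexer drives and the consumer sees each term exactly once. */
    interface PushConsumer {
        void term(String term, int docFreq);
    }

    /** Pull style: the consumer receives a view and decides how often to iterate it. */
    interface PullConsumer {
        void write(SortedMap<String, Integer> termToDocFreq);
    }

    /** Two passes: first find the maximum docFreq, then choose an encoding per term. */
    static final class TwoPassConsumer implements PullConsumer {
        final List<String> dense = new ArrayList<>();  // would get a bitset-like encoding
        final List<String> sparse = new ArrayList<>(); // would get a regular postings encoding

        @Override
        public void write(SortedMap<String, Integer> termToDocFreq) {
            int max = 0;
            for (int docFreq : termToDocFreq.values()) { // pass 1
                max = Math.max(max, docFreq);
            }
            for (Map.Entry<String, Integer> e : termToDocFreq.entrySet()) { // pass 2
                if (e.getValue() * 2 >= max) {
                    dense.add(e.getKey());
                } else {
                    sparse.add(e.getKey());
                }
            }
        }
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> terms = new TreeMap<>();
        terms.put("a", 10);
        terms.put("b", 1);
        terms.put("c", 6);
        TwoPassConsumer consumer = new TwoPassConsumer();
        consumer.write(terms);
        System.out.println(consumer.dense + " " + consumer.sparse);
    }
}
```

A push consumer wanting the same behaviour would have to buffer every term until the last one arrives, which is the kind of buffering the description calls difficult.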
[jira] [Commented] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances
[ https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753515#comment-13753515 ]

ASF subversion and git services commented on LUCENE-5192:
---------------------------------------------------------

Commit 1518591 from [~shaie] in branch 'dev/trunk'
[ https://svn.apache.org/r1518591 ]

LUCENE-5192: FieldInfos.Builder failed to catch adding field with different DV type under some circumstances
[jira] [Created] (SOLR-5203) Strengthen the function of Min should match, making it select BooleanClause as Occur.MUST according to the weight of query
HeXin created SOLR-5203:
---------------------------

Summary: Strengthen the function of Min should match, making it select BooleanClause as Occur.MUST according to the weight of query
Key: SOLR-5203
URL: https://issues.apache.org/jira/browse/SOLR-5203
Project: Solr
Issue Type: Bug
Components: search
Affects Versions: 4.4
Reporter: HeXin
Priority: Minor
Fix For: 4.5, 5.0
[jira] [Commented] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances
[ https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753540#comment-13753540 ]

Adrien Grand commented on LUCENE-5192:
--------------------------------------

Wow, good catch!
[jira] [Updated] (SOLR-5203) Strengthen the function of Min should match, making it select BooleanClause as Occur.MUST according to the weight of query
[ https://issues.apache.org/jira/browse/SOLR-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HeXin updated SOLR-5203: Description: In some cases, we want mm to select a BooleanClause as Occur.MUST according to the weight of the query. A clause can be selected as Occur.MUST only if the weight is larger than the threshold. The threshold is configurable and defaults to the minimum integer value. Any comments are welcome. was: In some cases, we want mm to select a BooleanClause as Occur.MUST according to the weight of the query. A clause can be selected as Occur.MUST only if the weight is larger than the threshold. The threshold is configurable and defaults to the minimum integer value.
[jira] [Updated] (SOLR-5203) Strengthen the function of Min should match, making it select BooleanClause as Occur.MUST according to the weight of query
[ https://issues.apache.org/jira/browse/SOLR-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HeXin updated SOLR-5203: Description: In some cases, we want mm to select a BooleanClause as Occur.MUST according to the weight of the query. A clause can be selected as Occur.MUST only if the weight is larger than the threshold. The threshold is configurable and defaults to the minimum integer value. Issue Type: Improvement (was: Bug)
[jira] [Commented] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances
[ https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753546#comment-13753546 ] ASF subversion and git services commented on LUCENE-5192: Commit 1518616 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1518616 ] LUCENE-5192: FieldInfos.Builder failed to catch adding field with different DV type under some circumstances
[jira] [Resolved] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances
[ https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-5192. Resolution: Fixed Committed to trunk and 4x. On 4x I also had to fix DocFieldProcessor to call FieldInfos.addOrUpdate even when the field has already been encountered. That's because the logic has changed in trunk: DV fields are now processed as stored fields, so FIS.addOrUpdate is called for both the posting and the NDV. In 4x it is not, and only the FI was updated if you added the same field with two types (and FIS didn't know about it at all!).
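The kind of bug described in this resolution, where a field is silently accepted with two different DocValues types, can be illustrated with a minimal sketch. This is not the actual FieldInfos.Builder code: `DvTypeRegistry`, `DvType`, and this `addOrUpdate` are hypothetical names used only for illustration. The point it tries to show is that the consistency check has to run on every add, including when the field name has already been seen.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a registry mapping field name -> DocValues type.
// This is NOT Lucene's FieldInfos.Builder; it only illustrates the invariant.
class DvTypeRegistry {
    enum DvType { NUMERIC, BINARY, SORTED }

    private final Map<String, DvType> typeByField = new HashMap<>();

    // Must be called on every add, even for fields that were seen before.
    // Skipping the call for known fields is how a conflict can slip through.
    void addOrUpdate(String field, DvType type) {
        DvType previous = typeByField.get(field);
        if (previous == null) {
            typeByField.put(field, type);
        } else if (previous != type) {
            throw new IllegalArgumentException(
                "cannot change DocValues type from " + previous + " to " + type
                    + " for field \"" + field + "\"");
        }
    }
}
```

Re-adding a field with the same type is a no-op here; only a type change is rejected.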
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753557#comment-13753557 ] Shai Erera commented on LUCENE-5189: Regarding problem 1, I hardwired the test to use Lucene45Codec for now so that I'm not blocked. I thought about Codec.serlize/attributes and now I realize it's not a good idea since those attributes must be recorded per-segment, yet the Codec is single-instance for all segments. We can however record these in SegmentInfo.attributes(). The documentation suggests this is where the Codec should record stuff per-segment. Would it work if PerFieldDVF recorded the per-field-format in SegWriteStage.si.attributes() and read them later, instead of FieldInfo.attributes? Numeric DocValues Updates - Key: LUCENE-5189 URL: https://issues.apache.org/jira/browse/LUCENE-5189 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch In LUCENE-4258 we started to work on incremental field updates, however the amount of changes are immense and hard to follow/consume. The reason is that we targeted postings, stored fields, DV etc., all from the get go. I'd like to start afresh here, with numeric-dv-field updates only. There are a couple of reasons to that: * NumericDV fields should be easier to update, if e.g. we write all the values of all the documents in a segment for the updated field (similar to how livedocs work, and previously norms). * It's a fairly contained issue, attempting to handle just one data type to update, yet requires many changes to core code which will also be useful for updating other data types. * It has value in and on itself, and we don't need to allow updating all the data types in Lucene at once ... we can do that gradually. I have some working patch already which I'll upload next, explaining the changes. 
[jira] [Created] (LUCENE-5193) Add jar-src to build.xml
Shai Erera created LUCENE-5193: -- Summary: Add jar-src to build.xml Key: LUCENE-5193 URL: https://issues.apache.org/jira/browse/LUCENE-5193 Project: Lucene - Core Issue Type: New Feature Components: general/build Reporter: Shai Erera Priority: Minor I think it's useful if we have a top-level jar-src which generates source jars for all modules. One can basically do that by iterating through the directories and calling 'ant jar-src' already, so this is just a convenient way to do it. Will attach a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5193) Add jar-src to build.xml
[ https://issues.apache.org/jira/browse/LUCENE-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5193: Attachment: LUCENE-5193.patch Simple patch for Lucene modules only, since they already support jar-src.
[jira] [Commented] (LUCENE-5193) Add jar-src to build.xml
[ https://issues.apache.org/jira/browse/LUCENE-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753579#comment-13753579 ] Shai Erera commented on LUCENE-5193: If there are no objections, I'll commit it later today.
[jira] [Updated] (LUCENE-5193) Add jar-src to build.xml
[ https://issues.apache.org/jira/browse/LUCENE-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5193: Attachment: LUCENE-5193.patch Previous patch did not jar-src core and test-framework.
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753586#comment-13753586 ] Uwe Schindler commented on LUCENE-5189:

Hi, I had an idea yesterday when thinking about this. Currently (like for deletes) we can update DocValues based on an ID term (updating by docid is not easily possible with IndexWriter). As the ID term can be anything, you could also use some (group) key that updates lots of documents (like you can delete all documents with a specific term). The current code updates the given field for all those documents to a fixed value. My two ideas are:

- we could also support update by query (meaning that, like for deletes, you provide a query that selects the documents to update)
- we could make modifications possible: Instead of giving a value that is set for all selected documents, we could provide a callback interface that is used to modify the current docvalue (numeric or String) of the document to update and returns a changed value. This would be a one-method interface, so it could be used as a closure in Java 8, like {{writer.updateDocValues(term, value -> value+1);}} (in Java 6/7 this would be {{writer.updateDocValues(term, new NumericDocValuesUpdater() \{ public long update(long value) \{ return value+1; \}\});}}).

Servers like Solr or ElasticSearch could implement this interface/closure using e.g. javascript, so one could execute a docvalues update and pass a javascript function applied to every value. We just have to think about concurrency: What happens if 2 threads are updating the same value at the same time? Maybe this is already handled by the BufferedDeletesQueue!?

I just wanted to write this down in this issue, so we could think about allowing this to be implemented. Of course the current patch is more important to get the whole game running! Updating by term/query is just one thing which is often requested by users. The typical example is a webapp where you can vote for a document. In that case one would execute the closure {{value -> value+1}}. If we implement this at such a low level, the whole concurrency handling should be easier than how it is currently implemented e.g. in Solr or ES.
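The callback idea in this comment can be sketched in a few lines of Java 8. This is only an illustration under stated assumptions: `NumericDocValuesUpdater` is the interface name proposed in the comment, not an existing Lucene API, and `applyUpdate` with its plain `long[]` standing in for a segment's numeric doc values is entirely hypothetical. It does not address the concurrency question raised above.

```java
import java.util.Arrays;

// One-method callback as proposed in the comment; hypothetical, not a Lucene API.
interface NumericDocValuesUpdater {
    long update(long value);
}

class VoteDemo {
    // Applies the updater to the current value of every selected document.
    // A plain long[] stands in for per-document numeric doc values.
    static void applyUpdate(long[] values, int[] selectedDocs, NumericDocValuesUpdater updater) {
        for (int doc : selectedDocs) {
            values[doc] = updater.update(values[doc]);
        }
    }

    public static void main(String[] args) {
        long[] votes = {10, 0, 7};
        // "Vote" for documents 0 and 2 by incrementing their current value.
        applyUpdate(votes, new int[] {0, 2}, value -> value + 1);
        System.out.println(Arrays.toString(votes));
    }
}
```

Because the interface has exactly one abstract method, the lambda `value -> value + 1` can be passed wherever a `NumericDocValuesUpdater` is expected.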
Test failures the other day
Just in case this ever helps track this down. The other day I had a situation in which I could NOT run a successful test end-to-end (while trying to proof SOLR-4817). Usually one of the distrib tests would fail. Not always the same one. And executing with the seed wouldn't fail. It only failed when running the full suite. Rebooted my machine and all was well, no failures at all. So how my environment is getting whacked such that running the full test suite fails is a mystery... FWIW, Erick
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753600#comment-13753600 ] Shai Erera commented on LUCENE-5189: I definitely want to add update by query, but in a separate issue. And the callback idea is interesting. This callback would need to also get the docid I guess (it's missing in your API example)?
[jira] [Created] (SOLR-5204) Queries with shards.tolerant=true and stats=true or spellcheck=on do not work
Anca Kopetz created SOLR-5204:

Summary: Queries with shards.tolerant=true and stats=true or spellcheck=on do not work
Key: SOLR-5204
URL: https://issues.apache.org/jira/browse/SOLR-5204
Project: Solr
Issue Type: Bug
Affects Versions: 4.4
Reporter: Anca Kopetz

In a SolrCloud environment with 2 shards, if one server is down:

* when we execute queries with shards.tolerant=true&stats=true, a NullPointerException is thrown
{code}
java.lang.NullPointerException
  at org.apache.solr.handler.component.StatsComponent.handleResponses(StatsComponent.java:105)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
  at java.lang.Thread.run(Thread.java:722)
{code}
* when we execute queries with shards.tolerant=true&spellcheck=on, a NullPointerException is thrown
{code}
2013-08-26 13:51:42,347 [http-8080-8] ERROR org.apache.solr.servlet.SolrDispatchFilter:log:119 - null:java.lang.NullPointerException
  at org.apache.solr.handler.component.SpellCheckComponent.finishStage(SpellCheckComponent.java:323)
{code}
Re: Bug or backcompat example: Solr example/multicore/solr.xml in legacy format?
+1 for nuking the multi core example. And schema less should become the new default too, nuking yet another set of parallel configs!

-- Jan Høydahl, search solution architect, Cominvent AS - www.cominvent.com

On 28 Aug 2013, at 16:47, Mark Miller markrmil...@gmail.com wrote:

I have an old JIRA where I started working on this, but I cannot find it. There has been no need for the multi core example for years now. I did a bunch of work taking it out at one point, but I'm sure that work is old enough to be useless now. Never got around to committing it. A few tests tie into those configs and I think there was some other flotsam and jetsam to clean up.

- Mark

On Aug 27, 2013, at 6:10 PM, Erick Erickson erickerick...@gmail.com wrote:

bq: I think we should just get rid of it entirely

+1, especially since we're going to core discovery, the collections API, etc.

FWIW, Erick

On Tue, Aug 27, 2013 at 3:42 PM, Shawn Heisey s...@elyograg.org wrote:

On 8/27/2013 11:24 AM, Jack Krupansky wrote:

I just happened to notice that the solr.xml file in the Solr example/multicore in branch_4x (and 4.4 as well) is still in the old legacy format (with cores/core). Is that merely an oversight or intentional for demonstrating backwards compatibility?

The example/multicore directory seems to be generally very out of date. The schema uses an ancient version, and doesn't have any good examples of how to use analyzers effectively. I'm fairly sure that all the examples use solr.xml and are therefore inherently multicore. Unless we plan to thoroughly update the multicore example so it's as modern as the main example, I think we should just get rid of it entirely. If we need an example that uses legacy config methods, I think we should make a new subdirectory. It should come with an extensive README and the solrconfig/schema should be more heavily commented than the standard example.

Thanks, Shawn
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753622#comment-13753622 ] Uwe Schindler commented on LUCENE-5189:

bq. This callback would need to also get the docid I guess (it's missing in your API example)?

Of course we could add this. Java 8 would also support this cool syntax, something like: {{writer.updateDocValues(term, (docid, value) -> value+1);}}

The Java 8 example here was just syntactic sugar: for all this it's only important that it is an {{interface}} with only one method that gets as many parameters as needed and returns one value. We automatically get the cool Java 8 syntax for users if we design the callback interface according to these guidelines. One common example is the Comparator interface in Java. Every Comparator<T> can be written in this cool syntax :-)
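To make the "one method, any number of parameters" guideline concrete, here is a small sketch of a two-argument callback written both ways. `DocValueUpdater` and `UpdaterSyntaxDemo` are hypothetical names chosen for this example and are not part of Lucene; the sketch only shows that the anonymous-class form (Java 6/7) and the lambda form (Java 8) target the same interface.

```java
// Hypothetical two-argument variant of the callback; not a Lucene API.
interface DocValueUpdater {
    long update(int docId, long value);
}

class UpdaterSyntaxDemo {
    // Java 6/7 style: anonymous inner class implementing the single method.
    static final DocValueUpdater ANONYMOUS = new DocValueUpdater() {
        @Override
        public long update(int docId, long value) {
            return value + 1;
        }
    };

    // Java 8 style: the same behavior written as a lambda.
    static final DocValueUpdater LAMBDA = (docId, value) -> value + 1;
}
```

Both constants behave identically; the lambda compiles only because the interface declares exactly one abstract method.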
[jira] [Commented] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
[ https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753720#comment-13753720 ] Robert Muir commented on LUCENE-5191:

{quote} As we are not working in unquoted attributes {quote}

You cannot make this determination. If you want to copy this method and put a less secure version in SimpleHTMLEncoder, that's cool with me. But don't make PostingsHighlighter less secure: -1 to that.

SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP

Key: LUCENE-5191 URL: https://issues.apache.org/jira/browse/LUCENE-5191 Project: Lucene - Core Issue Type: Bug Components: modules/highlighter Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, 4.5 Attachments: LUCENE-5191.patch

The highlighter provides a function to escape HTML, which does too much. To create valid HTML only ", <, >, & must be escaped; everything else can be kept unescaped. The escaper unfortunately also escapes everything > 127, which is unneeded if your web site has the correct encoding. It also produces huge amounts of HTML entities if used with eastern languages. This would not be a bug if the escaping were correct, but it isn't; it escapes like this: {{result.append("&#").append((int)ch).append(";");}} So it does not escape the Unicode code point (as HTML needs); instead it escapes the UTF-16 char, which is incorrect. E.g. our all-time favourite Deseret, U+10400 (deseret capital letter long i), would be escaped as {{&#55297;&#56320;}} and not as {{&#66560;}}. So we should remove the stupid encoding of chars > 127, which is simply useless :-) See also: https://github.com/elasticsearch/elasticsearch/issues/3587
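The difference between escaping per UTF-16 char and per Unicode code point, as described in the issue text above, can be reproduced with a small self-contained sketch. This is not the SimpleHTMLEncoder source: `NumericEntityDemo` and its two methods are hypothetical, handle only the numeric-entity part (they do not escape quotes, angle brackets, or ampersands), and the `> 127` cutoff is taken from the issue description. Note that the issue proposes removing this escaping altogether; the code-point version is shown only for contrast.

```java
class NumericEntityDemo {
    // Mirrors the approach criticized in the issue: one entity per UTF-16 char,
    // so a supplementary character turns into two surrogate entities.
    static String escapePerChar(String s) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch > 127) {
                result.append("&#").append((int) ch).append(";");
            } else {
                result.append(ch);
            }
        }
        return result.toString();
    }

    // One entity per Unicode code point, which is what HTML expects.
    static String escapePerCodePoint(String s) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp > 127) {
                result.append("&#").append(cp).append(";");
            } else {
                result.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String deseret = new String(Character.toChars(0x10400));
        System.out.println(escapePerChar(deseret));      // &#55297;&#56320;
        System.out.println(escapePerCodePoint(deseret)); // &#66560;
    }
}
```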
[jira] [Updated] (SOLR-5084) new field type - EnumField
[ https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elran Dvir updated SOLR-5084: - Attachment: Solr-5084.trunk.patch new field type - EnumField -- Key: SOLR-5084 URL: https://issues.apache.org/jira/browse/SOLR-5084 Project: Solr Issue Type: New Feature Reporter: Elran Dvir Attachments: enumsConfig.xml, schema_example.xml, Solr-5084.patch, Solr-5084.patch, Solr-5084.patch, Solr-5084.patch, Solr-5084.trunk.patch We have encountered a use case in our system where we have a few fields (Severity. Risk etc) with a closed set of values, where the sort order for these values is pre-determined but not lexicographic (Critical is higher than High). Generically this is very close to how enums work. To implement, I have prototyped a new type of field: EnumField where the inputs are a closed predefined set of strings in a special configuration file (similar to currency.xml). The code is based on 4.2.1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5084) new field type - EnumField
[ https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753736#comment-13753736 ] Elran Dvir commented on SOLR-5084:

Hi all, I attached a new patch. The patch is based on trunk. It contains changes regarding the issues Robert mentioned (Thanks Robert):
1. Fixed the bug where string inputs weren't mapped into their numeric values in ValueSourceScorer.getRangeScorer and getRangeQuery.
2. Removed the analysis chain.

In the next few days, I will attach fixes for the remaining issues:
1. Verify value strictness on startup (numeric values start at 0, increment by 1).
2. Throw an exception when an indexed value is not in the configuration (either number or string).

Thank you all.
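The two remaining checks listed in this comment can be sketched as follows. This is a hypothetical illustration, not code from the SOLR-5084 patch: the class name `EnumMapping`, its constructor taking parallel arrays, and `toNumeric` are all assumptions made for this example. It validates at construction time that the configured numeric values start at 0 and increment by 1, and it rejects any input that is neither a configured name nor a configured number.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of enum-value validation; not the SOLR-5084 implementation.
class EnumMapping {
    private final Map<String, Integer> valueByName = new HashMap<>();

    // Startup check: values must be exactly 0, 1, 2, ... in configuration order.
    EnumMapping(String[] names, int[] values) {
        if (names.length != values.length) {
            throw new IllegalArgumentException("names and values differ in length");
        }
        for (int i = 0; i < values.length; i++) {
            if (values[i] != i) {
                throw new IllegalArgumentException(
                    "value for \"" + names[i] + "\" must be " + i + " but was " + values[i]);
            }
            if (valueByName.put(names[i], i) != null) {
                throw new IllegalArgumentException("duplicate enum name: " + names[i]);
            }
        }
    }

    // Accepts a configured name or a configured number given as a string;
    // anything else is rejected instead of being indexed silently.
    int toNumeric(String input) {
        Integer byName = valueByName.get(input);
        if (byName != null) {
            return byName;
        }
        try {
            int n = Integer.parseInt(input);
            if (n >= 0 && n < valueByName.size()) {
                return n;
            }
        } catch (NumberFormatException e) {
            // Not a number; fall through to the rejection below.
        }
        throw new IllegalArgumentException("value not in enum configuration: " + input);
    }
}
```

With names such as Low, High, Critical mapped to 0, 1, 2, sorting by the numeric value yields the pre-determined order rather than the lexicographic one.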
[jira] [Commented] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
[ https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753774#comment-13753774 ] Uwe Schindler commented on LUCENE-5191: --- I did not want to modify yours although I disagree. I will commit the current patch and remove the useless extra encoding. SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP -- Key: LUCENE-5191 URL: https://issues.apache.org/jira/browse/LUCENE-5191 Project: Lucene - Core Issue Type: Bug Components: modules/highlighter Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, 4.5 Attachments: LUCENE-5191.patch The highlighter provides a function to escape HTML, which does too much. To create valid HTML, only &, <, >, and " must be escaped; everything else can be kept unescaped. Unfortunately, the escaper additionally escapes everything > 127, which is unneeded if your web site has the correct encoding. It also produces huge amounts of HTML entities if used with eastern languages. This would not be a bug if the escaping were correct, but it isn't; it escapes like this: {{result.append("&#").append((int)ch).append(";");}} So it escapes not the Unicode code point (as HTML needs) but the UTF-16 char, which is incorrect. For example, our all-time favourite Deseret, U+10400 (deseret capital letter long i), would be escaped as {{&#55297;&#56320;}} and not as {{&#66560;}}. So we should remove the stupid encoding of chars > 127, which is simply useless :-) See also: https://github.com/elasticsearch/elasticsearch/issues/3587
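To make the char vs. code point distinction in LUCENE-5191 concrete, here is a minimal sketch. It is not the committed patch: the class and method names (CodePointHtmlEscaper, escapeMinimal, numericReferences) are made up for illustration. The first method escapes only the four HTML-significant characters; the second shows what a correct numeric reference would look like if one were wanted at all, by iterating over code points rather than UTF-16 chars.

```java
// Illustrative sketch only; names are hypothetical, not Lucene API.
final class CodePointHtmlEscaper {

    /** Escapes only &, <, > and "; every other character is copied unchanged. */
    static String escapeMinimal(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            switch (ch) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                // Surrogate halves fall through here and are copied as-is,
                // so supplementary characters survive intact.
                default: out.append(ch);
            }
        }
        return out.toString();
    }

    /** Writes numeric character references per code point, not per UTF-16 char. */
    static String numericReferences(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);      // reads a surrogate pair as one value
            i += Character.charCount(cp);      // advance by 1 or 2 chars
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

With this sketch, U+10400 passed to numericReferences comes out as a single reference (66560), whereas a per-char loop would emit the two surrogate values quoted in the issue.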
[jira] [Commented] (SOLR-5200) Add REST support for reading and modifying Solr configuration
[ https://issues.apache.org/jira/browse/SOLR-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753781#comment-13753781 ] Michael Della Bitta commented on SOLR-5200: --- We've wanted the ability to tune commit properties for bulk indexing, and then switch to more incremental indexing-friendly setup on the fly, for a while. +1. Add REST support for reading and modifying Solr configuration - Key: SOLR-5200 URL: https://issues.apache.org/jira/browse/SOLR-5200 Project: Solr Issue Type: New Feature Reporter: Steve Rowe Assignee: Steve Rowe There should be a REST API to allow full read access to, and write access to some elements of, Solr's per-core and per-node configuration not already covered by the Schema REST API: {{solrconfig.xml}}/{{core.properties}}/{{solrcore.properties}} and {{solr.xml}}/{{solr.properties}} (SOLR-4718 discusses addition of {{solr.properties}}). Use cases for runtime configuration modification include scripted setup, troubleshooting, and tuning. Tentative rules-of-thumb about configuration items that should not be modifiable at runtime: # Startup-only items, e.g. where to start core discovery # Items that are deprecated in 4.X and will be removed in 5.0 # Items that if modified should be followed by a full re-index Some issues to consider: Persistence: How (and even whether) to handle persistence for configuration modifications via REST API is not clear - e.g. persisting the entire config file or having one or more sidecar config files that get persisted. The extent of what should be modifiable will likely affect how persistence is implemented. For example, if the only {{solrconfig.xml}} modifiable items turn out to be plugin configurations, an alternative to full-{{solrconfig.xml}} persistence could be individual plugin registration of runtime config modifiable items, along with per-plugin sidecar config persistence. 
Live reload: Most (if not all) per-core configuration modifications will require core reload, though it will be a live reload, so some things won't be modifiable, e.g. {{dataDir}} and {{IndexWriter}} related settings in {{indexConfig}} - see SOLR-3592. (Should a full reload be supported to handle changes in these places?) Interpolation aka property substitution: I think it would be useful on read access to optionally return raw values in addition to the interpolated values, e.g. {{solr.xml}} {{hostPort}} raw value {{$\{jetty.port:8983}}} vs. interpolated value {{8983}}. Modification requests will accept raw values - property interpolation will be applied. At present interpolation is done once, at parsing time, but if property value modification is supported via the REST API, an alternative could be to delay interpolation until values are requested; in this way, property value modification would not trigger re-parsing the affected configuration source. Response format: Similarly to the schema REST API, results could be returned in XML, JSON, or any other response writer's output format. Transient cores: How should non-loaded transient cores be handled? Simplest thing would be to load the transient core before handling the request, just like other requests. Below I provide an exhaustive list of configuration items in the files in question and indicate which ones I think could be modifiable at runtime. I don't mean to imply that these must all be made modifiable, or for those that are made modifiable, that they must be made so at once - a piecemeal approach will very likely be more appropriate. h2. {{solrconfig.xml}} Note that XIncludes and includes via Document Entities won't survive a modification request (assuming persistence is via overwriting the original file). 
||XPath under {{/config/}}||Should be modifiable via REST API?||Rationale||Description||
|{{luceneMatchVersion}}|No|Modifying this should be followed by a full re-index|Controls what version of Lucene various components of Solr adhere to|
|{{lib}}|Yes|Required for adding plugins at runtime|Contained jars available via classloader for {{solrconfig.xml}} and {{schema.xml}}|
|{{dataDir}}|No|Not supported by live RELOAD|Holds all index data|
|{{directoryFactory}}|No|Not supported by live RELOAD|index directory factory|
|{{codecFactory}}|No|Modifying this should be followed by a full re-index|index codec factory, per-field SchemaCodecFactory by default|
|{{schemaFactory}}|Partial|Although the class shouldn't be modifiable, it should be possible to modify an already Managed schema's mutability|Managed or Classic (non-mutable) schema factory|
|{{indexConfig}}|No|{{IndexWriter}}-related settings not supported by live RELOAD|low-level indexing behavior|
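The "Interpolation aka property substitution" item in the SOLR-5200 description distinguishes a raw value such as {{$\{jetty.port:8983}}} from its interpolated value {{8983}}. The following is a rough sketch of that idea only; it is not Solr's property-substitution code, and the class name, the regular expression, and the choice to throw on a missing property are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "${name:default}" substitution; not Solr's implementation.
final class PropertyInterpolationSketch {

    // Group 1 is the property name, optional group 2 is the default after ':'.
    private static final Pattern REF =
        Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    /** Returns the interpolated form of a raw config value. */
    static String interpolate(String raw, Map<String, String> props) {
        Matcher m = REF.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.get(m.group(1));
            if (value == null) {
                value = m.group(2); // fall back to the inline default, if any
            }
            if (value == null) {
                throw new IllegalArgumentException("No value for property " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

A read request that returns both forms would keep the raw string and call something like interpolate(raw, props) only when the value is requested, which is the "delay interpolation" alternative the description mentions.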
[jira] [Commented] (SOLR-5084) new field type - EnumField
[ https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753786#comment-13753786 ] Robert Muir commented on SOLR-5084: ---
{quote} As long as the config forces them to be explicit about the values (and has error checking at startup that the values start at 0 and are monotonically increasing ints) then anyone who wants to insert values into their config is going to have to pause and think about the fact that there is a concrete int associated with the existing values – and is more likely to realize that changing those ints has consequences. {quote}
If the values are implicitly 0, 1, 2, ... n, then why force the user to write that out? If you are worried about idiot users, add a comment around the field type to the example:
{code}
<!-- note: you cannot change the order/existing values without reindexing. but you can always add new values to the end. -->
{code}
Otherwise it just makes the configuration overly verbose to have them write 0..n themselves. new field type - EnumField -- Key: SOLR-5084 URL: https://issues.apache.org/jira/browse/SOLR-5084 Project: Solr Issue Type: New Feature Reporter: Elran Dvir Attachments: enumsConfig.xml, schema_example.xml, Solr-5084.patch, Solr-5084.patch, Solr-5084.patch, Solr-5084.patch, Solr-5084.trunk.patch We have encountered a use case in our system where we have a few fields (Severity, Risk, etc.) with a closed set of values, where the sort order for these values is pre-determined but not lexicographic (Critical is higher than High). Generically this is very close to how enums work. To implement this, I have prototyped a new type of field: EnumField, where the inputs are a closed, predefined set of strings in a special configuration file (similar to currency.xml). The code is based on 4.2.1.
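The implicit 0, 1, 2, ... n numbering argued for in the SOLR-5084 comment above can be sketched as follows. This is only an illustration of the idea, not code from the patch: ImplicitEnumMapping and toInt are hypothetical names, and rejecting unknown or duplicate values with an exception is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: ordinals are implied by position in the configured list.
final class ImplicitEnumMapping {

    private final Map<String, Integer> ordinals = new LinkedHashMap<>();

    /** Assigns 0, 1, 2, ... in configuration order; duplicates are rejected. */
    ImplicitEnumMapping(List<String> valuesInConfigOrder) {
        for (String value : valuesInConfigOrder) {
            // The next ordinal is the current size, evaluated before insertion.
            if (ordinals.putIfAbsent(value, ordinals.size()) != null) {
                throw new IllegalArgumentException("Duplicate enum value: " + value);
            }
        }
    }

    /** Maps a configured string to its implied int, used for sorting and ranges. */
    int toInt(String value) {
        Integer ordinal = ordinals.get(value);
        if (ordinal == null) {
            throw new IllegalArgumentException("Unknown enum value: " + value);
        }
        return ordinal;
    }
}
```

Under this scheme, appending a value to the end of the list leaves every existing ordinal unchanged, while reordering or inserting in the middle shifts them, which is why the suggested comment warns about reindexing.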
[jira] [Commented] (SOLR-3580) In ExtendedDismax, lowercase 'not' operator is not being treated as an operator when 'lowercaseOperators' is enabled
[ https://issues.apache.org/jira/browse/SOLR-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753788#comment-13753788 ] Eric Pugh commented on SOLR-3580: - I was about to submit a patch for the fact that 'NOT' and 'not' don't work the same, when I stumbled across this issue. My patch file looks remarkably like [~mdodswo...@salesforce.com]'s first patch as well! One thing is that the wiki needs an update: http://wiki.apache.org/solr/ExtendedDisMax#lowercaseOperators I can put that in, referring to the patch files as an option if you need not:NOT support. I would like to see something committed, as my customer has the same need for NOT to work. Their users are sophisticated, know the syntax, etc. The backup plan is to do something custom. In ExtendedDismax, lowercase 'not' operator is not being treated as an operator when 'lowercaseOperators' is enabled Key: SOLR-3580 URL: https://issues.apache.org/jira/browse/SOLR-3580 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.0-ALPHA Reporter: Michael Dodsworth Priority: Minor Attachments: SOLR-3580.patch, SOLR-3580-proposal.patch When lowercase operator support is enabled (for edismax), the lowercase 'not' operator is being wrongly treated as a literal term (and not as an operator).
[jira] [Commented] (SOLR-5204) Queries with shards.tolerant=true and stats=true or spellcheck=on do not work
[ https://issues.apache.org/jira/browse/SOLR-5204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753817#comment-13753817 ] Shalin Shekhar Mangar commented on SOLR-5204: - Yes, shards.tolerant is supported only in facet, query and grouping. Stats and spellcheck do not support this param yet. Queries with shards.tolerant=true and stats=true or spellcheck=on do not work - Key: SOLR-5204 URL: https://issues.apache.org/jira/browse/SOLR-5204 Project: Solr Issue Type: Bug Affects Versions: 4.4 Reporter: Anca Kopetz In a SolrCloud environment with 2 shards, if one server is down:
* when we execute queries with shards.tolerant=true&stats=true, a NullPointerException is thrown
{code}
java.lang.NullPointerException
at org.apache.solr.handler.component.StatsComponent.handleResponses(StatsComponent.java:105)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:722)
{code}
* when we execute queries with shards.tolerant=true&spellcheck=on, a NullPointerException is thrown
{code}
2013-08-26 13:51:42,347 [http-8080-8] ERROR org.apache.solr.servlet.SolrDispatchFilter:log:119 - null:java.lang.NullPointerException
at org.apache.solr.handler.component.SpellCheckComponent.finishStage(SpellCheckComponent.java:323)
{code}
[jira] [Commented] (SOLR-5201) UIMAUpdateRequestProcessor should reuse the AnalysisEngine
[ https://issues.apache.org/jira/browse/SOLR-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753823#comment-13753823 ] Hoss Man commented on SOLR-5201: [~teofili]: I'm not sure what kind of state the AnalysisEngine maintains that might be reused by (or pollute) subsequent requests, but there are two things you could do to cache an AnalysisEngine for various durations, depending on what you're looking for:
* you could cache the engine in the UIMAUpdateRequestProcessor object, and then it will be re-used for each document included in a single update request
* you could cache the engine in the UIMAUpdateRequestProcessorFactory, passing it to each UIMAUpdateRequestProcessor it creates, and then it will be re-used for every document included in every request.
UIMAUpdateRequestProcessor should reuse the AnalysisEngine -- Key: SOLR-5201 URL: https://issues.apache.org/jira/browse/SOLR-5201 Project: Solr Issue Type: Improvement Components: contrib - UIMA Affects Versions: 4.4 Reporter: Tommaso Teofili Assignee: Tommaso Teofili Fix For: 4.5, 5.0 As reported in http://markmail.org/thread/2psiyl4ukaejl4fx, UIMAUpdateRequestProcessor instantiates an AnalysisEngine for each request, which is bad for performance; it would therefore be nice if such AEs could be reused whenever that's possible.
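The second option in the SOLR-5201 comment above (cache in the factory, hand the same instance to every processor) might look roughly like this. The sketch deliberately uses stand-in types: Engine, ProcessorFactory and Processor are hypothetical and are not the UIMA or Solr classes, and the lazy, synchronized creation is one assumed way to do it. Whether a real AnalysisEngine can safely be shared across concurrent requests is exactly the open question raised in the comment, so that would need checking before applying the pattern.

```java
import java.util.function.Supplier;

// Hypothetical stand-ins; not the actual UIMA or Solr types.
final class EngineCachingSketch {

    /** Stands in for an engine that is expensive to create. */
    interface Engine {
        String process(String text);
    }

    /** Stands in for the factory: created once, lives across all requests. */
    static final class ProcessorFactory {
        private final Supplier<Engine> engineSupplier;
        private Engine cached; // guarded by "this"

        ProcessorFactory(Supplier<Engine> engineSupplier) {
            this.engineSupplier = engineSupplier;
        }

        /** Creates the engine on first use and returns the same instance afterwards. */
        private synchronized Engine engine() {
            if (cached == null) {
                cached = engineSupplier.get();
            }
            return cached;
        }

        /** Called once per update request; every processor shares the cached engine. */
        Processor newProcessor() {
            return new Processor(engine());
        }
    }

    /** Stands in for the per-request processor. */
    static final class Processor {
        private final Engine engine;

        Processor(Engine engine) {
            this.engine = engine;
        }

        String processDocument(String text) {
            return engine.process(text);
        }
    }
}
```

The first option from the comment corresponds to constructing the engine inside Processor instead, which limits reuse to the documents of a single request.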
[jira] [Created] (LUCENE-5194) TestBackwardsCompatibility should not test Pulsing41
Michael McCandless created LUCENE-5194: -- Summary: TestBackwardsCompatibility should not test Pulsing41 Key: LUCENE-5194 URL: https://issues.apache.org/jira/browse/LUCENE-5194 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Fix For: 5.0, 4.5 Spinoff from LUCENE-3069, where Billy discovered this ... For some reason it's currently testing a Pulsing41 index (at least index.41.cfs.zip), but we do not guarantee back compat for PulsingPF. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4249) change UniqFieldsUpdateProcessorFactory to subclass FieldValueSubsetUpdateProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753840#comment-13753840 ] ASF subversion and git services commented on SOLR-4249: --- Commit 1518717 from hoss...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1518717 ] SOLR-4249: UniqFieldsUpdateProcessorFactory now extends FieldMutatingUpdateProcessorFactory and supports all of it's selector options change UniqFieldsUpdateProcessorFactory to subclass FieldValueSubsetUpdateProcessorFactory -- Key: SOLR-4249 URL: https://issues.apache.org/jira/browse/SOLR-4249 Project: Solr Issue Type: Improvement Reporter: Hoss Man Assignee: Hoss Man Priority: Minor Attachments: SOLR-4249.patch UniqFieldsUpdateProcessorFactory has been around for a while, but if we change it to subclass FieldValueSubsetUpdateProcessorFactory, a lot of redundant code could be eliminated from that class, and the factory could be made more configurable by supporting all of the field matching logic in FieldMutatingUpdateProcessorFactory, not just a list of field names. (The only new code that would be needed is handling the legacy config case currently supported by UniqFieldsUpdateProcessorFactory.)
[jira] [Commented] (LUCENE-5194) TestBackwardsCompatibility should not test Pulsing41
[ https://issues.apache.org/jira/browse/LUCENE-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753854#comment-13753854 ] ASF subversion and git services commented on LUCENE-5194: - Commit 1518720 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1518720 ] LUCENE-5194: fix 41 test indices to not use PulsingPostingsFormat TestBackwardsCompatibility should not test Pulsing41 Key: LUCENE-5194 URL: https://issues.apache.org/jira/browse/LUCENE-5194 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Fix For: 5.0, 4.5 Spinoff from LUCENE-3069, where Billy discovered this ... For some reason it's currently testing a Pulsing41 index (at least index.41.cfs.zip), but we do not guarantee back compat for PulsingPF. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5194) TestBackwardsCompatibility should not test Pulsing41
[ https://issues.apache.org/jira/browse/LUCENE-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753857#comment-13753857 ] ASF subversion and git services commented on LUCENE-5194: - Commit 1518721 from [~mikemccand] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1518721 ] LUCENE-5194: fix 41 test indices to not use PulsingPostingsFormat TestBackwardsCompatibility should not test Pulsing41 Key: LUCENE-5194 URL: https://issues.apache.org/jira/browse/LUCENE-5194 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Fix For: 5.0, 4.5 Spinoff from LUCENE-3069, where Billy discovered this ... For some reason it's currently testing a Pulsing41 index (at least index.41.cfs.zip), but we do not guarantee back compat for PulsingPF. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5194) TestBackwardsCompatibility should not test Pulsing41
[ https://issues.apache.org/jira/browse/LUCENE-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-5194. Resolution: Fixed TestBackwardsCompatibility should not test Pulsing41 Key: LUCENE-5194 URL: https://issues.apache.org/jira/browse/LUCENE-5194 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Fix For: 5.0, 4.5 Spinoff from LUCENE-3069, where Billy discovered this ... For some reason it's currently testing a Pulsing41 index (at least index.41.cfs.zip), but we do not guarantee back compat for PulsingPF. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances
[ https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5192: --- Attachment: LUCENE-5192.patch Maybe something like this? (for trunk) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances --- Key: LUCENE-5192 URL: https://issues.apache.org/jira/browse/LUCENE-5192 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5192.patch, LUCENE-5192.patch I found it while working on LUCENE-5189. I'll attach a patch with a simple testcase which reproduces the problem and a fix. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4249) change UniqFieldsUpdateProcessorFactory to subclass FieldValueSubsetUpdateProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753907#comment-13753907 ] ASF subversion and git services commented on SOLR-4249: --- Commit 1518746 from hoss...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1518746 ] SOLR-4249: UniqFieldsUpdateProcessorFactory now extends FieldMutatingUpdateProcessorFactory and supports all of it's selector options (merge r1518717) change UniqFieldsUpdateProcessorFactory to subclass FieldValueSubsetUpdateProcessorFactory -- Key: SOLR-4249 URL: https://issues.apache.org/jira/browse/SOLR-4249 Project: Solr Issue Type: Improvement Reporter: Hoss Man Assignee: Hoss Man Priority: Minor Attachments: SOLR-4249.patch UniqFieldsUpdateProcessorFactory has been around for a while, but if we change it to subclass FieldValueSubsetUpdateProcessorFactory, a lot of redundant code could be eliminated from that class, and the factory could be made more configurable by supporting all of the field matching logic in FieldMutatingUpdateProcessorFactory, not just a list of field names. (The only new code that would be needed is handling the legacy config case currently supported by UniqFieldsUpdateProcessorFactory.)
[jira] [Reopened] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances
[ https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened LUCENE-5192: Hmm, that fix wasn't thread safe (the map inside FieldInfos.FieldNumbers is an ordinary HashMap). FieldInfos.Builder failed to catch adding field with different DV type under some circumstances --- Key: LUCENE-5192 URL: https://issues.apache.org/jira/browse/LUCENE-5192 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5192.patch, LUCENE-5192.patch I found it while working on LUCENE-5189. I'll attach a patch with a simple testcase which reproduces the problem and a fix. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5193) Add jar-src to build.xml
[ https://issues.apache.org/jira/browse/LUCENE-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753958#comment-13753958 ] Steve Rowe commented on LUCENE-5193: +1 I was worried that lucene-codecs src jar wouldn't be built -- in my mind it's in the same category as core and test-framework: an internal module -- but it's pulled in by the {{modules-crawl}} macro, which runs over all sub-directories with {{build.xml}} except {{build/}}, {{core/}}, {{test-framework/}}, and {{tools/}}. I'll make another patch for Solr and the top-level {{build.xml}}. Add jar-src to build.xml Key: LUCENE-5193 URL: https://issues.apache.org/jira/browse/LUCENE-5193 Project: Lucene - Core Issue Type: New Feature Components: general/build Reporter: Shai Erera Priority: Minor Attachments: LUCENE-5193.patch, LUCENE-5193.patch I think it's useful if we have a top-level jar-src which generates source jars for all modules. One can basically do that by iterating through the directories and calling 'ant jar-src' already, so this is just a convenient way to do it. Will attach a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances
[ https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754013#comment-13754013 ] Shai Erera commented on LUCENE-5192: Ahh, good catch. I didn't notice FieldNumbers is sync'd. But I think this _if_ is wrong/problematic:
{noformat}
-if (docValues != null) {
+if (!fi.hasDocValues() && docValues != null) {
+  // First time we are seeing doc values type for
+  // this field:
{noformat}
With this fix, if somebody tries to add a field 'f' as NUMERIC and then BINARY, we won't catch it? This is caught today by FI.setDVType, but with this fix, that won't be called? Am I missing something? Perhaps you can add an 'else if' and compare the given type with fi.getDVType(), but that's just duplicating code from FI.setDVType. FieldInfos.Builder failed to catch adding field with different DV type under some circumstances --- Key: LUCENE-5192 URL: https://issues.apache.org/jira/browse/LUCENE-5192 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5192.patch, LUCENE-5192.patch I found it while working on LUCENE-5189. I'll attach a patch with a simple testcase which reproduces the problem and a fix.
[jira] [Commented] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances
[ https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754021#comment-13754021 ] Robert Muir commented on LUCENE-5192: - I don't care about code duplication here. We should not invoke the global synced fieldnumbers shit for every element, only when the setting actually changes FieldInfos.Builder failed to catch adding field with different DV type under some circumstances --- Key: LUCENE-5192 URL: https://issues.apache.org/jira/browse/LUCENE-5192 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5192.patch, LUCENE-5192.patch I found it while working on LUCENE-5189. I'll attach a patch with a simple testcase which reproduces the problem and a fix. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances
[ https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754040#comment-13754040 ] Shai Erera commented on LUCENE-5192: In that case it should change to:
{code:java}
if (docValues != null) {
  if (!fi.hasDocValues()) {
    // First time we are seeing doc values type for
    // this field:
    fi.setDocValuesType(docValues);
    // must also update docValuesType map so it's
    // aware of this field's DocValueType
    globalFieldNumbers.setDocValuesType(fi.number, name, docValues);
  } else if (docValues != fi.getDocValuesType()) {
    // THROW EX
  }
}
{code}
Or, we do this:
{code:java}
if (docValues != null) {
  // only pay the synchronization cost if fi does not already have a DVType
  boolean updateGlobal = !fi.hasDocValues();
  fi.setDocValuesType(docValues); // this will also perform the consistency check.
  if (updateGlobal) {
    globalFieldNumbers.set(...);
  }
}
{code}
Since FieldInfo.setDVType is also called from DocFieldsProcessor, I prefer to try and keep the consistency check in one place. FieldInfos.Builder failed to catch adding field with different DV type under some circumstances --- Key: LUCENE-5192 URL: https://issues.apache.org/jira/browse/LUCENE-5192 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5192.patch, LUCENE-5192.patch I found it while working on LUCENE-5189. I'll attach a patch with a simple testcase which reproduces the problem and a fix.
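For readers who want to run the second variant from the LUCENE-5192 comment above in isolation, here is a self-contained sketch. It uses simplified stand-ins, not the real Lucene classes: DvType, FieldInfo, GlobalFieldNumbers, the syncCalls counter and the exception type are all assumptions made so the behaviour can be observed. The point it illustrates is the one from the thread: the consistency check stays in one place (the field's setter), and the synchronized global map is touched only the first time a field gets a doc values type.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for illustration; not the Lucene classes.
final class DocValuesTypeSketch {

    enum DvType { NUMERIC, BINARY, SORTED }

    /** Stand-in for the globally shared, synchronized field-number registry. */
    static final class GlobalFieldNumbers {
        private final Map<String, DvType> types = new HashMap<>();
        int syncCalls; // counts how often the synchronized path is taken

        synchronized void setDocValuesType(String name, DvType type) {
            syncCalls++;
            types.put(name, type);
        }
    }

    /** Stand-in for a per-field info object that owns the consistency check. */
    static final class FieldInfo {
        final String name;
        private DvType dvType;

        FieldInfo(String name) {
            this.name = name;
        }

        boolean hasDocValues() {
            return dvType != null;
        }

        void setDocValuesType(DvType type) {
            if (dvType != null && dvType != type) {
                throw new IllegalArgumentException(
                    "cannot change doc values type from " + dvType + " to " + type
                        + " for field \"" + name + "\"");
            }
            dvType = type;
        }
    }

    /** Mirrors the second variant: check always, update the global map only once. */
    static void addOrUpdate(FieldInfo fi, DvType docValues, GlobalFieldNumbers global) {
        if (docValues != null) {
            // Only pay the synchronization cost if fi has no type yet.
            boolean updateGlobal = !fi.hasDocValues();
            fi.setDocValuesType(docValues); // also performs the consistency check
            if (updateGlobal) {
                global.setDocValuesType(fi.name, docValues);
            }
        }
    }
}
```

Adding the same field as NUMERIC repeatedly reaches the global registry once; adding it as BINARY afterwards fails in the setter, without any duplicated check in addOrUpdate.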
[jira] [Commented] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances
[ https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754069#comment-13754069 ] Michael McCandless commented on LUCENE-5192: bq. With this fix, if somebody tries to add a field 'f' as NUMERIC and then BINARY, we won't catch it? Actually, we still catch it, because in DocValuesProcessor.addField we always call fieldInfo.setDocValuesType(), so the exc will be thrown from there. Still, I think addOrUpdate *should* fold in the docValues type ... so I'll just go with Shai's 2nd suggestion ... FieldInfos.Builder failed to catch adding field with different DV type under some circumstances --- Key: LUCENE-5192 URL: https://issues.apache.org/jira/browse/LUCENE-5192 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5192.patch, LUCENE-5192.patch I found it while working on LUCENE-5189. I'll attach a patch with a simple testcase which reproduces the problem and a fix. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4478) Allow cores to specify a named config set
[ https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754089#comment-13754089 ]

Erick Erickson commented on SOLR-4478:

I got to thinking about this and trying to take it out of mothballs and I'm starting to think it's a terrible idea for 4.x and should be postponed or abandoned unless and until we do something like what has been discussed elsewhere; having there be one source of truth (ZooKeeper has been discussed for instance). So I'll list out the issues I've thought about and if there are straightforward answers to them I'll be happy to reconsider. Each issue is probably technically do-able, but the sum (and ones I haven't seen yet) totally scare me.

1. Traditional master/slave architectures. Let's say we change the schema (it'd have to be on the master?). How to get that to the slaves? Currently the confFiles directive has an explicit test and will not copy a directory. I'm not convinced it'd even work with relative paths and listing _every_ file in the configset dir would be kludgy at best. And I think the confFiles directive doesn't work outside the conf directory for the core it's replicating anyway. I suppose the user could copy the configset directory to all the nodes in the farm, but

2. The new REST API for modifying the schema. In non-SolrCloud mode, how does that work? Is it only allowed on the master (assuming we can solve 1)? How to enforce?

3. Sharing the solrConfig object is also fraught with issues as discussed above. There's already the share schema option, so at least it's possible to have one shared schema.

4. How to get any changes reloaded in a master/slave environment for all the affected cores on all the machines? You'd need some kind of manual process of going to each one and issuing a new command ReloadAllCores or build in some kind of notification system. Or we'd need to require the user to keep a list of all the nodes and all the cores and script reloading them all. Nobody should be re-inventing ZooKeeper.

5. How to get any changes reloaded in even the non master/slave environment for all the affected cores? A new command? Periodic polling? Check every query/update request?

6. Sticky wickets I haven't thought of yet, I'm afraid, very afraid...

Each of these is solvable, but considering the effort involved it doesn't seem like it's worth pursuing right now, at least my interest is disappearing. And wrapped around this is that SolrCloud already handles most of the things I'm worried about, especially getting changes propagated to all the right places in the cluster. SolrCloud already has a way to reload all the nodes that take part in a collection. SolrCloud already has the notifications of changes to the config set built in (at least I think, if not it will).

My feeling at this point is that supporting this well would turn into a huge amount of work _that would then be thrown away_ if we go to a one source of truth model in Solr5 (or even 6). And that actually _using_ the capability would be fragile and complex. So unless I can be convinced otherwise, I'm going to assign this back to nobody and forget about it.

Allow cores to specify a named config set
Key: SOLR-4478
URL: https://issues.apache.org/jira/browse/SOLR-4478
Project: Solr
Issue Type: Improvement
Affects Versions: 4.2, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Attachments: SOLR-4478.patch, SOLR-4478.patch

Part of moving forward to the new way, after SOLR-4196 etc... I propose an additional parameter specified on the core node in solr.xml or as a parameter in the discovery mode core.properties file, call it configSet, where the value provided is a path to a directory, either absolute or relative. Really, this is as though you copied the conf directory somewhere to be used by more than one core.
Straw-man: There will be a directory solr_home/configsets which will be the default. If the configSet parameter is, say, myconf, then I'd expect a directory named myconf to exist in solr_home/configsets, which would look something like

solr_home/configsets/myconf/schema.xml
                            solrconfig.xml
                            stopwords.txt
                            velocity
                            velocity/query.vm
etc.

If multiple cores used the same configSet, schema, solrconfig etc. would all be shared (i.e. shareSchema=true would be assumed). I don't see a good use-case for _not_ sharing schemas, so I don't propose to allow this to be turned off. Hmmm, what if shareSchema is explicitly set to false in the solr.xml or properties file? I'd guess it should be honored but maybe log a warning? Mostly I'm putting this up for
[jira] [Commented] (SOLR-4249) change UniqFieldsUpdateProcessorFactory to subclass FieldValueSubsetUpdateProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754094#comment-13754094 ]

ASF subversion and git services commented on SOLR-4249:

Commit 1518836 from hoss...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1518836 ]
SOLR-4249: remove legacy UniqFieldsUpdateProcessorFactory init param syntax from trunk for 5.0

change UniqFieldsUpdateProcessorFactory to subclass FieldValueSubsetUpdateProcessorFactory
Key: SOLR-4249
URL: https://issues.apache.org/jira/browse/SOLR-4249
Project: Solr
Issue Type: Improvement
Reporter: Hoss Man
Assignee: Hoss Man
Priority: Minor
Attachments: SOLR-4249.patch

UniqFieldsUpdateProcessorFactory has been around for a while, but if we change it to subclass FieldValueSubsetUpdateProcessorFactory, a lot of redundant code could be eliminated from that class, and the factory could be made more configurable by supporting all of the field matching logic in FieldMutatingUpdateProcessorFactory, not just a list of field names. (The only new code that would be needed is handling the legacy config case currently supported by UniqFieldsUpdateProcessorFactory.)
[jira] [Resolved] (SOLR-4249) change UniqFieldsUpdateProcessorFactory to subclass FieldValueSubsetUpdateProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved SOLR-4249.
Resolution: Fixed

change UniqFieldsUpdateProcessorFactory to subclass FieldValueSubsetUpdateProcessorFactory
Key: SOLR-4249
URL: https://issues.apache.org/jira/browse/SOLR-4249
Project: Solr
Issue Type: Improvement
Reporter: Hoss Man
Assignee: Hoss Man
Priority: Minor
Fix For: 4.5, 5.0
Attachments: SOLR-4249.patch

UniqFieldsUpdateProcessorFactory has been around for a while, but if we change it to subclass FieldValueSubsetUpdateProcessorFactory, a lot of redundant code could be eliminated from that class, and the factory could be made more configurable by supporting all of the field matching logic in FieldMutatingUpdateProcessorFactory, not just a list of field names. (The only new code that would be needed is handling the legacy config case currently supported by UniqFieldsUpdateProcessorFactory.)
---
For users of 4.x starting with 4.5, the existing init param syntax will still be supported, but a warning will be logged recommending they switch to using {{<arr name="fieldName">...</arr>}} instead of {{<lst name="fields">...</lst>}}. Starting with 5.0, the fields option won't be recognized at all.
[jira] [Updated] (SOLR-4249) change UniqFieldsUpdateProcessorFactory to subclass FieldValueSubsetUpdateProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-4249:

Description:
UniqFieldsUpdateProcessorFactory has been around for a while, but if we change it to subclass FieldValueSubsetUpdateProcessorFactory, a lot of redundant code could be eliminated from that class, and the factory could be made more configurable by supporting all of the field matching logic in FieldMutatingUpdateProcessorFactory, not just a list of field names. (The only new code that would be needed is handling the legacy config case currently supported by UniqFieldsUpdateProcessorFactory.)
---
For users of 4.x starting with 4.5, the existing init param syntax will still be supported, but a warning will be logged recommending they switch to using {{<arr name="fieldName">...</arr>}} instead of {{<lst name="fields">...</lst>}}. Starting with 5.0, the fields option won't be recognized at all.

was:
UniqFieldsUpdateProcessorFactory has been around for a while, but if we change it to subclass FieldValueSubsetUpdateProcessorFactory, a lot of redundant code could be eliminated from that class, and the factory could be made more configurable by supporting all of the field matching logic in FieldMutatingUpdateProcessorFactory, not just a list of field names. (The only new code that would be needed is handling the legacy config case currently supported by UniqFieldsUpdateProcessorFactory.)

Fix Version/s: 5.0, 4.5

change UniqFieldsUpdateProcessorFactory to subclass FieldValueSubsetUpdateProcessorFactory
Key: SOLR-4249
URL: https://issues.apache.org/jira/browse/SOLR-4249
Project: Solr
Issue Type: Improvement
Reporter: Hoss Man
Assignee: Hoss Man
Priority: Minor
Fix For: 4.5, 5.0
Attachments: SOLR-4249.patch
[jira] [Updated] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
[ https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-5191:
Attachment: LUCENE-5191.patch

Attached is a new patch also escaping the single quote (') and the forward slash (although the latter is not really required, but I did this to make Robert happy). I refuse to encode the Latin1 chars. I will commit this in a minute.

SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
Key: LUCENE-5191
URL: https://issues.apache.org/jira/browse/LUCENE-5191
Project: Lucene - Core
Issue Type: Bug
Components: modules/highlighter
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 5.0, 4.5
Attachments: LUCENE-5191.patch, LUCENE-5191.patch

The highlighter provides a function to escape HTML, which does too much. To create valid HTML only &, <, > and " must be escaped; everything else can be kept unescaped. The escaper unfortunately also escapes everything > 127, which is unneeded if your web site has the correct encoding. It also produces huge amounts of HTML entities if used with eastern languages. This would not be a bug if the escaping were correct, but it isn't; it escapes like this:

{{result.append("&#").append((int)ch).append(";");}}

So it escapes not the Unicode code point (as HTML needs) but the UTF-16 char, which is incorrect. E.g. for our all-time favourite Deseret: U+10400 (DESERET CAPITAL LETTER LONG I) would be escaped as {{&#55297;&#56320;}} and not as {{&#66560;}}. So we should remove the stupid encoding of chars > 127, which is simply useless :-)

See also: https://github.com/elasticsearch/elasticsearch/issues/3587
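To illustrate the bug described in LUCENE-5191, here is a minimal Java sketch. It is not the SimpleHTMLEncoder source and not the committed patch: the class and method names are made up for this example, and the single quote and forward slash handling mentioned in the patch comment is left out. It contrasts escaping each UTF-16 char with escaping each code point, and adds a minimal escaper that leaves text above 127 untouched.

```java
final class HtmlEscapeSketch {

    /** Appends one code unit or code point, escaping only markup characters by name. */
    private static void append(StringBuilder out, int value, boolean numericAbove127) {
        switch (value) {
            case '&': out.append("&amp;"); break;
            case '<': out.append("&lt;"); break;
            case '>': out.append("&gt;"); break;
            case '"': out.append("&quot;"); break;
            default:
                if (numericAbove127 && value > 127) {
                    out.append("&#").append(value).append(';');
                } else {
                    out.appendCodePoint(value);
                }
        }
    }

    /** Buggy variant: every UTF-16 char above 127 gets its own numeric reference. */
    static String escapePerChar(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            append(out, text.charAt(i), true);
        }
        return out.toString();
    }

    /** Correct numeric escaping: one reference per Unicode code point. */
    static String escapePerCodePoint(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            append(out, cp, true);
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    /** Minimal variant: escape markup characters only, copy everything else as-is. */
    static String escapeMinimal(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            // Surrogates are copied unchanged, so pairs stay intact.
            append(out, text.charAt(i), false);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String deseret = new String(Character.toChars(0x10400));
        System.out.println(escapePerChar(deseret));      // &#55297;&#56320;
        System.out.println(escapePerCodePoint(deseret)); // &#66560;
        System.out.println(escapeMinimal("a < b"));      // a &lt; b
    }
}
```

The per-char variant splits U+10400 into its two surrogates, which is exactly the invalid output quoted in the issue. The minimal variant sidesteps the problem by never emitting numeric references at all, which matches the direction the issue proposes.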
[jira] [Commented] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
[ https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754110#comment-13754110 ]

ASF subversion and git services commented on LUCENE-5191:

Commit 1518839 from [~thetaphi] in branch 'dev/trunk' [ https://svn.apache.org/r1518839 ]
LUCENE-5191: Fix Unicode corrumption in HTML escaping of Standard Highlighter and Fast Vector Highlighter.

SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
Key: LUCENE-5191
URL: https://issues.apache.org/jira/browse/LUCENE-5191
Project: Lucene - Core
Issue Type: Bug
Components: modules/highlighter
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 5.0, 4.5
Attachments: LUCENE-5191.patch, LUCENE-5191.patch
[jira] [Updated] (LUCENE-5193) Add jar-src to build.xml
[ https://issues.apache.org/jira/browse/LUCENE-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated LUCENE-5193: --- Attachment: LUCENE-5193.patch This patch incorporates Shai's Lucene patch, and adds Solr and top-level {{jar-src}} targets. I also took the opportunity to fix up Solr's {{jar-src}} specialization (needed for Solr-specific manifest entries) to be like Lucene's: the {{$\{build.dir}}} is created, and the module's {{src/resources/**}} are included (only solr-uima and solr-langid have these at this point). I think it's ready to go - if you like, Shai, I can commit. Add jar-src to build.xml Key: LUCENE-5193 URL: https://issues.apache.org/jira/browse/LUCENE-5193 Project: Lucene - Core Issue Type: New Feature Components: general/build Reporter: Shai Erera Priority: Minor Attachments: LUCENE-5193.patch, LUCENE-5193.patch, LUCENE-5193.patch I think it's useful if we have a top-level jar-src which generates source jars for all modules. One can basically do that by iterating through the directories and calling 'ant jar-src' already, so this is just a convenient way to do it. Will attach a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
[ https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754112#comment-13754112 ]

ASF subversion and git services commented on LUCENE-5191:

Commit 1518840 from [~thetaphi] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1518840 ]
Merged revision(s) 1518839 from lucene/dev/trunk: LUCENE-5191: Fix Unicode corrumption in HTML escaping of Standard Highlighter and Fast Vector Highlighter.

SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
Key: LUCENE-5191
URL: https://issues.apache.org/jira/browse/LUCENE-5191
Project: Lucene - Core
Issue Type: Bug
Components: modules/highlighter
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 5.0, 4.5
Attachments: LUCENE-5191.patch, LUCENE-5191.patch
[jira] [Commented] (LUCENE-5193) Add jar-src to build.xml
[ https://issues.apache.org/jira/browse/LUCENE-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754118#comment-13754118 ] Uwe Schindler commented on LUCENE-5193: --- Thanks, looks good! Especially as the resources are now in the source JAR, which is done by the maven archiver plugin, too. Thanks also for adding the info text on top-level build, so {{ant}} prints it in the usage help. Add jar-src to build.xml Key: LUCENE-5193 URL: https://issues.apache.org/jira/browse/LUCENE-5193 Project: Lucene - Core Issue Type: New Feature Components: general/build Reporter: Shai Erera Priority: Minor Attachments: LUCENE-5193.patch, LUCENE-5193.patch, LUCENE-5193.patch I think it's useful if we have a top-level jar-src which generates source jars for all modules. One can basically do that by iterating through the directories and calling 'ant jar-src' already, so this is just a convenient way to do it. Will attach a patch shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
[ https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-5191.
Resolution: Fixed

SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
Key: LUCENE-5191
URL: https://issues.apache.org/jira/browse/LUCENE-5191
Project: Lucene - Core
Issue Type: Bug
Components: modules/highlighter
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 5.0, 4.5
Attachments: LUCENE-5191.patch, LUCENE-5191.patch
[jira] [Commented] (SOLR-5194) Annecdotal reports of what smells like thread safety issues with concurrent partial updates?
[ https://issues.apache.org/jira/browse/SOLR-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754144#comment-13754144 ]

Yonik Seeley commented on SOLR-5194:

After reviewing all the issues, I don't think this is due to any thread safety issues, but due to partial support for BigDecimal.

Annecdotal reports of what smells like thread safety issues with concurrent partial updates?
Key: SOLR-5194
URL: https://issues.apache.org/jira/browse/SOLR-5194
Project: Solr
Issue Type: Bug
Reporter: Hoss Man

In SOLR-4021 two users reported seeing errors similar to the crux of that issue (ie: JavaBinCodec errors) only when doing bulk document adds while concurrently using partial updates. This smells like a thread safety issue around the transaction log -- opening a new issue in the hopes that they can post specific stack traces here since it seems to be a distinct problem.
[jira] [Commented] (SOLR-4021) JavaBinCodec has poor default behavior for unrecognized classes of objects
[ https://issues.apache.org/jira/browse/SOLR-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754152#comment-13754152 ]

Yonik Seeley commented on SOLR-4021:

It looks like DIH can produce BigDecimal values, which historically did not have support in Solr, and currently only have partial support. Either DIH needs to be changed to avoid BigDecimal, or we need to add better BigDecimal support (at a minimum, to the JavaBin format, and perhaps to atomic updates too).

JavaBinCodec has poor default behavior for unrecognized classes of objects
Key: SOLR-4021
URL: https://issues.apache.org/jira/browse/SOLR-4021
Project: Solr
Issue Type: Bug
Components: clients - java
Affects Versions: 4.0
Reporter: Hoss Man

It seems that JavaBinCodec has inconsistent serialize/deserialize behavior when dealing with objects of classes that it doesn't recognize. In particular, unrecognized objects seem to be serialized with the full classname prepended to the toString() value, and then that resulting concatenated string is left as is during deserialization. As a concrete example: serializing and deserializing a BigDecimal value results in a final value like java.math.BigDecimal:1848.66, even though for most users the simple toString() value would have worked as intended.
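The round trip described in SOLR-4021 can be imitated without Solr on the classpath. The sketch below does not call JavaBinCodec; fallbackEncode is a hypothetical function that only mimics the reported behaviour (class name, a colon, then toString()), and normalize is one possible client-side workaround that is assumed here rather than taken from the thread.

```java
import java.math.BigDecimal;

final class UnknownTypeFallbackSketch {

    /** Mimics the reported fallback for unrecognized classes: "<class name>:<toString()>". */
    static String fallbackEncode(Object value) {
        return value.getClass().getName() + ':' + value.toString();
    }

    /** Assumed workaround: turn BigDecimal into a plain string before handing it over. */
    static Object normalize(Object value) {
        if (value instanceof BigDecimal) {
            return ((BigDecimal) value).toPlainString();
        }
        return value;
    }

    public static void main(String[] args) {
        BigDecimal price = new BigDecimal("1848.66");
        System.out.println(fallbackEncode(price)); // java.math.BigDecimal:1848.66
        System.out.println(normalize(price));      // 1848.66
    }
}
```

Because the prefixed string is left as is on the reading side, a numeric field would receive a value it cannot parse, which fits the errors reported when DIH produces BigDecimal values.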
[jira] [Commented] (SOLR-5198) Make default similarty configurable
[ https://issues.apache.org/jira/browse/SOLR-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754236#comment-13754236 ]

Feihong Huang commented on SOLR-5198:

Making the default similarity configurable may make sense. For example, we could use BM25Similarity instead of TFIDFSimilarity just by modifying the configuration, rather than writing another custom SchemaSimilarityFactory.

Make default similarty configurable
Key: SOLR-5198
URL: https://issues.apache.org/jira/browse/SOLR-5198
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Affects Versions: 4.4
Reporter: HeXin
Priority: Minor
Fix For: 4.5, 5.0

Though the code supports customizing scoring on a per-field basis by using <similarity/> in a schema's fieldType, and we can configure a custom similarity factory in the schema, we can't configure the default similarity: it is hardcoded in SchemaSimilarityFactory. If we want to use another similarity as the default instead of the DefaultSimilarity provided by Lucene, we must write another similarity factory to do this. Therefore, it is necessary to make the default similarity configurable. Any comments are welcome.
[jira] [Commented] (SOLR-5198) Make default similarty configurable
[ https://issues.apache.org/jira/browse/SOLR-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754271#comment-13754271 ]

Shawn Heisey commented on SOLR-5198:

I am using BM25 without any custom code. Here's the top of my schema.xml:

{noformat}
<schema name="ncdat" version="1.5">
<similarity class="solr.BM25SimilarityFactory"/>
{noformat}

Make default similarty configurable
Key: SOLR-5198
URL: https://issues.apache.org/jira/browse/SOLR-5198
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Affects Versions: 4.4
Reporter: HeXin
Priority: Minor
Fix For: 4.5, 5.0
[jira] [Resolved] (SOLR-5058) org.apache.solr.update.PeerSync Logging Warning Typo
[ https://issues.apache.org/jira/browse/SOLR-5058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved SOLR-5058.
Resolution: Fixed
Fix Version/s: 5.0, 4.5
Assignee: Hoss Man

Thanks for reporting this Thomas

org.apache.solr.update.PeerSync Logging Warning Typo
Key: SOLR-5058
URL: https://issues.apache.org/jira/browse/SOLR-5058
Project: Solr
Issue Type: Bug
Affects Versions: 4.3
Reporter: Thomas Murphy
Assignee: Hoss Man
Priority: Trivial
Labels: easyfix
Fix For: 4.5, 5.0
Original Estimate: 5m
Remaining Estimate: 5m

Log entry appears on Solr Admin Logging interface:

WARN PeerSync no frame of reference to tell of we've missed updates

There is a typo; this looks like it should read "to tell if we've". PeerSync expands to org.apache.solr.update.PeerSync.
[jira] [Commented] (SOLR-5058) org.apache.solr.update.PeerSync Logging Warning Typo
[ https://issues.apache.org/jira/browse/SOLR-5058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754282#comment-13754282 ] ASF subversion and git services commented on SOLR-5058: --- Commit 1518874 from hoss...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1518874 ] SOLR-5058: log msg typo (merge r1518872) org.apache.solr.update.PeerSync Logging Warning Typo Key: SOLR-5058 URL: https://issues.apache.org/jira/browse/SOLR-5058 Project: Solr Issue Type: Bug Affects Versions: 4.3 Reporter: Thomas Murphy Priority: Trivial Labels: easyfix Original Estimate: 5m Remaining Estimate: 5m Log entry appears on Solr Admin Logging interface: WARN PeerSyncno frame of reference to tell of we've missed updates There is a typo, this looks like it should read to tell if we've PeerSync expands to org.apache.solr.update.PeerSync -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5058) org.apache.solr.update.PeerSync Logging Warning Typo
[ https://issues.apache.org/jira/browse/SOLR-5058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754277#comment-13754277 ] ASF subversion and git services commented on SOLR-5058: --- Commit 1518872 from hoss...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1518872 ] SOLR-5058: log msg typo org.apache.solr.update.PeerSync Logging Warning Typo Key: SOLR-5058 URL: https://issues.apache.org/jira/browse/SOLR-5058 Project: Solr Issue Type: Bug Affects Versions: 4.3 Reporter: Thomas Murphy Priority: Trivial Labels: easyfix Original Estimate: 5m Remaining Estimate: 5m Log entry appears on Solr Admin Logging interface: WARN PeerSync no frame of reference to tell of we've missed updates There is a typo, this looks like it should read "to tell if we've". PeerSync expands to org.apache.solr.update.PeerSync
[jira] [Commented] (LUCENE-5194) TestBackwardsCompatibility should not test Pulsing41
[ https://issues.apache.org/jira/browse/LUCENE-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754307#comment-13754307 ] Han Jiang commented on LUCENE-5194: --- Thanks Mike! TestBackwardsCompatibility should not test Pulsing41 Key: LUCENE-5194 URL: https://issues.apache.org/jira/browse/LUCENE-5194 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Fix For: 5.0, 4.5 Spinoff from LUCENE-3069, where Billy discovered this ... For some reason it's currently testing a Pulsing41 index (at least index.41.cfs.zip), but we do not guarantee back compat for PulsingPF.
[jira] [Commented] (LUCENE-5193) Add jar-src to build.xml
[ https://issues.apache.org/jira/browse/LUCENE-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754347#comment-13754347 ] Shai Erera commented on LUCENE-5193: Looks good Steve. Feel free to commit. I'm not sure I'll be able to today. Add jar-src to build.xml Key: LUCENE-5193 URL: https://issues.apache.org/jira/browse/LUCENE-5193 Project: Lucene - Core Issue Type: New Feature Components: general/build Reporter: Shai Erera Priority: Minor Attachments: LUCENE-5193.patch, LUCENE-5193.patch, LUCENE-5193.patch I think it's useful if we have a top-level jar-src which generates source jars for all modules. One can basically do that by iterating through the directories and calling 'ant jar-src' already, so this is just a convenient way to do it. Will attach a patch shortly.
[jira] [Commented] (SOLR-5203) Strengthen the function of Min should match, making it select BooleanClause as Occur.MUST according to the weight of query
[ https://issues.apache.org/jira/browse/SOLR-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754360#comment-13754360 ] yinyue commented on SOLR-5203: -- Good feature, it's useful when we have weighted terms. Strengthen the function of Min should match, making it select BooleanClause as Occur.MUST according to the weight of query -- Key: SOLR-5203 URL: https://issues.apache.org/jira/browse/SOLR-5203 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.4 Reporter: HeXin Priority: Minor Fix For: 4.5, 5.0 In some cases, we want the value of mm to select which BooleanClauses become Occur.MUST according to the weight of the query. A clause can be selected as Occur.MUST only if its weight is larger than the threshold. The threshold is configurable and defaults to the minimum integer. Any comments are welcome.
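The threshold rule described in SOLR-5203 can be illustrated with a minimal sketch. This is not Solr or Lucene code and not the reporter's patch: the class name, the method eligibleForMust, and the plain float weights are all hypothetical, and it assumes one reading of the proposal, namely that the threshold only decides which clauses mm is allowed to promote to Occur.MUST. Under that reading the default threshold (the minimum integer) leaves every clause eligible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Solr or Lucene.
final class WeightedMinShouldMatchSketch {

    // Default named in the issue description: the minimum integer.
    static final int DEFAULT_THRESHOLD = Integer.MIN_VALUE;

    // Returns the indices of clauses whose weight is strictly larger than
    // the threshold, i.e. the clauses that may be selected as Occur.MUST.
    static List<Integer> eligibleForMust(float[] weights, int threshold) {
        List<Integer> eligible = new ArrayList<>();
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] > threshold) {
                eligible.add(i);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        float[] weights = {0.5f, 2.0f, 3.5f};
        // Threshold 1: only the second and third clauses qualify.
        System.out.println(eligibleForMust(weights, 1));
        // Default threshold: every clause qualifies.
        System.out.println(eligibleForMust(weights, DEFAULT_THRESHOLD));
    }
}
```

How mm would then choose among the eligible clauses is not specified in the issue, so the sketch stops at the eligibility check.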
[jira] [Commented] (SOLR-5198) Make default similarty configurable
[ https://issues.apache.org/jira/browse/SOLR-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754372#comment-13754372 ] HeXin commented on SOLR-5198: - Hi Shawn, you are right. But I think you wrote the class BM25SimilarityFactory first, and maybe its only function is to provide BM25Similarity as the default similarity. Maybe I have not described the feature clearly. I just want the two scenarios below to be possible purely by modifying schema.xml. 1. We want to use a default similarity other than TFIDFSimilarity. 2. We want per-field support, with BM25Similarity as the default similarity for the fields that do not configure a similarity. I think we can support this without any custom code. Make default similarty configurable --- Key: SOLR-5198 URL: https://issues.apache.org/jira/browse/SOLR-5198 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.4 Reporter: HeXin Priority: Minor Fix For: 4.5, 5.0 Although the code supports customizing scoring on a per-field basis by using <similarity/> in a schema's fieldType, and we can configure a custom similarity factory in the schema, we can't configure the default similarity: it is hardcoded in SchemaSimilarityFactory. If we want to use another similarity as the default instead of the DefaultSimilarity provided by Lucene, we must write another similarity factory to do this. Therefore, it is necessary to make the default similarity configurable. Any comments are welcome.
[jira] [Commented] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances
[ https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754381#comment-13754381 ] Shai Erera commented on LUCENE-5192: Ahh that explains it. +1 to commit the synchronization fix! FieldInfos.Builder failed to catch adding field with different DV type under some circumstances --- Key: LUCENE-5192 URL: https://issues.apache.org/jira/browse/LUCENE-5192 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5192.patch, LUCENE-5192.patch I found it while working on LUCENE-5189. I'll attach a patch with a simple testcase which reproduces the problem and a fix.