[jira] [Updated] (LUCENE-9328) Sorting by DocValues while grouping is slower than old good FieldCache
[ https://issues.apache.org/jira/browse/LUCENE-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-9328: - Issue Type: Improvement (was: Bug) Priority: Major (was: Minor) > Sorting by DocValues while grouping is slower than old good FieldCache > -- > > Key: LUCENE-9328 > URL: https://issues.apache.org/jira/browse/LUCENE-9328 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/grouping >Reporter: Mikhail Khludnev >Assignee: Mikhail Khludnev >Priority: Major > Attachments: LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, > LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, > LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, > LUCENE-9328.patch > > Time Spent: 2h 40m > Remaining Estimate: 0h > > That's why > https://issues.apache.org/jira/browse/LUCENE-7701?focusedCommentId=17084365=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17084365 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9493) Remove obsolete dev-tools/{idea,netbeans,maven} folders
[ https://issues.apache.org/jira/browse/LUCENE-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190983#comment-17190983 ] David Smiley commented on LUCENE-9493: -- There are a couple of nice things about the IntelliJ project that come to mind: * a code style! -- dev-tools/idea/.idea/codeStyleSettings.xml * an ASF copyright profile. -- dev-tools/idea/.idea/copyright/Apache_Software_Foundation.xml I think it would be helpful to provide the XML files for both of these in a simple "intellij" folder there. These can be imported manually into IntelliJ for an existing project. CC [~sarowe] I believe you created the IntelliJ config long ago, or were at least actively involved. > Remove obsolete dev-tools/{idea,netbeans,maven} folders > --- > > Key: LUCENE-9493 > URL: https://issues.apache.org/jira/browse/LUCENE-9493 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > Fix For: master (9.0) > > Time Spent: 20m > Remaining Estimate: 0h > > I don't think they're used or applicable anymore. Thoughts?
[jira] [Commented] (LUCENE-9461) Query hit highlighting components on top of matches API
[ https://issues.apache.org/jira/browse/LUCENE-9461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190981#comment-17190981 ] David Smiley commented on LUCENE-9461: -- Maybe not as a sub-task, but would it make sense to modify the UnifiedHighlighter to use some of these components, thereby reducing redundancy? As I say this, I look at some of these new components and maybe not (yet)... but maybe I'll see it better once you get to the example task. > Query hit highlighting components on top of matches API > --- > > Key: LUCENE-9461 > URL: https://issues.apache.org/jira/browse/LUCENE-9461 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > Fix For: master (9.0) > > > Highlighters. Eventually, you'll have to face them. > When a Lucene Query is ran over an index, it implies a list of documents that > "matched it" - literally a boolean indication of whether the document should > be included in the search result or not. In practice, many applications need > to convey to users not just the fact that a document matched the query but > also some sort of intuitive explanation of *why* this particular query > matched it. While in many cases the relationship is trivial (term > containment), in case of complex queries it may not be trivial at all (think > of a really short prefix query, a fuzzy term query or even a Boolean > disjunction with a high number of possibilities). > Historically, search engines used to "highlight" the source area of a > document that caused the "hit". If a document was too long, it was truncated > and only the area around the hit (or hits) was displayed (so called > "snippet"). > In my subjective opinion, in the Lucene API highlighters have played a > secondary role to queries and search. And once you're trying to build > something higher-level, highlighters are a crucial and necessary element of > the entire system. 
> My experience (and users feedback) from an implementation of a document > retrieval system where highlighting was involved was that it just didn't work > as expected. Here are the requirements of that system: > * the query parser uses default field expansion into multiple fields (there > is no single "sink" field), > * the highlights should match *exactly* what caused the hit; a search for > 'title:foo' must not highlight foo in any other field, > * the set of fields to be highlighted isn't really fixed - there are some > fields that should always be displayed - title, summary - and others that > should not be displayed unless they're part of the query (in which case the > highlight is important and should be shown to the user). > * highlights should be accurate for all sorts of queries: fuzzy, phrase, > prefix, Boolean, spans, etc., > * there can be more than one query at one time and they should highlight the > same content (with different colors). > Many highlighters are available in Lucene (vector highlighter, postings > highlighter, unified highlighter) but none of them quite fit the bill above. > Believe me - we have tried (hard). We ended up using unified highlighter but > with subclassing, customizations and all sorts of complex, low-level quirks. > My gut feeling at that point was that it should be the Query that somehow > *exposes* the information about how a given field content matched. Then I > looked at matches API and built a quick prototype retrieving "match regions" > on top of that. It works like magic. Here are the key insights: > * matches API returns exactly what a highlighter needs: for a given query it > iterates over fields and positions (including offsets, if they are available) > that caused a document to be included in the search result, > * when matches API cannot provide offsets, it provides elements from which > offsets can be computed: positions by re-analyzing the field's value, for > example. 
> * in extreme cases it may happen that the matches API doesn't provide anything > useful (a field only indexed, with no stored field value, no positions, no > offsets) but I assume it is up to the application layer to know how to deal > with this then (or not deal with it at all and throw an exception). > * matches API delegates the work of providing proper match ranges to the > query itself (actually, to the weight a query produces), it doesn't need to > know anything about different implementations and their specifics. > The absolute *key* element is the last one. Once you build a match region > retriever, highlighting is merely about organizing match ranges, dealing > with potential overlaps, and proper formatting. It becomes a simple, > tractable problem separated from the internals of Lucene Queries. > The initial set of
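The description above reduces highlighting to organizing match ranges and dealing with overlaps. A minimal sketch of that one step follows, assuming match regions have already been retrieved as character offsets. The class `MatchRanges`, its `merge` method, and the `[start, end)` pair layout are hypothetical and are not part of the Lucene matches API or of the patch under discussion.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: merges overlapping match regions before formatting. */
final class MatchRanges {

  /**
   * Each range is a two-element array {start, end} with end exclusive.
   * Returns a new list of non-overlapping ranges sorted by start offset.
   */
  static List<int[]> merge(List<int[]> ranges) {
    List<int[]> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.comparingInt((int[] r) -> r[0]));

    List<int[]> merged = new ArrayList<>();
    for (int[] r : sorted) {
      int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
      if (last != null && r[0] <= last[1]) {
        // Overlapping or touching the previous range: extend it.
        last[1] = Math.max(last[1], r[1]);
      } else {
        // Copy so the caller's arrays are never mutated.
        merged.add(new int[] {r[0], r[1]});
      }
    }
    return merged;
  }
}
```

For example, the ranges {5, 8}, {0, 6} and {10, 12} collapse to {0, 8} and {10, 12}, which a formatter could then wrap in highlight tags without nesting problems.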
[jira] [Commented] (LUCENE-9498) Move matchhighlighter to a separate subproject
[ https://issues.apache.org/jira/browse/LUCENE-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190979#comment-17190979 ] David Smiley commented on LUCENE-9498: -- I think it would be weird to separate one highlighter from the "highlighter" module simply because of these dependencies. The "memory" (MemoryIndex) dependency is fantastic for re-analysis of stored text. It's so useful and so small... I kinda wonder if it'd be better off in lucene-core. Even spatial is in lucene-core these days! The "queries" dependency is only there because the other highlighters detect certain Query subclasses there to know how to highlight them. The Matches API makes that approach obsolete. The new "matches" highlighter/framework exclusively uses that new API, and the UnifiedHighlighter is dual-mode; it can use it or not, as one prefers. There's an issue to make it use this by default starting in 9.0. > Move matchhighlighter to a separate subproject > -- > > Key: LUCENE-9498 > URL: https://issues.apache.org/jira/browse/LUCENE-9498 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > > This is a trivial thing to do (on master at least). Match highlighter has no > other dependencies. It sort of fits in the "highlighter" package but this > package depends on {{queries}} and {{memory}} packages. I wonder if we should > move it to a separate subproject?
[jira] [Commented] (SOLR-14833) Empty highlight entry on match only for some queries
[ https://issues.apache.org/jira/browse/SOLR-14833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190976#comment-17190976 ] David Smiley commented on SOLR-14833: - Can you try {{hl.method=unified}}? Another debugging aid here is debug=query which will give some insights into the query representation. > Empty highlight entry on match only for some queries > -- > > Key: SOLR-14833 > URL: https://issues.apache.org/jira/browse/SOLR-14833 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: search >Affects Versions: 8.6.2 >Reporter: Hossameldin Khalifa >Priority: Critical > Attachments: Screen Shot 1442-01-17 at 3.55.53 AM.png, Screen Shot > 1442-01-17 at 3.56.05 AM.png, Screen Shot 1442-01-17 at 3.56.16 AM.png > > > Solr Input : Solr Input : > ```json\{ "query": "text:(\"ما جرى بين الصحابة\" الماتريدي)", > "fields": "book_id,author_id,cat_id,meta,id,text", "params": {" > rows": 20, "start": 0, "hl": "true", "hl.fl": > "text_highlighting,text_highlighting_copy", "hl.fragmenter": "regex", > "hl.q": "text_highlighting:(\"ما جرى بين الصحابة\" الماتريدي)", > "f.text_highlighting.hl.fragsize": 110, > "f.text_highlighting_copy.hl.fragsize": 0} }}``` > For exactly this text `"ما جرى بين الصحابة" الماتريدي` and some other queries > the highlights have some empty matches.I checked if the indexes don"t have > text stored in them but they seem to look like all other indexes. > Here is an example of some part of the output of the > highlighter:```"d3108d2d-1344-458c-8c28-0639f82b274e": \{"text_highlighting": > [" الصحابة \ufd43: إن من الأمور المهمة التي ينبغي للداعية أن يعرض عنها ولا > يخوض فيها ما جرى بين الصحابة \ufd43، وما > حصل لبعضهم"], "text_highlighting_copy": ["فاجهد علي جهدك! 
"، وقال \ufd41 في > الدفاع عن عثمان حين سأل هذا الضال: "أما عثمان فكان الله قد عفا عنه وكرهتم أن > تعفوا عنه، وأما عليّ فابن عمّ رسول الله ﷺ وختنه"، ثم أخذ يذكر من محاسن علي > وعثمان \ufd44 حتى أفحم هذا الضال فذهب خائبا، وقال له ابن عمر \ufd41: "اذهب > بهذا الآن معك"، قال العيني \ufd40: أي اقرن هذا العذر بالجواب حتى لا يبقى لك > فيما أجبتك به حجة على ما كنت تعتقد" (1) فينبغي للداعية أن يدافع عن الصحابة > \ufd43 وعن أئمة الهدى من علماء أهل السنة والجماعة، ولكن بالحكمة والموعظة > الحسنة، والجدال بالحسنى.\nرابعا: من أساليب الدعوة: استخدام الشدة مع بعض > المدعوين: الأصل في الأساليب في الدعوة إلى الله \ufdff الرفق واللين، ولكن من > المدعوين من لا يجدي ولا ينفع فيه ومعه إلا الشدة والقوة؛ ولهذا استخدم عبد الله > بن عمر \ufd41 أسلوب الشدة مع الرجل الضال الذي يطعن في علي وعثمان \ufd44، > فقال: "أرغم الله بأنفك"، وقال \ufd41: "قاتلنا حتى لم تكن فتنة وكان الدين لله، > وأنتم تريدون أن تقاتلوا حتى تكون فتنة ويكون الدين لغير الله"، وهذا فيه قوة في > الأسلوب، ولكن لا يفعل ذلك إلا مع الأمن من الوقوع في المفاسد، والله المستعان > (2).\nخامسا: أهمية الكف عما جرى بين الصحابة \ufd43: إن من الأمور المهمة التي > ينبغي للداعية أن يعرض عنها ولا يخوض فيها ما جرى > بين الصحابة \ufd43، وما حصل لبعضهم؛ لأن الكف عن ذلك مذهب > أهل الحق والاعتدال (3)؛ ولهذا قال عبد الله بن عمر \ufd44 في هذا الحديث: "أما > عثمان فكان الله قد عفا عنه فكرهتم أن تعفوا عنه، وأما عليّ فابن عمّ رسول الله > ﷺ"، قال شيخ الإسلام ابن تيمية \ufd40 في مذهب أهل\n_\n(1) عمدة القاري، > شرح صحيح البخاري، 16/ 207.\n(2) انظر: الحديث رقم 116، الدرس العاشر.\n(3) > انظر: شرح العقيدة الواسطية، لابن تيمية، تأليف محمد خليل الهراس، ص 250."]}, > "1f36e221-2683-4bc7-9732-e6a64298f2df": {}}``` > I tried setting `hl.maxAnalyzedChars` to a large integer value and it still > did not workOne thing I also know that when removing `"hl.q": > "text_highlighting:(\"ما جرى بين الصحابة\" الماتريدي)", ` from the params it > works.However it then does not highlight the stop words, which is not my > desired behaviour. 
> Here is the relevant part of my solr schema > ```xml version="1.6"> id > positionIncrementGap="100"> class="solr.SynonymGraphFilterFactory" > tokenizerFactory="solr.StandardTokenizerFactory" synonyms="synonyms.txt" > ignoreCase="true" expand="true" /> class="solr.WhitespaceTokenizerFactory"/> class="solr.WordDelimiterGraphFilterFactory"/> class="solr.FlattenGraphFilterFactory"/> class="solr.StopFilterFactory" words="lang/stopwords_ar.txt" > ignoreCase="true"/> > class="solr.ArabicStemFilterFactory"/> class="solr.RemoveDuplicatesTokenFilterFactory"/> > positionIncrementGap="100"> class="solr.SynonymGraphFilterFactory" > tokenizerFactory="solr.StandardTokenizerFactory" synonyms="synonyms.txt" > ignoreCase="true" expand="true" /> class="solr.WhitespaceTokenizerFactory"/> class="solr.WordDelimiterGraphFilterFactory"/>
[jira] [Resolved] (SOLR-14832) Inversion Eglish and numbers characters in Arabic documents
[ https://issues.apache.org/jira/browse/SOLR-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-14832. --- Resolution: Invalid Please raise questions like this on the user's list, we try to reserve JIRAs for known bugs/enhancements rather than usage questions. The JIRA system is not a support portal. See: http://lucene.apache.org/solr/community.html#mailing-lists-irc there are links to both Lucene and Solr mailing lists there. A _lot_ more people will see your question on that list and may be able to help more quickly. If it's determined that this really is a code issue or enhancement to Lucene or Solr and not a configuration/usage problem, we can raise a new JIRA or reopen this one. > Inversion Eglish and numbers characters in Arabic documents > --- > > Key: SOLR-14832 > URL: https://issues.apache.org/jira/browse/SOLR-14832 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 4.1 >Reporter: Vlad >Priority: Major > > Hi Support, > > please help to resolve an issue. 
I upload/index several documents in English > and in Arabic languages to SOLR, in addition I use handler for Arabic > language: > > > > words="stopwords.txt" enablePositionIncrements="true" /> > class="solr.RemoveDuplicatesTokenFilterFactory"/> > class="solr.ArabicNormalizationFilterFactory"/> > > > > > > > words="stopwords.txt" enablePositionIncrements="true" /> > ignoreCase="true" expand="true"/> > class="solr.RemoveDuplicatesTokenFilterFactory"/> > class="solr.ArabicNormalizationFilterFactory"/> > > > > > > There are two environments: > # Local machine: > - SOLR version: 4,2 > - Windows version: 10 > > # DEV env: > - SOLR version 4.1 as part of the cloudera suit > - Linux core version: 3.10.0-862 > > Issue appears when uploading documents: > # Local machine: > - Doc in English with English words only - ok (for example, > "[www.apache.org|http://www.apache.org/];) > - Doc in Arabic with some English words - ok (for example, > "[www.apache.org|http://www.apache.org/];) > > # DEV env: > - Doc in English with English words only - ok (for example, > "[www.apache.org|http://www.apache.org/];) > - Doc in Arabic with some English - English text is inverted > (for example, "gro.echapa.www"), what makes search by key words impossible. > > Please advise whether this fixable and how? > > Thank you in advance! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14833) Empty highlight entry on match only for some queries
[ https://issues.apache.org/jira/browse/SOLR-14833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossameldin Khalifa updated SOLR-14833: --- Attachment: Screen Shot 1442-01-17 at 3.56.16 AM.png Screen Shot 1442-01-17 at 3.56.05 AM.png Screen Shot 1442-01-17 at 3.55.53 AM.png > Empty highlight entry on match only for some queries > -- > > Key: SOLR-14833 > URL: https://issues.apache.org/jira/browse/SOLR-14833 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: search >Affects Versions: 8.6.2 >Reporter: Hossameldin Khalifa >Priority: Critical > Attachments: Screen Shot 1442-01-17 at 3.55.53 AM.png, Screen Shot > 1442-01-17 at 3.56.05 AM.png, Screen Shot 1442-01-17 at 3.56.16 AM.png > > > Solr Input : Solr Input : > ```json\{ "query": "text:(\"ما جرى بين الصحابة\" الماتريدي)", > "fields": "book_id,author_id,cat_id,meta,id,text", "params": {" > rows": 20, "start": 0, "hl": "true", "hl.fl": > "text_highlighting,text_highlighting_copy", "hl.fragmenter": "regex", > "hl.q": "text_highlighting:(\"ما جرى بين الصحابة\" الماتريدي)", > "f.text_highlighting.hl.fragsize": 110, > "f.text_highlighting_copy.hl.fragsize": 0} }}``` > For exactly this text `"ما جرى بين الصحابة" الماتريدي` and some other queries > the highlights have some empty matches.I checked if the indexes don"t have > text stored in them but they seem to look like all other indexes. > Here is an example of some part of the output of the > highlighter:```"d3108d2d-1344-458c-8c28-0639f82b274e": \{"text_highlighting": > [" الصحابة \ufd43: إن من الأمور المهمة التي ينبغي للداعية أن يعرض عنها ولا > يخوض فيها ما جرى بين الصحابة \ufd43، وما > حصل لبعضهم"], "text_highlighting_copy": ["فاجهد علي جهدك! 
"، وقال \ufd41 في > الدفاع عن عثمان حين سأل هذا الضال: "أما عثمان فكان الله قد عفا عنه وكرهتم أن > تعفوا عنه، وأما عليّ فابن عمّ رسول الله ﷺ وختنه"، ثم أخذ يذكر من محاسن علي > وعثمان \ufd44 حتى أفحم هذا الضال فذهب خائبا، وقال له ابن عمر \ufd41: "اذهب > بهذا الآن معك"، قال العيني \ufd40: أي اقرن هذا العذر بالجواب حتى لا يبقى لك > فيما أجبتك به حجة على ما كنت تعتقد" (1) فينبغي للداعية أن يدافع عن الصحابة > \ufd43 وعن أئمة الهدى من علماء أهل السنة والجماعة، ولكن بالحكمة والموعظة > الحسنة، والجدال بالحسنى.\nرابعا: من أساليب الدعوة: استخدام الشدة مع بعض > المدعوين: الأصل في الأساليب في الدعوة إلى الله \ufdff الرفق واللين، ولكن من > المدعوين من لا يجدي ولا ينفع فيه ومعه إلا الشدة والقوة؛ ولهذا استخدم عبد الله > بن عمر \ufd41 أسلوب الشدة مع الرجل الضال الذي يطعن في علي وعثمان \ufd44، > فقال: "أرغم الله بأنفك"، وقال \ufd41: "قاتلنا حتى لم تكن فتنة وكان الدين لله، > وأنتم تريدون أن تقاتلوا حتى تكون فتنة ويكون الدين لغير الله"، وهذا فيه قوة في > الأسلوب، ولكن لا يفعل ذلك إلا مع الأمن من الوقوع في المفاسد، والله المستعان > (2).\nخامسا: أهمية الكف عما جرى بين الصحابة \ufd43: إن من الأمور المهمة التي > ينبغي للداعية أن يعرض عنها ولا يخوض فيها ما جرى > بين الصحابة \ufd43، وما حصل لبعضهم؛ لأن الكف عن ذلك مذهب > أهل الحق والاعتدال (3)؛ ولهذا قال عبد الله بن عمر \ufd44 في هذا الحديث: "أما > عثمان فكان الله قد عفا عنه فكرهتم أن تعفوا عنه، وأما عليّ فابن عمّ رسول الله > ﷺ"، قال شيخ الإسلام ابن تيمية \ufd40 في مذهب أهل\n_\n(1) عمدة القاري، > شرح صحيح البخاري، 16/ 207.\n(2) انظر: الحديث رقم 116، الدرس العاشر.\n(3) > انظر: شرح العقيدة الواسطية، لابن تيمية، تأليف محمد خليل الهراس، ص 250."]}, > "1f36e221-2683-4bc7-9732-e6a64298f2df": {}}``` > I tried setting `hl.maxAnalyzedChars` to a large integer value and it still > did not workOne thing I also know that when removing `"hl.q": > "text_highlighting:(\"ما جرى بين الصحابة\" الماتريدي)", ` from the params it > works.However it then does not highlight the stop words, which is not my > desired behaviour. 
> Here is the relevant part of my solr schema > ```xml version="1.6"> id > positionIncrementGap="100"> class="solr.SynonymGraphFilterFactory" > tokenizerFactory="solr.StandardTokenizerFactory" synonyms="synonyms.txt" > ignoreCase="true" expand="true" /> class="solr.WhitespaceTokenizerFactory"/> class="solr.WordDelimiterGraphFilterFactory"/> class="solr.FlattenGraphFilterFactory"/> class="solr.StopFilterFactory" words="lang/stopwords_ar.txt" > ignoreCase="true"/> > class="solr.ArabicStemFilterFactory"/> class="solr.RemoveDuplicatesTokenFilterFactory"/> > positionIncrementGap="100"> class="solr.SynonymGraphFilterFactory" > tokenizerFactory="solr.StandardTokenizerFactory" synonyms="synonyms.txt" > ignoreCase="true" expand="true" /> class="solr.WhitespaceTokenizerFactory"/> class="solr.WordDelimiterGraphFilterFactory"/>
[jira] [Created] (SOLR-14833) Empty highlight entry on match only for some queries
Hossameldin Khalifa created SOLR-14833: -- Summary: Empty highlight entry on match only for some queries Key: SOLR-14833 URL: https://issues.apache.org/jira/browse/SOLR-14833 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: search Affects Versions: 8.6.2 Reporter: Hossameldin Khalifa Solr Input : Solr Input : ```json\{ "query": "text:(\"ما جرى بين الصحابة\" الماتريدي)", "fields": "book_id,author_id,cat_id,meta,id,text", "params": {" rows": 20, "start": 0, "hl": "true", "hl.fl": "text_highlighting,text_highlighting_copy", "hl.fragmenter": "regex", "hl.q": "text_highlighting:(\"ما جرى بين الصحابة\" الماتريدي)", "f.text_highlighting.hl.fragsize": 110, "f.text_highlighting_copy.hl.fragsize": 0} }}``` For exactly this text `"ما جرى بين الصحابة" الماتريدي` and some other queries the highlights have some empty matches.I checked if the indexes don"t have text stored in them but they seem to look like all other indexes. Here is an example of some part of the output of the highlighter:```"d3108d2d-1344-458c-8c28-0639f82b274e": \{"text_highlighting": [" الصحابة \ufd43: إن من الأمور المهمة التي ينبغي للداعية أن يعرض عنها ولا يخوض فيها ما جرى بين الصحابة \ufd43، وما حصل لبعضهم"], "text_highlighting_copy": ["فاجهد علي جهدك! 
"، وقال \ufd41 في الدفاع عن عثمان حين سأل هذا الضال: "أما عثمان فكان الله قد عفا عنه وكرهتم أن تعفوا عنه، وأما عليّ فابن عمّ رسول الله ﷺ وختنه"، ثم أخذ يذكر من محاسن علي وعثمان \ufd44 حتى أفحم هذا الضال فذهب خائبا، وقال له ابن عمر \ufd41: "اذهب بهذا الآن معك"، قال العيني \ufd40: أي اقرن هذا العذر بالجواب حتى لا يبقى لك فيما أجبتك به حجة على ما كنت تعتقد" (1) فينبغي للداعية أن يدافع عن الصحابة \ufd43 وعن أئمة الهدى من علماء أهل السنة والجماعة، ولكن بالحكمة والموعظة الحسنة، والجدال بالحسنى.\nرابعا: من أساليب الدعوة: استخدام الشدة مع بعض المدعوين: الأصل في الأساليب في الدعوة إلى الله \ufdff الرفق واللين، ولكن من المدعوين من لا يجدي ولا ينفع فيه ومعه إلا الشدة والقوة؛ ولهذا استخدم عبد الله بن عمر \ufd41 أسلوب الشدة مع الرجل الضال الذي يطعن في علي وعثمان \ufd44، فقال: "أرغم الله بأنفك"، وقال \ufd41: "قاتلنا حتى لم تكن فتنة وكان الدين لله، وأنتم تريدون أن تقاتلوا حتى تكون فتنة ويكون الدين لغير الله"، وهذا فيه قوة في الأسلوب، ولكن لا يفعل ذلك إلا مع الأمن من الوقوع في المفاسد، والله المستعان (2).\nخامسا: أهمية الكف عما جرى بين الصحابة \ufd43: إن من الأمور المهمة التي ينبغي للداعية أن يعرض عنها ولا يخوض فيها ما جرى بين الصحابة \ufd43، وما حصل لبعضهم؛ لأن الكف عن ذلك مذهب أهل الحق والاعتدال (3)؛ ولهذا قال عبد الله بن عمر \ufd44 في هذا الحديث: "أما عثمان فكان الله قد عفا عنه فكرهتم أن تعفوا عنه، وأما عليّ فابن عمّ رسول الله ﷺ"، قال شيخ الإسلام ابن تيمية \ufd40 في مذهب أهل\n_\n(1) عمدة القاري، شرح صحيح البخاري، 16/ 207.\n(2) انظر: الحديث رقم 116، الدرس العاشر.\n(3) انظر: شرح العقيدة الواسطية، لابن تيمية، تأليف محمد خليل الهراس، ص 250."]}, "1f36e221-2683-4bc7-9732-e6a64298f2df": {}}``` I tried setting `hl.maxAnalyzedChars` to a large integer value and it still did not workOne thing I also know that when removing `"hl.q": "text_highlighting:(\"ما جرى بين الصحابة\" الماتريدي)", ` from the params it works.However it then does not highlight the stop words, which is not my desired behaviour. 
Here is the relevant part of my solr schema ```xml id ```
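The comment on SOLR-14833 earlier in this digest suggests retrying with `hl.method=unified` and adding `debug=query`. A rough sketch of how those two parameters could be added to a request's query string follows. The class `HighlightDebugParams` and its methods are hypothetical helpers, the query value is a placeholder, and the other `hl.*` settings from the report are left out for brevity.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper that assembles Solr request parameters for debugging highlights. */
final class HighlightDebugParams {

  /** Parameters from the report plus the two suggested in the comment. */
  static Map<String, String> suggestedParams(String query) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("q", query);
    params.put("hl", "true");
    params.put("hl.fl", "text_highlighting,text_highlighting_copy");
    params.put("hl.method", "unified"); // suggested in the comment
    params.put("debug", "query");       // shows the parsed query representation
    return params;
  }

  /** URL-encodes keys and values and joins them with '&'. */
  static String toQueryString(Map<String, String> params) {
    return params.entrySet().stream()
        .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
            + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
        .collect(Collectors.joining("&"));
  }
}
```

The resulting string would be appended to the collection's select URL. Comparing the debug output with and without `hl.q` may show whether the two queries are parsed differently.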
[jira] [Commented] (LUCENE-9501) IndexSortSortedNumericDocValuesRangeQuery violates iterator invariant.
[ https://issues.apache.org/jira/browse/LUCENE-9501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190949#comment-17190949 ] Julie Tibshirani commented on LUCENE-9501: -- The fix to the query itself: https://github.com/apache/lucene-solr/pull/1833 Another change related to the Asserting* classes: https://github.com/apache/lucene-solr/pull/1834 The query fix should be merged before the Asserting* wrapper change. Otherwise TestIndexSortSortedDocValuesQuery tests will start to fail sporadically. > IndexSortSortedNumericDocValuesRangeQuery violates iterator invariant. > -- > > Key: LUCENE-9501 > URL: https://issues.apache.org/jira/browse/LUCENE-9501 > Project: Lucene - Core > Issue Type: Bug >Reporter: Julie Tibshirani >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > In LUCENE-7714 we added a new query to sandbox called > IndexSortSortedNumericDocValuesRangeQuery that optimizes range calculations > when the field is sorted. The query has a bad bug: its DocIdSetIterator can > return an old value for docID() even after advance has returned NO_MORE_DOCS. > This violates the DocIdSetIterator contract and means that it's possible for > DocIdSetIterator#advance to be called when it's already been exhausted (which > can result in invalid reads). > We would have expected this issue to be caught in tests, especially because > classes like AssertingIndexSearcher check for these invariants. As part of > this fix I'll look into improvements to the Asserting* wrapper framework.
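The invariant described in the issue can be illustrated with a small standalone model. This is a sketch only: `RangeDocIdIterator` is a hypothetical class that does not extend Lucene's `DocIdSetIterator` and is not the code from the pull request. It only mirrors the rule that once `advance` returns `NO_MORE_DOCS`, `docID()` must report `NO_MORE_DOCS` too.

```java
/** Standalone model of an iterator over the doc-ID range [firstDoc, lastDoc]. */
final class RangeDocIdIterator {
  /** Same sentinel value that Lucene's DocIdSetIterator uses. */
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int firstDoc;
  private final int lastDoc; // inclusive
  private int doc = -1;      // -1 means "not positioned yet"

  RangeDocIdIterator(int firstDoc, int lastDoc) {
    this.firstDoc = firstDoc;
    this.lastDoc = lastDoc;
  }

  int docID() {
    return doc;
  }

  int nextDoc() {
    return advance(doc + 1);
  }

  int advance(int target) {
    if (doc == NO_MORE_DOCS) {
      // This model fails loudly; callers should never advance an exhausted iterator.
      throw new IllegalStateException("advance called on an exhausted iterator");
    }
    int candidate = Math.max(target, firstDoc);
    // Record exhaustion in the field so docID() agrees with the returned value.
    doc = candidate > lastDoc ? NO_MORE_DOCS : candidate;
    return doc;
  }
}
```

The bug pattern the issue describes would correspond to returning `NO_MORE_DOCS` from `advance` without also storing it in `doc`, which leaves `docID()` pointing at a stale document.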
[jira] [Created] (LUCENE-9508) DocumentsWriter doesn't check for BlockedFlushes in stall mode``
Sorabh Hamirwasia created LUCENE-9508: - Summary: DocumentsWriter doesn't check for BlockedFlushes in stall mode`` Key: LUCENE-9508 URL: https://issues.apache.org/jira/browse/LUCENE-9508 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 8.5.1 Reporter: Sorabh Hamirwasia Hi, I was investigating an issue where the memory usage of a single Lucene IndexWriter went up to ~23GB. Lucene has a concept of stalling in case the memory used by each index breaches the 2 X ramBuffer limit (10% of JVM heap, in this case ~3GB). So ideally memory usage should not go above that limit. I looked into the heap dump and found that when the fullFlush thread enters the *markForFullFlush* method, it tries to take the lock on the ThreadStates of all the DWPT threads sequentially. If the lock on one of the ThreadStates is held, it will block indefinitely. That is what happened in this case, as one of the DWPT threads was stuck in the indexing process. Because of this, the fullFlush thread was unable to populate the flush queue even though stall mode was detected. As a result, new indexing requests arriving on indexing threads slept for a second and then continued with indexing. The **preUpdate()** method checks for the stalled case and for pending flushes (based on the flush queue); if there are none, it sleeps and continues. Questions: 1) Should **preUpdate** look into the blocked flushes information as well, instead of just the flush queue? 2) Should the fullFlush thread wait indefinitely for the lock on ThreadStates? A single blocked writing thread can block the full flush here.
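Question 2 above contrasts waiting indefinitely on each per-thread lock with some bounded alternative. A minimal sketch of the bounded variant follows, using plain `java.util.concurrent` locks. `BoundedLockSweep`, `lockEach` and `demo` are hypothetical names, and nothing here is taken from `DocumentsWriter` or proposed as the actual fix.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration only; none of these names exist in Lucene. */
final class BoundedLockSweep {

  /**
   * Visits each per-thread lock in order, waiting at most timeoutMillis for each.
   * Returns how many locks were acquired; locks that stay busy are skipped.
   */
  static int lockEach(List<ReentrantLock> states, long timeoutMillis) {
    int acquired = 0;
    for (ReentrantLock state : states) {
      try {
        if (state.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
          try {
            acquired++; // a real sweep would mark this state for flushing here
          } finally {
            state.unlock();
          }
        }
        // Otherwise the owner is still busy: move on instead of blocking the sweep.
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status, stop sweeping
        break;
      }
    }
    return acquired;
  }

  /** Demo: one lock is held by a "stuck" thread, the other is free. */
  static int demo() {
    ReentrantLock free = new ReentrantLock();
    ReentrantLock stuck = new ReentrantLock();
    CountDownLatch held = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    Thread worker = new Thread(() -> {
      stuck.lock();
      try {
        held.countDown();
        release.await(); // simulate a thread stuck in indexing
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        stuck.unlock();
      }
    });
    worker.start();
    try {
      held.await();
      return lockEach(List.of(stuck, free), 50);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return -1;
    } finally {
      release.countDown(); // let the worker finish
    }
  }
}
```

With a plain `lock()` call the sweep in `demo` would never reach the second lock while the worker is stuck. With the timed `tryLock` it skips the busy one and still processes the free one. Whether skipping is acceptable for a full flush is exactly the open question in the report.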
[GitHub] [lucene-solr] jtibshirani opened a new pull request #1834: Make sure to test normal scorers with asserting wrappers.
jtibshirani opened a new pull request #1834: URL: https://github.com/apache/lucene-solr/pull/1834 When a query is run at the top-level, the searcher uses `Weight#bulkScorer`. Many queries don't implement this explicitly and instead rely on the default implementation which delegates to `Weight#scorer`. Previously `AssertingWeight` would always wrap the delegate's bulk scorer. So for queries that rely on `Weight#scorer`, we weren't wrapping the scorer or iterator to run checks. This change proposes that `AssertingWeight#bulkScorer` sometimes use the default implementation to make sure we also test normal scorers. This change would have caught the bug in LUCENE-9501. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
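The coverage gap described in this pull request can be modeled with toy classes. This is a sketch under stated assumptions: `ToyWeight`, `TermLikeWeight` and `CheckingWeight` are hypothetical stand-ins that use strings instead of real scorers, and the `BooleanSupplier` replaces whatever randomness the test framework uses so that both paths can be shown deterministically.

```java
import java.util.function.BooleanSupplier;

/** Toy stand-in for a weight: the default bulk scorer is built on top of scorer(). */
abstract class ToyWeight {
  abstract String scorer();

  String bulkScorer() {
    return "defaultBulk(" + scorer() + ")";
  }
}

/** A query weight that only implements scorer() and relies on the default bulk scorer. */
final class TermLikeWeight extends ToyWeight {
  @Override
  String scorer() {
    return "scorer";
  }
}

/** Toy stand-in for an asserting wrapper around another weight. */
final class CheckingWeight extends ToyWeight {
  private final ToyWeight in;
  private final BooleanSupplier useDefaultBulkScorer;

  CheckingWeight(ToyWeight in, BooleanSupplier useDefaultBulkScorer) {
    this.in = in;
    this.useDefaultBulkScorer = useDefaultBulkScorer;
  }

  @Override
  String scorer() {
    return "checked(" + in.scorer() + ")";
  }

  @Override
  String bulkScorer() {
    if (useDefaultBulkScorer.getAsBoolean()) {
      // Default path: goes through this.scorer(), so the scorer itself gets checked.
      return super.bulkScorer();
    }
    // Old behavior: only the delegate's bulk scorer is wrapped; its scorer is not.
    return "checkedBulk(" + in.bulkScorer() + ")";
  }
}
```

When the supplier returns false the result is `checkedBulk(defaultBulk(scorer))`, where the inner scorer is unchecked. When it returns true the result is `defaultBulk(checked(scorer))`, which is the path that lets scorer-level checks run.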
[GitHub] [lucene-solr] jtibshirani commented on a change in pull request #1833: LUCENE-9501: Fix invariant violation in IndexSortSortedNumericDocValuesRangeQuery.
jtibshirani commented on a change in pull request #1833: URL: https://github.com/apache/lucene-solr/pull/1833#discussion_r483868710 ## File path: lucene/sandbox/src/test/org/apache/lucene/search/TestIndexSortSortedNumericDocValuesRangeQuery.java ## @@ -65,7 +65,7 @@ public void testSameHitsAsPointRangeQuery() throws IOException { iw.deleteDocuments(LongPoint.newRangeQuery("idx", 0L, 10L)); } final IndexReader reader = iw.getReader(); - final IndexSearcher searcher = newSearcher(reader, false); + final IndexSearcher searcher = newSearcher(reader); Review comment: This isn't critical for test coverage, but it seemed off that we had disabled wrapping the reader.
[GitHub] [lucene-solr] jtibshirani opened a new pull request #1833: LUCENE-9501: Fix invariant violation in IndexSortSortedNumericDocValuesRangeQuery.
jtibshirani opened a new pull request #1833: URL: https://github.com/apache/lucene-solr/pull/1833 Previously the DocIdSetIterator returned an old value for docID even after advance returned NO_MORE_DOCS. This violates the DocIdSetIterator contract and made it possible for the iterator's advance method to be called even after it was already exhausted.
[jira] [Updated] (SOLR-14832) Inversion Eglish and numbers characters in Arabic documents
[ https://issues.apache.org/jira/browse/SOLR-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vlad updated SOLR-14832: Description: Hi Support, please help to resolve an issue. I upload/index several documents in English and in Arabic languages to SOLR, in addition I use handler for Arabic language: There are two environments: # Local machine: - SOLR version: 4,2 - Windows version: 10 # DEV env: - SOLR version 4.1 as part of the cloudera suit - Linux core version: 3.10.0-862 Issue appears when uploading documents: # Local machine: - Doc in English with English words only - ok (for example, "[www.apache.org|http://www.apache.org/];) - Doc in Arabic with some English words - ok (for example, "[www.apache.org|http://www.apache.org/];) # DEV env: - Doc in English with English words only - ok (for example, "[www.apache.org|http://www.apache.org/];) - Doc in Arabic with some English - English text is inverted (for example, "gro.echapa.www"), what makes search by key words impossible. Please advise whether this fixable and how? Thank you in advance! was: Hi Support, please help to resolve an issue. I upload/index several documents in English and in Arabic languages to SOLR, in addition I use handler for Arabic language: There are two environments: # Local machine: - SOLR version: 4,2 - Windows version: 10 # DEV env: - SOLR version: - Cloudera suit - Linux core version: 3.10.0-862 Issue appears when uploading documents: # Local machine: - Doc in English with English words only - ok (for example, "[www.apache.org|http://www.apache.org/];) - Doc in Arabic with some English words - ok (for example, "[www.apache.org|http://www.apache.org/];) # DEV env: - Doc in English with English words only - ok (for example, "[www.apache.org|http://www.apache.org/];) - Doc in Arabic with some English - English text is inverted (for example, "gro.echapa.www"), what makes search by key words impossible. Please advise whether this fixable and how? 
> Inversion Eglish and numbers characters in Arabic documents > --- > > Key: SOLR-14832 > URL: https://issues.apache.org/jira/browse/SOLR-14832 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 4.1 >Reporter: Vlad >Priority: Major > > Hi Support, > > please help to resolve an issue. I upload/index several documents in English > and in Arabic languages to SOLR, in addition I use handler for Arabic > language: > > > > words="stopwords.txt" enablePositionIncrements="true" /> > class="solr.RemoveDuplicatesTokenFilterFactory"/> > class="solr.ArabicNormalizationFilterFactory"/> > > > > > > > words="stopwords.txt" enablePositionIncrements="true" /> > ignoreCase="true" expand="true"/> > class="solr.RemoveDuplicatesTokenFilterFactory"/> > class="solr.ArabicNormalizationFilterFactory"/> > > > > > > There are two environments: > # Local machine: > - SOLR version: 4,2 > - Windows version: 10 > > # DEV env: > - SOLR version 4.1 as part of the cloudera suit > - Linux core version: 3.10.0-862 > > Issue appears when uploading documents: > # Local machine: > - Doc in English with English words only - ok (for example, > "[www.apache.org|http://www.apache.org/];) > - Doc in Arabic with some English words - ok (for example, > "[www.apache.org|http://www.apache.org/];) > > # DEV env: > - Doc in English with English words only - ok (for example, > "[www.apache.org|http://www.apache.org/];) > - Doc in Arabic with some English - English text is inverted > (for example, "gro.echapa.www"), what makes search by key words impossible. > > Please advise whether
[jira] [Created] (SOLR-14832) Inversion Eglish and numbers characters in Arabic documents
Vlad created SOLR-14832: --- Summary: Inversion Eglish and numbers characters in Arabic documents Key: SOLR-14832 URL: https://issues.apache.org/jira/browse/SOLR-14832 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Affects Versions: 4.1 Reporter: Vlad Hi Support, please help to resolve an issue. I upload/index several documents in English and Arabic to SOLR; in addition, I use a handler for the Arabic language: There are two environments: # Local machine: - SOLR version: 4,2 - Windows version: 10 # DEV env: - SOLR version: - Cloudera suite - Linux kernel version: 3.10.0-862 The issue appears when uploading documents: # Local machine: - Doc in English with English words only - ok (for example, "[www.apache.org|http://www.apache.org/]") - Doc in Arabic with some English words - ok (for example, "[www.apache.org|http://www.apache.org/]") # DEV env: - Doc in English with English words only - ok (for example, "[www.apache.org|http://www.apache.org/]") - Doc in Arabic with some English - English text is inverted (for example, "gro.echapa.www"), which makes search by keywords impossible. Please advise whether this is fixable and how. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] chatman edited a comment on pull request #1684: SOLR-14613: strongly typed placement plugin interface and implementation
chatman edited a comment on pull request #1684: URL: https://github.com/apache/lucene-solr/pull/1684#issuecomment-687329334 I withdraw all outstanding concerns. Verbosity, clunkiness/ineffectiveness/misplacement of configuration etc are all my "perceptions" that I don't want to come in the way of the completion of this effort. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] chatman edited a comment on pull request #1684: SOLR-14613: strongly typed placement plugin interface and implementation
chatman edited a comment on pull request #1684: URL: https://github.com/apache/lucene-solr/pull/1684#issuecomment-686355702 > @noblepaul & @chatman I find the tone of your latest comments offensive - that's no way to build a consensus. Please think twice before posting and calm down - if you have a different opinion about technical merits of this PR then I'm sure you can express it without personal attacks. I don't see how Noble's comments can be construed as offensive. I may be biased in favour of my own comments, but I *apologise* if they were perceived as such. In any case, there is no personal attack anywhere. > By all means, if you disagree so strongly with the approach presented here then please do so - just be sure that you actually will do it instead of just complaining. I find the choice of such words ("instead of just complaining") unprofessional. This is a proposal, and comments are added to critique the design, not to complain. On the other hand, Ilan wrote this on Slack: > If there’s consensus for Noble’s approach (or for that matter no consensus that goals 1-3 above are good guiding principles), I will stop work on SOLR-14613 and move on to other unrelated topics. Such threats of "stop work" unless one's design is agreed upon should cease, and constructive ways to collaborate should be explored.
[GitHub] [lucene-solr] chatman commented on pull request #1684: SOLR-14613: strongly typed placement plugin interface and implementation
chatman commented on pull request #1684: URL: https://github.com/apache/lucene-solr/pull/1684#issuecomment-687329334 I withdraw all outstanding concerns. Verbosity, clunkiness of configuration etc are all my "perceptions" that I don't want to come in the way of the completion of this effort.
[GitHub] [lucene-solr] vthacker commented on a change in pull request #1828: LUCENE-9497: add google error prone checks
vthacker commented on a change in pull request #1828: URL: https://github.com/apache/lucene-solr/pull/1828#discussion_r483794345 ## File path: gradle/validation/error-prone.gradle ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +allprojects { prj -> + plugins.withType(JavaPlugin) { +prj.apply plugin: 'net.ltgt.errorprone' + +dependencies { + errorprone("com.google.errorprone:error_prone_core") +} + +tasks.withType(JavaCompile) { task -> + options.errorprone.errorproneArgs = [ + // test + '-Xep:ExtendingJUnitAssert:OFF', Review comment: sounds good! I'll take the current branch and add two things and then commit the code 1. `options.errorprone.disableWarningsInGeneratedCode = true` 2. CHANGES entry This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1830: LUCENE-9506: Gradle: split validateSourcePatterns into per-project an…
dweiss commented on a change in pull request #1830: URL: https://github.com/apache/lucene-solr/pull/1830#discussion_r483793729 ## File path: gradle/validation/validate-source-patterns.gradle ## @@ -29,50 +33,117 @@ buildscript { } } -configure(rootProject) { - task("validateSourcePatterns", type: ValidateSourcePatternsTask) { task -> +def extensions = [ +'adoc', +'bat', +'cmd', +'css', +'g4', +'gradle', +'groovy', +'html', +'java', +'jflex', +'jj', +'js', +'json', +'mdtext', +'pl', +'policy', +'properties', +'py', +'sh', +'template', +'vm', +'xml', +'xsl', +] + +// Create source validation task local for each project's files. +subprojects { + task validateSourcePatterns(type: ValidateSourcePatternsTask) { task -> group = 'Verification' description = 'Validate Source Patterns' // This task has no proper outputs. setupDummyOutputs(task) -sourceFiles = project.fileTree(project.rootDir) { - [ -'java', 'jflex', 'py', 'pl', 'g4', 'jj', 'html', 'js', -'css', 'xml', 'xsl', 'vm', 'sh', 'cmd', 'bat', 'policy', -'properties', 'mdtext', 'groovy', 'gradle', -'template', 'adoc', 'json', - ].each{ -include "lucene/**/*.${it}" -include "solr/**/*.${it}" -include "dev-tools/**/*.${it}" -include "gradle/**/*.${it}" +sourceFiles = fileTree(projectDir) { + extensions.each{ +include "*.${it}" + } + + // default excludes. Review comment: It could be. I didn't have time to clean up everything. The speedup was significant for me anyway (order of magnitude). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on pull request #1830: LUCENE-9506: Gradle: split validateSourcePatterns into per-project an…
dweiss commented on pull request #1830: URL: https://github.com/apache/lucene-solr/pull/1830#issuecomment-687320373 No worries. Not very urgent.
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1828: LUCENE-9497: add google error prone checks
dweiss commented on a change in pull request #1828: URL: https://github.com/apache/lucene-solr/pull/1828#discussion_r483793353 ## File path: gradle/validation/error-prone.gradle ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +allprojects { prj -> + plugins.withType(JavaPlugin) { +prj.apply plugin: 'net.ltgt.errorprone' + +dependencies { + errorprone("com.google.errorprone:error_prone_core") +} + +tasks.withType(JavaCompile) { task -> + options.errorprone.errorproneArgs = [ + // test + '-Xep:ExtendingJUnitAssert:OFF', Review comment: Varun - feel free to take this branch (or patch) and roll it out on yours. I didn't intend it to be committed, I just wanted to show what's needed for it to compile and work. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1828: LUCENE-9497: add google error prone checks
dweiss commented on a change in pull request #1828: URL: https://github.com/apache/lucene-solr/pull/1828#discussion_r483793090 ## File path: lucene/core/src/java/org/apache/lucene/analysis/CharArrayMap.java ## @@ -523,6 +523,7 @@ public void clear() { * @throws NullPointerException * if the given map is null. */ + @SuppressWarnings("ReferenceEquality") Review comment: Yes, exactly. I wanted the PR to include an example of how this can be done.
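For readers following the thread, here is a minimal sketch of the suppression pattern being discussed. It is not the CharArrayMap code from the PR: the `Key` class, the `EMPTY` sentinel and `isSentinel` are hypothetical names made up for illustration. The only part taken from the thread is that Error Prone honours the regular `@SuppressWarnings` annotation when it is given the check name, here `ReferenceEquality`.

```java
public class ReferenceEqualityDemo {
    /** Hypothetical value type; it overrides equals(), so == on it is the kind of comparison the check reports. */
    public static final class Key {
        final String name;

        public Key(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    /** Shared sentinel instance (hypothetical); callers test for it by identity, not by value. */
    public static final Key EMPTY = new Key("");

    /** The identity comparison is intentional here, so the warning is suppressed for this method only. */
    @SuppressWarnings("ReferenceEquality")
    public static boolean isSentinel(Key key) {
        return key == EMPTY;
    }

    public static void main(String[] args) {
        System.out.println(isSentinel(EMPTY));          // true: same instance
        System.out.println(isSentinel(new Key("")));    // false: equal value, different instance
        System.out.println(EMPTY.equals(new Key("")));  // true: value equality
    }
}
```

Keeping the annotation on the smallest enclosing method, rather than on the class, leaves the check active for the rest of the file once it is enabled.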
[GitHub] [lucene-solr] vthacker commented on pull request #1684: SOLR-14613: strongly typed placement plugin interface and implementation
vthacker commented on pull request #1684: URL: https://github.com/apache/lucene-solr/pull/1684#issuecomment-687318445 > The fact is, all of us have the same objective: to make the product better. > The purpose is to ensure that the feature/change is >correct >performant/efficient >user-friendly I think everyone agrees on this. I really wish we can be better while giving feedback being nicer. We'd be able to collaborate better and keep the focus on the design decisions. What are the current concerns with the current PR? 1. The verbosity? 2. Where the config lives?
[GitHub] [lucene-solr] vthacker edited a comment on pull request #1684: SOLR-14613: strongly typed placement plugin interface and implementation
vthacker edited a comment on pull request #1684: URL: https://github.com/apache/lucene-solr/pull/1684#issuecomment-687318445 > The fact is, all of us have the same objective: to make the product better. > The purpose is to ensure that the feature/change is >correct >performant/efficient >user-friendly I think everyone agrees on this. I really wish we could be nicer while giving feedback. We'd be able to collaborate better and keep the focus on the design decisions. What are the current concerns with the current PR? 1. The verbosity? 2. Where the config lives?
[GitHub] [lucene-solr] uschindler commented on pull request #1830: LUCENE-9506: Gradle: split validateSourcePatterns into per-project an…
uschindler commented on pull request #1830: URL: https://github.com/apache/lucene-solr/pull/1830#issuecomment-687317814 I will check over the weekend. Was too busy today!
[jira] [Commented] (SOLR-14579) Comment SolrJ 'Utils' generic map functions
[ https://issues.apache.org/jira/browse/SOLR-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190872#comment-17190872 ] Uwe Schindler commented on SOLR-14579: -- bq. I have no recollection of doing so, certainly not intentionally. Sorry, my fault: this JIRA comment confused me, as it came directly after the master and 8.x commits. But this is a different branch, so why was it added back there?: [https://issues.apache.org/jira/browse/SOLR-14579?focusedCommentId=1712=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1712] > Comment SolrJ 'Utils' generic map functions > --- > > Key: SOLR-14579 > URL: https://issues.apache.org/jira/browse/SOLR-14579 > Project: Solr > Issue Type: Improvement >Affects Versions: master (9.0) >Reporter: Megan Carey >Assignee: Erick Erickson >Priority: Minor > Fix For: 8.7 > > Attachments: SOLR-14579.patch > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Remove the map functions like `NEW_HASHMAP_FUN` from the Utils class in solrj > module to reduce warnings and improve code quality. > [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/Utils.java#L92]
[GitHub] [lucene-solr] madrob commented on a change in pull request #1830: LUCENE-9506: Gradle: split validateSourcePatterns into per-project an…
madrob commented on a change in pull request #1830: URL: https://github.com/apache/lucene-solr/pull/1830#discussion_r483773660 ## File path: gradle/validation/validate-source-patterns.gradle ## @@ -29,50 +33,117 @@ buildscript { } } -configure(rootProject) { - task("validateSourcePatterns", type: ValidateSourcePatternsTask) { task -> +def extensions = [ +'adoc', +'bat', +'cmd', +'css', +'g4', +'gradle', +'groovy', +'html', +'java', +'jflex', +'jj', +'js', +'json', +'mdtext', +'pl', +'policy', +'properties', +'py', +'sh', +'template', +'vm', +'xml', +'xsl', +] + +// Create source validation task local for each project's files. +subprojects { + task validateSourcePatterns(type: ValidateSourcePatternsTask) { task -> group = 'Verification' description = 'Validate Source Patterns' // This task has no proper outputs. setupDummyOutputs(task) -sourceFiles = project.fileTree(project.rootDir) { - [ -'java', 'jflex', 'py', 'pl', 'g4', 'jj', 'html', 'js', -'css', 'xml', 'xsl', 'vm', 'sh', 'cmd', 'bat', 'policy', -'properties', 'mdtext', 'groovy', 'gradle', -'template', 'adoc', 'json', - ].each{ -include "lucene/**/*.${it}" -include "solr/**/*.${it}" -include "dev-tools/**/*.${it}" -include "gradle/**/*.${it}" +sourceFiles = fileTree(projectDir) { + extensions.each{ +include "*.${it}" + } + + // default excludes. Review comment: should the excludes be an input property so that we don't have to repeat them later on root project? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14829) Default components are missing facet_module and terms in documentation
[ https://issues.apache.org/jira/browse/SOLR-14829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190814#comment-17190814 ] David Smiley commented on SOLR-14829: - In your own/custom definition of your request handler, do you need to actually list these components at all, vs just rely on the default list? I think most people by far let the defaults happen. > Default components are missing facet_module and terms in documentation > -- > > Key: SOLR-14829 > URL: https://issues.apache.org/jira/browse/SOLR-14829 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation, examples >Affects Versions: 8.6.2 >Reporter: Johannes Baiter >Assignee: Ishan Chattopadhyaya >Priority: Minor > Attachments: SOLR-14829.patch > > > In the reference guide, the list of search components that are enabled by > default is missing the {{facet_module}} and {{terms}} components. The terms > component is instead listed under "other useful components", while the > {{FacetModule}} is never listed anywhere in the documentation, despite it > being necessary for the JSON Facet API to work. > This is also how I stumbled upon this: I spent hours trying to figure out why > JSON-based faceting was not working with my setup. After taking a glance at > the {{SearchHandler}} source code based on a hunch, it became clear that my > custom list of search components (created based on the list in the reference > guide) was to blame. 
> A patch for the documentation gap is attached, but I think there are some > other issues with the naming/documentation around the two faceting APIs that > may be worth discussing: > * The names {{facet_module}} / {{FacetModule}} are very misleading, since > the documentation is always talking about the "JSON Facet API", but the term > "JSON" does not appear in the name of the component nor does the component > have any documentation attached that mentions this > * Why is the {{FacetModule}} class located in the {{search.facet}} package > while every single other search component included in the core is located in > the {{handler.component}} package?
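To make the failure mode in the report concrete, the following is a small sketch that compares a hand-written component list against the defaults and prints what is missing. The list in `ASSUMED_DEFAULTS` is an assumption based on a reading of {{SearchHandler}} around the 8.x line and should be verified against the version in use; the class and method names are hypothetical and no Solr API is called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DefaultComponentsCheck {
    // ASSUMPTION: default search component names, as believed to be registered by
    // SearchHandler in Solr 8.x. Verify against the SearchHandler source of your version.
    static final List<String> ASSUMED_DEFAULTS = Arrays.asList(
        "query", "facet", "facet_module", "mlt", "highlight",
        "stats", "debug", "expand", "terms");

    /** Returns the assumed default components that the custom list leaves out, in default order. */
    public static List<String> missingFrom(List<String> customComponents) {
        List<String> missing = new ArrayList<>(ASSUMED_DEFAULTS);
        missing.removeAll(customComponents);
        return missing;
    }

    public static void main(String[] args) {
        // A custom list without the two components the report says the guide omitted.
        List<String> custom = Arrays.asList(
            "query", "facet", "mlt", "highlight", "stats", "debug", "expand");
        System.out.println(missingFrom(custom)); // [facet_module, terms]
    }
}
```

As the comment on the issue suggests, leaving the component list out of a custom request handler definition avoids this class of problem, because the defaults then apply.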
[jira] [Commented] (LUCENE-9497) Integerate Error Prone ( Static Analysis Tool ) during compilation
[ https://issues.apache.org/jira/browse/LUCENE-9497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190810#comment-17190810 ] Varun Thacker commented on LUCENE-9497: --- > error prone uses regular SuppressWarnings annotation with custom names for > each category. yep! We'll use this when we start enabling warnings to suppress legit uses > Integerate Error Prone ( Static Analysis Tool ) during compilation > -- > > Key: LUCENE-9497 > URL: https://issues.apache.org/jira/browse/LUCENE-9497 > Project: Lucene - Core > Issue Type: Task >Reporter: Varun Thacker >Priority: Minor > Time Spent: 4h 20m > Remaining Estimate: 0h > > Integrate [https://github.com/google/error-prone] during compilation of our > source code to catch mistakes
[GitHub] [lucene-solr] vthacker commented on a change in pull request #1828: LUCENE-9497: add google error prone checks
vthacker commented on a change in pull request #1828: URL: https://github.com/apache/lucene-solr/pull/1828#discussion_r483720288 ## File path: gradle/validation/error-prone.gradle ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +allprojects { prj -> + plugins.withType(JavaPlugin) { +prj.apply plugin: 'net.ltgt.errorprone' + +dependencies { + errorprone("com.google.errorprone:error_prone_core") +} + +tasks.withType(JavaCompile) { task -> + options.errorprone.errorproneArgs = [ + // test + '-Xep:ExtendingJUnitAssert:OFF', Review comment: I am okay with either of the two styles. Ideally we'd want this list to get much shorter soon :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley commented on pull request #1827: SOLR-14792: Remove VelocityResponseWriter
dsmiley commented on pull request #1827: URL: https://github.com/apache/lucene-solr/pull/1827#issuecomment-687245022 BTW the more up to date https://github.com/erikhatcher/solritas is in the next few days, the better, as I'll be doing a recorded Activate session on this on September 10th with a demo of it.
[GitHub] [lucene-solr] vthacker commented on a change in pull request #1828: LUCENE-9497: add google error prone checks
vthacker commented on a change in pull request #1828: URL: https://github.com/apache/lucene-solr/pull/1828#discussion_r483719049 ## File path: lucene/core/src/java/org/apache/lucene/analysis/CharArrayMap.java ## @@ -523,6 +523,7 @@ public void clear() { * @throws NullPointerException * if the given map is null. */ + @SuppressWarnings("ReferenceEquality") Review comment: Was this the example you wanted to try out, showing how to suppress legitimate ReferenceEquality warnings (or any other warnings) once we start enabling the checks?
[GitHub] [lucene-solr] vthacker commented on a change in pull request #1828: LUCENE-9497: add google error prone checks
vthacker commented on a change in pull request #1828: URL: https://github.com/apache/lucene-solr/pull/1828#discussion_r483717480 ## File path: gradle/validation/error-prone.gradle ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +allprojects { prj -> + plugins.withType(JavaPlugin) { +prj.apply plugin: 'net.ltgt.errorprone' + +dependencies { + errorprone("com.google.errorprone:error_prone_core") +} + +tasks.withType(JavaCompile) { task -> + options.errorprone.errorproneArgs = [ Review comment: Let's add this ? I probably missed it in my PR This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14439) Upgrade to Tika 1.24.1
[ https://issues.apache.org/jira/browse/SOLR-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Salamon updated SOLR-14439: -- Attachment: SOLR-14339.patch > Upgrade to Tika 1.24.1 > -- > > Key: SOLR-14439 > URL: https://issues.apache.org/jira/browse/SOLR-14439 > Project: Solr > Issue Type: Task > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Major > Attachments: SOLR-14339.patch > > > We recently released 1.24.1 with several fixes for DoS vulnerabilities we > found via fuzzing: CVE-2020-9489 https://seclists.org/oss-sec/2020/q2/69
[jira] [Commented] (SOLR-7632) Change the ExtractingRequestHandler to use Tika-Server
[ https://issues.apache.org/jira/browse/SOLR-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190787#comment-17190787 ] David Smiley commented on SOLR-7632: I don't think an URP makes sense for this because Tika needs the entire binary input stream. URPs operate on SolrInputDocument. A RequestHandler is perfect. > Change the ExtractingRequestHandler to use Tika-Server > -- > > Key: SOLR-7632 > URL: https://issues.apache.org/jira/browse/SOLR-7632 > Project: Solr > Issue Type: Improvement > Components: contrib - Solr Cell (Tika extraction) >Reporter: Chris A. Mattmann >Priority: Major > Labels: gsoc2017, memex > > It's a pain to upgrade Tika's jars all the times when we release, and if Tika > fails it messes up the ExtractingRequestHandler (e.g., the document type > caused Tika to fail, etc). A more reliable way and also separated, and easier > to deploy version of the ExtractingRequestHandler would make a network call > to the Tika JAXRS server, and then call Tika on the Solr server side, get the > results and then index the information that way. I have a patch in the works > from the DARPA Memex project and I hope to post it soon. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] erikhatcher commented on pull request #1827: SOLR-14792: Remove VelocityResponseWriter
erikhatcher commented on pull request #1827: URL: https://github.com/apache/lucene-solr/pull/1827#issuecomment-687232685 > @erikhatcher BTW, the .adoc format renders nicely in Github if you were to pull the ref guide docs over to https://github.com/erikhatcher/solritas. We could also update the link in the solr.cool entry to point directly to them, instead of the general github README page ;-) Thanks for that tip! I'll definitely be pulling the docs over and adjusting.
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1828: LUCENE-9497: add google error prone checks
dweiss commented on a change in pull request #1828: URL: https://github.com/apache/lucene-solr/pull/1828#discussion_r483689014 ## File path: gradle/validation/error-prone.gradle ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +allprojects { prj -> + plugins.withType(JavaPlugin) { +prj.apply plugin: 'net.ltgt.errorprone' + +dependencies { + errorprone("com.google.errorprone:error_prone_core") +} + +tasks.withType(JavaCompile) { task -> + options.errorprone.errorproneArgs = [ Review comment: I suggested the same to Varun. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1743: Gradual naming convention enforcement.
dweiss commented on a change in pull request #1743: URL: https://github.com/apache/lucene-solr/pull/1743#discussion_r483684981 ## File path: lucene/test-framework/src/java/org/apache/lucene/util/VerifyTestClassNamingConvention.java ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.util; + +import com.carrotsearch.randomizedtesting.RandomizedContext; +import org.junit.Assume; + +import java.io.BufferedReader; +import java.io.IOException; +import java.io.InputStreamReader; +import java.io.UncheckedIOException; +import java.io.Writer; +import java.nio.charset.StandardCharsets; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.nio.file.StandardOpenOption; +import java.util.HashSet; +import java.util.Set; +import java.util.regex.Pattern; + +/** + * Enforce test naming convention. 
+ */ +public class VerifyTestClassNamingConvention extends AbstractBeforeAfterRule { + public static final Pattern ALLOWED_CONVENTION = Pattern.compile("(.+?)\\.Test[^.]+"); + + private static Set exceptions; + static { +try { + exceptions = new HashSet<>(); + try (BufferedReader is = + new BufferedReader( + new InputStreamReader( + VerifyTestClassNamingConvention.class.getResourceAsStream("test-naming-exceptions.txt"), + StandardCharsets.UTF_8))) { +is.lines().forEach(exceptions::add); + } +} catch (IOException e) { + throw new UncheckedIOException(e); +} + } + + @Override + protected void before() throws Exception { +if (TestRuleIgnoreTestSuites.isRunningNested()) { + // Ignore nested test suites that test the test framework itself. + return; +} + +String suiteName = RandomizedContext.current().getTargetClass().getName(); + +// You can use this helper method to dump all suite names to a file. +// Run gradle with one worker so that it doesn't try to append to the same +// file from multiple processes: +// +// gradlew test --max-workers 1 -Dtests.useSecurityManager=false +// +// dumpSuiteNamesOnly(suiteName); + +if (!ALLOWED_CONVENTION.matcher(suiteName).matches()) { + // if this class exists on the exception list, leave it. Review comment: Same here, really. It was just an example of how it can be solved, not a final solution. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1743: Gradual naming convention enforcement.
dweiss commented on a change in pull request #1743: URL: https://github.com/apache/lucene-solr/pull/1743#discussion_r483684317 ## File path: lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java ## @@ -613,6 +613,7 @@ public static TestRuleIgnoreAfterMaxFailures replaceMaxFailureRule(TestRuleIgnor RuleChain r = RuleChain.outerRule(new TestRuleIgnoreTestSuites()) .around(ignoreAfterMaxFailures) .around(suiteFailureMarker = new TestRuleMarkFailure()) + .around(new VerifyTestClassNamingConvention()) Review comment: This is just code, anything can be changed... In the example I wrote it can't be turned off. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1743: Gradual naming convention enforcement.
cpoerschke commented on a change in pull request #1743: URL: https://github.com/apache/lucene-solr/pull/1743#discussion_r483679806 ## File path: lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java ## @@ -613,6 +613,7 @@ public static TestRuleIgnoreAfterMaxFailures replaceMaxFailureRule(TestRuleIgnor RuleChain r = RuleChain.outerRule(new TestRuleIgnoreTestSuites()) .around(ignoreAfterMaxFailures) .around(suiteFailureMarker = new TestRuleMarkFailure()) + .around(new VerifyTestClassNamingConvention()) Review comment: question: would this convention automatically and always apply to all classes derived from `LuceneTestCase` including any non-`org.apache` name spaces or would it be possible to opt-out (without an exclusion list) somehow for custom code that might perhaps have chosen a different convention? ## File path: lucene/test-framework/src/java/org/apache/lucene/util/VerifyTestClassNamingConvention.java ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.util; + +import com.carrotsearch.randomizedtesting.RandomizedContext; +import org.junit.Assume; + +import java.io.BufferedReader; +import java.io.IOException; +import java.io.InputStreamReader; +import java.io.UncheckedIOException; +import java.io.Writer; +import java.nio.charset.StandardCharsets; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.nio.file.StandardOpenOption; +import java.util.HashSet; +import java.util.Set; +import java.util.regex.Pattern; + +/** + * Enforce test naming convention. + */ +public class VerifyTestClassNamingConvention extends AbstractBeforeAfterRule { + public static final Pattern ALLOWED_CONVENTION = Pattern.compile("(.+?)\\.Test[^.]+"); + + private static Set exceptions; + static { +try { + exceptions = new HashSet<>(); + try (BufferedReader is = + new BufferedReader( + new InputStreamReader( + VerifyTestClassNamingConvention.class.getResourceAsStream("test-naming-exceptions.txt"), + StandardCharsets.UTF_8))) { +is.lines().forEach(exceptions::add); + } +} catch (IOException e) { + throw new UncheckedIOException(e); +} + } + + @Override + protected void before() throws Exception { +if (TestRuleIgnoreTestSuites.isRunningNested()) { + // Ignore nested test suites that test the test framework itself. + return; +} + +String suiteName = RandomizedContext.current().getTargetClass().getName(); + +// You can use this helper method to dump all suite names to a file. +// Run gradle with one worker so that it doesn't try to append to the same +// file from multiple processes: +// +// gradlew test --max-workers 1 -Dtests.useSecurityManager=false +// +// dumpSuiteNamesOnly(suiteName); + +if (!ALLOWED_CONVENTION.matcher(suiteName).matches()) { + // if this class exists on the exception list, leave it. Review comment: It's possible (though rare) that both `TestFooBar.java` and `FooBarTest.java` classes co-exist. 
I wonder if the `ALLOWED_CONVENTION` and `test-naming-exceptions.txt` logic might be mutually exclusive, i.e. when something is on the exclusion list then its opposite is not valid: the excluded test may be renamed (and removed from the exclusion list), but until that is done the conventionally named variant is discouraged, to avoid confusion between the two?
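To make the mutual-exclusion question concrete, here is a minimal Java sketch. It reuses the `ALLOWED_CONVENTION` pattern from the quoted diff; the class name `NamingConventionSketch`, the `Verdict` enum and the `legacyVariant` helper are hypothetical names introduced only for illustration and are not part of the pull request. It also assumes the non-conventional variant is always the `FooBarTest` form, which the thread does not state.

```java
import java.util.Set;
import java.util.regex.Pattern;

class NamingConventionSketch {
  // Same pattern as ALLOWED_CONVENTION in the quoted diff.
  static final Pattern ALLOWED_CONVENTION = Pattern.compile("(.+?)\\.Test[^.]+");

  // Hypothetical outcomes, named for illustration only.
  enum Verdict { OK, LEGACY_EXCEPTION, VIOLATION, CONFLICTS_WITH_EXCEPTION }

  // Hypothetical helper: maps org.example.TestFooBar to org.example.FooBarTest.
  // Only called for names that already match ALLOWED_CONVENTION, so the last
  // name segment is known to start with "Test".
  static String legacyVariant(String suiteName) {
    int dot = suiteName.lastIndexOf('.');
    String pkg = suiteName.substring(0, dot + 1);
    String simple = suiteName.substring(dot + 1);
    return pkg + simple.substring("Test".length()) + "Test";
  }

  static Verdict check(String suiteName, Set<String> exceptions) {
    if (ALLOWED_CONVENTION.matcher(suiteName).matches()) {
      // Conventional name: reject it while its opposite is still on the exception list.
      return exceptions.contains(legacyVariant(suiteName))
          ? Verdict.CONFLICTS_WITH_EXCEPTION
          : Verdict.OK;
    }
    // Non-conventional name: tolerated only if explicitly listed.
    return exceptions.contains(suiteName) ? Verdict.LEGACY_EXCEPTION : Verdict.VIOLATION;
  }

  public static void main(String[] args) {
    Set<String> exceptions = Set.of("org.example.FooBarTest");
    System.out.println(check("org.example.FooBarTest", exceptions));
    System.out.println(check("org.example.TestFooBar", exceptions));
  }
}
```

Whether a conflict should fail the suite or only warn is a policy choice the thread leaves open; the sketch merely reports it.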
[GitHub] [lucene-solr] cpoerschke commented on pull request #1823: SOLR-14510: Remove deprecations added with BMW support
cpoerschke commented on pull request #1823: URL: https://github.com/apache/lucene-solr/pull/1823#issuecomment-687200873 > Remove deprecations added with BMW support > ... It would be useful to add this deprecation information ... +1 to making it clearer what is being deprecated. "BMW support" at first glance made me think of the car manufacturer, but no, it stands for "BlockMax WAND support" instead. To the reader of the deprecation information, does it matter why the thing that is being removed was deprecated, I wonder? If not, then something like _"Remove deprecated writeStartDocumentList variant in TextResponseWriter and its sub-classes."_ could work perhaps, though it's rather long, hmm.
[GitHub] [lucene-solr] madrob commented on a change in pull request #1828: LUCENE-9497: add google error prone checks
madrob commented on a change in pull request #1828: URL: https://github.com/apache/lucene-solr/pull/1828#discussion_r483663516 ## File path: gradle/validation/error-prone.gradle ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +allprojects { prj -> + plugins.withType(JavaPlugin) { +prj.apply plugin: 'net.ltgt.errorprone' + +dependencies { + errorprone("com.google.errorprone:error_prone_core") +} + +tasks.withType(JavaCompile) { task -> + options.errorprone.errorproneArgs = [ Review comment: I would also like to see `options.errorprone.disableWarningsInGeneratedCode = true` ## File path: gradle/validation/error-prone.gradle ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +allprojects { prj -> + plugins.withType(JavaPlugin) { +prj.apply plugin: 'net.ltgt.errorprone' + +dependencies { + errorprone("com.google.errorprone:error_prone_core") +} + +tasks.withType(JavaCompile) { task -> + options.errorprone.errorproneArgs = [ + // test + '-Xep:ExtendingJUnitAssert:OFF', Review comment: Personal style, but I think ```options.errorprone { disable 'ExtendingJUnitAssert' }``` is more clear than using `-Xep`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] HoustonPutman commented on pull request #1684: SOLR-14613: strongly typed placement plugin interface and implementation
HoustonPutman commented on pull request #1684: URL: https://github.com/apache/lucene-solr/pull/1684#issuecomment-687193971 I completely agree with David and Andrzej on all points. No one is going through the comments with a fine tooth comb, it is blatantly disrespectful. And doubling down instead of acknowledging and apologizing makes it even worse. To your point Noble, Apache's motto is "Community over Code". There is no reason to put up with rudeness because someone graces us with a PR review. It is easier to review a PR and be dismissive and rude, but it is infinitely healthier and more constructive to be empathetic and kind. It also leads to a community that is more willing to contribute and collaborate. It's reasonable to expect mutual respect within the Lucene/Solr community. If we are in the place where we should accept any type of language when someone graces us with a review, then that is something we need to seriously address. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190758#comment-17190758 ] Markus Kalkbrenner commented on SOLR-13973: --- I just wanted to emphasize Solr's usage in general in the PHP world, not to pretend that the removal of Tika will break thousands of installations: {quote}For sure, just a few of all these installations will use Tika indirectly via the extraction handler. {quote} With Solarium and Search API Solr we always focus on the latest Solr version! For both we recently had to go back to 8.5 because of test failures related to SOLR-14768. BTW I think I should contribute to your documentation regarding libraries for different programming languages. Nothing other than solarium should be mentioned anymore for PHP. Most major CMS, Shop Systems, ... agreed to base their Solr integration on this library. But this gets off-topic here. I understand that you want to get Tika out of the VM and out of the build dependencies. Go for it :) I reached my goal of creating some awareness of third party concerns. And it seems that SOLR-7632 is a reasonable compromise. > Deprecate Tika > -- > > Key: SOLR-13973 > URL: https://issues.apache.org/jira/browse/SOLR-13973 > Project: Solr > Issue Type: Improvement >Reporter: Ishan Chattopadhyaya >Assignee: Ishan Chattopadhyaya >Priority: Blocker > Fix For: 8.7 > > Time Spent: 10m > Remaining Estimate: 0h > > Solr's primary responsibility should be to focus on search and scalability. > Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us > down. I propose that we deprecate it going forward. > Tika can be run outside Solr. Going forward, if someone wants to use these, > it should be possible to bring them into third party packages and installed > via package manager. > Plan is to just to throw warnings in logs and add deprecation notes in > reference guide for now. Removal can be done in 9.0. 
[jira] [Commented] (LUCENE-9451) Sort.rewrite doesn't always return this when unchanged
[ https://issues.apache.org/jira/browse/LUCENE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190755#comment-17190755 ] ASF subversion and git services commented on LUCENE-9451: - Commit 6c94ca9cb33795cdc29797ff2d17f1869813d3f9 in lucene-solr's branch refs/heads/master from Mike Drob [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6c94ca9 ] LUCENE-9451 Sort.rewrite does not always return this when unchanged (#1731) > Sort.rewrite doesn't always return this when unchanged > -- > > Key: LUCENE-9451 > URL: https://issues.apache.org/jira/browse/LUCENE-9451 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.7 >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Sort.rewrite doesn't always return {{this}} as advertised in the Javadoc even > if the underlying fields are unchanged. This is because the comparison uses > reference equality. > There are two solutions we can do here, 1) switch from reference equality to > object equality, and 2) fix some of the underlying sort fields to not create > unnecessary objects. > cc: [~jpountz] [~romseygeek] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9451) Sort.rewrite doesn't always return this when unchanged
[ https://issues.apache.org/jira/browse/LUCENE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob resolved LUCENE-9451. --- Fix Version/s: master (9.0) Resolution: Fixed > Sort.rewrite doesn't always return this when unchanged > -- > > Key: LUCENE-9451 > URL: https://issues.apache.org/jira/browse/LUCENE-9451 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.7 >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > Fix For: master (9.0) > > Time Spent: 40m > Remaining Estimate: 0h > > Sort.rewrite doesn't always return {{this}} as advertised in the Javadoc even > if the underlying fields are unchanged. This is because the comparison uses > reference equality. > There are two solutions we can do here, 1) switch from reference equality to > object equality, and 2) fix some of the underlying sort fields to not create > unnecessary objects. > cc: [~jpountz] [~romseygeek] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
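The difference between the two comparisons mentioned in the issue description can be illustrated with a small, self-contained Java sketch. It deliberately does not use Lucene: `Field` and `Sort` below are hypothetical stand-ins, and the two `rewrite...` methods are illustrative names, not the actual fix committed for LUCENE-9451.

```java
import java.util.Objects;

class SortRewriteSketch {
  /** Hypothetical stand-in for a sort field; this is not Lucene's SortField. */
  static final class Field {
    final String name;

    Field(String name) {
      this.name = name;
    }

    // Deliberately wasteful: returns an equal but distinct object even when nothing changed.
    Field rewrite() {
      return new Field(name);
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof Field && ((Field) o).name.equals(name);
    }

    @Override
    public int hashCode() {
      return Objects.hash(name);
    }
  }

  /** Hypothetical stand-in for a sort made of several fields. */
  static final class Sort {
    final Field[] fields;

    Sort(Field... fields) {
      this.fields = fields;
    }

    // Reference comparison: any freshly allocated field counts as a change.
    Sort rewriteByReference() {
      boolean changed = false;
      Field[] rewritten = new Field[fields.length];
      for (int i = 0; i < fields.length; i++) {
        rewritten[i] = fields[i].rewrite();
        if (rewritten[i] != fields[i]) {
          changed = true;
        }
      }
      return changed ? new Sort(rewritten) : this;
    }

    // Object comparison: equal fields do not count as a change, so `this` is returned.
    Sort rewriteByEquals() {
      boolean changed = false;
      Field[] rewritten = new Field[fields.length];
      for (int i = 0; i < fields.length; i++) {
        rewritten[i] = fields[i].rewrite();
        if (!rewritten[i].equals(fields[i])) {
          changed = true;
        }
      }
      return changed ? new Sort(rewritten) : this;
    }
  }
}
```

In this sketch, option 1 from the description corresponds to `rewriteByEquals`; option 2 would instead change `Field.rewrite()` to return `this` when nothing changes, which makes even the reference comparison behave as documented.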
[GitHub] [lucene-solr] madrob merged pull request #1731: LUCENE-9451 Sort.rewrite does not always return this when unchanged
madrob merged pull request #1731: URL: https://github.com/apache/lucene-solr/pull/1731
[GitHub] [lucene-solr] cpoerschke opened a new pull request #1832: SOLR-14831: remove deprecated-and-unused "facet.distrib.mco" constant
cpoerschke opened a new pull request #1832: URL: https://github.com/apache/lucene-solr/pull/1832 https://issues.apache.org/jira/browse/SOLR-14831
[jira] [Created] (SOLR-14831) remove deprecated-and-unused "facet.distrib.mco" constant
Christine Poerschke created SOLR-14831: -- Summary: remove deprecated-and-unused "facet.distrib.mco" constant Key: SOLR-14831 URL: https://issues.apache.org/jira/browse/SOLR-14831 Project: Solr Issue Type: Task Components: SolrJ Reporter: Christine Poerschke Assignee: Christine Poerschke This is ready for removal e.g. as per the https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.6.2/solr/solrj/src/java/org/apache/solr/common/params/FacetParams.java#L139-L144 comment: {code} * @deprecated * This option is no longer used nor will if affect any queries as the fix has been built in. (SOLR-11711) * This will be removed entirely in 8.0.0 */ @Deprecated public static final String FACET_DISTRIB_MCO = FACET_DISTRIB + ".mco"; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-7632) Change the ExtractingRequestHandler to use Tika-Server
[ https://issues.apache.org/jira/browse/SOLR-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190751#comment-17190751 ] Alexandre Rafalovitch commented on SOLR-7632: - I agree on the critical path. I was just wondering whether, given the number of internal changes and explanations required on release, it makes sense to also make it into a more flexible architecture on the Solr side. Making it URP, I think would allow to compose it with other pipeline elements in different order (e.g. preprocess file name, feed to Tika, apply DateParser), or possibly even distribute the load by running it on each node, instead of as first step. But that's just an idea. If others do not see the benefits, it is not worth chasing. > Change the ExtractingRequestHandler to use Tika-Server > -- > > Key: SOLR-7632 > URL: https://issues.apache.org/jira/browse/SOLR-7632 > Project: Solr > Issue Type: Improvement > Components: contrib - Solr Cell (Tika extraction) >Reporter: Chris A. Mattmann >Priority: Major > Labels: gsoc2017, memex > > It's a pain to upgrade Tika's jars all the times when we release, and if Tika > fails it messes up the ExtractingRequestHandler (e.g., the document type > caused Tika to fail, etc). A more reliable way and also separated, and easier > to deploy version of the ExtractingRequestHandler would make a network call > to the Tika JAXRS server, and then call Tika on the Solr server side, get the > results and then index the information that way. I have a patch in the works > from the DARPA Memex project and I hope to post it soon. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-7632) Change the ExtractingRequestHandler to use Tika-Server
[ https://issues.apache.org/jira/browse/SOLR-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190745#comment-17190745 ] Erick Erickson commented on SOLR-7632: -- The critical bit is moving it out of the Solr JVM. How would moving it to a URP help that issue? > Change the ExtractingRequestHandler to use Tika-Server > -- > > Key: SOLR-7632 > URL: https://issues.apache.org/jira/browse/SOLR-7632 > Project: Solr > Issue Type: Improvement > Components: contrib - Solr Cell (Tika extraction) >Reporter: Chris A. Mattmann >Priority: Major > Labels: gsoc2017, memex > > It's a pain to upgrade Tika's jars all the times when we release, and if Tika > fails it messes up the ExtractingRequestHandler (e.g., the document type > caused Tika to fail, etc). A more reliable way and also separated, and easier > to deploy version of the ExtractingRequestHandler would make a network call > to the Tika JAXRS server, and then call Tika on the Solr server side, get the > results and then index the information that way. I have a patch in the works > from the DARPA Memex project and I hope to post it soon. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190728#comment-17190728 ] Alexandre Rafalovitch commented on SOLR-13973: -- Sanity check on the links and numbers (I use both Drupal 8/9 and Solr, though not currently together): 1) [https://www.drupal.org/project/apachesolr] is for Drupal 7 and before. Those people will also continue using Solr 4 or whatever version the configuration was last updated for (2 years ago at the latest) 2) [https://www.drupal.org/project/search_api_solr] starting from the v4 release does support Drupal 8.8+/9 (only) and Solr versions more recent than Solr 6.4. It was released in May 2020 and has been updated since. The current adoption is probably still fairly low, but will accelerate next year, once the previous release version is no longer supported (the notice said December). Drupal was never known for chasing the latest Solr version, as they have their own configuration that is designed for field definitions with wildcards and maybe only recently (if at all) for managed-schema API manipulation. They can also keep using Solr 8 for another 5-6 years with Tika built in. If Tika is removed from Solr (in version 9 at the earliest), this will only affect the choices of those setting up a new Drupal installation and wanting new features of Solr 9. At that point (say in 4 years), we can figure something out for Solr 11. Most likely a variation on preconfigured Solr and Tika colocated in a Docker container. On the other hand, I honestly don't know much about the solarium library directly. Perhaps it is a serious issue there, though we have to look again at the number of active installations * the percentage of those using the /extract handler * the percentage of people able to run the _latest_ Solr process but not a second (also Java) process. So, to me, this sounds less like a -1 than like a call for awareness and a bit of extra education around that edge case. 
And, yes, awareness of the greater community; something we really need to pay more attention to in general. Of course, any improvement of workflow we can do between Solr and Tika, both standalone, would be very good regardless of this particular use case. > Deprecate Tika > -- > > Key: SOLR-13973 > URL: https://issues.apache.org/jira/browse/SOLR-13973 > Project: Solr > Issue Type: Improvement >Reporter: Ishan Chattopadhyaya >Assignee: Ishan Chattopadhyaya >Priority: Blocker > Fix For: 8.7 > > Time Spent: 10m > Remaining Estimate: 0h > > Solr's primary responsibility should be to focus on search and scalability. > Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us > down. I propose that we deprecate it going forward. > Tika can be run outside Solr. Going forward, if someone wants to use these, > it should be possible to bring them into third party packages and installed > via package manager. > Plan is to just to throw warnings in logs and add deprecation notes in > reference guide for now. Removal can be done in 9.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190720#comment-17190720 ] Markus Kalkbrenner commented on SOLR-13973: --- {quote}So that'd be SOLR-7632 as [~erickerickson] pointed out? {quote} Yes, sounds like it. > Deprecate Tika > -- > > Key: SOLR-13973 > URL: https://issues.apache.org/jira/browse/SOLR-13973 > Project: Solr > Issue Type: Improvement >Reporter: Ishan Chattopadhyaya >Assignee: Ishan Chattopadhyaya >Priority: Blocker > Fix For: 8.7 > > Time Spent: 10m > Remaining Estimate: 0h > > Solr's primary responsibility should be to focus on search and scalability. > Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us > down. I propose that we deprecate it going forward. > Tika can be run outside Solr. Going forward, if someone wants to use these, > it should be possible to bring them into third party packages and installed > via package manager. > Plan is to just to throw warnings in logs and add deprecation notes in > reference guide for now. Removal can be done in 9.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14704) Add download option to solr/cloud-dev/cloud.sh
[ https://issues.apache.org/jira/browse/SOLR-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190718#comment-17190718 ] ASF subversion and git services commented on SOLR-14704: Commit 65da5ed32c940529b27a518deb8ffd1e61aa2e96 in lucene-solr's branch refs/heads/master from Gus Heck [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=65da5ed ] SOLR-14704 add download option to cloud.sh (#1715) > Add download option to solr/cloud-dev/cloud.sh > -- > > Key: SOLR-14704 > URL: https://issues.apache.org/jira/browse/SOLR-14704 > Project: Solr > Issue Type: New Feature > Components: scripts and tools >Reporter: Gus Heck >Assignee: Gus Heck >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > For easier testing of things like RC artifacts I'm adding an option to > cloud.sh which will curl a tarball down from the web instead of building it > locally. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] gus-asf merged pull request #1715: SOLR-14704 add download option to cloud.sh
gus-asf merged pull request #1715: URL: https://github.com/apache/lucene-solr/pull/1715 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190716#comment-17190716 ] Tim Allison commented on SOLR-13973: So that'd be SOLR-7632 as [~erickerickson] pointed out? > Deprecate Tika > -- > > Key: SOLR-13973 > URL: https://issues.apache.org/jira/browse/SOLR-13973 > Project: Solr > Issue Type: Improvement >Reporter: Ishan Chattopadhyaya >Assignee: Ishan Chattopadhyaya >Priority: Blocker > Fix For: 8.7 > > Time Spent: 10m > Remaining Estimate: 0h > > Solr's primary responsibility should be to focus on search and scalability. > Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us > down. I propose that we deprecate it going forward. > Tika can be run outside Solr. Going forward, if someone wants to use these, > it should be possible to bring them into third party packages and installed > via package manager. > Plan is to just to throw warnings in logs and add deprecation notes in > reference guide for now. Removal can be done in 9.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190714#comment-17190714 ] Markus Kalkbrenner commented on SOLR-13973: --- {quote}to get Tika out of Solr's jvm {quote} I understand that goal. {quote}I've been thinking about adding an "indexer" endpoint to Tika. You'd configure your Solr/ES connection info and error handling choices via json at startup and then send the bytes to tika-server's /indexer endpoint. It would parse the file and forward the result to Solr. Would that simplify anything? {quote} I think that makes sense. A good approach would be for Solr to keep its "API" for the clients, in other words the extraction handler. The new implementation of the extraction handler would forward the document to the new endpoint of the standalone Tika server and handle its response. This approach would keep the complexity of a new connection with its own new API away from the clients. The new handler should be available by the time the old one gets deprecated. And don't get me wrong. I really appreciate all your hard work! And our PHP stuff would be nothing without Solr ;) > Deprecate Tika > -- > > Key: SOLR-13973 > URL: https://issues.apache.org/jira/browse/SOLR-13973 > Project: Solr > Issue Type: Improvement >Reporter: Ishan Chattopadhyaya >Assignee: Ishan Chattopadhyaya >Priority: Blocker > Fix For: 8.7 > > Time Spent: 10m > Remaining Estimate: 0h > > Solr's primary responsibility should be to focus on search and scalability. > Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us > down. I propose that we deprecate it going forward. > Tika can be run outside Solr. Going forward, if someone wants to use these, > it should be possible to bring them into third party packages and installed > via package manager. > Plan is to just to throw warnings in logs and add deprecation notes in > reference guide for now. Removal can be done in 9.0. 
[jira] [Commented] (SOLR-14821) docValuesTermsFilter should support single-valued docValues fields
[ https://issues.apache.org/jira/browse/SOLR-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190712#comment-17190712 ] Jason Gerlowski commented on SOLR-14821: Ah, hmm, I can see it locally now - I initially couldn't because I was specifying {{docValuesTermsFilter}} but didn't have enough terms for {{docValuesTermsFilterTopLevel}} to be chosen - which is where the actual problem is. Trivial to reproduce locally if you specify dVTFTL as your method directly. The test code I mentioned in my comment above must actually be multi-valued for {{author_s}}? In any case your fix is straightforward and correct. Just need to fix the test up before committing. > docValuesTermsFilter should support single-valued docValues fields > -- > > Key: SOLR-14821 > URL: https://issues.apache.org/jira/browse/SOLR-14821 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: query parsers >Reporter: Anatolii Siuniaev >Assignee: Jason Gerlowski >Priority: Minor > Attachments: SOLR-14821.patch > > > SOLR-13890 introduced a post-filter implementation for docValuesTermsFilter > in TermsQParserPlugin. But now it supports only multi-valued docValues > fields (i.e. SORTED_SET type DocValues) > It doesn't work for single-valued docValues fields (i.e. SORTED type > DocValues), though it should. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190711#comment-17190711 ] Tim Allison commented on SOLR-13973: [~mkalkbrenner] I've been thinking about adding an "indexer" endpoint to Tika. You'd configure your Solr/ES connection info and error handling choices via json at startup and then send the bytes to tika-server's /indexer endpoint. It would parse the file and forward the result to Solr. Would that simplify anything? I'm thoroughly on board with "don't break the user experience", but we've got to get Tika out of Solr's jvm. > Deprecate Tika > -- > > Key: SOLR-13973 > URL: https://issues.apache.org/jira/browse/SOLR-13973 > Project: Solr > Issue Type: Improvement >Reporter: Ishan Chattopadhyaya >Assignee: Ishan Chattopadhyaya >Priority: Blocker > Fix For: 8.7 > > Time Spent: 10m > Remaining Estimate: 0h > > Solr's primary responsibility should be to focus on search and scalability. > Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us > down. I propose that we deprecate it going forward. > Tika can be run outside Solr. Going forward, if someone wants to use these, > it should be possible to bring them into third party packages and installed > via package manager. > Plan is to just to throw warnings in logs and add deprecation notes in > reference guide for now. Removal can be done in 9.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
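The idea in this thread of running Tika outside Solr's JVM and forwarding documents to a standalone tika-server can be sketched roughly as below. This is only an illustration, not the proposed handler: the `/indexer` endpoint discussed above does not exist yet, so the sketch targets tika-server's existing `PUT /tika` text-extraction endpoint, and the class name `TikaForwardSketch` and the base URL passed in are assumptions.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of Solr or Tika.
final class TikaForwardSketch {

  /**
   * Builds (but does not send) a request that uploads the raw document bytes
   * to a standalone tika-server and asks for plain text back.
   */
  static HttpRequest extractTextRequest(URI tikaServerBase, byte[] document) {
    return HttpRequest.newBuilder(tikaServerBase.resolve("/tika"))
        .header("Accept", "text/plain")
        .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
        .build();
  }
}
```

A caller would send the request with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` and index the returned text, so a parser crash or CVE stays in the Tika process rather than in Solr.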
[jira] [Created] (LUCENE-9507) Custom order for leaves in DirectoryReader, IndexWriter and searcher
Jim Ferenczi created LUCENE-9507: Summary: Custom order for leaves in DirectoryReader, IndexWriter and searcher Key: LUCENE-9507 URL: https://issues.apache.org/jira/browse/LUCENE-9507 Project: Lucene - Core Issue Type: New Feature Reporter: Jim Ferenczi Now that we're able [to skip documents efficiently when sorting by a numeric field|https://issues.apache.org/jira/browse/LUCENE-9280], I was wondering if we could optimize sorted queries further by also sorting the leaf readers based on the primary sort. For time-based indices in Elasticsearch, we've implemented an optimization that does that at query time. If the query is sorted by a numeric docvalue field, prior to search, we sort the leaves according to the query sort. When sorting by timestamp this small optimization can have a big impact, since early termination can be reached much faster if the sort values in the segments don't overlap too much. Applying this optimization at query time is challenging: it has the benefit of working on any numeric field sort and order, but it requires a multi-reader that reorganizes the segments. It can also be deceptive: after a force merge to 1 segment, sorted queries may be slower since there is nothing left to sort. So, another option that I am looking at is to add the ability to provide a leaf order directly in the IndexWriter and DirectoryReader. That could be similar to an index sort or even complementary to it, since sorting segments based on the index sort could also help at query time. For time-based indices that cannot afford index sorting but have lots of sorted queries on timestamp, forcing the order of segments could speed up sorted queries significantly. The advantage of forcing a single leaf sort in the writer/reader is that we can also use it to influence the merges by putting the segments with the highest value first. 
That would help with indices that are merged to a single segment but still want to keep sorted queries fast, and also with the multi-segment case, since big segments would be more likely to have the highest values first too. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
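To make the effect of leaf ordering concrete, here is a minimal sketch that does not use Lucene at all. A hypothetical `Segment` class stands in for a leaf reader and holds only its timestamp values; `topN` collects the newest `n` values and skips any segment whose maximum cannot beat the current n-th best. All names here are illustrative assumptions, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a leaf reader: just the sort values it holds.
final class Segment {
  final String name;
  final long[] timestamps;
  final long maxTimestamp;

  Segment(String name, long... timestamps) {
    this.name = name;
    this.timestamps = timestamps;
    long max = Long.MIN_VALUE;
    for (long t : timestamps) {
      max = Math.max(max, t);
    }
    this.maxTimestamp = max;
  }
}

final class LeafOrderSketch {
  /** Number of segments actually scanned by the last call to topN. */
  static int visited;

  /** Top-n timestamps, newest first, visiting segments in the given order. */
  static List<Long> topN(List<Segment> segments, int n) {
    visited = 0;
    if (n <= 0) {
      return new ArrayList<>();
    }
    PriorityQueue<Long> heap = new PriorityQueue<>(); // min-heap of the best n so far
    for (Segment s : segments) {
      // Skip a segment that cannot beat the current n-th best value.
      if (heap.size() == n && s.maxTimestamp <= heap.peek()) {
        continue;
      }
      visited++;
      for (long t : s.timestamps) {
        if (heap.size() < n) {
          heap.add(t);
        } else if (t > heap.peek()) {
          heap.poll();
          heap.add(t);
        }
      }
    }
    List<Long> result = new ArrayList<>(heap);
    result.sort(Comparator.reverseOrder());
    return result;
  }

  /** Orders segments so that those holding the highest values come first. */
  static List<Segment> sortedForDescendingSort(List<Segment> segments) {
    List<Segment> copy = new ArrayList<>(segments);
    copy.sort(Comparator.comparingLong((Segment s) -> s.maxTimestamp).reversed());
    return copy;
  }
}
```

With three non-overlapping segments in oldest-first order, every segment has to be scanned for a top-2 query; after `sortedForDescendingSort` only the first one is, which is the early-termination benefit the issue describes when segment value ranges overlap little.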
[jira] [Commented] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190704#comment-17190704 ] Erick Erickson commented on SOLR-13973: --- 7632 is at least related... > Deprecate Tika > -- > > Key: SOLR-13973 > URL: https://issues.apache.org/jira/browse/SOLR-13973 > Project: Solr > Issue Type: Improvement >Reporter: Ishan Chattopadhyaya >Assignee: Ishan Chattopadhyaya >Priority: Blocker > Fix For: 8.7 > > Time Spent: 10m > Remaining Estimate: 0h > > Solr's primary responsibility should be to focus on search and scalability. > Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us > down. I propose that we deprecate it going forward. > Tika can be run outside Solr. Going forward, if someone wants to use these, > it should be possible to bring them into third party packages and installed > via package manager. > Plan is to just to throw warnings in logs and add deprecation notes in > reference guide for now. Removal can be done in 9.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-7633) Change the ExtractingRequestHandler to use Tika-Server
[ https://issues.apache.org/jira/browse/SOLR-7633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190703#comment-17190703 ] Erick Erickson commented on SOLR-7633: -- Might be able to close this one, there's more commentary on 7632. > Change the ExtractingRequestHandler to use Tika-Server > -- > > Key: SOLR-7633 > URL: https://issues.apache.org/jira/browse/SOLR-7633 > Project: Solr > Issue Type: Improvement > Components: contrib - Solr Cell (Tika extraction) >Reporter: Chris A. Mattmann >Priority: Major > Labels: memex > Fix For: 5.0.1 > > > It's a pain to upgrade Tika's jars all the times when we release, and if Tika > fails it messes up the ExtractingRequestHandler (e.g., the document type > caused Tika to fail, etc). A more reliable way and also separated, and easier > to deploy version of the ExtractingRequestHandler would make a network call > to the Tika JAXRS server, and then call Tika on the Solr server side, get the > results and then index the information that way. I have a patch in the works > from the DARPA Memex project and I hope to post it soon. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] noblepaul commented on pull request #1813: SOLR-14613: No new APIs. use the existing APIs
noblepaul commented on pull request #1813: URL: https://github.com/apache/lucene-solr/pull/1813#issuecomment-687126415 >Really? This is like saying that we only ever need collection admin APIs and we don't need any autoscaling. I don't think I made myself clear. Users would definitely like Solr to place the replicas correctly. But if it means implementing a plugin in Java, packaging it in a jar and deploying it in their cluster, they would rather not do it. If it's as easy as writing down some DSL, they may use it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] murblanc commented on a change in pull request #1831: SOLR-14749 the scheduler part
murblanc commented on a change in pull request #1831: URL: https://github.com/apache/lucene-solr/pull/1831#discussion_r483593482 ## File path: solr/core/src/java/org/apache/solr/cluster/scheduler/Schedulable.java ## @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.cluster.scheduler; + +/** + * Component to be scheduled and executed according to the schedule. + */ +public interface Schedulable { + + Schedule getSchedule(); + + /** + * Execute the component. + * NOTE: this should be a lightweight method that executes quickly, to avoid blocking the + * execution of other schedules. If it requires more work it should do this in a separate thread. Review comment: If a scheduled component starts a new thread to do its work, the schedule is going to get skewed pretty quickly and we might have multiple "copies" of the scheduled component being started in parallel. We'd be delegating the responsibility of ensuring a single executing instance of the component (on a given node where it was registered) to the component itself. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
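The concern in the review comment above (overlapping "copies" when a scheduled component hands its work to another thread) is commonly handled with a single-flight guard. The sketch below is an assumption about how a component could do that; `SingleFlightTask` is a hypothetical name and nothing like it appears in the pull request.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical wrapper; not part of the PR under review.
final class SingleFlightTask {
  private final AtomicBoolean running = new AtomicBoolean(false);
  private final Executor executor;
  private final Runnable work;

  SingleFlightTask(Executor executor, Runnable work) {
    this.executor = executor;
    this.work = work;
  }

  /** Returns true if the work was handed off, false if a previous run is still in flight. */
  boolean trigger() {
    if (!running.compareAndSet(false, true)) {
      return false; // an earlier run has not finished yet, so skip this tick
    }
    try {
      executor.execute(() -> {
        try {
          work.run();
        } finally {
          running.set(false); // allow the next scheduled tick to start a run
        }
      });
    } catch (RuntimeException e) {
      running.set(false); // the executor rejected the task
      throw e;
    }
    return true;
  }
}
```

The scheduler's lightweight `run()` would call `trigger()` and return immediately. Whether skipped ticks should be counted or logged is a design choice this sketch leaves open.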
[GitHub] [lucene-solr] murblanc commented on a change in pull request #1831: SOLR-14749 the scheduler part
murblanc commented on a change in pull request #1831: URL: https://github.com/apache/lucene-solr/pull/1831#discussion_r483592144 ## File path: solr/core/src/java/org/apache/solr/cluster/scheduler/SolrScheduler.java ## @@ -0,0 +1,27 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.cluster.scheduler; + +/** + * + */ +public interface SolrScheduler { + + void registerSchedulable(Schedulable schedulable); Review comment: How is this method going to be called in practice? Do we assume each node (in the SolrCloud cluster) will register all tasks locally (those that are not `ClusterSingleton`) and that multiple instances are going to run in parallel? Or will a `ClusterSingleton` task be registering other scheduled tasks that as a consequence will only have a single instance running on the cluster? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14821) docValuesTermsFilter should support single-valued docValues fields
[ https://issues.apache.org/jira/browse/SOLR-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190701#comment-17190701 ] Jason Gerlowski commented on SOLR-14821: We do have a test for this it turns out, here's the relevant bit from TestTermsQParserPlugin: {code} @Test public void testTermsMethodEquivalency() { // Run queries with a variety of 'method' and postfilter options. final TermsParams[] methods = new TermsParams[] { ... new TermsParams("docValuesTermsFilter", true), new TermsParams("docValuesTermsFilter", false), new TermsParams("docValuesTermsFilterTopLevel", true), new TermsParams("docValuesTermsFilterTopLevel", false), new TermsParams("docValuesTermsFilterPerSegment", true), new TermsParams("docValuesTermsFilterPerSegment", false) }; for (TermsParams method : methods) { // Single-valued field, single term value ModifiableSolrParams params = new ModifiableSolrParams(); params.add("q", method.buildQuery("author_s", "Robert Jordan")); params.add("sort", "id asc"); assertQ(req(params, "indent", "on"), "*[count(//doc)=2]", "//result/doc[1]/str[@name='id'][.='2']", "//result/doc[2]/str[@name='id'][.='3']" ); // Single-valued field, multiple term values params = new ModifiableSolrParams(); params.add("q", method.buildQuery("author_s", "Robert Jordan,Isaac Asimov")); params.add("sort", "id asc"); assertQ(req(params, "indent", "on"), "*[count(//doc)=3]", "//result/doc[1]/str[@name='id'][.='2']", "//result/doc[2]/str[@name='id'][.='3']", "//result/doc[3]/str[@name='id'][.='7']" ); ... } } {code} I'm not able to reproduce locally on master either. Do you have an easy way to reproduce this [~anatolii_siuniaev]? > docValuesTermsFilter should support single-valued docValues fields > -- > > Key: SOLR-14821 > URL: https://issues.apache.org/jira/browse/SOLR-14821 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. 
Issues are Public) > Components: query parsers >Reporter: Anatolii Siuniaev >Assignee: Jason Gerlowski >Priority: Minor > Attachments: SOLR-14821.patch > > > SOLR-13890 introduced a post-filter implementation for docValuesTermsFilter > in TermsQParserPlugin. But now it supports only multi-valued docValues > fields (i.e. SORTED_SET type DocValues) > It doesn't work for single-valued docValues fields (i.e. SORTED type > DocValues), though it should. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] murblanc commented on a change in pull request #1831: SOLR-14749 the scheduler part
murblanc commented on a change in pull request #1831: URL: https://github.com/apache/lucene-solr/pull/1831#discussion_r483590300 ## File path: solr/core/src/java/org/apache/solr/cluster/scheduler/impl/SolrSchedulerImpl.java ## @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.cluster.scheduler.impl; + +import java.lang.invoke.MethodHandles; +import java.time.Instant; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.Executors; +import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.TimeUnit; + +import org.apache.solr.cloud.ClusterSingleton; +import org.apache.solr.cluster.scheduler.Schedule; +import org.apache.solr.cluster.scheduler.Schedulable; +import org.apache.solr.cluster.scheduler.SolrScheduler; +import org.apache.solr.common.util.SolrNamedThreadFactory; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Scheduled executions are triggered at most with {@link #SCHEDULE_INTERVAL_SEC} interval. 
+ * Each registered {@link Schedulable} is processed sequentially and if its next execution time + * is in the past its {@link Schedulable#run()} method will be invoked. + * NOTE: If the total time of execution of all registered Schedulable-s exceeds any schedule + * interval then exact execution times will be silently missed. + */ +public class SolrSchedulerImpl implements SolrScheduler, ClusterSingleton { + private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); + + public static final int SCHEDULE_INTERVAL_SEC = 10; Review comment: Instead of polling for tasks to run every 10 seconds could we be smart and set the next execution time of the scheduler to when the next job needs to be run? Is it worth the additional investment? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
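The alternative raised in the review question above, waking up when the next job is due rather than polling at a fixed interval, boils down to computing the delay to the earliest next run. A minimal sketch under that assumption follows; `NextWakeup` and its method are hypothetical names, and the next-run instants are assumed to be already known for each registered job.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Collection;
import java.util.Comparator;
import java.util.Optional;

// Hypothetical helper, not part of the PR under review.
final class NextWakeup {

  /**
   * Delay from now until the earliest next run. Overdue jobs yield a zero
   * delay; an empty collection yields an empty result.
   */
  static Optional<Duration> delayUntilNext(Collection<Instant> nextRunTimes, Instant now) {
    return nextRunTimes.stream()
        .min(Comparator.naturalOrder())
        .map(next -> next.isAfter(now) ? Duration.between(now, next) : Duration.ZERO);
  }
}
```

The result could be passed to `ScheduledExecutorService.schedule(task, delay.toMillis(), TimeUnit.MILLISECONDS)` and recomputed after each run or whenever a job is registered. That re-arming logic is the "additional investment" the comment asks about.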
[jira] [Commented] (SOLR-13652) Remove update from initParams in example solrconfig files that only mention "df"
[ https://issues.apache.org/jira/browse/SOLR-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190697#comment-17190697 ] Alexandre Rafalovitch commented on SOLR-13652: -- Added a section on Learning vs Production vs Kitchen Sink schema. Still needs more thought, especially on production part. > Remove update from initParams in example solrconfig files that only mention > "df" > > > Key: SOLR-13652 > URL: https://issues.apache.org/jira/browse/SOLR-13652 > Project: Solr > Issue Type: Improvement > Components: examples >Reporter: Erick Erickson >Priority: Minor > Labels: easyfix, newbie > > At least some of the solrconfig files we ship have this entry: > <initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse,update"> > <lst name="defaults"> > <str name="df">text</str> > </lst> > </initParams> > which has led at least one user to wonder if there's some kind of automatic > way to have the df field populated for updates. I don't even know how you'd > send an update that didn't have a specific field. We should remove the > "update/**". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] murblanc commented on a change in pull request #1831: SOLR-14749 the scheduler part
murblanc commented on a change in pull request #1831: URL: https://github.com/apache/lucene-solr/pull/1831#discussion_r483584416 ## File path: solr/core/src/java/org/apache/solr/cluster/scheduler/impl/CompiledSchedule.java ## @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.cluster.scheduler.impl; + +import java.lang.invoke.MethodHandles; +import java.text.ParseException; +import java.time.Instant; +import java.time.format.DateTimeFormatter; +import java.time.format.DateTimeFormatterBuilder; +import java.time.temporal.ChronoField; +import java.util.Date; +import java.util.Locale; +import java.util.TimeZone; + +import org.apache.solr.cluster.scheduler.Schedule; +import org.apache.solr.common.SolrException; +import org.apache.solr.util.DateMathParser; +import org.apache.solr.util.TimeZoneUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * A version of {@link Schedule} where some of the fields are already resolved. Review comment: Given this class does not implement `Schedule` (didn't get yet to the point where it's used) maybe its name or this comment should be clarified? This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] murblanc commented on a change in pull request #1831: SOLR-14749 the scheduler part
murblanc commented on a change in pull request #1831: URL: https://github.com/apache/lucene-solr/pull/1831#discussion_r483583874 ## File path: solr/core/src/java/org/apache/solr/cluster/scheduler/impl/CompiledSchedule.java ## @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.cluster.scheduler.impl; + +import java.lang.invoke.MethodHandles; +import java.text.ParseException; +import java.time.Instant; +import java.time.format.DateTimeFormatter; +import java.time.format.DateTimeFormatterBuilder; +import java.time.temporal.ChronoField; +import java.util.Date; +import java.util.Locale; +import java.util.TimeZone; + +import org.apache.solr.cluster.scheduler.Schedule; +import org.apache.solr.common.SolrException; +import org.apache.solr.util.DateMathParser; +import org.apache.solr.util.TimeZoneUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * A version of {@link Schedule} where some of the fields are already resolved. 
+ */ +class CompiledSchedule { + private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); + + final String name; + final TimeZone timeZone; + final Instant startTime; + final String interval; + final DateMathParser dateMathParser; + + Instant lastRunAt; + + /** + * Compile a schedule. + * @param schedule schedule. + * @throws Exception if startTime or interval cannot be parsed. + */ + CompiledSchedule(Schedule schedule) throws Exception { +this.name = schedule.getName(); +this.timeZone = TimeZoneUtils.getTimeZone(schedule.getTimeZone()); +this.startTime = parseStartTime(new Date(), schedule.getStartTime(), timeZone); +this.lastRunAt = startTime; +this.interval = schedule.getInterval(); +this.dateMathParser = new DateMathParser(timeZone); +// this is just to verify that the interval math is valid +shouldRun(); + } + + private Instant parseStartTime(Date now, String startTimeStr, TimeZone timeZone) throws Exception { +try { + // try parsing startTime as an ISO-8601 date time string + return DateMathParser.parseMath(now, startTimeStr).toInstant(); +} catch (SolrException e) { + if (e.code() != SolrException.ErrorCode.BAD_REQUEST.code) { +throw new Exception("startTime: error parsing value '" + startTimeStr + "': " + e.toString()); + } +} +DateTimeFormatter dateTimeFormatter = new DateTimeFormatterBuilder() + .append(DateTimeFormatter.ISO_LOCAL_DATE).appendPattern("['T'[HH[:mm[:ss") +.parseDefaulting(ChronoField.HOUR_OF_DAY, 0) +.parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0) +.parseDefaulting(ChronoField.SECOND_OF_MINUTE, 0) +.toFormatter(Locale.ROOT).withZone(timeZone.toZoneId()); +try { + return Instant.from(dateTimeFormatter.parse(startTimeStr)); +} catch (Exception e) { + throw new Exception("startTime: error parsing startTime '" + startTimeStr + "': " + e.toString()); +} + } + + /** + * Returns true if the last run + run interval is already in the past. 
+ */ + boolean shouldRun() { +dateMathParser.setNow(new Date(lastRunAt.toEpochMilli())); +Instant nextRunTime; +try { + Date next = dateMathParser.parseMath(interval); + nextRunTime = next.toInstant(); +} catch (ParseException e) { + log.warn("Invalid math expression, skipping: " + e); + return false; +} +if (Instant.now().isAfter(nextRunTime)) { + return true; +} else { + return false; +} + } + + /** + * This setter MUST be invoked after each run. + * @param lastRunAt time when the schedule was last run. + */ + void setLastRunAt(Instant lastRunAt) { Review comment: Unclear if `lastRunAt` is the time the schedule last started or last completed. If completed, can we simplify by not passing an instant and building it here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at:
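One possible reading of the review question above is that `lastRunAt` means "time of last completion" and can be recorded by the class itself rather than passed in. The sketch below illustrates that reading only. `RunTracker` is a hypothetical class, and it simplifies the patch by using a fixed `Duration` in place of the date-math interval string.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

// Hypothetical simplification; the patch uses a date-math interval string instead of a Duration.
final class RunTracker {
  private final Clock clock;
  private final Duration interval;
  private Instant lastCompletedAt;

  RunTracker(Clock clock, Instant startTime, Duration interval) {
    this.clock = clock;
    this.interval = interval;
    this.lastCompletedAt = startTime; // before the first run, the start time acts as the baseline
  }

  /** True if the last completion time plus the interval is already in the past. */
  boolean shouldRun() {
    return clock.instant().isAfter(lastCompletedAt.plus(interval));
  }

  /** Records "now" as the completion time, so callers do not pass an instant. */
  void markCompleted() {
    lastCompletedAt = clock.instant();
  }
}
```

Injecting a `Clock` keeps the check testable. If the intended meaning is instead "time the run started", the same shape works with a `markStarted()` call before the run.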
[GitHub] [lucene-solr] murblanc commented on a change in pull request #1831: SOLR-14749 the scheduler part
murblanc commented on a change in pull request #1831:
URL: https://github.com/apache/lucene-solr/pull/1831#discussion_r483581489

## File path: solr/core/src/java/org/apache/solr/cloud/Overseer.java ##
@@ -775,6 +779,42 @@ private void doCompatCheck(BiConsumer consumer) {
     }
   }
 
+  /**
+   * Start {@link ClusterSingleton} plugins when we become the leader.
+   */
+  private void startClusterSingletons() {
+    PluginBag handlers = getCoreContainer().getRequestHandlers();
+    if (handlers == null) {
+      return;
+    }
+    handlers.keySet().forEach(handlerName -> {
+      SolrRequestHandler handler = handlers.get(handlerName);
+      if (handler instanceof ClusterSingleton) {
+        try {
+          ((ClusterSingleton) handler).start();
+        } catch (Exception e) {
+          log.warn("Exception starting ClusterSingleton " + handler, e);
+        }
+      }
+    });
+  }
+
+  /**
+   * Stop {@link ClusterSingleton} plugins when we lose leadership.
+   */
+  private void stopClusterSingletons() {
+    PluginBag handlers = getCoreContainer().getRequestHandlers();

Review comment: Should we stop currently configured `ClusterSingleton` handlers or rather stop all those we've started? Is the configuration immutable?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
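One way to read the "stop all those we've started" option is to remember the instances that were actually started, so that stopping does not depend on whatever the handler configuration looks like at that moment. The sketch below is only an illustration of that idea: `Singleton` and `SingletonTracker` are hypothetical names, not Solr classes, and logging is reduced to `System.err`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical minimal interface standing in for ClusterSingleton. */
interface Singleton {
  void start() throws Exception;
  void stop();
}

/** Remembers what was actually started, independent of the current configuration. */
final class SingletonTracker {
  private final Deque<Singleton> started = new ArrayDeque<>();

  /** Starts every configured singleton; failures are reported and skipped. */
  synchronized void startAll(Iterable<? extends Singleton> configured) {
    for (Singleton s : configured) {
      try {
        s.start();
        started.push(s); // only successfully started instances are tracked
      } catch (Exception e) {
        System.err.println("Exception starting singleton " + s + ": " + e);
      }
    }
  }

  /** Stops the tracked instances in reverse start order; returns how many were stopped. */
  synchronized int stopAll() {
    int stopped = 0;
    while (!started.isEmpty()) {
      started.pop().stop();
      stopped++;
    }
    return stopped;
  }
}
```

The trade-off is extra state to keep consistent; if the configuration is in fact immutable while the node is leader, looking the handlers up again as the PR does would be equivalent.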
[GitHub] [lucene-solr] murblanc commented on a change in pull request #1831: SOLR-14749 the scheduler part
murblanc commented on a change in pull request #1831:
URL: https://github.com/apache/lucene-solr/pull/1831#discussion_r483580777

## File path: solr/core/src/java/org/apache/solr/cloud/ClusterSingleton.java ##
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.cloud;
+
+/**
+ * Intended for {@link org.apache.solr.core.CoreContainer} plugins that should be
+ * enabled only one instance per cluster.
+ * Components that implement this interface are always in one of two states:
+ *
+ * STOPPED - the default state. The component is idle and does not perform

Review comment: Should we add `STARTING` and `STOPPING` states? Assuming the call to `start()` waits until the plugin has completed its startup (and similarly the call to `stop()` waiting for it to stop) might be expensive, and as implemented `start()` delays the `Overseer` starting in general. Possibly waiting for `stop()` to complete makes sense (in order to guarantee no two `ClusterSingleton` plugins are running concurrently on the cluster, so maybe state `STOPPING` is not needed), but I'd think we don't need to wait for `start()` to have completed.
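A rough sketch of the state handling suggested in this comment: `start()` returns immediately and the component passes through `STARTING`, while `stop()` stays synchronous so the caller knows the component is idle once it returns. `AsyncStartComponent` is a hypothetical class, not part of the PR, and running the startup work on the common pool via `CompletableFuture.runAsync` is an assumption made for brevity.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical component with a non-blocking start() and a blocking stop(). */
final class AsyncStartComponent {
  enum State { STOPPED, STARTING, RUNNING }

  private final AtomicReference<State> state = new AtomicReference<>(State.STOPPED);
  private final Runnable startupWork;
  // Completes when the most recent startup attempt has finished (successfully or not).
  private CompletableFuture<Void> startup = CompletableFuture.completedFuture(null);

  AsyncStartComponent(Runnable startupWork) {
    this.startupWork = startupWork;
  }

  /** Returns immediately; the returned future completes once startup has finished. */
  synchronized CompletableFuture<Void> start() {
    if (state.compareAndSet(State.STOPPED, State.STARTING)) {
      startup = CompletableFuture.runAsync(startupWork).handle((ignored, error) -> {
        // A failed startup falls back to STOPPED; the error itself is dropped in this sketch.
        state.compareAndSet(State.STARTING, error == null ? State.RUNNING : State.STOPPED);
        return null;
      });
    }
    return startup;
  }

  /** Waits for any in-flight startup, then marks the component as stopped. */
  synchronized void stop() {
    startup.join();
    state.set(State.STOPPED);
  }

  State state() {
    return state.get();
  }
}
```

Because `stop()` joins the pending startup, no separate `STOPPING` state is needed here, which matches the reviewer's guess; a real plugin would also have to release its resources in `stop()`.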
[GitHub] [lucene-solr] sigram opened a new pull request #1831: SOLR-14749 the scheduler part
sigram opened a new pull request #1831: URL: https://github.com/apache/lucene-solr/pull/1831 See PR 1758 for the background on this.
[GitHub] [lucene-solr] noblepaul commented on pull request #1815: SOLR-14151: Bug fixes
noblepaul commented on pull request #1815:
URL: https://github.com/apache/lucene-solr/pull/1815#issuecomment-687092388

> Noble, could you be more specific about the bugs that this is fixing? I'm not sure exactly what the bug is

`TestBulkSchemaConcurrent` was failing consistently after the original commit. So, I ensured that it passes consistently.
[GitHub] [lucene-solr] noblepaul commented on a change in pull request #1815: SOLR-14151: Bug fixes
noblepaul commented on a change in pull request #1815:
URL: https://github.com/apache/lucene-solr/pull/1815#discussion_r483562559

## File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java ##
@@ -1582,6 +1582,13 @@ private CoreDescriptor reloadCoreDescriptor(CoreDescriptor oldDesc) {
   public void reload(String name) {
     reload(name, null);
   }
+  public void reload(String name, UUID coreId, boolean async) {
+    if(async) {
+      runAsync(() -> reload(name, coreId));
+    } else {
+      reload(name, coreId);
+    }
+  }

Review comment: It's a generic method. I thought I would use it. But there are bugs in our core reloading. So, if I use async reload, some tests fail.
[GitHub] [lucene-solr] noblepaul commented on a change in pull request #1815: SOLR-14151: Bug fixes
noblepaul commented on a change in pull request #1815:
URL: https://github.com/apache/lucene-solr/pull/1815#discussion_r483561919

## File path: solr/core/src/java/org/apache/solr/core/ConfigSetService.java ##
@@ -81,8 +81,7 @@ public final ConfigSet loadConfigSet(CoreDescriptor dcore) {
         ) ? false: true;
     SolrConfig solrConfig = createSolrConfig(dcore, coreLoader, trusted);
-    IndexSchema indexSchema = createIndexSchema(dcore, solrConfig, false);
-    return new ConfigSet(configSetName(dcore), solrConfig, force -> indexSchema, properties, trusted);
+    return new ConfigSet(configSetName(dcore), solrConfig, force -> createIndexSchema(dcore, solrConfig, force), properties, trusted);

Review comment: Yes.
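As I read the diff, the old code built the schema once and handed out the captured instance regardless of `force`, while the new code calls `createIndexSchema` each time the function is applied and passes `force` through. The toy example below only illustrates that difference in lambda capture; `SchemaSupplierDemo` and `loadSchema` are hypothetical stand-ins, and strings take the place of `IndexSchema` objects.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Toy illustration of eager capture versus deferred creation inside a lambda. */
final class SchemaSupplierDemo {
  static final AtomicInteger loads = new AtomicInteger();

  /** Hypothetical stand-in for createIndexSchema: pretends to load a schema. */
  static String loadSchema(boolean force) {
    return "schema-v" + loads.incrementAndGet() + (force ? "-forced" : "");
  }

  /** Old shape: the schema is loaded once up front and the lambda ignores force. */
  static Function<Boolean, String> eager() {
    String schema = loadSchema(false);
    return force -> schema;
  }

  /** New shape: every call goes back to the loader and honours force. */
  static Function<Boolean, String> deferred() {
    return force -> loadSchema(force);
  }
}
```

Whether the real `createIndexSchema` reloads on every call or caches internally when `force` is false is not visible in this message, so the per-call cost of the deferred form is left open here.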
[jira] [Comment Edited] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190681#comment-17190681 ]

Markus Kalkbrenner edited comment on SOLR-13973 at 9/4/20, 11:18 AM:
-

{quote}Perhaps even a simple Tika integration in SolrJ would make sense, making it super simple to do the extraction on client side, which is probably what most users should consider anyway. {quote}

As maintainer of Solarium, the major PHP client for Solr, and of the Solr-Drupal-Integration, I know that there are users and Solr service providers who rely on the ExtractionHandler and the out-of-the-box experience as [~AndrewGr] described.

Even if I understand your motivation as a developer, moving the workflow to the client side will put a significant workload on other developers, even if you add Tika support to SolrJ. The number of people who use Solr in combination with a different programming language may well be higher than the number of Java projects that use SolrJ.

There are more than 58,000 active Drupal installations using Solr as search backend today:
[https://www.drupal.org/project/usage/search_api_solr]
https://www.drupal.org/project/usage/apachesolr

GitHub lists 895 repositories that directly depend on the PHP solarium library:
[https://github.com/solariumphp/solarium/network/dependents]
These include packages from other PHP frameworks like Symfony, Laravel, TYPO3, WordPress, ...

Nearly 200,000 composer-based build processes of PHP projects pulled the solarium library within the last 30 days:
[https://packagist.org/packages/solarium/solarium/stats#major/all]

For sure, just a few of all these installations will use Tika indirectly via the extraction handler. But it won't be an easy task to add a standalone Tika server to their stack. I know a lot of hosters who don't provide it yet to their customers.

I won't say that you shouldn't deprecate the embedded Tika at all. But take careful steps and be aware of the fact that the community of Solr users might be much greater than you think due to the out-of-the-box solutions that exist, especially in the PHP world.

BTW SOLR-14768 has been detected automatically by the automated integration tests of the solarium library and also by the automated integration tests of the Search API Solr Drupal module!

> Deprecate Tika
> --
>
> Key: SOLR-13973
> URL: https://issues.apache.org/jira/browse/SOLR-13973
> Project: Solr
> Issue Type: Improvement
> Reporter: Ishan Chattopadhyaya
> Assignee: Ishan Chattopadhyaya
> Priority: Blocker
> Fix For: 8.7
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on
[jira] [Commented] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190681#comment-17190681 ]

Markus Kalkbrenner commented on SOLR-13973:
---

{quote}Perhaps even a simple Tika integration in SolrJ would make sense, making it super simple to do the extraction on client side, which is probably what most users should consider anyway. {quote}

As maintainer of Solarium, the major PHP client for Solr, and of the Solr-Drupal-Integration, I know that there are users and Solr service providers who rely on the ExtractionHandler and the out-of-the-box experience as [~AndrewGr] described.

Even if I understand your motivation as a developer, moving the workflow to the client side will put a significant workload on other developers, even if you add Tika support to SolrJ. The number of people who use Solr in combination with a different programming language may well be higher than the number of Java projects that use SolrJ.

There are more than 40,000 active Drupal installations using Solr as search backend today:
[https://www.drupal.org/project/usage/search_api_solr]

GitHub lists 895 repositories that directly depend on the PHP solarium library:
[https://github.com/solariumphp/solarium/network/dependents]
These include packages from other PHP frameworks like Symfony, Laravel, TYPO3, WordPress, ...

Nearly 200,000 composer-based build processes of PHP projects pulled the solarium library within the last 30 days:
[https://packagist.org/packages/solarium/solarium/stats#major/all]

For sure, just a few of all these installations will use Tika indirectly via the extraction handler. But it won't be an easy task to add a standalone Tika server to their stack. I know a lot of hosters who don't provide it yet to their customers.

I won't say that you shouldn't deprecate the embedded Tika at all. But take careful steps and be aware of the fact that the community of Solr users might be much greater than you think due to the out-of-the-box solutions that exist, especially in the PHP world.

BTW SOLR-14768 has been detected automatically by the automated integration tests of the solarium library and also by the automated integration tests of the Search API Solr Drupal module!

> Deprecate Tika
> --
>
> Key: SOLR-13973
> URL: https://issues.apache.org/jira/browse/SOLR-13973
> Project: Solr
> Issue Type: Improvement
> Reporter: Ishan Chattopadhyaya
> Assignee: Ishan Chattopadhyaya
> Priority: Blocker
> Fix For: 8.7
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on search and scalability.
> Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us
> down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these,
> it should be possible to bring them into third-party packages and install
> them via the package manager.
> The plan is just to throw warnings in logs and add deprecation notes in the
> reference guide for now. Removal can be done in 9.0.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jimczi commented on a change in pull request #1725: LUCENE-9449 Skip docs with _doc sort and "after"
jimczi commented on a change in pull request #1725: URL: https://github.com/apache/lucene-solr/pull/1725#discussion_r483531827 ## File path: lucene/core/src/test/org/apache/lucene/search/TestFieldSortOptimizationSkipping.java ## @@ -290,5 +299,114 @@ public void testFloatSortOptimization() throws IOException { dir.close(); } + public void testDocSortOptimizationWithAfter() throws IOException { +final Directory dir = newDirectory(); +final IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig()); +final int numDocs = atLeast(1500); +for (int i = 0; i < numDocs; ++i) { + final Document doc = new Document(); + writer.addDocument(doc); + if ((i > 0) && (i % 500 == 0)) { +writer.commit(); + } +} +final IndexReader reader = DirectoryReader.open(writer); +IndexSearcher searcher = new IndexSearcher(reader); +final int numHits = 3; +final int totalHitsThreshold = 3; +final int searchAfter = 1400; + +// sort by _doc with search after should trigger optimization +{ + final Sort sort = new Sort(FIELD_DOC); + FieldDoc after = new FieldDoc(searchAfter, Float.NaN, new Integer[]{searchAfter}); + final TopFieldCollector collector = TopFieldCollector.create(sort, numHits, after, totalHitsThreshold); + searcher.search(new MatchAllDocsQuery(), collector); + TopDocs topDocs = collector.topDocs(); + assertEquals(topDocs.scoreDocs.length, numHits); + for (int i = 0; i < numHits; i++) { +int expectedDocID = searchAfter + 1 + i; +assertEquals(expectedDocID, topDocs.scoreDocs[i].doc); + } + assertTrue(collector.isEarlyTerminated()); + // check that very few hits were collected, and most hits before searchAfter were skipped + assertTrue(topDocs.totalHits.value < (numDocs - searchAfter)); +} + +// sort by _doc + _score with search after should trigger optimization +{ + final Sort sort = new Sort(FIELD_DOC, FIELD_SCORE); + FieldDoc after = new FieldDoc(searchAfter, Float.NaN, new Object[]{searchAfter, 1.0f}); + final TopFieldCollector collector = TopFieldCollector.create(sort, 
numHits, after, totalHitsThreshold); + searcher.search(new MatchAllDocsQuery(), collector); + TopDocs topDocs = collector.topDocs(); + assertEquals(topDocs.scoreDocs.length, numHits); + for (int i = 0; i < numHits; i++) { +int expectedDocID = searchAfter + 1 + i; +assertEquals(expectedDocID, topDocs.scoreDocs[i].doc); + } + assertTrue(collector.isEarlyTerminated()); + // assert that very few hits were collected, and most hits before searchAfter were skipped + assertTrue(topDocs.totalHits.value < (numDocs - searchAfter)); +} + +// sort by _doc desc should not trigger optimization +{ + final Sort sort = new Sort(new SortField(null, SortField.Type.DOC, true)); + FieldDoc after = new FieldDoc(searchAfter, Float.NaN, new Integer[]{searchAfter}); + final TopFieldCollector collector = TopFieldCollector.create(sort, numHits, after, totalHitsThreshold); + searcher.search(new MatchAllDocsQuery(), collector); + TopDocs topDocs = collector.topDocs(); + for (int i = 0; i < numHits; i++) { +int expectedDocID = searchAfter - 1 - i; +assertEquals(expectedDocID, topDocs.scoreDocs[i].doc); + } + assertEquals(topDocs.scoreDocs.length, numHits); + // assert that many hits were collected including all hits before searchAfter + assertTrue(topDocs.totalHits.value > searchAfter); + +} + +writer.close(); +reader.close(); +dir.close(); + } + + + public void testDocSortOptimization() throws IOException { +final Directory dir = newDirectory(); +final IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig()); +final int numDocs = atLeast(1500); Review comment: why do you need that many documents ? `100` should be enough, no ? 
[GitHub] [lucene-solr] mayya-sharipova commented on pull request #1725: LUCENE-9449 Skip docs with _doc sort and "after"
mayya-sharipova commented on pull request #1725: URL: https://github.com/apache/lucene-solr/pull/1725#issuecomment-687065707 @jimczi Thanks for the review so far, I am wondering if you have any further comments?
[GitHub] [lucene-solr] epugh commented on pull request #1827: SOLR-14792: Remove VelocityResponseWriter
epugh commented on pull request #1827: URL: https://github.com/apache/lucene-solr/pull/1827#issuecomment-687060583 @erikhatcher BTW, the .adoc format renders nicely in Github if you were to pull the ref guide docs over to https://github.com/erikhatcher/solritas. We could also update the link in the solr.cool entry to point directly to them, instead of the general github README page ;-)
[jira] [Commented] (LUCENE-9500) Did we hit a DEFLATE bug?
[ https://issues.apache.org/jira/browse/LUCENE-9500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190659#comment-17190659 ]

Dawid Weiss commented on LUCENE-9500:
-

bq. I missed to run precommit after cherry-picking.

No, not you, Uwe! :) [yellow card emoji]

> Did we hit a DEFLATE bug?
> -
>
> Key: LUCENE-9500
> URL: https://issues.apache.org/jira/browse/LUCENE-9500
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 8.x, master (9.0), 8.7
> Reporter: Adrien Grand
> Assignee: Uwe Schindler
> Priority: Critical
> Labels: Java13, Java14, Java15, java11, jdk11, jdk13, jdk14, jdk15
> Fix For: 8.x, master (9.0), 8.7
>
> Attachments: PresetDictTest.java, test_data.txt
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> I've been digging
> [https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-NightlyTests-master/23/]
> all day and managed to isolate a simple reproduction that shows the problem.
> I've been staring at it all day and can't find what we are doing wrong,
> which makes me wonder whether we're calling DEFLATE the wrong way or whether
> we hit a DEFLATE bug. I've looked at it so much that I may be missing the
> most obvious stuff.
[jira] [Commented] (LUCENE-9500) Did we hit a DEFLATE bug?
[ https://issues.apache.org/jira/browse/LUCENE-9500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190658#comment-17190658 ]

Uwe Schindler commented on LUCENE-9500:
---

Ah, ECJ was complaining :-) Thanks for fixing. I missed to run precommit after cherry-picking.
[jira] [Commented] (LUCENE-9500) Did we hit a DEFLATE bug?
[ https://issues.apache.org/jira/browse/LUCENE-9500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190653#comment-17190653 ]

Uwe Schindler commented on LUCENE-9500:
---

Thanks Adrien! Was this causing javadocs build failure?
[jira] [Commented] (LUCENE-9500) Did we hit a DEFLATE bug?
[ https://issues.apache.org/jira/browse/LUCENE-9500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190652#comment-17190652 ]

ASF subversion and git services commented on LUCENE-9500:
-

Commit d7299890c75bfe403f14390a0dfb70e2689fdf3c in lucene-solr's branch refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d729989 ]

LUCENE-9500: There is no setDictionary(ByteBuffer) in JDK8.
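For context on the commit message: `Deflater.setDictionary(ByteBuffer)` was added in Java 11, whereas the `byte[]` overloads are available on JDK 8 as well. The round trip below is a small self-contained sketch of preset-dictionary compression using only the `byte[]` overloads; `PresetDictionaryDemo` is a hypothetical class and is not taken from the Lucene patch or from the attached PresetDictTest.java.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Round trip with a preset dictionary using only APIs that exist on JDK 8. */
final class PresetDictionaryDemo {

  static byte[] compress(byte[] dict, byte[] data) {
    Deflater deflater = new Deflater();
    try {
      // byte[] overload: works on JDK 8; the ByteBuffer overload needs Java 11+.
      deflater.setDictionary(dict, 0, dict.length);
      deflater.setInput(data);
      deflater.finish();
      byte[] buf = new byte[data.length + 64];
      int n = 0;
      while (!deflater.finished()) {
        if (n == buf.length) {
          buf = Arrays.copyOf(buf, buf.length * 2); // grow if the estimate was too small
        }
        n += deflater.deflate(buf, n, buf.length - n);
      }
      return Arrays.copyOf(buf, n);
    } finally {
      deflater.end();
    }
  }

  static byte[] decompress(byte[] dict, byte[] compressed, int originalLength) {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      byte[] out = new byte[originalLength];
      int n = 0;
      while (n < out.length && !inflater.finished()) {
        int read = inflater.inflate(out, n, out.length - n);
        if (read == 0) {
          if (inflater.needsDictionary()) {
            // The zlib header signals that a preset dictionary is required.
            inflater.setDictionary(dict, 0, dict.length);
          } else if (inflater.needsInput()) {
            throw new IllegalArgumentException("truncated input");
          }
        }
        n += read;
      }
      return Arrays.copyOf(out, n);
    } catch (DataFormatException e) {
      throw new IllegalArgumentException("corrupt input", e);
    } finally {
      inflater.end();
    }
  }
}
```

Code that has to compile against both JDK 8 and newer releases can stay on the `byte[]` overloads, at the cost of copying when the dictionary originally lives in a `ByteBuffer`.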
[GitHub] [lucene-solr] dweiss commented on pull request #1830: LUCENE-9506: Gradle: split validateSourcePatterns into per-project an…
dweiss commented on pull request #1830: URL: https://github.com/apache/lucene-solr/pull/1830#issuecomment-687038375 @uschindler you know these checks better - to me I refactored this stuff in the same way as before (and they're running much faster now since they can run in parallel). If you have a spare minute to eyeball though it'd be good.
[GitHub] [lucene-solr] dweiss opened a new pull request #1830: LUCENE-9506: Gradle: split validateSourcePatterns into per-project an…
dweiss opened a new pull request #1830: URL: https://github.com/apache/lucene-solr/pull/1830
[jira] [Created] (LUCENE-9506) Gradle: split validateSourcePatterns into per-project and root-specific tasks (allow parallelism)
Dawid Weiss created LUCENE-9506:
---

Summary: Gradle: split validateSourcePatterns into per-project and root-specific tasks (allow parallelism)
Key: LUCENE-9506
URL: https://issues.apache.org/jira/browse/LUCENE-9506
Project: Lucene - Core
Issue Type: Task
Reporter: Dawid Weiss
Assignee: Dawid Weiss
[jira] [Commented] (LUCENE-9505) Gradle tasks with outputs.upToDateWhen {true} are hard to re-run in separation
[ https://issues.apache.org/jira/browse/LUCENE-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190641#comment-17190641 ]

ASF subversion and git services commented on LUCENE-9505:
-

Commit d31a42763be26fcaee886ea2249a4d8d4bc0a119 in lucene-solr's branch refs/heads/master from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d31a427 ]

LUCENE-9505: add dummy outputs. (#1829)

> Gradle tasks with outputs.upToDateWhen {true} are hard to re-run in separation
> --
>
> Key: LUCENE-9505
> URL: https://issues.apache.org/jira/browse/LUCENE-9505
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
> Fix For: master (9.0)
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> We have several tasks that only have inputs and no outputs. For incremental
> builds, this means that they are only re-run if:
> * the inputs change,
> * --rerun-tasks is given on the command line.
> Gradle has a built-in rule for "cleaning" the outputs of a task - a
> "clean[TaskName]" rule, so in theory you could clean the outputs of a single
> task and re-run the entire build with only that task being re-run. This would
> sometimes be convenient.
> We could add a dummy output to these tasks instead of upToDateWhen (for
> example, touch an empty file at the end of the task's execution). Then
> cleanXXX should work for them (and so would incremental builds).
[jira] [Resolved] (LUCENE-9505) Gradle tasks with outputs.upToDateWhen {true} are hard to re-run in separation
[ https://issues.apache.org/jira/browse/LUCENE-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved LUCENE-9505.
-
Fix Version/s: master (9.0)
Resolution: Fixed
[GitHub] [lucene-solr] dweiss merged pull request #1829: LUCENE-9505: add dummy outputs to tasks with no outputs
dweiss merged pull request #1829:
URL: https://github.com/apache/lucene-solr/pull/1829

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [lucene-solr] dweiss opened a new pull request #1829: LUCENE-9505: add dummy outputs to tasks with no outputs
dweiss opened a new pull request #1829:
URL: https://github.com/apache/lucene-solr/pull/1829
[jira] [Created] (LUCENE-9505) Gradle tasks with outputs.upToDateWhen {true} are hard to re-run in separation
Dawid Weiss created LUCENE-9505:

    Summary: Gradle tasks with outputs.upToDateWhen {true} are hard to re-run in separation
    Key: LUCENE-9505
    URL: https://issues.apache.org/jira/browse/LUCENE-9505
    Project: Lucene - Core
    Issue Type: Task
    Reporter: Dawid Weiss
    Assignee: Dawid Weiss

We have several tasks that only have inputs and no outputs. For incremental builds, this means that they are only re-run if:
* the inputs change,
* --rerun-tasks is given on the command line.

Gradle has a built-in rule for "cleaning" the outputs of a task - a "clean[TaskName]" rule, so in theory you could clean the outputs of a single task and re-run the entire build with only that task being re-run. This would sometimes be convenient.

We could add a dummy output to these tasks instead of upToDateWhen (for example, touch an empty file at the end of the task's execution). Then cleanXXX should work for them (and so would incremental builds).
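The dummy-output idea described in this issue can be modelled outside Gradle. The following is only a toy sketch in plain Java, not Gradle API and not the change made in the pull request: the class MarkerFileTask and all of its methods are hypothetical names, and the up-to-date check here compares file timestamps, which is an assumption of this sketch rather than how Gradle tracks inputs. It shows why an empty marker file gives a "clean" step something to delete, so that a single task can be forced to run again.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Toy model (not Gradle API) of a task whose only output is an empty marker file. */
final class MarkerFileTask {
    private final Path input;
    private final Path marker;
    private final Runnable action;

    MarkerFileTask(Path input, Path marker, Runnable action) {
        this.input = input;
        this.marker = marker;
        this.action = action;
    }

    /** Hypothetical demo helper: creates an input file in a fresh temp directory. */
    static MarkerFileTask inTempDir(Runnable action) {
        try {
            Path dir = Files.createTempDirectory("marker-demo");
            Path input = Files.write(dir.resolve("input.txt"), new byte[] {1});
            return new MarkerFileTask(input, dir.resolve("task.marker"), action);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Up to date only if the marker exists and is not older than the input. */
    boolean isUpToDate() {
        try {
            return Files.exists(marker)
                && Files.getLastModifiedTime(marker)
                       .compareTo(Files.getLastModifiedTime(input)) >= 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the action if needed, then touches the marker. Returns true if the action ran. */
    boolean execute() {
        if (isUpToDate()) {
            return false;
        }
        action.run();
        try {
            // "Touch" the dummy output: an empty file written at the end of execution.
            Files.write(marker, new byte[0]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    /** Plays the role of clean[TaskName]: deleting the output forces the next run. */
    void clean() {
        try {
            Files.deleteIfExists(marker);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MarkerFileTask task = inTempDir(() -> System.out.println("running checks"));
        System.out.println("ran: " + task.execute()); // marker missing, so the action runs
        System.out.println("ran: " + task.execute()); // marker present, so the run is skipped
        task.clean();
        System.out.println("ran: " + task.execute()); // marker deleted, so the action runs again
    }
}
```

With upToDateWhen {true} and no declared outputs there is nothing for a clean step to remove, which is the gap the marker file fills in this model.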
[jira] [Commented] (LUCENE-9418) Ordered intervals can give inaccurate hits on interleaved terms
[ https://issues.apache.org/jira/browse/LUCENE-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190630#comment-17190630 ]

Alan Woodward commented on LUCENE-9418:

Hi [~Brain2000], I think you have a different problem there; this issue concerns Interval queries, whereas you appear to have a problem with a sorting collector. Can you open a new issue, with a reproducible test failure if possible?

> Ordered intervals can give inaccurate hits on interleaved terms
>
> Key: LUCENE-9418
> URL: https://issues.apache.org/jira/browse/LUCENE-9418
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Fix For: 8.6
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Given the text 'A B A C', an ordered interval over 'A B C' will return the
> inaccurate interval [2, 3], due to the way minimization is handled after
> matches are found.
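To make the 'A B A C' example from the issue description concrete, here is a brute-force sketch in plain Java. It is not Lucene's interval implementation and does not use the Lucene API; the class OrderedIntervalSketch and its methods are hypothetical, and positions are assumed to be 0-based. It enumerates ordered matches and keeps only the minimal ones, which for this input yields [0, 3] and never [2, 3], because no 'B' follows the 'A' at position 2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Brute-force reference for minimal ordered intervals (illustration only). */
final class OrderedIntervalSketch {

    /**
     * Returns the minimal [start, end] position pairs in which the given terms
     * occur in order. The terms list must not be empty.
     */
    static List<int[]> minimalOrdered(List<String> tokens, List<String> terms) {
        List<int[]> result = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            if (!tokens.get(start).equals(terms.get(0))) {
                continue;
            }
            // Greedily take the earliest occurrence of each following term.
            int pos = start;
            for (int t = 1; t < terms.size() && pos >= 0; t++) {
                pos = indexOfAfter(tokens, terms.get(t), pos);
            }
            if (pos < 0) {
                continue; // some term never occurs after this start
            }
            // A later start with the same end lies inside the previous
            // candidate, so that candidate is not minimal.
            while (!result.isEmpty() && result.get(result.size() - 1)[1] >= pos) {
                result.remove(result.size() - 1);
            }
            result.add(new int[] {start, pos});
        }
        return result;
    }

    /** Index of the first occurrence of term strictly after pos, or -1. */
    private static int indexOfAfter(List<String> tokens, String term, int pos) {
        for (int i = pos + 1; i < tokens.size(); i++) {
            if (tokens.get(i).equals(term)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> text = Arrays.asList("A", "B", "A", "C");
        for (int[] interval : minimalOrdered(text, Arrays.asList("A", "B", "C"))) {
            System.out.println(Arrays.toString(interval)); // prints [0, 3]
        }
    }
}
```

A reference like this can serve as an oracle in a randomized test, under the assumption that it is compared against positions reported by the real query.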
[jira] [Commented] (LUCENE-9497) Integerate Error Prone ( Static Analysis Tool ) during compilation
[ https://issues.apache.org/jira/browse/LUCENE-9497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190628#comment-17190628 ]

Dawid Weiss commented on LUCENE-9497:

The compilation problem was caused by the fact that we exclude error prone from published/runtime dependencies (it is a transitive dependency of guava), but at the same time, if it's missing, error prone's annotation processor can complain about missing annotation types. Not to mention that each and every one of these dependencies uses a different version of error_prone_annotations... this is confusing like hell, sigh.

I think I fixed most of these issues here:
https://github.com/apache/lucene-solr/pull/1828

I also looked at how individual pieces of code can be marked as valid - error prone uses the regular SuppressWarnings annotation with custom names for each category. This is fine, I think.

> Integerate Error Prone ( Static Analysis Tool ) during compilation
>
> Key: LUCENE-9497
> URL: https://issues.apache.org/jira/browse/LUCENE-9497
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Varun Thacker
> Priority: Minor
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> Integrate [https://github.com/google/error-prone] during compilation of our
> source code to catch mistakes
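As an illustration of the suppression mechanism mentioned in the comment above, the sketch below marks one method as intentionally exempt from a check by passing the check's name to the standard SuppressWarnings annotation. ReferenceEquality is a real Error Prone check name, but whether it would actually flag this particular comparison is an assumption; the SentinelToken class is hypothetical and is not taken from the Lucene or Solr code base. Plain javac ignores warning names it does not recognize, so the code compiles with or without Error Prone on the classpath.

```java
/** Hypothetical value class with a unique end-of-input sentinel. */
final class SentinelToken {
    static final SentinelToken EOF = new SentinelToken("<eof>");

    private final String text;

    SentinelToken(String text) {
        this.text = text;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof SentinelToken && ((SentinelToken) other).text.equals(text);
    }

    @Override
    public int hashCode() {
        return text.hashCode();
    }

    /**
     * Identity comparison is intentional here: only the EOF instance itself
     * counts, not any token that merely has equal text. The annotation names
     * the Error Prone check to silence for this method only.
     */
    @SuppressWarnings("ReferenceEquality")
    static boolean isEof(SentinelToken token) {
        return token == EOF;
    }

    public static void main(String[] args) {
        System.out.println(isEof(EOF));                        // prints true
        System.out.println(isEof(new SentinelToken("<eof>"))); // prints false
    }
}
```

Keeping the annotation on the smallest possible scope (a method rather than a class) leaves the check active for the rest of the file.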
[jira] [Assigned] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ishan Chattopadhyaya reassigned SOLR-13973:
    Assignee: Ishan Chattopadhyaya

> Deprecate Tika
>
> Key: SOLR-13973
> URL: https://issues.apache.org/jira/browse/SOLR-13973
> Project: Solr
> Issue Type: Improvement
> Reporter: Ishan Chattopadhyaya
> Assignee: Ishan Chattopadhyaya
> Priority: Blocker
> Fix For: 8.7
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on search and scalability.
> Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us
> down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these,
> it should be possible to bring them into third-party packages and install
> them via the package manager.
> The plan is just to throw warnings in logs and add deprecation notes to the
> reference guide for now. Removal can be done in 9.0.
[jira] [Commented] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190623#comment-17190623 ]

Ishan Chattopadhyaya commented on SOLR-13973:

[~mkalkbrenner], I think we should fix this (SOLR-14768) in 8.7, while simultaneously deprecating it.

> Deprecate Tika
>
> Key: SOLR-13973
> URL: https://issues.apache.org/jira/browse/SOLR-13973
> Project: Solr
> Issue Type: Improvement
> Reporter: Ishan Chattopadhyaya
> Priority: Blocker
> Fix For: 8.7
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on search and scalability.
> Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us
> down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these,
> it should be possible to bring them into third-party packages and install
> them via the package manager.
> The plan is just to throw warnings in logs and add deprecation notes to the
> reference guide for now. Removal can be done in 9.0.
[jira] [Commented] (SOLR-13973) Deprecate Tika
[ https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190622#comment-17190622 ]

Markus Kalkbrenner commented on SOLR-13973:

In fact, you don't need to deprecate the feature in 8.7 anymore, as you already broke it in 8.6 ;) See SOLR-14768.

> Deprecate Tika
>
> Key: SOLR-13973
> URL: https://issues.apache.org/jira/browse/SOLR-13973
> Project: Solr
> Issue Type: Improvement
> Reporter: Ishan Chattopadhyaya
> Priority: Blocker
> Fix For: 8.7
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on search and scalability.
> Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us
> down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these,
> it should be possible to bring them into third-party packages and install
> them via the package manager.
> The plan is just to throw warnings in logs and add deprecation notes to the
> reference guide for now. Removal can be done in 9.0.
[GitHub] [lucene-solr] dweiss commented on pull request #1816: LUCENE-9497: Integerate Error Prone ( Static Analysis Tool ) during compilation
dweiss commented on pull request #1816:
URL: https://github.com/apache/lucene-solr/pull/1816#issuecomment-686969828

I made a few changes to consolidate the version used across compilations. Still don't know what the original problem was, let's see if this passes though.
https://github.com/apache/lucene-solr/pull/1828