[jira] [Updated] (LUCENE-9328) Sorting by DocValues while grouping is slower than old good FieldCache

2020-09-04 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-9328:
-
Issue Type: Improvement  (was: Bug)
  Priority: Major  (was: Minor)

> Sorting by DocValues while grouping is slower than old good FieldCache
> --
>
> Key: LUCENE-9328
> URL: https://issues.apache.org/jira/browse/LUCENE-9328
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/grouping
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Major
> Attachments: LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, 
> LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, 
> LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, 
> LUCENE-9328.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> That's why 
> https://issues.apache.org/jira/browse/LUCENE-7701?focusedCommentId=17084365&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17084365



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9493) Remove obsolete dev-tools/{idea,netbeans,maven} folders

2020-09-04 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190983#comment-17190983
 ] 

David Smiley commented on LUCENE-9493:
--

There are a couple of nice things about the IntelliJ project that come to mind:
* a code style!  -- dev-tools/idea/.idea/codeStyleSettings.xml
* an ASF copyright profile. -- 
dev-tools/idea/.idea/copyright/Apache_Software_Foundation.xml

I think it would be helpful to provide the XML files for both of these in a 
simple "intellij" folder there.  These can be imported into IntelliJ for an 
existing project manually.

CC [~sarowe] I believe you created the IntelliJ config long ago, or were at 
least actively involved

> Remove obsolete dev-tools/{idea,netbeans,maven} folders
> ---
>
> Key: LUCENE-9493
> URL: https://issues.apache.org/jira/browse/LUCENE-9493
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: master (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I don't think they're used or applicable anymore. Thoughts?






[jira] [Commented] (LUCENE-9461) Query hit highlighting components on top of matches API

2020-09-04 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190981#comment-17190981
 ] 

David Smiley commented on LUCENE-9461:
--

Maybe not as a sub-task, but would it make sense to modify the 
UnifiedHighlighter to use some of these components, thereby reducing 
redundancy?  As I say this, I look at some of these new components and maybe 
not (yet)... but maybe I'll see it better once you get to the example task.

> Query hit highlighting components on top of matches API
> ---
>
> Key: LUCENE-9461
> URL: https://issues.apache.org/jira/browse/LUCENE-9461
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: master (9.0)
>
>
> Highlighters. Eventually, you'll have to face them. 
> When a Lucene Query is run over an index, it implies a list of documents that 
> "matched it" - literally a boolean indication of whether the document should 
> be included in the search result or not. In practice, many applications need 
> to convey to users not just the fact that a document matched the query but 
> also some sort of intuitive explanation of *why* this particular query 
> matched it. While in many cases the relationship is trivial (term 
> containment), in case of complex queries it may not be trivial at all (think 
> of a really short prefix query, a fuzzy term query or even a Boolean 
> disjunction with a high number of possibilities).
> Historically, search engines used to "highlight" the source area of a 
> document that caused the "hit". If a document was too long, it was truncated 
> and only the area around the hit (or hits) was displayed (a so-called 
> "snippet").
> In my subjective opinion, in the Lucene API highlighters have played a 
> secondary role to queries and search. And once you're trying to build 
> something higher-level, highlighters are a crucial and necessary element of 
> the entire system. 
> My experience (and users' feedback) from an implementation of a document 
> retrieval system where highlighting was involved was that it just didn't work 
> as expected. Here are the requirements of that system:
> * the query parser uses default field expansion into multiple fields (there 
> is no single "sink" field),
> * the highlights should match *exactly* what caused the hit; a search for 
> 'title:foo' must not highlight foo in any other field,
> * the set of fields to be highlighted isn't really fixed - there are some 
> fields that should always be displayed - title, summary - and others that 
> should not be displayed unless they're part of the query (in which case the 
> highlight is important and should be shown to the user).
> * highlights should be accurate for all sorts of queries: fuzzy, phrase, 
> prefix, Boolean, spans, etc.,
> * there can be more than one query at one time and they should highlight the 
> same content (with different colors).
> Many highlighters are available in Lucene (vector highlighter, postings 
> highlighter, unified highlighter) but none of them quite fit the bill above. 
> Believe me - we have tried (hard). We ended up using unified highlighter but 
> with subclassing, customizations and all sorts of complex, low-level quirks. 
> My gut feeling at that point was that it should be the Query that somehow 
> *exposes* the information about how a given field content matched. Then I 
> looked at matches API and built a quick prototype retrieving "match regions" 
> on top of that. It works like magic. Here are the key insights:
> * matches API returns exactly what a highlighter needs: for a given query it 
> iterates over fields and positions (including offsets, if they are available) 
> that caused a document to be included in the search result,
> * when matches API cannot provide offsets, it provides elements from which 
> offsets can be computed: positions by re-analyzing the field's value, for 
> example.
> * in extreme cases it may happen the matches API doesn't provide anything 
> useful (a field only indexed, with no stored field value, no positions, no 
> offsets) but I assume it is up to the application layer to know how to deal 
> with this then (or not deal with it at all and throw an exception).
> * matches API delegates the work of providing proper match ranges to the 
> query itself (actually, to the weight a query produces), it doesn't need to 
> know anything about different implementations and their specifics.
> The absolute *key* element is the last one. Once you build a match region 
> retriever, highlighting is merely about organizing match ranges, dealing 
> with potential overlaps, and proper formatting. It becomes a simple, 
> tractable problem separated from the internals of Lucene Queries.
> The initial set of 
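
The "organizing match ranges, dealing with potential overlaps, and proper formatting" step described above can be sketched in plain Java. This is only an illustration under assumptions: the class `MatchRangeSketch`, its methods, and the `<b>` markers are hypothetical names chosen here and are not part of Lucene or of the proposed components. Ranges are assumed to be valid `[start, end)` character offsets into the text, as a match region retriever might supply them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MatchRangeSketch {

    /** Merges overlapping or touching [start, end) ranges into a sorted, disjoint list. */
    public static List<int[]> merge(List<int[]> ranges) {
        List<int[]> sorted = new ArrayList<>(ranges);
        sorted.sort((a, b) -> Integer.compare(a[0], b[0]));
        List<int[]> out = new ArrayList<>();
        for (int[] r : sorted) {
            int[] last = out.isEmpty() ? null : out.get(out.size() - 1);
            if (last != null && r[0] <= last[1]) {
                // Overlap with the previous range: extend it instead of adding a new one.
                last[1] = Math.max(last[1], r[1]);
            } else {
                out.add(new int[] {r[0], r[1]});
            }
        }
        return out;
    }

    /** Wraps every merged range of the text in hypothetical <b>...</b> markers. */
    public static String highlight(String text, List<int[]> ranges) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (int[] r : merge(ranges)) {
            sb.append(text, pos, r[0]).append("<b>").append(text, r[0], r[1]).append("</b>");
            pos = r[1];
        }
        return sb.append(text, pos, text.length()).toString();
    }

    public static void main(String[] args) {
        // Two overlapping match ranges, given out of order.
        List<int[]> ranges = Arrays.asList(new int[] {7, 15}, new int[] {4, 9});
        System.out.println(highlight("the quick brown fox", ranges));
        // prints: the <b>quick brown</b> fox
    }
}
```

Nothing in this step needs to know which kind of query produced the ranges, which is the separation the description argues for.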

[jira] [Commented] (LUCENE-9498) Move matchhighlighter to a separate subproject

2020-09-04 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190979#comment-17190979
 ] 

David Smiley commented on LUCENE-9498:
--

I think it would be weird to separate one highlighter from the "highlighter" 
module simply because of these dependencies.  

The "memory" (MemoryIndex) dependency is fantastic for re-analysis of stored 
text.  It's so useful and so small... I kinda wonder if it'd be better off in 
lucene-core.  Even spatial is in lucene-core these days!

The "queries" dependency is only there because the other highlighters detect 
certain Query subclasses there to know how to highlight them.  The Matches API 
makes that approach obsolete.  The new "matches" highlighter/framework 
exclusively uses that new API, and the UnifiedHighlighter is dual-mode: it can 
use it or not, as one prefers.  There's an issue to make it use this by default 
starting in 9.0.

> Move matchhighlighter to a separate subproject
> --
>
> Key: LUCENE-9498
> URL: https://issues.apache.org/jira/browse/LUCENE-9498
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
>
> This is a trivial thing to do (on master at least). Match highlighter has no 
> other dependencies. It sort of fits in the "highlighter" package but this 
> package depends on {{queries}} and {{memory}} packages. I wonder if we should 
> move it to a separate subproject?






[jira] [Commented] (SOLR-14833) Empty highlight entry on match only for some queries

2020-09-04 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190976#comment-17190976
 ] 

David Smiley commented on SOLR-14833:
-

Can you try {{hl.method=unified}}?  Another debugging aid here is 
{{debug=query}}, which will give some insight into the query representation.

>  Empty highlight entry on match only for some queries 
> --
>
> Key: SOLR-14833
> URL: https://issues.apache.org/jira/browse/SOLR-14833
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: 8.6.2
>Reporter: Hossameldin Khalifa
>Priority: Critical
> Attachments: Screen Shot 1442-01-17 at 3.55.53 AM.png, Screen Shot 
> 1442-01-17 at 3.56.05 AM.png, Screen Shot 1442-01-17 at 3.56.16 AM.png
>
>
> Solr Input:
> ```json
> {
>   "query": "text:(\"ما جرى بين الصحابة\" الماتريدي)",
>   "fields": "book_id,author_id,cat_id,meta,id,text",
>   "params": {
>     "rows": 20,
>     "start": 0,
>     "hl": "true",
>     "hl.fl": "text_highlighting,text_highlighting_copy",
>     "hl.fragmenter": "regex",
>     "hl.q": "text_highlighting:(\"ما جرى بين الصحابة\" الماتريدي)",
>     "f.text_highlighting.hl.fragsize": 110,
>     "f.text_highlighting_copy.hl.fragsize": 0
>   }
> }
> ```
> For exactly this text `"ما جرى بين الصحابة" الماتريدي` and some other queries, 
> the highlights have some empty matches. I checked whether the indexes don't 
> have text stored in them, but they look like all the other indexes.
> Here is part of the highlighter output:
> ```
> "d3108d2d-1344-458c-8c28-0639f82b274e": {"text_highlighting": 
> [" الصحابة \ufd43: إن من الأمور المهمة التي ينبغي للداعية أن يعرض عنها ولا 
> يخوض فيها ما جرى بين الصحابة \ufd43، وما 
> حصل لبعضهم"], "text_highlighting_copy": ["فاجهد علي جهدك! "، وقال \ufd41 في 
> الدفاع عن عثمان حين سأل هذا الضال: "أما عثمان فكان الله قد عفا عنه وكرهتم أن 
> تعفوا عنه، وأما عليّ فابن عمّ رسول الله ﷺ وختنه"، ثم أخذ يذكر من محاسن علي 
> وعثمان \ufd44 حتى أفحم هذا الضال فذهب خائبا، وقال له ابن عمر \ufd41: "اذهب 
> بهذا الآن معك"، قال العيني \ufd40: أي اقرن هذا العذر بالجواب حتى لا يبقى لك 
> فيما أجبتك به حجة على ما كنت تعتقد" (1) فينبغي للداعية أن يدافع عن الصحابة 
> \ufd43 وعن أئمة الهدى من علماء أهل السنة والجماعة، ولكن بالحكمة والموعظة 
> الحسنة، والجدال بالحسنى.\nرابعا: من أساليب الدعوة: استخدام الشدة مع بعض 
> المدعوين: الأصل في الأساليب في الدعوة إلى الله \ufdff الرفق واللين، ولكن من 
> المدعوين من لا يجدي ولا ينفع فيه ومعه إلا الشدة والقوة؛ ولهذا استخدم عبد الله 
> بن عمر \ufd41 أسلوب الشدة مع الرجل الضال الذي يطعن في علي وعثمان \ufd44، 
> فقال: "أرغم الله بأنفك"، وقال \ufd41: "قاتلنا حتى لم تكن فتنة وكان الدين لله، 
> وأنتم تريدون أن تقاتلوا حتى تكون فتنة ويكون الدين لغير الله"، وهذا فيه قوة في 
> الأسلوب، ولكن لا يفعل ذلك إلا مع الأمن من الوقوع في المفاسد، والله المستعان 
> (2).\nخامسا: أهمية الكف عما جرى بين الصحابة \ufd43: إن من الأمور المهمة التي 
> ينبغي للداعية أن يعرض عنها ولا يخوض فيها ما جرى 
> بين الصحابة \ufd43، وما حصل لبعضهم؛ لأن الكف عن ذلك مذهب 
> أهل الحق والاعتدال (3)؛ ولهذا قال عبد الله بن عمر \ufd44 في هذا الحديث: "أما 
> عثمان فكان الله قد عفا عنه فكرهتم أن تعفوا عنه، وأما عليّ فابن عمّ رسول الله 
> ﷺ"، قال شيخ الإسلام ابن تيمية \ufd40 في مذهب أهل\n_\n(1) عمدة القاري، 
> شرح صحيح البخاري، 16/ 207.\n(2) انظر: الحديث رقم 116، الدرس العاشر.\n(3) 
> انظر: شرح العقيدة الواسطية، لابن تيمية، تأليف محمد خليل الهراس، ص 250."]}, 
> "1f36e221-2683-4bc7-9732-e6a64298f2df": {}}```
> I tried setting `hl.maxAnalyzedChars` to a large integer value and it still 
> did not work. One thing I also know is that when removing `"hl.q": 
> "text_highlighting:(\"ما جرى بين الصحابة\" الماتريدي)",` from the params, it 
> works. However, it then does not highlight the stop words, which is not my 
> desired behaviour.
> Here is the relevant part of my solr schema 
> ```xml version="1.6">  id
>    positionIncrementGap="100">   class="solr.SynonymGraphFilterFactory" 
> tokenizerFactory="solr.StandardTokenizerFactory" synonyms="synonyms.txt" 
> ignoreCase="true" expand="true" />       class="solr.WhitespaceTokenizerFactory"/>  class="solr.WordDelimiterGraphFilterFactory"/>  class="solr.FlattenGraphFilterFactory"/>  class="solr.StopFilterFactory" words="lang/stopwords_ar.txt" 
> ignoreCase="true"/>         
>           class="solr.ArabicStemFilterFactory"/>       class="solr.RemoveDuplicatesTokenFilterFactory"/>      
>    positionIncrementGap="100">   class="solr.SynonymGraphFilterFactory" 
> tokenizerFactory="solr.StandardTokenizerFactory" synonyms="synonyms.txt" 
> ignoreCase="true" expand="true" />       class="solr.WhitespaceTokenizerFactory"/>  class="solr.WordDelimiterGraphFilterFactory"/>  

[jira] [Resolved] (SOLR-14832) Inversion Eglish and numbers characters in Arabic documents

2020-09-04 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-14832.
---
Resolution: Invalid

Please raise questions like this on the users' list; we try to reserve JIRAs 
for known bugs/enhancements rather than usage questions. The JIRA system is not 
a support portal.

See http://lucene.apache.org/solr/community.html#mailing-lists-irc for links 
to both the Lucene and Solr mailing lists.

A _lot_ more people will see your question on that list and may be able to help 
more quickly.

If it's determined that this really is a code issue or enhancement to Lucene or 
Solr and not a configuration/usage problem, we can raise a new JIRA or reopen 
this one.



> Inversion Eglish and numbers characters in Arabic documents
> ---
>
> Key: SOLR-14832
> URL: https://issues.apache.org/jira/browse/SOLR-14832
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 4.1
>Reporter: Vlad
>Priority: Major
>
> Hi Support,
>  
> Please help to resolve an issue. I upload/index several documents in English 
> and Arabic to SOLR; in addition, I use a handler for the Arabic 
> language:
>   
>    
>     
>      words="stopwords.txt" enablePositionIncrements="true" />
>       class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       class="solr.ArabicNormalizationFilterFactory"/>
>     
>     
>  
>   
>   
>     
>      words="stopwords.txt" enablePositionIncrements="true" />
>      ignoreCase="true" expand="true"/>
>       class="solr.RemoveDuplicatesTokenFilterFactory"/>
>        class="solr.ArabicNormalizationFilterFactory"/>
>     
>     
>  
>   
>  
> There are two environments:
>  # Local machine:
>     - SOLR version: 4.2
>     - Windows version: 10
>  
>  # DEV env:
>     - SOLR version: 4.1, as part of the Cloudera suite
>     - Linux kernel version: 3.10.0-862
>  
> The issue appears when uploading documents:
>  # Local machine:
>     - Doc in English with English words only - ok (for example, 
> "[www.apache.org|http://www.apache.org/]")
>     - Doc in Arabic with some English words - ok (for example, 
> "[www.apache.org|http://www.apache.org/]")
>  
>  # DEV env:
>     - Doc in English with English words only - ok (for example, 
> "[www.apache.org|http://www.apache.org/]")
>     - Doc in Arabic with some English - the English text is inverted 
> (for example, "gro.echapa.www"), which makes keyword search impossible.
>  
> Please advise whether this is fixable, and how.
>  
> Thank you in advance!






[jira] [Updated] (SOLR-14833) Empty highlight entry on match only for some queries

2020-09-04 Thread Hossameldin Khalifa (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hossameldin Khalifa updated SOLR-14833:
---
Attachment: Screen Shot 1442-01-17 at 3.56.16 AM.png
Screen Shot 1442-01-17 at 3.56.05 AM.png
Screen Shot 1442-01-17 at 3.55.53 AM.png

>  Empty highlight entry on match only for some queries 
> --
>
> Key: SOLR-14833
> URL: https://issues.apache.org/jira/browse/SOLR-14833
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: 8.6.2
>Reporter: Hossameldin Khalifa
>Priority: Critical
> Attachments: Screen Shot 1442-01-17 at 3.55.53 AM.png, Screen Shot 
> 1442-01-17 at 3.56.05 AM.png, Screen Shot 1442-01-17 at 3.56.16 AM.png
>
>

[jira] [Created] (SOLR-14833) Empty highlight entry on match only for some queries

2020-09-04 Thread Hossameldin Khalifa (Jira)
Hossameldin Khalifa created SOLR-14833:
--

 Summary:  Empty highlight entry on match only for some queries 
 Key: SOLR-14833
 URL: https://issues.apache.org/jira/browse/SOLR-14833
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: search
Affects Versions: 8.6.2
Reporter: Hossameldin Khalifa


Solr Input:
```json
{
  "query": "text:(\"ما جرى بين الصحابة\" الماتريدي)",
  "fields": "book_id,author_id,cat_id,meta,id,text",
  "params": {
    "rows": 20,
    "start": 0,
    "hl": "true",
    "hl.fl": "text_highlighting,text_highlighting_copy",
    "hl.fragmenter": "regex",
    "hl.q": "text_highlighting:(\"ما جرى بين الصحابة\" الماتريدي)",
    "f.text_highlighting.hl.fragsize": 110,
    "f.text_highlighting_copy.hl.fragsize": 0
  }
}
```
For exactly this text `"ما جرى بين الصحابة" الماتريدي` and some other queries, 
the highlights have some empty matches. I checked whether the indexes don't 
have text stored in them, but they look like all the other indexes.
Here is part of the highlighter output:
```
"d3108d2d-1344-458c-8c28-0639f82b274e": {"text_highlighting": 
[" الصحابة \ufd43: إن من الأمور المهمة التي ينبغي للداعية أن يعرض عنها ولا يخوض 
فيها ما جرى بين الصحابة \ufd43، وما حصل 
لبعضهم"], "text_highlighting_copy": ["فاجهد علي جهدك! "، وقال \ufd41 في الدفاع 
عن عثمان حين سأل هذا الضال: "أما عثمان فكان الله قد عفا عنه وكرهتم أن تعفوا 
عنه، وأما عليّ فابن عمّ رسول الله ﷺ وختنه"، ثم أخذ يذكر من محاسن علي وعثمان 
\ufd44 حتى أفحم هذا الضال فذهب خائبا، وقال له ابن عمر \ufd41: "اذهب بهذا الآن 
معك"، قال العيني \ufd40: أي اقرن هذا العذر بالجواب حتى لا يبقى لك فيما أجبتك به 
حجة على ما كنت تعتقد" (1) فينبغي للداعية أن يدافع عن الصحابة \ufd43 وعن أئمة 
الهدى من علماء أهل السنة والجماعة، ولكن بالحكمة والموعظة الحسنة، والجدال 
بالحسنى.\nرابعا: من أساليب الدعوة: استخدام الشدة مع بعض المدعوين: الأصل في 
الأساليب في الدعوة إلى الله \ufdff الرفق واللين، ولكن من المدعوين من لا يجدي 
ولا ينفع فيه ومعه إلا الشدة والقوة؛ ولهذا استخدم عبد الله بن عمر \ufd41 أسلوب 
الشدة مع الرجل الضال الذي يطعن في علي وعثمان \ufd44، فقال: "أرغم الله بأنفك"، 
وقال \ufd41: "قاتلنا حتى لم تكن فتنة وكان الدين لله، وأنتم تريدون أن تقاتلوا 
حتى تكون فتنة ويكون الدين لغير الله"، وهذا فيه قوة في الأسلوب، ولكن لا يفعل ذلك 
إلا مع الأمن من الوقوع في المفاسد، والله المستعان (2).\nخامسا: أهمية الكف عما 
جرى بين الصحابة \ufd43: إن من الأمور المهمة التي ينبغي للداعية أن يعرض عنها ولا 
يخوض فيها ما جرى بين الصحابة \ufd43، وما 
حصل لبعضهم؛ لأن الكف عن ذلك مذهب أهل الحق والاعتدال (3)؛ ولهذا قال عبد الله بن 
عمر \ufd44 في هذا الحديث: "أما عثمان فكان الله قد عفا عنه فكرهتم أن تعفوا عنه، 
وأما عليّ فابن عمّ رسول الله ﷺ"، قال شيخ الإسلام ابن تيمية \ufd40 في مذهب 
أهل\n_\n(1) عمدة القاري، شرح صحيح البخاري، 16/ 207.\n(2) انظر: الحديث 
رقم 116، الدرس العاشر.\n(3) انظر: شرح العقيدة الواسطية، لابن تيمية، تأليف محمد 
خليل الهراس، ص 250."]}, "1f36e221-2683-4bc7-9732-e6a64298f2df": {}}```
I tried setting `hl.maxAnalyzedChars` to a large integer value and it still did 
not work. One thing I also know is that when removing `"hl.q": 
"text_highlighting:(\"ما جرى بين الصحابة\" الماتريدي)",` from the params, it 
works. However, it then does not highlight the stop words, which is not my 
desired behaviour.
Here is the relevant part of my solr schema 
```xml  id
    
                                       
                                              

                                      
                   
```






[jira] [Commented] (LUCENE-9501) IndexSortSortedNumericDocValuesRangeQuery violates iterator invariant.

2020-09-04 Thread Julie Tibshirani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190949#comment-17190949
 ] 

Julie Tibshirani commented on LUCENE-9501:
--

The fix to the query itself: https://github.com/apache/lucene-solr/pull/1833
Another change related to the Asserting* classes: 
https://github.com/apache/lucene-solr/pull/1834

The query fix should be merged before the Asserting* wrapper change. Otherwise 
TestIndexSortSortedDocValuesQuery tests will start to fail sporadically.


> IndexSortSortedNumericDocValuesRangeQuery violates iterator invariant.
> --
>
> Key: LUCENE-9501
> URL: https://issues.apache.org/jira/browse/LUCENE-9501
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Julie Tibshirani
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In LUCENE-7714 we added a new query to sandbox called 
> IndexSortSortedNumericDocValuesRangeQuery that optimizes range calculations 
> when the field is sorted. The query has a bad bug: its DocIdSetIterator can 
> return an old value for docID() even after advance has returned NO_MORE_DOCS. 
> This violates the DocIdSetIterator contract and means that it's possible for 
> DocIdSetIterator#advance to be called when it's already been exhausted (which 
> can result in invalid reads).
> We would have expected this issue to be caught in tests, especially because 
> classes like AssertingIndexSearcher check for these invariants. As part of 
> this fix I'll look into improvements to the Asserting* wrapper framework.
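
To make the invariant concrete, here is a minimal sketch in plain Java of an iterator over a contiguous doc ID range that records exhaustion, so that `docID()` agrees with the last value returned by `advance`. The `RangeIterator` class is hypothetical and does not extend Lucene's `DocIdSetIterator`; it only mimics the shape of that API, and it is not the code from the actual fix. The sentinel value matches Lucene's `NO_MORE_DOCS` (`Integer.MAX_VALUE`).

```java
public class RangeIteratorSketch {

    /** Same sentinel value that Lucene uses for an exhausted iterator. */
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical iterator over the doc IDs in [firstDoc, lastDoc]. */
    public static final class RangeIterator {
        private final int firstDoc;
        private final int lastDoc;
        private int doc = -1; // not positioned yet

        public RangeIterator(int firstDoc, int lastDoc) {
            this.firstDoc = firstDoc;
            this.lastDoc = lastDoc;
        }

        public int docID() {
            return doc;
        }

        public int nextDoc() {
            // Guard against overflow of doc + 1 once the iterator is exhausted.
            return doc == NO_MORE_DOCS ? NO_MORE_DOCS : advance(doc + 1);
        }

        public int advance(int target) {
            int candidate = Math.max(target, firstDoc);
            if (doc == NO_MORE_DOCS || candidate > lastDoc) {
                // Store the sentinel so that docID() reports exhaustion too.
                doc = NO_MORE_DOCS;
            } else {
                doc = candidate;
            }
            return doc;
        }
    }

    public static void main(String[] args) {
        RangeIterator it = new RangeIterator(3, 5);
        while (it.nextDoc() != NO_MORE_DOCS) {
            System.out.println("doc " + it.docID());
        }
        System.out.println("exhausted: " + (it.docID() == NO_MORE_DOCS));
    }
}
```

The bug described above corresponds to returning `NO_MORE_DOCS` from `advance` without storing it, so that a caller checking `docID()` still sees the old value and may advance again.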






[jira] [Created] (LUCENE-9508) DocumentsWriter doesn't check for BlockedFlushes in stall mode``

2020-09-04 Thread Sorabh Hamirwasia (Jira)
Sorabh Hamirwasia created LUCENE-9508:
-

 Summary: DocumentsWriter doesn't check for BlockedFlushes in stall 
mode``
 Key: LUCENE-9508
 URL: https://issues.apache.org/jira/browse/LUCENE-9508
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 8.5.1
Reporter: Sorabh Hamirwasia


Hi,

I was investigating an issue where the memory usage of a single Lucene 
IndexWriter went up to ~23GB. Lucene has a concept of stalling in case the 
memory used by each index breaches the 2 x ramBuffer limit (10% of JVM heap, 
in this case ~3GB). So ideally memory usage should not go above that limit. I 
looked into the heap dump and found that when the fullFlush thread enters the 
*markForFullFlush* method, it tries to take the lock on the ThreadStates of all 
the DWPT threads sequentially. If the lock on one of the ThreadStates is held 
by another thread, it will block indefinitely. That is what happened in this 
case, as one of the DWPT threads was stuck in the indexing process. Because of 
this, the fullFlush thread was unable to populate the flush queue even though 
stall mode was detected. As a result, a new indexing request arriving on an 
indexing thread slept for a second and then continued with indexing. The 
**preUpdate()** method checks for the stalled case and whether there are any 
pending flushes (based on the flush queue); if there are none, it sleeps and 
continues. 

Questions: 
1) Should **preUpdate** look into the blocked flushes information as well, 
instead of just the flush queue?
2) Should the fullFlush thread wait indefinitely for the lock on ThreadStates? 
A single blocked writing thread can block the full flush here.
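
As an illustration of what question 2 is asking about, the sketch below shows a bounded wait using `ReentrantLock#tryLock(long, TimeUnit)` from the JDK. It is not how DocumentsWriter is implemented and is not a proposed patch: `BoundedLockSketch`, `visitAll` and `demo` are hypothetical names, and the list of locks merely stands in for the per-thread states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class BoundedLockSketch {

    /**
     * Visits every lock in order, waiting at most timeoutMillis for each one.
     * Returns the indices of the locks that could not be acquired in time.
     */
    public static List<Integer> visitAll(List<ReentrantLock> locks, long timeoutMillis)
            throws InterruptedException {
        List<Integer> blocked = new ArrayList<>();
        for (int i = 0; i < locks.size(); i++) {
            ReentrantLock lock = locks.get(i);
            if (lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                try {
                    // Hypothetical: per-state flush bookkeeping would happen here.
                } finally {
                    lock.unlock();
                }
            } else {
                blocked.add(i); // remember it instead of waiting forever
            }
        }
        return blocked;
    }

    /** Demo: a helper thread holds lock 1, standing in for a stuck indexing thread. */
    public static List<Integer> demo() {
        List<ReentrantLock> locks =
                Arrays.asList(new ReentrantLock(), new ReentrantLock(), new ReentrantLock());
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread stuck = new Thread(() -> {
            locks.get(1).lock();
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                locks.get(1).unlock();
            }
        });
        stuck.start();
        try {
            held.await();
            return visitAll(locks, 50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the helper thread finish
        }
    }

    public static void main(String[] args) {
        System.out.println("blocked states: " + demo());
    }
}
```

With a bounded wait the caller learns which states are blocked and can decide what to do about them, at the cost of having to handle the partially visited case explicitly.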
 






[GitHub] [lucene-solr] jtibshirani opened a new pull request #1834: Make sure to test normal scorers with asserting wrappers.

2020-09-04 Thread GitBox


jtibshirani opened a new pull request #1834:
URL: https://github.com/apache/lucene-solr/pull/1834


   When a query is run at the top-level, the searcher uses `Weight#bulkScorer`.
   Many queries don't implement this explicitly and instead rely on the default
   implementation which delegates to `Weight#scorer`.
   
   Previously `AssertingWeight` would always wrap the delegate's bulk scorer. So
   for queries that rely on `Weight#scorer`, we weren't wrapping the scorer or
   iterator to run checks. This change proposes that `AssertingWeight#bulkScorer`
   sometimes use the default implementation to make sure we also test normal
   scorers.
   
   This change would have caught the bug in LUCENE-9501.
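
The mechanism can be sketched with simplified stand-in types. Everything below is hypothetical (`Weight`, `DocIterator`, `BuggyWeight`, `AssertingWeight` and `bulkScore` here are toy versions, not Lucene's classes), and a boolean flag replaces the random choice so the two paths can be compared deterministically.

```java
public class AssertingWrapperSketch {

    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    public interface DocIterator {
        int docID();
        int nextDoc();
    }

    public interface Weight {
        DocIterator scorer();

        /** Default bulk path: drives the iterator returned by scorer() and counts hits. */
        default int bulkScore() {
            DocIterator it = scorer();
            int hits = 0;
            while (it.nextDoc() != NO_MORE_DOCS) {
                hits++;
            }
            return hits;
        }
    }

    /** Relies on the default bulk path; its iterator keeps a stale docID() after exhaustion. */
    public static final class BuggyWeight implements Weight {
        @Override
        public DocIterator scorer() {
            return new DocIterator() {
                private int doc = -1;
                @Override public int docID() { return doc; }
                @Override public int nextDoc() {
                    if (doc < 2) {
                        return ++doc;    // docs 0, 1, 2
                    }
                    return NO_MORE_DOCS; // bug: doc is left at 2
                }
            };
        }
    }

    /** Wraps scorers with a consistency check; optionally takes the default bulk path. */
    public static final class AssertingWeight implements Weight {
        private final Weight in;
        private final boolean useDefaultBulk;

        public AssertingWeight(Weight in, boolean useDefaultBulk) {
            this.in = in;
            this.useDefaultBulk = useDefaultBulk;
        }

        @Override
        public DocIterator scorer() {
            DocIterator it = in.scorer();
            return new DocIterator() {
                @Override public int docID() { return it.docID(); }
                @Override public int nextDoc() {
                    int d = it.nextDoc();
                    if (d != it.docID()) {
                        throw new AssertionError("docID() out of sync with nextDoc()");
                    }
                    return d;
                }
            };
        }

        @Override
        public int bulkScore() {
            // The default path calls this.scorer(), so the checking wrapper is exercised.
            return useDefaultBulk ? Weight.super.bulkScore() : in.bulkScore();
        }
    }

    /** Returns true if the wrapper detected the violation on the chosen path. */
    public static boolean detects(boolean useDefaultBulk) {
        try {
            new AssertingWeight(new BuggyWeight(), useDefaultBulk).bulkScore();
            return false;
        } catch (AssertionError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("delegate bulk path detects bug: " + detects(false));
        System.out.println("default bulk path detects bug:  " + detects(true));
    }
}
```

In this toy setup, always delegating to the wrapped weight's bulk path never runs the checks in `scorer()`, which mirrors the gap the description points out.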



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[GitHub] [lucene-solr] jtibshirani commented on a change in pull request #1833: LUCENE-9501: Fix invariant violation in IndexSortSortedNumericDocValuesRangeQuery.

2020-09-04 Thread GitBox


jtibshirani commented on a change in pull request #1833:
URL: https://github.com/apache/lucene-solr/pull/1833#discussion_r483868710



##
File path: 
lucene/sandbox/src/test/org/apache/lucene/search/TestIndexSortSortedNumericDocValuesRangeQuery.java
##
@@ -65,7 +65,7 @@ public void testSameHitsAsPointRangeQuery() throws IOException {
 iw.deleteDocuments(LongPoint.newRangeQuery("idx", 0L, 10L));
   }
   final IndexReader reader = iw.getReader();
-  final IndexSearcher searcher = newSearcher(reader, false);
+  final IndexSearcher searcher = newSearcher(reader);

Review comment:
   This isn't critical for test coverage, but it seemed off that we had 
disabled wrapping the reader.








[GitHub] [lucene-solr] jtibshirani opened a new pull request #1833: LUCENE-9501: Fix invariant violation in IndexSortSortedNumericDocValuesRangeQuery.

2020-09-04 Thread GitBox


jtibshirani opened a new pull request #1833:
URL: https://github.com/apache/lucene-solr/pull/1833


   Previously the DocIdSetIterator returned an old value for docID even after
   advance returned NO_MORE_DOCS. This violates the DocIdSetIterator contract and
   made it possible for the iterator's advance method to be called even after it
   was already exhausted.
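
The contract described above can be illustrated with a minimal, self-contained sketch. `RangeDocIterator` is a hypothetical stand-in written for this note; it is not Lucene's `DocIdSetIterator` and not the code from the pull request. Only the `NO_MORE_DOCS` sentinel (`Integer.MAX_VALUE` in Lucene) is taken from the real API. The point is that the current position is recorded on every path through `advance`, including exhaustion.

```java
// Hypothetical stand-in for the iterator contract; this is NOT Lucene's
// DocIdSetIterator, only a self-contained model of the rule described above.
final class RangeDocIterator {
  // Lucene defines NO_MORE_DOCS as Integer.MAX_VALUE; the value is mirrored here.
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int firstDoc; // inclusive lower bound of matching docs
  private final int lastDoc;  // inclusive upper bound of matching docs
  private int doc = -1;       // -1 means "not positioned yet"

  RangeDocIterator(int firstDoc, int lastDoc) {
    this.firstDoc = firstDoc;
    this.lastDoc = lastDoc;
  }

  int docID() {
    return doc;
  }

  int advance(int target) {
    if (doc == NO_MORE_DOCS) {
      // Defensive guard for this sketch only: stay exhausted.
      return doc;
    }
    int candidate = Math.max(target, firstDoc);
    // Store the result on every path, including exhaustion, so that
    // docID() never keeps reporting a stale document.
    doc = (candidate > lastDoc) ? NO_MORE_DOCS : candidate;
    return doc;
  }

  int nextDoc() {
    return advance(doc + 1);
  }
}
```

Once `advance` has returned `NO_MORE_DOCS`, `docID()` in this sketch reports the same sentinel, so a caller that inspects `docID()` can tell the iterator is exhausted and avoid advancing it again.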






[jira] [Updated] (SOLR-14832) Inversion Eglish and numbers characters in Arabic documents

2020-09-04 Thread Vlad (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad updated SOLR-14832:

Description: 
Hi Support,

 

please help to resolve an issue. I upload/index several documents in English 
and in Arabic languages to SOLR, in addition I use handler for Arabic language:

There are two environments:
 # Local machine:
    - SOLR version: 4,2
    - Windows version: 10
 # DEV env:
    - SOLR version 4.1 as part of the Cloudera suite
    - Linux kernel version: 3.10.0-862

Issue appears when uploading documents:
 # Local machine:
    - Doc in English with English words only - ok (for example, 
"[www.apache.org|http://www.apache.org/]")
    - Doc in Arabic with some English words - ok (for example, 
"[www.apache.org|http://www.apache.org/]")
 # DEV env:
    - Doc in English with English words only - ok (for example, 
"[www.apache.org|http://www.apache.org/]")
    - Doc in Arabic with some English - English text is inverted 
(for example, "gro.ehcapa.www"), which makes keyword search impossible.

Please advise whether this is fixable, and how.

 

Thank you in advance!

  was:
Hi Support,

 

please help to resolve an issue. I upload/index several documents in English 
and in Arabic languages to SOLR, in addition I use handler for Arabic language:

There are two environments:
 # Local machine:
    - SOLR version: 4,2
    - Windows version: 10
 # DEV env:
    - SOLR version: 
                - Cloudera suite
    - Linux kernel version: 3.10.0-862

Issue appears when uploading documents:
 # Local machine:
    - Doc in English with English words only - ok (for example, 
"[www.apache.org|http://www.apache.org/]")
    - Doc in Arabic with some English words - ok (for example, 
"[www.apache.org|http://www.apache.org/]")
 # DEV env:
    - Doc in English with English words only - ok (for example, 
"[www.apache.org|http://www.apache.org/]")
    - Doc in Arabic with some English - English text is inverted 
(for example, "gro.ehcapa.www"), which makes keyword search impossible.

Please advise whether this is fixable, and how.


> Inversion Eglish and numbers characters in Arabic documents
> ---
>
> Key: SOLR-14832
> URL: https://issues.apache.org/jira/browse/SOLR-14832
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 4.1
>Reporter: Vlad
>Priority: Major
>
> Hi Support,
>  
> please help to resolve an issue. I upload/index several documents in English 
> and in Arabic languages to SOLR, in addition I use handler for Arabic 
> language:
>   
>    
>     
>      words="stopwords.txt" enablePositionIncrements="true" />
>       class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       class="solr.ArabicNormalizationFilterFactory"/>
>     
>     
>  
>   
>   
>     
>      words="stopwords.txt" enablePositionIncrements="true" />
>      ignoreCase="true" expand="true"/>
>       class="solr.RemoveDuplicatesTokenFilterFactory"/>
>        class="solr.ArabicNormalizationFilterFactory"/>
>     
>     
>  
>   
>  
> There are two environments:
>  # Local machine:
>     - SOLR version: 4,2
>     - Windows version: 10
>  
>  # DEV env:
>     - SOLR version 4.1 as part of the Cloudera suite
>     - Linux kernel version: 3.10.0-862
>  
> Issue appears when uploading documents:
>  # Local machine:
>     - Doc in English with English words only - ok (for example, 
> "[www.apache.org|http://www.apache.org/]")
>     - Doc in Arabic with some English words - ok (for example, 
> "[www.apache.org|http://www.apache.org/]")
>  
>  # DEV env:
>     - Doc in English with English words only - ok (for example, 
> "[www.apache.org|http://www.apache.org/]")
>     - Doc in Arabic with some English - English text is inverted 
> (for example, "gro.ehcapa.www"), which makes keyword search impossible.
>  
> Please advise whether 

[jira] [Created] (SOLR-14832) Inversion Eglish and numbers characters in Arabic documents

2020-09-04 Thread Vlad (Jira)
Vlad created SOLR-14832:
---

 Summary: Inversion Eglish and numbers characters in Arabic 
documents
 Key: SOLR-14832
 URL: https://issues.apache.org/jira/browse/SOLR-14832
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 4.1
Reporter: Vlad


Hi Support,

 

please help to resolve an issue. I upload/index several documents in English 
and in Arabic languages to SOLR, in addition I use handler for Arabic language:

There are two environments:
 # Local machine:
    - SOLR version: 4,2
    - Windows version: 10
 # DEV env:
    - SOLR version: 
                - Cloudera suite
    - Linux kernel version: 3.10.0-862

Issue appears when uploading documents:
 # Local machine:
    - Doc in English with English words only - ok (for example, 
"[www.apache.org|http://www.apache.org/]")
    - Doc in Arabic with some English words - ok (for example, 
"[www.apache.org|http://www.apache.org/]")
 # DEV env:
    - Doc in English with English words only - ok (for example, 
"[www.apache.org|http://www.apache.org/]")
    - Doc in Arabic with some English - English text is inverted 
(for example, "gro.ehcapa.www"), which makes keyword search impossible.

Please advise whether this is fixable, and how.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] chatman edited a comment on pull request #1684: SOLR-14613: strongly typed placement plugin interface and implementation

2020-09-04 Thread GitBox


chatman edited a comment on pull request #1684:
URL: https://github.com/apache/lucene-solr/pull/1684#issuecomment-687329334


   I withdraw all outstanding concerns. Verbosity, 
clunkiness/ineffectiveness/misplacement of configuration
   etc are all my "perceptions" that I don't want to come in the way of the 
completion of this effort.






[GitHub] [lucene-solr] chatman edited a comment on pull request #1684: SOLR-14613: strongly typed placement plugin interface and implementation

2020-09-04 Thread GitBox


chatman edited a comment on pull request #1684:
URL: https://github.com/apache/lucene-solr/pull/1684#issuecomment-686355702


   > @noblepaul & @chatman I find the tone of your latest comments offensive - 
that's no way to build a consensus. Please think twice before posting and calm 
down - if you have a different opinion about technical merits of this PR then 
I'm sure you can express it without personal attacks.
   
   I don't see how Noble's comments can be construed as offensive. I may be biased 
in favour of my own comments, but I *apologise* if they were perceived as such. 
In any case, there is no personal attack anywhere.
   
   > By all means, if you disagree so strongly with the approach presented here 
then please do so - just be sure that you actually will do it instead of just 
complaining.
   
   I find the choice of such words ("instead of just complaining") 
unprofessional. This is a proposal, and comments are added to critique the 
design, not to complain.
   
   On the other hand, Ilan wrote this on Slack:
   
   > If there’s consensus for Noble’s approach (or for that matter no consensus 
that goals 1-3 above are good guiding principles), I will stop work on 
SOLR-14613 and move on to other unrelated topics.
   
   Such threats of "stop work" unless one's design is agreed upon should cease, 
and constructive ways to collaborate should be explored.






[GitHub] [lucene-solr] chatman commented on pull request #1684: SOLR-14613: strongly typed placement plugin interface and implementation

2020-09-04 Thread GitBox


chatman commented on pull request #1684:
URL: https://github.com/apache/lucene-solr/pull/1684#issuecomment-687329334


   I withdraw all outstanding concerns. Verbosity, clunkiness of configuration
   etc are all my "perceptions" that I don't want to come in the way of the
   completion of this effort.
   
   >
   






[GitHub] [lucene-solr] vthacker commented on a change in pull request #1828: LUCENE-9497: add google error prone checks

2020-09-04 Thread GitBox


vthacker commented on a change in pull request #1828:
URL: https://github.com/apache/lucene-solr/pull/1828#discussion_r483794345



##
File path: gradle/validation/error-prone.gradle
##
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+allprojects { prj ->
+  plugins.withType(JavaPlugin) {
+prj.apply plugin: 'net.ltgt.errorprone'
+
+dependencies {
+  errorprone("com.google.errorprone:error_prone_core")
+}
+
+tasks.withType(JavaCompile) { task ->
+  options.errorprone.errorproneArgs = [
+  // test
+  '-Xep:ExtendingJUnitAssert:OFF',

Review comment:
   sounds good! I'll take the current branch and add two things and then 
commit the code
   1. `options.errorprone.disableWarningsInGeneratedCode = true`
   2. CHANGES entry 








[GitHub] [lucene-solr] dweiss commented on a change in pull request #1830: LUCENE-9506: Gradle: split validateSourcePatterns into per-project an…

2020-09-04 Thread GitBox


dweiss commented on a change in pull request #1830:
URL: https://github.com/apache/lucene-solr/pull/1830#discussion_r483793729



##
File path: gradle/validation/validate-source-patterns.gradle
##
@@ -29,50 +33,117 @@ buildscript {
   }
 }
 
-configure(rootProject) {
-  task("validateSourcePatterns", type: ValidateSourcePatternsTask) { task ->
+def extensions = [
+'adoc',
+'bat',
+'cmd',
+'css',
+'g4',
+'gradle',
+'groovy',
+'html',
+'java',
+'jflex',
+'jj',
+'js',
+'json',
+'mdtext',
+'pl',
+'policy',
+'properties',
+'py',
+'sh',
+'template',
+'vm',
+'xml',
+'xsl',
+]
+
+// Create source validation task local for each project's files.
+subprojects {
+  task validateSourcePatterns(type: ValidateSourcePatternsTask) { task ->
 group = 'Verification'
 description = 'Validate Source Patterns'
 
 // This task has no proper outputs.
 setupDummyOutputs(task)
 
-sourceFiles = project.fileTree(project.rootDir) {
-  [
-'java', 'jflex', 'py', 'pl', 'g4', 'jj', 'html', 'js',
-'css', 'xml', 'xsl', 'vm', 'sh', 'cmd', 'bat', 'policy',
-'properties', 'mdtext', 'groovy', 'gradle',
-'template', 'adoc', 'json',
-  ].each{
-include "lucene/**/*.${it}"
-include "solr/**/*.${it}"
-include "dev-tools/**/*.${it}"
-include "gradle/**/*.${it}"
+sourceFiles = fileTree(projectDir) {
+  extensions.each{
+include "*.${it}"
+  }
+
+  // default excludes.

Review comment:
   It could be. I didn't have time to clean up everything. The speedup was 
significant for me anyway (order of magnitude).








[GitHub] [lucene-solr] dweiss commented on pull request #1830: LUCENE-9506: Gradle: split validateSourcePatterns into per-project an…

2020-09-04 Thread GitBox


dweiss commented on pull request #1830:
URL: https://github.com/apache/lucene-solr/pull/1830#issuecomment-687320373


   No worries. Not very urgent.






[GitHub] [lucene-solr] dweiss commented on a change in pull request #1828: LUCENE-9497: add google error prone checks

2020-09-04 Thread GitBox


dweiss commented on a change in pull request #1828:
URL: https://github.com/apache/lucene-solr/pull/1828#discussion_r483793353



##
File path: gradle/validation/error-prone.gradle
##
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+allprojects { prj ->
+  plugins.withType(JavaPlugin) {
+prj.apply plugin: 'net.ltgt.errorprone'
+
+dependencies {
+  errorprone("com.google.errorprone:error_prone_core")
+}
+
+tasks.withType(JavaCompile) { task ->
+  options.errorprone.errorproneArgs = [
+  // test
+  '-Xep:ExtendingJUnitAssert:OFF',

Review comment:
   Varun - feel free to take this branch (or patch) and roll it out on 
yours. I didn't intend it to be committed, I just wanted to show what's needed 
for it to compile and work.








[GitHub] [lucene-solr] dweiss commented on a change in pull request #1828: LUCENE-9497: add google error prone checks

2020-09-04 Thread GitBox


dweiss commented on a change in pull request #1828:
URL: https://github.com/apache/lucene-solr/pull/1828#discussion_r483793090



##
File path: lucene/core/src/java/org/apache/lucene/analysis/CharArrayMap.java
##
@@ -523,6 +523,7 @@ public void clear() {
* @throws NullPointerException
*   if the given map is null.
*/
+  @SuppressWarnings("ReferenceEquality")

Review comment:
   Yes, exactly. I wanted the pr to include an example of how this can be 
done.
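
A minimal sketch of that suppression pattern, assuming Error Prone honours the standard `@SuppressWarnings` annotation keyed by check name, as the diff above does with `"ReferenceEquality"`. `InternedKey` and `isSameInstance` are hypothetical names invented for this note, not Lucene code, and whether Error Prone would flag this exact comparison has not been verified here.

```java
// Illustrative only: InternedKey is a made-up class, not Lucene code.
final class InternedKey {
  private final String name;

  InternedKey(String name) {
    this.name = name;
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof InternedKey && ((InternedKey) other).name.equals(name);
  }

  @Override
  public int hashCode() {
    return name.hashCode();
  }

  // Identity comparison is intentional here, so the check is suppressed by
  // name for this one method instead of being switched off for the build.
  @SuppressWarnings("ReferenceEquality")
  static boolean isSameInstance(InternedKey a, InternedKey b) {
    return a == b;
  }
}
```

Scoping the annotation to a single method keeps the check active everywhere else, which fits the stated goal of shrinking the list of globally disabled checks over time.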








[GitHub] [lucene-solr] vthacker commented on pull request #1684: SOLR-14613: strongly typed placement plugin interface and implementation

2020-09-04 Thread GitBox


vthacker commented on pull request #1684:
URL: https://github.com/apache/lucene-solr/pull/1684#issuecomment-687318445


   > The fact is, all of us have the same objective: to make the product better.
   
   > The purpose is to ensure that the feature/change is
   
   >correct
   >performant/efficient
   >user-friendly
   
   I think everyone agrees on this. I really wish we could be better, and nicer, 
while giving feedback. We'd be able to collaborate better and keep the focus on 
the design decisions.
   
   What are the current concerns with the current PR?
   1. The verbosity?
   2. Where the config lives?
   
   
   






[GitHub] [lucene-solr] vthacker edited a comment on pull request #1684: SOLR-14613: strongly typed placement plugin interface and implementation

2020-09-04 Thread GitBox


vthacker edited a comment on pull request #1684:
URL: https://github.com/apache/lucene-solr/pull/1684#issuecomment-687318445


   > The fact is, all of us have the same objective: to make the product better.
   
   > The purpose is to ensure that the feature/change is
   
   >correct
   >performant/efficient
   >user-friendly
   
   I think everyone agrees on this. I really wish we could be nicer while giving 
feedback. We'd be able to collaborate better and keep the focus on the design 
decisions.
   
   What are the current concerns with the current PR?
   1. The verbosity?
   2. Where the config lives?
   
   
   






[GitHub] [lucene-solr] uschindler commented on pull request #1830: LUCENE-9506: Gradle: split validateSourcePatterns into per-project an…

2020-09-04 Thread GitBox


uschindler commented on pull request #1830:
URL: https://github.com/apache/lucene-solr/pull/1830#issuecomment-687317814


   I will check over the weekend. Was too busy today!






[jira] [Commented] (SOLR-14579) Comment SolrJ 'Utils' generic map functions

2020-09-04 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190872#comment-17190872
 ] 

Uwe Schindler commented on SOLR-14579:
--

bq. I have no recollection of doing so, certainly not intentionally.

Sorry, my fault: this JIRA comment confused me, as it came directly after 
the master and 8.x commits. But this is a different branch; why was it added 
back there?

[https://issues.apache.org/jira/browse/SOLR-14579?focusedCommentId=1712=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1712]

> Comment SolrJ 'Utils' generic map functions
> ---
>
> Key: SOLR-14579
> URL: https://issues.apache.org/jira/browse/SOLR-14579
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Megan Carey
>Assignee: Erick Erickson
>Priority: Minor
> Fix For: 8.7
>
> Attachments: SOLR-14579.patch
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Remove the map functions like `NEW_HASHMAP_FUN` from the Utils class in solrj 
> module to reduce warnings and improve code quality.
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/Utils.java#L92]






[GitHub] [lucene-solr] madrob commented on a change in pull request #1830: LUCENE-9506: Gradle: split validateSourcePatterns into per-project an…

2020-09-04 Thread GitBox


madrob commented on a change in pull request #1830:
URL: https://github.com/apache/lucene-solr/pull/1830#discussion_r483773660



##
File path: gradle/validation/validate-source-patterns.gradle
##
@@ -29,50 +33,117 @@ buildscript {
   }
 }
 
-configure(rootProject) {
-  task("validateSourcePatterns", type: ValidateSourcePatternsTask) { task ->
+def extensions = [
+'adoc',
+'bat',
+'cmd',
+'css',
+'g4',
+'gradle',
+'groovy',
+'html',
+'java',
+'jflex',
+'jj',
+'js',
+'json',
+'mdtext',
+'pl',
+'policy',
+'properties',
+'py',
+'sh',
+'template',
+'vm',
+'xml',
+'xsl',
+]
+
+// Create source validation task local for each project's files.
+subprojects {
+  task validateSourcePatterns(type: ValidateSourcePatternsTask) { task ->
 group = 'Verification'
 description = 'Validate Source Patterns'
 
 // This task has no proper outputs.
 setupDummyOutputs(task)
 
-sourceFiles = project.fileTree(project.rootDir) {
-  [
-'java', 'jflex', 'py', 'pl', 'g4', 'jj', 'html', 'js',
-'css', 'xml', 'xsl', 'vm', 'sh', 'cmd', 'bat', 'policy',
-'properties', 'mdtext', 'groovy', 'gradle',
-'template', 'adoc', 'json',
-  ].each{
-include "lucene/**/*.${it}"
-include "solr/**/*.${it}"
-include "dev-tools/**/*.${it}"
-include "gradle/**/*.${it}"
+sourceFiles = fileTree(projectDir) {
+  extensions.each{
+include "*.${it}"
+  }
+
+  // default excludes.

Review comment:
   should the excludes be an input property so that we don't have to repeat 
them later on root project?








[jira] [Commented] (SOLR-14829) Default components are missing facet_module and terms in documentation

2020-09-04 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190814#comment-17190814
 ] 

David Smiley commented on SOLR-14829:
-

In your own/custom definition of your request handler, do you need to actually 
list these components at all, vs just rely on the default list?  I think most 
people by far let the defaults happen.

> Default components are missing facet_module and terms in documentation
> --
>
> Key: SOLR-14829
> URL: https://issues.apache.org/jira/browse/SOLR-14829
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation, examples
>Affects Versions: 8.6.2
>Reporter: Johannes Baiter
>Assignee: Ishan Chattopadhyaya
>Priority: Minor
> Attachments: SOLR-14829.patch
>
>
> In the reference guide, the list of search components that are enabled by 
> default is missing the {{facet_module}} and {{terms}} components. The terms 
> component is instead listed under "other useful components", while the 
> {{FacetModule}} is never listed anywhere in the documentation, despite it 
> being necessary for the JSON Facet API to work.
> This is also how I stumbled upon this, I spent hours trying to figure out why 
> JSON-based faceting was not working with my setup, after taking a glance at 
> the {{SearchHandler}} source code based on a hunch, it became clear that my 
> custom list of search components (created based on the list in the reference 
> guide) was to blame.
> A patch for the documentation gap is attached, but I think there are some 
> other issues with the naming/documentation around the two faceting APIs that 
> may be worth discussing:
>  * The names {{facet_module}} / {{FacetModule}} are very misleading, since 
> the documentation is always talking about the "JSON Facet API", but the term 
> "JSON" does not appear in the name of the component nor does the component 
> have any documentation attached that mentions this
>  * Why is the {{FacetModule}} class located in the {{search.facet}} package 
> while every single other search component included in the core is located in 
> the {{handler.component}} package?






[jira] [Commented] (LUCENE-9497) Integerate Error Prone ( Static Analysis Tool ) during compilation

2020-09-04 Thread Varun Thacker (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190810#comment-17190810
 ] 

Varun Thacker commented on LUCENE-9497:
---

> error prone uses regular SuppressWarnings annotation with custom names for 
> each category. 

yep! We'll use this when we start enabling warnings to suppress legit uses

> Integerate Error Prone ( Static Analysis Tool ) during compilation
> --
>
> Key: LUCENE-9497
> URL: https://issues.apache.org/jira/browse/LUCENE-9497
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Varun Thacker
>Priority: Minor
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Integrate [https://github.com/google/error-prone] during compilation of our 
> source code to catch mistakes






[GitHub] [lucene-solr] vthacker commented on a change in pull request #1828: LUCENE-9497: add google error prone checks

2020-09-04 Thread GitBox


vthacker commented on a change in pull request #1828:
URL: https://github.com/apache/lucene-solr/pull/1828#discussion_r483720288



##
File path: gradle/validation/error-prone.gradle
##
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+allprojects { prj ->
+  plugins.withType(JavaPlugin) {
+prj.apply plugin: 'net.ltgt.errorprone'
+
+dependencies {
+  errorprone("com.google.errorprone:error_prone_core")
+}
+
+tasks.withType(JavaCompile) { task ->
+  options.errorprone.errorproneArgs = [
+  // test
+  '-Xep:ExtendingJUnitAssert:OFF',

Review comment:
   I am okay with either of the two styles. Ideally we'd want this list to 
get much shorter soon :) 








[GitHub] [lucene-solr] dsmiley commented on pull request #1827: SOLR-14792: Remove VelocityResponseWriter

2020-09-04 Thread GitBox


dsmiley commented on pull request #1827:
URL: https://github.com/apache/lucene-solr/pull/1827#issuecomment-687245022


   BTW, the more up to date https://github.com/erikhatcher/solritas is in the 
next few days, the better, as I'll be doing a recorded Activate session on this 
on September 10th with a demo of it.






[GitHub] [lucene-solr] vthacker commented on a change in pull request #1828: LUCENE-9497: add google error prone checks

2020-09-04 Thread GitBox


vthacker commented on a change in pull request #1828:
URL: https://github.com/apache/lucene-solr/pull/1828#discussion_r483719049



##
File path: lucene/core/src/java/org/apache/lucene/analysis/CharArrayMap.java
##
@@ -523,6 +523,7 @@ public void clear() {
* @throws NullPointerException
*   if the given map is null.
*/
+  @SuppressWarnings("ReferenceEquality")

Review comment:
   Was this the example you wanted to try out to show how to suppress 
legitimate ReferenceEquality warnings (or any other warnings) once we start 
enabling the checks?
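
A minimal sketch of the suppression pattern discussed in this comment, assuming Error Prone honors `@SuppressWarnings` with the check name (as its documentation describes). The class `EmptyMapCheck` and its members are hypothetical and are not taken from `CharArrayMap`; the point is only that the annotation is scoped to the one method where the identity comparison is intentional.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class; not part of Lucene.
final class EmptyMapCheck {
  // A shared sentinel instance. Comparing against it by identity is intentional.
  static final Map<String, String> EMPTY =
      Collections.unmodifiableMap(new HashMap<String, String>());

  // The suppression applies to this method only, so the check stays active elsewhere.
  @SuppressWarnings("ReferenceEquality")
  static boolean isSentinel(Map<String, String> map) {
    return map == EMPTY; // deliberate identity comparison, not equals()
  }
}
```

Keeping the annotation on the narrowest possible scope (a method rather than a class) means new, unintended reference comparisons in the same file would still be reported.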








[GitHub] [lucene-solr] vthacker commented on a change in pull request #1828: LUCENE-9497: add google error prone checks

2020-09-04 Thread GitBox


vthacker commented on a change in pull request #1828:
URL: https://github.com/apache/lucene-solr/pull/1828#discussion_r483717480



##
File path: gradle/validation/error-prone.gradle
##
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+allprojects { prj ->
+  plugins.withType(JavaPlugin) {
+prj.apply plugin: 'net.ltgt.errorprone'
+
+dependencies {
+  errorprone("com.google.errorprone:error_prone_core")
+}
+
+tasks.withType(JavaCompile) { task ->
+  options.errorprone.errorproneArgs = [

Review comment:
   Let's add this? I probably missed it in my PR.








[jira] [Updated] (SOLR-14439) Upgrade to Tika 1.24.1

2020-09-04 Thread Andras Salamon (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Salamon updated SOLR-14439:
--
Attachment: SOLR-14339.patch

> Upgrade to Tika 1.24.1
> --
>
> Key: SOLR-14439
> URL: https://issues.apache.org/jira/browse/SOLR-14439
> Project: Solr
>  Issue Type: Task
>  Components: contrib - DataImportHandler
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Attachments: SOLR-14339.patch
>
>
> We recently released 1.24.1 with several fixes for DoS vulnerabilities we 
> found via fuzzing: CVE-2020-9489 https://seclists.org/oss-sec/2020/q2/69



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (SOLR-7632) Change the ExtractingRequestHandler to use Tika-Server

2020-09-04 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190787#comment-17190787
 ] 

David Smiley commented on SOLR-7632:


I don't think a URP makes sense for this because Tika needs the entire binary 
input stream.  URPs operate on SolrInputDocument.  A RequestHandler is perfect.

> Change the ExtractingRequestHandler to use Tika-Server
> --
>
> Key: SOLR-7632
> URL: https://issues.apache.org/jira/browse/SOLR-7632
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - Solr Cell (Tika extraction)
>Reporter: Chris A. Mattmann
>Priority: Major
>  Labels: gsoc2017, memex
>
> It's a pain to upgrade Tika's jars all the times when we release, and if Tika 
> fails it messes up the ExtractingRequestHandler (e.g., the document type 
> caused Tika to fail, etc). A more reliable way and also separated, and easier 
> to deploy version of the ExtractingRequestHandler would make a network call 
> to the Tika JAXRS server, and then call Tika on the Solr server side, get the 
> results and then index the information that way. I have a patch in the works 
> from the DARPA Memex project and I hope to post it soon.






[GitHub] [lucene-solr] erikhatcher commented on pull request #1827: SOLR-14792: Remove VelocityResponseWriter

2020-09-04 Thread GitBox


erikhatcher commented on pull request #1827:
URL: https://github.com/apache/lucene-solr/pull/1827#issuecomment-687232685


   > @erikhatcher BTW, the .adoc format renders nicely in Github if you were to 
pull the ref guide docs over to https://github.com/erikhatcher/solritas. We 
could also update the link in the solr.cool entry to point directly to them, 
instead of the general github README page ;-)
   
   Thanks for that tip!   I'll definitely be pulling the docs over and 
adjusting.






[GitHub] [lucene-solr] dweiss commented on a change in pull request #1828: LUCENE-9497: add google error prone checks

2020-09-04 Thread GitBox


dweiss commented on a change in pull request #1828:
URL: https://github.com/apache/lucene-solr/pull/1828#discussion_r483689014



##
File path: gradle/validation/error-prone.gradle
##
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+allprojects { prj ->
+  plugins.withType(JavaPlugin) {
+prj.apply plugin: 'net.ltgt.errorprone'
+
+dependencies {
+  errorprone("com.google.errorprone:error_prone_core")
+}
+
+tasks.withType(JavaCompile) { task ->
+  options.errorprone.errorproneArgs = [

Review comment:
   I suggested the same to Varun.








[GitHub] [lucene-solr] dweiss commented on a change in pull request #1743: Gradual naming convention enforcement.

2020-09-04 Thread GitBox


dweiss commented on a change in pull request #1743:
URL: https://github.com/apache/lucene-solr/pull/1743#discussion_r483684981



##
File path: lucene/test-framework/src/java/org/apache/lucene/util/VerifyTestClassNamingConvention.java
##
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import com.carrotsearch.randomizedtesting.RandomizedContext;
+import org.junit.Assume;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.UncheckedIOException;
+import java.io.Writer;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.StandardOpenOption;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.regex.Pattern;
+
+/**
+ * Enforce test naming convention.
+ */
+public class VerifyTestClassNamingConvention extends AbstractBeforeAfterRule {
+  public static final Pattern ALLOWED_CONVENTION = Pattern.compile("(.+?)\\.Test[^.]+");
+
+  private static Set exceptions;
+  static {
+try {
+  exceptions = new HashSet<>();
+  try (BufferedReader is =
+ new BufferedReader(
+ new InputStreamReader(
+   VerifyTestClassNamingConvention.class.getResourceAsStream("test-naming-exceptions.txt"),
+   StandardCharsets.UTF_8))) {
+is.lines().forEach(exceptions::add);
+  }
+} catch (IOException e) {
+  throw new UncheckedIOException(e);
+}
+  }
+
+  @Override
+  protected void before() throws Exception {
+if (TestRuleIgnoreTestSuites.isRunningNested()) {
+  // Ignore nested test suites that test the test framework itself.
+  return;
+}
+
+String suiteName = RandomizedContext.current().getTargetClass().getName();
+
+// You can use this helper method to dump all suite names to a file.
+// Run gradle with one worker so that it doesn't try to append to the same
+// file from multiple processes:
+//
+// gradlew  test --max-workers 1 -Dtests.useSecurityManager=false
+//
+// dumpSuiteNamesOnly(suiteName);
+
+if (!ALLOWED_CONVENTION.matcher(suiteName).matches()) {
+  // if this class exists on the exception list, leave it.

Review comment:
   Same here, really. It was just an example of how it can be solved, not a 
final solution.








[GitHub] [lucene-solr] dweiss commented on a change in pull request #1743: Gradual naming convention enforcement.

2020-09-04 Thread GitBox


dweiss commented on a change in pull request #1743:
URL: https://github.com/apache/lucene-solr/pull/1743#discussion_r483684317



##
File path: lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java
##
@@ -613,6 +613,7 @@ public static TestRuleIgnoreAfterMaxFailures replaceMaxFailureRule(TestRuleIgnor
 RuleChain r = RuleChain.outerRule(new TestRuleIgnoreTestSuites())
   .around(ignoreAfterMaxFailures)
   .around(suiteFailureMarker = new TestRuleMarkFailure())
+  .around(new VerifyTestClassNamingConvention())

Review comment:
   This is just code, anything can be changed... In the example I wrote it 
can't be turned off.








[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1743: Gradual naming convention enforcement.

2020-09-04 Thread GitBox


cpoerschke commented on a change in pull request #1743:
URL: https://github.com/apache/lucene-solr/pull/1743#discussion_r483679806



##
File path: lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java
##
@@ -613,6 +613,7 @@ public static TestRuleIgnoreAfterMaxFailures replaceMaxFailureRule(TestRuleIgnor
 RuleChain r = RuleChain.outerRule(new TestRuleIgnoreTestSuites())
   .around(ignoreAfterMaxFailures)
   .around(suiteFailureMarker = new TestRuleMarkFailure())
+  .around(new VerifyTestClassNamingConvention())

Review comment:
   Question: would this convention automatically and always apply to all 
classes derived from `LuceneTestCase`, including those in non-`org.apache` 
namespaces, or would it be possible to opt out (without an exclusion list) 
somehow for custom code that might have chosen a different convention?

##
File path: lucene/test-framework/src/java/org/apache/lucene/util/VerifyTestClassNamingConvention.java
##
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import com.carrotsearch.randomizedtesting.RandomizedContext;
+import org.junit.Assume;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.UncheckedIOException;
+import java.io.Writer;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.nio.file.StandardOpenOption;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.regex.Pattern;
+
+/**
+ * Enforce test naming convention.
+ */
+public class VerifyTestClassNamingConvention extends AbstractBeforeAfterRule {
+  public static final Pattern ALLOWED_CONVENTION = Pattern.compile("(.+?)\\.Test[^.]+");
+
+  private static Set exceptions;
+  static {
+try {
+  exceptions = new HashSet<>();
+  try (BufferedReader is =
+ new BufferedReader(
+ new InputStreamReader(
+   VerifyTestClassNamingConvention.class.getResourceAsStream("test-naming-exceptions.txt"),
+   StandardCharsets.UTF_8))) {
+is.lines().forEach(exceptions::add);
+  }
+} catch (IOException e) {
+  throw new UncheckedIOException(e);
+}
+  }
+
+  @Override
+  protected void before() throws Exception {
+if (TestRuleIgnoreTestSuites.isRunningNested()) {
+  // Ignore nested test suites that test the test framework itself.
+  return;
+}
+
+String suiteName = RandomizedContext.current().getTargetClass().getName();
+
+// You can use this helper method to dump all suite names to a file.
+// Run gradle with one worker so that it doesn't try to append to the same
+// file from multiple processes:
+//
+// gradlew  test --max-workers 1 -Dtests.useSecurityManager=false
+//
+// dumpSuiteNamesOnly(suiteName);
+
+if (!ALLOWED_CONVENTION.matcher(suiteName).matches()) {
+  // if this class exists on the exception list, leave it.

Review comment:
   It's possible (though rare) that both `TestFooBar.java` and 
`FooBarTest.java` classes co-exist. I wonder if the `ALLOWED_CONVENTION` and 
`test-naming-exceptions.txt` logic might be mutually exclusive, i.e. when 
a class is on the exclusion list then its conventionally named counterpart is 
not valid. The excluded test may be renamed (and removed from the exclusion 
list), but until that is done the conventional name is discouraged to avoid 
confusion between the two variants?
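
To make the question concrete, here is a small sketch of the check as it appears in the diff, plus one possible reading of the "mutually exclusive" idea. The pattern is copied from the patch; the class `NamingConventionSketch`, both method names, and the rule for deriving the counterpart name are assumptions for illustration only and are not part of the PR.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper; only the regular expression comes from the patch.
final class NamingConventionSketch {
  static final Pattern ALLOWED_CONVENTION = Pattern.compile("(.+?)\\.Test[^.]+");

  // Current behaviour in the patch: a suite passes if it follows the
  // convention or if it is listed as an exception.
  static boolean isAccepted(String suiteName, Set<String> exceptions) {
    return ALLOWED_CONVENTION.matcher(suiteName).matches()
        || exceptions.contains(suiteName);
  }

  // Assumed extra rule: a conventional name pkg.TestFooBar conflicts when
  // its counterpart pkg.FooBarTest is still on the exception list.
  static boolean conflictsWithException(String suiteName, Set<String> exceptions) {
    int dot = suiteName.lastIndexOf('.');
    String simpleName = suiteName.substring(dot + 1);
    if (dot < 0 || !simpleName.startsWith("Test") || simpleName.length() == 4) {
      return false;
    }
    String counterpart =
        suiteName.substring(0, dot + 1) + simpleName.substring(4) + "Test";
    return exceptions.contains(counterpart);
  }
}
```

With `org.example.FooBarTest` on the exception list, `org.example.TestFooBar` is accepted by the first method but flagged by the second, which is the situation the comment asks about.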








[GitHub] [lucene-solr] cpoerschke commented on pull request #1823: SOLR-14510: Remove deprecations added with BMW support

2020-09-04 Thread GitBox


cpoerschke commented on pull request #1823:
URL: https://github.com/apache/lucene-solr/pull/1823#issuecomment-687200873


   > Remove deprecations added with BMW support 
   
   > ... It would be useful to add this deprecation information ...
   
   +1 to making it clearer what is being deprecated. "BMW support" at first 
glance made me think of the car manufacturer, but no, it stands for 
"BlockMax WAND support" instead.
   
   To the reader of the deprecation information, does it matter why the thing 
that is being removed was deprecated, I wonder? If not then something like 
_"Remove deprecated writeStartDocumentList variant in TextResponseWriter and 
its sub-classes."_ could work perhaps, though it's rather long, hmm.






[GitHub] [lucene-solr] madrob commented on a change in pull request #1828: LUCENE-9497: add google error prone checks

2020-09-04 Thread GitBox


madrob commented on a change in pull request #1828:
URL: https://github.com/apache/lucene-solr/pull/1828#discussion_r483663516



##
File path: gradle/validation/error-prone.gradle
##
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+allprojects { prj ->
+  plugins.withType(JavaPlugin) {
+prj.apply plugin: 'net.ltgt.errorprone'
+
+dependencies {
+  errorprone("com.google.errorprone:error_prone_core")
+}
+
+tasks.withType(JavaCompile) { task ->
+  options.errorprone.errorproneArgs = [

Review comment:
   I would also like to see 
`options.errorprone.disableWarningsInGeneratedCode = true`

##
File path: gradle/validation/error-prone.gradle
##
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+allprojects { prj ->
+  plugins.withType(JavaPlugin) {
+prj.apply plugin: 'net.ltgt.errorprone'
+
+dependencies {
+  errorprone("com.google.errorprone:error_prone_core")
+}
+
+tasks.withType(JavaCompile) { task ->
+  options.errorprone.errorproneArgs = [
+  // test
+  '-Xep:ExtendingJUnitAssert:OFF',

Review comment:
   Personal style, but I think
   ```
   options.errorprone {
     disable 'ExtendingJUnitAssert'
   }
   ```
   is more clear than using `-Xep`?








[GitHub] [lucene-solr] HoustonPutman commented on pull request #1684: SOLR-14613: strongly typed placement plugin interface and implementation

2020-09-04 Thread GitBox


HoustonPutman commented on pull request #1684:
URL: https://github.com/apache/lucene-solr/pull/1684#issuecomment-687193971


   I completely agree with David and Andrzej on all points. No one is going 
through the comments with a fine-tooth comb; it is blatantly disrespectful. And 
doubling down instead of acknowledging and apologizing makes it even worse. 
   
   To your point Noble, Apache's motto is "Community over Code". There is no 
reason to put up with rudeness because someone graces us with a PR review. It 
is easier to review a PR and be dismissive and rude, but it is infinitely 
healthier and more constructive to be empathetic and kind. It also leads to a 
community that is more willing to contribute and collaborate. It's reasonable 
to expect mutual respect within the Lucene/Solr community. If we are in the 
place where we should accept any type of language when someone graces us with a 
review, then that is something we need to seriously address.






[jira] [Commented] (SOLR-13973) Deprecate Tika

2020-09-04 Thread Markus Kalkbrenner (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190758#comment-17190758
 ] 

Markus Kalkbrenner commented on SOLR-13973:
---

I just wanted to emphasize Solr's usage in general in the PHP world and not 
pretend that the removal of Tika will break thousands of installations:
{quote}For sure, just a few of all these installations will use Tika indirectly 
via the extraction handler.
{quote}
With Solarium and Search API Solr we always focus on the latest Solr version! 
For both we recently had to go back to 8.5 because of the test failures 
related to SOLR-14768.

BTW, I think I should contribute to your documentation regarding libraries for 
different programming languages. Nothing other than Solarium should be 
mentioned for PHP anymore. Most major CMSs, shop systems, etc. have agreed to 
base their Solr integration on this library.

But this gets off-topic here.

I understand that you want to get Tika out of the VM and out of the build 
dependencies. Go for it :)
I reached my goal of creating some awareness of third-party concerns. And it 
seems that SOLR-7632 is a reasonable compromise.

> Deprecate Tika
> --
>
> Key: SOLR-13973
> URL: https://issues.apache.org/jira/browse/SOLR-13973
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Blocker
> Fix For: 8.7
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on search and scalability. 
> Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us 
> down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these, 
> it should be possible to bring them into third party packages and installed 
> via package manager.
> The plan is just to throw warnings in logs and add deprecation notes in the 
> reference guide for now. Removal can be done in 9.0.






[jira] [Commented] (LUCENE-9451) Sort.rewrite doesn't always return this when unchanged

2020-09-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190755#comment-17190755
 ] 

ASF subversion and git services commented on LUCENE-9451:
-

Commit 6c94ca9cb33795cdc29797ff2d17f1869813d3f9 in lucene-solr's branch 
refs/heads/master from Mike Drob
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6c94ca9 ]

LUCENE-9451 Sort.rewrite does not always return this when unchanged (#1731)



> Sort.rewrite doesn't always return this when unchanged
> --
>
> Key: LUCENE-9451
> URL: https://issues.apache.org/jira/browse/LUCENE-9451
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 8.7
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Sort.rewrite doesn't always return {{this}} as advertised in the Javadoc even 
> if the underlying fields are unchanged. This is because the comparison uses 
> reference equality.
> There are two solutions we can do here, 1) switch from reference equality to 
> object equality, and 2) fix some of the underlying sort fields to not create 
> unnecessary objects.
> cc: [~jpountz] [~romseygeek]
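
The problem described above can be illustrated with a self-contained sketch. The classes `FieldSketch` and `SortSketch` are hypothetical stand-ins and do not reproduce Lucene's `SortField` or `Sort`; the `useEquals` flag exists only to contrast the two comparison strategies (option 1 in the description) side by side.

```java
import java.util.Objects;

// Hypothetical stand-in for a sort field; not Lucene's SortField.
final class FieldSketch {
  final String name;

  FieldSketch(String name) {
    this.name = name;
  }

  // Simulates a rewrite that allocates a new but equal object
  // even though nothing actually changed.
  FieldSketch rewrite() {
    return new FieldSketch(name);
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof FieldSketch && ((FieldSketch) o).name.equals(name);
  }

  @Override
  public int hashCode() {
    return Objects.hash(name);
  }
}

// Hypothetical stand-in for the sort; not Lucene's Sort.
final class SortSketch {
  final FieldSketch[] fields;

  SortSketch(FieldSketch... fields) {
    this.fields = fields;
  }

  // Returns this when no field is considered changed, otherwise a new sort.
  SortSketch rewrite(boolean useEquals) {
    boolean changed = false;
    FieldSketch[] rewritten = new FieldSketch[fields.length];
    for (int i = 0; i < fields.length; i++) {
      rewritten[i] = fields[i].rewrite();
      changed |= useEquals
          ? !fields[i].equals(rewritten[i]) // object equality
          : fields[i] != rewritten[i];      // reference equality
    }
    return changed ? new SortSketch(rewritten) : this;
  }
}
```

With reference equality the sketch never returns `this`, because every field rewrite allocates a new object; with object equality it does. Option 2 from the description would instead change `FieldSketch.rewrite()` so that it returns `this` when nothing changes.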






[jira] [Resolved] (LUCENE-9451) Sort.rewrite doesn't always return this when unchanged

2020-09-04 Thread Mike Drob (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob resolved LUCENE-9451.
---
Fix Version/s: master (9.0)
   Resolution: Fixed

> Sort.rewrite doesn't always return this when unchanged
> --
>
> Key: LUCENE-9451
> URL: https://issues.apache.org/jira/browse/LUCENE-9451
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 8.7
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Sort.rewrite doesn't always return {{this}} as advertised in the Javadoc even 
> if the underlying fields are unchanged. This is because the comparison uses 
> reference equality.
> There are two solutions we can do here, 1) switch from reference equality to 
> object equality, and 2) fix some of the underlying sort fields to not create 
> unnecessary objects.
> cc: [~jpountz] [~romseygeek]






[GitHub] [lucene-solr] madrob merged pull request #1731: LUCENE-9451 Sort.rewrite does not always return this when unchanged

2020-09-04 Thread GitBox


madrob merged pull request #1731:
URL: https://github.com/apache/lucene-solr/pull/1731


   






[GitHub] [lucene-solr] cpoerschke opened a new pull request #1832: SOLR-14831: remove deprecated-and-unused "facet.distrib.mco" constant

2020-09-04 Thread GitBox


cpoerschke opened a new pull request #1832:
URL: https://github.com/apache/lucene-solr/pull/1832


   https://issues.apache.org/jira/browse/SOLR-14831






[jira] [Created] (SOLR-14831) remove deprecated-and-unused "facet.distrib.mco" constant

2020-09-04 Thread Christine Poerschke (Jira)
Christine Poerschke created SOLR-14831:
--

 Summary: remove deprecated-and-unused "facet.distrib.mco" constant
 Key: SOLR-14831
 URL: https://issues.apache.org/jira/browse/SOLR-14831
 Project: Solr
  Issue Type: Task
  Components: SolrJ
Reporter: Christine Poerschke
Assignee: Christine Poerschke


This is ready for removal e.g. as per the 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.6.2/solr/solrj/src/java/org/apache/solr/common/params/FacetParams.java#L139-L144
 comment:

{code}
   * @deprecated
   * This option is no longer used nor will if affect any queries as the fix 
has been built in. (SOLR-11711)
   * This will be removed entirely in 8.0.0
   */
  @Deprecated
  public static final String FACET_DISTRIB_MCO = FACET_DISTRIB + ".mco";
{code}






[jira] [Commented] (SOLR-7632) Change the ExtractingRequestHandler to use Tika-Server

2020-09-04 Thread Alexandre Rafalovitch (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190751#comment-17190751
 ] 

Alexandre Rafalovitch commented on SOLR-7632:
-

I agree on the critical path. I was just wondering whether, given the number of 
internal changes and explanations required on release, it makes sense to also 
make it into a more flexible architecture on the Solr side.

Making it a URP would, I think, allow composing it with other pipeline elements 
in a different order (e.g. preprocess file name, feed to Tika, apply DateParser), 
or possibly even distributing the load by running it on each node instead of as 
the first step. But that's just an idea. If others do not see the benefits, it is 
not worth chasing.
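
To illustrate the composition idea above, here is a minimal, self-contained sketch. It does not use Solr's real UpdateRequestProcessor API: the Stage interface, the three stages and the field names are all hypothetical stand-ins, and the "extract" stage only pretends to call Tika.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

public class PipelineSketch {
  /** Hypothetical pipeline stage: takes a document (field -> value), returns the updated one. */
  interface Stage extends UnaryOperator<Map<String, String>> {}

  /** Returns a copy of the document with one field set, leaving the input untouched. */
  static Map<String, String> copyWith(Map<String, String> doc, String field, String value) {
    Map<String, String> out = new LinkedHashMap<>(doc);
    out.put(field, value);
    return out;
  }

  // Stand-in for "preprocess file name".
  static final Stage NORMALIZE_FILE_NAME =
      doc -> copyWith(doc, "file_name", doc.get("file_name").trim().toLowerCase(Locale.ROOT));

  // Stand-in for "feed to Tika"; a real stage would call out to an extraction service.
  static final Stage EXTRACT =
      doc -> copyWith(doc, "content", "extracted text of " + doc.get("file_name"));

  // Stand-in for "apply DateParser".
  static final Stage PARSE_DATE =
      doc -> copyWith(doc, "date", doc.getOrDefault("raw_date", "").replace('/', '-'));

  /** Applies the stages in the given order; reordering the list reorders the pipeline. */
  static Map<String, String> run(List<Stage> chain, Map<String, String> doc) {
    Map<String, String> current = doc;
    for (Stage stage : chain) {
      current = stage.apply(current);
    }
    return current;
  }

  public static void main(String[] args) {
    Map<String, String> doc = new LinkedHashMap<>();
    doc.put("file_name", "  Report.PDF ");
    doc.put("raw_date", "2020/09/04");
    System.out.println(run(List.of(NORMALIZE_FILE_NAME, EXTRACT, PARSE_DATE), doc));
  }
}
```

Because extraction is just one element of the list, it can sit before or after other stages, which is the flexibility being suggested.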

> Change the ExtractingRequestHandler to use Tika-Server
> --
>
> Key: SOLR-7632
> URL: https://issues.apache.org/jira/browse/SOLR-7632
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - Solr Cell (Tika extraction)
>Reporter: Chris A. Mattmann
>Priority: Major
>  Labels: gsoc2017, memex
>
> It's a pain to upgrade Tika's jars all the times when we release, and if Tika 
> fails it messes up the ExtractingRequestHandler (e.g., the document type 
> caused Tika to fail, etc). A more reliable way and also separated, and easier 
> to deploy version of the ExtractingRequestHandler would make a network call 
> to the Tika JAXRS server, and then call Tika on the Solr server side, get the 
> results and then index the information that way. I have a patch in the works 
> from the DARPA Memex project and I hope to post it soon.






[jira] [Commented] (SOLR-7632) Change the ExtractingRequestHandler to use Tika-Server

2020-09-04 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190745#comment-17190745
 ] 

Erick Erickson commented on SOLR-7632:
--

The critical bit is moving it out of the Solr JVM. How would moving it to a URP 
help that issue?

> Change the ExtractingRequestHandler to use Tika-Server
> --
>
> Key: SOLR-7632
> URL: https://issues.apache.org/jira/browse/SOLR-7632
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - Solr Cell (Tika extraction)
>Reporter: Chris A. Mattmann
>Priority: Major
>  Labels: gsoc2017, memex
>
> It's a pain to upgrade Tika's jars all the times when we release, and if Tika 
> fails it messes up the ExtractingRequestHandler (e.g., the document type 
> caused Tika to fail, etc). A more reliable way and also separated, and easier 
> to deploy version of the ExtractingRequestHandler would make a network call 
> to the Tika JAXRS server, and then call Tika on the Solr server side, get the 
> results and then index the information that way. I have a patch in the works 
> from the DARPA Memex project and I hope to post it soon.






[jira] [Commented] (SOLR-13973) Deprecate Tika

2020-09-04 Thread Alexandre Rafalovitch (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190728#comment-17190728
 ] 

Alexandre Rafalovitch commented on SOLR-13973:
--

Sanity check on the links and numbers (I use both Drupal 8/9 and Solr, though 
not currently together):

1) [https://www.drupal.org/project/apachesolr] is for Drupal 7 and before. 
Those people will also continue using Solr 4 or whatever version their configuration 
was last updated for (2 years ago at the latest).

2) [https://www.drupal.org/project/search_api_solr] starting from v4 release 
does support Drupal 8.8+/9 (only) and Solr versions more recent than Solr 6.4. 
It was released in May 2020 and updated since. The current adoption is probably 
still fairly low, but will accelerate next year, once previous release version 
is no longer supported (notice said December). 

Drupal was never known for chasing the latest Solr version, as they have their own 
configuration that is designed for field definitions with wildcards and maybe 
only recently (if at all) with managed-schema API manipulation. They can also 
keep using Solr 8 for another 5-6 years with Tika built in.

If Tika is removed from Solr (in version 9 at the earliest), this will only affect 
the choices of those setting up a new Drupal installation and wanting new 
features of Solr 9. At that point (say in 4 years), we can figure something out 
for Solr 11. Most likely a variation on preconfigured Solr and Tika colocated 
in a Docker container.

 

On the other hand, I honestly don't know much about the solarium library directly. 
Perhaps it is a serious issue there, though we have to look again at the number of 
active installations * percentage of those using the /extract handler * percentage of 
people able to run the _latest_ Solr process but not a second (also Java) process.

So, to me, this sounds less like a -1 than like a call for awareness and a bit of 
extra education around that edge case. And, yes, awareness of the greater 
community; something we really need to pay more attention to in general.

 

Of course, any improvement of workflow we can do between Solr and Tika, both 
standalone, would be very good regardless of this particular use case.

 

> Deprecate Tika
> --
>
> Key: SOLR-13973
> URL: https://issues.apache.org/jira/browse/SOLR-13973
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Blocker
> Fix For: 8.7
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on search and scalability. 
> Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us 
> down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these, 
> it should be possible to bring them into third party packages and installed 
> via package manager.
> Plan is to just to throw warnings in logs and add deprecation notes in 
> reference guide for now. Removal can be done in 9.0.






[jira] [Commented] (SOLR-13973) Deprecate Tika

2020-09-04 Thread Markus Kalkbrenner (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190720#comment-17190720
 ] 

Markus Kalkbrenner commented on SOLR-13973:
---

{quote}So that'd be SOLR-7632 as [~erickerickson] pointed out?
{quote}
Yes, sounds like it.

> Deprecate Tika
> --
>
> Key: SOLR-13973
> URL: https://issues.apache.org/jira/browse/SOLR-13973
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Blocker
> Fix For: 8.7
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on search and scalability. 
> Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us 
> down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these, 
> it should be possible to bring them into third party packages and installed 
> via package manager.
> Plan is to just to throw warnings in logs and add deprecation notes in 
> reference guide for now. Removal can be done in 9.0.






[jira] [Commented] (SOLR-14704) Add download option to solr/cloud-dev/cloud.sh

2020-09-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190718#comment-17190718
 ] 

ASF subversion and git services commented on SOLR-14704:


Commit 65da5ed32c940529b27a518deb8ffd1e61aa2e96 in lucene-solr's branch 
refs/heads/master from Gus Heck
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=65da5ed ]

SOLR-14704 add download option to cloud.sh (#1715)



> Add download option to solr/cloud-dev/cloud.sh
> --
>
> Key: SOLR-14704
> URL: https://issues.apache.org/jira/browse/SOLR-14704
> Project: Solr
>  Issue Type: New Feature
>  Components: scripts and tools
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> For easier testing of things like RC artifacts I'm adding an option to 
> cloud.sh which will curl a tarball down from the web instead of building it 
> locally.






[GitHub] [lucene-solr] gus-asf merged pull request #1715: SOLR-14704 add download option to cloud.sh

2020-09-04 Thread GitBox


gus-asf merged pull request #1715:
URL: https://github.com/apache/lucene-solr/pull/1715


   






[jira] [Commented] (SOLR-13973) Deprecate Tika

2020-09-04 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190716#comment-17190716
 ] 

Tim Allison commented on SOLR-13973:


So that'd be SOLR-7632 as [~erickerickson] pointed out?

> Deprecate Tika
> --
>
> Key: SOLR-13973
> URL: https://issues.apache.org/jira/browse/SOLR-13973
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Blocker
> Fix For: 8.7
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on search and scalability. 
> Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us 
> down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these, 
> it should be possible to bring them into third party packages and installed 
> via package manager.
> Plan is to just to throw warnings in logs and add deprecation notes in 
> reference guide for now. Removal can be done in 9.0.






[jira] [Commented] (SOLR-13973) Deprecate Tika

2020-09-04 Thread Markus Kalkbrenner (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190714#comment-17190714
 ] 

Markus Kalkbrenner commented on SOLR-13973:
---

{quote}to get Tika out of Solr's jvm
{quote}
I understand that goal.
{quote}I've been thinking about adding an "indexer" endpoint to Tika. You'd 
configure your Solr/ES connection info and error handling choices via json at 
startup and then send the bytes to tika-server's /indexer endpoint. It would 
parse the file and forward the result to Solr. Would that simplify anything?
{quote}
I think that makes sense. A good approach would be if Solr keeps its "API" for 
the clients, in other words the extraction handler. The new implementation of 
the extraction handler would forward the document to the new endpoint of the 
standalone Tika server and handle its response.
This approach would keep the complexity of a new connection with its own new 
API away from the clients.
The new handler should be available when the old one gets deprecated.
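
As a rough sketch of the forwarding step described above: the class below only builds the HTTP request that a reimplemented extraction handler might send to a standalone Tika server. It assumes tika-server's existing PUT /tika resource and a hypothetical server address; the proposed /indexer endpoint does not exist yet, so it is not used here. The class and method names are made up for illustration and nothing is sent over the network.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class TikaForwardSketch {
  /**
   * Builds (but does not send) a request that uploads the raw document bytes to a
   * standalone Tika server and asks for plain extracted text back.
   * The base URL is an assumption supplied by the caller.
   */
  static HttpRequest buildExtractRequest(String tikaBaseUrl, byte[] documentBytes) {
    return HttpRequest.newBuilder(URI.create(tikaBaseUrl + "/tika"))
        .header("Accept", "text/plain")
        .PUT(HttpRequest.BodyPublishers.ofByteArray(documentBytes))
        .build();
  }

  public static void main(String[] args) {
    byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
    // http://localhost:9998 is only an example address for a locally running tika-server.
    HttpRequest request = buildExtractRequest("http://localhost:9998", body);
    System.out.println(request.method() + " " + request.uri());
  }
}
```

Clients would keep talking to Solr's extraction handler as before; only the handler would know about the second connection, and it would index whatever text comes back.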

And don't get me wrong. I really appreciate all your hard work! And our PHP 
stuff would be nothing without Solr ;)

> Deprecate Tika
> --
>
> Key: SOLR-13973
> URL: https://issues.apache.org/jira/browse/SOLR-13973
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Blocker
> Fix For: 8.7
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on search and scalability. 
> Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us 
> down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these, 
> it should be possible to bring them into third party packages and installed 
> via package manager.
> Plan is to just to throw warnings in logs and add deprecation notes in 
> reference guide for now. Removal can be done in 9.0.






[jira] [Commented] (SOLR-14821) docValuesTermsFilter should support single-valued docValues fields

2020-09-04 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190712#comment-17190712
 ] 

Jason Gerlowski commented on SOLR-14821:


Ah, hmm, I can see it locally now - I initially couldn't because I was 
specifying {{docValuesTermsFilter}} but didn't have enough terms for 
{{docValuesTermsFilterTopLevel}} to be chosen - which is where the actual 
problem is.  Trivial to reproduce locally if you specify dVTFTL as your method 
directly.  The test code I mentioned in my comment above must actually be 
multi-valued for {{author_s}}?

In any case your fix is straightforward and correct.  Just need to fix the test 
up before committing.

> docValuesTermsFilter should support single-valued docValues fields
> --
>
> Key: SOLR-14821
> URL: https://issues.apache.org/jira/browse/SOLR-14821
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Reporter: Anatolii Siuniaev
>Assignee: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-14821.patch
>
>
> SOLR-13890 introduced a post-filter implementation for docValuesTermsFilter 
> in  TermsQParserPlugin. But now it supports only multi-valued docValues 
> fields (i.e. SORTED_SET type DocValues)
> It doesn't work for single-valued docValues fields (i.e. SORTED type 
> DocValues), though it should. 






[jira] [Commented] (SOLR-13973) Deprecate Tika

2020-09-04 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190711#comment-17190711
 ] 

Tim Allison commented on SOLR-13973:


[~mkalkbrenner] I've been thinking about adding an "indexer" endpoint to Tika.  
You'd configure your Solr/ES connection info and error handling choices via 
json at startup and then send the bytes to tika-server's /indexer endpoint.  It 
would parse the file and forward the result to Solr.  Would that simplify 
anything?

I'm thoroughly on board with "don't break the user experience", but we've got 
to get Tika out of Solr's jvm.

> Deprecate Tika
> --
>
> Key: SOLR-13973
> URL: https://issues.apache.org/jira/browse/SOLR-13973
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Blocker
> Fix For: 8.7
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on search and scalability. 
> Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us 
> down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these, 
> it should be possible to bring them into third party packages and installed 
> via package manager.
> Plan is to just to throw warnings in logs and add deprecation notes in 
> reference guide for now. Removal can be done in 9.0.






[jira] [Created] (LUCENE-9507) Custom order for leaves in DirectoryReader, IndexWriter and searcher

2020-09-04 Thread Jim Ferenczi (Jira)
Jim Ferenczi created LUCENE-9507:


 Summary: Custom order for leaves in DirectoryReader, IndexWriter 
and searcher
 Key: LUCENE-9507
 URL: https://issues.apache.org/jira/browse/LUCENE-9507
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Jim Ferenczi


Now that we're able [to skip documents efficiently when sorting by a numeric 
field|https://issues.apache.org/jira/browse/LUCENE-9280], I was wondering if we 
could optimize sorted queries further by also sorting the leaf readers based on 
the primary sort.

For time-based indices in Elasticsearch, we've implemented an optimization that 
does that at query time. If the query is sorted by a numeric docvalue field, 
prior to search, we sort the leaves according to the query sort. When sorting 
by timestamp this small optimization can have a big impact since early 
termination can be reached much faster if the sort values in the segments don't 
overlap too much. Applying this optimization at query time is challenging: it 
has the benefit of working on any numeric field sort and order, but it requires 
using a multi-reader that reorganizes the segments. It can also be disappointing 
that after a force merge to 1 segment sorted queries may be slower, since there 
is nothing to sort anymore.
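
A toy model of why the leaf order matters for a "timestamp desc" sort. This is not Lucene code: the Leaf class, its fields and the skipping rule are simplified assumptions standing in for segments and for the skipping added in LUCENE-9280.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LeafOrderSketch {
  /** Hypothetical stand-in for a segment: its timestamp range and document count. */
  static final class Leaf {
    final String name;
    final long minTs;
    final long maxTs;
    final int docCount;

    Leaf(String name, long minTs, long maxTs, int docCount) {
      this.name = name;
      this.minTs = minTs;
      this.maxTs = maxTs;
      this.docCount = docCount;
    }
  }

  /** Orders leaves for a descending sort: the leaf with the highest max value comes first. */
  static List<Leaf> orderForDescendingSort(List<Leaf> leaves) {
    List<Leaf> sorted = new ArrayList<>(leaves);
    sorted.sort(Comparator.comparingLong((Leaf leaf) -> leaf.maxTs).reversed());
    return sorted;
  }

  /**
   * Counts the leaves that must be visited to collect the top N by timestamp descending.
   * Once N documents are collected, every collected value is at least `bound`, so a leaf
   * whose maximum is below `bound` cannot contribute and is skipped.
   */
  static int leavesVisited(List<Leaf> ordered, int topN) {
    int visited = 0;
    int collected = 0;
    long bound = Long.MAX_VALUE; // lowest minimum among the visited leaves
    for (Leaf leaf : ordered) {
      if (collected >= topN && leaf.maxTs < bound) {
        continue; // no document in this leaf can be competitive
      }
      visited++;
      collected += leaf.docCount;
      bound = Math.min(bound, leaf.minTs);
    }
    return visited;
  }

  public static void main(String[] args) {
    List<Leaf> indexOrder = List.of(
        new Leaf("a", 0, 100, 10), new Leaf("b", 101, 200, 10), new Leaf("c", 201, 300, 10));
    System.out.println("index order:  " + leavesVisited(indexOrder, 5));
    System.out.println("sorted order: " + leavesVisited(orderForDescendingSort(indexOrder), 5));
  }
}
```

With non-overlapping ranges, visiting the newest leaf first lets the remaining leaves be skipped; in index order every leaf has to be visited.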

So, another option I'm looking at is to add the ability to provide a leaf order 
directly in the IndexWriter and DirectoryReader. That could be similar to an 
index sort or even complementary to it since sorting segments based on the 
index sort could also help at query time. For time-based indices that cannot 
afford index sorting but have lots of sorted queries on timestamp, forcing the 
order of segments could speed up sorted queries significantly. 

The advantage of forcing a single leaf sort in the writer/reader is that we can 
also use it to influence the merges by putting the segments with the highest 
value first. That would help indices that are merged to a single segment but 
still want fast sorted queries, and it would also help the multi-segment case, 
since big segments would be more likely to have the highest values first too.






[jira] [Commented] (SOLR-13973) Deprecate Tika

2020-09-04 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190704#comment-17190704
 ] 

Erick Erickson commented on SOLR-13973:
---

7632 is at least related...

> Deprecate Tika
> --
>
> Key: SOLR-13973
> URL: https://issues.apache.org/jira/browse/SOLR-13973
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Blocker
> Fix For: 8.7
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on search and scalability. 
> Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us 
> down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these, 
> it should be possible to bring them into third party packages and installed 
> via package manager.
> Plan is to just to throw warnings in logs and add deprecation notes in 
> reference guide for now. Removal can be done in 9.0.






[jira] [Commented] (SOLR-7633) Change the ExtractingRequestHandler to use Tika-Server

2020-09-04 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-7633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190703#comment-17190703
 ] 

Erick Erickson commented on SOLR-7633:
--

Might be able to close this one, there's more commentary on 7632.

> Change the ExtractingRequestHandler to use Tika-Server
> --
>
> Key: SOLR-7633
> URL: https://issues.apache.org/jira/browse/SOLR-7633
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - Solr Cell (Tika extraction)
>Reporter: Chris A. Mattmann
>Priority: Major
>  Labels: memex
> Fix For: 5.0.1
>
>
> It's a pain to upgrade Tika's jars all the times when we release, and if Tika 
> fails it messes up the ExtractingRequestHandler (e.g., the document type 
> caused Tika to fail, etc). A more reliable way and also separated, and easier 
> to deploy version of the ExtractingRequestHandler would make a network call 
> to the Tika JAXRS server, and then call Tika on the Solr server side, get the 
> results and then index the information that way. I have a patch in the works 
> from the DARPA Memex project and I hope to post it soon.






[GitHub] [lucene-solr] noblepaul commented on pull request #1813: SOLR-14613: No new APIs. use the existing APIs

2020-09-04 Thread GitBox


noblepaul commented on pull request #1813:
URL: https://github.com/apache/lucene-solr/pull/1813#issuecomment-687126415


   >Really? This is like saying that we only ever need collection admin APIs 
and we don't need any autoscaling. 
   
   I don't think I made myself clear.
   
   Users would definitely like Solr to place the replicas correctly. But if it 
means implementing some plugin in Java, packaging it in a jar and deploying 
it in their cluster, they would rather not do it.
   If it's as easy as writing down some DSL, they may use it.






[GitHub] [lucene-solr] murblanc commented on a change in pull request #1831: SOLR-14749 the scheduler part

2020-09-04 Thread GitBox


murblanc commented on a change in pull request #1831:
URL: https://github.com/apache/lucene-solr/pull/1831#discussion_r483593482



##
File path: solr/core/src/java/org/apache/solr/cluster/scheduler/Schedulable.java
##
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.cluster.scheduler;
+
+/**
+ * Component to be scheduled and executed according to the schedule.
+ */
+public interface Schedulable {
+
+  Schedule getSchedule();
+
+  /**
+   * Execute the component.
+   * NOTE: this should be a lightweight method that executes quickly, to avoid blocking the
+   * execution of other schedules. If it requires more work it should do this in a separate thread.

Review comment:
   If a scheduled component starts a new thread to do its work, the 
schedule is going to get skewed pretty quickly and we might have multiple 
"copies" of the scheduled component being started in parallel. We'd be 
delegating the responsibility of ensuring a single executing instance of the 
component (on a given node where it was registered) to the component itself.
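
   To make the concern concrete, here is a minimal sketch of the kind of guard each component would otherwise have to carry itself. The class and method names are hypothetical and not part of this PR; the scheduler could own such a guard instead.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicBoolean;

public class SingleRunGuard {
  private final AtomicBoolean running = new AtomicBoolean(false);

  /**
   * Hands the work to the executor unless a previous run is still in progress.
   * Returns true if the work was started, false if it was skipped.
   */
  public boolean tryStart(Runnable work, Executor executor) {
    if (!running.compareAndSet(false, true)) {
      return false; // a previous instance has not finished yet
    }
    try {
      executor.execute(() -> {
        try {
          work.run();
        } finally {
          running.set(false); // release only once the background work is done
        }
      });
      return true;
    } catch (RejectedExecutionException e) {
      running.set(false); // the work never started, so release the guard
      throw e;
    }
  }

  public static void main(String[] args) {
    SingleRunGuard guard = new SingleRunGuard();
    Executor direct = Runnable::run; // runs inline, for demonstration only
    boolean started = guard.tryStart(() -> System.out.println("working"), direct);
    System.out.println("started: " + started);
  }
}
```

   The flag is released when the background work finishes, not when the scheduled method returns, which is what prevents parallel copies.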








[GitHub] [lucene-solr] murblanc commented on a change in pull request #1831: SOLR-14749 the scheduler part

2020-09-04 Thread GitBox


murblanc commented on a change in pull request #1831:
URL: https://github.com/apache/lucene-solr/pull/1831#discussion_r483592144



##
File path: solr/core/src/java/org/apache/solr/cluster/scheduler/SolrScheduler.java
##
@@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.cluster.scheduler;
+
+/**
+ *
+ */
+public interface SolrScheduler {
+
+  void registerSchedulable(Schedulable schedulable);

Review comment:
   How is this method going to be called in practice? Do we assume each 
node (in the SolrCloud cluster) will register all tasks locally (those that are 
not `ClusterSingleton`) and that multiple instances are going to run in 
parallel? Or will a `ClusterSingleton` task be registering other scheduled 
tasks that as a consequence will only have a single instance running on the 
cluster?








[jira] [Commented] (SOLR-14821) docValuesTermsFilter should support single-valued docValues fields

2020-09-04 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190701#comment-17190701
 ] 

Jason Gerlowski commented on SOLR-14821:


We do have a test for this, it turns out. Here's the relevant bit from 
TestTermsQParserPlugin:

{code}
  @Test
  public void testTermsMethodEquivalency() {
    // Run queries with a variety of 'method' and postfilter options.
    final TermsParams[] methods = new TermsParams[] {
        ...
        new TermsParams("docValuesTermsFilter", true),
        new TermsParams("docValuesTermsFilter", false),
        new TermsParams("docValuesTermsFilterTopLevel", true),
        new TermsParams("docValuesTermsFilterTopLevel", false),
        new TermsParams("docValuesTermsFilterPerSegment", true),
        new TermsParams("docValuesTermsFilterPerSegment", false)
    };

    for (TermsParams method : methods) {
      // Single-valued field, single term value
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.add("q", method.buildQuery("author_s", "Robert Jordan"));
      params.add("sort", "id asc");
      assertQ(req(params, "indent", "on"), "*[count(//doc)=2]",
          "//result/doc[1]/str[@name='id'][.='2']",
          "//result/doc[2]/str[@name='id'][.='3']"
      );

      // Single-valued field, multiple term values
      params = new ModifiableSolrParams();
      params.add("q", method.buildQuery("author_s", "Robert Jordan,Isaac Asimov"));
      params.add("sort", "id asc");
      assertQ(req(params, "indent", "on"), "*[count(//doc)=3]",
          "//result/doc[1]/str[@name='id'][.='2']",
          "//result/doc[2]/str[@name='id'][.='3']",
          "//result/doc[3]/str[@name='id'][.='7']"
      );
      ...
    }
  }
{code}

I'm not able to reproduce locally on master either.  Do you have an easy way to 
reproduce this [~anatolii_siuniaev]?

> docValuesTermsFilter should support single-valued docValues fields
> --
>
> Key: SOLR-14821
> URL: https://issues.apache.org/jira/browse/SOLR-14821
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Reporter: Anatolii Siuniaev
>Assignee: Jason Gerlowski
>Priority: Minor
> Attachments: SOLR-14821.patch
>
>
> SOLR-13890 introduced a post-filter implementation for docValuesTermsFilter 
> in  TermsQParserPlugin. But now it supports only multi-valued docValues 
> fields (i.e. SORTED_SET type DocValues)
> It doesn't work for single-valued docValues fields (i.e. SORTED type 
> DocValues), though it should. 






[GitHub] [lucene-solr] murblanc commented on a change in pull request #1831: SOLR-14749 the scheduler part

2020-09-04 Thread GitBox


murblanc commented on a change in pull request #1831:
URL: https://github.com/apache/lucene-solr/pull/1831#discussion_r483590300



##
File path: solr/core/src/java/org/apache/solr/cluster/scheduler/impl/SolrSchedulerImpl.java
##
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.cluster.scheduler.impl;
+
+import java.lang.invoke.MethodHandles;
+import java.time.Instant;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.solr.cloud.ClusterSingleton;
+import org.apache.solr.cluster.scheduler.Schedule;
+import org.apache.solr.cluster.scheduler.Schedulable;
+import org.apache.solr.cluster.scheduler.SolrScheduler;
+import org.apache.solr.common.util.SolrNamedThreadFactory;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Scheduled executions are triggered at most with {@link #SCHEDULE_INTERVAL_SEC} interval.
+ * Each registered {@link Schedulable} is processed sequentially and if its next execution time
+ * is in the past its {@link Schedulable#run()} method will be invoked.
+ * NOTE: If the total time of execution of all registered Schedulable-s exceeds any schedule
+ * interval then exact execution times will be silently missed.
+ */
+public class SolrSchedulerImpl implements SolrScheduler, ClusterSingleton {
+  private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  public static final int SCHEDULE_INTERVAL_SEC = 10;

Review comment:
   Instead of polling for tasks to run every 10 seconds could we be smart 
and set the next execution time of the scheduler to when the next job needs to 
be run? Is it worth the additional investment?
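
One way to picture the alternative raised above is to compute the delay until the earliest pending job and sleep exactly that long. The following is only a sketch under stated assumptions: `NextWakeup` and `delayUntilNext` are hypothetical names that do not appear in the PR, and the next-run times are assumed to be already available as `Instant` values.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Collection;
import java.util.Comparator;
import java.util.Optional;

// Hypothetical helper, not part of the PR: picks the delay until the
// earliest pending job instead of waking up on a fixed interval.
final class NextWakeup {
  private NextWakeup() {}

  /**
   * Returns the delay from now until the earliest next-run time,
   * Duration.ZERO if that time has already passed, or empty if
   * nothing is scheduled.
   */
  static Optional<Duration> delayUntilNext(Instant now, Collection<Instant> nextRunTimes) {
    return nextRunTimes.stream()
        .min(Comparator.naturalOrder())
        .map(next -> next.isAfter(now) ? Duration.between(now, next) : Duration.ZERO);
  }
}
```

The result could be handed to `ScheduledExecutorService.schedule(task, delay.toMillis(), TimeUnit.MILLISECONDS)` and re-armed after each pass. The cost is that the wake-up would also have to be recomputed whenever a schedule is added or removed, which is the extra investment the question is about.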





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13652) Remove update from initParams in example solrconfig files that only mention "df"

2020-09-04 Thread Alexandre Rafalovitch (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190697#comment-17190697
 ] 

Alexandre Rafalovitch commented on SOLR-13652:
--

Added a section on Learning vs Production vs Kitchen Sink schema. Still needs 
more thought, especially on the production part.

> Remove update from initParams in example solrconfig files that only mention 
> "df"
> 
>
> Key: SOLR-13652
> URL: https://issues.apache.org/jira/browse/SOLR-13652
> Project: Solr
>  Issue Type: Improvement
>  Components: examples
>Reporter: Erick Erickson
>Priority: Minor
>  Labels: easyfix, newbie
>
> At least some of the solrconfig files we ship have this entry:
> <initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse,update">
>   <lst name="defaults">
>     <str name="df">text</str>
>   </lst>
> </initParams>
> which has led at least one user to wonder if there's some kind of automatic 
> way to have the df field populated for updates. I don't even know how you'd 
> send an update that didn't have a specific field. We should remove the 
> "update/**".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[GitHub] [lucene-solr] murblanc commented on a change in pull request #1831: SOLR-14749 the scheduler part

2020-09-04 Thread GitBox


murblanc commented on a change in pull request #1831:
URL: https://github.com/apache/lucene-solr/pull/1831#discussion_r483584416



##
File path: 
solr/core/src/java/org/apache/solr/cluster/scheduler/impl/CompiledSchedule.java
##
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.cluster.scheduler.impl;
+
+import java.lang.invoke.MethodHandles;
+import java.text.ParseException;
+import java.time.Instant;
+import java.time.format.DateTimeFormatter;
+import java.time.format.DateTimeFormatterBuilder;
+import java.time.temporal.ChronoField;
+import java.util.Date;
+import java.util.Locale;
+import java.util.TimeZone;
+
+import org.apache.solr.cluster.scheduler.Schedule;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.util.DateMathParser;
+import org.apache.solr.util.TimeZoneUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A version of {@link Schedule} where some of the fields are already resolved.

Review comment:
   Given that this class does not implement `Schedule` (I haven't yet reached the 
point where it's used), maybe its name or this comment should be clarified?








[GitHub] [lucene-solr] murblanc commented on a change in pull request #1831: SOLR-14749 the scheduler part

2020-09-04 Thread GitBox


murblanc commented on a change in pull request #1831:
URL: https://github.com/apache/lucene-solr/pull/1831#discussion_r483583874



##
File path: 
solr/core/src/java/org/apache/solr/cluster/scheduler/impl/CompiledSchedule.java
##
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.cluster.scheduler.impl;
+
+import java.lang.invoke.MethodHandles;
+import java.text.ParseException;
+import java.time.Instant;
+import java.time.format.DateTimeFormatter;
+import java.time.format.DateTimeFormatterBuilder;
+import java.time.temporal.ChronoField;
+import java.util.Date;
+import java.util.Locale;
+import java.util.TimeZone;
+
+import org.apache.solr.cluster.scheduler.Schedule;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.util.DateMathParser;
+import org.apache.solr.util.TimeZoneUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A version of {@link Schedule} where some of the fields are already resolved.
+ */
+class CompiledSchedule {
+  private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  final String name;
+  final TimeZone timeZone;
+  final Instant startTime;
+  final String interval;
+  final DateMathParser dateMathParser;
+
+  Instant lastRunAt;
+
+  /**
+   * Compile a schedule.
+   * @param schedule schedule.
+   * @throws Exception if startTime or interval cannot be parsed.
+   */
+  CompiledSchedule(Schedule schedule) throws Exception {
+this.name = schedule.getName();
+this.timeZone = TimeZoneUtils.getTimeZone(schedule.getTimeZone());
+this.startTime = parseStartTime(new Date(), schedule.getStartTime(), 
timeZone);
+this.lastRunAt = startTime;
+this.interval = schedule.getInterval();
+this.dateMathParser = new DateMathParser(timeZone);
+// this is just to verify that the interval math is valid
+shouldRun();
+  }
+
+  private Instant parseStartTime(Date now, String startTimeStr, TimeZone 
timeZone) throws Exception {
+try {
+  // try parsing startTime as an ISO-8601 date time string
+  return DateMathParser.parseMath(now, startTimeStr).toInstant();
+} catch (SolrException e) {
+  if (e.code() != SolrException.ErrorCode.BAD_REQUEST.code) {
+throw new Exception("startTime: error parsing value '" + startTimeStr 
+ "': " + e.toString());
+  }
+}
+DateTimeFormatter dateTimeFormatter = new DateTimeFormatterBuilder()
+
.append(DateTimeFormatter.ISO_LOCAL_DATE).appendPattern("['T'[HH[:mm[:ss")
+.parseDefaulting(ChronoField.HOUR_OF_DAY, 0)
+.parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0)
+.parseDefaulting(ChronoField.SECOND_OF_MINUTE, 0)
+.toFormatter(Locale.ROOT).withZone(timeZone.toZoneId());
+try {
+  return Instant.from(dateTimeFormatter.parse(startTimeStr));
+} catch (Exception e) {
+  throw new Exception("startTime: error parsing startTime '" + 
startTimeStr + "': " + e.toString());
+}
+  }
+
+  /**
+   * Returns true if the last run + run interval is already in the past.
+   */
+  boolean shouldRun() {
+dateMathParser.setNow(new Date(lastRunAt.toEpochMilli()));
+Instant nextRunTime;
+try {
+  Date next = dateMathParser.parseMath(interval);
+  nextRunTime = next.toInstant();
+} catch (ParseException e) {
+  log.warn("Invalid math expression, skipping: " + e);
+  return false;
+}
+if (Instant.now().isAfter(nextRunTime)) {
+  return true;
+} else {
+  return false;
+}
+  }
+
+  /**
+   * This setter MUST be invoked after each run.
+   * @param lastRunAt time when the schedule was last run.
+   */
+  void setLastRunAt(Instant lastRunAt) {

Review comment:
   Unclear if `lastRunAt` is the time the schedule last started or last 
completed. If completed, can we simplify by not passing an instant and building 
it here?
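
If `lastRunAt` is meant to be the completion time, the simplification suggested above could look roughly like this. It is a sketch only: `RunTracker` is a hypothetical stand-in for `CompiledSchedule`, a fixed `Duration` replaces the date-math interval for brevity, and the injected `Clock` is an assumption made so the behavior can be tested.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

// Hypothetical stand-in for CompiledSchedule, not the PR's class.
final class RunTracker {
  private final Clock clock;
  private final Duration interval;
  private Instant lastCompletedAt;

  RunTracker(Clock clock, Instant startTime, Duration interval) {
    this.clock = clock;
    this.interval = interval;
    this.lastCompletedAt = startTime;
  }

  /** Records completion using the tracker's own clock, so callers pass no Instant. */
  void markCompleted() {
    lastCompletedAt = clock.instant();
  }

  /** True if the last completion time plus the interval is already in the past. */
  boolean shouldRun() {
    return clock.instant().isAfter(lastCompletedAt.plus(interval));
  }
}
```

Naming the method after what it records (`markCompleted`) would also remove the started-versus-completed ambiguity from the call site.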






[GitHub] [lucene-solr] murblanc commented on a change in pull request #1831: SOLR-14749 the scheduler part

2020-09-04 Thread GitBox


murblanc commented on a change in pull request #1831:
URL: https://github.com/apache/lucene-solr/pull/1831#discussion_r483581489



##
File path: solr/core/src/java/org/apache/solr/cloud/Overseer.java
##
@@ -775,6 +779,42 @@ private void doCompatCheck(BiConsumer 
consumer) {
 }
   }
 
+  /**
+   * Start {@link ClusterSingleton} plugins when we become the leader.
+   */
+  private void startClusterSingletons() {
+PluginBag<SolrRequestHandler> handlers = 
getCoreContainer().getRequestHandlers();
+if (handlers == null) {
+  return;
+}
+handlers.keySet().forEach(handlerName -> {
+  SolrRequestHandler handler = handlers.get(handlerName);
+  if (handler instanceof ClusterSingleton) {
+try {
+  ((ClusterSingleton) handler).start();
+} catch (Exception e) {
+  log.warn("Exception starting ClusterSingleton " + handler, e);
+}
+  }
+});
+  }
+
+  /**
+   * Stop {@link ClusterSingleton} plugins when we lose leadership.
+   */
+  private void stopClusterSingletons() {
+PluginBag<SolrRequestHandler> handlers = 
getCoreContainer().getRequestHandlers();

Review comment:
   Should we stop currently configured `ClusterSingleton` handlers or 
rather stop all those we've started? Is the configuration immutable?
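
The "stop all those we've started" option could be sketched as below. `Startable` and `StartedSingletons` are hypothetical names introduced for illustration; `Startable` only mirrors the `start()`/`stop()` contract discussed in this PR and is not the actual `ClusterSingleton` interface.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical minimal contract, assumed for this sketch only.
interface Startable {
  void start() throws Exception;
  void stop();
}

// Remembers which instances were actually started, so that losing
// leadership stops exactly those, whatever the current configuration says.
final class StartedSingletons {
  private final Deque<Startable> started = new ArrayDeque<>();

  /** Starts the instance and records it only if start() succeeded. */
  synchronized boolean start(Startable singleton) {
    try {
      singleton.start();
      started.push(singleton);
      return true;
    } catch (Exception e) {
      // A real implementation would log here; failed instances are not tracked.
      return false;
    }
  }

  /** Stops every recorded instance in reverse start order and returns the count. */
  synchronized int stopAll() {
    int stopped = 0;
    while (!started.isEmpty()) {
      started.pop().stop();
      stopped++;
    }
    return stopped;
  }
}
```

With this shape the answer no longer depends on whether the handler configuration is immutable, since the stop path never consults it.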








[GitHub] [lucene-solr] murblanc commented on a change in pull request #1831: SOLR-14749 the scheduler part

2020-09-04 Thread GitBox


murblanc commented on a change in pull request #1831:
URL: https://github.com/apache/lucene-solr/pull/1831#discussion_r483580777



##
File path: solr/core/src/java/org/apache/solr/cloud/ClusterSingleton.java
##
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.cloud;
+
+/**
+ * Intended for {@link org.apache.solr.core.CoreContainer} plugins that should 
be
+ * enabled only one instance per cluster.
+ * Components that implement this interface are always in one of two states:
+ * 
+ *   STOPPED - the default state. The component is idle and does not 
perform

Review comment:
   Should we add `STARTING` and `STOPPING` states? Assuming the call to 
`start()` waits until the plugin has completed its startup (and similarly the 
call to `stop()` waiting for it to stop) might be expensive, and as implemented 
`start()` delays the `Overseer` starting in general.
   
   Possibly waiting for `stop()` to complete makes sense (in order to guarantee 
no two `ClusterSingleton` plugins are running concurrently on the cluster, so 
maybe state `STOPPING` is not needed), but I'd think we don't need to wait for 
`start()` to have completed.
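
A rough sketch of that idea, with a `STARTING` state and a non-blocking `start()`, might look like the following. `AsyncSingleton` and its `State` enum are hypothetical and not part of the PR; running the startup work on the common pool via `CompletableFuture.runAsync` is an assumption chosen only to keep the example short.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of an asynchronous start with a synchronous stop.
final class AsyncSingleton {
  enum State { STOPPED, STARTING, RUNNING }

  private final AtomicReference<State> state = new AtomicReference<>(State.STOPPED);
  private final Runnable startup;

  AsyncSingleton(Runnable startup) {
    this.startup = startup;
  }

  /** Returns immediately; the future completes once startup has finished. */
  CompletableFuture<Void> start() {
    if (!state.compareAndSet(State.STOPPED, State.STARTING)) {
      // Already starting or running: nothing to do.
      return CompletableFuture.completedFuture(null);
    }
    return CompletableFuture.runAsync(startup)
        .whenComplete((ignored, err) -> {
          // Only leave STARTING if stop() has not intervened in the meantime.
          State next = (err == null) ? State.RUNNING : State.STOPPED;
          state.compareAndSet(State.STARTING, next);
        });
  }

  /** Synchronous, so the caller knows the instance is stopped on return. */
  void stop() {
    state.set(State.STOPPED);
  }

  State state() {
    return state.get();
  }
}
```

The caller (here, the `Overseer`) would not wait on the returned future, so a slow plugin would not delay it, while `stop()` stays synchronous to preserve the guarantee that no two instances run concurrently.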








[GitHub] [lucene-solr] sigram opened a new pull request #1831: SOLR-14749 the scheduler part

2020-09-04 Thread GitBox


sigram opened a new pull request #1831:
URL: https://github.com/apache/lucene-solr/pull/1831


   See PR 1758 for the background on this.






[GitHub] [lucene-solr] noblepaul commented on pull request #1815: SOLR-14151: Bug fixes

2020-09-04 Thread GitBox


noblepaul commented on pull request #1815:
URL: https://github.com/apache/lucene-solr/pull/1815#issuecomment-687092388


   > Noble, could you be more specific about the bugs that this is fixing? 
   
   I'm not sure exactly what the bug is. `TestBulkSchemaConcurrent` was failing 
consistently after the original commit, so I ensured that it passes consistently.






[GitHub] [lucene-solr] noblepaul commented on a change in pull request #1815: SOLR-14151: Bug fixes

2020-09-04 Thread GitBox


noblepaul commented on a change in pull request #1815:
URL: https://github.com/apache/lucene-solr/pull/1815#discussion_r483562559



##
File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java
##
@@ -1582,6 +1582,13 @@ private CoreDescriptor 
reloadCoreDescriptor(CoreDescriptor oldDesc) {
   public void reload(String name) {
 reload(name, null);
   }
+  public void reload(String name, UUID coreId, boolean async) {
+if(async) {
+  runAsync(() -> reload(name, coreId));
+} else {
+  reload(name, coreId);
+}
+  }

Review comment:
   It's a generic method. I thought I would use it. But there are bugs in 
our core reloading, so if I use async reload, some tests fail.








[GitHub] [lucene-solr] noblepaul commented on a change in pull request #1815: SOLR-14151: Bug fixes

2020-09-04 Thread GitBox


noblepaul commented on a change in pull request #1815:
URL: https://github.com/apache/lucene-solr/pull/1815#discussion_r483561919



##
File path: solr/core/src/java/org/apache/solr/core/ConfigSetService.java
##
@@ -81,8 +81,7 @@ public final ConfigSet loadConfigSet(CoreDescriptor dcore) {
   ) ? false: true;
 
   SolrConfig solrConfig = createSolrConfig(dcore, coreLoader, trusted);
-  IndexSchema indexSchema = createIndexSchema(dcore, solrConfig, false);
-  return new ConfigSet(configSetName(dcore), solrConfig, force -> 
indexSchema, properties, trusted);
+  return new ConfigSet(configSetName(dcore), solrConfig, force -> 
createIndexSchema(dcore, solrConfig, force), properties, trusted);

Review comment:
   Yes.








[jira] [Comment Edited] (SOLR-13973) Deprecate Tika

2020-09-04 Thread Markus Kalkbrenner (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190681#comment-17190681
 ] 

Markus Kalkbrenner edited comment on SOLR-13973 at 9/4/20, 11:18 AM:
-

{quote}Perhaps even a simple Tika integration in SolrJ would make sense, making 
it super simple to do the extraction on client side, which is probably what 
most users should consider anyway.
{quote}
As maintainer of Solarium, the major PHP client for Solr, and of the 
Solr-Drupal-Integration, I know that there are users and Solr service providers 
who rely on the ExtractionHandler and the out-of-the-box experience as 
[~AndrewGr] described. Even though I understand your motivation as a developer, 
moving the workflow to the client side will put a significant workload on 
other developers, even if you add Tika support to SolrJ.
Maybe the number of people who use Solr in combination with a different 
programming language is higher than the number of Java projects that 
use SolrJ.

There are more than 58,000 active Drupal installations using Solr as a search 
backend today:
 [https://www.drupal.org/project/usage/search_api_solr]
https://www.drupal.org/project/usage/apachesolr

github lists 895 repositories that directly depend on the PHP solarium library:
 [https://github.com/solariumphp/solarium/network/dependents]

These include packages for other PHP frameworks like symfony, laravel, typo3, 
wordpress, ...

Nearly 200,000 composer-based build processes of PHP projects pulled the 
solarium library within the last 30 days:
 [https://packagist.org/packages/solarium/solarium/stats#major/all]

Certainly, only a few of these installations will use Tika indirectly via 
the extraction handler. But it won't be an easy task to add a standalone Tika 
server to their stack. I know a lot of hosting providers that don't yet offer 
it to their customers.

I'm not saying that you shouldn't deprecate the embedded Tika at all. But take 
careful steps and be aware that the community of Solr users might 
be much larger than you think due to the out-of-the-box solutions that exist, 
especially in the PHP world.

BTW SOLR-14768 has been detected automatically by the automated integration 
tests of the solarium library and also  by the automated integration tests of 
the Search API Solr Drupal module!

 


was (Author: mkalkbrenner):
{quote}Perhaps even a simple Tika integration in SolrJ would make sense, making 
it super simple to do the extraction on client side, which is probably what 
most users should consider anyway.
{quote}
As maintainer of Solarium, the major PHP client for Solr, and of the 
Solr-Drupal-Integration, I know that there are users and Solr service providers 
who rely on the ExtractionHandler and the out-of-the-box experience as 
[~AndrewGr] described. Even though I understand your motivation as a developer, 
moving the workflow to the client side will put a significant workload on 
other developers, even if you add Tika support to SolrJ.
Maybe the number of people who use Solr in combination with a different 
programming language is higher than the number of Java projects that 
use SolrJ.

There are more than 40,000 active Drupal installations using Solr as a search 
backend today:
[https://www.drupal.org/project/usage/search_api_solr]

github lists 895 repositories that directly depend on the PHP solarium library:
[https://github.com/solariumphp/solarium/network/dependents]

These include packages for other PHP frameworks like symfony, laravel, typo3, 
wordpress, ...

Nearly 200,000 composer-based build processes of PHP projects pulled the 
solarium library within the last 30 days:
[https://packagist.org/packages/solarium/solarium/stats#major/all]

Certainly, only a few of these installations will use Tika indirectly via 
the extraction handler. But it won't be an easy task to add a standalone Tika 
server to their stack. I know a lot of hosting providers that don't yet offer 
it to their customers.

I'm not saying that you shouldn't deprecate the embedded Tika at all. But take 
careful steps and be aware that the community of Solr users might 
be much larger than you think due to the out-of-the-box solutions that exist, 
especially in the PHP world.

BTW SOLR-14768 has been detected automatically by the automated integration 
tests of the solarium library and also  by the automated integration tests of 
the Search API Solr Drupal module!

 

> Deprecate Tika
> --
>
> Key: SOLR-13973
> URL: https://issues.apache.org/jira/browse/SOLR-13973
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Blocker
> Fix For: 8.7
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on 

[jira] [Commented] (SOLR-13973) Deprecate Tika

2020-09-04 Thread Markus Kalkbrenner (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190681#comment-17190681
 ] 

Markus Kalkbrenner commented on SOLR-13973:
---

{quote}Perhaps even a simple Tika integration in SolrJ would make sense, making 
it super simple to do the extraction on client side, which is probably what 
most users should consider anyway.
{quote}
As maintainer of Solarium, the major PHP client for Solr, and of the 
Solr-Drupal-Integration, I know that there are users and Solr service providers 
who rely on the ExtractionHandler and the out-of-the-box experience as 
[~AndrewGr] described. Even though I understand your motivation as a developer, 
moving the workflow to the client side will put a significant workload on 
other developers, even if you add Tika support to SolrJ.
Maybe the number of people who use Solr in combination with a different 
programming language is higher than the number of Java projects that 
use SolrJ.

There are more than 40,000 active Drupal installations using Solr as a search 
backend today:
[https://www.drupal.org/project/usage/search_api_solr]

github lists 895 repositories that directly depend on the PHP solarium library:
[https://github.com/solariumphp/solarium/network/dependents]

These include packages for other PHP frameworks like symfony, laravel, typo3, 
wordpress, ...

Nearly 200,000 composer-based build processes of PHP projects pulled the 
solarium library within the last 30 days:
[https://packagist.org/packages/solarium/solarium/stats#major/all]

Certainly, only a few of these installations will use Tika indirectly via 
the extraction handler. But it won't be an easy task to add a standalone Tika 
server to their stack. I know a lot of hosting providers that don't yet offer 
it to their customers.

I'm not saying that you shouldn't deprecate the embedded Tika at all. But take 
careful steps and be aware that the community of Solr users might 
be much larger than you think due to the out-of-the-box solutions that exist, 
especially in the PHP world.

BTW SOLR-14768 has been detected automatically by the automated integration 
tests of the solarium library and also  by the automated integration tests of 
the Search API Solr Drupal module!

 

> Deprecate Tika
> --
>
> Key: SOLR-13973
> URL: https://issues.apache.org/jira/browse/SOLR-13973
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Blocker
> Fix For: 8.7
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on search and scalability. 
> Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us 
> down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these, 
> it should be possible to bring them into third party packages and installed 
> via package manager.
> Plan is to just to throw warnings in logs and add deprecation notes in 
> reference guide for now. Removal can be done in 9.0.






[GitHub] [lucene-solr] jimczi commented on a change in pull request #1725: LUCENE-9449 Skip docs with _doc sort and "after"

2020-09-04 Thread GitBox


jimczi commented on a change in pull request #1725:
URL: https://github.com/apache/lucene-solr/pull/1725#discussion_r483531827



##
File path: 
lucene/core/src/test/org/apache/lucene/search/TestFieldSortOptimizationSkipping.java
##
@@ -290,5 +299,114 @@ public void testFloatSortOptimization() throws 
IOException {
 dir.close();
   }
 
+  public void testDocSortOptimizationWithAfter() throws IOException {
+final Directory dir = newDirectory();
+final IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig());
+final int numDocs = atLeast(1500);
+for (int i = 0; i < numDocs; ++i) {
+  final Document doc = new Document();
+  writer.addDocument(doc);
+  if ((i > 0) && (i % 500 == 0)) {
+writer.commit();
+  }
+}
+final IndexReader reader = DirectoryReader.open(writer);
+IndexSearcher searcher = new IndexSearcher(reader);
+final int numHits = 3;
+final int totalHitsThreshold = 3;
+final int searchAfter = 1400;
+
+// sort by _doc with search after should trigger optimization
+{
+  final Sort sort = new Sort(FIELD_DOC);
+  FieldDoc after = new FieldDoc(searchAfter, Float.NaN, new 
Integer[]{searchAfter});
+  final TopFieldCollector collector = TopFieldCollector.create(sort, 
numHits, after, totalHitsThreshold);
+  searcher.search(new MatchAllDocsQuery(), collector);
+  TopDocs topDocs = collector.topDocs();
+  assertEquals(topDocs.scoreDocs.length, numHits);
+  for (int i = 0; i < numHits; i++) {
+int expectedDocID = searchAfter + 1 + i;
+assertEquals(expectedDocID, topDocs.scoreDocs[i].doc);
+  }
+  assertTrue(collector.isEarlyTerminated());
+  // check that very few hits were collected, and most hits before 
searchAfter were skipped
+  assertTrue(topDocs.totalHits.value < (numDocs - searchAfter));
+}
+
+// sort by _doc + _score with search after should trigger optimization
+{
+  final Sort sort = new Sort(FIELD_DOC, FIELD_SCORE);
+  FieldDoc after = new FieldDoc(searchAfter, Float.NaN, new 
Object[]{searchAfter, 1.0f});
+  final TopFieldCollector collector = TopFieldCollector.create(sort, 
numHits, after, totalHitsThreshold);
+  searcher.search(new MatchAllDocsQuery(), collector);
+  TopDocs topDocs = collector.topDocs();
+  assertEquals(topDocs.scoreDocs.length, numHits);
+  for (int i = 0; i < numHits; i++) {
+int expectedDocID = searchAfter + 1 + i;
+assertEquals(expectedDocID, topDocs.scoreDocs[i].doc);
+  }
+  assertTrue(collector.isEarlyTerminated());
+  // assert that very few hits were collected, and most hits before 
searchAfter were skipped
+  assertTrue(topDocs.totalHits.value < (numDocs - searchAfter));
+}
+
+// sort by _doc desc should not trigger optimization
+{
+  final Sort sort = new Sort(new SortField(null, SortField.Type.DOC, 
true));
+  FieldDoc after = new FieldDoc(searchAfter, Float.NaN, new 
Integer[]{searchAfter});
+  final TopFieldCollector collector = TopFieldCollector.create(sort, 
numHits, after, totalHitsThreshold);
+  searcher.search(new MatchAllDocsQuery(), collector);
+  TopDocs topDocs = collector.topDocs();
+  for (int i = 0; i < numHits; i++) {
+int expectedDocID = searchAfter - 1 - i;
+assertEquals(expectedDocID, topDocs.scoreDocs[i].doc);
+  }
+  assertEquals(topDocs.scoreDocs.length, numHits);
+  // assert that many hits were collected including all hits before 
searchAfter
+  assertTrue(topDocs.totalHits.value > searchAfter);
+
+}
+
+writer.close();
+reader.close();
+dir.close();
+  }
+
+
+  public void testDocSortOptimization() throws IOException {
+final Directory dir = newDirectory();
+final IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig());
+final int numDocs = atLeast(1500);

Review comment:
   Why do you need that many documents? `100` should be enough, no?


[GitHub] [lucene-solr] mayya-sharipova commented on pull request #1725: LUCENE-9449 Skip docs with _doc sort and "after"

2020-09-04 Thread GitBox


mayya-sharipova commented on pull request #1725:
URL: https://github.com/apache/lucene-solr/pull/1725#issuecomment-687065707


   @jimczi Thanks for the review so far. I am wondering if you have any further 
comments?






[GitHub] [lucene-solr] epugh commented on pull request #1827: SOLR-14792: Remove VelocityResponseWriter

2020-09-04 Thread GitBox


epugh commented on pull request #1827:
URL: https://github.com/apache/lucene-solr/pull/1827#issuecomment-687060583


   @erikhatcher BTW, the .adoc format renders nicely on GitHub if you were to 
pull the ref guide docs over to https://github.com/erikhatcher/solritas. We 
could also update the link in the solr.cool entry to point directly to them, 
instead of the general GitHub README page ;-)






[jira] [Commented] (LUCENE-9500) Did we hit a DEFLATE bug?

2020-09-04 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190659#comment-17190659
 ] 

Dawid Weiss commented on LUCENE-9500:
-

bq. I forgot to run precommit after cherry-picking.

No, not you, Uwe! :) [yellow card emoji]

> Did we hit a DEFLATE bug?
> -
>
> Key: LUCENE-9500
> URL: https://issues.apache.org/jira/browse/LUCENE-9500
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 8.x, master (9.0), 8.7
>Reporter: Adrien Grand
>Assignee: Uwe Schindler
>Priority: Critical
>  Labels: Java13, Java14, Java15, java11, jdk11, jdk13, jdk14, 
> jdk15
> Fix For: 8.x, master (9.0), 8.7
>
> Attachments: PresetDictTest.java, test_data.txt
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> I've been digging 
> [https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-NightlyTests-master/23/]
>  all day and managed to isolate a simple reproduction that shows the problem. 
> I've been staring at it all day and can't find what we are doing wrong, 
> which makes me wonder whether we're calling DEFLATE the wrong way or whether 
> we hit a DEFLATE bug. I've looked at it so much that I may be missing the 
> most obvious stuff.






[jira] [Commented] (LUCENE-9500) Did we hit a DEFLATE bug?

2020-09-04 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190658#comment-17190658
 ] 

Uwe Schindler commented on LUCENE-9500:
---

Ah, ECJ was complaining :-) Thanks for fixing. I forgot to run precommit after 
cherry-picking.

> Did we hit a DEFLATE bug?
> -
>
> Key: LUCENE-9500
> URL: https://issues.apache.org/jira/browse/LUCENE-9500
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 8.x, master (9.0), 8.7
>Reporter: Adrien Grand
>Assignee: Uwe Schindler
>Priority: Critical
>  Labels: Java13, Java14, Java15, java11, jdk11, jdk13, jdk14, 
> jdk15
> Fix For: 8.x, master (9.0), 8.7
>
> Attachments: PresetDictTest.java, test_data.txt
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> I've been digging 
> [https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-NightlyTests-master/23/]
>  all day and managed to isolate a simple reproduction that shows the problem. 
> I've been staring at it all day and can't find what we are doing wrong, 
> which makes me wonder whether we're calling DEFLATE the wrong way or whether 
> we hit a DEFLATE bug. I've looked at it so much that I may be missing the 
> most obvious stuff.






[jira] [Commented] (LUCENE-9500) Did we hit a DEFLATE bug?

2020-09-04 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190653#comment-17190653
 ] 

Uwe Schindler commented on LUCENE-9500:
---

Thanks Adrien! Was this causing the javadocs build failure?

> Did we hit a DEFLATE bug?
> -
>
> Key: LUCENE-9500
> URL: https://issues.apache.org/jira/browse/LUCENE-9500
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 8.x, master (9.0), 8.7
>Reporter: Adrien Grand
>Assignee: Uwe Schindler
>Priority: Critical
>  Labels: Java13, Java14, Java15, java11, jdk11, jdk13, jdk14, 
> jdk15
> Fix For: 8.x, master (9.0), 8.7
>
> Attachments: PresetDictTest.java, test_data.txt
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> I've been digging 
> [https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-NightlyTests-master/23/]
>  all day and managed to isolate a simple reproduction that shows the problem. 
> I've been staring at it all day and can't find what we are doing wrong, 
> which makes me wonder whether we're calling DEFLATE the wrong way or whether 
> we hit a DEFLATE bug. I've looked at it so much that I may be missing the 
> most obvious stuff.






[jira] [Commented] (LUCENE-9500) Did we hit a DEFLATE bug?

2020-09-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190652#comment-17190652
 ] 

ASF subversion and git services commented on LUCENE-9500:
-

Commit d7299890c75bfe403f14390a0dfb70e2689fdf3c in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d729989 ]

LUCENE-9500: There is no setDictionary(ByteBuffer) in JDK8.
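For context on the commit message above, here is a minimal sketch of a preset-dictionary round trip that sticks to the byte[] overloads of setDictionary, which exist in JDK 8. It is not the code from the commit or from PresetDictTest.java; the class name PresetDictRoundTrip and its helper methods are hypothetical, and the ByteBuffer overloads are, to my knowledge, only available from Java 11 onwards.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of Lucene: round-trips data through DEFLATE
// with a preset dictionary using only API that is available in JDK 8.
final class PresetDictRoundTrip {

  /** Compresses input after priming the compressor with a preset dictionary. */
  static byte[] compress(byte[] dict, byte[] input) {
    Deflater deflater = new Deflater();
    try {
      // byte[] overload; the ByteBuffer overload is not available in JDK 8.
      deflater.setDictionary(dict, 0, dict.length);
      deflater.setInput(input);
      deflater.finish();
      byte[] buf = new byte[input.length + 64];
      int len = 0;
      while (!deflater.finished()) {
        if (len == buf.length) {
          buf = Arrays.copyOf(buf, buf.length * 2); // grow when the buffer is full
        }
        len += deflater.deflate(buf, len, buf.length - len);
      }
      return Arrays.copyOf(buf, len);
    } finally {
      deflater.end();
    }
  }

  /** Decompresses data, supplying the dictionary when the stream asks for it. */
  static byte[] decompress(byte[] dict, byte[] compressed, int originalLength) {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      byte[] out = new byte[originalLength];
      int len = 0;
      while (len < out.length) {
        int n = inflater.inflate(out, len, out.length - len);
        if (n == 0) {
          if (inflater.needsDictionary()) {
            inflater.setDictionary(dict, 0, dict.length);
          } else {
            break; // finished early or out of input
          }
        }
        len += n;
      }
      return Arrays.copyOf(out, len);
    } catch (DataFormatException e) {
      throw new IllegalStateException("corrupt DEFLATE stream", e);
    } finally {
      inflater.end();
    }
  }

  public static void main(String[] args) {
    byte[] dict = "lucene stored fields".getBytes(StandardCharsets.UTF_8);
    byte[] input = "lucene stored fields, again".getBytes(StandardCharsets.UTF_8);
    byte[] packed = compress(dict, input);
    byte[] unpacked = decompress(dict, packed, input.length);
    System.out.println(Arrays.equals(input, unpacked));
  }
}
```

The same dictionary bytes must be supplied on both sides; the decompressor only learns that it needs one when needsDictionary() reports it.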


> Did we hit a DEFLATE bug?
> -
>
> Key: LUCENE-9500
> URL: https://issues.apache.org/jira/browse/LUCENE-9500
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 8.x, master (9.0), 8.7
>Reporter: Adrien Grand
>Assignee: Uwe Schindler
>Priority: Critical
>  Labels: Java13, Java14, Java15, java11, jdk11, jdk13, jdk14, 
> jdk15
> Fix For: 8.x, master (9.0), 8.7
>
> Attachments: PresetDictTest.java, test_data.txt
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> I've been digging 
> [https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-NightlyTests-master/23/]
>  all day and managed to isolate a simple reproduction that shows the problem. 
> I've been staring at it all day and can't find what we are doing wrong, 
> which makes me wonder whether we're calling DEFLATE the wrong way or whether 
> we hit a DEFLATE bug. I've looked at it so much that I may be missing the 
> most obvious stuff.






[GitHub] [lucene-solr] dweiss commented on pull request #1830: LUCENE-9506: Gradle: split validateSourcePatterns into per-project an…

2020-09-04 Thread GitBox


dweiss commented on pull request #1830:
URL: https://github.com/apache/lucene-solr/pull/1830#issuecomment-687038375


   @uschindler you know these checks better. As far as I can tell, I refactored 
this stuff so that it behaves the same way as before (and the checks run much 
faster now since they can run in parallel). If you have a spare minute to 
eyeball it, though, that'd be good.






[GitHub] [lucene-solr] dweiss opened a new pull request #1830: LUCENE-9506: Gradle: split validateSourcePatterns into per-project an…

2020-09-04 Thread GitBox


dweiss opened a new pull request #1830:
URL: https://github.com/apache/lucene-solr/pull/1830


   






[jira] [Created] (LUCENE-9506) Gradle: split validateSourcePatterns into per-project and root-specific tasks (allow parallelism)

2020-09-04 Thread Dawid Weiss (Jira)
Dawid Weiss created LUCENE-9506:
---

 Summary: Gradle: split validateSourcePatterns into per-project and 
root-specific tasks (allow parallelism)
 Key: LUCENE-9506
 URL: https://issues.apache.org/jira/browse/LUCENE-9506
 Project: Lucene - Core
  Issue Type: Task
Reporter: Dawid Weiss
Assignee: Dawid Weiss









[jira] [Commented] (LUCENE-9505) Gradle tasks with outputs.upToDateWhen {true} are hard to re-run in separation

2020-09-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190641#comment-17190641
 ] 

ASF subversion and git services commented on LUCENE-9505:
-

Commit d31a42763be26fcaee886ea2249a4d8d4bc0a119 in lucene-solr's branch 
refs/heads/master from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d31a427 ]

LUCENE-9505: add dummy outputs. (#1829)



> Gradle tasks with outputs.upToDateWhen {true} are hard to re-run in separation
> --
>
> Key: LUCENE-9505
> URL: https://issues.apache.org/jira/browse/LUCENE-9505
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: master (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have several tasks that only have inputs and no outputs. For incremental 
> builds, this means that they are only re-run if:
> * the inputs change,
> * --rerun-tasks is given on the command line.
> Gradle has a built-in rule for "cleaning" the outputs of a task - a 
> "clean[TaskName]" rule, so in theory you could clean the outputs of a single 
> task and re-run the entire build with only that task being re-run. This would 
> sometimes be convenient.
> We could add a dummy output to these tasks instead of upToDateWhen (for 
> example, touch an empty file at the end of the task's execution). Then 
> cleanXXX should work for them (and so would incremental builds).






[jira] [Resolved] (LUCENE-9505) Gradle tasks with outputs.upToDateWhen {true} are hard to re-run in separation

2020-09-04 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-9505.
-
Fix Version/s: master (9.0)
   Resolution: Fixed

> Gradle tasks with outputs.upToDateWhen {true} are hard to re-run in separation
> --
>
> Key: LUCENE-9505
> URL: https://issues.apache.org/jira/browse/LUCENE-9505
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: master (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have several tasks that only have inputs and no outputs. For incremental 
> builds, this means that they are only re-run if:
> * the inputs change,
> * --rerun-tasks is given on the command line.
> Gradle has a built-in rule for "cleaning" the outputs of a task - a 
> "clean[TaskName]" rule, so in theory you could clean the outputs of a single 
> task and re-run the entire build with only that task being re-run. This would 
> sometimes be convenient.
> We could add a dummy output to these tasks instead of upToDateWhen (for 
> example, touch an empty file at the end of the task's execution). Then 
> cleanXXX should work for them (and so would incremental builds).






[GitHub] [lucene-solr] dweiss merged pull request #1829: LUCENE-9505: add dummy outputs to tasks with no outputs

2020-09-04 Thread GitBox


dweiss merged pull request #1829:
URL: https://github.com/apache/lucene-solr/pull/1829


   






[GitHub] [lucene-solr] dweiss opened a new pull request #1829: LUCENE-9505: add dummy outputs to tasks with no outputs

2020-09-04 Thread GitBox


dweiss opened a new pull request #1829:
URL: https://github.com/apache/lucene-solr/pull/1829


   






[jira] [Created] (LUCENE-9505) Gradle tasks with outputs.upToDateWhen {true} are hard to re-run in separation

2020-09-04 Thread Dawid Weiss (Jira)
Dawid Weiss created LUCENE-9505:
---

 Summary: Gradle tasks with outputs.upToDateWhen {true} are hard to 
re-run in separation
 Key: LUCENE-9505
 URL: https://issues.apache.org/jira/browse/LUCENE-9505
 Project: Lucene - Core
  Issue Type: Task
Reporter: Dawid Weiss
Assignee: Dawid Weiss


We have several tasks that only have inputs and no outputs. For incremental 
builds, this means that they are only re-run if:
* the inputs change,
* --rerun-tasks is given on the command line.

Gradle has a built-in rule for "cleaning" the outputs of a task - a 
"clean[TaskName]" rule, so in theory you could clean the outputs of a single 
task and re-run the entire build with only that task being re-run. This would 
sometimes be convenient.

We could add a dummy output to these tasks instead of upToDateWhen (for 
example, touch an empty file at the end of the task's execution). Then cleanXXX 
should work for them (and so would incremental builds).
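The marker-output idea described above can be sketched outside Gradle. The following is only an illustration of the concept in plain Java, not the build change itself and not how Gradle implements its up-to-date checks: the class MarkerOutputSketch and all of its methods are hypothetical. Unlike the empty file proposed above (where Gradle tracks the inputs on its own), this sketch stores an input fingerprint in the marker so that it can model both conditions in one place.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical model of a validation task that has inputs and a dummy output.
final class MarkerOutputSketch {
  private final Path input;
  private final Path marker;

  MarkerOutputSketch(Path input, Path marker) {
    this.input = input;
    this.marker = marker;
  }

  /** Runs the task unless the marker already records the current input state. */
  boolean run() {
    try {
      String fingerprint = Integer.toString(Arrays.hashCode(Files.readAllBytes(input)));
      if (Files.exists(marker)
          && fingerprint.equals(new String(Files.readAllBytes(marker), StandardCharsets.UTF_8))) {
        return false; // up to date: skipped
      }
      // The real validation work would happen here.
      Files.write(marker, fingerprint.getBytes(StandardCharsets.UTF_8)); // "touch" the dummy output
      return true;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Plays the role of clean[TaskName]: deleting the output forces a re-run. */
  void clean() {
    try {
      Files.deleteIfExists(marker);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Walks through: first run, up-to-date run, run after clean, run after input change. */
  static boolean[] demo() {
    try {
      Path dir = Files.createTempDirectory("marker-sketch");
      Path input = dir.resolve("source.txt");
      Files.write(input, "a".getBytes(StandardCharsets.UTF_8));
      MarkerOutputSketch task = new MarkerOutputSketch(input, dir.resolve("task.marker"));
      boolean first = task.run();
      boolean second = task.run();
      task.clean();
      boolean afterClean = task.run();
      Files.write(input, "b".getBytes(StandardCharsets.UTF_8));
      boolean afterChange = task.run();
      return new boolean[] {first, second, afterClean, afterChange};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(demo()));
  }
}
```

The point of the dummy output is the third step: once a task owns an output, removing that output is enough to make a single task run again without passing --rerun-tasks for the whole build.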






[jira] [Commented] (LUCENE-9418) Ordered intervals can give inaccurate hits on interleaved terms

2020-09-04 Thread Alan Woodward (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190630#comment-17190630
 ] 

Alan Woodward commented on LUCENE-9418:
---

Hi [~Brain2000], I think you have a different problem there; this issue 
concerns Interval queries, whereas you appear to have a problem with a sorting 
collector.  Can you open a new issue, with a reproducible test failure if 
possible?

> Ordered intervals can give inaccurate hits on interleaved terms
> ---
>
> Key: LUCENE-9418
> URL: https://issues.apache.org/jira/browse/LUCENE-9418
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 8.6
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Given the text 'A B A C', an ordered interval over 'A B C' will return the 
> inaccurate interval [2, 3], due to the way minimization is handled after 
> matches are found.
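To make the example in the description concrete, here is a small brute-force sketch of what an ordered match over 'A B C' should yield on the text 'A B A C'. It is not Lucene's interval implementation; the class OrderedIntervalSketch is hypothetical and it assumes 0-based token positions. The span [2, 3] covers only 'A C', so it cannot be a valid match: no B lies between those two positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical brute-force reference, assuming 0-based token positions.
final class OrderedIntervalSketch {

  /** Returns the minimal [start, end] spans in which the terms occur in the given order. */
  static List<int[]> orderedIntervals(String[] tokens, String... terms) {
    List<int[]> candidates = new ArrayList<>();
    if (terms.length == 0) {
      return candidates;
    }
    for (int start = 0; start < tokens.length; start++) {
      if (!tokens[start].equals(terms[0])) {
        continue;
      }
      // Greedily take the earliest later occurrence of each following term.
      int pos = start;
      boolean matched = true;
      for (int t = 1; t < terms.length && matched; t++) {
        pos = indexOf(tokens, terms[t], pos + 1);
        matched = pos >= 0;
      }
      if (matched) {
        candidates.add(new int[] {start, pos});
      }
    }
    // Minimization: drop any candidate that fully contains another candidate.
    List<int[]> minimal = new ArrayList<>();
    for (int[] a : candidates) {
      boolean containsOther = false;
      for (int[] b : candidates) {
        if (a != b && a[0] <= b[0] && b[1] <= a[1]) {
          containsOther = true;
          break;
        }
      }
      if (!containsOther) {
        minimal.add(a);
      }
    }
    return minimal;
  }

  private static int indexOf(String[] tokens, String term, int from) {
    for (int i = from; i < tokens.length; i++) {
      if (tokens[i].equals(term)) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    for (int[] span : orderedIntervals("A B A C".split(" "), "A", "B", "C")) {
      System.out.println(Arrays.toString(span));
    }
  }
}
```

The key difference from the behaviour described in the issue is that minimization here only ever chooses among spans that were verified as complete matches, so the start can never move past a term that the match depends on.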






[jira] [Commented] (LUCENE-9497) Integerate Error Prone ( Static Analysis Tool ) during compilation

2020-09-04 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190628#comment-17190628
 ] 

Dawid Weiss commented on LUCENE-9497:
-

The compilation problem was caused by the fact that we exclude Error Prone from 
published/runtime dependencies (it is a transitive dependency of guava), but 
if it's missing, Error Prone's annotation processor can complain about missing 
annotation types. Not to mention that each and every one of these dependencies 
uses a different version of error_prone_annotations... this is confusing as 
hell, sigh.

I think I fixed most of these issues here:
https://github.com/apache/lucene-solr/pull/1828

I also looked at how individual pieces of code can be marked as valid - Error 
Prone uses the regular SuppressWarnings annotation with a custom name for each 
category. This is fine, I think.
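As an illustration of the suppression mechanism mentioned above, a sketch might look like the following. The class SuppressSketch and its members are hypothetical, and the check name "ReferenceEquality" is used on the assumption that it is the Error Prone check that flags == on types that define equals(); plain javac ignores SuppressWarnings names it does not recognize, so the code compiles with or without Error Prone on the processor path.

```java
// Hypothetical example, not taken from the Lucene/Solr code base.
final class SuppressSketch {

  /** A small value type with its own equals(), so == on it looks suspicious. */
  static final class Token {
    final String text;

    Token(String text) {
      this.text = text;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof Token && ((Token) o).text.equals(text);
    }

    @Override
    public int hashCode() {
      return text.hashCode();
    }
  }

  // Identity comparison is intentional here, so the finding is suppressed
  // by name for this one method only (check name is an assumption).
  @SuppressWarnings("ReferenceEquality")
  static boolean isSameInstance(Token a, Token b) {
    return a == b;
  }

  public static void main(String[] args) {
    Token t = new Token("x");
    System.out.println(isSameInstance(t, t));
    System.out.println(isSameInstance(t, new Token("x")));
  }
}
```

Keeping the annotation on the narrowest possible element (a method or a local declaration rather than a whole class) leaves the check active for the rest of the file.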


> Integerate Error Prone ( Static Analysis Tool ) during compilation
> --
>
> Key: LUCENE-9497
> URL: https://issues.apache.org/jira/browse/LUCENE-9497
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Varun Thacker
>Priority: Minor
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Integrate [https://github.com/google/error-prone] during compilation of our 
> source code to catch mistakes






[jira] [Assigned] (SOLR-13973) Deprecate Tika

2020-09-04 Thread Ishan Chattopadhyaya (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya reassigned SOLR-13973:
---

Assignee: Ishan Chattopadhyaya

> Deprecate Tika
> --
>
> Key: SOLR-13973
> URL: https://issues.apache.org/jira/browse/SOLR-13973
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Blocker
> Fix For: 8.7
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on search and scalability. 
> Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us 
> down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these, 
> it should be possible to bring them into third-party packages and install 
> them via the package manager.
> The plan is just to log warnings and add deprecation notes in the 
> reference guide for now. Removal can be done in 9.0.






[jira] [Commented] (SOLR-13973) Deprecate Tika

2020-09-04 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190623#comment-17190623
 ] 

Ishan Chattopadhyaya commented on SOLR-13973:
-

[~mkalkbrenner], I think we should fix this (SOLR-14768) in 8.7, while 
simultaneously deprecating it. 

> Deprecate Tika
> --
>
> Key: SOLR-13973
> URL: https://issues.apache.org/jira/browse/SOLR-13973
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Priority: Blocker
> Fix For: 8.7
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on search and scalability. 
> Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us 
> down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these, 
> it should be possible to bring them into third-party packages and install 
> them via the package manager.
> The plan is just to log warnings and add deprecation notes in the 
> reference guide for now. Removal can be done in 9.0.






[jira] [Commented] (SOLR-13973) Deprecate Tika

2020-09-04 Thread Markus Kalkbrenner (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190622#comment-17190622
 ] 

Markus Kalkbrenner commented on SOLR-13973:
---

In fact, you don't need to deprecate the feature in 8.7 anymore, as you already 
broke it in 8.6 ;)
See SOLR-14768.

> Deprecate Tika
> --
>
> Key: SOLR-13973
> URL: https://issues.apache.org/jira/browse/SOLR-13973
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Priority: Blocker
> Fix For: 8.7
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Solr's primary responsibility should be to focus on search and scalability. 
> Having to deal with the problems (CVEs) of Velocity, Tika etc. can slow us 
> down. I propose that we deprecate it going forward.
> Tika can be run outside Solr. Going forward, if someone wants to use these, 
> it should be possible to bring them into third-party packages and install 
> them via the package manager.
> The plan is just to log warnings and add deprecation notes in the 
> reference guide for now. Removal can be done in 9.0.






[GitHub] [lucene-solr] dweiss commented on pull request #1816: LUCENE-9497: Integerate Error Prone ( Static Analysis Tool ) during compilation

2020-09-04 Thread GitBox


dweiss commented on pull request #1816:
URL: https://github.com/apache/lucene-solr/pull/1816#issuecomment-686969828


   I made a few changes to consolidate the version used across compilations. 
I still don't know what the original problem was, but let's see if this passes.
   https://github.com/apache/lucene-solr/pull/1828





