Re: Solr 7.2.1 OOME

2019-11-06 Thread Paras Lehana
Hi Antony,

Each replica runs on a 10G heap.


Are the min and max heap sizes both set to 10G? If full GCs are taking a
long time, have you tried decreasing the heap to, say, 4-6 GB? Shawn has
written a nice article on this. Also, your image will probably not reach
everyone - try hosting it on another site. There are many GC reporting
sites too.

On Thu, 7 Nov 2019 at 09:20, Antony Alphonse 
wrote:

> Hi,
>
> I am trying to get some help with the frequent OOMEs I am seeing in my
> collection. I have a single shard with four replicas. Each replica runs on
> a 10G heap. I have around 12 million documents with size on disk around
> 15G. From the plot below, it looks to me like full GCs are taking a long
> time. I am looking for suggestions on whether I should look into G1 or
> tune the heap.
>
>
> [image: image.png]
>
> Thanks
> AA
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*

-- 
IMPORTANT: 
NEVER share your IndiaMART OTP/ Password with anyone.


Re: [Q] Ref Guide - What is Multi-Term Expansion?

2019-11-06 Thread Paras Lehana
Thank you so much, Erick and Alex!

Strange that I could not understand it, since wildcards were among the first
things we used in queries when I migrated Auto-Suggest to Solr. I remember
how often we faced stemming not working on partial user queries (servicin*
not matching services). We started using "partialQuery OR partialQuery*",
but it scored exactly matching terms higher. We experimented with many more
options, like KeywordRepeat, before finally moving to EdgeNGrams. Erick's
articles contributed so much to the journey!

Anyway, I'm clear on the definition now. For future reference, this is the
summary of the text:

Many filters won't work with multi-term expansion (the expansion of terms
due to a wildcard or regex; for example, run* -> run, running, runner);
those filters pass the input through unchanged. If you want to specify how
your chain behaves differently for multi-terms, additionally define
tokenizers/filters in an <analyzer type="multiterm"> section.

Here is a nice list of analyzers supporting multi-term expansion (credits
to Alexandre Rafalovitch): http://www.solr-start.com/info/analyzers/
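For illustration, here is a minimal sketch of a field type with an explicit
multiterm analyzer (the field type name and chains are assumed, not taken
from this thread):

<fieldType name="text_example" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- applied only when analyzing wildcard/prefix/regex terms, e.g. Run* -> run* -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Without the explicit multiterm section, Solr assembles one automatically
from the multi-term-safe parts of the query analyzer.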



On Wed, 6 Nov 2019 at 19:04, Erick Erickson  wrote:

> Say you want to search for “run*”. That should match “run”, “runner”,
> “running”, “runs” etc. one term->many == multiterm expansion. Conceptually,
> the search becomes (run OR runner OR running OR runs), all terms actually
> found in the index that have the prefix “run”.
>
> My advice would be to ignore it completely, that’s an expert level option
> that came about because we got really tired of explaining that wildcards
> didn’t used to have _any_ analysis done, so searching for “Run*" would not
> match “run” due to the case difference.
>
> Best,
> Erick
> > On Nov 6, 2019, at 7:21 AM, Alexandre Rafalovitch 
> wrote:
> >
> > It mentions it in the opening paragraph: "Prefix, Wildcard, Regex, etc."
> >
> > So, if you search for "abc*" it expands to all terms that start from
> > "abc", but then not everything can handle this situation as it is a
> > lot of terms in the same position. So, not all analyzers can handle
> > that and normally it is just an automatically built subset of safe
> > ones.
> >
> > I mark them with "(multi)" in my - very out of date, but still useful
> > - resource: http://www.solr-start.com/info/analyzers/
> >
> > Regards,
> >   Alex.
> >
> > On Wed, 6 Nov 2019 at 21:19, Paras Lehana 
> wrote:
> >>
> >> Hi Community,
> >>
> >> In Ref Guide 8.3's Understanding Analyzers subsection *Analysis for
> >> Multi-Term Expansion*
> >> <
> https://lucene.apache.org/solr/guide/8_3/analyzers.html#analysis-for-multi-term-expansion
> >,
> >> the text talks about multi-term expansion and explicit use of *analyzer
> >> type="multiterm"*.
> >>
> >> I could not understand what exactly is multi-term expansion and what are
> >> the use cases for using "multiterm". *[Q1]*
> >>
> >> --
> >> --
> >> Regards,
> >>
> >> *Paras Lehana* [65871]
> >> Development Engineer, Auto-Suggest,
> >> IndiaMART Intermesh Ltd.
> >>
> >> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> >> Noida, UP, IN - 201303
> >>
> >> Mob.: +91-9560911996
> >> Work: 01203916600 | Extn:  *8173*
> >>
> >> --
> >> IMPORTANT:
> >> NEVER share your IndiaMART OTP/ Password with anyone.
>
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*

-- 
IMPORTANT: 
NEVER share your IndiaMART OTP/ Password with anyone.


Re: Ref Guide - Precision & Recall of Analyzers

2019-11-06 Thread Paras Lehana
Hey Mikhail,

My doubt was regarding doing this on the query side. I think the text
probably meant adding the filter on the index side, then.

If we're willing to do this on the index side, as you suggested, we can
capture all-caps with a regex like ^[A-Z]*$. But how do we proceed? Here is
what I can think of:

   1. Capture all-caps in a copyField at index time and replace each with
   some signal token (e.g., a specially marked form of RAM). Keep the
   copyField's query analysis the same and query on both fields. In the
   original field, remove the all-caps token so that it doesn't match any
   lowercase token.
   2. Mark all-caps as KEYWORD, if there's any method for that, so that a
   LowerCase filter after it leaves all-caps alone. Use KeywordRepeat to
   keep the lowercase token as well.
   3. Use PatternReplace to turn all-caps into proper acronyms (RAM ->
   R.A.M.) and use something like TypeAsPayload to mark the token type
   (sketched below).

I'm still curious to find the proper way, because all of my suggestions
would really be workarounds, even if they work.

*If there's no simpler way, can we raise a JIRA requirement for a filter
that marks acronyms as KEYWORD so that further analysis doesn't touch them
or, even better, for an argument in LowerCase (like excludeAllCaps) that
doesn't convert all-caps to lowercase?*
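To make option 3 concrete, here is a minimal sketch (an assumed chain, not
something I have tested) that rewrites all-caps tokens into dotted acronyms
before lowercasing, so RAM becomes R.A.M. while Ram stays untouched:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- dot each capital letter only when the rest of the token is also
       capitals, e.g. RAM -> R.A.M., while Ram is left alone -->
  <filter class="solr.PatternReplaceFilterFactory"
          pattern="([A-Z])(?=[A-Z]*$)"
          replacement="$1."
          replace="all"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>

The query chain would need the same filter so that a user typing RAM still
matches the dotted form, while queries for "ram" no longer do.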


On Wed, 6 Nov 2019 at 20:42, Mikhail Khludnev  wrote:

> Hello, Audrey.
>
> Can you create a regexp capturing all-caps for
>
> https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#pattern-replace-filter
>  ?
>
> On Wed, Nov 6, 2019 at 6:36 AM Audrey Lorberfeld -
> audrey.lorberf...@ibm.com
>  wrote:
>
> > I would also love to know what filter to use to ignore capitalized
> > acronyms... which one can do this OOTB?
> >
> > --
> > Audrey Lorberfeld
> > Data Scientist, w3 Search
> > IBM
> > audrey.lorberf...@ibm.com
> >
> >
> > On 11/6/19, 3:54 AM, "Paras Lehana"  wrote:
> >
> > Hi Community,
> >
> > In Ref Guide 8.3's *Understanding Analyzers, Tokenizers, and Filters*
> > <https://lucene.apache.org/solr/guide/8_3/understanding-analyzers-tokenizers-and-filters.html>,
> > section, the text talks about precision and recall depending on how
> > you use
> > analyzers during query and index time:
> >
> > > For indexing, you often want to simplify, or normalize, words. For
> > > example, setting all letters to lowercase, eliminating punctuation and
> > > accents, mapping words to their stems, and so on. Doing so can
> > > *increase recall* because, for example, "ram", "Ram" and "RAM" would
> > > all match a query for "ram". To *increase query-time precision*, a
> > > filter could be employed to narrow the matches by, for example,
> > > *ignoring all-cap acronyms* if you’re interested in male sheep, but
> > > not Random Access Memory.
> >
> >
> > In the first case (about Recall), is it assumed that "ram" should match
> > all three? *[Q1]* Because, to increase recall, we have to decrease false
> > negatives (relevant documents that are not retrieved). Otherwise (if the
> > three are not intended to match the query), precision is actually
> > decreased here (false positives are increased).
> >
> > This makes sense for the second case, where precision should increase as
> > we are decreasing false positives (documents wrongly marked relevant).
> >
> > However, the text talks about the method of "employing a filter that
> > ignores all-cap acronyms". How are we supposed to do that at query time?
> > *[Q2]* Weren't we supposed to remove the filter (LCF) at index time?
> >
> >
> > --
> > --
> > Regards,
> >
> > *Paras Lehana* [65871]
> > Development Engineer, Auto-Suggest,
> > IndiaMART Intermesh Ltd.
> >
> > 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> > Noida, UP, IN - 201303
> >
> > Mob.: +91-9560911996
> > Work: 01203916600 | Extn:  *8173*
> >
> > --
> > IMPORTANT:
> > NEVER share your IndiaMART OTP/ Password with anyone.
> >
> >
> >
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*

-- 
IMPORTANT: 
NEVER share your IndiaMART OTP/ Password with anyone.


Query regarding truncated Date Sort

2019-11-06 Thread Inderjeet Singh
Hi

I am currently using Solr 7.1.0. I have indexed a few documents, each of
which has a date associated with it. The managed-schema configuration for
that field is:

  [field definition stripped from the archive]

Examples of a few values:
  "Published_Date":"2019-10-25T00:00:00Z"
  "Published_Date":"2019-10-21T10:00:00Z"

I want to sort the documents based on these Published_Date values, but only
by the day (not the time/timezone), i.e., sorting on the basis of
'2019-10-25'.

Please help me in finding out how I could achieve this.
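One direction I was considering (a sketch using standard update processors;
the helper field name Published_Day_s is made up, and the chain would still
need to be referenced via update.chain) is to clone the incoming date string
into a plain string field and truncate it to its first 10 characters, so
"2019-10-25T00:00:00Z" becomes "2019-10-25":

<updateRequestProcessorChain name="truncate-pubdate">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">Published_Date</str>
    <str name="dest">Published_Day_s</str>
  </processor>
  <!-- keep only yyyy-MM-dd from the incoming ISO-8601 string -->
  <processor class="solr.TruncateFieldUpdateProcessorFactory">
    <str name="fieldName">Published_Day_s</str>
    <int name="maxLength">10</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

Lexicographic sort on such a string field (sort=Published_Day_s asc) would
then order by day. Would something along these lines be the right way, or
is there a cleaner option?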

Regards
Inderjeet Singh


Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-06 Thread Paras Lehana
Hi Guilherme.

I am sending the analysis result and the JSON result as requested.


Thanks for the effort. Luckily, I can see your attachments (low quality
though).

From the analysis screen, the analysis is working as expected. One reason I
can initially think of for query="lymphoid and *a* non-lymphoid cell" not
matching a document containing "Lymphoid and a non-Lymphoid cell" is that
the stopword "a" is probably still present post-analysis on either the
query or the index side. Did you tweak your index-time analysis after
indexing?

Do two things:

   1. Post the analysis screen for index=*"Immunoregulatory interactions
   between a Lymphoid and a non-Lymphoid cell"* and query=*"lymphoid and a
   non-lymphoid cell"*. Try hosting the image and providing the link here.
   2. Give the same JSON output as you have sent, but this time with
   *"echoParams=all"*. Also, post the exact Solr query URL.



On Wed, 6 Nov 2019 at 21:07, Erick Erickson  wrote:

> I don’t see the attachments, maybe I deleted old e-mails or some such. The
> Apache server is fairly aggressive about stripping attachments though, so
> it’s also possible they didn’t make it through.
>
> > On Nov 6, 2019, at 9:28 AM, Guilherme Viteri  wrote:
> >
> > Thanks Erick.
> >
> >> First, your index and analysis chains are considerably different, this
> can easily be a source of problems. In particular, using two different
> tokenizers is a huge red flag. I _strongly_ recommend against this unless
> you’re totally sure you understand the consequences. Additionally, your use
> of the length filter is suspicious, especially since your problem statement
> is about the addition of a single letter term and the min length allowed on
> that filter is 2. That said, it’s reasonable to suppose that the ’a’ is
> filtered out in both cases, but maybe you’ve found something odd about the
> interactions.
> > I will investigate the min length and post the results later.
> >
> >> Second, I have no idea what this will do. Are the equal signs typos?
> Used by custom code?
> > This is the URL in my application, not Solr params. That's the query string.
> >
> >> What does “species=“ do? That’s not Solr syntax, so it’s likely that
> all the params with an equal-sign are totally ignored unless it’s just a
> typo.
> This is part of the application. Species will be used later on in Solr
> to filter the results. Those are not Solr params; they're my app's params.
> >
> >> Third, the easiest way to see what’s happening under the covers is to
> add “&debug=true” to the query and look at the parsed query. Ignore all the
> relevance calculations for the nonce, or specify “&debug=query” to skip
> that part.
> > The two JSON files I've sent have debugQuery=on and the explain tag
> > is present.
> > I will try searching the way you mentioned.
> >
> > Thanks for your inputs
> >
> > Guilherme
> >
> >> On 6 Nov 2019, at 14:14, Erick Erickson 
> wrote:
> >>
> >> Fwd to another server
> >>
> >> First, your index and analysis chains are considerably different, this
> can easily be a source of problems. In particular, using two different
> tokenizers is a huge red flag. I _strongly_ recommend against this unless
> you’re totally sure you understand the consequences. Additionally, your use
> of the length filter is suspicious, especially since your problem statement
> is about the addition of a single letter term and the min length allowed on
> that filter is 2. That said, it’s reasonable to suppose that the ’a’ is
> filtered out in both cases, but maybe you’ve found something odd about the
> interactions.
> >>
> >> Second, I have no idea what this will do. Are the equal signs typos?
> Used by custom code?
> >>
> 
> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
> >>
> >> What does “species=“ do? That’s not Solr syntax, so it’s likely that
> all the params with an equal-sign are totally ignored unless it’s just a
> typo.
> >>
> >> Third, the easiest way to see what’s happening under the covers is to
> add “&debug=true” to the query and look at the parsed query. Ignore all the
> relevance calculations for the nonce, or specify “&debug=query” to skip
> that part.
> >>
> >> 90% + of the time, the question “why didn’t this query do what I
> expect” is answered by looking at the “&debug=query” output and the
> analysis page in the admin UI. NOTE: for the analysis page be sure to look
> at _both_ the query and index output. Also, and very important about the
> analysis page (and this is confusing) is that this _assumes_ that what you
> put in the text boxes have made it through the query parser intact and is
> analyzed by the field selected. Consider the search "q=field:word1 word2".
> Now you type “word1 word2” into the analysis text box and it looks like
> what you expect. That’s misleading because the query is _parsed_ as
> "field:word1 default_search_field:word2”. This is where “&debug=query”
> helps.
> >>

Re: Good Open Source Front End for Solr

2019-11-06 Thread Alexandre Rafalovitch
For what purpose?

Because, for example, Solr is not designed to serve directly to the browser,
just like MySQL is not. So, usually, there is custom middleware.

On the other hand, Solr can serve as a JDBC engine, so you could use JDBC
frontends to explore data. Or as an engine for visualisations. Etc.

And of course, it ships with the Admin UI for internal purposes.

What's your specific use case?

Regards,
Alex

On Thu, Nov 7, 2019, 3:17 PM Java Developer,  wrote:

> Hi,
>
> What is the best open source front-end for Solr
>
> Thanks
>


subscribe - renew

2019-11-06 Thread Antony Alphonse



Good Open Source Front End for Solr

2019-11-06 Thread Java Developer
Hi,

What is the best open source front-end for Solr

Thanks


Solr 7.2.1 OOME

2019-11-06 Thread Antony Alphonse
Hi,

I am trying to get some help with the frequent OOMEs I am seeing in my
collection. I have a single shard with four replicas. Each replica runs on a
10G heap. I have around 12 million documents with size on disk around 15G.
From the plot below, it looks to me like full GCs are taking a long time. I
am looking for suggestions on whether I should look into G1 or tune the
heap.


[image: image.png]

Thanks
AA


Solr healthcheck fails all the time

2019-11-06 Thread amruth
I am running Solr Cloud 6.6 and all the nodes fail the healthcheck too
frequently with a *Read timed out* error. Here is the stacktrace:

http://solr-host1:8983/solr/collection1/admin/ping is DOWN, error:
HTTPConnectionPool(host='solr-host1', port=8983): Read timed out. (read
timeout=1). Connection failed after 1001 ms

Can someone please explain why it fails all the time (at least once every 10 minutes)?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Error with Solr Suggester using lookupIml = FreeTextLookupFactory

2019-11-06 Thread Erick Erickson
This setting is pretty dangerous. It’ll build the suggester every time the Solr 
instance starts. The DocumentDictionaryFactory will read _every_ document in 
your index to extract the stored “suggest” field and create the dictionary.

But this points to the fact that you hadn’t built the dictionary when you
originally posted the problem. You can build it by issuing a command (curl
it in, or on the browser line) like:

blahblahblahblah/solr/collection/suggest?suggest.build=true&suggest=true
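For reference, these are the two build-related knobs in the suggester
definition (the values here are just illustrative):

<str name="buildOnStartup">false</str>
<str name="buildOnCommit">false</str>

With both set to false, the suggester is only (re)built when you explicitly
pass suggest.build=true as above.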

Best,
Erick

> On Nov 6, 2019, at 4:32 PM, Tyrone Tse  wrote:
> 
> It's working now that I changed the solrconfig.xml to
>
> <searchComponent name="suggest" class="solr.SuggestComponent">
>   <lst name="suggester">
>     <str name="name">mySuggester</str>
>     <str name="lookupImpl">FreeTextLookupFactory</str>
>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>     <str name="field">suggest</str>
>     <str name="ngrams">3</str>
>     <str name="suggestFreeTextAnalyzerFieldType">text_en_splitting</str>
>     <str name="buildOnStartup">*true*</str>
>   </lst>
> </searchComponent>
> 
> On Wed, Nov 6, 2019 at 3:11 PM Tyrone Tse  wrote:
> 
>> What's the command to build it
>> 
>> 
>> On Wed, Nov 6, 2019 at 3:06 PM Mikhail Khludnev  wrote:
>> 
>>> Hello,
>>> 
>>> Have you build suggester before requesting?
>>> 
>>> On Wed, Nov 6, 2019 at 12:50 PM Tyrone Tse  wrote:
>>> 
 Solr version 8.1.1
 
 My schema
 
 >>> multiValued="false" indexed="true"/>
 
 
 solconfig.xml
 


mySuggester
FreeTextLookupFactory
DocumentDictionaryFactory
suggest
3
 
>>> name="suggestFreeTextAnalyzerFieldType">text_en_splitting
false


 
>>>startup="lazy" >

true
10


suggest


 
 The suggest query
 
 
>>> http://localhost:8983/solr/catalog/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.q=gin
 
 works on Red Hat Enterprise Linux 7.6
 
 it returns
 
 {
  "responseHeader":{
"status":0,
"QTime":0},
  "suggest":{"mySuggester":{
  "gin":{
"numFound":10,
"suggestions":[{
"term":"gin",
"weight":13613207305387128,
"payload":""},
  {
"term":"ginjo",
"weight":3986422076966947,
"payload":""},
 ...
 
 But when I  on my Mac with OS High Sierra
 Generates the error
 
 "Lookup not supported at this time"
 
 "java.lang.IllegalStateException: Lookup not supported at this
>>> time\n\tat
 
 
>>> org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:428)\n\tat
 
 
>>> org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:399)\n\tat
 
 
>>> org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:388)\n\tat
 
 
>>> org.apache.solr.spelling.suggest.SolrSuggester.getSuggestions(SolrSuggester.java:243)\n\tat
 
 
>>> org.apache.solr.handler.component.SuggestComponent.process(SuggestComponent.java:264)\n\tat
 
 
>>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)\n\tat
 
 
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
 org.apache.solr.core.SolrCore.execute(SolrCore.java:2566)\n\tat
 
>>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:756)\n\tat
 org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:542)\n\tat
 
>>> 
>>> 
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>>> 
>> 



Re: Error with Solr Suggester using lookupIml = FreeTextLookupFactory

2019-11-06 Thread Tyrone Tse
It's working now that I changed the solrconfig.xml to

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggest</str>
    <str name="ngrams">3</str>
    <str name="suggestFreeTextAnalyzerFieldType">text_en_splitting</str>
    <str name="buildOnStartup">*true*</str>
  </lst>
</searchComponent>

On Wed, Nov 6, 2019 at 3:11 PM Tyrone Tse  wrote:

> What's the command to build it
>
>
> On Wed, Nov 6, 2019 at 3:06 PM Mikhail Khludnev  wrote:
>
>> Hello,
>>
>> Have you build suggester before requesting?
>>
>> On Wed, Nov 6, 2019 at 12:50 PM Tyrone Tse  wrote:
>>
>> > Solr version 8.1.1
>> >
>> > My schema
>> >
>> > > > multiValued="false" indexed="true"/>
>> > 
>> >
>> > solconfig.xml
>> >
>> > 
>> > 
>> > mySuggester
>> > FreeTextLookupFactory
>> > DocumentDictionaryFactory
>> > suggest
>> > 3
>> >  
>> > > > name="suggestFreeTextAnalyzerFieldType">text_en_splitting
>> > false
>> > 
>> > 
>> >
>> > > > startup="lazy" >
>> > 
>> > true
>> > 10
>> > 
>> > 
>> > suggest
>> > 
>> > 
>> >
>> > The suggest query
>> >
>> >
>> http://localhost:8983/solr/catalog/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.q=gin
>> >
>> > works on Red Hat Enterprise Linux 7.6
>> >
>> > it returns
>> >
>> > {
>> >   "responseHeader":{
>> > "status":0,
>> > "QTime":0},
>> >   "suggest":{"mySuggester":{
>> >   "gin":{
>> > "numFound":10,
>> > "suggestions":[{
>> > "term":"gin",
>> > "weight":13613207305387128,
>> > "payload":""},
>> >   {
>> > "term":"ginjo",
>> > "weight":3986422076966947,
>> > "payload":""},
>> > ...
>> >
>> > But when I  on my Mac with OS High Sierra
>> > Generates the error
>> >
>> > "Lookup not supported at this time"
>> >
>> > "java.lang.IllegalStateException: Lookup not supported at this
>> time\n\tat
>> >
>> >
>> org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:428)\n\tat
>> >
>> >
>> org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:399)\n\tat
>> >
>> >
>> org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:388)\n\tat
>> >
>> >
>> org.apache.solr.spelling.suggest.SolrSuggester.getSuggestions(SolrSuggester.java:243)\n\tat
>> >
>> >
>> org.apache.solr.handler.component.SuggestComponent.process(SuggestComponent.java:264)\n\tat
>> >
>> >
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)\n\tat
>> >
>> >
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
>> > org.apache.solr.core.SolrCore.execute(SolrCore.java:2566)\n\tat
>> >
>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:756)\n\tat
>> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:542)\n\tat
>> >
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>


Re: Solr 8.3 Solrj streaming expressions do not return all field values

2019-11-06 Thread Jörn Franke
I created a JIRA for this:
https://issues.apache.org/jira/browse/SOLR-13894

On Wed, Nov 6, 2019 at 10:45 AM Jörn Franke  wrote:

> I have checked now Solr 8.3 server in admin UI. Same issue.
>
> Reproduction:
> select(search(testcollection,q="test",df="Default",defType="edismax",fl="id",
> qt="/export", sort="id asc"),id,if(eq(1,1),Y,N) as found)
>
> In 8.3 it returns only the id field.
> In 8.2 it returns the id and found fields.
>
> Since found is generated by select (and not coming from the collection),
> there must be an issue with select.
>
> Any idea why this is happening?
>
> Debug logs do not show any error and the expression is correctly received
> by Solr.
>
> Thank you.
>
> Best regards
>
> > Am 05.11.2019 um 14:59 schrieb Jörn Franke :
> >
> > Thanks, I will check and come back to you. As far as I remember (but I
> > have to check), the queries generated by Solr were correct.
> >
> > Just to be clear: the same thing works with Solr 8.2 server and Solr 8.2
> > client.
> >
> > It shows the odd behaviour with Solr 8.2 server and Solr 8.3 client.
> >
> >> Am 05.11.2019 um 14:49 schrieb Joel Bernstein :
> >>
> >> I'll probably need some more details. One thing that's useful is to
> look at
> >> the logs and see the underlying Solr queries that are generated. Then
> try
> >> those underlying queries against the Solr index and see what comes
> back. If
> >> you're not seeing the fields with the plain Solr queries then we know
> it's
> >> something going on below streaming expressions. If you are seeing the
> >> fields then it's the expressions themselves that are not handling the
> data
> >> as expected.
> >>
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >>
>  On Mon, Nov 4, 2019 at 9:09 AM Jörn Franke 
> wrote:
> >>>
> >>> Most likely this issue can bei also reproduced in the admin UI for the
> >>> streaming handler of a collection.
> >>>
> > Am 04.11.2019 um 13:32 schrieb Jörn Franke :
> 
>  Hi,
> 
>  I use streaming expressions, e.g.
>  Sort(Select(search(...),id,if(eq(1,1),Y,N) as found), by=“field A
> asc”)
>  (Using export handler, sort is not really mandatory , I will remove it
> >>> later anyway)
> 
>  This works perfectly fine if I use Solr 8.2.0 (server + client). It
> >>> returns Tuples in the form { “id”,”12345”, “found”:”Y”}
> 
>  However, if I use Solr 8.2.0 as server and Solr 8.3.0 as client then
> the
> >>> above statement only returns the id field, but not the found field.
> 
>  Questions:
>  1) is this expected behavior, ie Solr client 8.3.0 is in this case not
> >>> compatible with Solr 8.2.0 and server upgrade to Solr 8.3.0 will fix
> this?
>  2) has the syntax for the above expression changed? If so how?
>  3) is this not expected behavior and I should create a Jira for it?
> 
>  Thank you.
>  Best regards
> >>>
>


Re: Error with Solr Suggester using lookupIml = FreeTextLookupFactory

2019-11-06 Thread Tyrone Tse
What's the command to build it


On Wed, Nov 6, 2019 at 3:06 PM Mikhail Khludnev  wrote:

> Hello,
>
> Have you build suggester before requesting?
>
> On Wed, Nov 6, 2019 at 12:50 PM Tyrone Tse  wrote:
>
> > Solr version 8.1.1
> >
> > My schema
> >
> >  > multiValued="false" indexed="true"/>
> > 
> >
> > solconfig.xml
> >
> > 
> > 
> > mySuggester
> > FreeTextLookupFactory
> > DocumentDictionaryFactory
> > suggest
> > 3
> >  
> >  > name="suggestFreeTextAnalyzerFieldType">text_en_splitting
> > false
> > 
> > 
> >
> >  > startup="lazy" >
> > 
> > true
> > 10
> > 
> > 
> > suggest
> > 
> > 
> >
> > The suggest query
> >
> >
> http://localhost:8983/solr/catalog/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.q=gin
> >
> > works on Red Hat Enterprise Linux 7.6
> >
> > it returns
> >
> > {
> >   "responseHeader":{
> > "status":0,
> > "QTime":0},
> >   "suggest":{"mySuggester":{
> >   "gin":{
> > "numFound":10,
> > "suggestions":[{
> > "term":"gin",
> > "weight":13613207305387128,
> > "payload":""},
> >   {
> > "term":"ginjo",
> > "weight":3986422076966947,
> > "payload":""},
> > ...
> >
> > But when I  on my Mac with OS High Sierra
> > Generates the error
> >
> > "Lookup not supported at this time"
> >
> > "java.lang.IllegalStateException: Lookup not supported at this time\n\tat
> >
> >
> org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:428)\n\tat
> >
> >
> org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:399)\n\tat
> >
> >
> org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:388)\n\tat
> >
> >
> org.apache.solr.spelling.suggest.SolrSuggester.getSuggestions(SolrSuggester.java:243)\n\tat
> >
> >
> org.apache.solr.handler.component.SuggestComponent.process(SuggestComponent.java:264)\n\tat
> >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)\n\tat
> >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
> > org.apache.solr.core.SolrCore.execute(SolrCore.java:2566)\n\tat
> > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:756)\n\tat
> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:542)\n\tat
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Error with Solr Suggester using lookupIml = FreeTextLookupFactory

2019-11-06 Thread Mikhail Khludnev
Hello,

Have you build suggester before requesting?

On Wed, Nov 6, 2019 at 12:50 PM Tyrone Tse  wrote:

> Solr version 8.1.1
>
> My schema
>
>  multiValued="false" indexed="true"/>
> 
>
> solconfig.xml
>
> 
> 
> mySuggester
> FreeTextLookupFactory
> DocumentDictionaryFactory
> suggest
> 3
>  
>  name="suggestFreeTextAnalyzerFieldType">text_en_splitting
> false
> 
> 
>
>  startup="lazy" >
> 
> true
> 10
> 
> 
> suggest
> 
> 
>
> The suggest query
>
> http://localhost:8983/solr/catalog/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.q=gin
>
> works on Red Hat Enterprise Linux 7.6
>
> it returns
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":0},
>   "suggest":{"mySuggester":{
>   "gin":{
> "numFound":10,
> "suggestions":[{
> "term":"gin",
> "weight":13613207305387128,
> "payload":""},
>   {
> "term":"ginjo",
> "weight":3986422076966947,
> "payload":""},
> ...
>
> But when I  on my Mac with OS High Sierra
> Generates the error
>
> "Lookup not supported at this time"
>
> "java.lang.IllegalStateException: Lookup not supported at this time\n\tat
>
> org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:428)\n\tat
>
> org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:399)\n\tat
>
> org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:388)\n\tat
>
> org.apache.solr.spelling.suggest.SolrSuggester.getSuggestions(SolrSuggester.java:243)\n\tat
>
> org.apache.solr.handler.component.SuggestComponent.process(SuggestComponent.java:264)\n\tat
>
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)\n\tat
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2566)\n\tat
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:756)\n\tat
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:542)\n\tat
>


-- 
Sincerely yours
Mikhail Khludnev


Error with Solr Suggester using lookupIml = FreeTextLookupFactory

2019-11-06 Thread Tyrone Tse
Solr version 8.1.1

My schema

<field name="suggest" type="text_en_splitting" stored="true"
 multiValued="false" indexed="true"/>

solrconfig.xml

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggest</str>
    <str name="ngrams">3</str>
    <str name="suggestFreeTextAnalyzerFieldType">text_en_splitting</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

The suggest query
http://localhost:8983/solr/catalog/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.q=gin

works on Red Hat Enterprise Linux 7.6

it returns

{
  "responseHeader":{
"status":0,
"QTime":0},
  "suggest":{"mySuggester":{
  "gin":{
"numFound":10,
"suggestions":[{
"term":"gin",
"weight":13613207305387128,
"payload":""},
  {
"term":"ginjo",
"weight":3986422076966947,
"payload":""},
...

But when I run the same query on my Mac with OS High Sierra, it generates
the error

"Lookup not supported at this time"

"java.lang.IllegalStateException: Lookup not supported at this time\n\tat
org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:428)\n\tat
org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:399)\n\tat
org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.lookup(FreeTextSuggester.java:388)\n\tat
org.apache.solr.spelling.suggest.SolrSuggester.getSuggestions(SolrSuggester.java:243)\n\tat
org.apache.solr.handler.component.SuggestComponent.process(SuggestComponent.java:264)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2566)\n\tat
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:756)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:542)\n\tat


Re: Need some help on solr versions (LTS vs stable)

2019-11-06 Thread Erick Erickson
Pretty much correct. The only change I’d make is that 7x is not actively being 
supported in the sense that only seriously critical bugs will be addressed.   
You’ll note that the last release of 7x was 7.7.2 in early June. Increased 
functionality, speedups, etc won’t be back-ported. 

So I can’t think of any reason to go with 7x over 8x if you’re starting 
something new.

Best,
Erick

> On Nov 6, 2019, at 11:58 AM, suyog joshi  wrote:
> 
> Hi Erick,
> 
> Thank you so much for sharing the detailed information; indeed it's really
> helpful for us to plan out things. Really appreciate your guidance.
>
> So we can say it's better to go with the latest stable version (8.x)
> instead of 7.x, which is the LTS right now but can soon become EOL once
> 9.x launches sometime early next year.
>
> Kindly correct me if I missed out something!
>
> Will reach out to you/the community in case any additional info is needed.
>
> Once again, thanks much!!
> 
> Regards,
> Suyog Joshi
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Need some help on solr versions (LTS vs stable)

2019-11-06 Thread suyog joshi
Hi Erick,

Thank you so much for sharing the detailed information; indeed it's really
helpful for us to plan out things. Really appreciate your guidance.

So we can say it's better to go with the latest stable version (8.x)
instead of 7.x, which is the LTS right now but can soon become EOL once
9.x launches sometime early next year.

Kindly correct me if I missed out something!

Will reach out to you/the community in case any additional info is needed.

Once again, thanks much!!

Regards,
Suyog Joshi



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Filtering point fields filters everything.

2019-11-06 Thread Webster Homer
My company has been using Solr for searching our product catalog. We migrated
the data from Solr 6.6 to Solr 7.2. I am investigating the changes needed to
migrate to Solr 8.x. Our current schema has a number of fields using the Trie
data types, which are deprecated in 7 and gone in 8. I went through the schema
and changed the Trie fields to their Point equivalents.
For example we have these field types and fields defined:

  [the point field type and field definitions were stripped from the archive]

These last two were converted from the older types; they were originally
defined as:

  [the original Trie definitions were stripped from the archive]
In the process of the update I changed the version of the schema, and the
lucene match version:

<luceneMatchVersion>7.2.0</luceneMatchVersion>

After making these changes I created a new collection and used our ETL to load 
it. We saw no errors during the data load.

The problem I see is that if I try to filter on facet_fwght I get no
results: "fq":"facet_fwght:[100 TO 200]" returns no documents, nor does
facet_fwght:*.

Even more bizarre: when I use the Admin Console schema browser, it sees the
fields, but when I try to load the term info for any point field, nothing
loads.

On the other hand, I can facet on facet_fwght; I just cannot filter on it. I
couldn't get values for index_date either, even though every record has it
set with the default of NOW.

So what am I doing wrong with the point fields? I expected to be able to do
just about everything with the point fields that I could do with the
deprecated trie fields.
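For reference, here is a sketch of the kind of conversion I mean
(illustrative attribute choices; only facet_fwght and index_date are real
field names from our schema):

<!-- old, deprecated Trie types -->
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>

<!-- Point replacements: indexed="true" gives fast range filters such as
     facet_fwght:[100 TO 200]; docValues enables faceting and sorting -->
<fieldType name="pdouble" class="solr.DoublePointField" docValues="true"/>
<fieldType name="pdate" class="solr.DatePointField" docValues="true"/>
<field name="facet_fwght" type="pdouble" indexed="true" stored="true"/>
<field name="index_date" type="pdate" indexed="true" stored="true" default="NOW"/>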

Regards,
Webster Homer
This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, you 
must not copy this message or attachment or disclose the contents to any other 
person. If you have received this transmission in error, please notify the 
sender immediately and delete the message and any attachment from your system. 
Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept 
liability for any omissions or errors in this message which may arise as a 
result of E-Mail-transmission or for damages resulting from any unauthorized 
changes of the content of this message and any attachment thereto. Merck KGaA, 
Darmstadt, Germany and any of its subsidiaries do not guarantee that this 
message is free of viruses and does not accept liability for any damages caused 
by any virus transmitted therewith. Click http://www.merckgroup.com/disclaimer 
to access the German, French, Spanish and Portuguese versions of this 
disclaimer.


Re: ConcurrentModificationException in SolrInputDocument writeMap

2019-11-06 Thread Mikhail Khludnev
Hello, Tim.
Please confirm my understanding: does the exception happen in a standalone
Java ingesting app? If so, does it reuse either SolrInputDocument instances
or field/value collections between update calls?

On Wed, Nov 6, 2019 at 8:00 AM Tim Swetland  wrote:

> Nevermind my comment on not having this problem in 8.1. We do have it there
> as well, I just didn't look far enough back in our logs on my initial
> search. Would still appreciate whatever thoughts anyone might have on the
> exception.
>
> On Wed, Nov 6, 2019 at 10:17 AM Tim Swetland  wrote:
>
> > I'm currently running into a ConcurrentModificationException ingesting
> > data as we attempt to upgrade from Solr 8.1 to 8.2. It's not every
> > document, but it definitely appears regularly in our logs. We didn't run
> > into this problem in 8.1, so I'm not sure what might have changed. I feel
> > like this is probably a bug, but if there's a workaround or if there's an
> > idea of something I might be doing wrong, please let me know.
> >
> > Stack trace:
> > o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling
> > SolrCmdDistributor$Req: cmd=add{_version=,id=};
> node=StdNode:
> > https:///solr/coll_shard1_replica_n2/ to https://
> /solr/coll_shard1_replica_n2/
> > => java.util.ConcurrentModificationException
> > at java.util.LinkedHashMap.forEach(LinkedHashMap.java:686)
> > java.util.ConcurrentModificationException: null
> >   at java.util.LinkedHashMap.forEach(LinkedHashMap.java:686)
> >   at
> >
> org.apache.solr.common.SolrInputDocument.writeMap(SolrInputDocument.java:51)
> >   at
> >
> org.apache.solr.common.util.JavaBinCodec.writeSolrInputDocument(JavaBinCodec.java:658)
> >   at
> >
> org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:383)
> >   at
> > org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253)
> >   at
> >
> org.apache.solr.common.util.JavaBinCodec.writeMapEntry(JavaBinCodec.java:813)
> >
> >   at
> >
> org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:411)
> >
> >   at
> > org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253)
> >   at
> >
> org.apache.solr.common.util.JavaBinCodec.writeIterator(JavaBinCodec.java:750)
> >
> >   at
> >
> org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:395)
> >
> >   at
> > org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253)
> >   at
> >
> org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:248)
> >
> >   at
> >
> org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:355)
> >
> >   at
> > org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253)
> >   at
> > org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:167)
> >   at
> >
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.marshal(JavaBinUpdateRequestCodec.java:102)
> >   at
> >
> org.apache.solr.client.solrj.impl.BinaryRequestWriter.write(BinaryRequestWriter.java:83)
> >   at
> >
> org.apache.solr.client.solrj.impl.Http2SolrClient.send(Http2SolrClient.java:338)
> >
> >   at
> >
> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.sendUpdateStream(ConcurrentUpdateHttp2SolrClient.java:231)
> >
> >   at
> >
> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.run(ConcurrentUpdateHttp2SolrClient.java:176)
> >
> >   at
> >
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
> >   at
> >
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil
> > .java:209)
> >   at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >   at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >
> >   at java.lang.Thread.run(Thread.java:748)
> >
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: ConcurrentModificationException in SolrInputDocument writeMap

2019-11-06 Thread Tim Swetland
Nevermind my comment on not having this problem in 8.1. We do have it there
as well, I just didn't look far enough back in our logs on my initial
search. Would still appreciate whatever thoughts anyone might have on the
exception.

On Wed, Nov 6, 2019 at 10:17 AM Tim Swetland  wrote:

> I'm currently running into a ConcurrentModificationException ingesting
> data as we attempt to upgrade from Solr 8.1 to 8.2. It's not every
> document, but it definitely appears regularly in our logs. We didn't run
> into this problem in 8.1, so I'm not sure what might have changed. I feel
> like this is probably a bug, but if there's a workaround or if there's an
> idea of something I might be doing wrong, please let me know.
>
> Stack trace:
> o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling
> SolrCmdDistributor$Req: cmd=add{_version=,id=}; node=StdNode:
> https:///solr/coll_shard1_replica_n2/ to 
> https:///solr/coll_shard1_replica_n2/
> => java.util.ConcurrentModificationException
> at java.util.LinkedHashMap.forEach(LinkedHashMap.java:686)
> java.util.ConcurrentModificationException: null
>   at java.util.LinkedHashMap.forEach(LinkedHashMap.java:686)
>   at
> org.apache.solr.common.SolrInputDocument.writeMap(SolrInputDocument.java:51)
>   at
> org.apache.solr.common.util.JavaBinCodec.writeSolrInputDocument(JavaBinCodec.java:658)
>   at
> org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:383)
>   at
> org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253)
>   at
> org.apache.solr.common.util.JavaBinCodec.writeMapEntry(JavaBinCodec.java:813)
>
>   at
> org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:411)
>
>   at
> org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253)
>   at
> org.apache.solr.common.util.JavaBinCodec.writeIterator(JavaBinCodec.java:750)
>
>   at
> org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:395)
>
>   at
> org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253)
>   at
> org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:248)
>
>   at
> org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:355)
>
>   at
> org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253)
>   at
> org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:167)
>   at
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.marshal(JavaBinUpdateRequestCodec.java:102)
>   at
> org.apache.solr.client.solrj.impl.BinaryRequestWriter.write(BinaryRequestWriter.java:83)
>   at
> org.apache.solr.client.solrj.impl.Http2SolrClient.send(Http2SolrClient.java:338)
>
>   at
> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.sendUpdateStream(ConcurrentUpdateHttp2SolrClient.java:231)
>
>   at
> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.run(ConcurrentUpdateHttp2SolrClient.java:176)
>
>   at
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
>   at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil
> .java:209)
>   at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>
>   at java.lang.Thread.run(Thread.java:748)
>
>


Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-06 Thread Erick Erickson
I don’t see the attachments, maybe I deleted old e-mails or some such. The 
Apache server is fairly aggressive about stripping attachments though, so it’s 
also possible they didn’t make it through.

> On Nov 6, 2019, at 9:28 AM, Guilherme Viteri  wrote:
> 
> Thanks Erick.
> 
>> First, your index and analysis chains are considerably different, this can 
>> easily be a source of problems. In particular, using two different 
>> tokenizers is a huge red flag. I _strongly_ recommend against this unless 
>> you’re totally sure you understand the consequences. Additionally, your use 
>> of the length filter is suspicious, especially since your problem statement 
>> is about the addition of a single letter term and the min length allowed on 
>> that filter is 2. That said, it’s reasonable to suppose that the ’a’ is 
>> filtered out in both cases, but maybe you’ve found something odd about the 
>> interactions.
> I will investigate the min length and post the results later.
> 
>> Second, I have no idea what this will do. Are the equal signs typos? Used by 
>> custom code?
> This is the URL in my application, not Solr params. That's the query string.
> 
>> What does “species=“ do? That’s not Solr syntax, so it’s likely that all the 
>> params with an equal-sign are totally ignored unless it’s just a typo.
> This is part of the application. Species will be used later on in Solr to
> filter the results. Those are not Solr params; they're my app's params.
> 
>> Third, the easiest way to see what’s happening under the covers is to add 
>> “&debug=true” to the query and look at the parsed query. Ignore all the 
>> relevance calculations for the nonce, or specify “&debug=query” to skip that 
>> part. 
> The two JSON files I've sent have debugQuery=on and the explain tag is
> present.
> I will try searching the way you mentioned.
> 
> Thanks for your inputs
> 
> Guilherme
> 
>> On 6 Nov 2019, at 14:14, Erick Erickson  wrote:
>> 
>> Fwd to another server
>> 
>> First, your index and analysis chains are considerably different, this can 
>> easily be a source of problems. In particular, using two different 
>> tokenizers is a huge red flag. I _strongly_ recommend against this unless 
>> you’re totally sure you understand the consequences. Additionally, your use 
>> of the length filter is suspicious, especially since your problem statement 
>> is about the addition of a single letter term and the min length allowed on 
>> that filter is 2. That said, it’s reasonable to suppose that the ’a’ is 
>> filtered out in both cases, but maybe you’ve found something odd about the 
>> interactions.
>> 
>> Second, I have no idea what this will do. Are the equal signs typos? Used by 
>> custom code?
>> 
 https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>> 
>> What does “species=“ do? That’s not Solr syntax, so it’s likely that all the 
>> params with an equal-sign are totally ignored unless it’s just a typo.
>> 
>> Third, the easiest way to see what’s happening under the covers is to add 
>> “&debug=true” to the query and look at the parsed query. Ignore all the 
>> relevance calculations for the nonce, or specify “&debug=query” to skip that 
>> part. 
>> 
>> 90% + of the time, the question “why didn’t this query do what I expect” is 
>> answered by looking at the “&debug=query” output and the analysis page in 
>> the admin UI. NOTE: for the analysis page be sure to look at _both_ the 
>> query and index output. Also, and very important about the analysis page 
>> (and this is confusing) is that this _assumes_ that what you put in the text 
>> boxes have made it through the query parser intact and is analyzed by the 
>> field selected. Consider the search "q=field:word1 word2". Now you type 
>> “word1 word2” into the analysis text box and it looks like what you expect. 
>> That’s misleading because the query is _parsed_ as "field:word1 
>> default_search_field:word2”. This is where “&debug=query” helps.
>> 
>> Best,
>> Erick
>> 
>>> On Nov 6, 2019, at 2:36 AM, Paras Lehana  wrote:
>>> 
>>> Hi Walter,
>>> 
>>> The solr.StopFilter removes all tokens that are stopwords. Those words will
 not be in the index, so they can never match a query.
>>> 
>>> 
>>> I think the OP's concern is different results when adding a stopword. I
>>> think he's using the filter factory correctly - the query chain includes
>>> the filter as well so it should remove "a" while querying.
>>> 
>>> *@Guilherme*, please post results for both the query, the document in
>>> result you are concerned about and post full result of analysis screen (for
>>> both query and index).
>>> 
>>> On Tue, 5 Nov 2019 at 21:38, Walter Underwood  wrote:
>>> 
 No.
 
 The solr.StopFilter removes all tokens that are stopwords. Those words
 will not be in the index, so they can never match a query.
 
 1. Remove the lines with solr.StopFilter from every analysis chain in
 

Re: Leader node on specific host machines?

2019-11-06 Thread Koen De Groote
Hello Erick,

Sorry for the late reply. I worked with this setting a bit and it works as
expected.

Indeed, I was not aware of the leader/follower task distribution and what
you say shines a different light on things.

Regardless, I now know about this property and can use it effectively,
which I could not before.

Thanks!

Best regards,
Koen De Groote


On Mon, Oct 28, 2019 at 12:51 PM Erick Erickson 
wrote:

> There’s the preferredLeader property, see:
> https://lucene.apache.org/solr/guide/6_6/collections-api.html
>
> That said, this was put in for situations where there were 100s of shards
> with replicas from many shards hosted on any given machine, so it was
> possible in that setup to have 100 or more leaders on a single node.
>
> In the usual case, the leader role doesn’t do very much extra work, and
> the extra work is mostly distributing the incoming documents to the
> followers during indexing (mostly I/O). During query time, the leader has
> no extra duties at all. So if “heavy use” means heavy querying, it
> shouldn’t make any appreciable difference.
>
> I would urge you to have evidence that this was worth the effort before
> spending time on it. And, the “preferredLeader” property is just that, a
> preference all things being equal. It’s still possible for a leader to be a
> different replica, otherwise you’d defeat the whole point of trying for HA.
>
> For TLOG and PULL setups, the leader will always be a TLOG replica, so you
> could strategically place them to get what you want. In this case, the
> leader indeed has a lot more work to do than the follower so it makes more
> sense.
>
> Best,
> Erick
>
> > On Oct 28, 2019, at 6:13 AM, Koen De Groote 
> wrote:
> >
> > Hello,
> >
> > I'm looking for a way to configure my collections as such that the leader
> > nodes of specific collections never share the same host.
> >
> > This as a way to prevent several large and/or heavy-usage collections on
> > the same machine.
> >
> > Is this something I can set in solrconfig.xml? Or are there rules for
> this?
> >
> > Kind regards,
> > Koen De Groote
>
>


ConcurrentModificationException in SolrInputDocument writeMap

2019-11-06 Thread Tim Swetland
I'm currently running into a ConcurrentModificationException ingesting data
as we attempt to upgrade from Solr 8.1 to 8.2. It's not every document, but
it definitely appears regularly in our logs. We didn't run into this
problem in 8.1, so I'm not sure what might have changed. I feel like this
is probably a bug, but if there's a workaround or if there's an idea of
something I might be doing wrong, please let me know.

Stack trace:
o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling
SolrCmdDistributor$Req: cmd=add{_version=,id=}; node=StdNode:
https:///solr/coll_shard1_replica_n2/ to
https:///solr/coll_shard1_replica_n2/
=> java.util.ConcurrentModificationException
at java.util.LinkedHashMap.forEach(LinkedHashMap.java:686)
java.util.ConcurrentModificationException: null
  at java.util.LinkedHashMap.forEach(LinkedHashMap.java:686)
  at
org.apache.solr.common.SolrInputDocument.writeMap(SolrInputDocument.java:51)
  at
org.apache.solr.common.util.JavaBinCodec.writeSolrInputDocument(JavaBinCodec.java:658)
  at
org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:383)
  at
org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253)
  at
org.apache.solr.common.util.JavaBinCodec.writeMapEntry(JavaBinCodec.java:813)

  at
org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:411)

  at
org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253)
  at
org.apache.solr.common.util.JavaBinCodec.writeIterator(JavaBinCodec.java:750)

  at
org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:395)

  at
org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253)
  at
org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:248)

  at
org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:355)

  at
org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:253)
  at
org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:167)
  at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.marshal(JavaBinUpdateRequestCodec.java:102)
  at
org.apache.solr.client.solrj.impl.BinaryRequestWriter.write(BinaryRequestWriter.java:83)
  at
org.apache.solr.client.solrj.impl.Http2SolrClient.send(Http2SolrClient.java:338)

  at
org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.sendUpdateStream(ConcurrentUpdateHttp2SolrClient.java:231)

  at
org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.run(ConcurrentUpdateHttp2SolrClient.java:176)

  at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
  at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil
.java:209)
  at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

  at java.lang.Thread.run(Thread.java:748)


Re: Ref Guide - Precision & Recall of Analyzers

2019-11-06 Thread Mikhail Khludnev
Hello, Audrey.

Can you create a regexp capturing all-caps for
https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#pattern-replace-filter
 ?

On Wed, Nov 6, 2019 at 6:36 AM Audrey Lorberfeld - audrey.lorberf...@ibm.com
 wrote:

> I would also love to know what filter to use to ignore capitalized
> acronyms... which one can do this OOTB?
>
> --
> Audrey Lorberfeld
> Data Scientist, w3 Search
> IBM
> audrey.lorberf...@ibm.com
>
>
> On 11/6/19, 3:54 AM, "Paras Lehana"  wrote:
>
> Hi Community,
>
> In Ref Guide 8.3's *Understanding Analyzers, Tokenizers, and Filters*
> <https://lucene.apache.org/solr/guide/8_3/understanding-analyzers-tokenizers-and-filters.html>
> section, the text talks about precision and recall depending on how
> you use
> analyzers during query and index time:
>
> > For indexing, you often want to simplify, or normalize, words. For
> > example, setting all letters to lowercase, eliminating punctuation and
> > accents, mapping words to their stems, and so on. Doing so can
> > *increase recall* because, for example, "ram", "Ram" and "RAM" would
> > all match a query for "ram". To *increase query-time precision*, a
> > filter could be employed to narrow the matches by, for example,
> > *ignoring all-cap acronyms* if you’re interested in male sheep, but
> > not Random Access Memory.
>
>
> In the first case (about Recall), is it assumed that "ram" should match
> all three? *[Q1]* Because, to increase recall, we have to decrease false
> negatives (relevant documents that are not retrieved). Otherwise (if the
> three are not intended to match the query), precision is actually
> decreased here (false positives are increased).
>
> This makes sense for the second case, where precision should increase as
> we are decreasing false positives (documents wrongly marked relevant).
>
> However, the text talks about the method of "employing a filter that
> ignores all-cap acronyms". How are we supposed to do that at query time?
> *[Q2]* Weren't we supposed to remove the filter (LCF) at index time?
>
>
> --
> --
> Regards,
>
> *Paras Lehana* [65871]
> Development Engineer, Auto-Suggest,
> IndiaMART Intermesh Ltd.
>
> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> Noida, UP, IN - 201303
>
> Mob.: +91-9560911996
> Work: 01203916600 | Extn:  *8173*
>
>
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Ref Guide - Precision & Recall of Analyzers

2019-11-06 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
I would also love to know what filter to use to ignore capitalized acronyms... 
which one can do this OOTB?

-- 
Audrey Lorberfeld
Data Scientist, w3 Search
IBM
audrey.lorberf...@ibm.com
 

On 11/6/19, 3:54 AM, "Paras Lehana"  wrote:

Hi Community,

In Ref Guide 8.3's *Understanding Analyzers, Tokenizers, and Filters*
section, the text talks about precision and recall depending on how you use
analyzers during query and index time:

For indexing, you often want to simplify, or normalize, words. For example,
> setting all letters to lowercase, eliminating punctuation and accents,
> mapping words to their stems, and so on. Doing so can *increase recall* because,
> for example, "ram", "Ram" and "RAM" would all match a query for "ram". To
> *increase query-time precision*, a filter could be employed to narrow the matches
> by, for example, *ignoring all-cap acronyms* if you’re interested in male
> sheep, but not Random Access Memory.


In the first case (about recall), is it assumed that "ram" should match all
three? *[Q1]* Because, to increase recall, we have to decrease false
negatives (relevant documents that are not retrieved). In the other case (if the
three are not intended to match the query), precision actually decreases
here (false positives increase).

This makes sense for the second case, where precision should increase as we
are decreasing false positives (documents wrongly marked as relevant).

However, the text talks about the method of "employing a filter that
ignores all-cap acronyms". How are we supposed to do that at query time?
*[Q2]* Weren't we supposed to remove the filter (LCF) during index time?


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*





Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-06 Thread Guilherme Viteri
Thanks Erick.

> First, your index and analysis chains are considerably different, this can 
> easily be a source of problems. In particular, using two different tokenizers 
> is a huge red flag. I _strongly_ recommend against this unless you’re totally 
> sure you understand the consequences. Additionally, your use of the length 
> filter is suspicious, especially since your problem statement is about the 
> addition of a single letter term and the min length allowed on that filter is 
> 2. That said, it’s reasonable to suppose that the ’a’ is filtered out in both 
> cases, but maybe you’ve found something odd about the interactions.
I will investigate the min length and post the results later.

> Second, I have no idea what this will do. Are the equal signs typos? Used by 
> custom code?
This is the URL in my application, not Solr params. That's the query string.

> What does “species=“ do? That’s not Solr syntax, so it’s likely that all the 
> params with an equal-sign are totally ignored unless it’s just a typo.
This is part of the application. Species will be used later on in Solr to
filter the results. That's not Solr syntax; those are my app's params.

> Third, the easiest way to see what’s happening under the covers is to add 
> “&debug=true” to the query and look at the parsed query. Ignore all the 
> relevance calculations for the nonce, or specify “&debug=query” to skip that 
> part. 
The two JSON files I've sent were generated with debugQuery=on, and the
explain tag is present.
I will try searching the way you mentioned.

Thanks for your inputs.

Guilherme

> On 6 Nov 2019, at 14:14, Erick Erickson  wrote:
> 
> Fwd to another server
> 
> First, your index and analysis chains are considerably different, this can 
> easily be a source of problems. In particular, using two different tokenizers 
> is a huge red flag. I _strongly_ recommend against this unless you’re totally 
> sure you understand the consequences. Additionally, your use of the length 
> filter is suspicious, especially since your problem statement is about the 
> addition of a single letter term and the min length allowed on that filter is 
> 2. That said, it’s reasonable to suppose that the ’a’ is filtered out in both 
> cases, but maybe you’ve found something odd about the interactions.
> 
> Second, I have no idea what this will do. Are the equal signs typos? Used by 
> custom code?
> 
>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
> 
> What does “species=“ do? That’s not Solr syntax, so it’s likely that all the 
> params with an equal-sign are totally ignored unless it’s just a typo.
> 
> Third, the easiest way to see what’s happening under the covers is to add 
> “&debug=true” to the query and look at the parsed query. Ignore all the 
> relevance calculations for the nonce, or specify “&debug=query” to skip that 
> part. 
> 
> 90%+ of the time, the question “why didn’t this query do what I expect” is
> answered by looking at the “&debug=query” output and the analysis page in the
> admin UI. NOTE: for the analysis page, be sure to look at _both_ the query and
> index output. Also, a very important point about the analysis page (and this
> is confusing) is that it _assumes_ that what you put in the text boxes has
> made it through the query parser intact and is analyzed by the field
> selected. Consider the search "q=field:word1 word2". Now you type “word1
> word2” into the analysis text box and it looks like what you expect. That’s
> misleading because the query is _parsed_ as "field:word1
> default_search_field:word2”. This is where “&debug=query” helps.
> 
> Best,
> Erick
> 
>> On Nov 6, 2019, at 2:36 AM, Paras Lehana  wrote:
>> 
>> Hi Walter,
>> 
>> The solr.StopFilter removes all tokens that are stopwords. Those words will
>>> not be in the index, so they can never match a query.
>> 
>> 
>> I think the OP's concern is different results when adding a stopword. I
>> think he's using the filter factory correctly - the query chain includes
>> the filter as well so it should remove "a" while querying.
>> 
>> *@Guilherme*, please post results for both the query, the document in
>> result you are concerned about and post full result of analysis screen (for
>> both query and index).
>> 
>> On Tue, 5 Nov 2019 at 21:38, Walter Underwood  wrote:
>> 
>>> No.
>>> 
>>> The solr.StopFilter removes all tokens that are stopwords. Those words
>>> will not be in the index, so they can never match a query.
>>> 
>>> 1. Remove the lines with solr.StopFilter from every analysis chain in
>>> schema.xml.
>>> 2. Reload the collection, restart Solr, or whatever to read the new config.
>>> 3. Reindex all of the documents.
>>> 
>>> When indexed with the new analysis chain, the stopwords will not be
>>> removed and they will be searchable.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
 On Nov 5, 2019

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-06 Thread Erick Erickson
First, your index and analysis chains are considerably different, this can 
easily be a source of problems. In particular, using two different tokenizers 
is a huge red flag. I _strongly_ recommend against this unless you’re totally 
sure you understand the consequences. Additionally, your use of the length 
filter is suspicious, especially since your problem statement is about the 
addition of a single letter term and the min length allowed on that filter is 
2. That said, it’s reasonable to suppose that the ’a’ is filtered out in both 
cases, but maybe you’ve found something odd about the interactions.

Second, I have no idea what this will do. Are the equal signs typos? Used by 
custom code?

>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true

What does “species=“ do? That’s not Solr syntax, so it’s likely that all the 
params with an equal-sign are totally ignored unless it’s just a typo.

Third, the easiest way to see what’s happening under the covers is to add 
“&debug=true” to the query and look at the parsed query. Ignore all the 
relevance calculations for the nonce, or specify “&debug=query” to skip that 
part. 

90%+ of the time, the question “why didn’t this query do what I expect” is
answered by looking at the “&debug=query” output and the analysis page in the
admin UI. NOTE: for the analysis page, be sure to look at _both_ the query and
index output. Also, a very important point about the analysis page (and this is
confusing) is that it _assumes_ that what you put in the text boxes has made
it through the query parser intact and is analyzed by the field selected.
Consider the search "q=field:word1 word2". Now you type “word1 word2” into the
analysis text box and it looks like what you expect. That’s misleading because
the query is _parsed_ as "field:word1 default_search_field:word2”. This is
where “&debug=query” helps.

Best,
Erick

> On Nov 6, 2019, at 2:36 AM, Paras Lehana  wrote:
> 
> Hi Walter,
> 
> The solr.StopFilter removes all tokens that are stopwords. Those words will
>> not be in the index, so they can never match a query.
> 
> 
> I think the OP's concern is different results when adding a stopword. I
> think he's using the filter factory correctly - the query chain includes
> the filter as well so it should remove "a" while querying.
> 
> *@Guilherme*, please post results for both the query, the document in
> result you are concerned about and post full result of analysis screen (for
> both query and index).
> 
> On Tue, 5 Nov 2019 at 21:38, Walter Underwood  wrote:
> 
>> No.
>> 
>> The solr.StopFilter removes all tokens that are stopwords. Those words
>> will not be in the index, so they can never match a query.
>> 
>> 1. Remove the lines with solr.StopFilter from every analysis chain in
>> schema.xml.
>> 2. Reload the collection, restart Solr, or whatever to read the new config.
>> 3. Reindex all of the documents.
>> 
>> When indexed with the new analysis chain, the stopwords will not be
>> removed and they will be searchable.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Nov 5, 2019, at 8:56 AM, Guilherme Viteri  wrote:
>>> 
>>> OK, I am kinda lost now.
>>> If I open up the console > analysis and perform it, that's the final
>> result.
>>> 
>>> 
>>> Your suggestion is: get rid of the <filter class="solr.StopFilterFactory" .../>
>>> in the schema.xml and, during the index phase, replaceAll("in stopwords.txt", " ")
>>> then add to Solr. Is that correct?
>>> 
>>> Thanks David
>>> 
 On 5 Nov 2019, at 14:48, David Hastings wrote:
 
 Fwd to another server
 
 no,
 <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
 
 is still using stopwords and should be removed, in my opinion of course,
 based on your use case may be different, but i generally axe any
>> reference
 to them at all
 
 On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri wrote:
 
> Thanks.
> Haven't I done this here ?
> <fieldType name="…" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
>   <analyzer>
>     <tokenizer class="…"/>
>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
>   </analyzer>
> </fieldType>
> 
> 
>> On 5 Nov 2019, at 14:15, David Hastings wrote:
>> 
>> Fwd to another server
>> 
>> The first thing you should do is remove any reference to stop words
>> and
>> never use them, then re-index your data and try it again.
>> 
>> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri wrote:
>> 
>>> Hi,
>>> 
>>> I am performing a search to match a name (text_field), however this term
>>> contains 'and' and 'a' and it doesn't return any records. If I remove 'a'
>>> then it works.
>>> e.g
>>> 
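
(A minimal sketch of the change Walter's three steps describe: the same kind
of chain with the StopFilterFactory line simply deleted. The field type name
and tokenizer here are assumptions, not the poster's actual schema:

  <fieldType name="text_keep_stopwords" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Note that a LengthFilterFactory with min="2", as in the schema quoted above,
would still eat single-letter terms like "a" even with the stop filter gone.)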

Re: Need some help on solr versions (LTS vs stable)

2019-11-06 Thread Erick Erickson
It’s variable. The policy is that we try very hard to maintain back-compat
across one major version. So generally, if you start with, say, 7x, upgrading
to 8x should be relatively straightforward. However, you will _not_ be able to
upgrade from 7x to 9x; you must re-index everything from scratch.

The development process is this:

- People work on “master”, the future 9.0

- most changes are back-ported to the current one-less-major-version, in this 
case 8x. Periodically (on no fixed schedule, but usually 3-4 times a year) a 
new 8x version is released. Some changes to master are not backported as they 
are major changes that would be difficult/impossible to backport.

- at some point, especially when enough non-backported changes have 
accumulated, we decide to release 9.0 and everything bumps up one, i.e. master 
is the future 10.0, work is done there and backported to the stable 9x 

- In the current situation, where work is done on the future 9.0 and 8.x is the 
stable branch, there will be _no_ work done on 7x excepting egregious problems 
which at this point are pretty much exclusively security vulnerabilities. 

- As I said, it’s variable. I expect 9.0 to happen sometime in the first half 
of next year, but there are no solid plans for that, it’s just how I personally 
think things are shaping up.

- Finally, the transition from the last release of a major version to the first 
release of a new major version is _usually_ not a huge deal. New major releases 
are free to remove deprecated methods and processes though, so that’s one thing 
to watch for.

So in a nutshell, if you are starting a new project you have two choices:

- use the latest 8.x. That’ll get you the longest period during which fixes
will be made to that branch, although development will taper off on that branch
as 9.0 gets released. A variant here is to start with 8x, and if 9.0 gets
released before go-live, try upgrading part way through the project.

- If your time-frame is long enough, start with master (the future 9.0) which 
you’ll have to compile yourself, understanding that
  - it may be unstable
  - the timeframe for an official release is not fixed.



> On Nov 6, 2019, at 1:00 AM, suyog joshi  wrote:
> 
> Hi Team,
> 
> Can you please guide us on below queries for solr versions ?
> 
> 1. Are there any major differences (for security, platform stability etc)
> between  current LTS and Stable Solr version ?
> 2. How long a version remains in LTS before becoming EoL ?
> 3. How frequently LTS version gets changed ?
> 4. What will be the next LTS version for Solr (current is 7.7.x)?
> 
> 
> Kindly advise; your guidance will be really helpful for us to select the
> correct version in our infra.
> 
> Regards,
> Suyog Joshi
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Solr 8.3 admin ui collection selector disabled

2019-11-06 Thread Erick Erickson
Whew! I often work in a private window to lessen these kinds of “surprises”…

> On Nov 6, 2019, at 4:35 AM, Jörn Franke  wrote:
> 
> Never mind. Restart of browser worked.
> 
>> Am 06.11.2019 um 10:32 schrieb Jörn Franke :
>> 
>> Hi,
>> 
>> After upgrading to Solr 8.3 I observe that in the Admin UI the collection 
>> selector is greyed out. I am using Chrome. The core selector works as 
>> expected.
>> 
>> Any idea why this is happening?
>> 
>> Thank you.
>> 
>> Best regards



Re: Questions about corrupted Segments files.

2019-11-06 Thread Dmitry Kan
Hi Kaya,

Try luke:
http://dmitrykan.blogspot.com/2018/01/new-luke-on-javafx.html

Best,

Dmitry
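
(On the '-fix' error quoted below, a hedged aside: in Lucene 5 and later,
CheckIndex renamed that option to '-exorcise', so the invocation would look
something like

  java -cp lucene-core-7.7.2.jar org.apache.lucene.index.CheckIndex solr/server/solr/basic_copy/data/index -exorcise

Be aware that -exorcise removes any unreadable segments outright, discarding
the documents in them; it restores index consistency, it does not recover
lost data.)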

On Wed 6. Nov 2019 at 3.24, Kayak28  wrote:

> Hello, Community members:
>
> I am using Solr 7.7.2.
> The other day, while indexing to Solr, my computer powered off.
> As a result, there are corrupted segment files.
>
> Is there any way to fix the corrupted segment files without re-indexing?
>
> I have read a blog post (in Japanese) about the CheckIndex method, which
> can be used to detect/fix corrupted segment files, but when I tried to run
> the following command, I got the error message below.
> So, I am not sure if CheckIndex can actually fix the index files.
>
>
> java -cp lucene-core-7.7.2.jar -ea:org.apache.lucene...
> org.apache.lucene.index.CheckIndex solr/server/solr/basic_copy/data/index
> -fix
>
>
> ERROR: unexpected extra argument '-fix'
>
>
>
> If anybody knows about either a way to fix corrupted segment files or a
> way to use checkIndex '-fix' option correctly, could you please let me
> know?
>
> Any clue will be very appreciated.
>
> Sincerely,
> Kaya Ota
>
>
>
-- 
-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: [Q] Ref Guide - What is Multi-Term Expansion?

2019-11-06 Thread Erick Erickson
Say you want to search for “run*”. That should match “run”, “runner”,
“running”, “runs”, etc. One term -> many terms == multiterm expansion.
Conceptually, the search becomes (run OR runner OR running OR runs), i.e. all
terms actually found in the index that have the prefix “run”.

My advice would be to ignore it completely; that’s an expert-level option that
came about because we got really tired of explaining that wildcards didn’t use
to have _any_ analysis done, so searching for “Run*” would not match “run” due
to the case difference.
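
For what it’s worth, a minimal sketch of an explicit multiterm chain (the field
name and filters are assumptions, not a prescribed setup):

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index"> … </analyzer>
    <analyzer type="query"> … </analyzer>
    <analyzer type="multiterm">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Here “Run*” is kept whole and lowercased before expansion, so it matches “run”,
“running”, etc., while stemming and other multiterm-unsafe filters never touch
the wildcard term.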

Best,
Erick
> On Nov 6, 2019, at 7:21 AM, Alexandre Rafalovitch  wrote:
> 
> It mentions it in the opening paragraph: "Prefix, Wildcard, Regex, etc."
> 
> So, if you search for "abc*", it expands to all terms that start with
> "abc", but then not everything can handle this situation, as it puts a
> lot of terms in the same position. So, not all analyzers can handle
> that, and normally it is just an automatically built subset of safe
> ones.
> 
> I mark them with "(multi)" in my - very out of date, but still useful
> - resource: http://www.solr-start.com/info/analyzers/
> 
> Regards,
>   Alex.
> 
> On Wed, 6 Nov 2019 at 21:19, Paras Lehana  wrote:
>> 
>> Hi Community,
>> 
>> In Ref Guide 8.3's Understanding Analyzers subsection *Analysis for
>> Multi-Term Expansion*,
>> the text talks about multi-term expansion and explicit use of *analyzer
>> type="multiterm"*.
>> 
>> I could not understand what exactly is multi-term expansion and what are
>> the use cases for using "multiterm". *[Q1]*
>> 
>> --
>> --
>> Regards,
>> 
>> *Paras Lehana* [65871]
>> Development Engineer, Auto-Suggest,
>> IndiaMART Intermesh Ltd.
>> 
>> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
>> Noida, UP, IN - 201303
>> 
>> Mob.: +91-9560911996
>> Work: 01203916600 | Extn:  *8173*
>> 



Re: [Q] Ref Guide - What is Multi-Term Expansion?

2019-11-06 Thread Alexandre Rafalovitch
It mentions it in the opening paragraph: "Prefix, Wildcard, Regex, etc."

So, if you search for "abc*", it expands to all terms that start with
"abc", but then not everything can handle this situation, as it puts a
lot of terms in the same position. So, not all analyzers can handle
that, and normally it is just an automatically built subset of safe
ones.

I mark them with "(multi)" in my - very out of date, but still useful
- resource: http://www.solr-start.com/info/analyzers/

Regards,
   Alex.

On Wed, 6 Nov 2019 at 21:19, Paras Lehana  wrote:
>
> Hi Community,
>
> In Ref Guide 8.3's Understanding Analyzers subsection *Analysis for
> Multi-Term Expansion*,
> the text talks about multi-term expansion and explicit use of *analyzer
> type="multiterm"*.
>
> I could not understand what exactly is multi-term expansion and what are
> the use cases for using "multiterm". *[Q1]*
>
> --
> --
> Regards,
>
> *Paras Lehana* [65871]
> Development Engineer, Auto-Suggest,
> IndiaMART Intermesh Ltd.
>
> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> Noida, UP, IN - 201303
>
> Mob.: +91-9560911996
> Work: 01203916600 | Extn:  *8173*
>


[Contribute] Merge Tokenizers/Filters About and Description Sections

2019-11-06 Thread Paras Lehana
Hi Community,

In Ref Guide 8.3's *Understanding Analyzers, Tokenizers, and
Filters* section,
I see that after giving general information about Analyzers, there are
subsections in the order "*About Tokenizers", "About Filters", "Tokenizers"* and
*"Filter Descriptions"*. Note how the description for Tokenizers is given after
the briefing about Filters. The About and Description
sections overlap, duplicating information. Also, as the text stresses,
Tokenization and Filters are actually part of Analysis.

What I want to suggest is to have one (About) Analysis subsection with the
text of About Filters and About Tokenizers merged. The division will look
nicer with more descriptive text in the following subsections. The
order could be:

*Analyzers > Tokenizers (Description) > Filters (Description) > ...*

If we are willing to keep tokenizer and filter definitions separate, we
can merge the About and Description sections of Filters/Tokenizers
together.

I'm reading the reference guide for the third time, and every time this flow
has seemed odd to me; thus, I'm sharing it with you all.

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



[Q] Ref Guide - What is Multi-Term Expansion?

2019-11-06 Thread Paras Lehana
Hi Community,

In Ref Guide 8.3's Understanding Analyzers subsection *Analysis for
Multi-Term Expansion*,
the text talks about multi-term expansion and explicit use of *analyzer
type="multiterm"*.

I could not understand what exactly is multi-term expansion and what are
the use cases for using "multiterm". *[Q1]*

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*



Re: Solr 8.3 Solrj streaming expressions do not return all field values

2019-11-06 Thread Jörn Franke
I have now checked the Solr 8.3 server in the admin UI. Same issue.

Reproduction:
select(search(testcollection,q="test",df="Default",defType="edismax",fl="id",
qt="/export", sort="id asc"),id,if(eq(1,1),Y,N) as found)

In 8.3 it returns only the id field.
In 8.2 it returns the id and found fields.

Since found is generated by select (and not coming from the collection),
there must be an issue with select.

Any idea why this is happening?

Debug logs do not show any error, and the expression is correctly received by
Solr.

Thank you.

Best regards

> Am 05.11.2019 um 14:59 schrieb Jörn Franke :
> 
> Thanks, I will check and come back to you. As far as I remember (but have to
> check), the queries generated by Solr were correct.
> 
> Just to be clear the same thing works with Solr 8.2 server and Solr 8.2 
> client.
> 
> It shows the odd behaviour with Solr 8.2 server and Solr 8.3 client.
> 
>> Am 05.11.2019 um 14:49 schrieb Joel Bernstein :
>> 
>> I'll probably need some more details. One thing that's useful is to look at
>> the logs and see the underlying Solr queries that are generated. Then try
>> those underlying queries against the Solr index and see what comes back. If
>> you're not seeing the fields with the plain Solr queries then we know it's
>> something going on below streaming expressions. If you are seeing the
>> fields then it's the expressions themselves that are not handling the data
>> as expected.
>> 
>> 
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>> 
>> 
 On Mon, Nov 4, 2019 at 9:09 AM Jörn Franke  wrote:
>>> 
>>> Most likely this issue can bei also reproduced in the admin UI for the
>>> streaming handler of a collection.
>>> 
> Am 04.11.2019 um 13:32 schrieb Jörn Franke :
 
 Hi,
 
 I use streaming expressions, e.g.
 Sort(Select(search(...),id,if(eq(1,1),Y,N) as found), by=“field A asc”)
 (Using export handler, sort is not really mandatory, I will remove it
 later anyway)
 
 This works perfectly fine if I use Solr 8.2.0 (server + client). It
>>> returns Tuples in the form { “id”,”12345”, “found”:”Y”}
 
 However, if I use Solr 8.2.0 as server and Solr 8.3.0 as client then the
>>> above statement only returns the id field, but not the found field.
 
 Questions:
 1) is this expected behavior, ie Solr client 8.3.0 is in this case not
>>> compatible with Solr 8.2.0 and server upgrade to Solr 8.3.0 will fix this?
 2) has the syntax for the above expression changed? If so how?
 3) is this not expected behavior and I should create a Jira for it?
 
 Thank you.
 Best regards
>>> 


Re: Solr 8.3 admin ui collection selector disabled

2019-11-06 Thread Jörn Franke
Never mind. Restart of browser worked.

> Am 06.11.2019 um 10:32 schrieb Jörn Franke :
> 
> Hi,
> 
> After upgrading to Solr 8.3 I observe that in the Admin UI the collection 
> selector is greyed out. I am using Chrome. The core selector works as 
> expected.
> 
> Any idea why this is happening?
> 
> Thank you.
> 
> Best regards


Solr 8.3 admin ui collection selector disabled

2019-11-06 Thread Jörn Franke
Hi,

After upgrading to Solr 8.3 I observe that in the Admin UI the collection 
selector is greyed out. I am using Chrome. The core selector works as expected.

Any idea why this is happening?

Thank you.

Best regards

Ref Guide - Precision & Recall of Analyzers

2019-11-06 Thread Paras Lehana
Hi Community,

In Ref Guide 8.3's *Understanding Analyzers, Tokenizers, and Filters*
section, the text talks about precision and recall depending on how you use
analyzers during query and index time:

For indexing, you often want to simplify, or normalize, words. For example,
> setting all letters to lowercase, eliminating punctuation and accents,
> mapping words to their stems, and so on. Doing so can *increase recall* because,
> for example, "ram", "Ram" and "RAM" would all match a query for "ram". To
> *increase query-time precision*, a filter could be employed to narrow the matches
> by, for example, *ignoring all-cap acronyms* if you’re interested in male
> sheep, but not Random Access Memory.


In the first case (about recall), is it assumed that "ram" should match all
three? *[Q1]* Because, to increase recall, we have to decrease false
negatives (relevant documents that are not retrieved). In the other case (if the
three are not intended to match the query), precision actually decreases
here (false positives increase).

This makes sense for the second case, where precision should increase as we
are decreasing false positives (documents wrongly marked as relevant).

However, the text talks about the method of "employing a filter that
ignores all-cap acronyms". How are we supposed to do that at query time?
*[Q2]* Weren't we supposed to remove the filter (LCF) during index time?
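
(For what it's worth, one way to read that passage, sketched with a
PatternReplaceFilterFactory and made-up names rather than anything the guide
prescribes: keep the index chain permissive and make only the query chain
stricter, e.g.

  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[A-Z]{2,}$" replacement=""/>
    <filter class="solr.LengthFilterFactory" min="1" max="255"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>

Indexing still lowercases everything (recall), while a query token that is
entirely capitals is dropped before lowercasing, so "RAM" no longer falls
together with "ram" (precision).)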


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*
