Re: Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-18 Thread Damien Kamerman
A suggester rebuild will mmap the entire index, so you'll need free memory
depending on your index size.

On 19 September 2017 at 13:47, shamik  wrote:

> I agree, I should have made it clear in my initial post. The reason I
> thought it was fairly trivial is that the newly introduced collection has
> only a few hundred documents and is not being used in search yet. Nor is it
> being indexed at a regular interval. The cache parameters are kept to a
> minimum as well. But there might be overheads to simply creating a
> collection that I'm not aware of.
>
> I did bring down the heap size to 8gb, changed to G1 and reduced the cache
> params. The memory has been holding up so far, but I'll wait a while before
> passing judgment.
>
>  autowarmCount="0"/>
>  autowarmCount="0"/>
>  autowarmCount="0"/>
>  initialSize="0" autowarmCount="10" regenerator="solr.NoOpRegenerator" />
>  showItems="0" />
>
> The change seems to have increased the number of slow queries (1000 ms),
> but I'm willing to trade some performance to address the OOM at this point.
> One thing I realized is that I provided the wrong index size here: it's
> 49gb, not 25, which I mistakenly picked from a single shard. I hope the
> heap size will continue to hold up for an index of that size.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Using SolrJ 5.5.4 with Solr 6.5

2017-09-18 Thread Felix Stanley
Hi there,

We are planning to use SolrJ 5.5.4 to query Solr 6.5.

The reason is that we have to rely on JDK 1.7 on the client side, and as far
as I know SolrJ 6.x.x only supports JDK 1.8.

I understand that SolrJ generally maintains backwards/forward compatibility,
based on this article:

https://wiki.apache.org/solr/Solrj

Would there be any exceptions we need to be cautious of for this specific
combination of versions?
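
For reference, the kind of query we plan to issue is nothing exotic. A minimal
sketch against the SolrJ 5.5.4 API (the URL, collection name and field are
placeholders, not our real setup):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class CompatQueryCheck {
    public static void main(String[] args) throws Exception {
        // SolrJ 5.5.4 client pointed at a Solr 6.5 collection
        SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
        try {
            SolrQuery query = new SolrQuery("*:*");
            query.setRows(5);
            QueryResponse rsp = client.query(query);
            for (SolrDocument doc : rsp.getResults()) {
                System.out.println(doc.getFieldValue("id"));
            }
        } finally {
            client.close();
        }
    }
}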

Thanks a lot.

Best Regards,

Felix Stanley



Re: Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-18 Thread shamik
I agree, I should have made it clear in my initial post. The reason I thought
it was fairly trivial is that the newly introduced collection has only a few
hundred documents and is not being used in search yet. Nor is it being indexed
at a regular interval. The cache parameters are kept to a minimum as well. But
there might be overheads to simply creating a collection that I'm not aware of.

I did bring down the heap size to 8gb, changed to G1 and reduced the cache
params. The memory has been holding up so far, but I'll wait a while before
passing judgment.

 autowarmCount="0"/>
 autowarmCount="0"/>
 autowarmCount="0"/>
 initialSize="0" autowarmCount="10" regenerator="solr.NoOpRegenerator" />
 showItems="0" />

The change seems to have increased the number of slow queries (1000 ms), but
I'm willing to trade some performance to address the OOM at this point. One
thing I realized is that I provided the wrong index size here: it's 49gb, not
25, which I mistakenly picked from a single shard. I hope the heap size will
continue to hold up for an index of that size.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Dates and DataImportHandler

2017-09-18 Thread Jamie Jackson
Hi folks,

My DB server is on America/Chicago time. Solr (on Docker) is running on
UTC. Dates coming from my (MariaDB) data source seem to get translated
properly into the Solr index without me doing anything special.

However, when doing delta imports using last_index_time (
http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport ), I
can't seem to get the date, which Solr provides, to be understood by the DB
as being UTC (and translated back accordingly). In other words, the DB treats
the Solr UTC date as local time, so it thinks the date is ahead by six hours.
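
To illustrate the mismatch (a standalone sketch, not part of my DIH config; the
timestamp is made up, and the gap is five hours during daylight saving time,
six otherwise):

import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class LastIndexTimeMismatch {
    public static void main(String[] args) {
        // last_index_time is written as a plain wall-clock string; the zone is only implied.
        LocalDateTime wallClock = LocalDateTime.parse("2017-09-18T12:00:00");

        // What Solr meant (it runs on UTC) vs. what the DB assumes (its own local zone).
        Instant meant   = wallClock.atZone(ZoneId.of("UTC")).toInstant();
        Instant assumed = wallClock.atZone(ZoneId.of("America/Chicago")).toInstant();

        // Positive duration: the DB's cutoff lands hours after the instant Solr meant,
        // so recently changed rows can be skipped by the delta query.
        System.out.println(Duration.between(meant, assumed)); // PT5H (PT6H outside DST)
    }
}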

'${dataimporter.request.clean}' != 'false'

or dt > '${dataimporter.last_index_time}'

I came up with this workaround, which seems to work:

'${dataimporter.request.clean}' != 'false'

/* ${user.timezone} is UTC, and the ${custom.dataimporter.datasource.tz}
property is set to America/Chicago */

or dt > CONVERT_TZ('${dataimporter.last_index_time}','${user.timezone}','${
custom.dataimporter.datasource.tz}')

However, isn't there a way for this translation to happen more naturally?

I thought maybe I could do something like this:



The above did set the property as expected (with a trailing `+`), but that
didn't seem to help the DB understand/translate the date.

Thanks,
Jamie


Re: Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-18 Thread Erick Erickson
Shamik:

bq: The part I'm trying to understand is whether the memory footprint
is higher for 6.6...

bq:  it has two collections, one being introduced with 6.6 upgrade

If I'm reading this right, you added another collection to the system
as part of the upgrade. Of course it will take more memory. Especially
if your new collection is configured to, say, inefficiently use
caches, or you group or sort or facet on fields that are not
docValues. Or.

That information would have saved people quite a bit of time if you'd
posted it first.

Best,
Erick

On Mon, Sep 18, 2017 at 9:03 AM, shamik  wrote:
> Walter, thanks again. Here's some information on the index and search
> feature.
>
> The index size is close to 25gb, with 20 million documents. It has two
> collections, one being introduced with the 6.6 upgrade. The primary
> collection carries the bulk of the index, the newly formed one being aimed
> at getting populated going forward. Besides keyword search, the search has
> a bunch of facets, which are configured to use docValues. The notable
> search features being used are highlighter, query elevation, mlt and
> suggester. The other change from 5.5 was to replace the Porter stemmer with
> a lemmatizer in the analysis chain.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: SOLR and string comparison functions

2017-09-18 Thread Shawn Heisey
On 9/18/2017 4:01 PM, Dariusz Wojtas wrote:
> There is one very important requirement.
> No matter how many parameters are out there, the total result score cannot
> exceed 1 (100%).

Right there, you've got an unrealistic requirement.

Scores are not absolute, they only have meaning relative to each other,
and only within a single query.  A really good match for one query might
have a really low score that normalizes to 30 percent, while a mediocre
match for another query might have a very high score that normalizes to
98 percent, but there are also a dozen other results that normalize to
higher than 98 percent.

Information from experts on why you should never do what you're trying
to do:

https://wiki.apache.org/lucene-java/ScoresAsPercentages

Your clarification email after Emir's reply makes it clear that this is
exactly what you want.

Thanks,
Shawn



Re: SOLR and string comparison functions

2017-09-18 Thread Dariusz Wojtas
Hi Emir,

I am calculating a "normalized" score, as it will later be used by automatic
decisioning processes to determine whether the result found "matches enough".
For example, I might create a rule deciding that a result matches if its score
is higher than 97%; otherwise it is just noise.
I've been thinking about the reranking query parser, but was not able to
create a real-life working example, even something that would show the
concept on just 2 fields and then rerank the result.
I'd be happy to see such an example.
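
Something like the following sketch is what I had in mind (untested; the
collection name, weights and reRankDocs value are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class RerankSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/people");
        // Cheap main query: plain edismax over fullName.
        SolrQuery q = new SolrQuery("{!edismax qf=fullName pf=fullName ps=10 v=$fullName}");
        q.set("fullName", "John Adreew Jr. Doe and Partners");
        // The expensive function score is applied only to the top 200 hits.
        q.set("rq", "{!rerank reRankQuery=$rqq reRankDocs=200 reRankWeight=1}");
        q.set("rqq", "{!func}product(0.3, strdist(\"John Adreew Jr. Doe and Partners\", fullName, edit))");
        System.out.println(solr.query(q).getResults());
        solr.close();
    }
}

The function still runs over reRankDocs documents per query, so the cost
question does not fully go away, but at least it is bounded.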

I have found the answer to my original question; this seems to work:
   {!func v=$global_search_function}
   sum(
  product($firstName.weight, strdist(literal($firstName), firstName,
edit)),
  map($id.weight, 0.0001, 1000, product($id.weight,
strdist(literal($id), id, edit)), 0),
  map($fullName.weight, 0.0001, 1000, product($fullName.weight,
query($fullName_filter)), 0),
 )
   {!edismax qf=fullName pf=fullName ps=10
v=$fullName}

Please see the fullName_filter definition and its usage in the query() above.

But now I am really worried about the performance, as there may be several
more filter fields that may affect the score.

Best regards,
Dariusz



On Tue, Sep 19, 2017 at 12:33 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Darius,
> This seems to me like misuse/misunderstanding of Solr. As you probably
> noticed, Solr score is not normalised - you cannot compare the scores of two
> queries and tell whether one result matches its query better than the other.
> There are some techniques to achieve something close, but that is not that
> straightforward and might depend on your case.
> In your case, you are trying to score with a function query, and depending
> on your index size, it might not perform well. You would probably be better
> off with a custom scorer.
> Back to your question: what are you trying to achieve? When do you consider
> two names to match? Or do you expect to calculate a score for each document
> in the index and return the top-scored ones? Such a solution will not scale.
> IMO, it would be best to rethink your requirement about the score (or use
> the reranking query parser,
> https://cwiki.apache.org/confluence/display/solr/Query+Re-Ranking) and set
> up proper field analysis and the edismax query parser.
> Otherwise, good luck if you have a large index.
>
> Regards,
> Emir
>
> > On 19 Sep 2017, at 00:01, Dariusz Wojtas  wrote:
> >
> > Hi,
> > I am working on an application that searches for entries that may be
> > queried by multiple parameters.
> > These parameters may be sent to SOLR in different sets, each parameter
> with
> > its own weight.
> >
> > Values for the example below might be as follows:
> > firstName=John&
> > firstName.weight=0.2&
> > id=Aw34563456WWA&
> > id.weight=0.5&
> > fullName=John Adreew Jr. Doe and Partners&
> > fullName.weight=0.3
> >
> >
> > There is one very important requirement.
> > No matter how many parameters are out there, the total result score
> cannot
> > exceed 1 (100%).
> > In every case I multiply param weight and result of string comparison.
> > A field may be used in comparison if its weight is greater than 0 (in
> fact
> > greater than 0.0001).
> >
> >  {!func v=$global_search_function}
> >  sum(
> >product($firstName.weight, strdist(literal($firstName),
> > firstName, edit)),
> >map($id.weight, 0.0001, 1000, product($id.weight,
> > strdist(literal($id), id, edit)), 0),
> >map($fullName.weight, 0.0001, 1000,
> > product($fullName.weight, strdist(literal($fullName), fullName,
> ngram,10)),
> > 0),
> >)
> >
> > The question is about comparing fullName above.
> > What function should I use for comparison working on the fullName field
> the
> > same way as:
> >   "John Adreew Jr. Doe and Partners"~10^0.3
> > ?
> >
> > What are the functions that compare strings, other than strdist?
> > How do I create a function similar to the "John Andrew ..." example above?
> >
> >
> > Best regards,
> > Dariusz Wojtas
>
>


Re: SOLR and string comparison functions

2017-09-18 Thread Emir Arnautović
Hi Darius,
This seems to me like misuse/misunderstanding of Solr. As you probably noticed,
Solr score is not normalised - you cannot compare the scores of two queries and
tell whether one result matches its query better than the other. There are some
techniques to achieve something close, but that is not that straightforward and
might depend on your case.
In your case, you are trying to score with a function query, and depending on
your index size, it might not perform well. You would probably be better off
with a custom scorer.
Back to your question: what are you trying to achieve? When do you consider two
names to match? Or do you expect to calculate a score for each document in the
index and return the top-scored ones? Such a solution will not scale.
IMO, it would be best to rethink your requirement about the score (or use the
reranking query parser,
https://cwiki.apache.org/confluence/display/solr/Query+Re-Ranking) and set up
proper field analysis and the edismax query parser.
Otherwise, good luck if you have a large index.

Regards,
Emir

> On 19 Sep 2017, at 00:01, Dariusz Wojtas  wrote:
> 
> Hi,
> I am working on an application that searches for entries that may be
> queried by multiple parameters.
> These parameters may be sent to SOLR in different sets, each parameter with
> its own weight.
> 
> Values for the example below might be as follows:
> firstName=John&
> firstName.weight=0.2&
> id=Aw34563456WWA&
> id.weight=0.5&
> fullName=John Adreew Jr. Doe and Partners&
> fullName.weight=0.3
> 
> 
> There is one very important requirement.
> No matter how many parameters are out there, the total result score cannot
> exceed 1 (100%).
> In every case I multiply param weight and result of string comparison.
> A field may be used in comparison if its weight is greater than 0 (in fact
> greater than 0.0001).
> 
>  {!func v=$global_search_function}
>  sum(
>product($firstName.weight, strdist(literal($firstName),
> firstName, edit)),
>map($id.weight, 0.0001, 1000, product($id.weight,
> strdist(literal($id), id, edit)), 0),
>map($fullName.weight, 0.0001, 1000,
> product($fullName.weight, strdist(literal($fullName), fullName, ngram,10)),
> 0),
>)
> 
> The question is about comparing fullName above.
> What function should I use for comparison working on the fullName field the
> same way as:
>   "John Adreew Jr. Doe and Partners"~10^0.3
> ?
> 
> What are the functions that compare strings, other than strdist?
> How do I create a function similar to the "John Andrew ..." example above?
> 
> 
> Best regards,
> Dariusz Wojtas



SOLR and string comparison functions

2017-09-18 Thread Dariusz Wojtas
Hi,
I am working on an application that searches for entries that may be
queried by multiple parameters.
These parameters may be sent to SOLR in different sets, each parameter with
its own weight.

Values for the example below might be as follows:
firstName=John&
firstName.weight=0.2&
id=Aw34563456WWA&
id.weight=0.5&
fullName=John Adreew Jr. Doe and Partners&
fullName.weight=0.3


There is one very important requirement.
No matter how many parameters are out there, the total result score cannot
exceed 1 (100%).
In every case I multiply param weight and result of string comparison.
A field may be used in comparison if its weight is greater than 0 (in fact
greater than 0.0001).

  {!func v=$global_search_function}
  sum(
product($firstName.weight, strdist(literal($firstName),
firstName, edit)),
map($id.weight, 0.0001, 1000, product($id.weight,
strdist(literal($id), id, edit)), 0),
map($fullName.weight, 0.0001, 1000,
product($fullName.weight, strdist(literal($fullName), fullName, ngram,10)),
0),
)

The question is about comparing fullName above.
What function should I use for comparison working on the fullName field the
same way as:
   "John Adreew Jr. Doe and Partners"~10^0.3
?

What are the functions that compare strings, other than strdist?
How do I create a function similar to the "John Andrew ..." example above?


Best regards,
Dariusz Wojtas


RE: How to remove control characters in stored value at Solr side

2017-09-18 Thread Chris Hostetter

: But, can you then explain why Apache Nutch with SolrJ had this problem? 
: It seems that by default SolrJ does use XML as transport format. We have 
: always used SolrJ which I assumed would default to javabin, but we had
: this exact problem anyway, and solved it by stripping non-character code 
: points.
: 
: When we use SolrJ for querying we clearly see wt=javabin in the logs, 
: but updates showed the problem. Can we fix it anywhere?

wt=javabin indicates what *response* format the client (ie: solrj) is 
requesting from the server ... the format used for the *request* body is 
determined by the client based on the Content-Type of the ContentStream 
it sends to Solr.

When using SolrJ and sending arbitrary/abstract SolrRequest objects,
the "RequestWriter" configured on the SolrClient is what specifies the 
Content-Type to use (and is in charge of serializing the java objects 
appropriately)

BinaryRequestWriter (which uses javabin format to serialize SolrRequest 
objects when building ContentStreams) has been the default since Solr 
5.5/6.0 (see SOLR-8595)
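
If you want to take the default out of the equation, the client can pin the
request format explicitly -- a sketch (URL, collection and field names are
placeholders):

import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class JavabinUpdateSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
        // Force javabin for the *request* body regardless of which SolrJ default is in play.
        client.setRequestWriter(new BinaryRequestWriter());

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title_t", "a value with a control char \u0001 survives javabin transport");
        client.add(doc);
        client.commit();
        client.close();
    }
}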


-Hoss
http://www.lucidworks.com/


RE: How to remove control characters in stored value at Solr side

2017-09-18 Thread Markus Jelsma
I agree.

But, can you then explain why Apache Nutch with SolrJ had this problem? It 
seems that by default SolrJ does use XML as transport format. We have always 
used SolrJ which I assumed would default to javabin, but we had this exact
problem anyway, and solved it by stripping non-character code points.

When we use SolrJ for querying we clearly see wt=javabin in the logs, but 
updates showed the problem. Can we fix it anywhere?

Thanks,
Markus
 
-Original message-
> From:Chris Hostetter 
> Sent: Monday 18th September 2017 20:29
> To: solr-user@lucene.apache.org
> Subject: RE: How to remove control characters in stored value at Solr side
> 
> 
> : You can not do this in Solr, you cannot even send non-character code 
> : points in the first place. For Apache Nutch we solved the problem by 
> 
> Strictly speaking: this is false.  You *can* send control characters to Solr
> as field values -- assuming your transport format allows it.
> 
> Example: using javabin to send SolrInputDocuments from a SolrJ client 
> doesn't care if the field value Strings have control characters in them.  
> Likewise it should be possible to send many control characters when using 
> JSON formatted updates -- let alone using something like DIH to pull blog 
> data from a DB, or the Extracting Request handler which might find
> control-characters in MS-Word or PDF docs.
> 
> In all of those cases, an UpdateProcessor to strip out the unwanted
> characters can/will work well.
> 
> In the specific case discussed in this thread (based on the eventual stack 
> trace posted) an UpdateProcessor will *not* work because the fundamental
> problem is that the control characters in question mean that the "XML-ish" 
> looking bytes being sent to Solr by the client are not actually valid XML
> -- because by definition XML can not contain those invalid 
> control-characters.
> 
> 
> -Hoss
> http://www.lucidworks.com/
> 


RE: How to remove control characters in stored value at Solr side

2017-09-18 Thread Chris Hostetter

: You can not do this in Solr, you cannot even send non-character code 
: points in the first place. For Apache Nutch we solved the problem by 

Strictly speaking: this is false.  You *can* send control characters to Solr
as field values -- assuming your transport format allows it.

Example: using javabin to send SolrInputDocuments from a SolrJ client 
doesn't care if the field value Strings have control characters in them.  
Likewise it should be possible to send many control characters when using 
JSON formatted updates -- let alone using something like DIH to pull blog 
data from a DB, or the Extracting Request handler which might find
control-characters in MS-Word or PDF docs.

In all of those cases, an UpdateProcessor to strip out the unwanted
characters can/will work well.

In the specific case discussed in this thread (based on the eventual stack 
trace posted) an UpdateProcessor will *not* work because the fundamental
problem is that the control characters in question mean that the "XML-ish" 
looking bytes being sent to Solr by the client are not actually valid XML
-- because by definition XML can not contain those invalid 
control-characters.
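
For reference, the client-side stripping Markus describes amounts to something
like this sketch (dropping code points that XML 1.0 does not allow; not the
actual Nutch code):

public class StripInvalidXmlChars {

    // Keep only code points permitted by the XML 1.0 character range.
    public static String strip(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            boolean valid = cp == 0x9 || cp == 0xA || cp == 0xD
                    || (cp >= 0x20 && cp <= 0xD7FF)
                    || (cp >= 0xE000 && cp <= 0xFFFD)
                    || (cp >= 0x10000 && cp <= 0x10FFFF);
            if (valid) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("clean\u0001 text\u0008 only")); // prints "clean text only"
    }
}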


-Hoss
http://www.lucidworks.com/


CVE-2017-9803: Security vulnerability in kerberos delegation token functionality

2017-09-18 Thread Shalin Shekhar Mangar
CVE-2017-9803: Security vulnerability in kerberos delegation token functionality

Severity: Important

Vendor:
The Apache Software Foundation

Versions Affected:
Apache Solr 6.2.0 to 6.6.0

Description:

Solr's Kerberos plugin can be configured to use delegation tokens,
which allows an application to reuse the authentication of an end-user
or another application.
There are two issues with this functionality (when using the
SecurityAwareZkACLProvider type of ACL provider, e.g.
SaslZkACLProvider):

Firstly, access to the security configuration can be leaked to users
other than the solr super user. Secondly, malicious users can exploit
this leaked configuration for privilege escalation to further
expose/modify private data and/or disrupt operations in the Solr
cluster.

The vulnerability is fixed from Solr 6.6.1 onwards.

Mitigation:
6.x users should upgrade to 6.6.1

Credit:
This issue was discovered by Hrishikesh Gadre of Cloudera Inc.

References:
https://issues.apache.org/jira/browse/SOLR-11184
https://wiki.apache.org/solr/SolrSecurity


-- 
The Lucene PMC


Re: Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-18 Thread Joe Obernberger
Very nice article - thank you!  Is there a similar article available 
when the index is on HDFS?  Sorry to hijack!  I'm very interested in how 
we can improve cache/general performance when running with HDFS.


-Joe


On 9/18/2017 11:35 AM, Erick Erickson wrote:



This is suspicious too. Each entry is up to about
maxDoc/8 bytes + (string size of fq clause) long
and you can have up to 20,000 of them. An autowarm count of 512 is
almost never  a good thing.

Walter's comments about your memory are spot on of course, see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick

On Mon, Sep 18, 2017 at 7:59 AM, Walter Underwood  wrote:

29G on a 30G machine is still a bad config. That leaves no space for the OS, 
file buffers, or any other processes.

Try with 8G.

Also, give us some information about the number of docs, size of the indexes, 
and the kinds of search features you are using.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Sep 18, 2017, at 7:55 AM, shamik  wrote:

Apologies, 290gb was a typo on my end, it should read 29gb instead. I started
with my 5.5 configurations of limiting the RAM to 15gb. But it started going
down once it reached the 15gb ceiling. I tried bumping it up to 29gb since
memory seemed to stabilize at 22gb after running for few hours, of course,
it didn't help eventually. I did try the G1 collector. Though garbage
collection was happening more efficiently compared to CMS, it brought the
nodes down after a while.

The part I'm trying to understand is whether the memory footprint is higher
for 6.6 and whether I need an instance with higher ram (>30gb in my case). I
haven't added any post 5.5 feature to rule out the possibility of a memory
leak.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html




Re: Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-18 Thread shamik
Walter, thanks again. Here's some information on the index and search
feature.

The index size is close to 25gb, with 20 million documents. It has two
collections, one being introduced with the 6.6 upgrade. The primary collection
carries the bulk of the index, the newly formed one being aimed at getting
populated going forward. Besides keyword search, the search has a bunch of
facets, which are configured to use docValues. The notable search features
being used are highlighter, query elevation, mlt and suggester. The other
change from 5.5 was to replace the Porter stemmer with a lemmatizer in the
analysis chain.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-18 Thread shamik
Thanks for the suggestion, I'm going to tune it and bring it down. It just
happened to carry over from the 5.5 settings. Based on Walter's suggestion, I'm
going to reduce the heap size and see if it addresses the problem.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-18 Thread Erick Erickson


This is suspicious too. Each entry is up to about
maxDoc/8 bytes + (string size of fq clause) long
and you can have up to 20,000 of them. An autowarm count of 512 is
almost never  a good thing.
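
As a rough back-of-the-envelope check (a sketch using the ~20 million documents
and the 20,000-entry figure mentioned in this thread; a worst-case upper bound,
not actual usage):

public class FilterCacheWorstCase {
    public static void main(String[] args) {
        long maxDoc = 20_000_000L;          // ~20 million docs
        long bytesPerEntry = maxDoc / 8;    // one bit per doc per cached fq, ~2.5 MB
        long entries = 20_000L;             // upper bound on cached entries
        double gb = bytesPerEntry * (double) entries / (1024 * 1024 * 1024);
        System.out.printf("worst case ~%.0f GB of heap just for the filterCache%n", gb); // ~47 GB
    }
}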

Walter's comments about your memory are spot on of course, see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick

On Mon, Sep 18, 2017 at 7:59 AM, Walter Underwood  wrote:
> 29G on a 30G machine is still a bad config. That leaves no space for the OS, 
> file buffers, or any other processes.
>
> Try with 8G.
>
> Also, give us some information about the number of docs, size of the indexes, 
> and the kinds of search features you are using.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Sep 18, 2017, at 7:55 AM, shamik  wrote:
>>
>> Apologies, 290gb was a typo on my end, it should read 29gb instead. I started
>> with my 5.5 configurations of limiting the RAM to 15gb. But it started going
>> down once it reached the 15gb ceiling. I tried bumping it up to 29gb since
>> memory seemed to stabilize at 22gb after running for few hours, of course,
>> it didn't help eventually. I did try the G1 collector. Though garbage
>> collection was happening more efficiently compared to CMS, it brought the
>> nodes down after a while.
>>
>> The part I'm trying to understand is whether the memory footprint is higher
>> for 6.6 and whether I need an instance with higher ram (>30gb in my case). I
>> haven't added any post 5.5 feature to rule out the possibility of a memory
>> leak.
>>
>>
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-18 Thread Walter Underwood
29G on a 30G machine is still a bad config. That leaves no space for the OS, 
file buffers, or any other processes.

Try with 8G.

Also, give us some information about the number of docs, size of the indexes, 
and the kinds of search features you are using.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Sep 18, 2017, at 7:55 AM, shamik  wrote:
> 
> Apologies, 290gb was a typo on my end, it should read 29gb instead. I started
> with my 5.5 configurations of limiting the RAM to 15gb. But it started going
> down once it reached the 15gb ceiling. I tried bumping it up to 29gb since
> memory seemed to stabilize at 22gb after running for few hours, of course,
> it didn't help eventually. I did try the G1 collector. Though garbage
> collection was happening more efficiently compared to CMS, it brought the
> nodes down after a while.
> 
> The part I'm trying to understand is whether the memory footprint is higher
> for 6.6 and whether I need an instance with higher ram (>30gb in my case). I
> haven't added any post 5.5 feature to rule out the possibility of a memory
> leak.
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-18 Thread shamik
Apologies, 290gb was a typo on my end, it should read 29gb instead. I started
with my 5.5 configurations of limiting the RAM to 15gb. But it started going
down once it reached the 15gb ceiling. I tried bumping it up to 29gb since
memory seemed to stabilize at 22gb after running for few hours, of course,
it didn't help eventually. I did try the G1 collector. Though garbage
collection was happening more efficiently compared to CMS, it brought the
nodes down after a while.

The part I'm trying to understand is whether the memory footprint is higher
for 6.6 and whether I need an instance with higher ram (>30gb in my case). I
haven't added any post 5.5 feature to rule out the possibility of a memory
leak.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-18 Thread Walter Underwood
You are running with a 290 Gb heap on a 30 Gb machine. That is the worst
Java config I have ever seen.

Use this:

SOLR_JAVA_MEM="-Xms8g -Xmx8g"

That starts with an 8 Gb heap and stays there.

Also, you might think about simplifying the GC configuration. Or if you are on 
a recent release of Java 8, using the G1 collector. We’re getting great 
performance with this config:

SOLR_HEAP=8g
# Use G1 GC  -- wunder 2017-01-23
# Settings from https://wiki.apache.org/solr/ShawnHeisey
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=200 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Sep 18, 2017, at 7:24 AM, Shamik Bandopadhyay  wrote:
> 
> Hi,
> 
>   I recently upgraded to Solr 6.6 from 5.5. After running for a couple of
> days, the entire Solr cluster suddenly came down with OOM exception. Once
> the servers are being restarted, the memory footprint stays stable for a
> while before the sudden spike in memory occurs. The heap surges up quickly
> and hits the max causing the JVM to shut down due to OOM. It starts with
> one server but eventually trickles down to the rest of the nodes, bringing
> the entire cluster down within a span of 10-15 mins.
> 
> The cluster consists of 6 nodes with two shards having 2 replicas each.
> There are two collections with total index size close to 24 gb. Each server
> has 8 CPUs with 30gb memory. Solr is running on an embedded jetty on jdk
> 1.8. The JVM parameters are identical to 5.5:
> 
> SOLR_JAVA_MEM="-Xms1000m -Xmx29m"
> 
> GC_LOG_OPTS="-verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails \
>  -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime"
> 
> GC_TUNE="-XX:NewRatio=3 \
> -XX:SurvivorRatio=4 \
> -XX:TargetSurvivorRatio=90 \
> -XX:MaxTenuringThreshold=8 \
> -XX:+UseConcMarkSweepGC \
> -XX:+UseParNewGC \
> -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
> -XX:+CMSScavengeBeforeRemark \
> -XX:PretenureSizeThreshold=64m \
> -XX:+UseCMSInitiatingOccupancyOnly \
> -XX:CMSInitiatingOccupancyFraction=50 \
> -XX:CMSMaxAbortablePrecleanTime=6000 \
> -XX:+CMSParallelRemarkEnabled \
> -XX:+ParallelRefProcEnabled"
> 
> I've tried G1GC based on Shawn's WIKI, but didn't make any difference.
> Though G1GC seemed to do well with GC initially, it showed similar
> behaviour during the spike. It prompted me to revert back to CMS.
> 
> I'm doing a hard commit every 5 mins.
> 
> SOLR_OPTS="$SOLR_OPTS -Xss256k"
> SOLR_OPTS="$SOLR_OPTS -Dsolr.autoCommit.maxTime=30"
> SOLR_OPTS="$SOLR_OPTS -Dsolr.clustering.enabled=true"
> SOLR_OPTS="$SOLR_OPTS -Dpkiauth.ttl=12"
> 
> Other Solr configurations:
> 
> 
> ${solr.autoSoftCommit.maxTime:-1}
> 
> 
> Cache settings:
> 
> 4096
> 1000
>  autowarmCount="512"/>
>  autowarmCount="100"/>
>  autowarmCount="0"/>
>  initialSize="0" autowarmCount="10" regenerator="solr.NoOpRegenerator" />
>  autowarmCount="4096" showItems="1024" />
>  class="solr.search.LRUCache" size="4096" initialSize="2048"
> autowarmCount="4096" regenerator="solr.search.NoOpRegenerator" />
> true
> 200
> 400
> 
> I'm not sure what has changed so drastically in 6.6 compared to 5.5. I
> never had a single OOM in 5.5 which has been running for a couple of years.
> Moreover, the memory footprint was much less with 15gb set as Xmx. All my
> facet parameters have docValues enabled, which should handle the memory part
> efficiently.
> 
> I'm struggling to figure out the root cause. Does 6.6 command more memory
> than what is currently available on our servers (30gb)? What might be the
> probable cause for this sort of scenario? What are the best practices to
> troubleshoot such issues?
> 
> Any pointers will be appreciated.
> 
> Thanks,
> Shamik



Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-18 Thread Shamik Bandopadhyay
Hi,

   I recently upgraded to Solr 6.6 from 5.5. After running for a couple of
days, the entire Solr cluster suddenly came down with OOM exception. Once
the servers are being restarted, the memory footprint stays stable for a
while before the sudden spike in memory occurs. The heap surges up quickly
and hits the max causing the JVM to shut down due to OOM. It starts with
one server but eventually trickles down to the rest of the nodes, bringing
the entire cluster down within a span of 10-15 mins.

The cluster consists of 6 nodes with two shards having 2 replicas each.
There are two collections with total index size close to 24 gb. Each server
has 8 CPUs with 30gb memory. Solr is running on an embedded jetty on jdk
1.8. The JVM parameters are identical to 5.5:

SOLR_JAVA_MEM="-Xms1000m -Xmx29m"

GC_LOG_OPTS="-verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime"

GC_TUNE="-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=8 \
-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
-XX:+CMSScavengeBeforeRemark \
-XX:PretenureSizeThreshold=64m \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:CMSMaxAbortablePrecleanTime=6000 \
-XX:+CMSParallelRemarkEnabled \
-XX:+ParallelRefProcEnabled"

I've tried G1GC based on Shawn's WIKI, but didn't make any difference.
Though G1GC seemed to do well with GC initially, it showed similar
behaviour during the spike. It prompted me to revert back to CMS.

I'm doing a hard commit every 5 mins.

SOLR_OPTS="$SOLR_OPTS -Xss256k"
SOLR_OPTS="$SOLR_OPTS -Dsolr.autoCommit.maxTime=30"
SOLR_OPTS="$SOLR_OPTS -Dsolr.clustering.enabled=true"
SOLR_OPTS="$SOLR_OPTS -Dpkiauth.ttl=12"

Other Solr configurations:


${solr.autoSoftCommit.maxTime:-1}


Cache settings:

4096
1000
 autowarmCount="512"/>
 autowarmCount="100"/>
 autowarmCount="0"/>
 initialSize="0" autowarmCount="10" regenerator="solr.NoOpRegenerator" />
 autowarmCount="4096" showItems="1024" />
 class="solr.search.LRUCache" size="4096" initialSize="2048" autowarmCount="4096" regenerator="solr.search.NoOpRegenerator" />
true
200
400

I'm not sure what has changed so drastically in 6.6 compared to 5.5. I
never had a single OOM in 5.5 which has been running for a couple of years.
Moreover, the memory footprint was much less with 15gb set as Xmx. All my
facet parameters have docValues enabled, which should handle the memory part
efficiently.

I'm struggling to figure out the root cause. Does 6.6 command more memory
than what is currently available on our servers (30gb)? What might be the
probable cause for this sort of scenario? What are the best practices to
troubleshoot such issues?

Any pointers will be appreciated.

Thanks,
Shamik


Re: solr Facet.contains

2017-09-18 Thread vobium
Please help me solve this problem.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr - Data search across multiple cores

2017-09-18 Thread Susheel Kumar
Which fields do you want to search across the two separate collections/cores?
Please provide some details on your use case.

Thnx

On Mon, Sep 18, 2017 at 1:42 AM, Agrawal, Harshal (GE Digital) <
harshal.agra...@ge.com> wrote:

> Hello Folks,
>
> I want to search data in two separate cores. The two cores are not
> identical; only a few fields are common between them.
> I don't want to join the data. Is it possible to search data from two cores?
>
> I read about the distributed search concept but was not able to understand
> it. Is it the only way to search across multiple cores?
>
> Regards
> Harshal
>
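
For reference, plain distributed search over two cores is just the shards
parameter on a normal query -- a sketch (host and core names are placeholders;
both cores need the same uniqueKey field and compatible definitions for the
fields you request):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class TwoCoreSearchSketch {
    public static void main(String[] args) throws Exception {
        // Send the query to one core and fan it out to both via the shards param.
        HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/core1");
        SolrQuery q = new SolrQuery("common_field:foo");
        q.set("shards", "localhost:8983/solr/core1,localhost:8983/solr/core2");
        System.out.println(solr.query(q).getResults().getNumFound());
        solr.close();
    }
}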


Re: Knn classifier doesn't work

2017-09-18 Thread alessandro.benedetti
Hi Tommaso,
you are definitely right!
I see that the method MultiFields.getTerms returns:

if (termsPerLeaf.size() == 0) {
  return null;
}

As you correctly mentioned, this is not handled in:

org/apache/lucene/classification/document/SimpleNaiveBayesDocumentClassifier.java:115
org/apache/lucene/classification/document/SimpleNaiveBayesDocumentClassifier.java:228
org/apache/lucene/classification/SimpleNaiveBayesClassifier.java:243
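
For clarity, the kind of guard needed is roughly this (a sketch of the idea,
not the actual patch):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;

public class NullSafeTermsAccess {
    // MultiFields.getTerms returns null when no leaf reader has terms for the
    // field, so the classifiers need to handle that before iterating.
    public static TermsEnum termsEnumOrEmpty(IndexReader reader, String field) throws IOException {
        Terms terms = MultiFields.getTerms(reader, field);
        return terms == null ? TermsEnum.EMPTY : terms.iterator();
    }
}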

Can you make the change, or should I open a Jira issue and attach the simple
patch for you to commit?
Let me know.

Regards



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Learning-to-Rank with Bees: question answer follow-up

2017-09-18 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi everyone,

At my "Learning-to-Rank with Apache Solr and Bees" talk on Friday [1] there was
one question that wasn't properly understood (by me) and so not fully answered
in the room, but the question and answer became clearer later in individual
conversation. So here I just want to follow up and share with everyone (using a
fictional mini example).

Hope that helps.

Thanks,

Christine

---

Scenario:
* a schema with multiple text fields e.g. title, summary, details
* search queries consider the text fields

Intention:
* have features that capture how well the user query matches various text fields

Example queries and feature definitions:

* without LTR:
  select?q=developer
  select?q=chef

* with LTR:
  select?q=developer&rq={!ltr model=myDemoModel efi.userQuery=developer}
  select?q=chef&rq={!ltr model=myDemoModel efi.userQuery=chef}

Notice how in the above example the two users' queries pass different 
efi.userQuery values and how the feature definitions below include a 
${userQuery} placeholder.

myDemoFeatures.json

[
 {
  "name" : "userQueryTitle",
  "class" : "org.apache.solr.ltr.feature.SolrFeature",
  "params" : { "q" : "title:${userQuery}" }
 },
 {
  "name" : "userQuerySummary",
  "class" : "org.apache.solr.ltr.feature.SolrFeature",
  "params" : { "q" : "summary:${userQuery}" }
 },
 {
  "name" : "userQueryDetails",
  "class" : "org.apache.solr.ltr.feature.SolrFeature",
  "params" : { "q" : "details:${userQuery}" }
 }
]
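
For completeness, issuing the same kind of request from SolrJ might look like
this (a sketch; the URL, collection name and field list are illustrative, not
from the talk):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class LtrQuerySketch {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/jobs");
        SolrQuery q = new SolrQuery("developer");
        // Re-rank with the LTR model; efi.userQuery feeds the ${userQuery} placeholder.
        q.add("rq", "{!ltr model=myDemoModel efi.userQuery=developer}");
        q.setFields("id", "score", "[features]"); // [features] returns the extracted feature values
        System.out.println(solr.query(q).getResults());
        solr.close();
    }
}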

---

Links

[1] http://sched.co/BAwI
[2] http://lucene.apache.org/solr/guide/6_6/learning-to-rank.html
[3] https://github.com/cpoerschke/ltr-with-bees

Re: Apache Solr 4.10.x - Collection Reload times out

2017-09-18 Thread alessandro.benedetti
I finally have an explanation; I'm posting it here for future reference.

The cause was a combination of:

1) the /select request handler has defaults with spellcheck ON and a few
spellcheck options (such as collationQuery ON and max collation tries set
to 5)

2) the firstSearcher has a warm-up query with a lot of terms

When opening the searcher, I found a thread stuck waiting, and that thread
was the one responsible for the collation query. The searcher never finished
opening because the collation was being calculated over the big multi-term
warm-up query.

Lesson learned: be careful with defaults in the default request handler, as
they may be used by other components (not just user searches).

Thanks for the support!

Regards



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr - google like suggestion

2017-09-18 Thread alessandro.benedetti
If you are referring to the number of words per suggestion, you may need to
play with the free text lookup type [1]

[1] http://alexbenedetti.blogspot.co.uk/2015/07/solr-you-complete-me.html



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html