Re: solr core replication

2017-10-19 Thread Hendrik Haddorp

Hi Erick,

that is actually the call I'm using :-)
If you invoke
http://solr_target_machine:port/solr/core/replication?command=details
after the fetchindex, you can see the replication status. But even after a Solr
restart the call still shows the replication relation, and I would like
to remove it so that the core looks "normal" again.


regards,
Hendrik

On 20.10.2017 02:31, Erick Erickson wrote:

Little known trick:

The fetchIndex replication API call can take any parameter you specify
in your config. So you don't have to configure replication at all on
your target collection, just issue the replication API command with
masterUrl, something like:

http://solr_target_machine:port/solr/core/replication?command=fetchindex&masterUrl=http://solr_source_machine:port/solr/core

NOTE, "core" above will be something like collection1_shard1_replica1

During the fetchindex, you won't be able to search on the target
collection although the source will be searchable.

Now, all that said this is just copying stuff. So let's say you've
indexed to your source cluster and set up your target cluster (but
don't index anything to the target or do the replication etc). Now if
you shut down the target cluster and just copy the entire data dir
from each source replica to each target replica then start all the
target Solr instances up you'll be fine.

Best,
Erick

On Thu, Oct 19, 2017 at 1:33 PM, Hendrik Haddorp
 wrote:

Hi,

I want to transfer a Solr collection from one SolrCloud to another one. For
that I create a collection in the target cloud using the same config set as
on the source cloud but with a replication factor of one. After that I'm
using the Solr core API with a "replication?command=fetchindex" command to
transfer the data. In the last step I'm increasing the replication factor.
This seems to work fine so far. When I invoke "replication?command=details"
I can see my replication setup and check if the replication is done. In the
end I would like to remove this relation again but there does not seem to be
an API call for that. Given that the replication should be a one time
replication according to the API on
https://lucene.apache.org/solr/guide/6_6/index-replication.html this should
not be a big problem. It just does not look clean to me to leave this in the
system. Is there anything I'm missing?

regards,
Hendrik




Re: Concern on solr commit

2017-10-19 Thread Leo Prince
Thank you Yonik.

Since we are using soft commits, the docs written stay in RAM until an
autoCommit flushes them to disk. I just wanted to know what happens when
Solr restarts. That said, I am using 4.10 with Tomcat hosting Solr;
when we restart the Tomcat service just before an autoCommit, what happens
to the soft-committed docs that are still only in RAM? Will they be gracefully
written to disk before the restart, or do I have to issue
"/solr/update?commit=true" manually every time before restarting Solr?

On Wed, Oct 18, 2017 at 6:08 PM, Yonik Seeley  wrote:

> On Wed, Oct 18, 2017 at 5:09 AM, Leo Prince
>  wrote:
> > Is there any known negative impacts in setting up autoSoftCommit as 1
> > second other than RAM usage..?
>
> Briefly:
> Don't use autowarming (but keep caches enabled!)
> Use docValues for fields you will facet and sort on (this will avoid
> using FieldCache)
>
> -Yonik
>


Re: Solr nodes going into recovery mode and eventually failing

2017-10-19 Thread Erick Erickson
Once you hit an OOM, the behavior of Java is indeterminate. There's no
expectation that things will just pick up where they left off when
memory is freed up. Lots of production systems have OOM killer
scripts that automatically kill/restart Java apps that OOM, for just
that reason.
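
For instance, a minimal sketch of such a hook (the script name and path are
placeholders):

#!/bin/bash
# hypothetical oom_killer.sh, wired in via the JVM flag
#   -XX:OnOutOfMemoryError="/opt/solr/bin/oom_killer.sh %p"
# (%p expands to the pid of the JVM that hit the OOM)
kill -9 "$1"   # kill the wedged JVM; an external supervisor restarts Solr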

Yes, each replica has its own cache, but the JVM heap is used by them
all. That's why "times the number of replicas". Perhaps a more complete
statement would be "times the number of replicas hosted in the JVM".

Hmmm, 11M docs. Call it 16M for round numbers; at one bit per doc, that
would give 2M bytes per filterCache entry. Times 4096 entries gives around
8G that could be used up by a cache that size.

Yeah, your hit ratio is poor at 15%. It's relatively unusual to
require that many entries, though. What do the fq clauses look like? Or
are you using something else that consumes cache (some facet methods
do, for instance).

And do be sure to use docValues for any field you facet, sort or group on.
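
A minimal sketch of what that looks like in schema.xml, with a hypothetical
field name:

<!-- hypothetical facet/sort field; docValues="true" keeps it off the FieldCache -->
<field name="category" type="string" indexed="true" stored="true" docValues="true"/>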

Best,
Erick

On Thu, Oct 19, 2017 at 2:24 PM, shamik  wrote:
> Thanks Emir. The index is equally split between the two shards, each having
> approx 35gb. The total number of documents is around 11 million which should
> be distributed equally among the two shards. So, each core should take 3gb
> of the heap for a full cache. Not sure I get the "multiply it by number of
> replica". Shouldn't each replica have its own cache of 3gb? Moreover, based
> on the SPM graph, the max filter cache size during the outages have been 1.5
> million max.
>
> Majority of our queries are heavily dependent on some implicit filter and
> user selected ones. By reducing the filter cache size to the current one of
> 4096 has taken a hit in performance. Earlier (in 5.5), I had a max cache
> size of 10,000 (running on 15gb allocated heap)  which produced a 95% hit
> rate. With the memory issues in 6.6,  I started reducing it to the current
> value. It reduced the % hit to 25. I tried earlier reducing the value
> further, with autowarmCount="0".
> It still didn't help which is when I decided to go for a higher RAM machine.
> What I've noticed is that the heap is consistently around 22-23gb mark out
> of which G1 old gen takes close to 13gb, G1 eden space around 6gb, rest
> shared by G Survivor space, Metaspace and Code cache.
>
> This issue has been bothering me as I seemed to be running out of possible
> tuning options. What I could see from the monitoring tool is the surge
> period saw around 400 requests/hr with 40 docs/sec getting indexed. Is it a
> really high volume of load to handle for a cluster size 6 nodes with 16 CPU
> / 64gb RAM? What are the other options I should be looking into?
>
> The other thing which I'm still confused is why the recovery fails when the
> memory has been freed up.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: solr core replication

2017-10-19 Thread Erick Erickson
Little known trick:

The fetchIndex replication API call can take any parameter you specify
in your config. So you don't have to configure replication at all on
your target collection, just issue the replication API command with
masterUrl, something like:

http://solr_target_machine:port/solr/core/replication?command=fetchindex&masterUrl=http://solr_source_machine:port/solr/core

NOTE, "core" above will be something like collection1_shard1_replica1

During the fetchindex, you won't be able to search on the target
collection although the source will be searchable.
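
Roughly, using the same placeholder names:

# one-time fetch of the index from the source core
curl "http://solr_target_machine:port/solr/core/replication?command=fetchindex&masterUrl=http://solr_source_machine:port/solr/core"

# poll until the fetch finishes (watch the isReplicating flag in the response)
curl "http://solr_target_machine:port/solr/core/replication?command=details&wt=json"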

Now, all that said, this is just copying stuff. So let's say you've
indexed to your source cluster and set up your target cluster (but
don't index anything to the target or do the replication etc.). Now if
you shut down the target cluster and just copy the entire data dir
from each source replica to each target replica, then start all the
target Solr instances up, you'll be fine.

Best,
Erick

On Thu, Oct 19, 2017 at 1:33 PM, Hendrik Haddorp
 wrote:
> Hi,
>
> I want to transfer a Solr collection from one SolrCloud to another one. For
> that I create a collection in the target cloud using the same config set as
> on the source cloud but with a replication factor of one. After that I'm
> using the Solr core API with a "replication?command=fetchindex" command to
> transfer the data. In the last step I'm increasing the replication factor.
> This seems to work fine so far. When I invoke "replication?command=details"
> I can see my replication setup and check if the replication is done. In the
> end I would like to remove this relation again but there does not seem to be
> an API call for that. Given that the replication should be a one time
> replication according to the API on
> https://lucene.apache.org/solr/guide/6_6/index-replication.html this should
> not be a big problem. It just does not look clean to me to leave this in the
> system. Is there anything I'm missing?
>
> regards,
> Hendrik


LTR feature and proximity search with Block Join Parent query Parser

2017-10-19 Thread Dariusz Wojtas
Hi,
I am working on features, and my main document ('type:entity') has child
documents, some of which contain addresses ('type:entityAddress').

My feature definition:
{
  "store": "store_myStore",
  "name": "scoreAddressCity",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params":{ "q": "+{!parent which='type:entity'
score='max'}type:entityAddress +{!parent which='type:entity'
score='max'}address.city:${searchedCity}" }
}

Two sample searches where I search for the city 'Warszawa';
I am passing the searched city name as efi.searchedCity.
a) the address document contains the value 'Warszawa' in field 'address.city'.
The resulting feature score is 1.98.

b) the address document contains the value 'WarszawaRado' in field
'address.city'.
The resulting score is 0.0.

How can I get a score that reflects the partial similarity between 'Warszawa'
and 'WarszawaRado' in search b)?
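
For context, the query side looks roughly like this (the model name and
reRankDocs value are placeholders):

q={!parent which='type:entity' score='max'}type:entityAddress
&rq={!ltr model=myModel reRankDocs=100 efi.searchedCity=Warszawa}
&fl=id,score,[features]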

Best regards,
Dariusz Wojtas


Re: Solr nodes going into recovery mode and eventually failing

2017-10-19 Thread shamik
Thanks Emir. The index is equally split between the two shards, each having
approx 35gb. The total number of documents is around 11 million which should
be distributed equally among the two shards. So, each core should take 3gb
of the heap for a full cache. Not sure I get the "multiply it by number of
replica". Shouldn't each replica have its own cache of 3gb? Moreover, based
on the SPM graph, the max filter cache size during the outages have been 1.5
million max.

Majority of our queries are heavily dependent on some implicit filters and
user-selected ones. Reducing the filter cache size to the current 4096 has
hurt performance. Earlier (in 5.5), I had a max cache size of 10,000 (running
on a 15gb allocated heap) which produced a 95% hit rate. With the memory
issues in 6.6, I started reducing it to the current value. It reduced the
hit % to 25. I tried earlier reducing the value further, with autowarmCount="0".
It still didn't help, which is when I decided to go for a higher-RAM machine.
What I've noticed is that the heap sits consistently around the 22-23gb mark,
out of which G1 old gen takes close to 13gb and G1 eden space around 6gb, with
the rest shared by G1 survivor space, Metaspace and the code cache.

This issue has been bothering me, as I seem to be running out of possible
tuning options. What I could see from the monitoring tool is that the surge
period saw around 400 requests/hr with 40 docs/sec getting indexed. Is that
really a high volume of load to handle for a cluster of 6 nodes with 16 CPU /
64gb RAM each? What other options should I be looking into?

The other thing I'm still confused about is why the recovery fails when the
memory has been freed up.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


solr core replication

2017-10-19 Thread Hendrik Haddorp

Hi,

I want to transfer a Solr collection from one SolrCloud to another one. 
For that I create a collection in the target cloud using the same config 
set as on the source cloud but with a replication factor of one. After 
that I'm using the Solr core API with a "replication?command=fetchindex" 
command to transfer the data. In the last step I'm increasing the 
replication factor. This seems to work fine so far. When I invoke 
"replication?command=details" I can see my replication setup and check 
if the replication is done. In the end I would like to remove this 
relation again, but there does not seem to be an API call for that. Given
that the replication should be a one-time replication according to the
API documentation at
https://lucene.apache.org/solr/guide/6_6/index-replication.html,
this should not be a big problem. It just does not look clean to me to
leave this in the system. Is there anything I'm missing?


regards,
Hendrik


Solr boost property through request handler in solrconfig.xml

2017-10-19 Thread ruby
If I'm not using edismax or dismax, is there a way to boost a specific
property through solrconfig.xml? I'm avoiding hard-coding the boost in the
query. Following is my request handler in solrconfig.xml right now:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">myFiled</str>
    <str name="q.op">OR</str>
    <str name="facet.method">fc</str>
  </lst>
</requestHandler>

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: spell-check does not return collations when using search query with filter

2017-10-19 Thread Arnold Bronley
Let me know if I should open a JIRA issue for this. Thanks.

On Tue, Oct 17, 2017 at 10:40 AM, Arnold Bronley 
wrote:

> I tried spellcheck.q=polt and q=tag:polt. I get collations, but they are
> only for "polt" and not "tag:polt". Because of that, the hits that I get back
> are for the frequency of "plot" and not the frequency of "tag:plot".
>
> {
>   "responseHeader": {
> "status": 0,
> "QTime": 20,
> "params": {
>   "spellcheck.collateExtendedResults": "true",
>   "indent": "true",
>   "spellcheck.maxCollations": "3",
>   "spellcheck.maxCollationTries": "3",
>   "spellcheck.extendedResults": "true",
>   "q": "tag:polt",
>   "spellcheck.q": "polt",
>   "spellcheck": "true",
>   "spellcheck.accuracy": "0.72",
>   "spellcheck.onlyMorePopular": "true",
>   "spellcheck.count": "7",
>   "wt": "json",
>   "spellcheck.collate": "true"
> }
>   },
>   "response": {
> "numFound": 0,
> "start": 0,
> "docs": [
>
> ]
>   },
>   "spellcheck": {
> "suggestions": [
>   "polt",
>   {
> "numFound": 7,
> "startOffset": 0,
> "endOffset": 4,
> "origFreq": 0,
> "suggestion": [
>   {
> "word": "plot",
> "freq": 5934
>   },
>   {
> "word": "port",
> "freq": 495
>   },
>   {
> "word": "post",
> "freq": 233
>   },
>   {
> "word": "poly",
> "freq": 216
>   },
>   {
> "word": "pole",
> "freq": 175
>   },
>   {
> "word": "poll",
> "freq": 12
>   },
>   {
> "word": "polm",
> "freq": 9
>   }
> ]
>   }
> ],
> "correctlySpelled": false,
> "collations": [
>   "collation",
>   {
> "collationQuery": "plot",
> "hits": 10538,
> "misspellingsAndCorrections": [
>   "polt",
>   "plot"
> ]
>   },
>   "collation",
>   {
> "collationQuery": "port",
> "hits": 754,
> "misspellingsAndCorrections": [
>   "polt",
>   "port"
> ]
>   },
>   "collation",
>   {
> "collationQuery": "post",
> "hits": 626,
> "misspellingsAndCorrections": [
>   "polt",
>   "post"
> ]
>   }
> ]
>   }
> }
>
> On Tue, Oct 17, 2017 at 5:01 AM, alessandro.benedetti <
> a.benede...@sease.io> wrote:
>
>> But you used :
>>
>> "spellcheck.q": "tag:polt",
>>
>> Instead of :
>> "spellcheck.q": "polt",
>>
>> Regards
>>
>>
>>
>> -
>> ---
>> Alessandro Benedetti
>> Search Consultant, R&D Software Engineer, Director
>> Sease Ltd. - www.sease.io
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>
>
>


RE: Certificate issue ERR_SSL_VERSION_OR_CIPHER_MISMATCH

2017-10-19 Thread Younge, Kent A - Norman, OK - Contractor
Resolved the Cipher Mismatch error. 






Thank you,

Kent Younge
Systems Engineer
USPS MTSC IT Support
600 W. Rock Creek Rd, Norman, OK  73069-8357
O:405 573 2273


-Original Message-
From: Younge, Kent A - Norman, OK - Contractor 
[mailto:kent.a.you...@usps.gov.INVALID] 
Sent: Thursday, October 19, 2017 7:30 AM
To: 'solr-user@lucene.apache.org'
Subject: Certificate issue ERR_SSL_VERSION_OR_CIPHER_MISMATCH

Built a clean Solr server, imported my certificates, and when I go to the
SSL/HTTPS page it tells me that I have ERR_SSL_VERSION_OR_CIPHER_MISMATCH in
Chrome, and IE tells me that I need to TURN ON TLS 1.0, TLS 1.1, and TLS 1.2.
TLS is turned on, and if I browse to the server name instead of the site name
the SOLR app comes up with a certificate issue saying that the site certificate
name is different. I have also installed one of my other certificates, one that
is working on one of my other SOLR servers, on the server that is having the
issue, and the HTTPS site comes up just fine. This has been going on for over a
month now and I do not know what to do next. I have messed with the
java.security file to see if maybe it was a cipher; however, I do not think
that is actually the problem because, as I mentioned before, if I use one of
my other certificates the SOLR HTTPS site comes up for that site name. So I
am thinking that the server is configured correctly. I have requested my
certificates at least 5 times to see if it is actually the certificate that
is having the issue, and none of the certificates for this site has actually
worked. I am at a loss at what to look at next. If I modify the solr.in.sh
and comment out the SSL settings, the site comes up just fine. I have also
looked in DNS to see if that was maybe an issue, and it is configured
properly. I believe another person on the list is having the same issue as I
am.








Re: Measuring time spent in analysis and writing to index

2017-10-19 Thread Zisis T.
I've worked in the past on a Solr 5.x custom plugin using AspectJ to track
the # of calls as well as the time spent inside incrementToken() of all
Tokenizers and Filters used during indexing. I could get stats per Solr
indexing thread, though not per indexing request. In any case you could spot
the filter/tokenizer where most of the indexing time was spent.
Not sure if there's something similar in Solr 6.x or 7.x.

You can see a sample of the output here
 
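Roughly, the aspect looked something like this (a sketch from memory, names
illustrative, not the original plugin):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

@Aspect
public class TokenStreamTimingAspect {
    // accumulated nanos per concrete Tokenizer/Filter class
    private static final Map<String, AtomicLong> NANOS = new ConcurrentHashMap<>();

    @Around("execution(boolean org.apache.lucene.analysis.TokenStream+.incrementToken())")
    public Object time(ProceedingJoinPoint pjp) throws Throwable {
        long start = System.nanoTime();
        try {
            return pjp.proceed();
        } finally {
            NANOS.computeIfAbsent(pjp.getTarget().getClass().getName(),
                                  k -> new AtomicLong())
                 .addAndGet(System.nanoTime() - start);
        }
    }
}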




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Measuring time spent in analysis and writing to index

2017-10-19 Thread Nawab Zada Asad Iqbal
Hi,

I want to analyze the time spent in different stages of an add/update
document request. E.g., I want to compare the time spent in analysis vs.
writing to the Lucene index. Does Solr provide any such thing? I have looked
at [core/admin/mbeans?stats=true&wt=json&indent=true], which provides overall
stats, but I am interested in the breakdown for each index request.

Thanks
Nawab


Re: 3 color jvm memory usage bar

2017-10-19 Thread Nawab Zada Asad Iqbal
Thanks Erick

I see three colors in the JVM usage bar: dark gray, light gray, and white
(left to right). Only one dark and one light color made sense to me (as I
could interpret them as used vs. available memory), but there is light gray
between the dark gray and white parts.


Thanks
Nawab

On Thu, Oct 19, 2017 at 8:09 AM, Erick Erickson 
wrote:

> Nawab:
>
> Images are stripped aggressively by the Apache mail servers, your
> attachment didn't come through. You'll have to put it somewhere and
> provide a link.
>
> Generally the lighter color in each bar is the available resource and the
> darker shade is used.
>
> Best,
> Erick
>
> On Thu, Oct 19, 2017 at 7:27 AM, Nawab Zada Asad Iqbal 
> wrote:
> > Good morning,
> >
> >
> > What do the 3 colors mean in this bar on Solr dashboard page? (please see
> > attached) :
> >
> >
> > Regards
> > Nawab
>


Re: Schemaless detecting multivalued fields

2017-10-19 Thread Erick Erickson
Also, if you _know_ certain fields should be defined you can define
them explicitly and let schemaless figure out all the others.

That said, eventually you're going to have to control your schema;
schemaless is _not_ recommended for production systems unless you can
absolutely guarantee the input is in a specific format. And by
"specific format" I mean no field first encountered as, say, an int
later comes through as a float, all date fields are in acceptable
formats, no field first encountered as a single-valued field is ever
multivalued later, etc.

And if you can guarantee that you can create an explicitly defined
schema anyway.

Best,
Erick

On Thu, Oct 19, 2017 at 2:00 AM, Emir Arnautović
 wrote:
> Hi John,
> You should be able to do that with custom update request processor chain and 
> https://lucene.apache.org/solr/6_6_0//solr-core/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.html
>  
> 
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>> On 19 Oct 2017, at 08:00, John Davis  wrote:
>>
>> Hi,
>> I know about the schemaless configuration defaulting to multivalued fields
>> of the corresponding type.
>>
>> I was just wondering if there was a way to first detect if the incoming
>> value is a list or a singleton, and based on that pick the corresponding type.
>> Ideally if the value is a long then use tlong, while if it is a list of longs
>> then use tlongs.
>>
>> Thanks!
>> John
>


Re: Deploy Solr to production: best practices

2017-10-19 Thread Walter Underwood
I recommend the “Taking Solr to Production” chapter in the official Solr 
reference guide. That was my first hit for “solr production” in Google.

https://lucene.apache.org/solr/guide/6_6/taking-solr-to-production.html 


I recommend using a recent version of Java 8 and the G1 garbage collector. We 
use that with parameters suggested on this list. This is from our solr.in.sh.

SOLR_HEAP=8g
# Use G1 GC  -- wunder 2017-01-23
# Settings from https://wiki.apache.org/solr/ShawnHeisey
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=200 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 18, 2017, at 10:32 PM, maximka19  wrote:
> 
> Hi everyone!
> 
> I started learning full-text search engines and chosen Solr. I'm introduced
> with Solr, but now I'v having troubles to move Solr to production. 
> 
> 
> 
> *1.* Container: from Solr 5 there is now .WAR-file provided in package. I
> couldn't deploy Solr 7.1 to Tomcat 9. None of existing tutorials or guides
> helped. No such information for newer versions.
> 
> So, does this mean that officially Solr isn't support other containers like
> Tomcat? Can we use Jetty as a main container in production issues? And it's
> officially recommended by developers/maintainers? If so, how can I host Solr
> as a service in Windows Server? There are not any scripts in package for
> Windows, only for .nix machines. How to do that? What a best practices? NO
> information, tutorials, guides are provided in such question, especially for
> Windows users.
> 
> *2.* Other things that should be known in deploying Solr to production:
> which? Anything else that Solr users should know?
> 
> 
> Sirs, guys, I've searched to whole Web, bought and read 4 books about Solr,
> but none of them helped me. Everything is based in older version <5 and much
> more for .nix-OS users than Windows users. No relevant information. Even the
> official documentation contains a small information and doesn't answer such
> questions.
> 
> Please, help me, give some advices, tutorials, opinions and show the right
> way.
> 
> Thank You
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: TermsQuery Result Ordering

2017-10-19 Thread Erick Erickson
If it's worth the effort to you, you could write a custom scorer that "somehow"
pulled these terms out and did what you require. I suppose some kind of
clever function query might work, but again probably custom.

Frankly, though, I wouldn't go there until I'd exhausted either my resources
or my user's patience.

In the worst case, you could break it up into N sub-queries and sort the results
in the app.
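
For reference, the maxBooleanClauses knob mentioned below lives in
solrconfig.xml; the value here is only illustrative:

<!-- a guard rail against runaway queries; raise with caution -->
<maxBooleanClauses>10000</maxBooleanClauses>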

Best,
Erick

On Thu, Oct 19, 2017 at 6:59 AM, Webster Homer  wrote:
> Thank you, Erick.
>
> That is exactly what I thought. Indeed, we don't care about solr's scoring,
> as I said we do care about the order of the terms be maintained, hence the
> requirement for boosting the term values.
>
>
> On Wed, Oct 18, 2017 at 4:23 PM, Erick Erickson 
> wrote:
>
>> bq: Can I boost the Terms in the terms query
>>
>> I'm pretty sure you can't. But how many of these do you have? You can
>> always increase the maxBooleanClauses limit in solrconfig.xml. It's
>> primarily there to say "having this many clauses is usually a bad
>> idea, so proceed with caution". I've seen 10,000 and higher be used
>> before, you're really only limited by memory.
>>
>> And I'm going to guess that your application doesn't have a high query
>> rate, so you can likely make maxBooleanClauses be very high.
>>
>> Basically, the code that TermsQuerParser uses bypasses scoring on the
>> theory that these vary large OR clauses are usually useless for
>> scoring, your application is an outlier. But you knew that already ;)
>>
>>
>> Best,
>> Erick
>>
>> On Wed, Oct 18, 2017 at 9:42 AM, Webster Homer 
>> wrote:
>> > I have an application which currently uses a boolean query. The query
>> could
>> > have a large number of boolean terms. I know that the TermsQuery doesn't
>> > have the same limitations as the boolean query. However I need to
>> maintain
>> > the order of the original terms.
>> >
>> > The query terms from the boolean query are actually values returned by a
>> > chemical structure search, which are returned in order of their relevancy
>> > in the structure search. I maintain the order by giving them a boost
>> which
>> > is a function of the relevancy from the structure search.
>> >
>> > structure_id:(12345^800 OR 12356^750 OR abcde^600 ...
>> >
>> > This approach gives me the results in the order I need them in. I'd love
>> to
>> > use the TermsQuery instead as it doesn't have the same limitations.
>> >
>> > Can I boost the Terms in the terms query? Is there a way to order the
>> > results? e.g. would the results be returned in the same order I specified
>> > the terms?
>> >
>> > Thanks,
>> >
>>
>


Re: 3 color jvm memory usage bar

2017-10-19 Thread Erick Erickson
Nawab:

Images are stripped aggressively by the Apache mail servers, your
attachment didn't come through. You'll have to put it somewhere and
provide a link.

Generally the lighter color in each bar is the available resource and the
darker shade is used.

Best,
Erick

On Thu, Oct 19, 2017 at 7:27 AM, Nawab Zada Asad Iqbal  wrote:
> Good morning,
>
>
> What do the 3 colors mean in this bar on Solr dashboard page? (please see
> attached) :
>
>
> Regards
> Nawab


Re: Deploy Solr to Production: guides, best practices

2017-10-19 Thread Erick Erickson
https://wiki.apache.org/solr/WhyNoWar

Also, recent versions just don't build a war _for_ you. If you insist,
you can build your own war file by bundling up "the right stuff".
However, there's no guarantee that you'll be able to do that going
forward. I have to confess that I can't guarantee you can make your
own war in 7.x, and I only _think_ you can in 6.x. You can probably
tell I don't recommend it, BTW.

A quick Google search turns up a bunch of hits; I can't vouch for
any of them. If you'd care to add something to the wiki above,
that'd be great, or even a section in the reference guide.

Best,
Erick

On Thu, Oct 19, 2017 at 7:57 AM, GW  wrote:
> Not a Windows user but you should be able to just install it and surf port
> 8983. Once installed it should show in services
>
> https://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/
>
> On 19 October 2017 at 07:18, maximka19  wrote:
>
>> Rick Leir-2 wrote
>> > Maximka
>> > The app server is bundled in Solr, so you do not install Tomcat or JEtty
>> > separately.
>> > Cheers -- Rick
>>
>> Hi! So, what should I do to host it in Windows Server as service? In
>> production.
>>
>> Thanks
>>
>>
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>


Re: ClassicAnalyzer Behavior on accent character

2017-10-19 Thread Erick Erickson
Have you looked at the specification to see how it's _supposed_ to work?

From the javadocs:
"implements Unicode text segmentation, as specified by UAX#29."

See http://unicode.org/reports/tr29/#Word_Boundaries

If you look at the spec and feel that ClassicAnalyzer incorrectly
implements the word break rules then perhaps there's a JIRA.

Best,
Erick

On Thu, Oct 19, 2017 at 6:39 AM, Chitra  wrote:
> Hi,
>   I indexed a term 'ⒶeŘꝋꝒɫⱯŋɇ' (aeroplane) and the term was
> indexed as "er l n", some characters were trimmed while indexing.
>
> Here is my code
>
> protected Analyzer.TokenStreamComponents createComponents(final String
> fieldName, final Reader reader)
> {
> final ClassicTokenizer src = new ClassicTokenizer(getVersion(),
> reader);
> src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
>
> TokenStream tok = new ClassicFilter(src);
> tok = new LowerCaseFilter(getVersion(), tok);
> tok = new StopFilter(getVersion(), tok, stopwords);
> tok = new ASCIIFoldingFilter(tok); // to enable AccentInsensitive
> search
>
> return new Analyzer.TokenStreamComponents(src, tok)
> {
> @Override
> protected void setReader(final Reader reader) throws IOException
> {
>
> src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
> super.setReader(reader);
> }
> };
> }
>
>
> Am I missing anything? Is that expected behavior for my input or any reason
> behind such abnormal behavior?
>
>
> --
> Regards,
> Chitra


Re: Deploy Solr to Production: guides, best practices

2017-10-19 Thread GW
Not a Windows user, but you should be able to just install it and browse to
port 8983. Once installed it should show up in Services.

https://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/
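
From memory the guide boils down to something like this with NSSM (service
name and install path are placeholders):

nssm install Solr "C:\solr\bin\solr.cmd" "start -f -p 8983"
nssm start Solr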

On 19 October 2017 at 07:18, maximka19  wrote:

> Rick Leir-2 wrote
> > Maximka
> > The app server is bundled in Solr, so you do not install Tomcat or JEtty
> > separately.
> > Cheers -- Rick
>
> Hi! So, what should I do to host it in Windows Server as service? In
> production.
>
> Thanks
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


3 color jvm memory usage bar

2017-10-19 Thread Nawab Zada Asad Iqbal
Good morning,


What do the 3 colors mean in this bar on Solr dashboard page? (please see
attached) :


Regards
Nawab


Re: TermsQuery Result Ordering

2017-10-19 Thread Webster Homer
Thank you, Erick.

That is exactly what I thought. Indeed, we don't care about Solr's scoring;
as I said, we do care that the order of the terms is maintained, hence the
requirement for boosting the term values.


On Wed, Oct 18, 2017 at 4:23 PM, Erick Erickson 
wrote:

> bq: Can I boost the Terms in the terms query
>
> I'm pretty sure you can't. But how many of these do you have? You can
> always increase the maxBooleanClauses limit in solrconfig.xml. It's
> primarily there to say "having this many clauses is usually a bad
> idea, so proceed with caution". I've seen 10,000 and higher be used
> before, you're really only limited by memory.
>
> And I'm going to guess that your application doesn't have a high query
> rate, so you can likely make maxBooleanClauses be very high.
>
> Basically, the code that TermsQuerParser uses bypasses scoring on the
> theory that these vary large OR clauses are usually useless for
> scoring, your application is an outlier. But you knew that already ;)
>
>
> Best,
> Erick
>
> On Wed, Oct 18, 2017 at 9:42 AM, Webster Homer 
> wrote:
> > I have an application which currently uses a boolean query. The query
> could
> > have a large number of boolean terms. I know that the TermsQuery doesn't
> > have the same limitations as the boolean query. However I need to
> maintain
> > the order of the original terms.
> >
> > The query terms from the boolean query are actually values returned by a
> > chemical structure search, which are returned in order of their relevancy
> > in the structure search. I maintain the order by giving them a boost
> which
> > is a function of the relevancy from the structure search.
> >
> > structure_id:(12345^800 OR 12356^750 OR abcde^600 ...
> >
> > This approach gives me the results in the order I need them in. I'd love
> to
> > use the TermsQuery instead as it doesn't have the same limitations.
> >
> > Can I boost the Terms in the terms query? Is there a way to order the
> > results? e.g. would the results be returned in the same order I specified
> > the terms?
> >
> > Thanks,
> >
>



ClassicAnalyzer Behavior on accent character

2017-10-19 Thread Chitra
Hi,
  I indexed a term 'ⒶeŘꝋꝒɫⱯŋɇ' (aeroplane) and the term was
indexed as "er l n"; some characters were trimmed while indexing.

Here is my code

protected Analyzer.TokenStreamComponents createComponents(final String
fieldName, final Reader reader)
{
final ClassicTokenizer src = new ClassicTokenizer(getVersion(),
reader);
src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);

TokenStream tok = new ClassicFilter(src);
tok = new LowerCaseFilter(getVersion(), tok);
tok = new StopFilter(getVersion(), tok, stopwords);
tok = new ASCIIFoldingFilter(tok); // to enable AccentInsensitive
search

return new Analyzer.TokenStreamComponents(src, tok)
{
@Override
protected void setReader(final Reader reader) throws IOException
{

src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
super.setReader(reader);
}
};
}


Am I missing anything? Is that the expected behavior for my input, or is
there some reason behind such abnormal behavior?


-- 
Regards,
Chitra


Certificate issue ERR_SSL_VERSION_OR_CIPHER_MISMATCH

2017-10-19 Thread Younge, Kent A - Norman, OK - Contractor
Built a clean Solr server, imported my certificates, and when I go to the
SSL/HTTPS page it tells me that I have ERR_SSL_VERSION_OR_CIPHER_MISMATCH in
Chrome, and IE tells me that I need to TURN ON TLS 1.0, TLS 1.1, and TLS 1.2.
TLS is turned on, and if I browse to the server name instead of the site name
the SOLR app comes up with a certificate issue saying that the site certificate
name is different. I have also installed one of my other certificates, one that
is working on one of my other SOLR servers, on the server that is having the
issue, and the HTTPS site comes up just fine. This has been going on for over a
month now and I do not know what to do next. I have messed with the
java.security file to see if maybe it was a cipher; however, I do not think
that is actually the problem because, as I mentioned before, if I use one of
my other certificates the SOLR HTTPS site comes up for that site name. So I
am thinking that the server is configured correctly. I have requested my
certificates at least 5 times to see if it is actually the certificate that
is having the issue, and none of the certificates for this site has actually
worked. I am at a loss at what to look at next. If I modify the solr.in.sh
and comment out the SSL settings, the site comes up just fine. I have also
looked in DNS to see if that was maybe an issue, and it is configured
properly. I believe another person on the list is having the same issue as I
am.








Re: Deploy Solr to Production: guides, best practices

2017-10-19 Thread maximka19
Rick Leir-2 wrote
> Maximka
> The app server is bundled in Solr, so you do not install Tomcat or JEtty
> separately. 
> Cheers -- Rick

Hi! So what should I do to host it as a service on Windows Server, in
production?

Thanks



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: SolrJ 6.6.0 Connection pool shutdown now with stack trace

2017-10-19 Thread Markus Jelsma
By the way, we also see a generous number of warnings in Zookeeper's logs.
Are these related? What are they an indication of?

Thanks,
Markus

2017-10-19 08:57:35,583 [myid:2] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of 
stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x15e1925fb7e3748, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:748)
2017-10-19 08:57:35,583 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket 
connection for client /xxx.xxx.xxx.xxx:41312 which had sessionid 
0x15e1925fb7e3748


 
 
-Original message-
> From:Markus Jelsma 
> Sent: Thursday 19th October 2017 13:45
> To: solr-user@lucene.apache.org
> Subject: RE: SolrJ 6.6.0 Connection pool shutdown now with stack trace
> 
> Hello,
> 
> We are having this problem again, now it affects the front-end too, the logs 
> are littered with Zookeeper connection log lines at WARN level.
> 
> Is it expected that i have to deal with this problem myself? Isn't SolrJ or 
> HTTPClient even going to guarantee me that they will handle underlying 
> connection problems?
> 
> If i have to deal with it myself, is it just a case of catching 
> IllegalStateException and closing and reconnecting SolrClient?
> 
> Thanks,
> Markus
> 
> -Original message-
> > From:Shawn Heisey 
> > Sent: Tuesday 18th July 2017 16:18
> > To: solr-user@lucene.apache.org
> > Subject: Re: SolrJ 6.6.0 Connection pool shutdown now with stack trace
> > 
> > On 7/18/2017 5:10 AM, Markus Jelsma wrote:
> > > The problem was never resolved but Shawn asked for the stack trace, here 
> > > it is:
> > 
> > > Caused by: java.lang.IllegalStateException: Connection pool shut down 
> > > at org.apache.http.util.Asserts.check(Asserts.java:34) 
> > 
> > As I suspected, it is the connection pool inside HttpClient that is shut
> > down (closed).
> > 
> > Earlier today before I came into the office, I asked the HttpClient user
> > list whether this could ever happen for a reason other than an explicit
> > close/shutdown.  They looked at the code and found that the exception
> > only is thrown if the "isShutDown" boolean flag is true, and the only
> > place that ever gets set to true is when an explicit shutdown is called
> > on the connection pool.
> > 
> > When a solr client is built without an external HttpClient, calling
> > close() on the solr client will shut down the internal HttpClient.  If
> > an external HttpClient is used, the user code would need to shut it down
> > for this to happen.  Recent versions of SolrJ are using
> > CloseableHttpClient, which will shut down the connection pool if close()
> > is called.
> > 
> > It's looking like this error has happened because the HttpClient object
> > inside the solr client has been shut down explicitly, which might have
> > happened because one of the outer layers had close() called.
> > 
> > Thanks,
> > Shawn
> > 
> > 
> 


RE: SolrJ 6.6.0 Connection pool shutdown now with stack trace

2017-10-19 Thread Markus Jelsma
Hello,

We are having this problem again; now it affects the front-end too, and the
logs are littered with Zookeeper connection log lines at WARN level.

Is it expected that I have to deal with this problem myself? Aren't SolrJ or
HttpClient going to guarantee that they will handle underlying
connection problems?

If I have to deal with it myself, is it just a case of catching
IllegalStateException and closing and reconnecting the SolrClient?

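Something like this sketch is what I have in mind (untested; rebuild the
client once when the pool error shows up):

import java.io.IOException;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

class ReconnectingSolrQuery {
    private CloudSolrClient client;
    private final String zkHost;

    ReconnectingSolrQuery(String zkHost) {
        this.zkHost = zkHost;
        this.client = new CloudSolrClient.Builder().withZkHost(zkHost).build();
    }

    QueryResponse query(SolrQuery q) throws SolrServerException, IOException {
        try {
            return client.query(q);
        } catch (IllegalStateException e) { // "Connection pool shut down"
            client.close();
            client = new CloudSolrClient.Builder().withZkHost(zkHost).build();
            return client.query(q);
        }
    }
}
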
Thanks,
Markus

-Original message-
> From:Shawn Heisey 
> Sent: Tuesday 18th July 2017 16:18
> To: solr-user@lucene.apache.org
> Subject: Re: SolrJ 6.6.0 Connection pool shutdown now with stack trace
> 
> On 7/18/2017 5:10 AM, Markus Jelsma wrote:
> > The problem was never resolved but Shawn asked for the stack trace, here it 
> > is:
> 
> > Caused by: java.lang.IllegalStateException: Connection pool shut down 
> > at org.apache.http.util.Asserts.check(Asserts.java:34) 
> 
> As I suspected, it is the connection pool inside HttpClient that is shut
> down (closed).
> 
> Earlier today before I came into the office, I asked the HttpClient user
> list whether this could ever happen for a reason other than an explicit
> close/shutdown.  They looked at the code and found that the exception
> only is thrown if the "isShutDown" boolean flag is true, and the only
> place that ever gets set to true is when an explicit shutdown is called
> on the connection pool.
> 
> When a solr client is built without an external HttpClient, calling
> close() on the solr client will shut down the internal HttpClient.  If
> an external HttpClient is used, the user code would need to shut it down
> for this to happen.  Recent versions of SolrJ are using
> CloseableHttpClient, which will shut down the connection pool if close()
> is called.
> 
> It's looking like this error has happened because the HttpClient object
> inside the solr client has been shut down explicitly, which might have
> happened because one of the outer layers had close() called.
> 
> Thanks,
> Shawn
> 
> 


Re: Deploy Solr to Production: guides, best practices

2017-10-19 Thread Rick Leir
Maximka
The app server is bundled with Solr, so you do not install Tomcat or Jetty
separately.
Cheers -- Rick

On October 19, 2017 2:01:30 AM EDT, maximka19  wrote:
>Hi everyone!
>
>I was looking for a full-text search engine and chose Solr. I quickly got
>introduced to Solr. Now I'm having trouble taking Solr to production
>under Windows Server.
>
>As you know, from Solr 5 on there is no .WAR file in the package; I couldn't
>deploy Solr 7.1 to Tomcat 9. I didn't find any information, tutorials, or
>guides relevant to new versions of both Solr and Tomcat.
>
>So, the first question: do I need to use the default Jetty container in
>production? Or is Tomcat preferable for production use? If so, why? For what
>reasons? In older (and the only) books about Solr I've read that Tomcat is
>more efficient in production than the default Jetty, but those books were
>considering Solr 3 and Tomcat 6, and nowadays the versions are much higher.
>If we can use Jetty in production, how do I deploy Solr with Jetty as a
>service on Windows Server? There are no scripts provided for Windows users,
>only for *nix users.
>
>I've been troubled by this question for two weeks, really. There is no
>relevant information on such questions, even in the official documentation.
>And the other thing: do Solr users need to know anything else about deploying
>Solr to production? Any bugs, recommendations, best practices? Or does
>everything work out-of-the-box?
>
>
>I really need help, advice, and guides on this question.
>Thank You
>
>
>
>
>--
>Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

[ANNOUNCE] [SECURITY] CVE-2017-12629: Several critical vulnerabilities discovered in Apache Solr (XXE & RCE)

2017-10-19 Thread Shalin Shekhar Mangar
CVE-2017-12629: Several critical vulnerabilities discovered in Apache
Solr (XXE & RCE)

Severity: Critical

Vendor:
The Apache Software Foundation

Versions Affected:
Solr 5.5.0 to 5.5.4
Solr 6.0.0 to 6.6.1
Solr 7.0.0 to 7.0.1

Description:
The details of this vulnerability were reported on public mailing
lists. See https://s.apache.org/FJDl

The first vulnerability relates to XML external entity expansion in
the XML Query Parser which is available, by default, for any query
request with parameters deftype=xmlparser. This can be exploited to
upload malicious data to the /upload request handler. It can also be
used as Blind XXE using ftp wrapper in order to read arbitrary local
files from the solr server.

The second vulnerability relates to remote code execution using the
RunExecutableListener available on all affected versions of Solr.

At the time of the above report, this was a 0-day vulnerability with a
working exploit affecting the versions of Solr mentioned in the
previous section. However, mitigation steps were announced to protect
Solr users the same day. See
https://lucene.apache.org/solr/news.html#12-october-2017-please-secure-your-apache-solr-servers-since-a-zero-day-exploit-has-been-reported-on-a-public-mailing-list

Mitigation:
Users are advised to upgrade to either the Solr 6.6.2 or Solr 7.1.0
release, both of which address the two vulnerabilities. Once the upgrade is
complete, no other steps are required.

If users are unable to upgrade to Solr 6.6.2 or Solr 7.1.0 then they
are advised to restart their Solr instances with the system parameter
`-Ddisable.configEdit=true`. This will disallow any changes to be made
to your configurations via the Config API. This is a key factor in
this vulnerability, since it allows GET requests to add the
RunExecutableListener to your config. Users are also advised to re-map
the XML Query Parser to another parser to mitigate the XXE
vulnerability. For example, adding the following to the solrconfig.xml
file re-maps the xmlparser to the edismax parser:

<queryParser name="xmlparser" class="solr.ExtendedDismaxQParserPlugin"/>


Credit:
Michael Stepankin (JPMorgan Chase)
Olga Barinova (Gotham Digital Science)

References:
https://issues.apache.org/jira/browse/SOLR-11482
https://issues.apache.org/jira/browse/SOLR-11477
https://wiki.apache.org/solr/SolrSecurity

-- 
Regards,
Shalin Shekhar Mangar.


Deploy Solr to production: best practices

2017-10-19 Thread maximka19
Hi everyone!

I started learning full-text search engines and chose Solr. I've gotten
introduced to Solr, but now I'm having trouble moving Solr to production.



*1.* Container: from Solr 5 on, there is no .WAR file provided in the
package. I couldn't deploy Solr 7.1 to Tomcat 9. None of the existing
tutorials or guides helped; there is no such information for newer versions.

So, does this mean that Solr officially doesn't support other containers
like Tomcat? Can we use Jetty as the main container in production? Is that
officially recommended by the developers/maintainers? If so, how can I host
Solr as a service on Windows Server? There aren't any scripts in the package
for Windows, only for *nix machines. How do I do that? What are the best
practices? No information, tutorials, or guides cover this question,
especially for Windows users.

*2.* Other things that should be known when deploying Solr to production:
which? Anything else that Solr users should know?


Sirs, guys, I've searched the whole Web, bought and read 4 books about Solr,
but none of them helped me. Everything is based on older versions (<5) and
aimed much more at *nix users than Windows users. There is no relevant
information. Even the official documentation contains little and doesn't
answer such questions.

Please help me: give some advice, tutorials, and opinions, and show me the
right way.

Thank You



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr nodes going into recovery mode and eventually failing

2017-10-19 Thread Emir Arnautović
Hi Shamik,
I am pleased to see you find SPM useful!
I think that your problems might be related to caches exhausting your memory.
You mentioned that your index is 70GB, but how many documents does it have?
Remember that filter caches can take up to 1 bit/doc per entry. With a 4096
filter cache size, that means a full cache will take up to 0.5GB per 1 million
documents (and it is per core, so multiply it by the number of replicas). If
you have some fields without docValues, check the sizes of the fieldCaches and
fieldValueCaches as well.
The other thing that you should revisit is heap size: heaps over 32GB prevent
the JVM from using compressed OOPs, so a heap sized just over 32GB actually
fits fewer objects than a 31GB heap (the boundary depends on the JVM, so it is
not exactly 32GB).
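
To make the arithmetic concrete:

1,000,000 docs x 1 bit  = ~122 KB per filterCache entry
122 KB x 4096 entries   = ~0.5 GB per core, worst case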

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 19 Oct 2017, at 01:12, Shamik Bandopadhyay  wrote:
> 
> Hi,
> 
>  I'm having this weird issue where Solr nodes suddenly go into recovery
> mode and eventually fail. That one failure kicks off a cascading effect
> and eventually impacts the other nodes. Without a restart, the entire
> cluster goes into limbo after a while. Looking at the logs and the SPM
> monitoring tool, the issue happens under the following circumstances:
> 1. A node gets a spike in query/index requests, thus exhausting its allocated
> memory.
> 2. GC forces the CPU to use 100% of its capacity
> 3. None of the above, when both JVM and CPU are within limits
> 
> I'm using Solr 6.6. Here are the details about the node :
> 
> Hardware type: AWS m4.4xlarge instance
> Total memory : 64 gb
> CPU : 16
> SSD
> SOLR_JAVA_MEM="-Xms35g -Xmx35g"
> GC_TUNE="-XX:+UseG1GC \
> -XX:+ParallelRefProcEnabled \
> -XX:G1HeapRegionSize=8m \
> -XX:MaxGCPauseMillis=200 \
> -XX:+UseLargePages \
> -XX:+AggressiveOpts"
> SOLR_OPTS="$SOLR_OPTS -Xss256k"
> SOLR_OPTS="$SOLR_OPTS -Dsolr.autoCommit.maxTime=60"
> SOLR_OPTS="$SOLR_OPTS -Dsolr.clustering.enabled=true"
> SOLR_OPTS="$SOLR_OPTS -Dpkiauth.ttl=12"
> 
> Cache Parameters (the XML tag names were stripped by the mail archive; the
> surviving values): 4096; 1000; a cache with autowarmCount="20"; two caches
> with autowarmCount="10"; a cache with initialSize="0" autowarmCount="10"
> regenerator="solr.NoOpRegenerator"; one with showItems="10"; one with
> class="solr.search.LRUCache" size="4096" initialSize="2048"
> autowarmCount="4096" regenerator="solr.search.NoOpRegenerator"; plus the
> values true and 60.
> 
> I currently have 2 shards, each having 2 replicas. The index size is
> approximately 70gb.
> 
> Here's a solr log trace from the series of events once the node starts
> getting into trouble. I've posted only the relevant ones here.
> 
> 
> org.apache.solr.common.SolrException.log(SolrException.java:148) -
> org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates
> are disabled.
>at
> org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1738)
> 
> org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$5(CoreAdminOperation.java:143)
> - It has been requested that we recover: core=knowledge
> INFO647718[qtp2039328061-1526] -
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:732)
> - [admin] webapp=null path=/admin/cores
> params={core=knowledge&action=REQUESTRECOVERY&wt=javabin&version=2}
> status=0 QTime=0
> INFO647808[qtp2039328061-1540] -
> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:187)
> - [knowledge]  webapp=/solr path=/update
> params={update.distrib=FROMLEADER&distrib.from=
> http://xx.xxx.xxx.63:8983/solr/knowledge/&wt=javabin&version=2}{} 0 0
> 
> WARN657500[recoveryExecutor-3-thread-4-processing-n:xx.xxx.xxx.251:8983_solr
> x:knowledge s:shard2 c:knowledge r:core_node9] -
> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:659)
> - Socket timeout on send prep recovery cmd, retrying..
> INFO657500[recoveryExecutor-3-thread-4-processing-n:xx.xxx.xxx.251:8983_solr
> x:knowledge s:shard2 c:knowledge r:core_node9] -
> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:676)
> - Sending prep recovery command to [http://xx.xxx.xxx.63:8983/solr];
> [WaitForState:
> action=PREPRECOVERY&core=knowledge&nodeName=xx.xxx.xxx.251:8983_solr&coreNodeName=core_node9&state=recovering&checkLive=true&onlyIfLeader=true&onlyIfLeaderActive=true]
> WARN667514[recoveryExecutor-3-thread-4-processing-n:xx.xxx.xxx.251:8983_solr
> x:knowledge s:shard2 c:knowledge r:core_node9] -
> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:659)
> - Socket timeout on send prep recovery cmd, retrying..
> 
> The retry happens for few times, then
> 
> INFO689389[qtp2039328061-1649] -
> org.apache.solr.security.RuleBasedAuthorizationPlugin.checkPathPerm(RuleBasedAuthorizationPlugin.java:147)
> - request has come without principal. failed permission {
>  "name":"select",
>  "collection"

Re: Schemaless detecting multivalued fields

2017-10-19 Thread Emir Arnautović
Hi John,
You should be able to do that with a custom update request processor chain and
https://lucene.apache.org/solr/6_6_0//solr-core/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.html

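Roughly along these lines; a sketch assuming the default schemaless type
names, not a drop-in config:

<updateRequestProcessorChain name="add-unknown-fields" default="true">
  <processor class="solr.ParseLongFieldUpdateProcessorFactory"/>
  <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
    <str name="defaultFieldType">strings</str>
    <lst name="typeMapping">
      <!-- tlongs is the multivalued default; picking tlong vs tlongs per
           value would need a custom processor inserted before this one -->
      <str name="valueClass">java.lang.Long</str>
      <str name="fieldType">tlongs</str>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>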

HTH,
Emir 
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 19 Oct 2017, at 08:00, John Davis  wrote:
> 
> Hi,
> I know about the schemaless configuration defaulting to multivalued fields
> of the corresponding type.
> 
> I was just wondering if there was a way to first detect if the incoming
> value is a list or a singleton, and based on that pick the corresponding type.
> Ideally if the value is a long then use tlong, while if it is a list of longs
> then use tlongs.
> 
> Thanks!
> John



No in-place updates with router.field set

2017-10-19 Thread James
Steps to reproduce:

Use Solr in SolrCloud mode.
Create a collection with implicit routing and router.field set to some field,
e.g. "routerfield".
Index a very small document. Stop time -> X
Index a very large document. Stop time -> Y
Apply an update to the large document. Note that the update command has at
least three entries:
{
 "ID":1133,
 "Property_2":{"set":124},
 "routerfield":"FirstShard"
 }
QTime of update will always be closer to Y than to X.

If I repeat these steps without setting router.field while creating the
collection, the QTime of the update will be very close to X.


From this simple test I conclude that router.field somehow prevents updates
from being performed as in-place updates.
Can anyone confirm? Is this a bug? Would anybody care to open a Jira item if
necessary?

According to the first comment on
https://issues.apache.org/jira/browse/SOLR-8889, the router.field option is
hardly tested and there seem to be other related problems as well.