CDCR issues

2019-03-21 Thread Jay Potharaju
Hi,
I just enabled CDCR for one collection. I am seeing high CPU usage and a high 
and increasing number of tlog files.
The collection does not have a lot of data; I just started reindexing it.
Solr 7.7.0, implicit sharding, 8 shards.
I have enabled the buffer on the source side and disabled the buffer on the 
target side. The number of replicators is set to 4.
Any suggestions on how to tackle the high CPU and the growing tlogs? The tlogs 
are small in size, but for the one shard I checked there were about 100 of them.

Thanks
Jay

HTML/JavaScript Query and Results Display Not Working

2019-03-21 Thread Deoxyribonucleic_DNA ...
I am trying to work off of the https://wiki.apache.org/solr/SolJSON tutorial. I 
have put my URL for Solr in the code, copied from a Solr admin query result, to 
make sure the query should return something.

I try typing "title:Asian" into the text box, but when the button is hit the 
text box just clears and nothing shows up in the output spot.

I used the browser dev tools ([F12] key) to check the console and saw no errors 
given there, such as for syntax, so it is not due to that.

Perhaps I am misunderstanding how the URL for the query works or what it should 
be here? If I leave out the localhost part as shown, I just get an error for 
not specifying a full path.




Solr Ajax Example (HTML page title; the surrounding page markup was stripped)

// derived from http://www.degraeve.com/reference/simple-ajax-example.php
function xmlhttpPost(strURL)
{
    var xmlHttpReq = false;
    var self = this;

    if (window.XMLHttpRequest) { // Mozilla/Safari
        self.xmlHttpReq = new XMLHttpRequest();
    }
    else if (window.ActiveXObject) { // IE
        self.xmlHttpReq = new ActiveXObject("Microsoft.XMLHTTP");
    }

    self.xmlHttpReq.open('POST', strURL, true);
    self.xmlHttpReq.setRequestHeader('Content-Type',
        'application/x-www-form-urlencoded');

    self.xmlHttpReq.onreadystatechange = function() {
        if (self.xmlHttpReq.readyState == 4) {
            updatepage(self.xmlHttpReq.responseText);
        }
    };

    var params = getstandardargs().concat(getquerystring());
    var strData = params.join('&');
    self.xmlHttpReq.send(strData);
    //document.getElementById("raw").innerHTML = strData;
    return false;
}

function getstandardargs() {
var params = [
'wt=json'
, 'indent=on'
, 'hl=true'
];

return params;
}
function getquerystring() {
  var form = document.forms['f1'];
  var query = form.query.value;
  var qstr = 'q=' + escape(query);
  return qstr;
}

// this function does all the work of parsing the solr response and updating
// the page.
function updatepage(str)
{
  document.getElementById("raw").innerHTML = str;
  var rsp = eval("(" + str + ")"); // use eval to parse Solr's JSON response
  var html = "<br>numFound=" + rsp.response.numFound;
  var first = rsp.response.docs[0];
  html += "<br>product name=" + first.name;
  var hl = rsp.highlighting[first.id];
  if (hl.name != null) { html += "<br>name highlighted: " + hl.name[0]; }
  if (hl.features != null) { html += "<br>features highlighted: " + hl.features[0]; }
  document.getElementById("result").innerHTML = html;
}
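
The form and output markup from the original page was stripped by the mail 
archive. A minimal sketch of the HTML the script above expects -- the form name 
"f1", the query box, and the "raw"/"result" elements come from the script; the 
Solr URL is a placeholder:

<form name="f1">
  query: <input name="query" type="text">
  <input value="Search" type="button"
         onclick="xmlhttpPost('http://localhost:8983/solr/yourcore/select')">
</form>
<div id="result"></div>
Raw JSON String/output:
<div id="raw"></div>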

Re: Use of ShingleFilter causing very large BooleanQuery structures in Solr 7.1

2019-03-21 Thread Erick Erickson
sow was introduced in Solr 6, so it’s just ignored in 4x.

bq. Surely the tokenizer splits on white space anyway, or it wouldn't work?

I didn’t work on that code, so I don’t have the details off the top of my head, 
but I’ll take a stab at it as far as my understanding goes. The result is in 
your parsed queries.

Note that in the better-behaved case, you have a bunch of individual 
tokens ORed together like:
productdetails_tokens_en:9611444530
productdetails_tokens_en:9611444530

 and that’s all. IOW, the query parser has split them into individual tokens 
that are fed one at a time into the analysis chain.

In the bad case you have a bunch of single tokens as well, but then what look 
like multiple tokens, but are not:
+productdetails_tokens_en:9611444500
+productdetails_tokens_en:9612194002 9612194002 9612194002)

which is where the explosion is coming from. It's deceptive because, when 
shingling, this is a single token, "9612194002 9612194002 9612194002", even 
though it looks like something that'd be split by whitespace. 

If you take a look at your admin UI>>your_core>>schema and select your 
productdetails_tokens_en from the drop down and then “load terms” you’ll see. 
If you want to experiment, you can add a tokenSeparator character other than a 
space to the shinglefilter that’ll make it clearer. Then the clause above that 
looks like multiple, whitespace-separated tokens would look like what it really 
is, a single token:

+productdetails_tokens_en:9612194002_9612194002_9612194002)
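
For illustration, a fieldType along these lines (the tokenizer here is an 
assumption; the shingle settings are the ones from this thread):

<fieldType name="text_shingles_debug" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="30"
            outputUnigrams="true" tokenSeparator="_"/>
  </analyzer>
</fieldType>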

Best,
Erick

> On Mar 21, 2019, at 3:10 PM, Hubert-Price, Neil  
> wrote:
> 
> Surely the tokenizer splits on white space anyway, or it wouldn't work?



Re: Use of ShingleFilter causing very large BooleanQuery structures in Solr 7.1

2019-03-21 Thread Hubert-Price, Neil
Hi Erick,

I've run a series of tests using debug=true, the same original query, and 
variations around sow=true/sow=false/not set.  See links below for .txt files 
containing the output.  I have removed any genuine document content and 
replaced it with .. because I don't have the customer's permission to post 
their data.  However the debug info, etc should still be usable.

Points to note:
 - All queries that completed returned the same set of documents
 - Solr 7.1 on the original configuration, query succeeds only if sow=true is 
passed.
 - Solr 7.1 with the config change mentioned earlier, all 3 succeed however 
both original/sow=false have higher QTime and longer parsed queries
 - With Solr 7.1 sow=true the behaviour seems to be the same with/without the 
reconfiguration
 - The Solr 4.6 output seems to be much the same for all 3 attempts, except for 
variations in QTime.  However that may be because the server is older + mostly 
unused currently.  I assume this sow parameter isn't supported in 4.6?

Solr 4.6 Original Query: 
https://drive.google.com/open?id=1vRn-2NabuKoJshqxXpQ-kOeJ8G-gcZlu
Solr 4.6 sow=false: 
https://drive.google.com/open?id=1nAvMvm9LNb-gA3UIDFI-eJzqaOhToQPV
Solr 4.6 sow=true: 
https://drive.google.com/open?id=14PRJG459poLe634E75T68wClJLg0tXWp
Solr 7.1 Original Config sow=true: 
https://drive.google.com/open?id=1q1iNfef6-LmqNjI7gTWxUJLsNx9C2U6v
Solr 7.1 Reconfigured Original Query: 
https://drive.google.com/open?id=138KYW7MCobU_3MZhC4lAhWgvWTaspK2N
Solr 7.1 Reconfigured sow=false: 
https://drive.google.com/open?id=127ZIKtSvivn5SJ4sLR25iu-mUCMW8bCu
Solr 7.1 Reconfigured sow=true: 
https://drive.google.com/open?id=1UJVHzQjgeF4fJ4ILnf4YYag5wdmWi3uS


So this sow=true config has a very definite effect in Solr 7.1 for us at least. 
 I'm unclear how that affects the behaviour of the query though?  Surely the 
tokenizer splits on white space anyway, or it wouldn't work?  Can you explain 
any more about the purpose of this & when it was introduced?

Many Thanks,
Neil


On 21/03/2019, 16:06, "Erick Erickson"  wrote:

Neil:

Yeah, the attachment-stripping catches everyone the first time, we’re so 
used to just adding anything we want to an e-mail…

I don’t know enough about the query parsing to answer off the top of my 
head. I do know one thing that’s changed is “Split on Whitespace” has changed 
from true to false by default, so it’d be interesting to add &sow=false to the 
query.

Beyond that, take a look at what &debug=query added to the URL returns. My 
guess is that it’ll be identical but it’s worth a look.

Sorry I can’t be more help here
Erick

> On Mar 21, 2019, at 1:11 AM, Hubert-Price, Neil 
 wrote:
> 
> Hello Erick,
> 
> This is the first time I've had reason to use the mailing list, so I 
wasn't aware of the behaviour around attachments.  See below, links to the 
images that I originally sent as attachments, both are screenshots from within 
Eclipse MAT looking at a SOLR heap dump.
> 
> LargeQueryStructure.png - 
https://drive.google.com/open?id=1SkRYav2iV6Z1znmzr4KKJzMcXzNF0_Wg 
> LargeNumberClauses.png - 
https://drive.google.com/open?id=1CaySU2HzyvHsdbIW_n0190ofjPS3hAeN
> 
> The LargeQueryStructure image shows as single thread with retained set of 
4.8GB, with the biggest items being a BooleanWeight object of just over 1.8GB 
and a BooleanQuery object of just under 1.8GB
> 
> The LargeNumberClauses image shows a drilldown into the BooleanQuery 
object, where a subquery is taking around 0.9GB and contains a 
BooleanClause[524288] array of clauses (not shown: each of these 524288 is 
actually a subquery with multiple clauses).  The array is taking 0.6GB, and 
there is a second instance of the same array in another subquery (also not 
shown).
> 
> 
> Since the last email we have had some success with a reconfiguration of 
the fieldType that I referenced in my original email below.  Where it was 
originally:
> 
> [original fieldType XML stripped by the mail archive]
> 
> We have now reconfigured to:
> 
> [reconfigured fieldType XML stripped by the mail archive]
> 
> After the reconfiguration, the huge memory effect of the queries in Solr 
7.1 is gone.  We could kill test instances of Solr with a single query in the 
original configuration. After reconfiguration we can run multiple similar 
queries in parallel, and the Solr process responds in 50-150ms with only 
approx. 100MB added to the heap.
> 
> This may well be sufficient for our purposes, as I don't think end users 
will notice the difference in practice & queries that were previously failing 
now return normally.
> 
> However I am still curious as to how this performs so differently in Solr 
4.

RE: is df needed for SolrCloud replication?

2019-03-21 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
Thanks.

That resolves the issue.


Thanks again.

-Original Message-
From: Shawn Heisey  
Sent: Tuesday, March 19, 2019 7:10 PM
To: solr-user@lucene.apache.org
Subject: Re: is df needed for SolrCloud replication?

On 3/19/2019 4:48 PM, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
> I recently noticed that my solr.log files have been getting the following 
> error message:
> o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: no field 
> name specified in query and no default specified via 'df' param
> 
> The timing of these messages coincide with pings to leader node of the 
> SolrCloud from other nodes of the SolrCloud (the message appears only on 
> whatever node is currently the leader).
> 
> I believe that the user for whom I set up this SolrCloud intentionally 
> removed df from the defaults section of solrconfig.xml (in order to 
> streamline out parts of the code which he does not use).
> 
> I have not (yet) noticed any ill effects from this error. Is this error 
> benign? Or shall I ask the user to reinstate df in the defaults section of 
> solrconfig.xml? Or can SorlCloud replication be configured to work around any 
> ill effects that there may be?

If you don't define df (which means "default field"), then every query 
must indicate which field(s) it will query, or you will see that error 
message.

It sounds like the query that is in the ping handler needs to be changed 
so it includes a field name.  Typically ping handlers use *:* for their 
query, which is special syntax for all documents, and works even when no 
fields are defined.  That query is usually extremely fast.
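
For example, a ping handler along these lines in solrconfig.xml (a sketch; 
adjust the handler name and sections to your config):

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="q">*:*</str>
  </lst>
</requestHandler>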

Thanks,
Shawn


Re: Upgrading solarj from 6.5.1 to 8.0.0

2019-03-21 Thread Erick Erickson
One tangent just so you’re aware. You _must_ re-index from scratch. Lucene 8x 
will refuse to open an index that was _ever_ touched by Solr 6.

Best,
Erick

> On Mar 21, 2019, at 8:26 AM, Lahiru Jayasekera  
> wrote:
> 
> Hi Jason,
> Thanks for the response. I saw the method of setting credentials based on
> individual request.
> But I need to set the credentials at solrclient level. If you remember the
> way to do it please let me know.
> 
> Thanks
> 
> On Thu, Mar 21, 2019 at 8:26 PM Jason Gerlowski 
> wrote:
> 
>> You should be able to set credentials on individual requests with the
>> SolrRequest.setBasicAuthCredentials() method.  That's the method
>> suggested by the latest Solr ref guide at least:
>> 
>> https://lucene.apache.org/solr/guide/7_7/basic-authentication-plugin.html#using-basic-auth-with-solrj
>> 
>> There might be a way to set the credentials on the client itself, but
>> I can't think of it at the moment.
>> 
>> Hope that helps,
>> 
>> Jason
>> 
>> On Thu, Mar 21, 2019 at 2:34 AM Lahiru Jayasekera
>>  wrote:
>>> 
>>> Hi all,
>>> I need help implementing the following code in solarj 8.0.0.
>>> 
>>> private SolrClient server, adminServer;
>>> 
>>> this.adminServer = new HttpSolrClient(SolrClientUrl);
>>> this.server = new HttpSolrClient( SolrClientUrl + "/" +
>> mapping.getCoreName() );
>>> if (serverUserAuth) {
>>>  HttpClientUtil.setBasicAuth(
>>>  (DefaultHttpClient) ((HttpSolrClient) adminServer).getHttpClient(),
>>>  serverUsername, serverPassword);
>>>  HttpClientUtil.setBasicAuth(
>>>  (DefaultHttpClient) ((HttpSolrClient) server).getHttpClient(),
>>>  serverUsername, serverPassword);
>>> }
>>> 
>>> 
>>> I could get the solarClients as following
>>> 
>>> this.adminServer = new HttpSolrClient.Builder(SolrClientUrl).build();
>>> this.server = new HttpSolrClient.Builder( SolrClientUrl + "/" +
>>> mapping.getCoreName() ).build();
>>> 
>>> But i can't find a way to implement basic authentication. I think that it
>>> can be done via SolrHttpClientBuilder.
>>> Can you please help me to solve this?
>>> 
>>> Thank and regards
>>> Lahiru
>>> --
>>> Lahiru Jayasekara
>>> Batch 15
>>> Faculty of Information Technology
>>> University of Moratuwa
>>> 0716492170
>> 
> 
> 
> -- 
> Lahiru Jayasekara
> Batch 15
> Faculty of Information Technology
> University of Moratuwa
> 0716492170



Re: Migrate Solr Master To Cloud 7.5

2019-03-21 Thread Erick Erickson
Yeah, the link you referenced will work. It is _very important_ that you create 
your collection with exactly one shard then do the copy.

After that you can use SPLITSHARD to sub-divide it. This is a costly operation, 
but probably not as costly as re-indexing.
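
For reference, the call looks roughly like this (host, collection, and shard 
names are placeholders):

http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1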

That said, it might be easier to just create a new collection with shards and 
re-index, it depends of course on how painful reindexing is….

Best,
Erick

> On Mar 21, 2019, at 8:55 AM, IZaBEE_Keeper  wrote:
> 
> Hi..
> 
> I have a large Solr 7.5 index over 150M docs and 800GB in a master slave
> setup.. I need to migrate the core to a Solr Cloud instance with pull
> replicas as the index will be exceeding the 2.2B doc limit for a single
> core.. 
> 
> I found this..
> http://lucene.472066.n3.nabble.com/Copy-existing-index-from-standalone-Solr-to-Solr-cloud-td4149920.html
> 
>   
> It's a bit out dated but sounds like it might work..
> 
> Does anyone have any other advice/links for this type of migration? 
> 
> Right now I just need to convert the master to cloud before it gets much
> bigger.. Re-indexing is an option but I would rather convert which is likely
> much faster..
> 
> Thanks.. 
> 
> 
> 
> -
> Bee Keeper at IZaBEE.com
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Gather Nodes Streaming

2019-03-21 Thread Joel Bernstein
gatherNodes requires single value fields in the tuples. In certain
scenarios the cartesianProduct streaming expression can be used to explode
a multi-value field into a single field stream. But in the scenario you
describe this might not be possible.
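
For reference, when the addresses are indexed as a true multi-valued field, the 
explode looks roughly like this (collection and field names are placeholders):

cartesianProduct(
  search(emails, q="*:*", fl="id,from,to_ss", sort="id asc"),
  to_ss,
  productSort="to_ss asc"
)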



Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Mar 20, 2019 at 10:58 PM Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> What is the fieldType of your 'to' field? Which tokenizers/filters is it
> using?
>
> Also, which Solr version are you using?
>
> Regards,
> Edwin
>
> On Thu, 21 Mar 2019 at 01:57, Susmit Shukla 
> wrote:
>
> > Hi,
> >
> > Trying to use solr streaming 'gatherNodes' function. It is for extracting
> > email graph based on from and to fields.
> > It requires 'to' field to be a single value field with docvalues enabled
> > since it is used internally for sorting and unique streams
> >
> > The 'to' field can contain multiple email addresses - each being a node.
> > How to map multiple comma separated email addresses from the 'to' fields
> as
> > separate graph nodes?
> >
> > Thanks
> >
> >
> >
> > >
> > >
> >
>


Re: highlighter, stored documents and performance

2019-03-21 Thread Erick Erickson
By and large, storing data will not affect search speed as much as you might 
think. Getting the top N results (say 10) doesn’t use stored data at all. It’s 
only _after_ that point that highlighting occurs on the 10 docs.

As far as needing the full doc, Jörn is right, it must be stored. The problem 
is that what’s in the index, aside from being very expensive to use to 
reconstruct the doc (think 10s of seconds at least per doc) is lossy. Say you 
stem and one of your words is ‘running’. All that’s in the index is ‘run’ so 
using that to highlight, even if it were fast, wouldn’t be satisfactory.

Best,
Erick

> On Mar 21, 2019, at 9:32 AM, Jörn Franke  wrote:
> 
> Hi,
> 
> Then you have to go for the full documents. I recommend to reduce then the 
> returned results, use paging (if it is a web ui) and split the documents on 
> several nodes (if the previous measures do not turn out to be successful).
> 
> Best regards 
> 
>> Am 21.03.2019 um 17:15 schrieb Martin Frank Hansen (MHQ) :
>> 
>> Hi Jörn,
>> 
>> Thanks for your answer.
>> 
>> Unfortunately, there is no summary included in the documents  and I would 
>> like it to work for all documents.
>> 
>> Best regards
>> 
>> Martin
>> 
>> 
>> Internal - KMD A/S
>> 
>> -Original Message-
>> From: Jörn Franke 
>> Sent: 21. marts 2019 17:11
>> To: solr-user@lucene.apache.org
>> Subject: Re: highlighter, stored documents and performance
>> 
>> I don’t think so - to highlight any possible query you need the full 
>> document.
>> 
>> You could optimize it by only storing a subset of the document and highlight 
>> only in this subset.
>> 
>> Alternatively you can store a summary and show only the summary without 
>> highlighting.
>> 
>>> Am 21.03.2019 um 17:05 schrieb Martin Frank Hansen (MHQ) :
>>> 
>>> Hi,
>>> 
>>> I am wondering how performance highlighting in Solr performs when the 
>>> number of documents get large?
>>> 
>>> Right now we have about 1 TB of data in all sorts of file types and I was 
>>> wondering how storing these documents within Solr (for highlighting 
>>> purpose) will affect performance?
>>> 
>>> Is it possible to use highlighting without storing the documents?
>>> 
>>> Best regards
>>> 
>>> Martin
>>> 
>>> 
>>> 
>>> 
>>> Internal - KMD A/S
>>> 



Re: Strange disk size behavior

2019-03-21 Thread Erick Erickson
99% sure it’s background merging. When two segments are merged, the combined 
segment is written and only after it’s successful will the old segments be 
deleted.

Restarting will stop any ongoing merging and delete any un-referenced segments. 
I expect you’ll see the space come back as you start indexing again.

For this reason, it’s required that _at least_ as much free space exists on the 
disk as the index occupies, and maybe 2x.

IOW, this is normal operation at this point. Especially if you optimize (which 
I recommend against) then I practically guarantee that you’ll need as much free 
space on your disk as the index size.

Best,
Erick

> On Mar 21, 2019, at 9:44 AM, SOLR4189  wrote:
> 
> Hi all. 
> We use SOLR-6.5.1 and in our cluster each solr core is placed in different
> virtual machine (one core per one node). Each virtual machine has 104 Gb
> size of disk.  
> Yesterday we marked that several solr cores use disk space in the abnormal
> manner.
> In running command *"df -h
> /opt/solr/CollectionName_shardX_replicaY/data/index"* we saw that 92GB of
> disk is occupied, but size of index in this machine is 62GB by solr cloud
> (also by command *"ls -l
> /opt/solr/CollectionName_shardX_replicaY/data/index"*). After restart solr
> service, df -h also reports 62GB occupied place in disk.
> 
> Does somebody know what is it?
> Can it be somehow connected to our deletes? (we run each night delete by
> query command for deleting expired documents)?
> 
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Strange disk size behavior

2019-03-21 Thread SOLR4189
Hi all. 
We use SOLR-6.5.1 and in our cluster each solr core is placed in different
virtual machine (one core per one node). Each virtual machine has 104 Gb
size of disk.  
Yesterday we marked that several solr cores use disk space in the abnormal
manner.
In running command *"df -h
/opt/solr/CollectionName_shardX_replicaY/data/index"* we saw that 92GB of
disk is occupied, but size of index in this machine is 62GB by solr cloud
(also by command *"ls -l
/opt/solr/CollectionName_shardX_replicaY/data/index"*). After restart solr
service, df -h also reports 62GB occupied place in disk.

Does somebody know what is it?
Can it be somehow connected to our deletes? (we run each night delete by
query command for deleting expired documents)?




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: highlighter, stored documents and performance

2019-03-21 Thread Jörn Franke
Hi,

Then you have to go for the full documents. I recommend reducing the number of 
returned results, using paging (if it is a web UI), and splitting the documents 
across several nodes (if the previous measures do not turn out to be successful).

Best regards 

> Am 21.03.2019 um 17:15 schrieb Martin Frank Hansen (MHQ) :
> 
> Hi Jörn,
> 
> Thanks for your answer.
> 
> Unfortunately, there is no summary included in the documents  and I would 
> like it to work for all documents.
> 
> Best regards
> 
> Martin
> 
> 
> Internal - KMD A/S
> 
> -Original Message-
> From: Jörn Franke 
> Sent: 21. marts 2019 17:11
> To: solr-user@lucene.apache.org
> Subject: Re: highlighter, stored documents and performance
> 
> I don’t think so - to highlight any possible query you need the full document.
> 
> You could optimize it by only storing a subset of the document and highlight 
> only in this subset.
> 
> Alternatively you can store a summary and show only the summary without 
> highlighting.
> 
>> Am 21.03.2019 um 17:05 schrieb Martin Frank Hansen (MHQ) :
>> 
>> Hi,
>> 
>> I am wondering how performance highlighting in Solr performs when the number 
>> of documents get large?
>> 
>> Right now we have about 1 TB of data in all sorts of file types and I was 
>> wondering how storing these documents within Solr (for highlighting purpose) 
>> will affect performance?
>> 
>> Is it possible to use highlighting without storing the documents?
>> 
>> Best regards
>> 
>> Martin
>> 
>> 
>> 
>> 
>> Internal - KMD A/S
>> 


RE: highlighter, stored documents and performance

2019-03-21 Thread Martin Frank Hansen (MHQ)
Hi Jörn,

Thanks for your answer.

Unfortunately, there is no summary included in the documents  and I would like 
it to work for all documents.

Best regards

Martin


Internal - KMD A/S

-Original Message-
From: Jörn Franke 
Sent: 21. marts 2019 17:11
To: solr-user@lucene.apache.org
Subject: Re: highlighter, stored documents and performance

I don’t think so - to highlight any possible query you need the full document.

You could optimize it by only storing a subset of the document and highlight 
only in this subset.

Alternatively you can store a summary and show only the summary without 
highlighting.

> Am 21.03.2019 um 17:05 schrieb Martin Frank Hansen (MHQ) :
>
> Hi,
>
> I am wondering how performance highlighting in Solr performs when the number 
> of documents get large?
>
> Right now we have about 1 TB of data in all sorts of file types and I was 
> wondering how storing these documents within Solr (for highlighting purpose) 
> will affect performance?
>
> Is it possible to use highlighting without storing the documents?
>
> Best regards
>
> Martin
>
>
>
>
> Internal - KMD A/S
>


Re: highlighter, stored documents and performance

2019-03-21 Thread Jörn Franke
I don’t think so - to highlight any possible query you need the full document.

You could optimize it by only storing a subset of the document and highlight 
only in this subset.

Alternatively you can store a summary and show only the summary without 
highlighting. 

> Am 21.03.2019 um 17:05 schrieb Martin Frank Hansen (MHQ) :
> 
> Hi,
> 
> I am wondering how performance highlighting in Solr performs when the number 
> of documents get large?
> 
> Right now we have about 1 TB of data in all sorts of file types and I was 
> wondering how storing these documents within Solr (for highlighting purpose) 
> will affect performance?
> 
> Is it possible to use highlighting without storing the documents?
> 
> Best regards
> 
> Martin
> 
> 
> 
> 
> Internal - KMD A/S
> 


highlighter, stored documents and performance

2019-03-21 Thread Martin Frank Hansen (MHQ)
Hi,

I am wondering how highlighting in Solr performs when the number of 
documents gets large?

Right now we have about 1 TB of data in all sorts of file types and I was 
wondering how storing these documents within Solr (for highlighting purpose) 
will affect performance?

Is it possible to use highlighting without storing the documents?

Best regards

Martin




Internal - KMD A/S



Migrate Solr Master To Cloud 7.5

2019-03-21 Thread IZaBEE_Keeper
Hi..

I have a large Solr 7.5 index over 150M docs and 800GB in a master slave
setup.. I need to migrate the core to a Solr Cloud instance with pull
replicas as the index will be exceeding the 2.2B doc limit for a single
core.. 

I found this..
http://lucene.472066.n3.nabble.com/Copy-existing-index-from-standalone-Solr-to-Solr-cloud-td4149920.html

  
It's a bit out dated but sounds like it might work..

Does anyone have any other advice/links for this type of migration? 

Right now I just need to convert the master to cloud before it gets much
bigger.. Re-indexing is an option but I would rather convert which is likely
much faster..

Thanks.. 



-
Bee Keeper at IZaBEE.com
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Delay searches till log replay finishes

2019-03-21 Thread Rahul Goswami
Erick, Shawn,

Apologies for the late update on this thread and thank you for your inputs.
My assumption about the number of segments increasing came from an incomplete
understanding of the TieredMergePolicy, but I get it now. Another concern
was a slowing indexing rate due to constant merges. This is from reading the
documentation:
"Choosing the best merge factors is generally a trade-off of indexing speed
vs. searching speed. Having fewer segments in the index generally
accelerates searches, because there are fewer places to look. It also can
also result in fewer physical files on disk. But to keep the number of
segments low, merges will occur more often, which can add load to the
system and slow down updates to the index"

Taking your suggestions, we have reduced hard commit interval
(openSearcher=false) from 10 mins to 1 min to begin with. Also, our servers
are on Windows so that could also be a cause of the service getting killed
before being able to gracefully shutdown. The cascading effect is stale
results while tlogs are being played on startup. I understand that although
not foolproof, reducing the autoCommit interval should help mitigate the
problem and we'll continue to monitor this for now.
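
For reference, the corresponding block in solrconfig.xml now looks roughly like
this (one minute, without opening a searcher):

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>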

Thanks,
Rahul

On Fri, Mar 8, 2019 at 2:14 PM Erick Erickson 
wrote:

> (1) no, and Shawn’s comments are well taken.
>
> (2) bq.  is the number of segments would drastically increase
>
> Not true. First of all, TieredMergePolicy will take care of merging “like
> sized” segments for you. You’ll have the same number (or close) no matter
> how short the autocommit interval. Second, new segments are created
> whenever the internal indexing buffer is filled up, default 100M anyway so
> just because you have a long autocommit interval doesn’t say much about the
> number of segments that are created.
>
> This is really not something you should be concerned about, certainly not
> something you should accept other problems because. Solr runs quite well
> with 15 second autocommit and very high indexing rates, why do you think
> your situation is different? Do you have any evidence that would be a
> problem at all?
>
> Best,
> Erick
>
>
> > On Mar 8, 2019, at 11:05 AM, Shawn Heisey  wrote:
> >
> > On 3/8/2019 10:44 AM, Rahul Goswami wrote:
> >> 1) Is there currently a configuration setting in Solr that will trigger
> the
> >> first option you mentioned ? Which is to not serve any searches until
> tlogs
> >> are played. If not, since instances shutting down abruptly is not very
> >> uncommon, would a JIRA to implement this configuration be warranted?
> >
> > In what setup is an abrupt shutdown *expected*?  If that's really
> common, then your setup is, in my opinion, very broken.  It is our intent
> that abrupt death of the Solr process should be quite rare.  We do still
> have a problem on Windows where the wait for clean shutdown is only five
> seconds -- nowhere near enough.  The Windows script still needs a lot of
> work, but most of us are not adept at Windows scripting.
> >
> > There is an issue for the timeout interval in bin\solr.cmd on Windows:
> >
> > https://issues.apache.org/jira/browse/SOLR-9698
> >
> >> 2) We have a setup with moderate indexing rate and moderate search rate.
> >> Currently the auto commit interval is 10 mins. What should be a
> recommended
> >> hard commit interval for such a setup? Our concern with going too low on
> >> that autoCommit interval (with openSearcher=false) is the number of
> >> segments that would drastically increase, eventually causing
> merges,slower
> >> searches etc.
> >
> > Solr has shipped with a 15 second autoCommit, where openSearcher is set
> to false, for a while now.  This is a setting that works quite well.  As
> long as you're not opening a new searcher, commits are quite fast.  I
> personally would use 60 seconds, but 15 seconds does work well.  It is
> usually autoSoftCommit where you need to be concerned about short
> intervals, because a soft commit opens a searcher.
> >
> > Thanks,
> > Shawn
>
>


Re: Upgrading solarj from 6.5.1 to 8.0.0

2019-03-21 Thread Lahiru Jayasekera
Hi Jason,
Thanks for the response. I saw the method of setting credentials based on
individual request.
But I need to set the credentials at solrclient level. If you remember the
way to do it please let me know.

Thanks

On Thu, Mar 21, 2019 at 8:26 PM Jason Gerlowski 
wrote:

> You should be able to set credentials on individual requests with the
> SolrRequest.setBasicAuthCredentials() method.  That's the method
> suggested by the latest Solr ref guide at least:
>
> https://lucene.apache.org/solr/guide/7_7/basic-authentication-plugin.html#using-basic-auth-with-solrj
>
> There might be a way to set the credentials on the client itself, but
> I can't think of it at the moment.
>
> Hope that helps,
>
> Jason
>
> On Thu, Mar 21, 2019 at 2:34 AM Lahiru Jayasekera
>  wrote:
> >
> > Hi all,
> > I need help implementing the following code in solarj 8.0.0.
> >
> > private SolrClient server, adminServer;
> >
> > this.adminServer = new HttpSolrClient(SolrClientUrl);
> > this.server = new HttpSolrClient( SolrClientUrl + "/" +
> mapping.getCoreName() );
> > if (serverUserAuth) {
> >   HttpClientUtil.setBasicAuth(
> >   (DefaultHttpClient) ((HttpSolrClient) adminServer).getHttpClient(),
> >   serverUsername, serverPassword);
> >   HttpClientUtil.setBasicAuth(
> >   (DefaultHttpClient) ((HttpSolrClient) server).getHttpClient(),
> >   serverUsername, serverPassword);
> > }
> >
> >
> > I could get the solarClients as following
> >
> > this.adminServer = new HttpSolrClient.Builder(SolrClientUrl).build();
> > this.server = new HttpSolrClient.Builder( SolrClientUrl + "/" +
> > mapping.getCoreName() ).build();
> >
> > But i can't find a way to implement basic authentication. I think that it
> > can be done via SolrHttpClientBuilder.
> > Can you please help me to solve this?
> >
> > Thank and regards
> > Lahiru
> > --
> > Lahiru Jayasekara
> > Batch 15
> > Faculty of Information Technology
> > University of Moratuwa
> > 0716492170
>


-- 
Lahiru Jayasekara
Batch 15
Faculty of Information Technology
University of Moratuwa
0716492170


Re: Use of ShingleFilter causing very large BooleanQuery structures in Solr 7.1

2019-03-21 Thread Erick Erickson
Neil:

Yeah, the attachment-stripping catches everyone the first time, we’re so used to 
just adding anything we want to an e-mail…

I don’t know enough about the query parsing to answer off the top of my head. I 
do know one thing that’s changed is “Split on Whitespace” has changed from true 
to false by default, so it’d be interesting to add &sow=false to the query.

Beyond that, take a look at what &debug=query added to the URL returns. My 
guess is that it’ll be identical but it’s worth a look.

Sorry I can’t be more help here
Erick

> On Mar 21, 2019, at 1:11 AM, Hubert-Price, Neil  
> wrote:
> 
> Hello Erick,
> 
> This is the first time I've had reason to use the mailing list, so I wasn't 
> aware of the behaviour around attachments.  See below, links to the images 
> that I originally sent as attachments, both are screenshots from within 
> Eclipse MAT looking at a SOLR heap dump.
> 
> LargeQueryStructure.png - 
> https://drive.google.com/open?id=1SkRYav2iV6Z1znmzr4KKJzMcXzNF0_Wg 
> LargeNumberClauses.png - 
> https://drive.google.com/open?id=1CaySU2HzyvHsdbIW_n0190ofjPS3hAeN
> 
> The LargeQueryStructure image shows as single thread with retained set of 
> 4.8GB, with the biggest items being a BooleanWeight object of just over 1.8GB 
> and a BooleanQuery object of just under 1.8GB
> 
> The LargeNumberClauses image shows a drilldown into the BooleanQuery object, 
> where a subquery is taking around 0.9GB and contains a BooleanClause[524288] 
> array of clauses (not shown: each of these 524288 is actually a subquery with 
> multiple clauses).  The array is taking 0.6GB, and there is a second instance 
> of the same array in another subquery (also not shown).
> 
> 
> Since the last email we have had some success with a reconfiguration of the 
> fieldType that I referenced in my original email below.  Where it was 
> originally:
> 
> [original fieldType XML largely stripped by the mail archive; surviving 
> fragments: positionIncrementGap="100", outputUnigrams="true"]
> 
> We have now reconfigured to:
> 
> [reconfigured fieldType XML largely stripped by the mail archive; surviving 
> fragments: positionIncrementGap="100", outputUnigrams="true", 
> maxTokenCount="8", consumeAllTokens="false"]
> 
> After the reconfiguration, the huge memory effect of the queries in Solr 7.1 
> is gone.  We could kill test instances of Solr with a single query in the 
> original configuration. After reconfiguration we can run multiple similar 
> queries in parallel, and the Solr process responds in 50-150ms with only 
> approx. 100MB added to the heap.
> 
> This may well be sufficient for our purposes, as I don't think end users will 
> notice the difference in practice & queries that were previously failing now 
> return normally.
> 
> However I am still curious as to how this performs so differently in Solr 4.6 
> - the performance in 4.6 without reconfiguration is very similar to Solr 7.1 
> after the reconfiguration.  It is almost as if something within Solr 4.6 is 
> causing it to behave as though the number of tokens is limited (although I 
> can see in the admin pages for Solr 4.6 that the query and index analyser 
> setup both have original config with maxShingleSize=30 setting).  Do you have 
> any thoughts about this?
> 
> 
> Many Thanks,
> Neil
> 
> On 20/03/2019, 16:13, "Erick Erickson"  wrote:
> 
>The Apache mail server aggressively strips attachments, so yours didn’t 
> come through. People often provide links to images stored somewhere else
> 
>As to why this is behaving this way, I’m pretty clueless. A _complete_ 
> shot in the dark is the query parsing changed its default for split on 
> whitespace from true to false, perhaps try specifying "&sow=true". Here’s 
> some background: 
> https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/
> 
>I have no actual, you know, _knowledge_ that it’s related but it’d be 
> super-easy to try and might give a clue.
> 
>Best,
>Erick
> 
>> On Mar 20, 2019, at 2:00 AM, Hubert-Price, Neil  
>> wrote:
>> 
>> Hello All,
>> 
>> We have a recently upgraded system that went from Solr 4.6 to Solr 7.1 (used 
>> as part of an ecommerce application).  In the upgraded version we are seeing 
>> frequent issues with very high Solr memory usage for certain types of query, 
>> but the older 4.6 version does not produce the same response.
>> 
>> Having taken a heap dump and investigated, we can see instances of 
>> individual Solr threads where the retained set is 4GB to 5GB in size.  
>> Drilling into this we can see a particular subquery with over 500,000 
>> clauses.  Screenshots below are from Eclipse MAT viewing a heap dump from 
>> the SOLR process. Observations of the 4.6 version we can see memory 
>> increments of 100-200 MB for the same query, rather than 4-5 GB.
>> 
>> In both systems the index ha

Re: Upgrading solarj from 6.5.1 to 8.0.0

2019-03-21 Thread Jason Gerlowski
You should be able to set credentials on individual requests with the
SolrRequest.setBasicAuthCredentials() method.  That's the method
suggested by the latest Solr ref guide at least:
https://lucene.apache.org/solr/guide/7_7/basic-authentication-plugin.html#using-basic-auth-with-solrj
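
For example, a minimal sketch (the URL, core name, and credentials below are
placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;

HttpSolrClient client =
    new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

// credentials are attached to each request rather than to the client
QueryRequest req = new QueryRequest(new SolrQuery("*:*"));
req.setBasicAuthCredentials("solr", "SolrRocks");
QueryResponse rsp = req.process(client);
System.out.println("numFound: " + rsp.getResults().getNumFound());
client.close();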

There might be a way to set the credentials on the client itself, but
I can't think of it at the moment.

Hope that helps,

Jason

On Thu, Mar 21, 2019 at 2:34 AM Lahiru Jayasekera
 wrote:
>
> Hi all,
> I need help implementing the following code in solarj 8.0.0.
>
> private SolrClient server, adminServer;
>
> this.adminServer = new HttpSolrClient(SolrClientUrl);
> this.server = new HttpSolrClient( SolrClientUrl + "/" + mapping.getCoreName() 
> );
> if (serverUserAuth) {
>   HttpClientUtil.setBasicAuth(
>   (DefaultHttpClient) ((HttpSolrClient) adminServer).getHttpClient(),
>   serverUsername, serverPassword);
>   HttpClientUtil.setBasicAuth(
>   (DefaultHttpClient) ((HttpSolrClient) server).getHttpClient(),
>   serverUsername, serverPassword);
> }
>
>
> I could get the solarClients as following
>
> this.adminServer = new HttpSolrClient.Builder(SolrClientUrl).build();
> this.server = new HttpSolrClient.Builder( SolrClientUrl + "/" +
> mapping.getCoreName() ).build();
>
> But i can't find a way to implement basic authentication. I think that it
> can be done via SolrHttpClientBuilder.
> Can you please help me to solve this?
>
> Thank and regards
> Lahiru
> --
> Lahiru Jayasekara
> Batch 15
> Faculty of Information Technology
> University of Moratuwa
> 0716492170


Re: CDCR one source multiple targets

2019-03-21 Thread Arnold Bronley
I see a similar question asked but no answers there too.
http://lucene.472066.n3.nabble.com/CDCR-Replication-from-one-source-to-multiple-targets-td4308717.html
OP there is using multiple cdcr request handlers but in my case I am using
multiple zkhost strings. It will be pretty limiting if we cannot use CDCR
for a one-source, multiple-target cluster setup.
Can somebody please confirm whether this is even supported?


On Wed, Mar 20, 2019 at 1:12 PM Arnold Bronley 
wrote:

> Hi,
>
> is it possible to use CDCR with one source SolrCloud cluster and multiple
> target SolrCloud clusters? I tried to edit the zkHost setting in source
> cluster's solrconfig file by adding multiple comma separated values for
> target zkhosts for multuple target clusters. But the CDCR replication
> happens only to one of the zkhosts and not all. If this is not supported
> then how should I go about implementing something like this?
>
>


Re: Environmental Protection Agency: Stop Deforesting in Sri Lanka

2019-03-21 Thread solrlucene
I am from India, will it help?

On Thursday, March 21, 2019,  wrote:

> Hello there,
>
> I just signed the petition "Environmental Protection Agency: Stop
> Deforesting in Sri Lanka" and wanted to see if you could help by adding
> your name.
>
> Our goal is to reach 15,000 signatures and we need more support. You can
> read more and sign the petition here:
>
> http://chng.it/vY78rzGf8G
>
> Thanks!
> Janaka
>


Re: Use of ShingleFilter causing very large BooleanQuery structures in Solr 7.1

2019-03-21 Thread Hubert-Price, Neil
Hello Erick,

This is the first time I've had reason to use the mailing list, so I wasn't 
aware of the behaviour around attachments.  See below, links to the images that 
I originally sent as attachments, both are screenshots from within Eclipse MAT 
looking at a SOLR heap dump.

LargeQueryStructure.png - 
https://drive.google.com/open?id=1SkRYav2iV6Z1znmzr4KKJzMcXzNF0_Wg 
LargeNumberClauses.png - 
https://drive.google.com/open?id=1CaySU2HzyvHsdbIW_n0190ofjPS3hAeN

The LargeQueryStructure image shows a single thread with a retained set of 
4.8GB, with the biggest items being a BooleanWeight object of just over 1.8GB 
and a BooleanQuery object of just under 1.8GB.

The LargeNumberClauses image shows a drilldown into the BooleanQuery object, 
where a subquery is taking around 0.9GB and contains a BooleanClause[524288] 
array of clauses (not shown: each of these 524288 is actually a subquery with 
multiple clauses).  The array is taking 0.6GB, and there is a second instance 
of the same array in another subquery (also not shown).


Since the last email we have had some success with a reconfiguration of the 
fieldType that I referenced in my original email below.  Where it was 
originally:

[original fieldType XML stripped by the mail archive]
We have now reconfigured to:

[reconfigured fieldType XML stripped by the mail archive]
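
(A rough reconstruction of the two analyzer chains, pieced together from the 
attribute fragments that survived elsewhere in this thread -- the fieldType 
name, tokenizer, and lower-case filter are assumptions; the shingle and 
token-limit settings are the ones discussed in this thread:)

Originally, roughly:

<fieldType name="text_shingles" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="30" outputUnigrams="true"/>
  </analyzer>
</fieldType>

Reconfigured, roughly:

<fieldType name="text_shingles" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="30" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="8" consumeAllTokens="false"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="30" outputUnigrams="true"/>
  </analyzer>
</fieldType>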

After the reconfiguration, the huge memory effect of the queries in Solr 7.1 is 
gone.  We could kill test instances of Solr with a single query in the original 
configuration. After reconfiguration we can run multiple similar queries in 
parallel, and the Solr process responds in 50-150ms with only approx. 100MB 
added to the heap.

This may well be sufficient for our purposes, as I don't think end users will 
notice the difference in practice & queries that were previously failing now 
return normally.

However I am still curious as to how this performs so differently in Solr 4.6 - 
the performance in 4.6 without reconfiguration is very similar to Solr 7.1 
after the reconfiguration.  It is almost as if something within Solr 4.6 is 
causing it to behave as though the number of tokens is limited (although I can 
see in the admin pages for Solr 4.6 that the query and index analyser setup 
both have original config with maxShingleSize=30 setting).  Do you have any 
thoughts about this?


Many Thanks,
Neil

On 20/03/2019, 16:13, "Erick Erickson"  wrote:

The Apache mail server aggressively strips attachments, so yours didn’t 
come through. People often provide links to images stored somewhere else

As to why this is behaving this way, I’m pretty clueless. A _complete_ shot 
in the dark is the query parsing changed its default for split on whitespace 
from true to false, perhaps try specifying "&sow=true". Here’s some background: 
https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/

I have no actual, you know, _knowledge_ that it’s related but it’d be 
super-easy to try and might give a clue.

Best,
Erick

> On Mar 20, 2019, at 2:00 AM, Hubert-Price, Neil 
 wrote:
> 
> Hello All,
>  
> We have a recently upgraded system that went from Solr 4.6 to Solr 7.1 
(used as part of an ecommerce application).  In the upgraded version we are 
seeing frequent issues with very high Solr memory usage for certain types of 
query, but the older 4.6 version does not produce the same response.
>  
> Having taken a heap dump and investigated, we can see instances of 
individual Solr threads where the retained set is 4GB to 5GB in size.  Drilling 
into this we can see a particular subquery with over 500,000 clauses.  
Screenshots below are from Eclipse MAT viewing a heap dump from the SOLR 
process. Observations of the 4.6 version we can see memory increments of 
100-200 MB for the same query, rather than 4-5 GB.
>  
> In both systems the index has around 2 million documents, with average 
size around 8KB.
> 
> [two Eclipse MAT screenshots stripped by the mail archive]
> 
> The subquery with a very large set of clauses relates to a particular 
field setup to use ShingleFilter (with maxShingleSize=30, and 
outputUnigrams=true). Schema.xml definitions for this field are:
> 
> [schema.xml field and fieldType definitions stripped by the mail archive]
> 
> The issue happens when the user search contains large numbers of tokens.  
In the example screenshots abo

Environmental Protection Agency: Stop Deforesting in Sri Lanka

2019-03-21 Thread bjchathuranga
Hello there,

I just signed the petition "Environmental Protection Agency: Stop
Deforesting in Sri Lanka" and wanted to see if you could help by adding
your name.

Our goal is to reach 15,000 signatures and we need more support. You can
read more and sign the petition here:

http://chng.it/vY78rzGf8G

Thanks!
Janaka