BinaryResponseWriter fetches unnecessary fields?

2018-01-19 Thread Wei
Hi all,


We observe that Solr query time increases significantly with the number of
rows requested, even when all we retrieve for each document is just
fl=id,score.  Debugging a bit, we see that most of the increased time is
spent in BinaryResponseWriter, converting the Lucene document into a
SolrDocument.


Inside convertLuceneDocToSolrDoc():


https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L182


   for (IndexableField f : doc.getFields())


I am a bit puzzled why we need to iterate through all the fields in the
document. Why can’t we just iterate through the requested fields in fl?
Specifically:



https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L156


if we change  sdoc = convertLuceneDocToSolrDoc(doc,
rctx.getSearcher().getSchema())  to


sdoc = convertLuceneDocToSolrDoc(doc, rctx.getSearcher().getSchema(),
fnames)


and just iterate through fnames in convertLuceneDocToSolrDoc(), we see a
significant performance boost in our case: the increase in query time from
rows=128 to rows=500 is much smaller.  Am I missing something here?
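
To make the change concrete, here is a rough sketch of the fnames-based
overload (signatures and helper calls are approximate, not the exact patch):

    // Hypothetical overload: only materialize the fields requested in fl,
    // instead of walking every stored field on the Lucene document.
    public static SolrDocument convertLuceneDocToSolrDoc(Document doc,
                                                          IndexSchema schema,
                                                          Set<String> fnames) {
      SolrDocument out = new SolrDocument();
      for (String fname : fnames) {
        for (IndexableField f : doc.getFields(fname)) {   // only the requested fields
          SchemaField sf = schema.getFieldOrNull(f.name());
          Object val = (sf != null) ? sf.getType().toObject(f) : f.stringValue();
          out.addField(f.name(), val);
        }
      }
      return out;
    }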


Thanks,

Wei


Re: Strange Alias behavior

2018-01-19 Thread Webster Homer
It seems like a useful feature, especially for migrating from standalone to
SolrCloud, at least as long as the precedence of an alias over a collection
is defined and enforced.

On Fri, Jan 19, 2018 at 5:01 PM, Shawn Heisey  wrote:

> On 1/19/2018 3:53 PM, Webster Homer wrote:
>
>> I created the alias with an existing collection name because our code base
>> which was created with stand alone solr was a pain to change. I did test
>> that the alias took precedence over the collection, when I did a search.
>>
>
> The ability to create aliases and collections with the same name is viewed
> as a bug by some, and probably will be removed in a future version.
>
> https://issues.apache.org/jira/browse/SOLR-11488
>
> It doesn't really make sense to have an alias with the same name as a
> collection, and the behavior is probably undefined.
>
> Thanks,
> Shawn
>



Re: Issue with solr.HTMLStripCharFilterFactory

2018-01-19 Thread Shawn Heisey

On 1/19/2018 11:56 AM, Fiz Ahmed wrote:

But when I query in the Solr Admin UI, I still get search results with
HTML tags in them.


Search results will always contain the actual content that was indexed. 
Analysis only happens to indexed data and/or queries, not stored data.


This is how Solr and Lucene have *always* worked.  It's not new behavior.

To achieve what you want, you will either need to use an update 
processor, or you'll need to adjust your indexing program to make the 
changes before it sends the data to Solr.


If you choose the update processor route, there is a built-in processor 
that has the same behavior as the HTML filter you are using.  Note that 
if you use that update processor, you won't need the html filter in the 
analyzer for the affected fields, because the HTML will be gone before 
the analysis runs.


https://lucene.apache.org/solr/6_6_0/solr-core/org/apache/solr/update/processor/HTMLStripFieldUpdateProcessorFactory.html

You can always write a custom processor if you wish.  A custom processor 
might be required if you want your stored data to undergo some very 
extensive transformation.


Here's the documentation on update processors:

https://lucene.apache.org/solr/guide/6_6/update-request-processors.html
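
For reference, a solrconfig.xml sketch of such a chain (the chain name and
field name here are only illustrative; adjust them to your schema):

    <updateRequestProcessorChain name="strip-html">
      <!-- strip HTML/XML markup from the body field before it is indexed and stored -->
      <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
        <str name="fieldName">body</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

You would then select it per request with update.chain=strip-html, or make it
the default chain on your update handler.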

Thanks,
Shawn


Re: Strange Alias behavior

2018-01-19 Thread Shawn Heisey

On 1/19/2018 3:53 PM, Webster Homer wrote:

I created the alias with an existing collection name because our code base
which was created with stand alone solr was a pain to change. I did test
that the alias took precedence over the collection, when I did a search.


The ability to create aliases and collections with the same name is 
viewed as a bug by some, and probably will be removed in a future version.


https://issues.apache.org/jira/browse/SOLR-11488

It doesn't really make sense to have an alias with the same name as a 
collection, and the behavior is probably undefined.


Thanks,
Shawn


Re: Strange Alias behavior

2018-01-19 Thread Webster Homer
I created the alias with an existing collection name because our code base
which was created with stand alone solr was a pain to change. I did test
that the alias took precedence over the collection, when I did a search.

On Fri, Jan 19, 2018 at 4:22 PM, Wenjie Zhang (Jack) <
wenjiezhang2...@gmail.com> wrote:

> Why would you create an alias with an existing collection name?
>
> Sent from my iPhone
>
> > On Jan 19, 2018, at 14:14, Webster Homer  wrote:
> >
> > I just discovered some odd behavior with aliases.
> >
> > We are in the process of converting over to use aliases in solrcloud. We
> > have a number of collections that applications have referenced the
> > collections from when we used standalone solr. So we created alias names
> to
> > match the name that the java applications already used.
> >
> > We still have collections that have the name of the alias.
> >
> > We also decided to create new aliases for use in our ETL process.
> > I have 3 collections that have the same configset which is named
> > b2b-catalog-material
> > collection 1: b2b-catalog-material
> > collection 2: b2b-catalog-material-180117
> > collection 3: b2b-catalog-material-180117T
> >
> > When the alias, b2b-catalog-material-etl is pointed at
> b2b-catalog-material
> > and the alias b2b-catalog-material is pointed to
> b2b-catalog-material-180117
> >
> > and we do a data load to b2b-catalog-material-etl
> >
> > We see data being added to both b2b-catalog-material and
> > b2b-catalog-material-180117
> >
> > when I delete the alias b2b-catalog-material then the data stopped
> loading
> > into the collection b2b-catalog-material-180117
> >
> >
> > So it seems that alias resolution is somewhat recursive. I'm surprised
> that
> > both collections were being updated.
> >
> > Is this the intended behavior for aliases? I don't remember seeing this
> > documented.
> > This was on a solrcloud running solr 7.2
> >
> > I haven't checked this in Solr 7.2 but when I created a new collection
> and
> > then pointed the alias to it and did a search no data was returned
> because
> > there was none to return. So this indicates to me that aliases behave
> > differently if we're writing to them or reading from them.
> >



Re: Strange Alias behavior

2018-01-19 Thread Wenjie Zhang (Jack)
Why would you create an alias with an existing collection name?

Sent from my iPhone

> On Jan 19, 2018, at 14:14, Webster Homer  wrote:
> 
> I just discovered some odd behavior with aliases.
> 
> We are in the process of converting over to use aliases in solrcloud. We
> have a number of collections that applications have referenced the
> collections from when we used standalone solr. So we created alias names to
> match the name that the java applications already used.
> 
> We still have collections that have the name of the alias.
> 
> We also decided to create new aliases for use in our ETL process.
> I have 3 collections that have the same configset which is named
> b2b-catalog-material
> collection 1: b2b-catalog-material
> collection 2: b2b-catalog-material-180117
> collection 3: b2b-catalog-material-180117T
> 
> When the alias, b2b-catalog-material-etl is pointed at b2b-catalog-material
> and the alias b2b-catalog-material is pointed to b2b-catalog-material-180117
> 
> and we do a data load to b2b-catalog-material-etl
> 
> We see data being added to both b2b-catalog-material and
> b2b-catalog-material-180117
> 
> when I delete the alias b2b-catalog-material then the data stopped loading
> into the collection b2b-catalog-material-180117
> 
> 
> So it seems that alias resolution is somewhat recursive. I'm surprised that
> both collections were being updated.
> 
> Is this the intended behavior for aliases? I don't remember seeing this
> documented.
> This was on a solrcloud running solr 7.2
> 
> I haven't checked this in Solr 7.2 but when I created a new collection and
> then pointed the alias to it and did a search no data was returned because
> there was none to return. So this indicates to me that aliases behave
> differently if we're writing to them or reading from them.
> 


Re: SOLR Data Backup

2018-01-19 Thread S G
Another option is to have CDCR enabled for Solr and replicate your data to
another Solr cluster continuously.

BTW, why do we not recommend having Solr as a source of truth?

On Thu, Jan 18, 2018 at 4:08 AM, Florian Gleixner  wrote:

> Am 18.01.2018 um 10:21 schrieb Wael Kader:
> > Hello,
> >
> > Whats the best way to do a backup of the SOLR data.
> > I have a single node solr server and I want to always keep a copy of the
> > data I have.
> >
> > Is replication an option for what I want ?
> >
> > I would like to get some tutorials and papers if possible on the method
> > that should be used in case its backup or replication or anything else.
> >
>
> The reference manual will help you:
>
>
> https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html#standalone-mode-backups
>
>
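
For a standalone node, the approach in that guide boils down to calls against
the core's replication handler, roughly like this (core name and paths are
illustrative):

    # take a named backup of a single core
    curl "http://localhost:8983/solr/mycore/replication?command=backup&name=nightly&location=/backups/solr"

    # check whether/when the backup completed
    curl "http://localhost:8983/solr/mycore/replication?command=details"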


Re: Preserve order during indexing

2018-01-19 Thread Webster Homer
DB order isn't generally defined unless you are using an explicit "order
by" on your select; the default behavior varies by database type and even
by release of the database. You can index the fields that you would "order
by" in the DB, and sort on those fields in Solr.
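
For example (db_seq here is an illustrative field holding whatever sequence or
timestamp column captures the DB ordering):

    curl "http://localhost:8983/solr/mycollection/select?q=*:*&sort=db_seq%20asc&fl=id"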

On Thu, Jan 18, 2018 at 10:17 PM, jagdish vasani 
wrote:

> Hi Ashish,
> I think it's not possible; Solr creates an inverted index. But you can get
> documents back in sorted order by passing sort=<some field> asc/desc.
>
> Thanks,
> JagdishVasani
> On 19-Jan-2018 9:22 am, "Aashish Agarwal"  wrote:
>
> > Hi,
> >
> > I need to index documents in solr so that they are stored in same order
> as
> > present in database. i.e *:* gives result in db order. Is it possible.
> >
> > Thanks,
> > Aashish
> >
>



Strange Alias behavior

2018-01-19 Thread Webster Homer
I just discovered some odd behavior with aliases.

We are in the process of converting over to using aliases in SolrCloud. We
have a number of collections that applications have referenced since we
used standalone Solr, so we created aliases matching the names that the
Java applications already use.

We still have collections that have the name of the alias.

We also decided to create new aliases for use in our ETL process.
I have 3 collections that have the same configset which is named
b2b-catalog-material
collection 1: b2b-catalog-material
collection 2: b2b-catalog-material-180117
collection 3: b2b-catalog-material-180117T

When the alias b2b-catalog-material-etl is pointed at b2b-catalog-material,
and the alias b2b-catalog-material is pointed at b2b-catalog-material-180117,

and we do a data load to b2b-catalog-material-etl,

we see data being added to both b2b-catalog-material and
b2b-catalog-material-180117.

When I delete the alias b2b-catalog-material, the data stops loading
into the collection b2b-catalog-material-180117.


So it seems that alias resolution is somewhat recursive. I'm surprised that
both collections were being updated.

Is this the intended behavior for aliases? I don't remember seeing this
documented.
This was on a solrcloud running solr 7.2

I haven't checked this in Solr 7.2, but when I created a new collection,
pointed the alias to it, and did a search, no data was returned because
there was none to return. So this indicates to me that aliases behave
differently depending on whether we're writing to them or reading from them.
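
For reference, the aliases above were set up with the Collections API along
these lines (host and port are illustrative):

    curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=b2b-catalog-material-etl&collections=b2b-catalog-material"
    curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=b2b-catalog-material&collections=b2b-catalog-material-180117"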



Re: Adding a child doc incrementally

2018-01-19 Thread S G
Restriction to a single shard seems like a big limitation for us.
Also, I was hoping that this was something Solr provided out of the box.
(Like
https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-In-PlaceUpdates
)

Something like:

{
 "id":"parents-id",
 "price":{"set":99},
 "popularity":{"inc":20},
 "children": {"add": {child document(s)}}
}

or something like:
{
 "id":"child-id",
 "parentId": "parents-id"
 ... normal fields of the child ...
 "operationType": "add | delete"
}

If, in both cases, Solr could just look at the parent's ID, route the
document to the correct shard, and add the child to the parent to form the
full nested document (as in a block join), that would be ideal.
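
For the co-location piece Gus describes below, a sketch of how that could look
with the default compositeId router (field names and id prefixes are
illustrative): give the parent and each child the same routing prefix so they
land on the same shard, and link them with an ordinary field:

    curl -X POST "http://localhost:8983/solr/mycollection/update?commit=true" \
      -H "Content-Type: application/json" -d '[
        {"id": "parent-123!p",  "doc_type": "parent", "name": "Parent 123"},
        {"id": "parent-123!c1", "doc_type": "child",  "parent_id": "parent-123!p", "color": "red"}
      ]'

The "parent-123!" prefix makes the router place both documents on the same
shard, so a graph (or join) query can later stitch them back together without
crossing shards.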

Thanks
SG





On Wed, Jan 17, 2018 at 9:58 PM, Gus Heck  wrote:

> If the document routing can be arranged such that the children and the
> parent are always co-located in the same shard, and share an identifier,
> the graph query can pull back the parent plus any arbitrary number of
> "children" that have been added at any time in any order. In this scheme
> "children" are just things that match your graph query... (
> https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-GraphQueryParser)
> However, if your query has to cross shards, that won't work (yet...
> https://issues.apache.org/jira/browse/SOLR-11384).
>
> More info here:
> https://www.slideshare.net/lucidworks/solr-graph-query-presented-by-kevin-watters-kmw-technology
>
> On Mon, Jan 15, 2018 at 2:09 PM, S G  wrote:
>
> > Hi,
> >
> > We have a use-case where a single document can contain thousands of child
> > documents.
> > However, I could not find any way to do it incrementally.
> > The only way is to read the full document from Solr, add the new child
> > document to it, and then re-index the full document with all of its child
> > documents again. This causes a lot of reads from Solr just to form the
> > document with one extra document.
> > Ideally, I would have liked to only send the parent-ID and the
> > child-document only as part of an "incremental update" command to Solr.
> >
> > Is there a way to incrementally add a child document to a parent
> document?
> >
> > Thanks
> > SG
> >
>
>
>
> --
> http://www.the111shift.com
>


RE: Solr Replication being flaky (6.2.0)

2018-01-19 Thread Pouliot, Scott
Working on that now to see if it helps us out.  The Solr process is NOT dying at 
all.  Searches are still working as expected, but since we load balance 
requests, if the master and slave are out of sync the search results vary.

The advice is MUCH appreciated!

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Friday, January 19, 2018 1:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Replication being flaky (6.2.0)

On 1/19/2018 11:27 AM, Shawn Heisey wrote:
> On 1/19/2018 8:54 AM, Pouliot, Scott wrote:
>> I do have a ticket in with our systems team to up the file handlers 
>> since I am seeing the "Too many files open" error on occasion on our 
>> prod servers.  Is this the setting you're referring to?  Found we 
>> were set to to 1024 using the "Ulimit" command.
> 
> No, but that often needs increasing too.  I think you need to increase 
> the process limit even if that's not the cause of this particular problem.

Had another thought.  Either of these limits can cause completely unpredictable 
problems with Solr.  The open file limit could be the reason for these issues, 
even if you're not actually hitting the process limit.  As I mentioned before, 
I would expect a process limit to cause Solr to kill itself, and your other 
messages don't mention problems like that.

The scale of your Solr installation indicates that you should greatly increase 
both limits on all of your Solr servers.

Thanks,
Shawn


Issue with solr.HTMLStripCharFilterFactory

2018-01-19 Thread Fiz Ahmed
Hi Solr Experts,

I am using the HTMLStripCharFilterFactory for removing HTML tags in the body
element.

The body contains HTML-marked-up data, for example an Ipad entry wrapped in tags.

I made changes in the managed schema, adding the char filter to the analyzer
of the field type used by the body field.
I restarted the Solr and Indexed again.


But when I query in the Solr Admin UI, I am still getting search results with
HTML tags in them.



"body":"Practically everytime I log onto Mogran, suddenly I see it
running


Please let me know what the issue might be. Am I missing anything?


Thanks

Fiz..


Re: Solr Replication being flaky (6.2.0)

2018-01-19 Thread Shawn Heisey

On 1/19/2018 11:27 AM, Shawn Heisey wrote:

On 1/19/2018 8:54 AM, Pouliot, Scott wrote:
I do have a ticket in with our systems team to up the file handlers 
since I am seeing the "Too many files open" error on occasion on our 
prod servers.  Is this the setting you're referring to?  Found we were 
set to 1024 using the "ulimit" command.


No, but that often needs increasing too.  I think you need to increase 
the process limit even if that's not the cause of this particular problem.


Had another thought.  Either of these limits can cause completely 
unpredictable problems with Solr.  The open file limit could be the 
reason for these issues, even if you're not actually hitting the process 
limit.  As I mentioned before, I would expect a process limit to cause 
Solr to kill itself, and your other messages don't mention problems like 
that.


The scale of your Solr installation indicates that you should greatly 
increase both limits on all of your Solr servers.


Thanks,
Shawn


Re: Solr Replication being flaky (6.2.0)

2018-01-19 Thread Shawn Heisey

On 1/19/2018 8:54 AM, Pouliot, Scott wrote:

I do have a ticket in with our systems team to up the file handlers since I am seeing the "Too 
many files open" error on occasion on our prod servers.  Is this the setting you're referring 
to?  Found we were set to 1024 using the "ulimit" command.


No, but that often needs increasing too.  I think you need to increase 
the process limit even if that's not the cause of this particular problem.


Sounds like you're running on Linux, though ulimit is probably available 
on other platforms too.


If it's Linux, generally you must increase both the number of processes 
and the open file limit in /etc/security/limits.conf.  Trying to use the 
ulimit command generally doesn't work because the kernel has hard limits 
configured that ulimit can't budge.  If it's not Linux, then you'll need 
to consult with an expert in the OS you're running.


Again, assuming Linux, in the output of "ulimit -a" the value I'm 
talking about is the "-u" value -- "max user processes".  The following 
is the additions that I typically make to /etc/security/limits.conf, to 
increase both the open file limit and the process limit for the solr user:


solr    hard    nproc   61440
solr    soft    nproc   40960

solr    hard    nofile  65535
solr    soft    nofile  49151
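
After changing limits.conf and starting a fresh session for the solr user, you
can confirm the new limits actually took effect with something like:

    sudo -u solr bash -c 'ulimit -Sn -Hn -Su -Hu'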

Are you running into problems where Solr just disappears?  I would 
expect a process limit to generate OutOfMemoryError exceptions. When 
Solr is started with the included shell script, unless it's running with 
the foreground option, OOME will kill the Solr process.  There are open issues 
to bring the OOME death behavior to foreground mode, as well as to 
running on Windows.


Thanks,
Shawn



RE: Solr Replication being flaky (6.2.0)

2018-01-19 Thread Pouliot, Scott
That's evidence enough for me to beat on our systems guys to get these file 
handles upped and cross my fingers then!

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, January 19, 2018 1:18 PM
To: solr-user 
Subject: Re: Solr Replication being flaky (6.2.0)

"Could be", certainly. "Definitely is" is iffier ;)...

But the statement "If we restart the Solr service or optimize the core it seems 
to kick back in again.", especially the "optimize" bit (which, by the way you 
should do only if you have the capability of doing it periodically [1]) is some 
evidence that this may be in the vicinity. One of the effects of an optimize is 
to merge your segments files from N to 1. So say you have 10 segments. Each one 
of those may consist of 10-15 individual files, all of which are held open. So 
you'd go from 150 open file handles to 15..

https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

Best,
Erick

On Fri, Jan 19, 2018 at 9:32 AM, Pouliot, Scott 
 wrote:
> Erick,
>
> Thanks!  Could these settings be toying with replication?  Solr itself seems 
> to be working like a champ, except when things get out of sync.
>
> Scott
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, January 19, 2018 12:27 PM
> To: solr-user 
> Subject: Re: Solr Replication being flaky (6.2.0)
>
> Scott:
>
> We usually recommend setting files and processes very, very high. Like 65K 
> high. Or unlimited if you can.
>
> Plus max user processes should also be bumped very high as well, like 65K as 
> well.
>
> Plus max memory and virtual memory should be unlimited.
>
> We've included warnings at startup for open files and processes, see 
> SOLR-11703
>
> Best,
> Erick
>
> On Fri, Jan 19, 2018 at 7:54 AM, Pouliot, Scott 
>  wrote:
>> I do have a ticket in with our systems team to up the file handlers since I 
>> am seeing the "Too many files open" error on occasion on our prod servers.  
>> Is this the setting you're referring to?  Found we were set to to 1024 using 
>> the "Ulimit" command.
>>
>> -Original Message-
>> From: Shawn Heisey [mailto:apa...@elyograg.org]
>> Sent: Friday, January 19, 2018 10:48 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Replication being flaky (6.2.0)
>>
>> On 1/19/2018 7:50 AM, Pouliot, Scott wrote:
>>> So we're running Solr in a Master/Slave configuration (1 of each) and it 
>>> seems that the replication stalls or stops functioning every now and again. 
>>>  If we restart the Solr service or optimize the core it seems to kick back 
>>> in again.
>>>
>>> Anyone have any idea what might be causing this?  We do have a good amount 
>>> of cores on each server (@150 or so), but I have heard reports of a LOT 
>>> more than that in use.
>>
>> Have you increased the number of processes that the user running Solr is 
>> allowed to start?  Most operating systems limit the number of 
>> threads/processes a user can start to a low value like 1024.  With 150 
>> cores, particularly with background tasks like replication configured, 
>> chances are that Solr is going to need to start a lot of threads.  This is 
>> an OS setting that a lot of Solr admins end up needing to increase.
>>
>> I ran into the process limit on my servers and I don't have anywhere near 
>> 150 cores.
>>
>> The fact that restarting Solr gets it working again (at least
>> temporarily) would fit with a process limit being the problem.  I'm not 
>> guaranteeing that this is the problem, only saying that it fits.
>>
>> Thanks,
>> Shawn


Re: Solr Replication being flaky (6.2.0)

2018-01-19 Thread Erick Erickson
"Could be", certainly. "Definitely is" is iffier ;)...

But the statement "If we restart the Solr service or optimize the core
it seems to kick back in again.", especially the "optimize" bit
(which, by the way you should do only if you have the capability of
doing it periodically [1]) is some evidence that this may be in the
vicinity. One of the effects of an optimize is to merge your segments
files from N to 1. So say you have 10 segments. Each one of those may
consist of 10-15 individual files, all of which are held open. So
you'd go from 150 open file handles to 15..

https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

Best,
Erick

On Fri, Jan 19, 2018 at 9:32 AM, Pouliot, Scott
 wrote:
> Erick,
>
> Thanks!  Could these settings be toying with replication?  Solr itself seems 
> to be working like a champ, except when things get out of sync.
>
> Scott
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, January 19, 2018 12:27 PM
> To: solr-user 
> Subject: Re: Solr Replication being flaky (6.2.0)
>
> Scott:
>
> We usually recommend setting files and processes very, very high. Like 65K 
> high. Or unlimited if you can.
>
> Plus max user processes should also be bumped very high as well, like 65K as 
> well.
>
> Plus max memory and virtual memory should be unlimited.
>
> We've included warnings at startup for open files and processes, see 
> SOLR-11703
>
> Best,
> Erick
>
> On Fri, Jan 19, 2018 at 7:54 AM, Pouliot, Scott 
>  wrote:
>> I do have a ticket in with our systems team to up the file handlers since I 
>> am seeing the "Too many files open" error on occasion on our prod servers.  
>> Is this the setting you're referring to?  Found we were set to to 1024 using 
>> the "Ulimit" command.
>>
>> -Original Message-
>> From: Shawn Heisey [mailto:apa...@elyograg.org]
>> Sent: Friday, January 19, 2018 10:48 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Replication being flaky (6.2.0)
>>
>> On 1/19/2018 7:50 AM, Pouliot, Scott wrote:
>>> So we're running Solr in a Master/Slave configuration (1 of each) and it 
>>> seems that the replication stalls or stops functioning every now and again. 
>>>  If we restart the Solr service or optimize the core it seems to kick back 
>>> in again.
>>>
>>> Anyone have any idea what might be causing this?  We do have a good amount 
>>> of cores on each server (@150 or so), but I have heard reports of a LOT 
>>> more than that in use.
>>
>> Have you increased the number of processes that the user running Solr is 
>> allowed to start?  Most operating systems limit the number of 
>> threads/processes a user can start to a low value like 1024.  With 150 
>> cores, particularly with background tasks like replication configured, 
>> chances are that Solr is going to need to start a lot of threads.  This is 
>> an OS setting that a lot of Solr admins end up needing to increase.
>>
>> I ran into the process limit on my servers and I don't have anywhere near 
>> 150 cores.
>>
>> The fact that restarting Solr gets it working again (at least
>> temporarily) would fit with a process limit being the problem.  I'm not 
>> guaranteeing that this is the problem, only saying that it fits.
>>
>> Thanks,
>> Shawn


RE: Solr Replication being flaky (6.2.0)

2018-01-19 Thread Pouliot, Scott
Erick,

Thanks!  Could these settings be toying with replication?  Solr itself seems to 
be working like a champ, except when things get out of sync.

Scott

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, January 19, 2018 12:27 PM
To: solr-user 
Subject: Re: Solr Replication being flaky (6.2.0)

Scott:

We usually recommend setting files and processes very, very high. Like 65K 
high. Or unlimited if you can.

Plus max user processes should also be bumped very high as well, like 65K as 
well.

Plus max memory and virtual memory should be unlimited.

We've included warnings at startup for open files and processes, see SOLR-11703

Best,
Erick

On Fri, Jan 19, 2018 at 7:54 AM, Pouliot, Scott 
 wrote:
> I do have a ticket in with our systems team to up the file handlers since I 
> am seeing the "Too many files open" error on occasion on our prod servers.  
> Is this the setting you're referring to?  Found we were set to to 1024 using 
> the "Ulimit" command.
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Friday, January 19, 2018 10:48 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Replication being flaky (6.2.0)
>
> On 1/19/2018 7:50 AM, Pouliot, Scott wrote:
>> So we're running Solr in a Master/Slave configuration (1 of each) and it 
>> seems that the replication stalls or stops functioning every now and again.  
>> If we restart the Solr service or optimize the core it seems to kick back in 
>> again.
>>
>> Anyone have any idea what might be causing this?  We do have a good amount 
>> of cores on each server (@150 or so), but I have heard reports of a LOT more 
>> than that in use.
>
> Have you increased the number of processes that the user running Solr is 
> allowed to start?  Most operating systems limit the number of 
> threads/processes a user can start to a low value like 1024.  With 150 cores, 
> particularly with background tasks like replication configured, chances are 
> that Solr is going to need to start a lot of threads.  This is an OS setting 
> that a lot of Solr admins end up needing to increase.
>
> I ran into the process limit on my servers and I don't have anywhere near 150 
> cores.
>
> The fact that restarting Solr gets it working again (at least
> temporarily) would fit with a process limit being the problem.  I'm not 
> guaranteeing that this is the problem, only saying that it fits.
>
> Thanks,
> Shawn


Re: Solr Replication being flaky (6.2.0)

2018-01-19 Thread Erick Erickson
Scott:

We usually recommend setting files and processes very, very high. Like
65K high. Or unlimited if you can.

Plus max user processes should also be bumped very high as well, like
65K as well.

Plus max memory and virtual memory should be unlimited.

We've included warnings at startup for open files and processes, see SOLR-11703

Best,
Erick

On Fri, Jan 19, 2018 at 7:54 AM, Pouliot, Scott
 wrote:
> I do have a ticket in with our systems team to up the file handlers since I 
> am seeing the "Too many files open" error on occasion on our prod servers.  
> Is this the setting you're referring to?  Found we were set to to 1024 using 
> the "Ulimit" command.
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Friday, January 19, 2018 10:48 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Replication being flaky (6.2.0)
>
> On 1/19/2018 7:50 AM, Pouliot, Scott wrote:
>> So we're running Solr in a Master/Slave configuration (1 of each) and it 
>> seems that the replication stalls or stops functioning every now and again.  
>> If we restart the Solr service or optimize the core it seems to kick back in 
>> again.
>>
>> Anyone have any idea what might be causing this?  We do have a good amount 
>> of cores on each server (@150 or so), but I have heard reports of a LOT more 
>> than that in use.
>
> Have you increased the number of processes that the user running Solr is 
> allowed to start?  Most operating systems limit the number of 
> threads/processes a user can start to a low value like 1024.  With 150 cores, 
> particularly with background tasks like replication configured, chances are 
> that Solr is going to need to start a lot of threads.  This is an OS setting 
> that a lot of Solr admins end up needing to increase.
>
> I ran into the process limit on my servers and I don't have anywhere near 150 
> cores.
>
> The fact that restarting Solr gets it working again (at least
> temporarily) would fit with a process limit being the problem.  I'm not 
> guaranteeing that this is the problem, only saying that it fits.
>
> Thanks,
> Shawn


RE: Solr Replication being flaky (6.2.0)

2018-01-19 Thread Pouliot, Scott
I do have a ticket in with our systems team to up the file handlers since I am 
seeing the "Too many files open" error on occasion on our prod servers.  Is 
this the setting you're referring to?  Found we were set to 1024 using the 
"ulimit" command.

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Friday, January 19, 2018 10:48 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Replication being flaky (6.2.0)

On 1/19/2018 7:50 AM, Pouliot, Scott wrote:
> So we're running Solr in a Master/Slave configuration (1 of each) and it 
> seems that the replication stalls or stops functioning every now and again.  
> If we restart the Solr service or optimize the core it seems to kick back in 
> again.
> 
> Anyone have any idea what might be causing this?  We do have a good amount of 
> cores on each server (@150 or so), but I have heard reports of a LOT more 
> than that in use.

Have you increased the number of processes that the user running Solr is 
allowed to start?  Most operating systems limit the number of threads/processes 
a user can start to a low value like 1024.  With 150 cores, particularly with 
background tasks like replication configured, chances are that Solr is going to 
need to start a lot of threads.  This is an OS setting that a lot of Solr 
admins end up needing to increase.

I ran into the process limit on my servers and I don't have anywhere near 150 
cores.

The fact that restarting Solr gets it working again (at least
temporarily) would fit with a process limit being the problem.  I'm not 
guaranteeing that this is the problem, only saying that it fits.

Thanks,
Shawn


Re: Solr Replication being flaky (6.2.0)

2018-01-19 Thread Shawn Heisey

On 1/19/2018 7:50 AM, Pouliot, Scott wrote:

So we're running Solr in a Master/Slave configuration (1 of each) and it seems 
that the replication stalls or stops functioning every now and again.  If we 
restart the Solr service or optimize the core it seems to kick back in again.

Anyone have any idea what might be causing this?  We do have a good amount of 
cores on each server (@150 or so), but I have heard reports of a LOT more than 
that in use.


Have you increased the number of processes that the user running Solr is 
allowed to start?  Most operating systems limit the number of 
threads/processes a user can start to a low value like 1024.  With 150 
cores, particularly with background tasks like replication configured, 
chances are that Solr is going to need to start a lot of threads.  This 
is an OS setting that a lot of Solr admins end up needing to increase.


I ran into the process limit on my servers and I don't have anywhere 
near 150 cores.


The fact that restarting Solr gets it working again (at least 
temporarily) would fit with a process limit being the problem.  I'm not 
guaranteeing that this is the problem, only saying that it fits.


Thanks,
Shawn


RE: Solr Replication being flaky (6.2.0)

2018-01-19 Thread Pouliot, Scott
I'm at the point now where I may end up writing a script to compare 
master/slave nightly...and trigger an optimize or solr restart if there are any 
differences.  Of course I have to check 150+ cores...but it could be done.  I'm 
just hoping I don't need to go that route.
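
A rough sketch of that kind of nightly check, using the replication handler's
indexversion command (host names are illustrative, and jq is assumed to be
available for the JSON parsing):

    #!/bin/bash
    # Compare master and slave index versions for every core; report mismatches.
    MASTER="http://master1:8080/solr"
    SLAVE="http://slave1:8080/solr"

    for core in $(curl -s "$MASTER/admin/cores?action=STATUS&wt=json" | jq -r '.status | keys[]'); do
      m=$(curl -s "$MASTER/$core/replication?command=indexversion&wt=json" | jq -r '.indexversion')
      s=$(curl -s "$SLAVE/$core/replication?command=indexversion&wt=json"  | jq -r '.indexversion')
      if [ "$m" != "$s" ]; then
        echo "OUT OF SYNC: $core (master=$m, slave=$s)"
        # a forced fetch is usually gentler than an optimize or a restart:
        # curl -s "$SLAVE/$core/replication?command=fetchindex"
      fi
    done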

-Original Message-
From: David Hastings [mailto:hastings.recurs...@gmail.com] 
Sent: Friday, January 19, 2018 10:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Replication being flaky (6.2.0)

This happens to me quite often as well.  Generally on the replication admin 
screen it will say its downloading a file, but be at 0 or a VERY small kb/sec.  
Then after a restart of the slave its back to downloading at 30 to
100 mg/sec.  Would be curious if there actually is a solution to this aside 
from checking every day if the core replicated.  Im on Solr 5.x by the way -Dave

On Fri, Jan 19, 2018 at 9:50 AM, Pouliot, Scott < 
scott.poul...@peoplefluent.com> wrote:

> So we're running Solr in a Master/Slave configuration (1 of each) and 
> it seems that the replication stalls or stops functioning every now 
> and again.  If we restart the Solr service or optimize the core it 
> seems to kick back in again.
>
> Anyone have any idea what might be causing this?  We do have a good 
> amount of cores on each server (@150 or so), but I have heard reports 
> of a LOT more than that in use.
>
> Here is our master config:
> 
> 
>   
>   startup
>   commit
>
>   
>   00:00:10
> 
> 
> 
> 1
> 
>   
>
> And our slave config:
> 
> 
>
>   
>   http://server1:8080/solr/${solr.core.name}
> 
>
>   
>   00:00:45
> 
>   
>
>   
> 
>   solr-data-config.xml
> 
>   
>


Re: Solr Replication being flaky (6.2.0)

2018-01-19 Thread David Hastings
This happens to me quite often as well.  Generally on the replication admin
screen it will say it's downloading a file, but at 0 or a VERY small
KB/sec.  Then after a restart of the slave it's back to downloading at 30 to
100 MB/sec.  Would be curious if there actually is a solution to this aside
from checking every day whether the core replicated.  I'm on Solr 5.x by the way.
-Dave

On Fri, Jan 19, 2018 at 9:50 AM, Pouliot, Scott <
scott.poul...@peoplefluent.com> wrote:

> So we're running Solr in a Master/Slave configuration (1 of each) and it
> seems that the replication stalls or stops functioning every now and
> again.  If we restart the Solr service or optimize the core it seems to
> kick back in again.
>
> Anyone have any idea what might be causing this?  We do have a good amount
> of cores on each server (@150 or so), but I have heard reports of a LOT
> more than that in use.
>
> Here is our master config:
> 
> 
>   
>   startup
>   commit
>
>   
>   00:00:10
> 
> 
> 
> 1
> 
>   
>
> And our slave config:
> 
> 
>
>   
>   http://server1:8080/solr/${solr.core.name}
> 
>
>   
>   00:00:45
> 
>   
>
>   
> 
>   solr-data-config.xml
> 
>   
>


Solr Replication being flaky (6.2.0)

2018-01-19 Thread Pouliot, Scott
So we're running Solr in a Master/Slave configuration (1 of each) and it seems 
that the replication stalls or stops functioning every now and again.  If we 
restart the Solr service or optimize the core it seems to kick back in again.

Anyone have any idea what might be causing this?  We do have a good amount of 
cores on each server (@150 or so), but I have heard reports of a LOT more than 
that in use.

Here is our master config:


  
  startup
  commit

  
  00:00:10



1

  

And our slave config:



  
  http://server1:8080/solr/${solr.core.name}

  
  00:00:45

  

  

  solr-data-config.xml