Re: Solr unable to start up after setting up SSL in Solr 7.4.0

2018-08-21 Thread Zheng Lin Edwin Yeo
Ok noted.
Thank you.

Regards,
Edwin

On Tue, 21 Aug 2018 at 21:40, Shawn Heisey  wrote:

> On 8/20/2018 9:57 PM, Zheng Lin Edwin Yeo wrote:
> > This is the error that I get:
> > ERROR StatusLogger Unable to access
> file:/C:/Users/edwin/Desktop/edwin/solr-
> >
> 7.4.0/server/scripts/cloud-scripts/file:C:/Users/edwin/Desktop/edwin/solr-7.
> > 4.0/server/scripts/cloud-scripts/log4j2.xml
> >   java.io.FileNotFoundException:
> C:\Users\edwin\Desktop\edwin\solr-7.4.0\serv
> >
> er\scripts\cloud-scripts\file:C:\Users\edwin\Desktop\edwin\solr-7.4.0\server
> > \scripts\cloud-scripts\log4j2.xml (The filename, directory name, or
> volume
> > label syntax is incorrect)
>
> We have a bug for this already:
>
> https://issues.apache.org/jira/browse/SOLR-12538
>
> Should be fine when 7.5.0 is released.
>
> Quick fix:  In solr.cmd change all occurrences of "file:" to "file:///".
>
> Something changed between log4j1 and log4j2 for file URI handling.  The
> problem only surfaces on Windows.
>
> Thanks,
> Shawn
>
>


Atomic Update Failure With solr.UUID Field

2018-08-21 Thread Stephen Lewis Bianamara
Hello SOLR Community,

I'm prototyping a collection on SOLR 6.6.3 with UUID fields, and I'm
hitting some trouble with atomic updates. At a high level, here's the
problem: suppose you have a schema with an optional solr.UUID field, and a
document with a value for that field. Any atomic update on
that document which does not contain the UUID field will fail. Below I
provide an example and then an exact set of repro steps.

So for example, suppose I have the following doc: {"Id":1,
"SomeString":"woof", "MyUUID":"617c7768-7cc3-42d0-9ae1-74398bc5a3e7"}. If I
run an atomic update on it like {"Id":1,"SomeString":{"set":"meow"}}, it
will fail with message "TransactionLog doesn't know how to serialize class
java.util.UUID; try implementing ObjectResolver?"
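
For reference, a minimal SolrJ sketch of that failing update (not from the
original mail; the URL is a placeholder and it assumes the MyCollection schema
described in the repro below):

import java.util.Collections;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdateRepro {
  public static void main(String[] args) throws Exception {
    // Assumed local Solr 6.6.3 instance and collection name
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/MyCollection").build()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("Id", "1");
      // Atomic "set" on SomeString only; the MyUUID value already stored on the
      // document is what triggers the TransactionLog serialization error
      doc.addField("SomeString", Collections.singletonMap("set", "meow"));
      client.add(doc);
      client.commit();
    }
  }
}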

Is this a known issue? Precise repro below. Thanks!

Exact repro
-
1. Define collection MyCollection with the following schema:

[schema XML not preserved in the archive; it defines the fields Id, SomeString
and MyUUID, with MyUUID using the solr.UUID field type and Id as the uniqueKey]

2. Create a document {"Id":1, "SomeString":"woof"} in the admin UI
(MyCollection > Documents > /update). The update succeeds and the doc is
searchable.
3. Apply the following atomic update. It succeeds. {"Id":1,
"SomeString":{"set":"bark"}}
4. Add a value for MyUUID (either with atomic update or regular). It
succeeds. {"Id":1,  "MyUUID":{"set":"617c7768-7cc3-42d0-9ae1-74398bc5a3e7"}}
5. Try to atomically update just the SomeString field. It fails. {"Id":1,
"SomeString":{"set":"meow"}}

The error that happens on failure is the following.

Status: 
{"data":{"responseHeader":{"status":500,"QTime":2},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],"msg":"TransactionLog
doesn't know how to serialize class java.util.UUID; try implementing
ObjectResolver?","trace":"org.apache.solr.common.SolrException:
TransactionLog doesn't know how to serialize class java.util.UUID; try
implementing ObjectResolver?\r\n\tat
org.apache.solr.update.TransactionLog$1.resolve(TransactionLog.java:100)\r\n\tat
org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:234)\r\n\tat
org.apache.solr.common.util.JavaBinCodec.writeSolrInputDocument(JavaBinCodec.java:589)\r\n\tat
org.apache.solr.update.TransactionLog.write(TransactionLog.java:395)\r\n\tat
org.apache.solr.update.UpdateLog.add(UpdateLog.java:532)\r\n\tat
org.apache.solr.update.UpdateLog.add(UpdateLog.java:516)\r\n\tat
org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:320)\r\n\tat
org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239)\r\n\tat
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:194)\r\n\tat
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)\r\n\tat
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)\r\n\tat
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:979)\r\n\tat
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1192)\r\n\tat
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:748)\r\n\tat
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)\r\n\tat
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:506)\r\n\tat
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:145)\r\n\tat
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:121)\r\n\tat
org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:84)\r\n\tat
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)\r\n\tat
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)\r\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)\r\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)\r\n\tat
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)\r\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\r\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\r\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:118

Re: Index Upgrader tool

2018-08-21 Thread Shawn Heisey

On 8/21/2018 2:29 AM, Artjoms Laivins wrote:

We are running Solr cloud with 3 nodes v. 6.6.2
We started with version 5 so we have some old index that we need safely move 
over to v. 7 now.
New data comes in several times per day.
Our questions are:

Should we run IndexUpgrader tool on one slave node that is down or it is safe 
to run it while Solr is running and possible updates of the index are coming?
If yes, when we start it again will leader update this node with new data only 
or will it overwrite index?


It might not be possible to upgrade two major versions like that, even 
with IndexUpgrader.  There is only a guarantee of reading an index 
ORIGINALLY written by the previous major version.


Even if it's possible to accomplish an upgrade, it is strongly 
recommended that you index from scratch anyway.


You cannot run IndexUpgrader while Solr has the index open.  The index 
must be completely closed.  You cannot update an index while it is being 
upgraded.
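
For reference, a rough sketch of what running the upgrader programmatically
against a stopped node looks like (a sketch only: the index path is a
placeholder, and the lucene-core and lucene-backward-codecs jars matching the
target version must be on the classpath):

import java.nio.file.Paths;
import org.apache.lucene.index.IndexUpgrader;
import org.apache.lucene.store.FSDirectory;

public class UpgradeOneIndex {
  public static void main(String[] args) throws Exception {
    // Only run this while Solr is stopped; the index must not be open anywhere
    // else, and IndexUpgrader only helps for one major version at a time.
    try (FSDirectory dir =
        FSDirectory.open(Paths.get("/var/solr/data/mycore/data/index"))) {
      new IndexUpgrader(dir).upgrade();  // rewrites all segments in the current format
    }
  }
}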


Thanks,
Shawn



Re: SOLR Issue

2018-08-21 Thread Shawn Heisey

On 8/21/2018 8:05 AM, Sandesh Ingawale wrote:

We have ecommerce base web application and we used SOLR for search.

Now we want to implement SOLR authentication and SOLR ranking feature 
with our application.


We need to do this programmatically(C#), can you please suggest what 
should be approach we have?


OR If you have any code snippets please send us. SOLR authentication 
needs to done with our application.


As well as please let us know if we need to do any configuration.



There is a C# client and a .NET client for Solr that are referenced in 
the documentation.  These are third party software, so you would need to 
discuss usage with those projects, not this one.


https://lucene.apache.org/solr/guide/7_4/client-api-lineup.html

I tried as per below link but after change in jetty.xml and restart 
the SOLR, it gives error.


https://support.ptc.com/help/windchill/wc111_hc/whc_en/index.html#page/Windchill_Help_Center/WCInstall_WCIndexSearchConfigStandalone_BasicAuthStandalone.html 





Those instructions are for a version of Solr that was not produced by 
the Solr project.  If you're using a version packaged by someone else, 
you'll need to talk to whoever did that packaging.  We have no idea how 
they might have changed things.


Here are the official instructions for enabling basic authentication in 
Solr:


https://lucene.apache.org/solr/guide/7_4/authentication-and-authorization-plugins.html

Thanks,
Shawn



Index Upgrader tool

2018-08-21 Thread Artjoms Laivins
Hello,

We are running Solr cloud with 3 nodes, v. 6.6.2.
We started with version 5, so we have some old index that we need to safely move
over to v. 7 now.
New data comes in several times per day.
Our questions are:

Should we run the IndexUpgrader tool on one slave node that is down, or is it safe
to run it while Solr is running and possible updates of the index are coming?
If yes, when we start that node again, will the leader update it with new data only,
or will it overwrite the index?


Best Regards,
Artjoms Laivins


SOLR Issue

2018-08-21 Thread Sandesh Ingawale
Hi,
We have an ecommerce-based web application and we use SOLR for search.
Now we want to implement SOLR authentication and the SOLR ranking feature with our
application.
We need to do this programmatically (C#); can you please suggest what approach
we should take?
Or, if you have any code snippets, please send them to us. SOLR authentication needs
to be done within our application.
Please also let us know if we need to do any configuration.

I tried the link below, but after changing jetty.xml and restarting SOLR,
it gives an error.
https://support.ptc.com/help/windchill/wc111_hc/whc_en/index.html#page/Windchill_Help_Center/WCInstall_WCIndexSearchConfigStandalone_BasicAuthStandalone.html


Thanks,
Sandesh Ingawale
Specialist - eCommerce Product
Mobile: +91 9960025165
singaw...@hitachi-solutions.com
Hitachi Solutions
Inspire The Next!
dynamics.hitachi-solutions.com


This e-mail is intended solely for the person or entity to which it is 
addressed and may contain confidential and/or privileged information. Any 
review, dissemination, copying, printing or other use of this e-mail by persons 
or entities other than the addressee is prohibited. If you have received this 
e-mail in error, please contact the sender immediately and delete this e-mail 
and any attachments from any device.


Re: 7.3.1: Query of death - all nodes ran out of memory and had to be shut down

2018-08-21 Thread Shawn Heisey

On 8/20/2018 9:55 PM, Ash Ramesh wrote:

We ran a bunch of deep paginated queries (offset of 1,000,000) with a
filter query. We set the timeout to 5 seconds and it did timeout. We aren't
sure if this is what caused the irrecoverable failure, but by reading this
-
https://lucene.apache.org/solr/guide/7_4/pagination-of-results.html#performance-problems-with-deep-paging
, we feel that this was the cause.


Yes, this is most likely the cause.

Since you have three shards, the problem is even worse than Erick
described.  Those 1,000,010 results will be returned by EVERY shard, and
consolidated on the machine that's actually making the query.  So it
will have three million results in memory that it must sort.


Unless you're running on Windows, the bin/solr script will configure 
Java to kill itself when OutOfMemoryError occurs.  It does this because 
program behavior after OOME occurs is completely unpredictable, so 
there's a good chance that if it keeps running, it will corrupt the index.


If you're going to be doing queries like this, you need a larger heap.  
There's no way around that.
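
(For reference, the heap can be raised with, e.g., bin/solr start -m 8g, or via
the SOLR_HEAP / SOLR_JAVA_MEM settings in the solr.in include file; 8g is only
an illustrative number, not a recommendation from this thread.)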


Thanks,
Shawn



Re: not range query in block join

2018-08-21 Thread Erick Erickson
Pure NOT queries are spottily supported and can work (or not)
depending on where they're used. They often need to be translated from

-some_clause
to
*:* -some_clause


Solr/Lucene do not implement pure boolean logic, which often throws
people. See: https://lucidworks.com/2011/12/28/why-not-and-or-and-not/
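
Applied to the query quoted below, one possible rewrite would be (untested; it
assumes child documents can be excluded from the wrapped clause, here via
-type_s:parent, and the semantics become "parents with at least one child
outside the range", which may or may not be what is wanted):

q=+_query_:"{!parent which=type_s:parent}(*:* -type_s:parent
-time_tdt:[2018-08-01T16:00:00Z TO 2018-08-04T15:59:59Z])"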

Best,
Erick

On Tue, Aug 21, 2018 at 9:13 AM, Novin Novin  wrote:
> Hi Guys,
>
> I was try to do block join query with "not". I got not success, can anybody
> please help me out here.
>
> This works   q=+_query_:"{!parent which=type_s:parent}
> +time_tdt:[2018-08-01T16:00:00Z
> TO 2018-08-04T15:59:59Z]"
> This works q=-time_tdt:[2018-08-01T16:00:00Z TO 2018-08-04T15:59:59Z]
>
> This does not work q=+_query_:"{!parent which=type_s:parent}
> -time_tdt:[2018-08-01T16:00:00Z
> TO 2018-08-04T15:59:59Z]"
>
> Did I missed something?
>
> Thanks in advanced.
> Bests,
> Novin


not range query in block join

2018-08-21 Thread Novin Novin
Hi Guys,

I was trying to do a block join query with "not", but had no success. Can anybody
please help me out here?

This works   q=+_query_:"{!parent which=type_s:parent}
+time_tdt:[2018-08-01T16:00:00Z
TO 2018-08-04T15:59:59Z]"
This works q=-time_tdt:[2018-08-01T16:00:00Z TO 2018-08-04T15:59:59Z]

This does not work q=+_query_:"{!parent which=type_s:parent}
-time_tdt:[2018-08-01T16:00:00Z
TO 2018-08-04T15:59:59Z]"

Did I miss something?

Thanks in advance.
Bests,
Novin


Re: 7.3.1: Query of death - all nodes ran out of memory and had to be shut down

2018-08-21 Thread Erick Erickson
bq. I meant to ask whether there is a high probability that that could
be the correlated cause for the issue.

Yes, I do tend to be pedantic on occasion, a personal failing ;)


bq. Do you know why Solr itself isn't able to recover or is that to be
expected with allowing such deep pagination.

The general problem isn't deep paging itself, but the fact that in
your case it generates an OOM. After an OOM there is no way to
recover; the state of the program is unknown. This is pretty much true
of all Java programs, which is why there's an OOM-killer script you
can configure. That won't help your situation since you'll probably
dive right back into an OOM, but at least it doesn't continue to try
to work with a program that's in an unknown state.

Best,
Erick

On Tue, Aug 21, 2018 at 2:08 AM, Ere Maijala  wrote:
> Hi,
>
> Just my short comment here. It's difficult to say for someone else, but we
> identified deep paging as the definite reason for running out of memory or
> at least grinding to semi-halt because of long stop-the-world garbage
> collection pauses in an application running on a similar SolrCloud. You can
> often get away without issues as long as you only have a single shard, but
> for the reason Erick mentioned deep paging in a sharded index is a heavy
> operation.
>
> Regards,
> Ere
>
> Ash Ramesh kirjoitti 21.8.2018 klo 8.09:
>>
>> Hi Erick,
>>
>> Sorry I phrased that the wrong way. I meant to ask whether there is a high
>> probability that that could be the correlated cause for the issue. Do you
>> know why Solr itself isn't able to recover or is that to be expected with
>> allowing such deep pagination. We are going to be removing it going
>> forwards, but want to make sure that we find the root cause.
>>
>> Appreciate your help as always :)
>>
>> Ash
>>
>> On Tue, Aug 21, 2018 at 2:59 PM Erick Erickson 
>> wrote:
>>
>>> Did the large offsets _definitely_ cause the OOM? How do you expect
>>> that to be answerable? It's likely though. To return rows 1,000,000
>>> through 1,000,010 the system has to keep a list of 1,000,010 top
>>> documents. It has to be this way because you don't know (and can't
>>> guess) the score or a doc prior to, well, scoring it. And these very
>>> large structures are kept for every query being processed. Not only
>>> will that chew up memory, it'll chew up CPU cycles as well as this an
>>> ordered list.
>>>
>>> This is an anti-pattern, cursors were invented because this pattern is
>>> very costly (as you're finding out).
>>>
>>> Further, 4G isn't very much memory by modern standards.
>>>
>>> So it's very likely (but not guaranteed) that using cursors will fix
>>> this problem.
>>>
>>> Best,
>>> Erick
>>>
>>>
>>>
>>> On Mon, Aug 20, 2018 at 8:55 PM, Ash Ramesh  wrote:

 Hi everyone,

 We ran into an issue yesterday where all our ec2 machines, running solr,
 ran out of memory and could not heal themselves. I'll try break down
 what
 happened here.

 *System Architecture:*

 - Solr Version: 7.3.1
 - Replica Types: TLOG/PULL
 - Num Shards: 8 (default hashing mechanism)
 - Doc Count: > 20m
 - Index Size: 17G
 - EC2 Machine Spec: 16 Core | 32G ram | 100G SSD
 - Num EC2 Machines: 7+ (scales up and down)
 - Max Shards per node (one node per EC2 instance): 8 (some nodes had 4,
 some had 8)
 - Num TLOG shard replicas: 3 (3 copies of each shard as TLOG)
 - Num PULL shard replicas: 3+
 - Heap: 4G

 *What was run prior to the issue:*

 We ran these queries around 2.55pm

 We ran a bunch of deep paginated queries (offset of 1,000,000) with a
 filter query. We set the timeout to 5 seconds and it did timeout. We
>>>
>>> aren't

 sure if this is what caused the irrecoverable failure, but by reading
>>>
>>> this

 -

>>>
>>> https://lucene.apache.org/solr/guide/7_4/pagination-of-results.html#performance-problems-with-deep-paging

 , we feel that this was the cause.

 We did not use a cursor.

 This cluster was healthy for about 1 week, but we noticed the
 degradation
 soon after (within 30min) of running the offset queries mentioned above.
>>>
>>> We

 currently use a single sharded collection in production, however are
 transitioning to an 8 shard cluster. We hit this issue in a controlled 8
 sharded environment, but don't notice any issues on our production
>>>
>>> (single

 sharded) cluster. On production the query still timed out (with same num
 docs etc.) but didn't go into a crazy state.

 *What Happened:*

 - All the EC2 instances started logging OOM error. None of the nodes
 were
 responsive to new requests.
 - We saw that the Heap usage jumped from an average of 2.7G to the max
 of
 4G within a 5 minute window.
 - CPU across all 16 cores was at 100%
 - We saw that the distributed requests were timing out across all
>>>
>>> machines.

>>>

Re: Handshake for NRT?

2018-08-21 Thread Walter Underwood
The updates are fairly frequent (a few per minute) and have a tight freshness 
requirement.
We really don’t want to show tutors who are not available. Luckily, it is a 
smallish
collection, a few hundred thousand.

The traffic isn’t a problem and the cluster is working very well. This is about
understanding our metrics.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Aug 21, 2018, at 8:46 AM, Erick Erickson  wrote:
> 
> A couple of notes:
> 
> TLOG replicas will have the same issue. When I said that leaders
> forwarded to followers, what that's really about is that the follower
> guarantees that the docs have been written to the TLOG. So if you
> change your model to use TLOG replicas, don't expect a change.
> 
> PULL replicas, OTOH, only pull down changed segments, but you're
> replace the raw document forwarding with segment copying so it's not
> clear to me how that would change the number of messages flying
> around.
> 
> bq. All our updates are single documents. We need to track the
> availability of online tutors, so we don’t batch them.
> 
> I'm inferring that this means the updates aren't all that frequent and
> if you waited for, say, 100 changes you might wait a long time. FYI,
> here is the result of some experimentation I did for the difference
> between various batch sizes:
> https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/, may
> not apply if your tutors come and go slowly of course.
> 
> Best,
> Erick
> 
> On Tue, Aug 21, 2018 at 8:02 AM, Walter Underwood  
> wrote:
>> Thanks, that is exactly what I was curious about.
>> 
>> All our updates are single documents. We need to track the availability of 
>> online
>> tutors, so we don’t batch them.
>> 
>> Right now, we have a replication factor of 36 (way too many), so that means 
>> each
>> update means 3 x 35 internal communications. Basically, a 100X update 
>> amplification
>> for our cluster.
>> 
>> We’ll be reducing the cluster to four hosts as soon as we get out of the 
>> current
>> blackout on prod changes.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Aug 20, 2018, at 10:05 PM, Erick Erickson  
>>> wrote:
>>> 
>>> Walter:
>>> 
>>> Each update is roughly
>>> 
>>> request goes to leader (may be forwarded)
>>> 
>>> leader sends the update to _each_ replica. depending on how many docs
>>> you're sending per update request this may be more than one request.
>>> IIRC there was some JIRA a while ago where the forwarding wasn't all
>>> that efficient, but that's going from (shaky) memoryh.
>>> 
>>> each follower acks back to the leader
>>> 
>>> leader acks back to the client.
>>> 
>>> So perhaps you're seeing the individual forwards to followers? Your
>>> logs should show update requests with FROMLEADER for these
>>> sub-requests (updates and queries). Does that help?
>>> 
>>> Erick
>>> 
>>> 
>>> 
>>> On Mon, Aug 20, 2018 at 8:03 PM, Walter Underwood  
>>> wrote:
 I’m comparing request counts from New Relic, which is reporting 16 krpm 
 aggregate
 requests across the cluster, and the AWS load balancer is reporting 1 
 krpm. Or it might
 be 1k requests per 5 minutes because CloudWatch is like that.
 
 This is a 36 node cluster, not sharded. We are going to shrink it, but I’d 
 like to understand it.
 
 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)
 
> On Aug 20, 2018, at 7:02 PM, Shalin Shekhar Mangar 
>  wrote:
> 
> There are a single persistent HTTP connection open from the leader to each
> replica in the shard. All updates coming to the leader are expanded (for
> atomic updates) and streamed over that single connection. When using
> in-place docvalues updates, there is a possibility of the replica making a
> request to the leader if updates has been re-ordered and the replica does
> not have enough context to process the update.
> 
> Can you quantify the "tons of internal traffic"? Are you seeing higher
> number of open connections as well?
> 
> On Fri, Aug 17, 2018 at 11:17 PM Walter Underwood 
> wrote:
> 
>> How many messages are sent back and forth between a leader and replica
>> with NRT?
>> 
>> We have a collection that gets frequent updates and we are seeing a ton 
>> of
>> internal
>> cluster traffic.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
> 
> --
> Regards,
> Shalin Shekhar Mangar.
 
>> 



Re: Handshake for NRT?

2018-08-21 Thread Erick Erickson
A couple of notes:

TLOG replicas will have the same issue. When I said that leaders
forwarded to followers, what that's really about is that the follower
guarantees that the docs have been written to the TLOG. So if you
change your model to use TLOG replicas, don't expect a change.

PULL replicas, OTOH, only pull down changed segments, but you're
replacing the raw document forwarding with segment copying, so it's not
clear to me how that would change the number of messages flying
around.

bq. All our updates are single documents. We need to track the
availability of online tutors, so we don’t batch them.

I'm inferring that this means the updates aren't all that frequent and
if you waited for, say, 100 changes you might wait a long time. FYI,
here is the result of some experimentation I did for the difference
between various batch sizes:
https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/, may
not apply if your tutors come and go slowly of course.
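
If batching does become attractive later, the SolrJ side is a small change; a
rough sketch (field names and batch size are made up for illustration):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchedUpdates {
  // Send accumulated updates as one request instead of one add() per document
  static void sendBatch(SolrClient client, List<SolrInputDocument> batch) throws Exception {
    if (!batch.isEmpty()) {
      client.add(batch);   // one HTTP request for the whole batch
      batch.clear();
    }
  }

  static void indexAvailability(SolrClient client) throws Exception {
    List<SolrInputDocument> batch = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "tutor-" + i);
      doc.addField("available_b", i % 2 == 0);
      batch.add(doc);
      if (batch.size() >= 100) {
        sendBatch(client, batch);
      }
    }
    sendBatch(client, batch);  // flush the remainder
  }
}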

Best,
Erick

On Tue, Aug 21, 2018 at 8:02 AM, Walter Underwood  wrote:
> Thanks, that is exactly what I was curious about.
>
> All our updates are single documents. We need to track the availability of 
> online
> tutors, so we don’t batch them.
>
> Right now, we have a replication factor of 36 (way too many), so that means 
> each
> update means 3 x 35 internal communications. Basically, a 100X update 
> amplification
> for our cluster.
>
> We’ll be reducing the cluster to four hosts as soon as we get out of the 
> current
> blackout on prod changes.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>> On Aug 20, 2018, at 10:05 PM, Erick Erickson  wrote:
>>
>> Walter:
>>
>> Each update is roughly
>>
>> request goes to leader (may be forwarded)
>>
>> leader sends the update to _each_ replica. depending on how many docs
>> you're sending per update request this may be more than one request.
>> IIRC there was some JIRA a while ago where the forwarding wasn't all
>> that efficient, but that's going from (shaky) memoryh.
>>
>> each follower acks back to the leader
>>
>> leader acks back to the client.
>>
>> So perhaps you're seeing the individual forwards to followers? Your
>> logs should show update requests with FROMLEADER for these
>> sub-requests (updates and queries). Does that help?
>>
>> Erick
>>
>>
>>
>> On Mon, Aug 20, 2018 at 8:03 PM, Walter Underwood  
>> wrote:
>>> I’m comparing request counts from New Relic, which is reporting 16 krpm 
>>> aggregate
>>> requests across the cluster, and the AWS load balancer is reporting 1 krpm. 
>>> Or it might
>>> be 1k requests per 5 minutes because CloudWatch is like that.
>>>
>>> This is a 36 node cluster, not sharded. We are going to shrink it, but I’d 
>>> like to understand it.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
 On Aug 20, 2018, at 7:02 PM, Shalin Shekhar Mangar 
  wrote:

 There are a single persistent HTTP connection open from the leader to each
 replica in the shard. All updates coming to the leader are expanded (for
 atomic updates) and streamed over that single connection. When using
 in-place docvalues updates, there is a possibility of the replica making a
 request to the leader if updates has been re-ordered and the replica does
 not have enough context to process the update.

 Can you quantify the "tons of internal traffic"? Are you seeing higher
 number of open connections as well?

 On Fri, Aug 17, 2018 at 11:17 PM Walter Underwood 
 wrote:

> How many messages are sent back and forth between a leader and replica
> with NRT?
>
> We have a collection that gets frequent updates and we are seeing a ton of
> internal
> cluster traffic.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>

 --
 Regards,
 Shalin Shekhar Mangar.
>>>
>


Re: Handshake for NRT?

2018-08-21 Thread Walter Underwood
Thanks, that is exactly what I was curious about.

All our updates are single documents. We need to track the availability of 
online
tutors, so we don’t batch them.

Right now, we have a replication factor of 36 (way too many), so each
update means 3 x 35 internal communications. Basically, a 100X update
amplification
for our cluster.

We’ll be reducing the cluster to four hosts as soon as we get out of the current
blackout on prod changes.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Aug 20, 2018, at 10:05 PM, Erick Erickson  wrote:
> 
> Walter:
> 
> Each update is roughly
> 
> request goes to leader (may be forwarded)
> 
> leader sends the update to _each_ replica. depending on how many docs
> you're sending per update request this may be more than one request.
> IIRC there was some JIRA a while ago where the forwarding wasn't all
> that efficient, but that's going from (shaky) memoryh.
> 
> each follower acks back to the leader
> 
> leader acks back to the client.
> 
> So perhaps you're seeing the individual forwards to followers? Your
> logs should show update requests with FROMLEADER for these
> sub-requests (updates and queries). Does that help?
> 
> Erick
> 
> 
> 
> On Mon, Aug 20, 2018 at 8:03 PM, Walter Underwood  
> wrote:
>> I’m comparing request counts from New Relic, which is reporting 16 krpm 
>> aggregate
>> requests across the cluster, and the AWS load balancer is reporting 1 krpm. 
>> Or it might
>> be 1k requests per 5 minutes because CloudWatch is like that.
>> 
>> This is a 36 node cluster, not sharded. We are going to shrink it, but I’d 
>> like to understand it.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Aug 20, 2018, at 7:02 PM, Shalin Shekhar Mangar  
>>> wrote:
>>> 
>>> There are a single persistent HTTP connection open from the leader to each
>>> replica in the shard. All updates coming to the leader are expanded (for
>>> atomic updates) and streamed over that single connection. When using
>>> in-place docvalues updates, there is a possibility of the replica making a
>>> request to the leader if updates has been re-ordered and the replica does
>>> not have enough context to process the update.
>>> 
>>> Can you quantify the "tons of internal traffic"? Are you seeing higher
>>> number of open connections as well?
>>> 
>>> On Fri, Aug 17, 2018 at 11:17 PM Walter Underwood 
>>> wrote:
>>> 
 How many messages are sent back and forth between a leader and replica
 with NRT?
 
 We have a collection that gets frequent updates and we are seeing a ton of
 internal
 cluster traffic.
 
 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)
 
 
>>> 
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>> 



Re: Dataimport not working on solrcloud

2018-08-21 Thread Shawn Heisey

On 8/20/2018 10:00 PM, Sushant Vengurlekar wrote:

I have a dataimport working on standalone solr instance but the same
doesn't work on solrcloud. I keep on hitting this error

Full Import failed:java.lang.RuntimeException:
java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException:
Exception in invoking url


There will be more to this error than what you've shared. Look in 
solr.log, and share all the ERROR/WARN entries from the correct 
timeframe.  Some of them can be quite long.  We will need *all* of that 
information.  Will also need the exact Solr version.



The url is returning well formed xml. I have verified that. The solr nodes
can fully resolve this url. I checked that out. I have the following params
set in xml-import.xml

connectionTimeout="50" readTimeout="5000"


We'll need to see the full dataimport config and the handler config from 
solrconfig.xml.


Thanks,
Shawn



Haystack, the search relevance conference comes to London on October 2nd 2018

2018-08-21 Thread Charlie Hull

Hi all,

We're very happy to announce the first Haystack Europe conference in 
London on October 2nd.


https://opensourceconnections.com/events/haystack-europe-2018/

Come and hear talks by Doug Turnbull, co-author of Relevant Search, 
Karen Renshaw, Head of Search and Content for Grainger Global Online and 
other relevance experts, plus the usual networking and knowledge sharing.


Hope to meet some of you there!

Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: SOLRJ 7.x library fails ejb deployment with weblogic

2018-08-21 Thread Shawn Heisey

On 8/21/2018 7:29 AM, Ganesh Kumar J wrote:

Thank you so much for your reply. Unfortunately our production weblogic cluster 
runs in java 1.7 we are unable to upgrade to java 1.8

In this case do you have any idea how we can use solrj version below 7.x which 
can authenticate with kerberozied solr cluster. Since keberozied authentication 
classes are available only from solrj7.x


Create a custom CloseableHttpClient (Apache HttpComponents project) 
instance that is set up with the authentication you require and any 
other customizations you want (timeouts, thread limits, etc).  Use that 
custom HttpClient when building the solr client.  You'll need to talk to 
that project for help with this -- it's a separate project from Solr.
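
A rough sketch of that pattern with SolrJ 5.x and Apache HttpComponents, using
basic auth purely for illustration (a Kerberos/SPNEGO setup would configure the
HttpClient differently; host, credentials and collection name are placeholders):

import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class CustomHttpClientExample {
  public static void main(String[] args) throws Exception {
    // Build an HttpClient that carries the authentication you need
    CredentialsProvider creds = new BasicCredentialsProvider();
    creds.setCredentials(AuthScope.ANY,
        new UsernamePasswordCredentials("solruser", "solrpass"));
    CloseableHttpClient httpClient =
        HttpClients.custom().setDefaultCredentialsProvider(creds).build();

    // Hand that HttpClient to the Solr client (SolrJ 5.x constructor form)
    SolrClient solr =
        new HttpSolrClient("http://solrhost:8983/solr/mycollection", httpClient);
    try {
      System.out.println(solr.ping().getStatus());
    } finally {
      solr.close();
      httpClient.close();
    }
  }
}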


Your stacktrace did not mention any Solr classes.  So you might need to 
talk to whoever created weblogic.


Thanks,
Shawn



Re: Solr unable to start up after setting up SSL in Solr 7.4.0

2018-08-21 Thread Shawn Heisey

On 8/20/2018 9:57 PM, Zheng Lin Edwin Yeo wrote:

This is the error that I get:
ERROR StatusLogger Unable to access file:/C:/Users/edwin/Desktop/edwin/solr-
7.4.0/server/scripts/cloud-scripts/file:C:/Users/edwin/Desktop/edwin/solr-7.
4.0/server/scripts/cloud-scripts/log4j2.xml
  java.io.FileNotFoundException: C:\Users\edwin\Desktop\edwin\solr-7.4.0\serv
er\scripts\cloud-scripts\file:C:\Users\edwin\Desktop\edwin\solr-7.4.0\server
\scripts\cloud-scripts\log4j2.xml (The filename, directory name, or volume
label syntax is incorrect)


We have a bug for this already:

https://issues.apache.org/jira/browse/SOLR-12538

Should be fine when 7.5.0 is released.

Quick fix:  In solr.cmd change all occurrences of "file:" to "file:///".
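
For illustration only (the variable and path here are placeholders, not the
exact solr.cmd content), the change is of this shape:

  before:  -Dlog4j.configurationFile=file:%DEFAULT_SERVER_DIR%\resources\log4j2.xml
  after:   -Dlog4j.configurationFile=file:///%DEFAULT_SERVER_DIR%\resources\log4j2.xml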

Something changed between log4j1 and log4j2 for file URI handling.  The 
problem only surfaces on Windows.


Thanks,
Shawn



RE: SOLRJ 7.x library fails ejb deployment with weblogic

2018-08-21 Thread Ganesh Kumar J
Hi Jan Høydahl,

Thank you so much for your reply. Unfortunately our
production weblogic cluster runs on java 1.7 and we are unable to upgrade to java
1.8.

In this case, do you have any idea how we can use a solrj version below 7.x that
can authenticate with a kerberized solr cluster, since the kerberized authentication
classes are available only from solrj 7.x?

Thanks & Regards,
J.Ganesh Kumar.

From: Jan Høydahl [mailto:jan@cominvent.com]
Sent: 21 August 2018 17:12
To: solr-user 
Cc: Ganesh Kumar J 
Subject: Re: SOLRJ 7.x library fails ejb deployment with weblogic

Hi,

I don't know what version of Weblogic and Java you use, but note that Solr, 
even SolrJ 7.x requires Java 8, while 5.x required Java 7.

There seems to be several discussion on stackoverflow and elsewhere about 
similar issues:
https://stackoverflow.com/questions/19152655/java-lang-arrayindexoutofboundsexception-while-deploying-app-in-wls-12

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com


21. aug. 2018 kl. 13:28 skrev Ganesh Kumar J 
mailto:ganeshkuma...@sella.it>>:

Hi Team,

We have an ejb application and deployment in weblogic cluster where 
the application uses SOLRJ java client to communicate with SOLR.

Previously we were using solrj 5.4 and it was working fine. 
Recently we enabled kerberos in our cluster so we forced to upgrade our solrj 
library to 7.x. since the authentication stuff classes are available only in 
solrj version 7.x

 ours is maven project so are using "weblogic-maven-plugin" to 
compile all our jsp files before deployment and build fails due to below error 
trace

  And also also we runs the build by removing that 
"weblogic-maven-plugin" in pom. But this time build is success and deployment 
fails.



[jspc] -webapp specified, searching . for JSPs
[jspc] No jsp files found, nothing to do
java.lang.ArrayIndexOutOfBoundsException: 22091
at com.bea.objectweb.asm.ClassReader.(Unknown Source)
at com.bea.objectweb.asm.ClassReader.(Unknown Source)
at 
weblogic.application.utils.annotation.ClassInfoImpl.(ClassInfoImpl.java:51)
at 
weblogic.application.utils.annotation.ClassfinderClassInfos.polulateOneClassInfo(ClassfinderClassInfos
at 
weblogic.application.utils.annotation.ClassfinderClassInfos.populateClassInfos(ClassfinderClassInfos.j
at 
weblogic.application.utils.annotation.ClassfinderClassInfos.(ClassfinderClassInfos.java:35)
at weblogic.servlet.internal.War.initializeClassInfosIfNecessary(War.java:443)
at weblogic.servlet.internal.War.getAnnotatedClasses(War.java:373)
at 
weblogic.servlet.internal.WebBaseModuleExtensionContext.getAnnotatedClasses(WebBaseModuleExtensionCont
at 
weblogic.ejb.container.deployer.BaseModuleExtensionFactory.hasAnnotatedEJBs(BaseModuleExtensionFactory
at 
weblogic.ejb.tools.EJBToolsModuleExtensionFactory.create(EJBToolsModuleExtensionFactory.java:22)
at 
weblogic.application.compiler.ModuleState.initExtensions(ModuleState.java:206)
at 
weblogic.application.compiler.flow.CompileModuleFlow.compileModules(CompileModuleFlow.java:148)
at 
weblogic.application.compiler.flow.CompileModuleFlow.compile(CompileModuleFlow.java:69)
at 
weblogic.application.compiler.FlowDriver$FlowStateChange.next(FlowDriver.java:70)
at 
weblogic.application.utils.StateMachineDriver.nextState(StateMachineDriver.java:42)
at weblogic.application.compiler.FlowDriver.nextState(FlowDriver.java:37)
at weblogic.application.compiler.FlowDriver.run(FlowDriver.java:27)
at weblogic.application.compiler.EARCompiler.compile(EARCompiler.java:53)
at 
weblogic.application.compiler.flow.AppCompilerFlow.compileInput(AppCompilerFlow.java:101)
at 
weblogic.application.compiler.flow.AppCompilerFlow.compile(AppCompilerFlow.java:35)
at 
weblogic.application.compiler.FlowDriver$FlowStateChange.next(FlowDriver.java:70)
at 
weblogic.application.utils.StateMachineDriver.nextState(StateMachineDriver.java:42)
at weblogic.application.compiler.FlowDriver.nextState(FlowDriver.java:37)
at weblogic.application.compiler.FlowDriver.run(FlowDriver.java:27)
at weblogic.application.compiler.Appc.runBody(Appc.java:203)
at weblogic.utils.compiler.Tool.run(Tool.java:158)
at weblogic.utils.compiler.Tool.run(Tool.java:115)
at weblogic.application.compiler.Appc.main(Appc.java:263)
at weblogic.appc.main(appc.java:14)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at weblogic.ant.taskdefs.j2ee.CompilerTask.invokeMain(CompilerTask.java:301)
at weblogic.ant.taskdefs.j2ee.Appc.privateExecute(Appc.java:261)
at weblogic.ant.taskdefs.j2ee.Appc.execute(Appc.java:164)
at org.codehaus.mojo.weblogic.AppcMojo.execute(AppcMojo.java:191)
at 
org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager

Re: SOLRJ 7.x library fails ejb deployment with weblogic

2018-08-21 Thread Jan Høydahl
Hi,

I don't know what version of Weblogic and Java you use, but note that Solr, 
even SolrJ 7.x requires Java 8, while 5.x required Java 7.

There seem to be several discussions on Stack Overflow and elsewhere about
similar issues:
https://stackoverflow.com/questions/19152655/java-lang-arrayindexoutofboundsexception-while-deploying-app-in-wls-12
 


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 21. aug. 2018 kl. 13:28 skrev Ganesh Kumar J :
> 
> Hi Team,
> 
> We have an ejb application and deployment in weblogic cluster 
> where the application uses SOLRJ java client to communicate with SOLR.
> 
> Previously we were using solrj 5.4 and it was working fine. 
> Recently we enabled kerberos in our cluster so we forced to upgrade our solrj 
> library to 7.x. since the authentication stuff classes are available only in 
> solrj version 7.x
> 
>  ours is maven project so are using "weblogic-maven-plugin" to 
> compile all our jsp files before deployment and build fails due to below 
> error trace
> 
>   And also also we runs the build by removing that 
> "weblogic-maven-plugin" in pom. But this time build is success and deployment 
> fails.
> 
> 
> 
> [jspc] -webapp specified, searching . for JSPs
> [jspc] No jsp files found, nothing to do
> java.lang.ArrayIndexOutOfBoundsException: 22091
> at com.bea.objectweb.asm.ClassReader.(Unknown Source)
> at com.bea.objectweb.asm.ClassReader.(Unknown Source)
> at 
> weblogic.application.utils.annotation.ClassInfoImpl.(ClassInfoImpl.java:51)
> at 
> weblogic.application.utils.annotation.ClassfinderClassInfos.polulateOneClassInfo(ClassfinderClassInfos
> at 
> weblogic.application.utils.annotation.ClassfinderClassInfos.populateClassInfos(ClassfinderClassInfos.j
> at 
> weblogic.application.utils.annotation.ClassfinderClassInfos.(ClassfinderClassInfos.java:35)
> at weblogic.servlet.internal.War.initializeClassInfosIfNecessary(War.java:443)
> at weblogic.servlet.internal.War.getAnnotatedClasses(War.java:373)
> at 
> weblogic.servlet.internal.WebBaseModuleExtensionContext.getAnnotatedClasses(WebBaseModuleExtensionCont
> at 
> weblogic.ejb.container.deployer.BaseModuleExtensionFactory.hasAnnotatedEJBs(BaseModuleExtensionFactory
> at 
> weblogic.ejb.tools.EJBToolsModuleExtensionFactory.create(EJBToolsModuleExtensionFactory.java:22)
> at 
> weblogic.application.compiler.ModuleState.initExtensions(ModuleState.java:206)
> at 
> weblogic.application.compiler.flow.CompileModuleFlow.compileModules(CompileModuleFlow.java:148)
> at 
> weblogic.application.compiler.flow.CompileModuleFlow.compile(CompileModuleFlow.java:69)
> at 
> weblogic.application.compiler.FlowDriver$FlowStateChange.next(FlowDriver.java:70)
> at 
> weblogic.application.utils.StateMachineDriver.nextState(StateMachineDriver.java:42)
> at weblogic.application.compiler.FlowDriver.nextState(FlowDriver.java:37)
> at weblogic.application.compiler.FlowDriver.run(FlowDriver.java:27)
> at weblogic.application.compiler.EARCompiler.compile(EARCompiler.java:53)
> at 
> weblogic.application.compiler.flow.AppCompilerFlow.compileInput(AppCompilerFlow.java:101)
> at 
> weblogic.application.compiler.flow.AppCompilerFlow.compile(AppCompilerFlow.java:35)
> at 
> weblogic.application.compiler.FlowDriver$FlowStateChange.next(FlowDriver.java:70)
> at 
> weblogic.application.utils.StateMachineDriver.nextState(StateMachineDriver.java:42)
> at weblogic.application.compiler.FlowDriver.nextState(FlowDriver.java:37)
> at weblogic.application.compiler.FlowDriver.run(FlowDriver.java:27)
> at weblogic.application.compiler.Appc.runBody(Appc.java:203)
> at weblogic.utils.compiler.Tool.run(Tool.java:158)
> at weblogic.utils.compiler.Tool.run(Tool.java:115)
> at weblogic.application.compiler.Appc.main(Appc.java:263)
> at weblogic.appc.main(appc.java:14)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at weblogic.ant.taskdefs.j2ee.CompilerTask.invokeMain(CompilerTask.java:301)
> at weblogic.ant.taskdefs.j2ee.Appc.privateExecute(Appc.java:261)
> at weblogic.ant.taskdefs.j2ee.Appc.execute(Appc.java:164)
> at org.codehaus.mojo.weblogic.AppcMojo.execute(AppcMojo.java:191)
> at 
> org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:490)
> at 
> org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:694)
> at 
> org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalWithLifecycle(DefaultLifecycleExecutor.
> at 
> org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:535)
> at 
> org.apache.maven.lifecy

SOLRJ 7.x library fails ejb deployment with weblogic

2018-08-21 Thread Ganesh Kumar J
Hi Team,

We have an EJB application deployed in a weblogic cluster,
where the application uses the SOLRJ java client to communicate with SOLR.

Previously we were using solrj 5.4 and it was working fine.
Recently we enabled kerberos in our cluster, so we were forced to upgrade our solrj
library to 7.x, since the authentication classes are available only in
solrj version 7.x.

Ours is a maven project, so we are using the "weblogic-maven-plugin" to
compile all our jsp files before deployment, and the build fails with the error
trace below.

We also ran the build after removing the "weblogic-maven-plugin"
from the pom. In that case the build succeeds but deployment fails.



[jspc] -webapp specified, searching . for JSPs
[jspc] No jsp files found, nothing to do
java.lang.ArrayIndexOutOfBoundsException: 22091
at com.bea.objectweb.asm.ClassReader.(Unknown Source)
at com.bea.objectweb.asm.ClassReader.(Unknown Source)
at 
weblogic.application.utils.annotation.ClassInfoImpl.(ClassInfoImpl.java:51)
at 
weblogic.application.utils.annotation.ClassfinderClassInfos.polulateOneClassInfo(ClassfinderClassInfos
at 
weblogic.application.utils.annotation.ClassfinderClassInfos.populateClassInfos(ClassfinderClassInfos.j
at 
weblogic.application.utils.annotation.ClassfinderClassInfos.(ClassfinderClassInfos.java:35)
at weblogic.servlet.internal.War.initializeClassInfosIfNecessary(War.java:443)
at weblogic.servlet.internal.War.getAnnotatedClasses(War.java:373)
at 
weblogic.servlet.internal.WebBaseModuleExtensionContext.getAnnotatedClasses(WebBaseModuleExtensionCont
at 
weblogic.ejb.container.deployer.BaseModuleExtensionFactory.hasAnnotatedEJBs(BaseModuleExtensionFactory
at 
weblogic.ejb.tools.EJBToolsModuleExtensionFactory.create(EJBToolsModuleExtensionFactory.java:22)
at 
weblogic.application.compiler.ModuleState.initExtensions(ModuleState.java:206)
at 
weblogic.application.compiler.flow.CompileModuleFlow.compileModules(CompileModuleFlow.java:148)
at 
weblogic.application.compiler.flow.CompileModuleFlow.compile(CompileModuleFlow.java:69)
at 
weblogic.application.compiler.FlowDriver$FlowStateChange.next(FlowDriver.java:70)
at 
weblogic.application.utils.StateMachineDriver.nextState(StateMachineDriver.java:42)
at weblogic.application.compiler.FlowDriver.nextState(FlowDriver.java:37)
at weblogic.application.compiler.FlowDriver.run(FlowDriver.java:27)
at weblogic.application.compiler.EARCompiler.compile(EARCompiler.java:53)
at 
weblogic.application.compiler.flow.AppCompilerFlow.compileInput(AppCompilerFlow.java:101)
at 
weblogic.application.compiler.flow.AppCompilerFlow.compile(AppCompilerFlow.java:35)
at 
weblogic.application.compiler.FlowDriver$FlowStateChange.next(FlowDriver.java:70)
at 
weblogic.application.utils.StateMachineDriver.nextState(StateMachineDriver.java:42)
at weblogic.application.compiler.FlowDriver.nextState(FlowDriver.java:37)
at weblogic.application.compiler.FlowDriver.run(FlowDriver.java:27)
at weblogic.application.compiler.Appc.runBody(Appc.java:203)
at weblogic.utils.compiler.Tool.run(Tool.java:158)
at weblogic.utils.compiler.Tool.run(Tool.java:115)
at weblogic.application.compiler.Appc.main(Appc.java:263)
at weblogic.appc.main(appc.java:14)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at weblogic.ant.taskdefs.j2ee.CompilerTask.invokeMain(CompilerTask.java:301)
at weblogic.ant.taskdefs.j2ee.Appc.privateExecute(Appc.java:261)
at weblogic.ant.taskdefs.j2ee.Appc.execute(Appc.java:164)
at org.codehaus.mojo.weblogic.AppcMojo.execute(AppcMojo.java:191)
at 
org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:490)
at 
org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:694)
at 
org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalWithLifecycle(DefaultLifecycleExecutor.
at 
org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:535)
at 
org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecu
at 
org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:
at 
org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:180)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:328)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:138)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:362)
at org.apache.maven.cli.compat.CompatibleMain.main(CompatibleMain.java:60)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.

Re: 7.3.1: Query of death - all nodes ran out of memory and had to be shut down

2018-08-21 Thread Jan Høydahl
The solution is to move to cursors, but you may as a safety net try to apply 
the RequestSanitizerComponent to disallow large offsets, see 
https://github.com/cominvent/request-sanitizer-component 
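
For the cursor approach, a minimal SolrJ sketch (the collection URL, filter,
sort field and page size are placeholders; the sort must include the uniqueKey):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorPagingExample {
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
      SolrQuery q = new SolrQuery("*:*");
      q.addFilterQuery("type:whatever");          // placeholder filter query
      q.setRows(500);                             // page size
      q.setSort(SolrQuery.SortClause.asc("id"));  // must include the uniqueKey field
      String cursor = CursorMarkParams.CURSOR_MARK_START;
      while (true) {
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        QueryResponse rsp = client.query(q);
        // ... process rsp.getResults() ...
        String next = rsp.getNextCursorMark();
        if (cursor.equals(next)) {
          break;  // no more results
        }
        cursor = next;
      }
    }
  }
}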


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 21. aug. 2018 kl. 11:08 skrev Ere Maijala :
> 
> Hi,
> 
> Just my short comment here. It's difficult to say for someone else, but we 
> identified deep paging as the definite reason for running out of memory or at 
> least grinding to semi-halt because of long stop-the-world garbage collection 
> pauses in an application running on a similar SolrCloud. You can often get 
> away without issues as long as you only have a single shard, but for the 
> reason Erick mentioned deep paging in a sharded index is a heavy operation.
> 
> Regards,
> Ere
> 
> Ash Ramesh kirjoitti 21.8.2018 klo 8.09:
>> Hi Erick,
>> Sorry I phrased that the wrong way. I meant to ask whether there is a high
>> probability that that could be the correlated cause for the issue. Do you
>> know why Solr itself isn't able to recover or is that to be expected with
>> allowing such deep pagination. We are going to be removing it going
>> forwards, but want to make sure that we find the root cause.
>> Appreciate your help as always :)
>> Ash
>> On Tue, Aug 21, 2018 at 2:59 PM Erick Erickson 
>> wrote:
>>> Did the large offsets _definitely_ cause the OOM? How do you expect
>>> that to be answerable? It's likely though. To return rows 1,000,000
>>> through 1,000,010 the system has to keep a list of 1,000,010 top
>>> documents. It has to be this way because you don't know (and can't
>>> guess) the score or a doc prior to, well, scoring it. And these very
>>> large structures are kept for every query being processed. Not only
>>> will that chew up memory, it'll chew up CPU cycles as well as this an
>>> ordered list.
>>> 
>>> This is an anti-pattern, cursors were invented because this pattern is
>>> very costly (as you're finding out).
>>> 
>>> Further, 4G isn't very much memory by modern standards.
>>> 
>>> So it's very likely (but not guaranteed) that using cursors will fix
>>> this problem.
>>> 
>>> Best,
>>> Erick
>>> 
>>> 
>>> 
>>> On Mon, Aug 20, 2018 at 8:55 PM, Ash Ramesh  wrote:
 Hi everyone,
 
 We ran into an issue yesterday where all our ec2 machines, running solr,
 ran out of memory and could not heal themselves. I'll try break down what
 happened here.
 
 *System Architecture:*
 
 - Solr Version: 7.3.1
 - Replica Types: TLOG/PULL
 - Num Shards: 8 (default hashing mechanism)
 - Doc Count: > 20m
 - Index Size: 17G
 - EC2 Machine Spec: 16 Core | 32G ram | 100G SSD
 - Num EC2 Machines: 7+ (scales up and down)
 - Max Shards per node (one node per EC2 instance): 8 (some nodes had 4,
 some had 8)
 - Num TLOG shard replicas: 3 (3 copies of each shard as TLOG)
 - Num PULL shard replicas: 3+
 - Heap: 4G
 
 *What was run prior to the issue:*
 
 We ran these queries around 2.55pm
 
 We ran a bunch of deep paginated queries (offset of 1,000,000) with a
 filter query. We set the timeout to 5 seconds and it did timeout. We
>>> aren't
 sure if this is what caused the irrecoverable failure, but by reading
>>> this
 -
 
>>> https://lucene.apache.org/solr/guide/7_4/pagination-of-results.html#performance-problems-with-deep-paging
 , we feel that this was the cause.
 
 We did not use a cursor.
 
 This cluster was healthy for about 1 week, but we noticed the degradation
 soon after (within 30min) of running the offset queries mentioned above.
>>> We
 currently use a single sharded collection in production, however are
 transitioning to an 8 shard cluster. We hit this issue in a controlled 8
 sharded environment, but don't notice any issues on our production
>>> (single
 sharded) cluster. On production the query still timed out (with same num
 docs etc.) but didn't go into a crazy state.
 
 *What Happened:*
 
 - All the EC2 instances started logging OOM error. None of the nodes were
 responsive to new requests.
 - We saw that the Heap usage jumped from an average of 2.7G to the max of
 4G within a 5 minute window.
 - CPU across all 16 cores was at 100%
 - We saw that the distributed requests were timing out across all
>>> machines.
 - We shutdown all the machines that only had PULL replicas on them and it
 still didn't 'fix' itself.
 - Eventually we shut down SOLR on the main node which had all the master
 TLOG replicas. Once restarted, the machine started working again.
 
 
 *Questions:*
 - Did this deep pagination query *DEFINITELY* cause this issue?
 - Is each node single threaded? I don't think so, but I'd like to confirm
 that.
 - Is there any configuration that we could use to avoid th

Re: Solr unable to start up after setting up SSL in Solr 7.4.0

2018-08-21 Thread Jan Høydahl
Hi,

Now, the zkcli.bat error may in fact be a real bug, perhaps caused by SOLR-7887.
I think you should file that one in a JIRA.

As a workaround you may attempt using the equivalent (from memory):

bin/solr zk upconfig -c collection1 -d /path/to/conf -z localhost:2181
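
(For reference, the documented form of that command names the configset with -n,
i.e. bin/solr zk upconfig -n collection1 -d /path/to/conf -z localhost:2181; on
Windows, the same subcommand is available via bin\solr.cmd.)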

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 21. aug. 2018 kl. 05:57 skrev Zheng Lin Edwin Yeo :
> 
> Hi,
> 
> The default configurations is working for this, so I will try to change the
> things step by step first to find out this issue.
> 
> However, I found that I can't load the configurations into ZooKeeper even
> in a new cluster with default configurations in Solr 7.4.0 (it also can't
> load previously and I use the zkcli.bat for Solr 7.3.1 to load the
> configurations). The same method and command is working for Solr 7.3.1.
> This is done by using zkcli.bat under server\scripts\cloud-scripts, with
> the following command:
> zkcli.bat -zkhost localhost:2181 \ -cmd upconfig -confname collection1
> -confdir
> C:\Users\edwin\Desktop\edwin\solr-7.4.0\configuration\collection1\conf
> 
> This is the error that I get:
> ERROR StatusLogger Unable to access file:/C:/Users/edwin/Desktop/edwin/solr-
> 7.4.0/server/scripts/cloud-scripts/file:C:/Users/edwin/Desktop/edwin/solr-7.
> 4.0/server/scripts/cloud-scripts/log4j2.xml
> java.io.FileNotFoundException: C:\Users\edwin\Desktop\edwin\solr-7.4.0\serv
> er\scripts\cloud-scripts\file:C:\Users\edwin\Desktop\edwin\solr-7.4.0\server
> \scripts\cloud-scripts\log4j2.xml (The filename, directory name, or volume
> label
> syntax is incorrect)
>at java.io.FileInputStream.open0(Native Method)
>at java.io.FileInputStream.open(Unknown Source)
>at java.io.FileInputStream.(Unknown Source)
>at java.io.FileInputStream.(Unknown Source)
>at sun.net.www.protocol.file.FileURLConnection.connect(Unknown
> Source)
>at
> sun.net.www.protocol.file.FileURLConnection.getInputStream(Unknown So
> urce)
>at java.net.URL.openStream(Unknown Source)
>at
> org.apache.logging.log4j.core.config.ConfigurationSource.fromUri(Conf
> igurationSource.java:247)
>at
> org.apache.logging.log4j.core.config.ConfigurationFactory$Factory.get
> Configuration(ConfigurationFactory.java:404)
>at
> org.apache.logging.log4j.core.config.ConfigurationFactory$Factory.get
> Configuration(ConfigurationFactory.java:346)
>at
> org.apache.logging.log4j.core.config.ConfigurationFactory.getConfigur
> ation(ConfigurationFactory.java:260)
>at
> org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext
> .java:615)
>at
> org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext
> .java:636)
>at
> org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:
> 231)
>at
> org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log
> 4jContextFactory.java:153)
>at
> org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log
> 4jContextFactory.java:45)
>at
> org.apache.logging.log4j.LogManager.getContext(LogManager.java:194)
>at
> org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(Abstrac
> tLoggerAdapter.java:121)
>at
> org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFac
> tory.java:43)
>at
> org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(Abstract
> LoggerAdapter.java:46)
>at
> org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFact
> ory.java:29)
>at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:358)
>at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:383)
>at
> org.apache.solr.common.cloud.SolrZkClient.(SolrZkClient.java:
> 74)
>at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:197)
> 
> 
> Regards,
> Edwin
> 
> 
> On Mon, 20 Aug 2018 at 21:11, Jan Høydahl  wrote:
> 
>> Hi,
>> 
>> Can you please try to reproduce your issue on a completely empty cluster,
>> and a single node Solr instance, following the refguide instructions at
>> https://lucene.apache.org/solr/guide/7_4/enabling-ssl.html <
>> https://lucene.apache.org/solr/guide/7_4/enabling-ssl.html> with all
>> default configurations, just to sort out any custom changes you may have
>> introduced? If that works, then you can try to change things step by step
>> until you find the difference in config causing your issue.
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>>> On 20 Aug 2018, at 11:04, Zheng Lin Edwin Yeo wrote:
>>> 
>>> Hi,
>>> 
>>> So far it is still not able to work with the files from Solr 7.4.0. I found
>>> that jetty-ssl.xml is the file whose differences cause the issue.
>>> 
>>> This is the jetty-ssl.xml from Solr 7.3.1:
>>> <Set name="KeyStorePath"><Property name="solr.jetty.keystore" default="./etc/solr-ssl.keystore.jks"/></Set>
>>> <Set name="KeyStorePassword"><Property name="solr.jetty.keystore.password" default="secret"/></Set>
>>> <Set name="TrustStorePath"><Property name="solr.jetty.truststore" default="./etc/solr-ssl.keystore.jks"/></Set>
>>> <Set name="TrustStorePassword"><Property name="solr.jetty.truststore.password" default="secret"/></Set>
>>> > name="solr.jetty

Questions on optimizing RptWithGeometrySpatialField

2018-08-21 Thread Benoît Denis
Hello,

I am indexing WKT polygons on a RptWithGeometrySpatialField defined as follows:
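
(The XML definition itself did not survive in plain text here. As a rough
sketch, such a field type is typically declared along these lines; the field
type name and parameter values below are illustrative assumptions, not
necessarily the exact definition in use.)

<fieldType name="geom_rpt" class="solr.RptWithGeometrySpatialField"
           spatialContextFactory="JTS"
           geo="true"
           distErrPct="0.15"
           maxDistErr="0.001"
           distanceUnits="kilometers"/>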



I am currently experimenting with tuning the values of distErrPct and maxDistErr.

From my experiments, increasing distErrPct from the default 0.15 to 0.5 greatly 
increases indexing speed (5 times faster). However, increasing only maxDistErr 
from 0.001 to 1 has no impact at all.

I have a few questions:


·I do not clearly understand from the documentation how distErrPct and 
maxDistErr relate to each other, or what explains that changing maxDistErr has 
no impact. Could you further explain the relationship between those two values?

·From the documentation, with RptWithGeometrySpatialField my queries will 
always be fully precise whatever the value of distErrPct. Is that correct? If 
so, why not always set the value to 0.5, which seems to improve indexing 
performance while I am not measuring any major impact on query speed?

·Lastly, the documentation says that RptWithGeometrySpatialField is 
particularly optimized for INTERSECTS. Most of my queries currently test the 
intersection of a box with the indexed polygons, where the box is transformed 
into a coordinate range query such as myfield:[49.9,57.0 TO 72.8,160.1]. 
Should I modify my code to issue an explicit INTERSECTS for better 
performance, or is my range query automatically converted to an INTERSECTS? 
(A sketch of the explicit form is just below.)
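
As a sketch of that explicit form (assuming the range corners above are 
lat,lon pairs, i.e. minX=57.0, maxX=160.1, maxY=72.8, minY=49.9, and that 
ENVELOPE takes minX, maxX, maxY, minY):

fq={!field f=myfield}Intersects(ENVELOPE(57.0, 160.1, 72.8, 49.9))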

Thank you in advance!

Benoît




Re: 7.3.1: Query of death - all nodes ran out of memory and had to be shut down

2018-08-21 Thread Ere Maijala

Hi,

Just my short comment here. It's difficult to say for someone else's setup, 
but in an application running on a similar SolrCloud we identified deep 
paging as the definite reason for running out of memory, or at least for 
grinding to a semi-halt because of long stop-the-world garbage collection 
pauses. You can often get away without issues as long as you only have a 
single shard, but for the reason Erick mentioned, deep paging in a sharded 
index is a heavy operation.


Regards,
Ere

Ash Ramesh wrote on 21.8.2018 at 8.09:

Hi Erick,

Sorry, I phrased that the wrong way. I meant to ask whether there is a high
probability that this was the correlated cause of the issue. Do you know why
Solr itself isn't able to recover, or is that to be expected when allowing
such deep pagination? We are going to be removing it going forward, but we
want to make sure that we find the root cause.

Appreciate your help as always :)

Ash

On Tue, Aug 21, 2018 at 2:59 PM Erick Erickson 
wrote:


Did the large offsets _definitely_ cause the OOM? How do you expect
that to be answerable? It's likely, though. To return rows 1,000,000
through 1,000,010 the system has to keep a list of the top 1,000,010
documents. It has to be this way because you don't know (and can't
guess) the score of a doc prior to, well, scoring it. And these very
large structures are kept for every query being processed. Not only
will that chew up memory, it'll chew up CPU cycles as well, since this
is an ordered list.

This is an anti-pattern; cursors were invented because this pattern is
very costly (as you're finding out).
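
As a rough sketch of cursor-based paging (the collection name is taken from
the logs below; the filter and sort fields are placeholders, and the
uniqueKey is assumed to be "id" since the sort must end with the uniqueKey
as a tie-breaker):

# first request: open the cursor with cursorMark=* and no start offset
http://localhost:8983/solr/media/select?q=*:*&fq=brandId:12345&sort=createdAt+desc,id+asc&rows=10&cursorMark=*

# next request: pass back the nextCursorMark value returned by the previous
# response as cursorMark, instead of increasing start
http://localhost:8983/solr/media/select?q=*:*&fq=brandId:12345&sort=createdAt+desc,id+asc&rows=10&cursorMark=<nextCursorMark from previous response>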

Further, 4G isn't very much memory by modern standards.

So it's very likely (but not guaranteed) that using cursors will fix
this problem.

Best,
Erick



On Mon, Aug 20, 2018 at 8:55 PM, Ash Ramesh  wrote:

Hi everyone,

We ran into an issue yesterday where all our EC2 machines running Solr ran
out of memory and could not heal themselves. I'll try to break down what
happened here.

*System Architecture:*

- Solr Version: 7.3.1
- Replica Types: TLOG/PULL
- Num Shards: 8 (default hashing mechanism)
- Doc Count: > 20m
- Index Size: 17G
- EC2 Machine Spec: 16 Core | 32G ram | 100G SSD
- Num EC2 Machines: 7+ (scales up and down)
- Max Shards per node (one node per EC2 instance): 8 (some nodes had 4,
some had 8)
- Num TLOG shard replicas: 3 (3 copies of each shard as TLOG)
- Num PULL shard replicas: 3+
- Heap: 4G

*What was run prior to the issue:*

We ran these queries around 2.55pm

We ran a bunch of deep paginated queries (offset of 1,000,000) with a filter
query; a rough sketch of what such a request looks like is below. We set the
timeout to 5 seconds and it did time out. We aren't sure if this is what
caused the irrecoverable failure, but after reading
https://lucene.apache.org/solr/guide/7_4/pagination-of-results.html#performance-problems-with-deep-paging
we feel that this was the cause.

We did not use a cursor.
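
For illustration, a deep-offset request of this kind looks roughly like the
following (the filter and sort fields are placeholders, not the exact query
we ran; timeAllowed is in milliseconds and reflects the 5 second timeout,
assuming it was applied via that parameter):

http://localhost:8983/solr/media/select?q=*:*&fq=brandId:12345&sort=createdAt+desc&start=1000000&rows=10&timeAllowed=5000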

This cluster was healthy for about 1 week, but we noticed the degradation
soon after (within 30 min of) running the offset queries mentioned above. We
currently use a single sharded collection in production, however we are
transitioning to an 8 shard cluster. We hit this issue in a controlled 8
sharded environment, but don't notice any issues on our production (single
sharded) cluster. On production the query still timed out (with the same
number of docs etc.) but didn't go into a crazy state.

*What Happened:*

- All the EC2 instances started logging OOM error. None of the nodes were
responsive to new requests.
- We saw that the Heap usage jumped from an average of 2.7G to the max of
4G within a 5 minute window.
- CPU across all 16 cores was at 100%
- We saw that the distributed requests were timing out across all machines.

- We shutdown all the machines that only had PULL replicas on them and it
still didn't 'fix' itself.
- Eventually we shut down SOLR on the main node which had all the master
TLOG replicas. Once restarted, the machine started working again.


*Questions:*
- Did this deep pagination query *DEFINITELY* cause this issue?
- Is each node single threaded? I don't think so, but I'd like to confirm
that.
- Is there any configuration that we could use to avoid this in the future?
- Why could the nodes not recover by themselves? When we ran the same query
on the single shard cluster it failed and didn't spin out of control.

Thanks for all your help. Logs are pasted below from different timestamps.


Regards,
Ash

*Logs:*

Here are some logs we collected. Not sure if they tell a lot outside of what
we know.

*Time: 2.55pm ~ Requests are failing to complete in time*


ERROR RequestHandlerBase org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: Time allowed to handle this request exceeded:[http://10.0.9.204:8983/solr/media_shard1_replica_p57, http://10.0.9.204:8983/solr/media_shard4_replica_p80, http://10.0.9.204:8983/solr/media_shard3_replica_p73, http://10.0.9.204:8983/solr/media_shard2_replica_p68]
#011at org.apache.solr.handler.component.