Re: Does ConcurrentUpdateSolrClient apply for SolrCloud ?

2018-10-24 Thread shamik
Thanks Erick, appreciate your help



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr cluster tuning

2018-10-24 Thread Erick Erickson
To add to Daniel's comments: Are you indexing at the same time? Say
your autocommit time is 10 seconds. For the sake of argument let's say
it takes 15 queries to warm your searcher. Let's further say that the
average time for those 15 queries is 500ms each and once the searcher
is warmed the average time drops to 100ms. If you fire many queries in
that 10 seconds, you'll have an average close to 100ms.

OTOH, if you only fire 15 queries over that 10 seconds, the average
would be 500ms.
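
To make that arithmetic concrete, here is a small sketch using the hypothetical numbers above (15 warm-up queries at 500ms, 100ms afterwards). The class and method names are made up for illustration, and the model assumes every query after the first 15 hits a fully warmed searcher, which is a simplification.

```java
final class WarmupAverage {
    // Mean latency (ms) for n queries against a freshly opened searcher, assuming the
    // first `warmQueries` cost `coldMs` each and every later query costs `warmMs`.
    static double meanLatencyMs(int n, int warmQueries, double coldMs, double warmMs) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        int cold = Math.min(n, warmQueries); // queries that pay the warm-up cost
        int warm = n - cold;                 // queries served by the warmed searcher
        return (cold * coldMs + warm * warmMs) / n;
    }

    public static void main(String[] args) {
        // Queries fired during one 10-second commit interval.
        for (int n : new int[] {15, 150, 1500}) {
            System.out.printf("%d queries per interval -> %.1f ms mean%n",
                    n, meanLatencyMs(n, 15, 500.0, 100.0));
        }
    }
}
```

With these assumed figures the mean is 500ms at 15 queries per interval, 140ms at 150, and 104ms at 1500, so the same cluster looks faster the harder it is driven.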

My guess is your autowarm counts for filterCache and queryResultCache
are the default 0, and if you set them to, say, 20 each, much of your
problem would disappear. Ditto if you stopped indexing. Both point to
the searchers having to pull data into memory from disk and/or rebuild
caches.

Best,
Erick
On Wed, Oct 24, 2018 at 1:37 PM Davis, Daniel (NIH/NLM) [C]
 wrote:
>
> Usually, slow responses are due to I/O waits getting the data off of the disk.
> So, to me, this seems more likely because as you bombard the server with
> queries, you cause more and more of the data needed to answer the query to be
> pulled into memory.
>
> To verify this, I'd bombard your server with queries to warm it up, and then 
> repeat your test with the queries coming in slowly or quickly.
>
> If it still holds up, then either something other than Solr is going on with
> that server and taking memory from Solr, or your index is somewhat too big
> for your server.  Linux likes to overcommit memory - try setting vm
> swappiness to something low, like 10, rather than the default 60.   Look for 
> anything on the server with Solr that may be competing with it for I/O 
> resources, and causing its pages to swap out.
>
> Also, look at the size of your index data.
>
> This is general advice for dealing with inverted indexes - some of the Solr
> engineers on this list may have some very specific ideas, such as merging
> activity or other background tasks running when the query load is lighter.
> I wouldn't know how to check for these things, but would think they wouldn't
> affect query response time that badly.
>
> -Original Message-
> From: Vidhya Kailash 
> Sent: Wednesday, October 24, 2018 4:22 PM
> To: solr-user@lucene.apache.org
> Subject: Solr cluster tuning
>
> We are currently using Solr Cloud Version 7.4 with the SolrJ API to fetch data
> from collections. We recently deployed our code to production and noticed
> that response time is higher when the number of incoming requests is lower.
>
> But strangely, if we bombard the system with more and more requests we get
> much better response time.
>
> My suspicion is that the client closes connections sooner when requests
> arrive slowly and later when they arrive quickly.
>
> We tried tuning by passing custom HTTPClient to SolrJ and also by updating 
> HttpShardHandlerFactory settings. For example we made - maxThreadIdleTime = 
> 6 socketTimeOut = 18
>
> Wondering what other tuning we can do to make this perform the same 
> irrespective of the number of requests.
>
> Thanks!
>
> Vidhya


Re: Does ConcurrentUpdateSolrClient apply for SolrCloud ?

2018-10-24 Thread Erick Erickson
No best practices as such, "whatever works" about covers it. That's
not a huge query rate, especially if you have replicas per shard so I
wouldn't worry too much about it. If you rack 100 clients all driving
Solr as hard as possible and people complain that query responses are
bad you'll know where to look first.

About batching, see:
https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/

YMMV of course. If I were going to give you a starting point for
batching it would be on the order of at least 100 per shard. So a 5
shard collection would have at least 500 Solr documents per call to
cloudSolrClient.add(doclist).
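
As a rough illustration of that starting point (a sketch only, not SolrJ code): the helper below just slices a list into batches of 100 docs per shard. `BatchSketch`, `batches`, and the integers standing in for documents are made up for the example; in a real indexer each batch would be a list of Solr documents handed to cloudSolrClient.add(...).

```java
import java.util.ArrayList;
import java.util.List;

final class BatchSketch {
    // Splits docs into consecutive batches of at most numShards * docsPerShard items.
    static <T> List<List<T>> batches(List<T> docs, int numShards, int docsPerShard) {
        if (numShards <= 0 || docsPerShard <= 0) {
            throw new IllegalArgumentException("numShards and docsPerShard must be positive");
        }
        int batchSize = numShards * docsPerShard;
        List<List<T>> out = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            out.add(new ArrayList<>(docs.subList(start, end)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 1234; i++) {
            docs.add(i); // stand-ins for real documents
        }
        // 5 shards * 100 docs per shard = 500 docs per call.
        for (List<Integer> batch : batches(docs, 5, 100)) {
            // A real indexer would call cloudSolrClient.add(batch) here.
            System.out.println("would send " + batch.size() + " docs");
        }
    }
}
```

The 100-per-shard figure is only the starting point suggested above; the right batch size depends on document size and hardware.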

Best,
Erick
On Wed, Oct 24, 2018 at 2:20 PM shamik  wrote:
>
> Thanks Erick, that's extremely insightful. I'm not using batching and that's
> the reason I was exploring ConcurrentUpdateSolrClient. Currently, N threads
> are reusing the same CloudSolrClient to send data to Solr. Of course, the
> single point of failure was my biggest concern with
> ConcurrentUpdateSolrClient, thanks for clarifying my doubt.
>
> "You also want to be a little careful how hard you drive Solr if you're also
> serving queries at the same time, the more cycles you use for indexing the
> fewer are available to serve queries."
>
> Our Solr servers are also used to serve queries (50-100/minute). Our hard
> commit is set at 10 minutes while soft commit is disabled. Are there any best
> practices (I know it's too generic, but specifically around indexing) that I
> should follow?


Re: Solr and Java G1

2018-10-24 Thread Erick Erickson
Java 9 hasn't been proven at this point, so I'd be reluctant to
recommend it under any circumstances. Java 11 is probably going to be
the next recommended version, but there are some outstanding issues.

Lots of clients I know are using G1 at this point.

Best,
Erick
On Wed, Oct 24, 2018 at 3:05 PM Walter Underwood  wrote:
>
> We’ve been running 1.8.0_131 with G1 in prod for well over a year, on 30-50 
> hosts.
>
> We actually got a SIGSEGV in a JVM two days ago, but that has been the only 
> error. AWS had scheduled a reboot for that same host by tomorrow, so it might
> have been a hardware issue.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Oct 24, 2018, at 2:59 PM, Brian Lininger  
> > wrote:
> >
> > Hey Solr Users,
> > I'm curious what the state of Solr/Lucene is with the G1 garbage
> > collector, specifically with Solr 6.6.5 & 7.5.0 and JDK8/9?  I know that
> > there have been issues previously that caused Lucene's test suites to
> > randomly fail with G1, but all of the known issues are resolved as far as I
> > can tell. Reading thru email archive it seems that plenty of people have
> > been using G1 in production for a year or two without issue, has anyone
> > tried G1 recently and run into any issues?
> > Thanks,
> > Brian Lininger
>


Re: Solr and Java G1

2018-10-24 Thread Walter Underwood
We’ve been running 1.8.0_131 with G1 in prod for well over a year, on 30-50 
hosts.

We actually got a SIGSEGV in a JVM two days ago, but that has been the only 
error. AWS had scheduled a reboot for that same host by tomorrow, so it might
have been a hardware issue.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 24, 2018, at 2:59 PM, Brian Lininger  wrote:
> 
> Hey Solr Users,
> I'm curious what the state of Solr/Lucene is with the G1 garbage
> collector, specifically with Solr 6.6.5 & 7.5.0 and JDK8/9?  I know that
> there have been issues previously that caused Lucene's test suites to
> randomly fail with G1, but all of the known issues are resolved as far as I
> can tell. Reading thru email archive it seems that plenty of people have
> been using G1 in production for a year or two without issue, has anyone
> tried G1 recently and run into any issues?
> Thanks,
> Brian Lininger



Solr and Java G1

2018-10-24 Thread Brian Lininger
Hey Solr Users,
I'm curious what the state of Solr/Lucene is with the G1 garbage
collector, specifically with Solr 6.6.5 & 7.5.0 and JDK8/9?  I know that
there have been issues previously that caused Lucene's test suites to
randomly fail with G1, but all of the known issues are resolved as far as I
can tell. Reading thru email archive it seems that plenty of people have
been using G1 in production for a year or two without issue, has anyone
tried G1 recently and run into any issues?
Thanks,
Brian Lininger


Re: Does ConcurrentUpdateSolrClient apply for SolrCloud ?

2018-10-24 Thread shamik
Thanks Erick, that's extremely insightful. I'm not using batching and that's
the reason I was exploring ConcurrentUpdateSolrClient. Currently, N threads
are reusing the same CloudSolrClient to send data to Solr. Of course, the
single point of failure was my biggest concern with
ConcurrentUpdateSolrClient, thanks for clarifying my doubt.

"You also want to be a little careful how hard you drive Solr if you're also
serving queries at the same time, the more cycles you use for indexing the
fewer are available to serve queries."

Our Solr servers are also used to serve queries (50-100/minute). Our hard
commit is set at 10 minutes while soft commit is disabled. Are there any best
practices (I know it's too generic, but specifically around indexing) that I
should follow?







RE: Solr cluster tuning

2018-10-24 Thread Davis, Daniel (NIH/NLM) [C]
Usually, slow responses are due to I/O waits getting the data off of the disk.   So,
to me, this seems more likely because as you bombard the server with queries,
you cause more and more of the data needed to answer the query to be pulled into
memory.

To verify this, I'd bombard your server with queries to warm it up, and then 
repeat your test with the queries coming in slowly or quickly.

If it still holds up, then either something other than Solr is going on with
that server and taking memory from Solr, or your index is somewhat too big for
your server.  Linux likes to overcommit memory - try setting vm swappiness to
something low, like 10, rather than the default 60.   Look for anything on the 
server with Solr that may be competing with it for I/O resources, and causing 
its pages to swap out.

Also, look at the size of your index data.

This is general advice for dealing with inverted indexes - some of the Solr
engineers on this list may have some very specific ideas, such as merging
activity or other background tasks running when the query load is lighter.   I
wouldn't know how to check for these things, but would think they wouldn't
affect query response time that badly.

-Original Message-
From: Vidhya Kailash  
Sent: Wednesday, October 24, 2018 4:22 PM
To: solr-user@lucene.apache.org
Subject: Solr cluster tuning

We are currently using Solr Cloud Version 7.4 with the SolrJ API to fetch data from
collections. We recently deployed our code to production and noticed that
response time is higher when the number of incoming requests is lower.

But strangely, if we bombard the system with more and more requests we get much
better response time.

My suspicion is that the client closes connections sooner when requests
arrive slowly and later when they arrive quickly.

We tried tuning by passing custom HTTPClient to SolrJ and also by updating 
HttpShardHandlerFactory settings. For example we made - maxThreadIdleTime = 
6 socketTimeOut = 18

Wondering what other tuning we can do to make this perform the same 
irrespective of the number of requests.

Thanks!

Vidhya


Solr cluster tuning

2018-10-24 Thread Vidhya Kailash
We are currently using Solr Cloud Version 7.4 with the SolrJ API to fetch data
from collections. We recently deployed our code to production and noticed
that response time is higher when the number of incoming requests is lower.

But strangely, if we bombard the system with more and more requests we get
much better response time.

My suspicion is that the client closes connections sooner when requests
arrive slowly and later when they arrive quickly.

We tried tuning by passing custom HTTPClient to SolrJ and also by updating
HttpShardHandlerFactory settings. For example we made -
maxThreadIdleTime = 6
socketTimeOut = 18

Wondering what other tuning we can do to make this perform the same
irrespective of the number of requests.

Thanks!

Vidhya


Re: Does ConcurrentUpdateSolrClient apply for SolrCloud ?

2018-10-24 Thread Erick Erickson
I wouldn't use ConcurrentUpdateSolrClient for the following reasons:

1> If a doc that needs to go to shard2 is received by a replica on
shard1, it must be forwarded to the leader of shard2, introducing an
extra hop. CloudSolrClient subdivides the batch and sends the docs to
the leader of the right shard automatically. You are batching, right?
You should.

2> CloudSolrClient does the above in parallel _already_.

3> You put the load for routing docs entirely on the single Solr node
you specify in the url.

4> You introduce a single point of failure (i.e. the node you specify
in the url).

5> If your indexing throughput is not what you need, you can string
together N SolrJ clients. Or you can create N threads in your indexing
client and still get the advantages of CloudSolrClient routing docs
correctly.

You also want to be a little careful how hard you drive Solr if you're
also serving queries at the same time, the more cycles you use for
indexing the fewer are available to serve queries.
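
A minimal sketch of the "N threads in your indexing client" idea from point 5, under stated assumptions: the `sender` callback below stands in for a call such as cloudSolrClient.add(batch) on one shared client, and the class and method names are invented for this example. The thread count is the knob for how hard the indexer drives Solr, so it is worth keeping modest if the same nodes also serve queries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class ParallelIndexSketch {
    // Sends every batch through the shared sender from a fixed pool of worker threads
    // and returns the number of batches sent. Fails fast if any batch fails.
    static <T> int sendAll(List<List<T>> batches, int threads, Consumer<List<T>> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> batch : batches) {
                pending.add(pool.submit(() -> sender.accept(batch)));
            }
            for (Future<?> f : pending) {
                try {
                    f.get(); // wait for completion and surface any failure
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while indexing", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("a batch failed", e.getCause());
                }
            }
            return pending.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        AtomicInteger docsSent = new AtomicInteger();
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c"), Arrays.asList("d", "e", "f"));
        // The lambda is a placeholder for sending the batch with the shared client.
        int sent = sendAll(batches, 4, batch -> docsSent.addAndGet(batch.size()));
        System.out.println(sent + " batches, " + docsSent.get() + " docs");
    }
}
```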

Best,
Erick


On Wed, Oct 24, 2018 at 1:01 PM Shamik Bandopadhyay  wrote:
>
> Hi,
>
>I'm looking into the possibility of using ConcurrentUpdateSolrClient for
> indexing a large volume of data instead of CloudSolrClient. Having an
> async, batch API seems to be a better fit for us where we tend to index a
> lot of data periodically. As I'm looking into the API, I'm wondering if
> this can be used for SolrCloud.
>
> ConcurrentUpdateSolrClient client = new
> ConcurrentUpdateSolrClient.Builder(url).withThreadCount(100).withQueueSize(50).build();
>
> The Builder object only takes a single url, not sure what that would be in
> case of SolrCloud. For example, if I have two shards with a couple of replicas,
> then what will be the server url?
>
> I was not able to find any relevant document or example to clarify my
> doubt. Any pointers will be appreciated.
>
> Thanks


Does ConcurrentUpdateSolrClient apply for SolrCloud ?

2018-10-24 Thread Shamik Bandopadhyay
Hi,

   I'm looking into the possibility of using ConcurrentUpdateSolrClient for
indexing a large volume of data instead of CloudSolrClient. Having an
async, batch API seems to be a better fit for us where we tend to index a
lot of data periodically. As I'm looking into the API, I'm wondering if
this can be used for SolrCloud.

ConcurrentUpdateSolrClient client = new
ConcurrentUpdateSolrClient.Builder(url).withThreadCount(100).withQueueSize(50).build();

The Builder object only takes a single url, not sure what that would be in
case of SolrCloud. For example, if I have two shards with a couple of replicas,
then what will be the server url?

I was not able to find any relevant document or example to clarify my
doubt. Any pointers will be appreciated.

Thanks


Re: Slow Response for less volume

2018-10-24 Thread Deepak Goel
Are you getting errors in Jmeter?

On Wed, 24 Oct 2018, 21:49 Amjad Khan,  wrote:

> Hi,
>
> We recently moved to Solr Cloud (Google) with 4 nodes and have a very
> limited amount of data.
>
> We are facing a very weird issue here: the Solr cluster's query response time
> is high when we send it few requests, and the moment we run our test to
> hit the Solr cluster hard we see better response times, around 10ms.
>
> Any clue will be appreciated.
>
> Thanks


Re: Setting up MiniSolrCloudCluster to use pre-built index

2018-10-24 Thread Mark Miller
The merge can be really fast - it can just dump in the new segments and
rewrite the segments file basically.

I guess for what you want, that's perhaps not the ideal route though. You could
maybe try and use collection aliases.

I thought about adding shard aliases way back, but never got to it.

On Tue, Oct 23, 2018 at 7:10 PM Ken Krugler 
wrote:

> Hi Mark,
>
> I’ll have a completely new, rebuilt index that’s (a) large, and (b)
> already sharded appropriately.
>
> In that case, using the merge API isn’t great, in that it would take
> significant time and temporarily use double (or more) disk space.
>
> E.g. I’ve got an index with 250M+ records, and about 200GB. There are
> other indexes, still big but not quite as large as this one.
>
> So I’m still wondering if there’s any robust way to swap in a fresh set of
> shards, especially without relying on legacy cloud mode.
>
> I think I can figure out where the data is being stored for an existing
> (empty) collection, shut that down, swap in the new files, and reload.
>
> But I’m wondering if that’s really the best (or even sane) approach.
>
> Thanks,
>
> — Ken
>
> On May 19, 2018, at 6:24 PM, Mark Miller  wrote:
>
> You create MiniSolrCloudCluster with a base directory and then each Jetty
> instance created gets a SolrHome in a subfolder called node{i}. So if
> legacyCloud=true you can just preconfigure a core and index under the right
> node{i} subfolder. legacyCloud=true should not even exist anymore though,
> so the long term way to do this would be to create a collection and then
> use the merge API or something to merge your index into the empty
> collection.
>
> - Mark
>
> On Sat, May 19, 2018 at 5:25 PM Ken Krugler 
> wrote:
>
> Hi all,
>
> Wondering if anyone has experience (this is with Solr 6.6) in setting up
> MiniSolrCloudCluster for unit testing, where we want to use an existing
> index.
>
> Note that this index wasn’t built with SolrCloud, as it’s generated by a
> distributed (Hadoop) workflow.
>
> So there’s no “restore from backup” option, or swapping collection
> aliases, etc.
>
> We can push our configset to Zookeeper and create the collection as per
> other unit tests in Solr, but what’s the right way to set up data dirs for
> the cores such that Solr is running with this existing index (or indexes,
> for our sharded test case)?
>
> Thanks!
>
> — Ken
>
> PS - yes, we’re aware of the routing issue with generating our own shards….
>
> --
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> Custom big data solutions & training
> Flink, Solr, Hadoop, Cascading & Cassandra
>
> --
>
> - Mark
> about.me/markrmiller
>
>
> --
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> Custom big data solutions & training
> Flink, Solr, Hadoop, Cascading & Cassandra
>
>

-- 
- Mark

http://about.me/markrmiller


Re: Slow Response for less volume

2018-10-24 Thread Walter Underwood
If your cache is 2048 entries, then every one of those 1600 queries is in cache.

Our logs typically have about a million lines, with distinct queries 
distributed according to the Zipf law. Some common queries, a long tail, that 
sort of thing.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 24, 2018, at 10:02 AM, Amjad Khan  wrote:
> 
> Thanks Wunder for this prompt response.
> 
> We are testing with 1600 different search texts in JMeter, and the test keeps
> running continuously. Running continuously means the cache has been built, so
> there should be a better response by now, shouldn't there?
> 
> Thanks
> 
> 
> 
>> On Oct 24, 2018, at 12:20 PM, Walter Underwood  wrote:
>> 
>> Are you testing with a small number of queries? If your cache is larger than 
>> the number of queries in your benchmark, the first round will load the 
>> cache, then everything will be super fast.
>> 
>> Load testing a system with caches is hard.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Oct 24, 2018, at 9:19 AM, Amjad Khan  wrote:
>>> 
>>> Hi,
>>> 
>>> We recently moved to Solr Cloud (Google) with 4 nodes and have a very limited
>>> amount of data.
>>>
>>> We are facing a very weird issue here: the Solr cluster's query response time
>>> is high when we send it few requests, and the moment we run our test to
>>> hit the Solr cluster hard we see better response times, around 10ms.
>>> 
>>> Any clue will be appreciated.
>>> 
>>> Thanks
>> 
> 



Re: Slow Response for less volume

2018-10-24 Thread Walter Underwood
But a zero size cache doesn’t give realistic benchmarks. It makes things slower 
than they will be in production.

We do this:

1. Collect production logs.
2. Split the logs into a warming log and a benchmark log. The warming log
should be at least as large as the query result cache.
3. Run the warming log with four threads (unlikely to overload the system).
4. Run the benchmark with a controlled requests/minute and enough threads to 
keep up with that. Might be a few hundred with a large, slow cluster. Run for 
at least an hour.
5. Analyze the results into percentile response times for each request handler. 
Warn about any errors or a benchmark that takes too long.

Then repeat. Oh, yeah, load the prod content first.
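
Steps 2 and 5 of the procedure above can be sketched roughly as follows. This is an illustration under assumptions, not the actual tooling described: the class and method names are invented, log lines are plain strings, and the percentile uses the simple nearest-rank definition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class BenchmarkSketch {
    // Step 2: the first warmSize lines become the warming log, the rest the benchmark log.
    static <T> List<List<T>> split(List<T> logLines, int warmSize) {
        int cut = Math.min(Math.max(warmSize, 0), logLines.size());
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(logLines.subList(0, cut)));
        parts.add(new ArrayList<>(logLines.subList(cut, logLines.size())));
        return parts;
    }

    // Step 5: nearest-rank percentile of response times, for p in (0, 100].
    static long percentile(List<Long> latenciesMs, double p) {
        if (latenciesMs.isEmpty() || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need data and 0 < p <= 100");
        }
        List<Long> sorted = new ArrayList<>(latenciesMs);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.size() / 100.0); // 1-based rank
        return sorted.get(rank - 1);
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("q1", "q2", "q3", "q4", "q5");
        List<List<String>> parts = split(log, 2);
        System.out.println("warming=" + parts.get(0) + " benchmark=" + parts.get(1));

        List<Long> latencies = Arrays.asList(50L, 10L, 40L, 20L, 30L);
        System.out.println("median=" + percentile(latencies, 50)
                + "ms p95=" + percentile(latencies, 95) + "ms");
    }
}
```

Reporting percentiles per request handler, as step 5 says, would mean grouping the latencies by handler before calling `percentile`.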

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 24, 2018, at 9:52 AM, Erick Erickson  wrote:
> 
> You can set your queryResultCache and filterCache "size" parameter to
> zero in solrconfig.xml to disable those caches.
> On Wed, Oct 24, 2018 at 9:21 AM Walter Underwood  
> wrote:
>> 
>> Are you testing with a small number of queries? If your cache is larger than 
>> the number of queries in your benchmark, the first round will load the 
>> cache, then everything will be super fast.
>> 
>> Load testing a system with caches is hard.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Oct 24, 2018, at 9:19 AM, Amjad Khan  wrote:
>>> 
>>> Hi,
>>> 
>>> We recently moved to Solr Cloud (Google) with 4 nodes and have a very limited
>>> amount of data.
>>>
>>> We are facing a very weird issue here: the Solr cluster's query response time
>>> is high when we send it few requests, and the moment we run our test to
>>> hit the Solr cluster hard we see better response times, around 10ms.
>>> 
>>> Any clue will be appreciated.
>>> 
>>> Thanks
>> 



Re: Slow Response for less volume

2018-10-24 Thread Amjad Khan
Thanks Erick,

But do you think that disabling the cache will increase the response time
instead of solving the problem here?


> On Oct 24, 2018, at 12:52 PM, Erick Erickson  wrote:
> 
> queryResultCache



Re: Slow Response for less volume

2018-10-24 Thread Amjad Khan
Thanks Wunder for this prompt response.

We are testing with 1600 different search texts in JMeter, and the test keeps
running continuously. Running continuously means the cache has been built, so
there should be a better response by now, shouldn't there?

Thanks



> On Oct 24, 2018, at 12:20 PM, Walter Underwood  wrote:
> 
> Are you testing with a small number of queries? If your cache is larger than 
> the number of queries in your benchmark, the first round will load the cache, 
> then everything will be super fast.
> 
> Load testing a system with caches is hard.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Oct 24, 2018, at 9:19 AM, Amjad Khan  wrote:
>> 
>> Hi,
>> 
>> We recently moved to Solr Cloud (Google) with 4 nodes and have a very limited
>> amount of data.
>>
>> We are facing a very weird issue here: the Solr cluster's query response time is
>> high when we send it few requests, and the moment we run our test to hit
>> the Solr cluster hard we see better response times, around 10ms.
>> 
>> Any clue will be appreciated.
>> 
>> Thanks
> 



Re: Slow Response for less volume

2018-10-24 Thread Erick Erickson
You can set your queryResultCache and filterCache "size" parameter to
zero in solrconfig.xml to disable those caches.
On Wed, Oct 24, 2018 at 9:21 AM Walter Underwood  wrote:
>
> Are you testing with a small number of queries? If your cache is larger than 
> the number of queries in your benchmark, the first round will load the cache, 
> then everything will be super fast.
>
> Load testing a system with caches is hard.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Oct 24, 2018, at 9:19 AM, Amjad Khan  wrote:
> >
> > Hi,
> >
> > We recently moved to Solr Cloud (Google) with 4 nodes and have a very limited
> > amount of data.
> >
> > We are facing a very weird issue here: the Solr cluster's query response time
> > is high when we send it few requests, and the moment we run our test to
> > hit the Solr cluster hard we see better response times, around 10ms.
> >
> > Any clue will be appreciated.
> >
> > Thanks
>


Re: Slow Response for less volume

2018-10-24 Thread Walter Underwood
Are you testing with a small number of queries? If your cache is larger than 
the number of queries in your benchmark, the first round will load the cache, 
then everything will be super fast.

Load testing a system with caches is hard.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 24, 2018, at 9:19 AM, Amjad Khan  wrote:
> 
> Hi,
> 
> We recently moved to Solr Cloud (Google) with 4 nodes and have a very limited
> amount of data.
>
> We are facing a very weird issue here: the Solr cluster's query response time is
> high when we send it few requests, and the moment we run our test to hit
> the Solr cluster hard we see better response times, around 10ms.
> 
> Any clue will be appreciated.
> 
> Thanks



Slow Response for less volume

2018-10-24 Thread Amjad Khan
Hi,

We recently moved to Solr Cloud (Google) with 4 nodes and have a very limited
amount of data.

We are facing a very weird issue here: the Solr cluster's query response time is
high when we send it few requests, and the moment we run our test to hit the
Solr cluster hard we see better response times, around 10ms.

Any clue will be appreciated.

Thanks

Re: TLOG replica stucks

2018-10-24 Thread Erick Erickson
bq. I've noticed that some replicas stop receiving updates from the
leader without any visible signs from the cluster status.

Hmm, yes, this isn't expected at all. What are you seeing that causes
you to say this? You'd have to be monitoring the log for update
messages to the replicas that aren't leaders or the like. If anyone is
going to have a prayer of reproducing we'll need more info on exactly
what you're seeing and how you're measuring this.

Have you changed any configurations in your replicas at all? We'd need
the exact steps you performed if so.

On a quick test I didn't see this, but if it were that easy to
reproduce I'd expect it to have shown up before.

NOTE: just looking at the cloud graph and having a node be active is
not _necessarily_ sufficient for the node to be up to date. It
_should_ be sufficient if (and only if) the node was shut down
gracefully, but a "kill -9" or similar doesn't give the replicas on
the node the opportunity to change the state. The "live_nodes" znode
in ZooKeeper must also contain the node the replica resides on.

If you see this state again, you could try pinging the node directly.
Does it respond? Your URL should look something like:
http://host:port/solr/collection_shard1_replica_t1/query?q=*:*&distrib=false

The "distrib=false" is important as it won't forward the query to any
other replica. If what you're reporting is really happening, that node
should respond with a document count different from other nodes.
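
One possible way to script that comparison is sketched below. It only builds the per-replica URL and compares counts that the caller has already fetched; the class and method names, the core names, and the counts in `main` are all made up, and `rows=0` is added on the assumption that only the document count is of interest.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReplicaCheckSketch {
    // URL that asks one core for its own document count without fanning out.
    static String localCountUrl(String hostPort, String coreName) {
        return "http://" + hostPort + "/solr/" + coreName
                + "/query?q=*:*&rows=0&distrib=false";
    }

    // Cores whose numFound differs from the leader's, given counts fetched by the caller.
    static List<String> differingFromLeader(Map<String, Long> numFoundByCore, String leaderCore) {
        Long leaderCount = numFoundByCore.get(leaderCore);
        if (leaderCount == null) {
            throw new IllegalArgumentException("no count for leader " + leaderCore);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : numFoundByCore.entrySet()) {
            if (!e.getValue().equals(leaderCount)) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(localCountUrl("host:8983", "collection_shard1_replica_t1"));

        Map<String, Long> counts = new LinkedHashMap<>(); // made-up numbers
        counts.put("collection_shard1_replica_t1", 100L);
        counts.put("collection_shard1_replica_t2", 100L);
        counts.put("collection_shard1_replica_t3", 97L);
        System.out.println(differingFromLeader(counts, "collection_shard1_replica_t1"));
    }
}
```

A difference seen while indexing is still running may just be normal lag, so a check like this is most meaningful after indexing has paused for a while.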

NOTE: there's a delay between the time the leader indexes a doc and
the time it's visible on the follower. Are you sure you're waiting for
leader_commit_interval+polling_interval+autowarm_time before
concluding that there's a problem? I'm a bit suspicious that checking
the versions is concluding that your indexes are out of sync when
really they're just catching up normally. If it's at all possible to
turn off indexing for a few minutes when this happens and everything
just gets better then it's not really a problem.

If we prove out that this is really happening as you think, then a
JIRA (with steps to reproduce) is _definitely_ in order.

Best,
Erick
On Wed, Oct 24, 2018 at 2:07 AM Vadim Ivanov
 wrote:
>
> Hi All !
>
> I'm testing Solr 7.5 with TLOG replicas on SolrCloud with 5 nodes.
>
> My collection has shards and every shard has 3 TLOG replicas on different
> nodes.
>
> I've noticed that some replicas stop receiving updates from the leader
> without any visible signs from the cluster status.
>
> (All replicas are active and green in the Admin UI Cloud graph.) But the indexversion
> of the 'ill' replica is not increasing with the leader's.
>
> It seems to be dangerous, because that 'ill' replica could become a leader
> after restart of the nodes and I already experienced data loss.
>
> I didn't notice any meaningful records in the Solr log, except that the
> problem probably occurs when the leader changes.
>
> Meanwhile, I monitor the indexversion of all replicas in the cluster via mbeans and
> recreate ill replicas when the difference from the leader's indexversion is more
> than one.
>
> Any suggestions?
>
> --
>
> Best regards, Vadim
>
>
>


Re: partial update in solr

2018-10-24 Thread Alexandre Rafalovitch
You could use something like AtomicUpdateProcessorFactory:
https://lucene.apache.org/solr/guide/7_5/update-request-processors.html#atomicupdateprocessorfactory

Regards,
   Alex.
On Wed, 24 Oct 2018 at 04:48, Zahra Aminolroaya  wrote:
>
> Does Solr have a partial update like elastic?
>
> Elastic will automatically merge a new document with the existing one having
> the same id. For example, if the new document has a value for a field that
> was previously null, it will add the value for that field.
>
>
> However, based on what I found, a partial update in Solr can be applied
> only by explicitly specifying the updated field, something like below:
>
> curl 'localhost:8983/solr/update?commit=true' -H
> 'Content-type:application/json' -d '[{"id":"1","price":{"set":100}}]'
>
> I do not want to specify the updated field for Solr with something like "set". I want
> Solr to automatically merge documents with the same id instead of deleting the
> previous document and inserting the new document.
>
> Can Solr do that?


Re: Solr filter query on STRING field [Was:Re: solr filter query on text field]

2018-10-24 Thread Alexandre Rafalovitch
The first one treats the space as the end of the clause, so the second
keyword is searched against the default field (id). Try putting the whole
value in quotes. Or use the Field Query Parser:
https://lucene.apache.org/solr/guide/7_5/other-parsers.html#field-query-parser
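
For illustration, the two suggestions could be built in client code roughly like this. It is a sketch with made-up helper names; the resulting strings are the raw fq values and would still need URL encoding if sent over HTTP by hand.

```java
final class FilterQuerySketch {
    // field:"value" with backslashes and embedded double quotes escaped,
    // so the whole value stays a single quoted clause.
    static String quotedFq(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    // Field Query Parser form: everything after the closing brace is the field value.
    static String fieldParserFq(String field, String value) {
        return "{!field f=" + field + "}" + value;
    }

    public static void main(String[] args) {
        System.out.println(quotedFq("sm_field_tags", "Key case"));
        System.out.println(fieldParserFq("sm_field_tags", "Key case"));
    }
}
```

For the case in this thread the two helpers produce sm_field_tags:"Key case" and {!field f=sm_field_tags}Key case respectively.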

Regards,
   Alex.

On Wed, Oct 24, 2018, 4:59 AM Marek Tichy,  wrote:

> Hi,
>
> I'm having troubles with the filter query on a multiple string field,
> specifically with a space between words. Looking at the histogram and
> values using Solr UI it correctly shows that the indexing stores the
> string "Key case" as it should. However the following filter queries:
>
> fq=sm_field_tags:Key case  //doesn't work
> fq=sm_field_tags:Key+case  //doesn't work
> fq=sm_field_tags:Key* //does work
> fq=sm_field_tags:Key?case //does work
>
>
> Debug shows (for the first case):
> "filter_queries":["sm_field_tags:Key case"],
> "parsed_filter_queries":["sm_field_tags:Key id:case"]
>
> Why does it parse to id: case ? Solr version is 7.4.0
>
> Many thanks
> Marek
>
>
>
>
>
>
>
>
>
> > bq.  is there any difference if the fq field is a string field vs text
> >
> > Absolutely. string fields are not analyzed in any way. They're not
> > tokenized. They are case sensitive. Etc. For example take
> > My dog.
> > as input. A string field will have a _single_ token "My dog.". It will
> > not match a search on "my". It will not match a search on "dog". It
> > won't even match "my dog." as a phrase since the case is different. It
> > won't even match "My dog" because there's no period at the end. It
> > will only match "My dog.".
>
>


Re: Join across shards?

2018-10-24 Thread e_briere
Thanks to you both. I have not taken the time to look into it yet, but I will.
Eric.


-Original Message-
From: Erick Erickson
Date: 2018-10-24 00:57 (GMT-05:00)
To: solr-user
Subject: Re: Join across shards?

In addition to Vadim's comment, Solr Streaming _can_
work across shards and even across collections.
Depending on your use-case this may work for you.

Best,
Erick
On Tue, Oct 23, 2018 at 6:41 AM Vadim Ivanov
 wrote:
>
> Hi,
> You CAN join across collections with runtime "join".
> The only limitation is that FROM collection should not be sharded and joined
> data should reside on one node.
> Solr cannot join across nodes (distributed search is not supported).
> Though using streaming expressions it's possible to do various things...
> --
> Vadim
>
> -Original Message-
> From: e_bri...@videotron.ca [mailto:e_bri...@videotron.ca]
> Sent: Tuesday, October 23, 2018 2:38 PM
> To: solr-user@lucene.apache.org
> Subject: Join across shards?
>
> Hi all,
>
> Sorry if the question was already covered.
>
> We are using joins across documents with the limitation of having the
> documents to be joined sitting on the same shard. Is there a way around this
> limitation and even join across collections? Are there plans to support this
> out of the box?
>
> Thanks!
>
> Eric Briere.
>
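
The runtime join Vadim mentions in the quoted reply can be sketched as a
query string using the join query parser with fromIndex. The collection and
field names below (reviews, product_id, rating) are made up for illustration,
and the limitation he describes still applies: the "from" collection must be
unsharded and present on the node doing the join.

```python
def cross_collection_join(from_field, to_field, from_index, from_query):
    """Return a {!join} query: select local documents whose to_field matches
    the from_field of documents matching from_query in from_index."""
    return "{!join from=%s to=%s fromIndex=%s}%s" % (
        from_field, to_field, from_index, from_query)


# Hypothetical example: products that have at least one 5-star review.
q = cross_collection_join("product_id", "id", "reviews", "rating:5")
print(q)
```
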


TLOG replica stucks

2018-10-24 Thread Vadim Ivanov
Hi All !

I'm testing Solr 7.5 with TLOG replicas on SolrCloud with 5 nodes.

My collection has shards and every shard has 3 TLOG replicas on different
nodes.

I've noticed that some replicas stop receiving updates from the leader
without any visible sign in the cluster status (all replicas are active and
green in the Admin UI Cloud graph). However, the indexversion of the 'ill'
replica does not increase along with the leader's.

This seems dangerous, because the 'ill' replica could become the leader
after a restart of the nodes, and I have already experienced data loss.

I didn't notice any meaningful records in the Solr log, except that the
problem probably occurs when the leader changes.

Meanwhile, I monitor the indexversion of all replicas in the cluster via
mbeans and recreate 'ill' replicas when the difference from the leader's
indexversion is more than one.
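
The comparison step of that monitoring could look roughly like the sketch
below. It is only an illustration: the replica names and version numbers are
invented, the function name is hypothetical, and how the values are fetched
from Solr (mbeans or otherwise) is deliberately left out.

```python
def stale_replicas(versions, leader, tolerance=1):
    """Return the names of replicas whose version trails the leader's by
    more than `tolerance`, sorted for stable output."""
    leader_version = versions[leader]
    return sorted(
        name for name, version in versions.items()
        if name != leader and leader_version - version > tolerance
    )


# Hypothetical values for one shard; real ones would come from Solr.
shard1 = {"replica_n1": 1042, "replica_n2": 1042, "replica_n3": 1017}
print(stale_replicas(shard1, leader="replica_n1"))
```
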

Any suggestions?

-- 

Best regards, Vadim

 



Solr filter query on STRING field [Was:Re: solr filter query on text field]

2018-10-24 Thread Marek Tichy
Hi,

I'm having trouble with a filter query on a multivalued string field,
specifically with a space between words. Looking at the histogram and
values in the Solr UI, it correctly shows that indexing stores the
string "Key case" as it should. However, the following filter queries:

fq=sm_field_tags:Key case   //doesn't work
fq=sm_field_tags:Key+case   //doesn't work
fq=sm_field_tags:Key*       //does work
fq=sm_field_tags:Key?case   //does work


Debug shows (for the first case):
"filter_queries":["sm_field_tags:Key case"],
"parsed_filter_queries":["sm_field_tags:Key id:case"]

Why does it parse to id:case? The Solr version is 7.4.0.
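
One likely explanation, offered as an assumption rather than a confirmed
diagnosis: the default query parser splits the unquoted input on whitespace,
so only "Key" is bound to sm_field_tags and "case" falls through to the
default field (apparently id here). A "+" in a URL decodes to a space, which
would make the second form equivalent to the first. Quoting the value, or
handing it to the term query parser, keeps the space inside a single term.
The helper names in this Python sketch are illustrative only.

```python
from urllib.parse import urlencode


def quoted_fq(field, value):
    """field:"value" with backslashes and double quotes escaped."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '%s:"%s"' % (field, escaped)


def term_fq(field, value):
    """{!term f=field}value : the value is taken verbatim as one term."""
    return "{!term f=%s}%s" % (field, value)


for fq in (quoted_fq("sm_field_tags", "Key case"),
           term_fq("sm_field_tags", "Key case")):
    # urlencode takes care of URL-escaping spaces, braces and quotes.
    print(fq, "->", urlencode({"fq": fq}))
```
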

Many thanks
Marek









> bq.  is there any difference if the fq field is a string field vs text
>
> Absolutely. string fields are not analyzed in any way. They're not
> tokenized. They are case sensitive. Etc. For example take
> My dog.
> as input. A string field will have a _single_ token "My dog.". It will
> not match a search on "my". It will not match a search on "dog". It
> won't even match "my dog." as a phrase since the case is different. It
> won't even match "My dog" because there's no period at the end. It
> will only match "My dog.".
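
The behaviour described in the quoted explanation can be mimicked with a toy
model in Python. This is not Solr's analysis chain: the "text" variant is a
crude stand-in (lowercase, strip punctuation, split on whitespace) used only
to contrast it with the verbatim comparison a string field performs.

```python
def string_field_matches(indexed, query):
    """String field: the whole value is one token, compared verbatim."""
    return indexed == query


def text_field_matches(indexed, query):
    """Crude stand-in for an analyzed text field: every query token must
    appear among the lowercased, punctuation-free tokens of the value."""
    def tokens(s):
        return ["".join(c for c in w if c.isalnum()) for w in s.lower().split()]
    indexed_tokens = tokens(indexed)
    return all(t in indexed_tokens for t in tokens(query))


for q in ("my", "dog", "my dog.", "My dog", "My dog."):
    print(repr(q),
          string_field_matches("My dog.", q),
          text_field_matches("My dog.", q))
```
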



partial update in solr

2018-10-24 Thread Zahra Aminolroaya
Does Solr have a partial update like Elasticsearch?

Elasticsearch automatically merges a new document with the existing one that
has the same id. For example, if the new document has a value for a field
that was previously null, it adds the value for that field.


However, based on what I found, a partial update in Solr can be applied
only by explicitly marking the updated field, something like below:

curl 'localhost:8983/solr/update?commit=true' -H
'Content-type:application/json' -d '[{"id":"1","price":{"set":100}}]'

I do not want to mark the updated field for Solr with something like "set". I
want Solr to automatically merge documents with the same id instead of
deleting the previous document and inserting the new one.

Can Solr do that?
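
With the atomic update syntax shown above, a field is only updated in place
when it carries a modifier such as "set"; a document sent without modifiers
replaces the stored one. One possible client-side workaround, sketched in
Python and not a built-in Solr feature, is to add the modifiers just before
sending. The function name and the sample document are hypothetical.

```python
import json


def to_atomic_update(doc, id_field="id"):
    """Wrap every non-id field of a partial document in a {"set": value}
    modifier so that Solr updates only those fields."""
    return {
        key: value if key == id_field else {"set": value}
        for key, value in doc.items()
    }


partial = {"id": "1", "price": 100}
payload = json.dumps([to_atomic_update(partial)])
print(payload)  # body for a POST to the update handler
```
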





Re: Casting from schemaless to classic schema

2018-10-24 Thread Zahra Aminolroaya
Thanks Alexandre and Shawn. 





Re: Slow import from MsSQL and down cluster during process

2018-10-24 Thread Deepak Goel
Please check whether a deadlock is happening by taking thread dumps


Deepak


On Wed, Oct 24, 2018 at 11:12 AM Daniel Carrasco wrote:

> Thanks for everything, I'll try it later ;)
>
> Greetings!!.
>
> On Wed, Oct 24, 2018 at 7:13, Walter Underwood wrote:
>
> > We handle request rates at a few thousand requests/minute with an 8 GB
> > heap. 95th percentile response time is 200 ms. Median (cached) is 4 ms.
> >
> > An oversized heap will hurt your query performance because everything
> > stops for the huge GC.
> >
> > RAM is still a thousand times faster than SSD, so you want a lot of RAM
> > available for file system buffers managed by the OS.
> >
> > I recommend trying an 8 GB heap with the latest version of Java 8 and the
> > G1 collector.
> >
> > We have this in our solr.in.sh:
> >
> > SOLR_HEAP=8g
> > # Use G1 GC  -- wunder 2017-01-23
> > # Settings from https://wiki.apache.org/solr/ShawnHeisey
> > GC_TUNE=" \
> > -XX:+UseG1GC \
> > -XX:+ParallelRefProcEnabled \
> > -XX:G1HeapRegionSize=8m \
> > -XX:MaxGCPauseMillis=200 \
> > -XX:+UseLargePages \
> > -XX:+AggressiveOpts \
> > "
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> > > On Oct 23, 2018, at 9:51 PM, Daniel Carrasco wrote:
> > >
> > > Hello,
> > >
> > > > I've set that heap size because Solr receives a lot of queries every
> > > > second and I want to cache as much as possible. Also, I'm not sure about
> > > > the number of documents in the collection, but the website has a lot of
> > > > products.
> > > >
> > > > Saying the index data is stored in RAM was just a figure of speech. The
> > > > data is stored on SSD disks with XFS (faster than EXT4).
> > > >
> > > > I'll take a look at the links tomorrow at work.
> > >
> > > Thanks!!
> > > Greetings!!
> > >
> > >
> > > > On Tue, Oct 23, 2018, 23:48, Shawn Heisey wrote:
> > >
> > >> On 10/23/2018 7:15 AM, Daniel Carrasco wrote:
> > >>> Hello,
> > >>>
> > >>> Thanks for your response.
> > >>>
> > > >>> We've already thought about that and doubled the instances. Right now,
> > > >>> for every Solr instance we have 60GB of RAM (40GB configured for Solr)
> > > >>> and a 16-core CPU. The entire data set can be stored in RAM without
> > > >>> filling it (talking about raw data, of course, not processed data).
> > >>
> > > >> Why are you making the heap so large?  I've set up servers that can
> > > >> handle hundreds of millions of Solr documents in a much smaller heap.  A
> > > >> 40GB heap would be something you might do if you're handling billions of
> > > >> documents on one server.
> > > >>
> > > >> When you say the entire data can be stored in RAM ... are you counting
> > > >> that 40GB you gave to Solr?  Because you can't count that -- that's for
> > > >> Solr, NOT the index data.
> > >>
> > >> The heap size should never be dictated by the amount of memory in the
> > >> server.  It should be made as large as it needs to be for the job, and
> > >> no larger.
> > >>
> > >> https://wiki.apache.org/solr/SolrPerformanceProblems#RAM
> > >>
> > > >>> About the usage, I've checked the RAM and CPU usage and they are not
> > > >>> fully used.
> > > >>
> > > >> What exactly are you looking at?  I've had people swear that they can't
> > > >> see a problem with their systems when Solr is REALLY struggling to keep
> > > >> up with what it has been asked to do.
> > >>
> > >> Further down on the page I linked above is a section about asking for
> > >> help.  If you can provide the screenshot it mentions there, that would
> > >> be helpful.  Here's a direct link to that section:
> > >>
> > >>
> > >>
> >
> https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue
> > >>
> > >> Thanks,
> > >> Shawn
> > >>
> > >>
> >
> >
>
> --
> _
>
>   Daniel Carrasco Marín
>   Ingeniería para la Innovación i2TIC, S.L.
>   Tlf:  +34 911 12 32 84 Ext: 223
>   www.i2tic.com
> _
>
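
The sizing argument in the quoted messages can be made concrete with a little
arithmetic. The sketch below uses the figures from the thread (60GB of RAM,
a 40GB heap versus the suggested 8GB heap); the function name and the
other_gb parameter are illustrative, and JVM overhead beyond the heap is
ignored, so treat the results as rough upper bounds.

```python
def page_cache_gb(total_ram_gb, heap_gb, other_gb=0):
    """RAM left for the OS file system cache after the JVM heap and any
    other processes (other_gb is an illustrative placeholder)."""
    return total_ram_gb - heap_gb - other_gb


for heap in (40, 8):
    print("heap %2d GB -> about %d GB left for OS file caching"
          % (heap, page_cache_gb(60, heap)))
```
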