Re: SolrCloud performance

2018-11-02 Thread Deepak Goel
Please see inline for my thoughts


Deepak
"The greatness of a nation can be judged by the way its animals are
treated. Please consider stopping the cruelty by becoming a Vegan"

+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

Make In India : http://www.makeinindia.com/home


On Sat, Nov 3, 2018 at 1:08 AM Chuming Chen  wrote:

> Hi All,
>
> I am running SolrCloud 7.4 with 4 shards and 4 nodes (JVM "-Xms20g
> -Xmx40g"); each shard has 32 million documents and is 32 GB in size.
>
> For a given query (I use the complexphrase query parser), the first time
> it typically took a couple of seconds to return the first 20 docs. However,
> fetching the following page, sorting by a field, or even running the same
> query again took a lot longer to return results. I can see my 4 Solr nodes
> running at more than 100% CPU.
>
> I think the first time the query is answered by Lucene (whose results are
already sorted due to the inverted index format). The second time around, the
query is satisfied by Solr, which takes longer.


> My understanding is that Solr has a query cache, so running the same query
> should be faster.
>
> What could be wrong here? How do I debug this? I checked solr.log on all
> nodes and didn't see anything unusual. The most frequent log entry looks like this.
>
> INFO  - 2018-11-02 19:32:55.189; [   ]
> org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null
> path=/admin/metrics
> params={wt=javabin&version=2&key=solr.core.patternmatch.shard3.replica_n8:UPDATE./update.requests&key=solr.core.patternmatch.shard3.replica_n8:INDEX.sizeInBytes&key=solr.core.patternmatch.shard1.replica_n1:QUERY./select.requests&key=solr.core.patternmatch.shard1.replica_n1:INDEX.sizeInBytes&key=solr.core.patternmatch.shard1.replica_n1:UPDATE./update.requests&key=solr.core.patternmatch.shard3.replica_n8:QUERY./select.requests}
> status=0 QTime=7
> INFO  - 2018-11-02 19:32:55.192; [   ]
> org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null
> path=/admin/metrics
> params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used}
> status=0 QTime=1
>
> Thank you for your kind help.
>
> Chuming
>
>
>
>


Re: Index optimization takes too long

2018-11-02 Thread Shawn Heisey

On 11/2/2018 5:00 PM, Wei wrote:

After a recent schema change, it takes almost 40 minutes to optimize the
index.  The schema change enables docValues for all sort/facet fields,
which increased the index size from 12G to 14G. Before the change it took
only 5 minutes to do the optimization.


An optimize is not just a straight data copy.  Lucene is actually 
completely recalculating the index data structures.  It will never 
proceed at the full data rate your disks are capable of achieving.


I do not know exactly how docValues are handled during a segment merge, but 
given how closely that info relates to the inverted index, it's probably 
even more complicated than the rest of the data structures in a Lucene 
index.


On one of the systems I used to manage, back in March of 2017, I was 
seeing a 50GB index take 1.73 hours to optimize.  I do not recall 
whether I had docValues at that point, but I probably did.


http://lucene.472066.n3.nabble.com/What-is-the-bottleneck-for-an-optimise-operation-tt4323039.html#a4323140

There's not much you can do to make this go faster. Putting massively 
faster CPUs in the machine MIGHT make a difference, but it probably 
wouldn't be a BIG difference.  I'm talking about clock speed, not core 
count.


Thanks,
Shawn



Index optimization takes too long

2018-11-02 Thread Wei
Hello,

After a recent schema change, it takes almost 40 minutes to optimize the
index.  The schema change enables docValues for all sort/facet fields,
which increased the index size from 12G to 14G. Before the change it took
only 5 minutes to do the optimization.

I have tried to increase maxMergeAtOnceExplicit because the default 30
could be too low:

<int name="maxMergeAtOnceExplicit">100</int>

But it doesn't seem to help. Any suggestions?

Thanks,
Wei


Re: SolrCloud performance

2018-11-02 Thread Shawn Heisey

On 11/2/2018 1:38 PM, Chuming Chen wrote:

I am running SolrCloud 7.4 with 4 shards and 4 nodes (JVM "-Xms20g 
-Xmx40g"); each shard has 32 million documents and is 32 GB in size.


A 40GB heap is probably completely unnecessary for an index of that 
size.  Does each machine have one replica on it or two? If you are 
trying for high availability, then it will be at least two shard 
replicas per machine.


The values of -Xms and -Xmx should normally be set the same.  Java will 
always tend to allocate the entire max heap it has been allowed, so it's 
usually better to just let it have the whole amount right up front.
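As a concrete sketch, matching min and max heap is usually done in solr.in.sh; the 20g here is illustrative (taken from the -Xms above), not a sizing recommendation:

```shell
# Hypothetical solr.in.sh fragment: SOLR_JAVA_MEM feeds both flags to the JVM,
# so the min and max heap stay equal and the full heap is claimed at startup.
SOLR_JAVA_MEM="-Xms20g -Xmx20g"
echo "$SOLR_JAVA_MEM"
```

Recent Solr versions also accept SOLR_HEAP="20g" in solr.in.sh, which sets both values at once.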



For a given query (I use the complexphrase query parser), the first time it 
typically took a couple of seconds to return the first 20 docs. However, 
fetching the following page, sorting by a field, or even running the same 
query again took a lot longer to return results. I can see my 4 Solr nodes 
running at more than 100% CPU.


Can you obtain a screenshot of a process listing as described at the 
following URL, and provide the image using a file sharing site?


https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue

There are separate instructions there for Windows and for Linux/UNIX 
operating systems.


Also useful are the GC logs that are written by Java when Solr is 
started using the included scripts.  I'm looking for logfiles that cover 
several days of runtime.  You'll need to share them with a file sharing 
website -- files will not normally make it to the mailing list if 
attached to a message.


Getting a copy of the solrconfig.xml in use on your collection can also 
be helpful.



My understanding is that Solr has a query cache, so running the same query should be faster.


If the query is absolutely identical in *every* way, then yes, it can be 
satisfied from Solr's caches, if their size is sufficient.  If you change 
ANYTHING, including parameters like rows, start, filters, sorting, or 
facets, then the query probably cannot be satisfied completely from cache.  
At that point, Solr is very reliant on how much memory has NOT been 
allocated to programs: there must be enough free memory left for the 
operating system to cache the Solr index data effectively.
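To make the "change ANYTHING" point concrete, here is a toy model (not Solr's actual cache code) of why paging to the next page misses the cache: the full parameter set acts as the cache key, so a different start value yields a different key. The query text and parameters below are made up for illustration.

```shell
# Toy model: hash the full request parameter string as a stand-in for a cache key.
page1='q=title:"solr cloud"&sort=score desc&start=0&rows=20'
page2='q=title:"solr cloud"&sort=score desc&start=20&rows=20'
key1=$(printf '%s' "$page1" | md5sum | cut -d' ' -f1)
key2=$(printf '%s' "$page2" | md5sum | cut -d' ' -f1)
if [ "$key1" = "$key2" ]; then
  echo "cache hit"
else
  echo "cache miss: falls through to the index, relying on OS page cache"
fi
```

(In reality Solr's queryResultCache stores a window of document IDs per query/sort/filter combination, so small pages within queryResultWindowSize can still be served from it; the point stands that changed parameters generally bypass the cache.)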



What could be wrong here? How do I debug this? I checked solr.log on all nodes 
and didn't see anything unusual. The most frequent log entry looks like this.

INFO  - 2018-11-02 19:32:55.189; [   ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null 
path=/admin/metrics 
params={wt=javabin&version=2&key=solr.core.patternmatch.shard3.replica_n8:UPDATE./update.requests&key=solr.core.patternmatch.shard3.replica_n8:INDEX.sizeInBytes&key=solr.core.patternmatch.shard1.replica_n1:QUERY./select.requests&key=solr.core.patternmatch.shard1.replica_n1:INDEX.sizeInBytes&key=solr.core.patternmatch.shard1.replica_n1:UPDATE./update.requests&key=solr.core.patternmatch.shard3.replica_n8:QUERY./select.requests}
 status=0 QTime=7
INFO  - 2018-11-02 19:32:55.192; [   ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null 
path=/admin/metrics 
params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used}
 status=0 QTime=1


That is not a query.  It is a call to the Metrics API.  When I've made 
this call on a production Solr machine, it has been very resource-intensive 
and slow.  I don't think it should be made frequently; probably no more 
than once a minute.  If you are seeing that kind of entry in your logs a 
lot, then that might be contributing to your performance issues.
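If those frequent /admin/metrics calls come from your own monitoring, throttling them to the once-a-minute rate suggested above might look like this hypothetical crontab entry (host, port, and metric key are placeholders):

```shell
# Poll the Metrics API once a minute, asking only for the keys actually
# needed rather than the full metrics tree.
* * * * * curl -s 'http://localhost:8983/solr/admin/metrics?wt=json&key=solr.jvm:memory.heap.used' > /tmp/solr-heap.json
```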


Thanks,
Shawn



SolrCloud performance

2018-11-02 Thread Chuming Chen
Hi All,

I am running SolrCloud 7.4 with 4 shards and 4 nodes (JVM "-Xms20g 
-Xmx40g"); each shard has 32 million documents and is 32 GB in size.

For a given query (I use the complexphrase query parser), the first time it 
typically took a couple of seconds to return the first 20 docs. However, 
fetching the following page, sorting by a field, or even running the same 
query again took a lot longer to return results. I can see my 4 Solr nodes 
running at more than 100% CPU.

My understanding is that Solr has a query cache, so running the same query should be faster.

What could be wrong here? How do I debug this? I checked solr.log on all nodes 
and didn't see anything unusual. The most frequent log entry looks like this.

INFO  - 2018-11-02 19:32:55.189; [   ] org.apache.solr.servlet.HttpSolrCall; 
[admin] webapp=null path=/admin/metrics 
params={wt=javabin&version=2&key=solr.core.patternmatch.shard3.replica_n8:UPDATE./update.requests&key=solr.core.patternmatch.shard3.replica_n8:INDEX.sizeInBytes&key=solr.core.patternmatch.shard1.replica_n1:QUERY./select.requests&key=solr.core.patternmatch.shard1.replica_n1:INDEX.sizeInBytes&key=solr.core.patternmatch.shard1.replica_n1:UPDATE./update.requests&key=solr.core.patternmatch.shard3.replica_n8:QUERY./select.requests}
 status=0 QTime=7
INFO  - 2018-11-02 19:32:55.192; [   ] org.apache.solr.servlet.HttpSolrCall; 
[admin] webapp=null path=/admin/metrics 
params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used}
 status=0 QTime=1

Thank you for your kind help.

Chuming





Re: solr cloud - hdfs folder structure best practice

2018-11-02 Thread lstusr 5u93n4
Great, thanks for the response. This is how we have it configured now, but
we just had the idea the other day that maybe it would be better
otherwise...

And thanks for the blog post! We ended up with basically the same config,
so it's good to see that validated.

Kyle



On Fri, 2 Nov 2018 at 13:42, Kevin Risden  wrote:

> I prefer a single HDFS home since it definitely simplifies things. No need
> to create folders for each node or anything like that if you add nodes to
> the cluster. The replicas underneath will get their own folders. I don't
> know if there are issues with autoAddReplicas or other types of failovers
> if there are different home folders.
>
> I've run Solr on HDFS with the same basic configs as listed here:
>
> https://risdenk.github.io/2018/10/23/apache-solr-running-on-apache-hadoop-hdfs.html
>
> Kevin Risden
>
>
> On Fri, Nov 2, 2018 at 1:19 PM lstusr 5u93n4  wrote:
>
> > Hi All,
> >
> > Here's a question that I can't find an answer to in the documentation:
> >
> > When configuring solr cloud with HDFS, is it best to:
> >   a) provide a unique hdfs folder for each solr cloud instance
> > or
> >   b) provide the same hdfs folder to all solr cloud instances.
> >
> > So for example, if I have two solr cloud nodes, I can configure them
> either
> > with:
> >
> >node1: -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr/node1
> >node2: -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr/node2
> >
> > Or I could configure both nodes with:
> >
> > -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr
> >
> > In the second option, all solr cloud nodes can "see" all index files from
> > all other solr cloud nodes. Are there pros or cons to allowing the all of
> > the solr nodes to see all files in the collection?
> >
> > Thanks,
> >
> > Kyle
> >
>


Re: solr cloud - hdfs folder structure best practice

2018-11-02 Thread Kevin Risden
I prefer a single HDFS home since it definitely simplifies things. No need
to create folders for each node or anything like that if you add nodes to
the cluster. The replicas underneath will get their own folders. I don't
know if there are issues with autoAddReplicas or other types of failovers
if there are different home folders.

I've run Solr on HDFS with the same basic configs as listed here:
https://risdenk.github.io/2018/10/23/apache-solr-running-on-apache-hadoop-hdfs.html
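A sketch of the resulting layout under a single shared home (the collection and core names below are hypothetical, and the exact per-replica directory name comes from the core name Solr assigns):

```shell
# With one shared solr.hdfs.home, each replica writes under its own
# core-named subdirectory, so nodes never collide on index files.
HDFS_HOME="hdfs://my.hfds:9000/solr"
for core in mycollection_shard1_replica_n1 mycollection_shard1_replica_n2; do
  echo "${HDFS_HOME}/${core}/data/index"
done
```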

Kevin Risden


On Fri, Nov 2, 2018 at 1:19 PM lstusr 5u93n4  wrote:

> Hi All,
>
> Here's a question that I can't find an answer to in the documentation:
>
> When configuring solr cloud with HDFS, is it best to:
>   a) provide a unique hdfs folder for each solr cloud instance
> or
>   b) provide the same hdfs folder to all solr cloud instances.
>
> So for example, if I have two solr cloud nodes, I can configure them either
> with:
>
>node1: -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr/node1
>node2: -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr/node2
>
> Or I could configure both nodes with:
>
> -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr
>
> In the second option, all solr cloud nodes can "see" all index files from
> all other solr cloud nodes. Are there pros or cons to allowing the all of
> the solr nodes to see all files in the collection?
>
> Thanks,
>
> Kyle
>


solr cloud - hdfs folder structure best practice

2018-11-02 Thread lstusr 5u93n4
Hi All,

Here's a question that I can't find an answer to in the documentation:

When configuring solr cloud with HDFS, is it best to:
  a) provide a unique hdfs folder for each solr cloud instance
or
  b) provide the same hdfs folder to all solr cloud instances.

So for example, if I have two solr cloud nodes, I can configure them either
with:

   node1: -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr/node1
   node2: -Dsolr.hdfs.home=hdfs://my.hfds:9000/solr/node2

Or I could configure both nodes with:

-Dsolr.hdfs.home=hdfs://my.hfds:9000/solr

In the second option, all solr cloud nodes can "see" all index files from
all other solr cloud nodes. Are there pros or cons to allowing the all of
the solr nodes to see all files in the collection?

Thanks,

Kyle


Re: Solr OCR Support

2018-11-02 Thread Tim Allison
+1 Thank you, Daniel.  If you have any interest in helping out on
TIKA-2749, please join the fun. :D
On Fri, Nov 2, 2018 at 12:12 PM Davis, Daniel (NIH/NLM) [C]
 wrote:
>
> I think that you also have to process a PDF pretty deeply to decide if you 
> want it to be OCRed.  I have worked on projects where all of the PDFs are 
> really like faxes - images are encoded in JBIG2 black and white or similar, 
> and there is really one image per page, and no text.  I have also worked on 
> projects where it really is unstructured data, but if a PDF has one image per 
> page and no text, it should be OCRed.
>
> I've had problems, not with Tesseract, but even with Nuance OCR OEM 
> libraries, where text was missed because one image was the top half of the 
> letters and the image on the next line was the bottom half.  I don't mean to 
> ding Nuance (or Tesseract); I just wish to point out that deciding what to 
> OCR is important, because OCR works well when it has good input.
>
> > -Original Message-
> > From: Tim Allison 
> > Sent: Friday, November 2, 2018 11:03 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Solr OCR Support
> >
> > OCR'ing of PDFs is fiddly at the moment because of Tika, not Solr!  We
> > have an open ticket to make it "just work", but we aren't there yet
> > (TIKA-2749).
> >
> > You have to tell Tika how you want to process images from PDFs via the
> > tika-config.xml file.
> >
> > You've seen this link in the links you mentioned:
> > https://wiki.apache.org/tika/TikaOCR
> >
> > This one is key for PDFs:
> > https://wiki.apache.org/tika/PDFParser%20%28Apache%20PDFBox%29#OCR
> > On Fri, Nov 2, 2018 at 10:30 AM Furkan KAMACI 
> > wrote:
> > >
> > > Hi All,
> > >
> > > I want to index images and PDF documents which contain images into Solr. I
> > > tested this with Solr 6.3.0.
> > >
> > > I've installed Tesseract on my computer (Mac). I verified that Tesseract
> > > extracts text from an image correctly.
> > >
> > > I indexed an image into Solr, but it has no content. However, as far as I
> > > know, I don't need to do anything else to integrate Tesseract with Solr.
> > >
> > > I've checked these but they were not useful for me:
> > >
> > > http://lucene.472066.n3.nabble.com/TIKA-OCR-not-working-
> > td4201834.html
> > > http://lucene.472066.n3.nabble.com/Fwd-configuring-Solr-with-Tesseract-
> > td4361908.html
> > >
> > > My question is, how can I support OCR with Solr?
> > >
> > > Kind Regards,
> > > Furkan KAMACI


RE: Solr OCR Support

2018-11-02 Thread Davis, Daniel (NIH/NLM) [C]
I think that you also have to process a PDF pretty deeply to decide if you want 
it to be OCRed.  I have worked on projects where all of the PDFs are really like 
faxes - images are encoded in JBIG2 black and white or similar, and there is 
really one image per page, and no text.  I have also worked on projects where 
it really is unstructured data, but if a PDF has one image per page and no 
text, it should be OCRed.

I've had problems, not with Tesseract, but even with Nuance OCR OEM libraries, 
where text was missed because one image was the top half of the letters and the 
image on the next line was the bottom half.  I don't mean to ding Nuance (or 
Tesseract); I just wish to point out that deciding what to OCR is important, 
because OCR works well when it has good input.

> -Original Message-
> From: Tim Allison 
> Sent: Friday, November 2, 2018 11:03 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr OCR Support
> 
> OCR'ing of PDFs is fiddly at the moment because of Tika, not Solr!  We
> have an open ticket to make it "just work", but we aren't there yet
> (TIKA-2749).
> 
> You have to tell Tika how you want to process images from PDFs via the
> tika-config.xml file.
> 
> You've seen this link in the links you mentioned:
> https://wiki.apache.org/tika/TikaOCR
> 
> This one is key for PDFs:
> https://wiki.apache.org/tika/PDFParser%20%28Apache%20PDFBox%29#OCR
> On Fri, Nov 2, 2018 at 10:30 AM Furkan KAMACI 
> wrote:
> >
> > Hi All,
> >
> > I want to index images and PDF documents which contain images into Solr. I
> > tested this with Solr 6.3.0.
> >
> > I've installed Tesseract on my computer (Mac). I verified that Tesseract
> > extracts text from an image correctly.
> >
> > I indexed an image into Solr, but it has no content. However, as far as I know,
> > I don't need to do anything else to integrate Tesseract with Solr.
> >
> > I've checked these but they were not useful for me:
> >
> > http://lucene.472066.n3.nabble.com/TIKA-OCR-not-working-
> td4201834.html
> > http://lucene.472066.n3.nabble.com/Fwd-configuring-Solr-with-Tesseract-
> td4361908.html
> >
> > My question is, how can I support OCR with Solr?
> >
> > Kind Regards,
> > Furkan KAMACI


Re: TLOG replica stucks

2018-11-02 Thread Shawn Heisey

On 11/2/2018 3:12 AM, Vadim Ivanov wrote:

It seems to me that the issue is related to:
- restart solr node
- rebalance leader
- reload collection
- reload core (Core admin is not forbidden but seems obsolete in SolrCloud)


In SolrCloud, CoreAdmin is an expert option.  Many of the things that 
the Collections API does are implemented internally with code that 
includes calls to the CoreAdmin API ... but using CoreAdmin directly is 
strongly discouraged, especially for anything related to manipulating 
replicas or creating indexes.  It is possible to use CoreAdmin for many 
of these things successfully, but it's also very easy to use it 
incorrectly and cause problems that are difficult to fix.  We recommend 
not using it at all, even if you're intimately familiar with the SolrCloud code.


When you reload a collection, all cores (shard replicas) that make up 
the collection are reloaded, even if they are on separate machines.  So 
you do not need to use CoreAdmin to do a reload.  Situations where one 
core in a collection needs a reload but other cores do not are rare.
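For reference, a collection-wide reload is a single Collections API call. The sketch below only builds the URL (host and collection name are placeholders) so you can pass it to curl against a live cluster:

```shell
# RELOAD reloads every core of the collection on every node in one call.
COLLECTION="mycollection"
URL="http://localhost:8983/solr/admin/collections?action=RELOAD&name=${COLLECTION}"
echo "$URL"
```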


None of what I've written above addresses the problem that started the 
thread; it's about your note in parentheses in this message.


I don't know any more than the other people responding do about why your 
replica is getting out of sync.  If you can come up with simple step by 
step instructions for reproducing the problem that begin with "download 
the X.Y.Z binary version of Solr", that will make it much easier to 
diagnose.  Until the issue can be seen first-hand and there's something 
useful in Solr's log, we're guessing about what could be going wrong.  
Once we can reproduce it, the odds of getting you a new version that 
doesn't have the problem go up significantly.


Thanks,
Shawn



Re: Solr OCR Support

2018-11-02 Thread Tim Allison
OCR'ing of PDFs is fiddly at the moment because of Tika, not Solr!  We
have an open ticket to make it "just work", but we aren't there yet
(TIKA-2749).

You have to tell Tika how you want to process images from PDFs via the
tika-config.xml file.

You've seen this link in the links you mentioned:
https://wiki.apache.org/tika/TikaOCR

This one is key for PDFs:
https://wiki.apache.org/tika/PDFParser%20%28Apache%20PDFBox%29#OCR
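As a starting point, a minimal tika-config.xml enabling inline-image extraction for PDFs might look like the fragment below. Treat the parameter set as an assumption to verify against the PDFParser wiki page for your Tika version:

```shell
# Write a hypothetical tika-config.xml that turns on inline image extraction
# for the PDF parser (extracted images then flow to the OCR parser when
# Tesseract is installed and on the PATH).
cat > /tmp/tika-config.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <param name="extractInlineImages" type="bool">true</param>
      </params>
    </parser>
  </parsers>
</properties>
EOF
grep -c 'extractInlineImages' /tmp/tika-config.xml
```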
On Fri, Nov 2, 2018 at 10:30 AM Furkan KAMACI  wrote:
>
> Hi All,
>
> I want to index images and PDF documents which contain images into Solr. I
> tested this with Solr 6.3.0.
>
> I've installed Tesseract on my computer (Mac). I verified that Tesseract
> extracts text from an image correctly.
>
> I indexed an image into Solr, but it has no content. However, as far as I know,
> I don't need to do anything else to integrate Tesseract with Solr.
>
> I've checked these but they were not useful for me:
>
> http://lucene.472066.n3.nabble.com/TIKA-OCR-not-working-td4201834.html
> http://lucene.472066.n3.nabble.com/Fwd-configuring-Solr-with-Tesseract-td4361908.html
>
> My question is, how can I support OCR with Solr?
>
> Kind Regards,
> Furkan KAMACI


Solr OCR Support

2018-11-02 Thread Furkan KAMACI
Hi All,

I want to index images and PDF documents which contain images into Solr. I
tested this with Solr 6.3.0.

I've installed Tesseract on my computer (Mac). I verified that Tesseract
extracts text from an image correctly.

I indexed an image into Solr, but it has no content. However, as far as I know,
I don't need to do anything else to integrate Tesseract with Solr.

I've checked these but they were not useful for me:

http://lucene.472066.n3.nabble.com/TIKA-OCR-not-working-td4201834.html
http://lucene.472066.n3.nabble.com/Fwd-configuring-Solr-with-Tesseract-td4361908.html

My question is, how can I support OCR with Solr?

Kind Regards,
Furkan KAMACI


Re: SolrCloud Replication Failure

2018-11-02 Thread Jeremy Smith
Hi Susheel,

 Yes, it appears that under certain conditions, if a follower is down when 
the leader gets an update, the follower will not receive that update when it 
comes back (or maybe it receives the update and it's then overwritten by its 
own transaction logs; I'm not sure).  Furthermore, if that follower then 
becomes the leader, it will replicate its own out-of-date value back to the 
former leader, even though its version number is lower.
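A toy sketch of the rule that appears to be violated here (the version numbers are made up): an incoming update carrying a lower _version_ than the local copy should be rejected, not applied.

```shell
# Expected behavior, not what the reported bug produces: compare Solr-style
# _version_ values (leader-assigned, monotonically increasing) before applying.
local_version=1616000000000000020
incoming_version=1616000000000000010
if [ "$incoming_version" -gt "$local_version" ]; then
  echo "apply update"
else
  echo "reject stale update"
fi
```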


   -Jeremy


From: Susheel Kumar 
Sent: Thursday, November 1, 2018 2:57:00 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud Replication Failure

Are we saying it has something to do with stopping and restarting replicas?
Otherwise, I haven't seen/heard of any issues with document updates being
forwarded to replicas...

Thanks,
Susheel

On Thu, Nov 1, 2018 at 12:58 PM Erick Erickson 
wrote:

> So this seems like it absolutely needs a JIRA.
> On Thu, Nov 1, 2018 at 9:39 AM Kevin Risden  wrote:
> >
> > I pushed 3 branches that modifies test.sh to test 5.5, 6.6, and 7.5
> locally
> > without docker. I still see the same behavior where the latest updates
> > aren't on the replicas. I still don't know what is happening but it
> happens
> > without Docker :(
> >
> >
> https://github.com/risdenk/test-solr-start-stop-replica-consistency/branches
> >
> > Kevin Risden
> >
> >
> > On Thu, Nov 1, 2018 at 11:41 AM Kevin Risden  wrote:
> >
> > > Erick - Yea thats a fair point. Would be interesting to see if this
> fails
> > > without Docker.
> > >
> > > Kevin Risden
> > >
> > >
> > > On Thu, Nov 1, 2018 at 11:06 AM Erick Erickson <
> erickerick...@gmail.com>
> > > wrote:
> > >
> > >> Kevin:
> > >>
> > >> You're also using Docker, right? Docker is not "officially" supported
> > >> although there's some movement in that direction and if this is only
> > >> reproducible in Docker than it's a clue where to look
> > >>
> > >> Erick
> > >> On Wed, Oct 31, 2018 at 7:24 PM
> > >> Kevin Risden
> > >>  wrote:
> > >> >
> > >> > I haven't dug into why this is happening but it definitely
> reproduces. I
> > >> > removed the local requirements (port mapping and such) from the
> gist you
> > >> > posted (very helpful). I confirmed this fails locally and on Travis
> CI.
> > >> >
> > >> > https://github.com/risdenk/test-solr-start-stop-replica-consistency
> > >> >
> > >> > I don't even see the first update getting applied from num 10 -> 20.
> > >> After
> > >> > the first update there is no more change.
> > >> >
> > >> > Kevin Risden
> > >> >
> > >> >
> > >> > On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith 
> > >> wrote:
> > >> >
> > >> > > Thanks Erick, this is 7.5.0.
> > >> > > 
> > >> > > From: Erick Erickson 
> > >> > > Sent: Wednesday, October 31, 2018 8:20:18 PM
> > >> > > To: solr-user
> > >> > > Subject: Re: SolrCloud Replication Failure
> > >> > >
> > >> > > What version of solr? This code was pretty much rewriten in 7.3
> IIRC
> > >> > >
> > >> > > On Wed, Oct 31, 2018, 10:47 Jeremy Smith  wrote:
> > >> > >
> > >> > > > Hi all,
> > >> > > >
> > >> > > >  We are currently running a moderately large instance of
> > >> standalone
> > >> > > > solr and are preparing to switch to solr cloud to help us scale
> > >> up.  I
> > >> > > have
> > >> > > > been running a number of tests using docker locally and ran
> into an
> > >> issue
> > >> > > > where replication is consistently failing.  I have pared down
> the
> > >> test
> > >> > > case
> > >> > > > as minimally as I could.  Here's a link for the
> docker-compose.yml
> > >> (I put
> > >> > > > it in a directory called solrcloud_simple) and a script to run
> the
> > >> test:
> > >> > > >
> > >> > > >
> > >> > > >
> https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489
> > >> > > >
> > >> > > >
> > >> > > > Here's the basic idea behind the test:
> > >> > > >
> > >> > > >
> > >> > > > 1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard,
> and 2
> > >> > > > replicas (each node gets a replica).  Just use the default
> schema,
> > >> > > although
> > >> > > > I've also tried our schema and got the same result.
> > >> > > >
> > >> > > >
> > >> > > > 2) Shut down solr-2
> > >> > > >
> > >> > > >
> > >> > > > 3) Add 100 simple docs, just id and a field called num.
> > >> > > >
> > >> > > >
> > >> > > > 4) Start solr-2 and check that it received the documents.  It
> did!
> > >> > > >
> > >> > > >
> > >> > > > 5) Update a document, commit, and check that solr-2 received the
> > >> update.
> > >> > > > It did!
> > >> > > >
> > >> > > >
> > >> > > > 6) Stop solr-2, update the same document, start solr-2, and make
> > >> sure
> > >> > > that
> > >> > > > it received the update.  It did!
> > >> > > >
> > >> > > >
> > >> > > > 7) Repeat step 6 with a new value.  This time solr-2 reverts
> back
> > >> to what
> > >> > > > it had in step 5.
> > >> > > >
> > >> > > >
> > >> > > > I believe the main issue comes from this in the logs:
> > >> > > >
> > >> > 

RE: TLOG replica stucks

2018-11-02 Thread Vadim Ivanov
It seems to me that the issue is related to:
- restart solr node
- rebalance leader
- reload collection
- reload core (Core admin is not forbidden but seems obsolete in SolrCloud)
If nothing is changing in cluster state everything goes smoothly.
Maybe it can be reproduced with the same test as in the "SolrCloud Replication 
Failure" branch.
-- Vadim

> -Original Message-
> From: Ere Maijala [mailto:ere.maij...@helsinki.fi]
> Sent: Thursday, November 01, 2018 5:21 PM
> To: solr-user@lucene.apache.org
> Subject: Re: TLOG replica stucks
> 
> Could it be related to reloading a collection? I need to do some
> testing, but it just occurred to me that reload was done at least once
> during the period the cluster had been up.
> 
> Regards,
> Ere
> 
> Ere Maijala kirjoitti 30.10.2018 klo 12.03:
> > Hi,
> >
> > We had the same happen with PULL replicas with Solr 7.5. Solr was
> > showing that they all had correct index version, but the changes were
> > not showing. Unfortunately the solr.log size was too small to catch any
> > issues, so I've now increased and waiting for it to happen again.
> >
> > Regards,
> > Ere
> >
> > Vadim Ivanov kirjoitti 25.10.2018 klo 18.42:
> >> Thanks Erick for you attention!
> >> My comments below, but supposing that the problem resides in zookeeper
> >> I'll collect more information  from zk logs and solr logs and be back
> >> soon.
> >>
> >>> bq. I've noticed that some replicas stop receiving updates from the
> >>> leader without any visible signs from the cluster status.
> >>>
> >>> Hmm, yes, this isn't expected at all. What are you seeing that causes
> >>> you to say this? You'd have to be monitoring the log for update
> >>> messages to the replicas that aren't leaders or the like.  If anyone is
> >>> going to have a prayer of reproducing we'll need more info on exactly
> >>> what you're seeing and how you're measuring this.
> >>
> >> Meanwhile, I have log level WARN... I'l decrease  it to INFO and see. Tnx
> >>
> >>>
> >>> Have you changed any configurations in your replicas at all? We'd need
> >>> the exact steps you performed if so.
> >> Command to create replicas was like this (implicit sharding and custom
> >> CoreName ) :
> >>
> >> mysolr07:8983/solr/admin/collections?action=ADDREPLICA
> >> =rpk94
> >> =rpk94_1_0
> >> =rpk94_1_0_07
> >> =tlog
> >> =mysolr07:8983_solr
> >>
> >>>
> >>> On a quick test I didn't see this, but if it were that easy to
> >>> reproduce I'd expect it to have shown up before.
> >>
> >> Yesterday I've tried to reproduce...  trying to change leader with
> >> REBALANCELEADERS command.
> >> It ended up with no leader at all for the shard  and I could not set
> >> leader at all for a long time.
> >>
> >> There was a problem trying to register as the
> >> leader:org.apache.solr.common.SolrException: Could not register as the
> >> leader because creating the ephemeral registration node in ZooKeeper
> >> failed
> >> ...
> >> Deleting duplicate registration:
> >>
> /collections/rpk94/leader_elect/rpk94_1_117/election/298318118789952308
> 5-core_node73-n_22
> >>
> >> ...
> >>Index fetch failed :org.apache.solr.common.SolrException: No
> >> registered leader was found after waiting for 4000ms , collection:
> >> rpk94 slice: rpk94_1_117
> >> ...
> >>
> >> Even to delete all replicas for the shard and recreate Replica to the
> >> same node with the same name did not help - no leader for that shard.
> >> I had to delete collection, wait till morning and then it recreated
> >> successfully.
> >> Suppose some weird znodes were deleted from  zk by morning.
> >>
> >>>
> >>> NOTE: just looking at the cloud graph and having a node be active is
> >>> not _necessarily_ sufficient for the node to be up to date. It
> >>> _should_ be sufficient if (and only if) the node was shut down
> >>> gracefully, but a "kill -9" or similar doesn't give the replicas on
> >>> the node the opportunity to change the state. The "live_nodes" znode
> >>> in ZooKeeper must also contain the node the replica resides on.
> >>
> >> Node was live, cluster was healthy
> >>
> >>>
> >>> If you see this state again, you could try pinging the node directly,
> >>> does it respond? Your URL should look something like:
> >>>
> http://host:port/solr/colection_shard1_replica_t1/query?q=*:*=false
> >>>
> >>
> >> Yes, sure I did. Ill replica responded and number of documents differs
> >> with the leader
> >>
> >>>
> >>> The "distrib=false" is important as it won't forward the query to any
> >>> other replica. If what you're reporting is really happening, that node
> >>> should respond with a document count different from other nodes.
> >>>
> >>> NOTE: there's a delay between the time the leader indexes a doc and
> >>> it's visible on the follower. Are you sure you're waiting for
> >>> leader_commit_interval+polling_interval+autowarm_time before
> >>> concluding that there's a problem? I'm a bit suspicious that checking
> >>> the versions is concluding that your indexes are