Re: SolrCloud shows cluster still healthy even the node data directory is deleted

2020-11-20 Thread Radar Lei
Hi Erick,

I understand this is how file handles work.

But SolrCloud users do not see the expected replica failover happen, so we
cannot say SolrCloud is fully HA-enabled. Is there a plan to handle HA for
disk failures? Thanks.

Regards,
Radar

From: Amy Bai 
Date: Wednesday, November 11, 2020 at 8:19 PM
To: solr-user@lucene.apache.org 
Subject: Re: SolrCloud shows cluster still healthy even the node data directory 
is deleted
Hi Erick,

Thanks for your kind reply.
There are two things that confuse me:

1. Index/search queries keep failing because one node's data directory is
gone, but the node is not marked as down.

2. The replicas on the failed node are not working, but the index/search
queries did not fail over to other healthy replicas.

Regards,
Amy

From: Erick Erickson 
Sent: Monday, November 9, 2020 8:43 PM
To: solr-user@lucene.apache.org 
Subject: Re: SolrCloud shows cluster still healthy even the node data directory 
is deleted

Depends. *nix systems have delete-on-close semantics, that is, as
long as there’s a single file handle open, the file will still be
available to the process using it. Only when the last file handle is
closed will the file actually be deleted.

Solr (Lucene actually) has a file handle open to every file in the index
all the time.

These files aren’t visible when you do a directory listing. So if you
stop Solr, are the files gone? NOTE: When you start Solr again, if
there are existing replicas that are healthy then the entire index
should be copied from another replica.
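
The delete-on-close behaviour described above can be reproduced outside Solr. This is a minimal Python sketch, assuming a POSIX system (on Windows, removing an open file normally fails). The function name read_after_unlink is made up for the illustration and says nothing about Solr or Lucene internals.

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Write payload to a temp file, delete its name, then read it back
    through the handle that is still open (POSIX semantics assumed)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+b") as handle:
        handle.write(payload)
        handle.flush()
        os.unlink(path)                  # remove the directory entry
        assert not os.path.exists(path)  # a directory listing no longer shows it
        handle.seek(0)
        return handle.read()             # the data is still readable


if __name__ == "__main__":
    print(read_after_unlink(b"segment data"))
```

Once the with block closes the last handle, the operating system can reclaim the space, which matches the observation that the files are only really gone after the process lets go of them.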

Best,
Erick

> On Nov 9, 2020, at 3:30 AM, Amy Bai  wrote:
>
> Hi community,
>
> I found that SolrCloud won't check the I/O status as long as the SolrCloud
> process is alive.
> E.g. if I delete the SolrCloud data directory, no errors are reported,
> and I can still log in to the SolrCloud Admin UI to create/query
> collections.
> Is this reasonable?
> Can someone explain why Solr handles it like this?
> Thanks so much.
>
>
> Regards,
> Amy


Re: How can shards distributed evenly among nodes

2020-02-06 Thread Radar Lei
This is weird. When we create an index, Solr makes sure the shards of the
index are distributed evenly across all the existing nodes. But after you use
'UTILIZENODE' of autoscaling, Solr tries to put all the shards of an index
on one or a few nodes. Is this intentional or a bug?

For example, we have a four-node Solr cluster, and my index 'demo' has 4
shards. By default Solr assigns one shard to each node. But after
we used 'UTILIZENODE' against a new node, all the shards were put on
Node1. This gives one node a heavy workload while the other nodes
have no work to do.

So the problem is that 'UTILIZENODE' only cares whether each node has the
same number of replicas; it won't try to distribute each index's
replicas/shards across as many nodes as possible.
Any thoughts? Thanks.
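
To make the distinction concrete, here is a small Python sketch with an entirely hypothetical layout (the index names idx2..idx4 and node names node1..node4 are invented). It shows how every node can host the same number of replicas while one index still sits on a single node.

```python
from collections import Counter, defaultdict


def replicas_per_node(placements):
    """Count replicas hosted by each node, ignoring which index they belong to."""
    return Counter(node for _index, node in placements)


def nodes_per_index(placements):
    """Map each index to the set of nodes hosting at least one of its shards."""
    nodes = defaultdict(set)
    for index, node in placements:
        nodes[index].add(node)
    return dict(nodes)


# Hypothetical layout: 4 indexes with 4 single-replica shards each, on 4 nodes.
PLACEMENTS = [
    (index, node)
    for index, node in [("demo", "node1"), ("idx2", "node2"),
                        ("idx3", "node3"), ("idx4", "node4")]
    for _shard in range(4)
]

if __name__ == "__main__":
    print(replicas_per_node(PLACEMENTS))        # every node hosts 4 replicas
    print(nodes_per_index(PLACEMENTS)["demo"])  # but 'demo' lives on one node only
```

A balance check based only on the first function would call this layout even; the second function is the per-index spread that the even distribution at creation time provides.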

Regards,
Radar


On Tue, Feb 4, 2020 at 5:20 PM Yuan Zhao  wrote:

> Hi Team,
>
> We are using an autoscaling policy, and we use the utilize node feature to
> move replicas to a new node.
> But we found that after the replicas are moved, Solr makes sure the replicas
> belonging to the same shard are located on different nodes, but it does not
> make sure the shards are distributed evenly across all the nodes.
> That means a node might contain all the shards of an index.
> And, more remarkably, the shards were distributed evenly before the utilize
> node command was executed.
>
> index_name    | replica_name | shard_name | node_name             | replica_state
> --------------+--------------+------------+-----------------------+--------------
> test_index.t2 | core_node6   | shard1     | test-server:8983_solr | active
> test_index.t4 | core_node7   | shard2     | test-server:8983_solr | active
> test_index.t4 | core_node5   | shard1     | test-server:8983_solr | active
> test_index.t2 | core_node4   | shard0     | test-server:8983_solr | active
> test_index.t1 | core_node3   | shard1     | test-server:8984_solr | active
> test_index.t4 | core_node8   | shard2     | test-server:8984_solr | active
> test_index.t3 | core_node8   | shard1     | test-server:8984_solr | active
> test_index.t2 | core_node2   | shard0     | test-server:8984_solr | active
> test_index.t2 | core_node10  | shard1     | test-server:8985_solr | active
> test_index.t1 | core_node18  | shard2     | test-server:8985_solr | active
> test_index.t4 | core_node10  | shard1     | test-server:8985_solr | active
> test_index.t3 | core_node10  | shard0     | test-server:8985_solr | active
> test_index.t1 | core_node14  | shard2     | test-server:8987_solr | active
> test_index.t3 | core_node14  | shard0     | test-server:8987_solr | active
> test_index.t3 | core_node12  | shard1     | test-server:8987_solr | active
> test_index.t1 | core_node16  | shard1     | test-server:8987_solr | active
>
>  Do you have any good solution to this problem?
>  The Solr version we are using is 7.4.
>  The cluster policy looks like this:
>  {
> "set-cluster-policy" : [{
>  "replica" : "<2",
>  "shard" : "#EACH",
>  "node" : "#ANY",
>  "strict" : false
> }]
> }
>
> --
> Thanks & regards,
> Yuan
>
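
As a rough illustration of what the quoted policy rule does and does not constrain, the Python sketch below checks the listing above against it. It assumes the rule means "fewer than 2 replicas of each shard on any one node"; the function names are invented, and the rows are copied from the listing.

```python
from collections import Counter

# (index_name, shard_name, node_name) rows copied from the listing above.
ROWS = [
    ("test_index.t2", "shard1", "test-server:8983_solr"),
    ("test_index.t4", "shard2", "test-server:8983_solr"),
    ("test_index.t4", "shard1", "test-server:8983_solr"),
    ("test_index.t2", "shard0", "test-server:8983_solr"),
    ("test_index.t1", "shard1", "test-server:8984_solr"),
    ("test_index.t4", "shard2", "test-server:8984_solr"),
    ("test_index.t3", "shard1", "test-server:8984_solr"),
    ("test_index.t2", "shard0", "test-server:8984_solr"),
    ("test_index.t2", "shard1", "test-server:8985_solr"),
    ("test_index.t1", "shard2", "test-server:8985_solr"),
    ("test_index.t4", "shard1", "test-server:8985_solr"),
    ("test_index.t3", "shard0", "test-server:8985_solr"),
    ("test_index.t1", "shard2", "test-server:8987_solr"),
    ("test_index.t3", "shard0", "test-server:8987_solr"),
    ("test_index.t3", "shard1", "test-server:8987_solr"),
    ("test_index.t1", "shard1", "test-server:8987_solr"),
]


def rule_violations(rows, max_replicas=1):
    """Return (index, shard, node) keys hosting more replicas than allowed."""
    counts = Counter(rows)
    return sorted(key for key, n in counts.items() if n > max_replicas)


def shards_per_index_and_node(rows):
    """Count how many replicas of each index a node hosts, across all shards."""
    return Counter((index, node) for index, _shard, node in rows)


if __name__ == "__main__":
    print(rule_violations(ROWS))  # empty: the rule itself is satisfied
    print(shards_per_index_and_node(ROWS).most_common(3))
```

Under that reading the listing satisfies the rule, yet a node such as test-server:8983_solr still hosts two different shards of test_index.t2, because the rule only looks at replicas of the same shard.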


Solr Index Data will be delete if state.json did not exists

2018-12-13 Thread Lei Wang
Hi guys,

Currently I am running a 2-node Solr 7.5 cloud. I already have a
collection named A that works fine with 20GB of index data, and I want to
create a collection named B and copy the index data from A.
In Solr 5.5, I could just copy the index folder from A, rename it to B, and
restart the Solr cluster; collection B would be registered with Solr
successfully, and the related data (leader info etc.) would be pushed to
ZooKeeper.
In Solr 7.5, I assume because of
https://issues.apache.org/jira/browse/SOLR-12066, index folder B is
deleted, since no state.json info about collection B can be found in
ZooKeeper.

So my question is: what should I do if I want B to be registered with the
Solr cluster successfully rather than have its folder deleted?

I have tried setting legacyCloud to true, and B is then registered with
SolrCloud successfully. But collection B's state data is pushed into
/clusterstate.json, and I have to call MIGRATESTATEFORMAT first and then
remove legacyCloud or set it to false.
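
For reference, the workaround above maps onto Collections API calls roughly as follows. This Python sketch only builds the request URLs and sends nothing. The base URL is an assumed local default and the helper name is invented; CLUSTERPROP and MIGRATESTATEFORMAT are the Collections API actions named above.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr"  # assumption: adjust to your cluster


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/collections?{query}"


# Order of the workaround: enable legacyCloud, (copy the folder and restart),
# migrate collection B out of /clusterstate.json, then disable legacyCloud.
STEPS = [
    collections_api_url(BASE_URL, "CLUSTERPROP", name="legacyCloud", val="true"),
    collections_api_url(BASE_URL, "MIGRATESTATEFORMAT", collection="B"),
    collections_api_url(BASE_URL, "CLUSTERPROP", name="legacyCloud", val="false"),
]

if __name__ == "__main__":
    for url in STEPS:
        print(url)
```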

Are there any other solutions for this case?

Looking forward to your response.

Thanks,
Lyle


Re: Performance on faceting using docValues

2015-03-09 Thread lei
The term histograms are shared in this link. Sorry for the confusion.

https://docs.google.com/presentation/d/1tma4hkYjxJfBTnMbO6Pq_dUHqZ0wI_UTlgoVqXtW4ZA/pub?start=false&loop=false&delayms=3000&slide=id.p


> On Mon, Mar 9, 2015 at 10:56 AM, Anshum Gupta 
> wrote:
>
>> Hi Lei,
>>
>> The mailing list doesn't allow attachments. Can you share these via a file
>> sharing platform?
>>
>> On Mon, Mar 9, 2015 at 12:48 AM, lei  wrote:
>>
>> > The Solr instance is single-shard. Index size is around 20G and total doc
>> > # is about 12 million. Below are the histograms for the three facet fields
>> > in my query. Thanks.
>> >
>> >
>> > On Thu, Mar 5, 2015 at 11:57 PM, Toke Eskildsen
>> > wrote:
>> >
>> >> On Thu, 2015-03-05 at 21:14 +0100, lei wrote:
>> >>
>> >> You present a very interesting observation. I have not noticed what you
>> >> describe, but on the other hand we have not done comparative speed
>> >> tests.
>> >>
>> >> > q=*:*&fq=country:"US"&fq=category:112
>> >>
>> >> First observation: Your query is '*:*', which is a "magic" query. Non-DV
>> >> faceting has optimizations both for this query (although that ought to
>> >> be disabled due to the fq) and for the "inverse" case where there are
>> >> more hits than non-hits. Perhaps you could test with a handful of
>> >> queries that have different result sizes?
>> >>
>> >> > &facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000
>> >>
>> >> The combination of index order and a high limit might be an explanation:
>> >> When resolving the Strings of the facet result, non-DV will perform
>> >> ordinal-lookup, which is fast when done in monotonic rising order
>> >> (sort=index) and if the values are close (limit=2000). I do not know if
>> >> DV benefits the same way.
>> >>
>> >> On the other hand, your limit seems to apply only to material, so it
>> >> could be that the real number of unique values is low and you just set
>> >> the limit to 2000 to be sure you get everything?
>> >>
>> >> > &facet.field=manufacturer&facet.field=seller&facet.field=material
>> >> > &f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count&f.manufacturer.facet.limit=100
>> >> > &f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100
>> >> > &f.material.facet.mincount=1&sort=score+desc
>> >>
>> >> How large is your index in bytes, how many documents does it contain and
>> >> is it single-shard or cloud? Could you paste the loglines containing
>> >> "UnInverted field", which describes the number of unique values and size
>> >> of your facet fields?
>> >>
>> >> - Toke Eskildsen, State and University Library, Denmark
>> >>
>> >>
>>
>>
>> --
>> Anshum Gupta
>>
>
>


Re: Performance on faceting using docValues

2015-03-09 Thread lei
Sure, here is the link to the image of term histograms. Thanks.

https://docs.google.com/presentation/d/1tma4hkYjxJfBTnMbO6Pq_dUHqZ0wI_UTlgoVqXtW4ZA/edit?usp=sharing



Re: Performance on faceting using docValues

2015-03-09 Thread lei
The Solr instance is single-shard. Index size is around 20G and total doc #
is about 12 million. Below are the histograms for the three facet fields in
my query. Thanks.


On Thu, Mar 5, 2015 at 11:57 PM, Toke Eskildsen 
wrote:

> On Thu, 2015-03-05 at 21:14 +0100, lei wrote:
>
> You present a very interesting observation. I have not noticed what you
> describe, but on the other hand we have not done comparative speed
> tests.
>
> > q=*:*&fq=country:"US"&fq=category:112
>
> First observation: Your query is '*:*', which is a "magic" query. Non-DV
> faceting has optimizations both for this query (although that ought to
> be disabled due to the fq) and for the "inverse" case where there are
> more hits than non-hits. Perhaps you could test with a handful of
> queries that have different result sizes?
>
> > &facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000
>
> The combination of index order and a high limit might be an explanation:
> When resolving the Strings of the facet result, non-DV will perform
> ordinal-lookup, which is fast when done in monotonic rising order
> (sort=index) and if the values are close (limit=2000). I do not know if
> DV benefits the same way.
>
> On the other hand, your limit seems to apply only to material, so it
> could be that the real number of unique values is low and you just set
> the limit to 2000 to be sure you get everything?
>
> > &facet.field=manufacturer&facet.field=seller&facet.field=material
> > &f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count&f.manufacturer.facet.limit=100
> > &f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100
> > &f.material.facet.mincount=1&sort=score+desc
>
> How large is your index in bytes, how many documents does it contain and
> is it single-shard or cloud? Could you paste the loglines containing
> "UnInverted field", which describes the number of unique values and size
> of your facet fields?
>
> - Toke Eskildsen, State and University Library, Denmark
>
>


Re: Performance on faceting using docValues

2015-03-05 Thread lei
There was a mistake in the previous email.

Here are the specs of an example query faceting on three fields (all
string type):
first call: 1+ sec (with docValues) vs. 4+ sec (w/o docValues)
subsequent calls: 100+ ms (with docValues) vs. 30+ ms (w/o docValues),
consistently
the total # of docs returned is around 600,000

The query looks like this:

q=*:*&fq=country:"US"&fq=category:112&facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000&facet.field=manufacturer&facet.field=seller&facet.field=material&f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count&f.manufacturer.facet.limit=100&f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100&f.material.facet.mincount=1&sort=score+desc
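
As a reading aid for the query above: in Solr a per-field parameter of the form f.<field>.facet.<param> overrides the general facet.<param> for that field. The Python sketch below resolves the effective sort and limit for each facet field. The helper name is invented, and the query string is the one above, split across lines only for readability.

```python
from urllib.parse import parse_qs

QUERY = (
    'q=*:*&fq=country:"US"&fq=category:112'
    "&facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000"
    "&facet.field=manufacturer&facet.field=seller&facet.field=material"
    "&f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count"
    "&f.manufacturer.facet.limit=100"
    "&f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100"
    "&f.material.facet.mincount=1&sort=score+desc"
)


def effective_facet_param(params, field, name):
    """Return the per-field value if present, else the general facet.<name>."""
    per_field = params.get(f"f.{field}.facet.{name}")
    if per_field:
        return per_field[0]
    general = params.get(f"facet.{name}")
    return general[0] if general else None


PARAMS = parse_qs(QUERY)

if __name__ == "__main__":
    for field in PARAMS["facet.field"]:
        print(field,
              effective_facet_param(PARAMS, field, "sort"),
              effective_facet_param(PARAMS, field, "limit"))
```

Only material falls back to the general settings (index order, limit 2000); manufacturer and seller are sorted by count with a limit of 100.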

Thanks,

On Thu, Mar 5, 2015 at 11:42 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Hello,
>
> I have one idea off the top of my head: would you mind showing a brief
> snapshot from a sampler?
>
> On Thu, Mar 5, 2015 at 10:18 PM, lei  wrote:
>
> > Hi there,
> >
> > I'm testing facet performance with vs without docValues in Solr 4.7, and
> > found that on first request, performance with docValues is much faster than
> > non-docValues. However, for subsequent requests (where the queries are
> > cached), the performance is slower for docValues than non-docValues. Is
> > this an expected behavior? Any idea or solution is appreciated. Thanks.
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> 
>


Re: Performance on faceting using docValues

2015-03-05 Thread lei
Here are the specs of an example query faceting on three fields (all
string type):
first call: 1+ sec (with docValues) vs. 4+ sec (w/o docValues)
subsequent calls: 30+ ms (with docValues) vs. 100+ ms (w/o docValues),
consistently
the total # of docs returned is around 600,000



On Thu, Mar 5, 2015 at 11:18 AM, lei  wrote:

> Hi there,
>
> I'm testing facet performance with vs without docValues in Solr 4.7, and
> found that on first request, performance with docValues is much faster
> than non-docValues. However, for subsequent requests (where the queries are
> cached), the performance is slower for docValues than non-docValues. Is
> this an expected behavior? Any idea or solution is appreciated. Thanks.
>


Performance on faceting using docValues

2015-03-05 Thread lei
Hi there,

I'm testing facet performance with vs without docValues in Solr 4.7, and
found that on first request, performance with docValues is much faster than
non-docValues. However, for subsequent requests (where the queries are
cached), the performance is slower for docValues than non-docValues. Is
this an expected behavior? Any idea or solution is appreciated. Thanks.


Performance with fast vector highlighter in solr 4.x

2014-09-22 Thread lei
Hi there,

I'm using Solr 4.7 and find that the fast vector highlighter is not as fast as
it used to be in Solr 3.x. It seems the results are not cached: even after
several hits of the same query, it still takes dozens of milliseconds to
return. Any idea or solution is appreciated. Thanks.