Re: Need help to configure automated deletion of shard in solr

2020-12-01 Thread Pushkar Mishra
Hi Erick,
It is implicit.
I have explored the TTL option, but due to some complications we can't use it.
Let me explain the actual use case .

We have limited space, so we can't keep storing documents for an infinite
time. Based on the customer's retention policy, I need to delete the
documents, and in that process, if any shard becomes empty, the shard
needs to be deleted as well.

So, let's say: is there a way to know when Solr completes the purging of
deleted documents, so that based on that flag we can configure shard deletion?

Thanks
Pushkar

On Tue, Dec 1, 2020 at 9:02 PM Erick Erickson 
wrote:

> This is still confusing. You haven’t told us what router you are using,
> compositeId or implicit?
>
> If you’re using compositeId (the default), you will never have empty shards
> because docs get assigned to shards via a hashing algorithm that
> distributes
> them very evenly across all available shards. You cannot delete any
> shard when using compositeId as your routing method.
>
> If you don’t know which router you’re using, then you’re using compositeId.
>
> NOTE: for the rest, “documents” means non-deleted documents. Solr will
> take care of purging the deleted documents automatically.
>
> I think you’re making this much more difficult than you need to. Assuming
> that the total number of documents remains relatively constant, you can
> just
> let Solr take care of it all and not bother with trying to individually
> manage
> shards by using the default compositeID routing.
>
> If the number of docs increases you might need to use splitshard. But it
> sounds like the total number of “live” documents isn’t going to increase.
>
> For TTL, if you have a _fixed_ TTL, i.e. the docs should always expire
> after, say, 30 days (which it doesn’t sound like you do), you can use
> the “Time Routed Alias” option, see:
> https://lucene.apache.org/solr/guide/7_5/time-routed-aliases.html
>
> Assuming your TTL isn’t a fixed-interval, you can configure
> DocExpirationUpdateProcessorFactory to deal with TTL automatically.
>
> And if you still think you need to handle this, you need to explain exactly
> what problem you’re trying to solve because so far it appears that
> you’re simply taking on way more work than you need to.
>
> Best,
> Erick
>
> > On Dec 1, 2020, at 9:46 AM, Pushkar Mishra 
> wrote:
> >
> > Hi Team,
> > As I explained the use case , can someone help me out to find out the
> > configuration way to delete the shard here ?
> > A quick response  will be greatly appreciated.
> >
> > Regards
> > Pushkar
> >
> >
> > On Mon, Nov 30, 2020 at 11:32 PM Pushkar Mishra 
> > wrote:
> >
> >>
> >>
> >> On Mon, Nov 30, 2020, 9:15 PM Pushkar Mishra 
> >> wrote:
> >>
> >>> Hi Erick,
> >>> First of all thanks for your response . I will check the possibility  .
> >>> Let me explain my problem  in detail :
> >>>
> >>> 1. We have other use cases where we are making use of listener on
> >>> postCommit to delete/shift/split the shards . So we have capability to
> >>> delete the shards .
> >>> 2. The current use case is , where we have to delete the documents from
> >>> the shard , and during deletion process(it will be scheduled process,
> may
> >>> be hourly or daily, which will delete the documents) , if shards  gets
> >>> empty (or may be lets  say nominal documents are left ) , then delete
> the
> >>> shard.  And I am exploring to do this using configuration .
> >>>
> >> 3. Also it will not be in live shard for sure as only those documents
> are
> >> deleted which have TTL got over . TTL could be a month or year.
> >>
> >> Please assist if you have any config based idea on this
> >>
> >>> Regards
> >>> Pushkar
> >>>
> >>> On Mon, Nov 30, 2020, 8:48 PM Erick Erickson 
> >>> wrote:
> >>>
>  Are you using the implicit router? Otherwise you cannot delete a
> shard.
>  And you won’t have any shards that have zero documents anyway.
> 
>  It’d be a little convoluted, but you could use the collections
> COLSTATUS
>  Api to
>  find the names of all your replicas. Then query _one_ replica of each
>  shard with something like
>  solr/collection1_shard1_replica_n1/q=*:*=false
> 
>  that’ll return the number of live docs (i.e. non-deleted docs) and if
>  it’s zero
>  you can delete the shard.
> 
>  But the implicit router requires you take complete control of where
>  documents
>  go, i.e. which shard they land on.
> 
>  This really sounds like an XY problem. What’s the use  case you’re
> trying
>  to support where you expect a shard’s number of live docs to drop to
>  zero?
> 
>  Best,
>  Erick
> 
> > On Nov 30, 2020, at 4:57 AM, Pushkar Mishra 
>  wrote:
> >
> > Hi Solr team,
> >
> > I am using solr cloud.(version 8.5.x). I have a need to find out a
> > configuration where I can delete a shard , when number of documents
>  reaches
> > to zero in the shard , can some one help me out to achieve that ?
> >
> 

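Erick's recipe above (COLSTATUS to list replicas, one q=*:* count per shard, then delete the empty shards) can be sketched as follows. This is a hypothetical illustration, not a supported Solr feature: the collection name, base URL, and threshold are placeholders, and it assumes the implicit router, since DELETESHARD refuses to remove an active compositeId shard.

```python
# Sketch of the recipe above: given per-shard live-doc counts (gathered by
# querying one replica per shard), decide which shards are empty and build
# the Collections API DELETESHARD calls. Collection name, base URL, and
# threshold are illustrative placeholders.

def shards_to_delete(shard_doc_counts, threshold=0):
    """Return the shards whose live-doc count is at or below threshold."""
    return [s for s, n in sorted(shard_doc_counts.items()) if n <= threshold]

def deleteshard_url(base, collection, shard):
    """Build the Collections API call that drops one shard."""
    return (f"{base}/admin/collections?action=DELETESHARD"
            f"&collection={collection}&shard={shard}")

counts = {"shard1": 42, "shard2": 0, "shard3": 7}
for shard in shards_to_delete(counts):
    print(deleteshard_url("http://localhost:8983/solr", "collection1", shard))
```

With the counts above this prints a single DELETESHARD URL, for shard2; in practice the counts would come from COLSTATUS plus one numFound query per shard.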
java.lang.IllegalArgumentException: Comparison method violates its general contract

2020-12-01 Thread Dawn
Hello,

JDK 11, Solr 8.7.0:

org.apache.solr.common.SolrException: java.lang.IllegalArgumentException:
Comparison method violates its general contract

QueryRescorer line 114
Comparator<ScoreDoc> sortDocComparator = new Comparator<ScoreDoc>() {
  @Override
  public int compare(ScoreDoc a, ScoreDoc b) {
    // Sort by score descending, then docID ascending:
    if (a.score > b.score) {
      return -1;
    } else if (a.score < b.score) {
      return 1;
    } else {
      // This subtraction can't overflow int
      // because docIDs are >= 0:
      return a.doc - b.doc;
    }
  }
};


Since JDK 1.7, Collections.sort() (TimSort) enforces the comparator's general
contract, so the comparator must return consistent results.
Do I have to change this to -Float.compare(a.score, b.score)?

(I would prefer to fix it without adding the JVM parameter
-Djava.util.Arrays.useLegacyMergeSort=true.)
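The contract violation here is triggered by NaN scores: with NaN on either side, both `a.score > b.score` and `a.score < b.score` are false, so the comparator falls through to the docID tie-break, and transitivity can fail. A small Python sketch of the same logic (a ScoreDoc mimicked as a `(score, doc)` tuple) demonstrates the inconsistency, plus one possible NaN-aware fix:

```python
import math
from functools import cmp_to_key

# Mimic the Java comparator: score descending, then docID ascending.
def compare(a, b):           # a, b are (score, doc) tuples
    if a[0] > b[0]:
        return -1
    if a[0] < b[0]:
        return 1
    return a[1] - b[1]       # reached for equal scores, but also for NaN

x = (1.0, 0)
y = (float("nan"), 1)
z = (2.0, 2)

# Transitivity is violated: compare puts x before y and y before z,
# yet also z before x. This is what TimSort detects and reports.
print(compare(x, y), compare(y, z), compare(x, z))   # -1 -1 1

# A NaN-aware total order fixes it (Float.compare in Java behaves
# equivalently, ranking NaN above every other float):
def compare_fixed(a, b):
    ka = (math.isnan(a[0]), a[0])    # NaN sorts as the largest score
    kb = (math.isnan(b[0]), b[0])
    if ka > kb:
        return -1
    if ka < kb:
        return 1
    return a[1] - b[1]

print(sorted([x, y, z], key=cmp_to_key(compare_fixed)))
```

So yes: switching the score branch to a total order such as -Float.compare(a.score, b.score) removes the contract violation without the legacy-merge-sort flag; whether NaN scores should sort first or last is a separate design choice.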



Re: Solr8.7 - How to optmize my index ?

2020-12-01 Thread Walter Underwood
Even better DO NOT OPTIMIZE.

Just let Solr manage the indexes automatically.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 1, 2020, at 11:31 AM, Info MatheoSoftware  
> wrote:
> 
> Hi All,
> 
> 
> 
> I found the solution, I must do :
> 
> curl ‘http://xxx:8983/solr/my_core/update?
> 
> commit=true=true’
> 
> 
> 
> It works fine
> 
> 
> 
> Thanks,
> 
> Bruno
> 
> 
> 
> 
> 
> 
> 
> De : Matheo Software [mailto:i...@matheo-software.com]
> Envoyé : mardi 1 décembre 2020 13:28
> À : solr-user@lucene.apache.org
> Objet : Solr8.7 - How to optmize my index ?
> 
> 
> 
> Hi All,
> 
> 
> 
> With Solr5.4, I used the UI button but in Solr8.7 UI this button is missing.
> 
> 
> 
> So I decide to use the command line:
> 
> curl http://xxx:8983/solr/my_core/update?optimize=true
> 
> 
> 
> My collection my_core exists of course.
> 
> 
> 
> The answer of the command line is:
> 
> {
> 
>  "responseHeader":{
> 
>"status":0,
> 
>"QTime":18}
> 
> }
> 
> 
> 
> But nothing change.
> 
> I always have 38M deleted docs in my collection and directory size no change
> like with solr5.4.
> 
> The size of the collection stay always at : 466.33Go
> 
> 
> 
> Could you tell me how can I purge deleted docs ?
> 
> 
> 
> Cordialement, Best Regards
> 
> Bruno Mannina
> 
>  www.matheo-software.com
> 
>  www.patent-pulse.com
> 
> Tél. +33 0 970 738 743
> 
> Mob. +33 0 634 421 817
> 
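The archive has eaten part of Bruno's query string above. Purging deleted documents without a full optimize is normally done with Solr's real `expungeDeletes` update parameter together with `commit=true`; assuming that is what the mangled command contained, the two variants look like this (host and core name are placeholders):

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/my_core/update"   # placeholder host/core

# Merge away the segments' deleted docs without rewriting the whole index:
expunge_url = BASE + "?" + urlencode({"commit": "true", "expungeDeletes": "true"})

# Full optimize: rewrites every segment; per Walter's advice, rarely needed.
optimize_url = BASE + "?" + urlencode({"optimize": "true", "commit": "true"})

print(expunge_url)
print(optimize_url)
```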



RE: Solr8.7 - How to optmize my index ?

2020-12-01 Thread Info MatheoSoftware
Hi All,



I found the solution, I must do :

curl ‘http://xxx:8983/solr/my_core/update?

commit=true=true’



It works fine



Thanks,

Bruno







De : Matheo Software [mailto:i...@matheo-software.com]
Envoyé : mardi 1 décembre 2020 13:28
À : solr-user@lucene.apache.org
Objet : Solr8.7 - How to optmize my index ?



Hi All,



With Solr5.4, I used the UI button but in Solr8.7 UI this button is missing.



So I decided to use the command line:

curl http://xxx:8983/solr/my_core/update?optimize=true



My collection my_core exists of course.



The answer of the command line is:

{

  "responseHeader":{

"status":0,

"QTime":18}

}



But nothing changed.

I still have 38M deleted docs in my collection, and the directory size does not
change as it did with Solr 5.4.

The collection size stays at 466.33 GB.



Could you tell me how can I purge deleted docs ?



Cordialement, Best Regards

Bruno Mannina

  www.matheo-software.com

  www.patent-pulse.com

Tél. +33 0 970 738 743

Mob. +33 0 634 421 817



Solr8.7 - How to optmize my index ?

2020-12-01 Thread Matheo Software
Hi All,



With Solr5.4, I used the UI button but in Solr8.7 UI this button is missing.



So I decided to use the command line:

curl http://xxx:8983/solr/my_core/update?optimize=true



My collection my_core exists of course.



The answer of the command line is:

{

  "responseHeader":{

"status":0,

"QTime":18}

}



But nothing changed.

I still have 38M deleted docs in my collection, and the directory size does not
change as it did with Solr 5.4.

The collection size stays at 466.33 GB.



Could you tell me how can I purge deleted docs ?



Cordialement, Best Regards

Bruno Mannina

  www.matheo-software.com

  www.patent-pulse.com

Tél. +33 0 970 738 743

Mob. +33 0 634 421 817



Re: Can solr index replacement character

2020-12-01 Thread Erick Erickson
Solr handles UTF-8, so it should be able to. The problem you’ll have is
getting the UTF-8 characters to get through all the various transport
encodings, i.e. if you try to search from a browser, you need to encode
it so the browser passes it through. If you search through SolrJ, it needs
to be encoded at that level. If you use cURL, it needs another….
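As a concrete example of the encoding step, U+FFFD is three bytes in UTF-8 (EF BF BD), so a query containing it must be sent percent-encoded. A quick sketch:

```python
from urllib.parse import quote

ch = "\ufffd"                         # U+FFFD, the replacement character

print(ch.encode("utf-8").hex())       # efbfbd: its three UTF-8 bytes
print(quote(ch))                      # %EF%BF%BD

# A field query containing it, percent-encoded for the URL:
print("q=" + quote("field:\ufffd"))   # q=field%3A%EF%BF%BD
```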

> On Dec 1, 2020, at 12:30 AM, Eran Buchnick  wrote:
> 
> Hi community,
> During integration tests with new data source I have noticed weird scenario
> where replacement character can't be searched, though, seems to be stored.
> I mean, honestly, I don't want that irrelevant data stored in my index but
> I wondered if solr can index replacement character (U+FFFD �) as string, if
> so, how to search it?
> And in general, is there any built-in char filtration?!
> 
> Thanks



Re: Need help to configure automated deletion of shard in solr

2020-12-01 Thread Erick Erickson
This is still confusing. You haven’t told us what router you are using, 
compositeId or implicit?

If you’re using compositeId (the default), you will never have empty shards
because docs get assigned to shards via a hashing algorithm that distributes
them very evenly across all available shards. You cannot delete any
shard when using compositeId as your routing method.

If you don’t know which router you’re using, then you’re using compositeId.

NOTE: for the rest, “documents” means non-deleted documents. Solr will
take care of purging the deleted documents automatically.

I think you’re making this much more difficult than you need to. Assuming
that the total number of documents remains relatively constant, you can just
let Solr take care of it all and not bother with trying to individually manage
shards by using the default compositeID routing.

If the number of docs increases you might need to use splitshard. But it
sounds like the total number of “live” documents isn’t going to increase.

For TTL, if you have a _fixed_ TTL, i.e. the docs should always expire after,
say, 30 days (which it doesn’t sound like you do), you can use
the “Time Routed Alias” option, see:
https://lucene.apache.org/solr/guide/7_5/time-routed-aliases.html

Assuming your TTL isn’t a fixed-interval, you can configure
DocExpirationUpdateProcessorFactory to deal with TTL automatically.

And if you still think you need to handle this, you need to explain exactly
what problem you’re trying to solve because so far it appears that 
you’re simply taking on way more work than you need to.

Best,
Erick
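For reference, a minimal sketch of what the DocExpirationUpdateProcessorFactory setup mentioned above looks like in solrconfig.xml. The chain name, field names, and the 30-second sweep interval are illustrative placeholders, not values from this thread:

```xml
<updateRequestProcessorChain name="expire-docs" default="true">
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <!-- how often the background thread deletes expired docs -->
    <int name="autoDeletePeriodSeconds">30</int>
    <!-- per-document TTL field, e.g. ttl_s = "+30DAYS" -->
    <str name="ttlFieldName">ttl_s</str>
    <!-- computed absolute expiration timestamp -->
    <str name="expirationFieldName">expire_at_dt</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```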

> On Dec 1, 2020, at 9:46 AM, Pushkar Mishra  wrote:
> 
> Hi Team,
> As I explained the use case , can someone help me out to find out the
> configuration way to delete the shard here ?
> A quick response  will be greatly appreciated.
> 
> Regards
> Pushkar
> 
> 
> On Mon, Nov 30, 2020 at 11:32 PM Pushkar Mishra 
> wrote:
> 
>> 
>> 
>> On Mon, Nov 30, 2020, 9:15 PM Pushkar Mishra 
>> wrote:
>> 
>>> Hi Erick,
>>> First of all thanks for your response . I will check the possibility  .
>>> Let me explain my problem  in detail :
>>> 
>>> 1. We have other use cases where we are making use of listener on
>>> postCommit to delete/shift/split the shards . So we have capability to
>>> delete the shards .
>>> 2. The current use case is , where we have to delete the documents from
>>> the shard , and during deletion process(it will be scheduled process, may
>>> be hourly or daily, which will delete the documents) , if shards  gets
>>> empty (or may be lets  say nominal documents are left ) , then delete the
>>> shard.  And I am exploring to do this using configuration .
>>> 
>> 3. Also it will not be in live shard for sure as only those documents are
>> deleted which have TTL got over . TTL could be a month or year.
>> 
>> Please assist if you have any config based idea on this
>> 
>>> Regards
>>> Pushkar
>>> 
>>> On Mon, Nov 30, 2020, 8:48 PM Erick Erickson 
>>> wrote:
>>> 
 Are you using the implicit router? Otherwise you cannot delete a shard.
 And you won’t have any shards that have zero documents anyway.
 
 It’d be a little convoluted, but you could use the collections COLSTATUS
 Api to
 find the names of all your replicas. Then query _one_ replica of each
 shard with something like
 solr/collection1_shard1_replica_n1/q=*:*=false
 
 that’ll return the number of live docs (i.e. non-deleted docs) and if
 it’s zero
 you can delete the shard.
 
 But the implicit router requires you take complete control of where
 documents
 go, i.e. which shard they land on.
 
 This really sounds like an XY problem. What’s the use  case you’re trying
 to support where you expect a shard’s number of live docs to drop to
 zero?
 
 Best,
 Erick
 
> On Nov 30, 2020, at 4:57 AM, Pushkar Mishra 
 wrote:
> 
> Hi Solr team,
> 
> I am using solr cloud.(version 8.5.x). I have a need to find out a
> configuration where I can delete a shard , when number of documents
 reaches
> to zero in the shard , can some one help me out to achieve that ?
> 
> 
> It is urgent , so a quick response will be highly appreciated .
> 
> Thanks
> Pushkar
> 
> --
> Pushkar Kumar Mishra
> "Reactions are always instinctive whereas responses are always well
 thought
> of... So start responding rather than reacting in life"
 
 
> 
> -- 
> Pushkar Kumar Mishra
> "Reactions are always instinctive whereas responses are always well thought
> of... So start responding rather than reacting in life"



Re: Need help to configure automated deletion of shard in solr

2020-12-01 Thread Pushkar Mishra
Hi Team,
As I explained in the use case above, can someone help me find a
configuration-based way to delete the shard?
A quick response will be greatly appreciated.

Regards
Pushkar


On Mon, Nov 30, 2020 at 11:32 PM Pushkar Mishra 
wrote:

>
>
> On Mon, Nov 30, 2020, 9:15 PM Pushkar Mishra 
> wrote:
>
>> Hi Erick,
>> First of all thanks for your response . I will check the possibility  .
>> Let me explain my problem  in detail :
>>
>> 1. We have other use cases where we are making use of listener on
>> postCommit to delete/shift/split the shards . So we have capability to
>> delete the shards .
>> 2. The current use case is , where we have to delete the documents from
>> the shard , and during deletion process(it will be scheduled process, may
>> be hourly or daily, which will delete the documents) , if shards  gets
>> empty (or may be lets  say nominal documents are left ) , then delete the
>> shard.  And I am exploring to do this using configuration .
>>
> 3. Also it will not be in live shard for sure as only those documents are
> deleted which have TTL got over . TTL could be a month or year.
>
> Please assist if you have any config based idea on this
>
>> Regards
>> Pushkar
>>
>> On Mon, Nov 30, 2020, 8:48 PM Erick Erickson 
>> wrote:
>>
>>> Are you using the implicit router? Otherwise you cannot delete a shard.
>>> And you won’t have any shards that have zero documents anyway.
>>>
>>> It’d be a little convoluted, but you could use the collections COLSTATUS
>>> Api to
>>> find the names of all your replicas. Then query _one_ replica of each
>>> shard with something like
>>> solr/collection1_shard1_replica_n1/q=*:*=false
>>>
>>> that’ll return the number of live docs (i.e. non-deleted docs) and if
>>> it’s zero
>>> you can delete the shard.
>>>
>>> But the implicit router requires you take complete control of where
>>> documents
>>> go, i.e. which shard they land on.
>>>
>>> This really sounds like an XY problem. What’s the use  case you’re trying
>>> to support where you expect a shard’s number of live docs to drop to
>>> zero?
>>>
>>> Best,
>>> Erick
>>>
>>> > On Nov 30, 2020, at 4:57 AM, Pushkar Mishra 
>>> wrote:
>>> >
>>> > Hi Solr team,
>>> >
>>> > I am using solr cloud.(version 8.5.x). I have a need to find out a
>>> > configuration where I can delete a shard , when number of documents
>>> reaches
>>> > to zero in the shard , can some one help me out to achieve that ?
>>> >
>>> >
>>> > It is urgent , so a quick response will be highly appreciated .
>>> >
>>> > Thanks
>>> > Pushkar
>>> >
>>> > --
>>> > Pushkar Kumar Mishra
>>> > "Reactions are always instinctive whereas responses are always well
>>> thought
>>> > of... So start responding rather than reacting in life"
>>>
>>>

-- 
Pushkar Kumar Mishra
"Reactions are always instinctive whereas responses are always well thought
of... So start responding rather than reacting in life"


RE: CDCR

2020-12-01 Thread Gell-Holleron, Daniel
Hi Shalin, 

Just to add, in the exception with 'CdcrUpdateLogSynchronizer - Caught 
unexpected exception' it says it's because the SolrCore is loading. I don't 
know if this is down to the data being quite large? 

Thanks, 

Daniel 

-Original Message-
From: Gell-Holleron, Daniel 
Sent: 01 December 2020 11:49
To: solr-user@lucene.apache.org
Subject: RE: CDCR

Hi Shalin, 

I did try that, but it didn't make any difference; the remote clusters did
not update. Autocommit is already set on the remote clusters as follows:


<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15}</maxTime>
  <maxDocs>1</maxDocs>
  <openSearcher>true</openSearcher>
</autoCommit>


I can also see in the Solr admin pages, there is a warn message with 
CdcrUpdateLogSynchronizer - Caught unexpected exception. That's all I can see 
at the moment. No errors apart from that. 

Thanks, 

Daniel


-Original Message-
From: Shalin Shekhar Mangar 
Sent: 29 November 2020 05:19
To: solr-user@lucene.apache.org
Subject: Re: CDCR

EXTERNAL EMAIL - Be cautious of all links and attachments.

If you manually issue a commit operation on the remote clusters, do you see any 
updates? If yes, then you should set autoCommit on the remote clusters.
If no, then check the logs on the cluster which is receiving the indexing 
operations and see if there are any errors.

On Wed, Nov 25, 2020 at 6:11 PM Gell-Holleron, Daniel < 
daniel.gell-holle...@gb.unisys.com> wrote:

> Hello,
>
> Does anybody have advice on why CDCR would say it's forwarding updates
> (with no errors) even though the Solr servers it's replicating to
> aren't updating?
>
> We have just under 50 million documents, that are spread across 4 servers.
> Each server has a node each.
>
> One side is updating happily so would think that sharding wouldn't be 
> needed at this point?
>
> We are using Solr version 7.7.1.
>
> Thanks,
>
> Daniel
>
>

--
Regards,
Shalin Shekhar Mangar.


RE: CDCR

2020-12-01 Thread Gell-Holleron, Daniel
Hi Shalin, 

I did try that, but it didn't make any difference; the remote clusters did
not update. Autocommit is already set on the remote clusters as follows:


<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15}</maxTime>
  <maxDocs>1</maxDocs>
  <openSearcher>true</openSearcher>
</autoCommit>


I can also see in the Solr admin pages, there is a warn message with 
CdcrUpdateLogSynchronizer - Caught unexpected exception. That's all I can see 
at the moment. No errors apart from that. 

Thanks, 

Daniel


-Original Message-
From: Shalin Shekhar Mangar  
Sent: 29 November 2020 05:19
To: solr-user@lucene.apache.org
Subject: Re: CDCR

EXTERNAL EMAIL - Be cautious of all links and attachments.

If you manually issue a commit operation on the remote clusters, do you see any 
updates? If yes, then you should set autoCommit on the remote clusters.
If no, then check the logs on the cluster which is receiving the indexing 
operations and see if there are any errors.

On Wed, Nov 25, 2020 at 6:11 PM Gell-Holleron, Daniel < 
daniel.gell-holle...@gb.unisys.com> wrote:

> Hello,
>
> Does anybody have advice on why CDCR would say it's forwarding updates
> (with no errors) even though the Solr servers it's replicating to
> aren't updating?
>
> We have just under 50 million documents, that are spread across 4 servers.
> Each server has a node each.
>
> One side is updating happily so would think that sharding wouldn't be 
> needed at this point?
>
> We are using Solr version 7.7.1.
>
> Thanks,
>
> Daniel
>
>

--
Regards,
Shalin Shekhar Mangar.