[ 
https://issues.apache.org/jira/browse/SOLR-12259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-12259:
----------------------------------
    Description: 
This needs more careful planning, see the SIP here: 
https://cwiki.apache.org/confluence/display/SOLR/SIP-2+Support+safe+index+transformations+without+reindexing

 

The general problem statement is that the current upgrade path is trappy and 
cumbersome. It would be a great help "in the field" to make the upgrade process 
less painful.

Additionally one of the most common things users want to do is enable 
docValues, but currently they often have to re-index.

Issues:

1> if I upgrade from 5x to 6x and then 7x, there's no guarantee that when I go 
to 7x all the segments have been rewritten in 6x format. Say I have a segment 
at max size that has no deletions. It'll never be rewritten until it has 
deleted docs, and currently that may mean as many as 50% deleted docs.

2> IndexUpgraderTool explicitly does a forceMerge to 1 segment, which is bad.

3> in a large distributed system, running IndexUpgraderTool on all the nodes is 
cumbersome even if <2> is acceptable.

4> Users who realize specifying docValues on a field would be A Good Thing have 
to re-index. We have UninvertDocValuesMergePolicyFactory. Wouldn't it be nice 
to be able to have this done all at once without forceMerging to one segment?
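
To make issue 1> concrete, here is a toy model (not Lucene code) of how a 
segment can end up stranded. It assumes that a release can read only segments 
written by its own or the previous major version. The Segment tuple and the 
stranded_segments helper are hypothetical names used only for illustration.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    name: str
    written_by_major: int  # major version that last wrote this segment
    deleted_pct: float     # percentage of deleted docs in the segment


def stranded_segments(segments: List[Segment], target_major: int) -> List[str]:
    """Return names of segments older than target_major - 1."""
    return [s.name for s in segments if s.written_by_major < target_major - 1]


segments = [
    Segment("_a", 5, 0.0),   # max-size, no deletions, never merged under 6x
    Segment("_b", 6, 12.5),  # rewritten by a normal merge under 6x
    Segment("_c", 6, 0.0),   # first written under 6x
]

print(stranded_segments(segments, target_major=7))
```

In this model only "_a" is reported: it went through the 6x upgrade untouched 
because nothing ever caused it to be merged.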

Proposal:

Somehow avoid the above. Currently LUCENE-7976 is a start in that direction. It 
will make TMP respect max segment size, so we can avoid forceMerges that result 
in one segment. What it does _not_ do is rewrite segments with zero (or a small 
percentage) deleted documents.

So it doesn't seem like a huge stretch to be able to specify to TMP the option 
to rewrite segments that have no deleted documents. Perhaps a new parameter to 
optimize?

This would likely require another change to TMP or whatever.
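
As a rough sketch of what such an option might mean, here is a toy selection 
function. It is not TMP's actual algorithm: the 10% threshold is an arbitrary 
placeholder, and rewrite_all is a hypothetical stand-in for the proposed 
option.

```python
from typing import List, Tuple


def segments_to_rewrite(
    segments: List[Tuple[str, float]],
    min_deleted_pct: float = 10.0,  # arbitrary placeholder threshold
    rewrite_all: bool = False,      # hypothetical stand-in for the new option
) -> List[str]:
    """Pick segment names to rewrite.

    segments holds (name, percentage of deleted docs) pairs.
    """
    if rewrite_all:
        # Proposed behavior: every segment is rewritten, deletions or not.
        return [name for name, _ in segments]
    # Toy version of today's behavior: only segments with enough deletions.
    return [name for name, pct in segments if pct >= min_deleted_pct]


index = [("_a", 0.0), ("_b", 3.0), ("_c", 25.0)]
print(segments_to_rewrite(index))
print(segments_to_rewrite(index, rewrite_all=True))
```

Without the option, "_a" and "_b" are skipped, which is the gap described 
above.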

So upgrading to a new Solr would look like:
 1> install the new Solr
 2> execute 
"http://node:port/solr/collection_or_core/update?optimize=true&upgradeAllSegments=true"
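
For scripting step 2> across many nodes, the request URL could be assembled 
along these lines. This is only a sketch: upgradeAllSegments is the parameter 
proposed here, not an existing one, and the upgrade_url helper and the host 
name are made up for the example. It builds the URL without sending it.

```python
from urllib.parse import urlencode


def upgrade_url(base: str, collection: str) -> str:
    """Build the proposed optimize request URL for one collection or core."""
    params = {
        "optimize": "true",
        "upgradeAllSegments": "true",  # proposed in this issue, not yet real
    }
    return f"{base}/solr/{collection}/update?{urlencode(params)}"


# Hypothetical node address; substitute a real node:port.
print(upgrade_url("http://localhost:8983", "collection_or_core"))
```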

What's not clear to me is whether we'd require 
UninvertDocValuesMergePolicyFactory to be specified and wrap TMP or not.

Anyway, let's discuss. I'll create yet another LUCENE JIRA for TMP to rewrite 
all segments, which I'll link.

I'll also link several other JIRAs in here; they're coalescing.


> Robustly upgrade indexes
> ------------------------
>
>                 Key: SOLR-12259
>                 URL: https://issues.apache.org/jira/browse/SOLR-12259
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>         Attachments: SOLR-12259.patch



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
