Re: MergeTrigger consistency in MergePolicy "find merges"

2022-06-20 Thread Adrien Grand
Some comments on JIRA suggest that this is expected, because natural merges
can have a variety of triggers while forced merges are always called by the
app. I guess you could argue that MERGE_FINISHED is a different trigger,
but are there use-cases for doing things differently in findForcedMerges
depending on the merge trigger?

https://issues.apache.org/jira/browse/LUCENE-4472?focusedCommentId=13476920&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13476920

On Mon, Jun 20, 2022 at 3:26 PM Bruno Roustant 
wrote:

> I agree this AlwaysForceMergePolicy is not working correctly. It's just a
> test I did to easily understand how MergeTrigger.MERGE_FINISHED was working.
>
> Anyway my question is only about the MergeTrigger not present in the call
> to findForcedMerges(), to know if it is expected or inconsistent with the
> other find merges methods.
>
>
> On Mon, Jun 20, 2022 at 14:26, Adrien Grand wrote:
>
>> Wouldn't this be a bug in the AlwaysForceMergePolicy, which should return
>> no merges if there is already a single segment with no deletes?
>>
>> On Mon, Jun 20, 2022 at 1:30 PM Bruno Roustant 
>> wrote:
>>
>>> If I use a simple "AlwaysForceMergePolicy" in a test, I can see that
>>> when I run IndexWriter.forceMerge(), the first call to
>>> AlwaysForceMergePolicy.findForcedMerges() is done for
>>> MergeTrigger.EXPLICIT. But then, at IndexWriter.merge() line 4531,
>>> MergePolicy.findForcedMerges() is called with MergeTrigger.MERGE_FINISHED
>>> to merge the segments produced by the first explicit forced
>>> merge, and so on. For this degenerate AlwaysForceMergePolicy, the test
>>> runs merges in an infinite loop.
>>>
>>> On Mon, Jun 20, 2022 at 11:11, Adrien Grand wrote:
>>>
>>>> You seem to imply that `forceMerge` runs a cascaded merge where the
>>>> first merge creates some new segments that become inputs to a second merge.
>>>> Have you considered running a single merge? We had a discussion about
>>>> cascaded forced merges and TieredMergePolicy last year and ended up
>>>> changing `findForcedMerges` to never run cascaded merges:
>>>> https://issues.apache.org/jira/browse/LUCENE-7020.
>>>>
>>>> On Mon, Jun 20, 2022 at 10:31 AM Bruno Roustant <
>>>> bruno.roust...@gmail.com> wrote:
>>>>
>>>>> MergePolicy "find merges" methods take a MergeTrigger as parameter,
>>>>> except findForcedMerges() and findForcedDeletesMerges().
>>>>> In my use-case, I could leverage a MergeTrigger in findForcedMerges(),
>>>>> which can be EXPLICIT or MERGE_FINISHED, to differentiate the merge
>>>>> selection between the initial explicit call and the subsequent calls
>>>>> triggered after the first merges.
>>>>>
>>>>> Should we add a MergeTrigger parameter to all MergePolicy "find
>>>>> merges" methods for consistency?
>>>>> If so, is it an internal or public API? (should this change stay in
>>>>> the main branch only)
>>>>>
>>>>
>>>>
>>>> --
>>>> Adrien
>>>>
>>>
>>
>> --
>> Adrien
>>
>

-- 
Adrien


Re: MergeTrigger consistency in MergePolicy "find merges"

2022-06-20 Thread Bruno Roustant
I agree this AlwaysForceMergePolicy is not working correctly. It's just a
test I did to easily understand how MergeTrigger.MERGE_FINISHED was working.

Anyway my question is only about the MergeTrigger not present in the call
to findForcedMerges(), to know if it is expected or inconsistent with the
other find merges methods.


On Mon, Jun 20, 2022 at 14:26, Adrien Grand wrote:

> Wouldn't this be a bug in the AlwaysForceMergePolicy, which should return
> no merges if there is already a single segment with no deletes?
>
> On Mon, Jun 20, 2022 at 1:30 PM Bruno Roustant 
> wrote:
>
>> If I use a simple "AlwaysForceMergePolicy" in a test, I can see that when
>> I run IndexWriter.forceMerge(), the first call to
>> AlwaysForceMergePolicy.findForcedMerges() is done for
>> MergeTrigger.EXPLICIT. But then, at IndexWriter.merge() line 4531,
>> MergePolicy.findForcedMerges() is called with MergeTrigger.MERGE_FINISHED
>> to merge the segments produced by the first explicit forced
>> merge, and so on. For this degenerate AlwaysForceMergePolicy, the test
>> runs merges in an infinite loop.
>>
>> On Mon, Jun 20, 2022 at 11:11, Adrien Grand wrote:
>>
>>> You seem to imply that `forceMerge` runs a cascaded merge where the
>>> first merge creates some new segments that become inputs to a second merge.
>>> Have you considered running a single merge? We had a discussion about
>>> cascaded forced merges and TieredMergePolicy last year and ended up
>>> changing `findForcedMerges` to never run cascaded merges:
>>> https://issues.apache.org/jira/browse/LUCENE-7020.
>>>
>>> On Mon, Jun 20, 2022 at 10:31 AM Bruno Roustant <
>>> bruno.roust...@gmail.com> wrote:
>>>
>>>> MergePolicy "find merges" methods take a MergeTrigger as parameter,
>>>> except findForcedMerges() and findForcedDeletesMerges().
>>>> In my use-case, I could leverage a MergeTrigger in findForcedMerges(),
>>>> which can be EXPLICIT or MERGE_FINISHED, to differentiate the merge
>>>> selection between the initial explicit call and the subsequent calls
>>>> triggered after the first merges.
>>>>
>>>> Should we add a MergeTrigger parameter to all MergePolicy "find merges"
>>>> methods for consistency?
>>>> If so, is it an internal or public API? (should this change stay in the
>>>> main branch only)
>>>>
>>>
>>>
>>> --
>>> Adrien
>>>
>>
>
> --
> Adrien
>


Re: [RESULT] [VOTE] Migration to GitHub issue from Jira

2022-06-20 Thread Tomoko Uchida
Thanks for your suggestions; actually ASF should have information on
the account mapping.
For now, I'll just prepare scripts to embed the mapped github accounts
next to the jira author/assignee name; we could ask infra or create
the mapping on our own by inference if we find it's worthwhile to have
it.

I think we can discuss "how" on the issue (LUCENE-10557) - I don't
think there are many people who are interested in the full
details of such matters, though it's practically important.

On Mon, Jun 20, 2022 at 21:26, Michael Sokolov wrote:
>
> I think the user mapping must be inferred based on membership in the
> Apache "organization" https://github.com/settings/organizations
>
> On Sun, Jun 19, 2022 at 2:45 AM Dawid Weiss  wrote:
> >
> >
> >> User id mapping is an important consideration for me.
> >
> >
> > Some mapping has to be present somewhere already. Even very old git commits 
> > point at the right people. Perhaps it's based on e-mail addresses or 
> > something?
> >
> > https://github.com/apache/lucene/commit/5a2615650e104c0713407637d65ae0ce7c2b257a
> >
> > When the user isn't available, github just shows the nick, without the link.
> >
> > https://github.com/apache/lucene/commit/89a554ffab239c0118ccd454d76cdf714d793911
> >
> > Maybe infra could help with how it's done already for git integration.
> >
> > Dawid
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: MergeTrigger consistency in MergePolicy "find merges"

2022-06-20 Thread Adrien Grand
Wouldn't this be a bug in the AlwaysForceMergePolicy, which should return
no merges if there is already a single segment with no deletes?
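
To make the point concrete, here is a minimal, self-contained sketch in Java.
It is a toy model and not Lucene code: `Segment`, `ForcedMergePolicy`,
`ALWAYS`, `GUARDED` and `forceMerge` are hypothetical names made up for this
example, and the loop only mimics the idea that the policy is asked again
after each merge finishes. The iteration cap exists only so that the
"always merge" variant terminates in the demo.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: none of these types are Lucene classes. */
final class ForcedMergeLoopSketch {

  /** A segment described by its live and deleted document counts. */
  static final class Segment {
    final int liveDocs;
    final int deletedDocs;

    Segment(int liveDocs, int deletedDocs) {
      this.liveDocs = liveDocs;
      this.deletedDocs = deletedDocs;
    }
  }

  /** Hypothetical stand-in for a forced-merge policy; null means "nothing to merge". */
  interface ForcedMergePolicy {
    List<Segment> findForcedMerge(List<Segment> segments);
  }

  /** Always proposes merging everything, even a single segment without deletes. */
  static final ForcedMergePolicy ALWAYS = segments -> new ArrayList<>(segments);

  /** Proposes nothing once a single segment with no deletes remains. */
  static final ForcedMergePolicy GUARDED = segments -> {
    if (segments.size() == 1 && segments.get(0).deletedDocs == 0) {
      return null;
    }
    return new ArrayList<>(segments);
  };

  /** Asks the policy again after every merge; returns the number of merges run (at most maxRounds). */
  static int forceMerge(List<Segment> index, ForcedMergePolicy policy, int maxRounds) {
    List<Segment> segments = new ArrayList<>(index);
    int merges = 0;
    while (merges < maxRounds) {
      List<Segment> toMerge = policy.findForcedMerge(segments);
      if (toMerge == null || toMerge.isEmpty()) {
        break; // the policy has nothing left to do
      }
      int live = 0;
      for (Segment s : toMerge) {
        live += s.liveDocs;
      }
      segments.removeAll(toMerge);
      segments.add(new Segment(live, 0)); // merging drops the deleted docs
      merges++;
    }
    return merges;
  }

  public static void main(String[] args) {
    List<Segment> index = new ArrayList<>();
    index.add(new Segment(10, 2));
    index.add(new Segment(5, 0));
    index.add(new Segment(7, 1));
    System.out.println("guarded policy merges: " + forceMerge(index, GUARDED, 100));
    System.out.println("always policy merges (capped): " + forceMerge(index, ALWAYS, 100));
  }
}
```

In this model the guarded policy stops after one merge, while the
always-merge policy keeps going until it hits the cap.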

On Mon, Jun 20, 2022 at 1:30 PM Bruno Roustant 
wrote:

> If I use a simple "AlwaysForceMergePolicy" in a test, I can see that when
> I run IndexWriter.forceMerge(), the first call to
> AlwaysForceMergePolicy.findForcedMerges() is done for
> MergeTrigger.EXPLICIT. But then, at IndexWriter.merge() line 4531,
> MergePolicy.findForcedMerges() is called with MergeTrigger.MERGE_FINISHED
> to merge the segments produced by the first explicit forced
> merge, and so on. For this degenerate AlwaysForceMergePolicy, the test
> runs merges in an infinite loop.
>
> On Mon, Jun 20, 2022 at 11:11, Adrien Grand wrote:
>
>> You seem to imply that `forceMerge` runs a cascaded merge where the first
>> merge creates some new segments that become inputs to a second merge. Have
>> you considered running a single merge? We had a discussion about cascaded
>> forced merges and TieredMergePolicy last year and ended up changing
>> `findForcedMerges` to never run cascaded merges:
>> https://issues.apache.org/jira/browse/LUCENE-7020.
>>
>> On Mon, Jun 20, 2022 at 10:31 AM Bruno Roustant 
>> wrote:
>>
>>> MergePolicy "find merges" methods take a MergeTrigger as parameter,
>>> except findForcedMerges() and findForcedDeletesMerges().
>>> In my use-case, I could leverage a MergeTrigger in findForcedMerges(),
>>> which can be EXPLICIT or MERGE_FINISHED, to differentiate the merge
>>> selection between the initial explicit call and the subsequent calls
>>> triggered after the first merges.
>>>
>>> Should we add a MergeTrigger parameter to all MergePolicy "find merges"
>>> methods for consistency?
>>> If so, is it an internal or public API? (should this change stay in the
>>> main branch only)
>>>
>>
>>
>> --
>> Adrien
>>
>

-- 
Adrien


Re: [RESULT] [VOTE] Migration to GitHub issue from Jira

2022-06-20 Thread Michael Sokolov
I think the user mapping must be inferred based on membership in the
Apache "organization" https://github.com/settings/organizations

On Sun, Jun 19, 2022 at 2:45 AM Dawid Weiss  wrote:
>
>
>> User id mapping is an important consideration for me.
>
>
> Some mapping has to be present somewhere already. Even very old git commits 
> point at the right people. Perhaps it's based on e-mail addresses or 
> something?
>
> https://github.com/apache/lucene/commit/5a2615650e104c0713407637d65ae0ce7c2b257a
>
> When the user isn't available, github just shows the nick, without the link.
>
> https://github.com/apache/lucene/commit/89a554ffab239c0118ccd454d76cdf714d793911
>
> Maybe infra could help with how it's done already for git integration.
>
> Dawid




Re: MergeTrigger consistency in MergePolicy "find merges"

2022-06-20 Thread Bruno Roustant
If I use a simple "AlwaysForceMergePolicy" in a test, I can see that when I
run IndexWriter.forceMerge(), the first call to
AlwaysForceMergePolicy.findForcedMerges() is done for
MergeTrigger.EXPLICIT. But then, at IndexWriter.merge() line 4531,
MergePolicy.findForcedMerges() is called with MergeTrigger.MERGE_FINISHED
to merge the segments produced by the first explicit forced
merge, and so on. For this degenerate AlwaysForceMergePolicy, the test
runs merges in an infinite loop.

On Mon, Jun 20, 2022 at 11:11, Adrien Grand wrote:

> You seem to imply that `forceMerge` runs a cascaded merge where the first
> merge creates some new segments that become inputs to a second merge. Have
> you considered running a single merge? We had a discussion about cascaded
> forced merges and TieredMergePolicy last year and ended up changing
> `findForcedMerges` to never run cascaded merges:
> https://issues.apache.org/jira/browse/LUCENE-7020.
>
> On Mon, Jun 20, 2022 at 10:31 AM Bruno Roustant 
> wrote:
>
>> MergePolicy "find merges" methods take a MergeTrigger as parameter,
>> except findForcedMerges() and findForcedDeletesMerges().
>> In my use-case, I could leverage a MergeTrigger in findForcedMerges(),
>> which can be EXPLICIT or MERGE_FINISHED, to differentiate the merge
>> selection between the initial explicit call and the subsequent calls
>> triggered after the first merges.
>>
>> Should we add a MergeTrigger parameter to all MergePolicy "find merges"
>> methods for consistency?
>> If so, is it an internal or public API? (should this change stay in the
>> main branch only)
>>
>
>
> --
> Adrien
>


Re: Plan for GitHub issue metadata management

2022-06-20 Thread Tomoko Uchida
I haven't used the "project" feature either - maybe it could be an
option but I can't have an opinion on it. Is there anyone who has
experience with it and wants to lead us to use it?

Tomoko

On Mon, Jun 20, 2022 at 18:59, Jens Wille wrote:
>
> Hi,
>
> I'm just a bystander here. But are you aware that the new projects (beta)
> includes support for custom fields?
>
> 
>
> I haven't used them myself yet, but it seems that they might be a viable
> alternative to modeling everything with labels (which is more of a crutch I
> suppose).
>
> Cheers,
> Jens
>
>



Re: Plan for GitHub issue metadata management

2022-06-20 Thread Jens Wille
Hi,

I'm just a bystander here. But are you aware that the new projects (beta)
includes support for custom fields?



I haven't used them myself yet, but it seems that they might be a viable
alternative to modeling everything with labels (which is more of a crutch I
suppose).

Cheers,
Jens





Re: Plan for GitHub issue metadata management

2022-06-20 Thread Tomoko Uchida
Indeed versions are the most important metadata; I'd like to hear the
thoughts of others.

1. Fix Version(s)

We have only two options: Milestone or Label. One important difference
between them is that an issue can have only one milestone but multiple
labels. The other difference would be that while Milestone is special
metadata, labels are just flexible text tags for searching. I'm
personally fine with Milestone; my reasoning is that we don't
release a bug fix or improvement in multiple versions anyway. We don't
have two CHANGES entries for one issue; if we resolve an issue in
"10.0.0" and "9.3.0", the CHANGES entry appears only in Lucene 9.3.0's
section.
However, I don't have a strong opinion on it and will go with others'
suggestions - should we continue to support multiple "fixed versions"
with labels (tags)?
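
As a rough illustration of the single-milestone option, the sketch below
picks one milestone from several Jira fix versions by taking the lowest
version, which mirrors where the CHANGES entry would appear in the example
above. This is only a sketch under assumptions: the class and method names
are hypothetical, it is not part of any migration script, and it handles
only purely numeric dotted versions (a value such as "9.x" or "main" would
need separate handling).

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of any existing migration tooling. */
final class MilestoneFromFixVersionsSketch {

  /** Compares dotted numeric versions component by component, e.g. "9.3.0" < "10.0.0". */
  static int compareVersions(String a, String b) {
    String[] x = a.split("\\.");
    String[] y = b.split("\\.");
    for (int i = 0; i < Math.max(x.length, y.length); i++) {
      // Missing components count as 0, so "9.3" equals "9.3.0".
      int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
      int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
      if (xi != yi) {
        return Integer.compare(xi, yi);
      }
    }
    return 0;
  }

  /** Returns the lowest fix version as the milestone, or null if there is none. */
  static String milestoneFor(List<String> fixVersions) {
    return fixVersions.stream()
        .min(MilestoneFromFixVersionsSketch::compareVersions)
        .orElse(null);
  }

  public static void main(String[] args) {
    System.out.println(milestoneFor(Arrays.asList("10.0.0", "9.3.0")));
  }
}
```

The comparison is numeric on purpose: a plain string comparison would sort
"10.0.0" before "9.3.0".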

2. Affects Version(s)

I have never used this metadata field but I'm perfectly fine with
supporting this with labels if we need it.

Tomoko

On Mon, Jun 20, 2022 at 18:40, Shai Erera wrote:
>
> Can we support "Affects Versions" with a label too? "affectsVersion: 8.x"?
>
> Regarding Fix Versions, don't we have multiple of these sometimes? E.g. a bug 
> fix may go into "8.1", "9.x" and "main"? Is it OK if we just drop support for 
> this?
>
> On Mon, Jun 20, 2022 at 12:33 PM Tomoko Uchida  
> wrote:
>>
>> Hello all.
>>
>> Apart from the question of whether existing issues should be migrated
>> (we have not yet reached an agreement on it), I have started to play around
>> with GitHub issue metadata in a test repository.
>>
>> The current migration plan in my mind:
>>
>> * Issue Type -> Supported with labels (e.g. "type:bug"); it also can
>> be attached when opening issues with issue templates.
>> * Issue Priority -> Not supported.
>> * Affects Versions -> Not supported.
>> * Components -> Supported with labels (e.g.: "module:core").
>> * Resolution -> Not supported.
>> * Fix Version(s) -> Partially supported with Milestone; an issue can
>> have only one milestone - I'm fine with it.
>>
>> As you may see, I'm going to drop most of the metadata that is
>> supported in Jira for the sake of simplicity. If you have objections or
>> other perspectives, please speak up.
>>
>> Tomoko
>>



Re: Plan for GitHub issue metadata management

2022-06-20 Thread Shai Erera
Can we support "Affects Versions" with a label too? "affectsVersion: 8.x"?

Regarding Fix Versions, don't we have multiple of these sometimes? E.g. a
bug fix may go into "8.1", "9.x" and "main"? Is it OK if we just drop
support for this?

On Mon, Jun 20, 2022 at 12:33 PM Tomoko Uchida 
wrote:

> Hello all.
>
> Apart from the question of whether existing issues should be migrated
> (we have not yet reached an agreement on it), I have started to play around
> with GitHub issue metadata in a test repository.
>
> The current migration plan in my mind:
>
> * Issue Type -> Supported with labels (e.g. "type:bug"); it also can
> be attached when opening issues with issue templates.
> * Issue Priority -> Not supported.
> * Affects Versions -> Not supported.
> * Components -> Supported with labels (e.g.: "module:core").
> * Resolution -> Not supported.
> * Fix Version(s) -> Partially supported with Milestone; an issue can
> have only one milestone - I'm fine with it.
>
> As you may see, I'm going to drop most of the metadata that is
> supported in Jira for the sake of simplicity. If you have objections or
> other perspectives, please speak up.
>
> Tomoko
>


Plan for GitHub issue metadata management

2022-06-20 Thread Tomoko Uchida
Hello all.

Apart from the question of whether existing issues should be migrated
(we have not yet reached an agreement on it), I have started to play around
with GitHub issue metadata in a test repository.

The current migration plan in my mind:

* Issue Type -> Supported with labels (e.g. "type:bug"); it also can
be attached when opening issues with issue templates.
* Issue Priority -> Not supported.
* Affects Versions -> Not supported.
* Components -> Supported with labels (e.g.: "module:core").
* Resolution -> Not supported.
* Fix Version(s) -> Partially supported with Milestone; an issue can
have only one milestone - I'm fine with it.
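
The plan above could be sketched roughly as follows. This is only an
illustration under assumptions, not an actual migration script: the class,
the `map` and `slug` helpers and the `GitHubMetadata` holder are hypothetical
names, the label prefixes are taken from the examples in the list, and
keeping the first listed fix version as the milestone is an arbitrary choice
made for the example. Priority, Affects Versions and Resolution are simply
not carried over.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical mapping sketch; not an existing migration tool. */
final class JiraToGitHubMetadataSketch {

  /** GitHub-side metadata: any number of labels, at most one milestone. */
  static final class GitHubMetadata {
    final List<String> labels = new ArrayList<>();
    String milestone; // stays null when the Jira issue has no fix version
  }

  /** Lower-cases a Jira value and replaces spaces so it can be used inside a label. */
  static String slug(String value) {
    return value.trim().toLowerCase(Locale.ROOT).replace(' ', '-');
  }

  static GitHubMetadata map(String issueType, List<String> components, List<String> fixVersions) {
    GitHubMetadata out = new GitHubMetadata();
    // Issue Type -> label such as "type:bug"
    if (issueType != null && !issueType.trim().isEmpty()) {
      out.labels.add("type:" + slug(issueType));
    }
    // Components -> labels such as "module:core"
    for (String component : components) {
      out.labels.add("module:" + slug(component));
    }
    // Fix Version(s) -> a single milestone (assumption: keep the first one listed)
    if (!fixVersions.isEmpty()) {
      out.milestone = fixVersions.get(0);
    }
    return out;
  }

  public static void main(String[] args) {
    GitHubMetadata m = map("Bug", Arrays.asList("core"), Arrays.asList("9.3.0", "10.0.0"));
    System.out.println(m.labels + " milestone=" + m.milestone);
  }
}
```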

As you may see, I'm going to drop most of the metadata that is
supported in Jira for the sake of simplicity. If you have objections or
other perspectives, please speak up.

Tomoko




Re: MergeTrigger consistency in MergePolicy "find merges"

2022-06-20 Thread Adrien Grand
You seem to imply that `forceMerge` runs a cascaded merge where the first
merge creates some new segments that become inputs to a second merge. Have
you considered running a single merge? We had a discussion about cascaded
forced merges and TieredMergePolicy last year and ended up changing
`findForcedMerges` to never run cascaded merges:
https://issues.apache.org/jira/browse/LUCENE-7020.

On Mon, Jun 20, 2022 at 10:31 AM Bruno Roustant 
wrote:

> MergePolicy "find merges" methods take a MergeTrigger as parameter, except
> findForcedMerges() and findForcedDeletesMerges().
> In my use-case, I could leverage a MergeTrigger in findForcedMerges(),
> which can be EXPLICIT or MERGE_FINISHED, to differentiate the merge
> selection between the initial explicit call and the subsequent calls
> triggered after the first merges.
>
> Should we add a MergeTrigger parameter to all MergePolicy "find merges"
> methods for consistency?
> If so, is it an internal or public API? (should this change stay in the
> main branch only)
>


-- 
Adrien


MergeTrigger consistency in MergePolicy "find merges"

2022-06-20 Thread Bruno Roustant
MergePolicy "find merges" methods take a MergeTrigger as parameter, except
findForcedMerges() and findForcedDeletesMerges().
In my use-case, I could leverage a MergeTrigger in findForcedMerges(),
which can be EXPLICIT or MERGE_FINISHED, to differentiate the merge
selection between the initial explicit call and the subsequent calls
triggered after the first merges.

Should we add a MergeTrigger parameter to all MergePolicy "find merges"
methods for consistency?
If so, is it an internal or public API? (should this change stay in the
main branch only)
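
To make the use-case a bit more concrete, here is a small self-contained
sketch in Java of what a trigger-aware forced-merge selection might look
like. It is purely illustrative: the `Trigger` enum, the method signature
and the selection rule are assumptions made up for this example (segments
are modelled as plain sizes), and none of it is the current MergePolicy
API. In this sketch the explicit call selects a merge, while the call made
after a merge finishes selects nothing, so the forced merge does not
cascade.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model only: the enum and the signature below are hypothetical, not Lucene's API. */
final class TriggerAwareForcedMergeSketch {

  /** Hypothetical subset of triggers relevant to the forced-merge discussion. */
  enum Trigger { EXPLICIT, MERGE_FINISHED }

  /**
   * Returns the merges to run, each merge being a list of segment sizes.
   * An empty result means there is nothing to merge.
   */
  static List<List<Integer>> findForcedMerges(
      List<Integer> segmentSizes, int maxSegmentCount, Trigger trigger) {
    List<List<Integer>> merges = new ArrayList<>();
    if (trigger == Trigger.MERGE_FINISHED) {
      return merges; // follow-up call after a merge: do not cascade
    }
    if (segmentSizes.size() <= maxSegmentCount) {
      return merges; // already at or below the requested segment count
    }
    // Merge the smallest segments so that maxSegmentCount segments remain.
    List<Integer> sorted = new ArrayList<>(segmentSizes);
    Collections.sort(sorted);
    int toMerge = sorted.size() - maxSegmentCount + 1;
    merges.add(new ArrayList<>(sorted.subList(0, toMerge)));
    return merges;
  }

  public static void main(String[] args) {
    List<Integer> sizes = Arrays.asList(5, 1, 3, 2);
    System.out.println(findForcedMerges(sizes, 2, Trigger.EXPLICIT));
    System.out.println(findForcedMerges(sizes, 2, Trigger.MERGE_FINISHED));
  }
}
```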