Re: GDPR compliance

2023-11-29 Thread Ilan Ginzburg
To the valid point Robert makes above about the underlying data still on
the disk (old news):
https://news.sophos.com/en-us/2022/09/23/morgan-stanley-fined-millions-for-selling-off-devices-full-of-customer-pii/

On Wed, Nov 29, 2023 at 11:01 AM Michael Sokolov  wrote:

> Another way is to ensure that all documents get updated on a regular
> cadence whether there are changes in the underlying data or not. Or,
> regenerating the index from scratch all the time. Of course these
> approaches might be more costly for an index that has intrinsically low
> update rates, but they do keep the index fresh without the need for any
> special tracking.
>
> On Tue, Nov 28, 2023, 8:45 PM Patrick Zhai  wrote:
>
>> It's not that insane; it's about several weeks. However, the big segment
>> can stay there for quite long if there aren't enough updates for a merge
>> policy to pick it up.
>>
>> On Tue, Nov 28, 2023, 17:14 Dongyu Xu  wrote:
>>
>>> What is the expected grace time for the data-deletion request to take
>>> place?
>>>
>>> I'm not an expert on the policy, but I think something like "I need my
>>> data to be gone in the next 2 seconds" is unreasonable.
>>>
>>> Tony X
>>>
>>> --
>>> *From:* Robert Muir 
>>> *Sent:* Tuesday, November 28, 2023 11:52 AM
>>> *To:* dev@lucene.apache.org 
>>> *Subject:* Re: GDPR compliance
>>>
>>> I don't think there's any problem with GDPR, and I don't think users
>>> should be running unnecessary "optimize". GDPR just says data should
>>> be erased without "undue" delay. Waiting for a merge to nuke the
>>> deleted docs isn't "undue"; there is a good reason for it.
>>>
>>> On Tue, Nov 28, 2023 at 2:40 PM Patrick Zhai  wrote:
>>> >
>>> > Hi Folks,
>>> > At LinkedIn we need to comply with GDPR for a large part of our data,
>>> and an important part of it is that we need to be sure we have completely
>>> deleted the data the user requested to delete within a certain period of
>>> time.
>>> > The way we have come up with so far is to:
>>> > 1. Record the segment creation time somewhere (not decided yet, maybe
>>> index commit userinfo, maybe some other place outside of lucene)
>>> > 2. Create a new merge policy which delegates most operations to a
>>> normal MP, like TieredMergePolicy, and then add extra single-segment (merge
>>> from 1 segment to 1 segment, basically only do deletion) merges if it finds
>>> any segment is about to violate the GDPR time frame.
>>> >
>>> > So here are my questions:
>>> > 1. Is there a better/existing way to do this?
>>> > 2. I would like to contribute such a merge policy directly to Lucene
>>> since I think GDPR compliance is more or less a common thing. Would like to
>>> know whether people feel it's necessary or not?
>>> > 3. It would also be nice if we could store the segment creation time in
>>> the index directly via IndexWriter (maybe write to SegmentInfo?). I can try
>>> to do that but would like to ask whether there are any objections?
>>> >
>>> > Best
>>> > Patrick
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>
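A rough illustration of Patrick's point 2 above: a merge policy that delegates normal merging to TieredMergePolicy and adds singleton merges (one segment in, one segment out) for segments that still carry deletes past the retention window. This is a hedged sketch, not an existing Lucene class; the CreationTimeSource hook is hypothetical (the creation time would come from wherever point 1 ends up recording it), and a real implementation would also need to skip segments the delegate already selected.

import java.io.IOException;
import java.util.Collections;

import org.apache.lucene.index.FilterMergePolicy;
import org.apache.lucene.index.MergeTrigger;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.index.TieredMergePolicy;

/**
 * Sketch of a "GDPR" merge policy: normal merging is delegated to
 * TieredMergePolicy, and segments older than a retention window that still
 * carry deletes get a singleton merge so deleted documents are physically
 * rewritten away in time.
 */
public class ExpireDeletesMergePolicy extends FilterMergePolicy {

  /** Hypothetical hook: how a segment's creation time is looked up
   *  (e.g. from commit userData or an external store); not a Lucene API. */
  public interface CreationTimeSource {
    long creationTimeMillis(SegmentCommitInfo info);
  }

  private final long maxAgeMillis;
  private final CreationTimeSource creationTimes;

  public ExpireDeletesMergePolicy(long maxAgeMillis, CreationTimeSource creationTimes) {
    super(new TieredMergePolicy());
    this.maxAgeMillis = maxAgeMillis;
    this.creationTimes = creationTimes;
  }

  @Override
  public MergeSpecification findMerges(MergeTrigger trigger, SegmentInfos infos, MergeContext ctx)
      throws IOException {
    // Let the delegate pick its usual merges first.
    MergeSpecification spec = super.findMerges(trigger, infos, ctx);
    long now = System.currentTimeMillis();
    for (SegmentCommitInfo sci : infos) {
      boolean expired = now - creationTimes.creationTimeMillis(sci) > maxAgeMillis;
      boolean hasDeletes = sci.getDelCount() > 0;
      boolean alreadyMerging = ctx.getMergingSegments().contains(sci);
      if (expired && hasDeletes && !alreadyMerging) {
        if (spec == null) {
          spec = new MergeSpecification();
        }
        // Rewrite just this one segment; the merge drops its deleted docs.
        // (A full implementation should also avoid re-adding segments the
        // delegate already put into 'spec' above.)
        spec.add(new OneMerge(Collections.singletonList(sci)));
      }
    }
    return spec;
  }
}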


Re: GDPR compliance

2023-11-28 Thread Ilan Ginzburg
Are larger and older segments even certain to ever be merged in practice? I
was assuming that if there is not a lot of new indexed content and not a
lot of older documents being deleted, large older segments might never have
to be merged.


On Tue 28 Nov 2023 at 20:53, Robert Muir  wrote:

> I don't think there's any problem with GDPR, and I don't think users
> should be running unnecessary "optimize". GDPR just says data should
> be erased without "undue" delay. Waiting for a merge to nuke the
> deleted docs isn't "undue"; there is a good reason for it.
>
> On Tue, Nov 28, 2023 at 2:40 PM Patrick Zhai  wrote:
> >
> > Hi Folks,
> > At LinkedIn we need to comply with GDPR for a large part of our data,
> and an important part of it is that we need to be sure we have completely
> deleted the data the user requested to delete within a certain period of
> time.
> > The way we have come up with so far is to:
> > 1. Record the segment creation time somewhere (not decided yet, maybe
> index commit userinfo, maybe some other place outside of lucene)
> > 2. Create a new merge policy which delegates most operations to a normal
> MP, like TieredMergePolicy, and then add extra single-segment (merge from 1
> segment to 1 segment, basically only do deletion) merges if it finds any
> segment is about to violate the GDPR time frame.
> >
> > So here are my questions:
> > 1. Is there a better/existing way to do this?
> > 2. I would like to contribute such a merge policy directly to Lucene
> since I think GDPR compliance is more or less a common thing. Would like to
> know whether people feel it's necessary or not?
> > 3. It would also be nice if we could store the segment creation time in the
> index directly via IndexWriter (maybe write to SegmentInfo?). I can try to do
> that but would like to ask whether there are any objections?
> >
> > Best
> > Patrick
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-09 Thread Ilan Ginzburg
Jira: ilan
GitHub: murblanc

(used to have murblanc as Jira id as well and changed to Ilan when I became
a Solr/Lucene committer).

On Tue, Aug 9, 2022, 5:36 PM Michael McCandless 
wrote:

> OK, added!  Thanks:
>
>
> https://github.com/apache/lucene-jira-archive/commit/efd7c510749ada4841eed714a5f7f817bb11c8cf
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Mon, Aug 8, 2022 at 1:09 PM Walter Underwood 
> wrote:
>
>> JiraName,GitHubAccount,JiraDispName
>> wunder,wrunderwood,Walter Underwood
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>> On Aug 6, 2022, at 3:05 AM, Michael McCandless 
>> wrote:
>>
>> Hi Adam, I added your linked accounts here:
>>
>> https://github.com/apache/lucene-jira-archive/commit/c228cb184c073f4b96cd68d45a000cf390455b7c
>>
>> And Tomoko added Rushabh's linked accounts here:
>>
>>
>> https://github.com/apache/lucene-jira-archive/commit/6f9501ec68792c1b287e93770f7a9dfd351b86fb
>>
>> Keep the linked accounts coming!
>>
>> Mike
>>
>> On Thu, Aug 4, 2022 at 7:02 PM Rushabh Shah <
>> rushabh.s...@salesforce.com.invalid> wrote:
>>
>>> Hi,
>>> My mapping is:
>>> JiraName,GitHubAccount,JiraDispName
>>> shahrs87, shahrs87, Rushabh Shah
>>>
>>> Thank you Tomoko and Mike for all of your hard work.
>>>
>>>
>>>
>>>
>>> On Sun, Jul 31, 2022 at 3:08 AM Michael McCandless <
>>> luc...@mikemccandless.com> wrote:
>>>
 Hello Lucene users, contributors and developers,

 If you have used Lucene's Jira and you have a GitHub account as well,
 please check whether your user id mapping is in this file:
 https://github.com/apache/lucene-jira-archive/blob/main/migration/mappings-data/account-map.csv.20220722.verified

 If not, please reply to this email and we will try to add you.

 Please forward this email to anyone you know might be impacted and who
 might not be tracking the Lucene lists.


 Full details:

 The Lucene project will soon migrate from Jira to GitHub for issue
 tracking.

 There have been discussions, votes, a migration tool created / iterated
 (thanks to Tomoko Uchida's incredibly hard work), all iterating on Lucene's
 dev list.

 When we run the migration, we would like to map Jira users to the right
 GitHub users to properly @-mention the right person and make it easier for
 you to find issues you have engaged with.

 Mike McCandless

 http://blog.mikemccandless.com

>>> --
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>>


Re: Lucene PMC Chair Bruno Roustant

2022-03-24 Thread Ilan Ginzburg
Congrats Bruno!

Ilan

On Wed, Mar 23, 2022 at 10:47 PM Michael McCandless <
luc...@mikemccandless.com> wrote:

> Yes thank you Mike for handling all the fun PMC and Board issues for the
> past year!!  And thank you Bruno for the next year!!
>
> A year is a long time in a human life but it unfortunately and
> surprisingly passes very quickly.
>
> Mike
>
> On Wed, Mar 23, 2022 at 5:40 PM Martin Gainty  wrote:
>
>> Welcome Bruno !
>>
>> -martin
>> --
>> *From:* Adrien Grand 
>> *Sent:* Wednesday, March 23, 2022 9:20 AM
>> *To:* Lucene Dev 
>> *Subject:* Re: Lucene PMC Chair Bruno Roustant
>>
>> Congratulations Bruno!
>>
>> On Wed, Mar 23, 2022 at 2:03 PM Michael Sokolov 
>> wrote:
>> >
>> > Hello, Lucene developers. Lucene Program Management Committee has
>> > elected a new chair, Bruno Roustant, and the Board has approved.
>> > Bruno, thank you for stepping up, and congratulations!
>> >
>> > -Mike
>> >
>> > -
>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: dev-h...@lucene.apache.org
>> >
>>
>>
>> --
>> Adrien
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>> --
> Mike McCandless
>
> http://blog.mikemccandless.com
>


Re: Welcome Haoyu (Patrick) Zhai as Lucene Committer

2021-12-19 Thread Ilan Ginzburg
Welcome Patrick and congrats!

On Sun, Dec 19, 2021 at 10:36 PM Michael Sokolov  wrote:
>
> Welcome Patrick!
>
> On Sun, Dec 19, 2021 at 3:27 PM Xi Chen  wrote:
> >
> > Congratulations and welcome Haoyu!
> >
> > Best,
> > Zach
> >
> > On Dec 19, 2021, at 12:05 PM, Patrick Zhai  wrote:
> >
> > 
> > Thanks everyone!
> >
> > It's a great honor to become a Lucene committer. Thank you everyone for
> > building such a friendly community, and a special thank you to those who have
> > replied to emails, commented on issues, or reviewed PRs related to my work. It has
> > been an enjoyable experience working with the Lucene community and I'm looking
> > forward to learning more about Lucene as well as contributing more to the
> > community as a committer.
> >
> > A little bit about myself: I'm currently living in Mountain View and
> > working at Amazon Search, on the same team as Mike McCandless, Mike Sokolov
> > and Greg. Besides digging into Lucene for some work-related projects, I am
> > also very curious about how some fancy stuff is implemented inside
> > Lucene, and that is probably one of the reasons that drove me to become a committer.
> > Besides programming, I enjoy playing video games a lot; I admire
> > well-designed games and hope I can participate in game development one
> > day. As for outdoor activities, I like skiing and traveling; due to COVID
> > it's still hard to travel around, but I hope things will be better next year.
> >
> > Thank you again!
> > Patrick
> >
> > David Smiley  于2021年12月19日周日 09:14写道:
> >>
> >> Congratulations Haoyu!
> >>
> >> ~ David Smiley
> >> Apache Lucene/Solr Search Developer
> >> http://www.linkedin.com/in/davidwsmiley
> >>
> >>
> >> On Sun, Dec 19, 2021 at 4:12 AM Dawid Weiss  wrote:
> >>>
> >>> Hello everyone!
> >>>
> >>> Please welcome Haoyu Zhai as the latest Lucene committer. You may also
> >>> know Haoyu as Patrick - this is perhaps his kind gesture to those of
> >>> us whose tongues are less flexible in pronouncing difficult first
> >>> names. :)
> >>>
> >>> It's a tradition to briefly introduce yourself to the group, Patrick.
> >>> Welcome and thank you!
> >>>
> >>> Dawid
> >>>
> >>> -
> >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >>> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: javadoc fails with no message

2021-08-06 Thread Ilan Ginzburg
Did you create the javadoc package file? Usually its absence leads to
cryptic errors...

On Fri, Aug 6, 2021 at 6:44 PM Michael Sokolov  wrote:

> Hi all, does anybody have helpful tips about  how to chase down
> javadoc build failures? I made some new stuff, and ./gradlew test
> passes, but ./gradlew check fails with this less than fully
> explanatory message:
>
> $ JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64/ ./gradlew check
> > Task :lucene:core:renderJavadoc FAILED
>
> FAILURE: Build failed with an exception.
>
> * Where:
> Script '/home/ANT.AMAZON.COM/sokolovm/workspace/lucene/gradle/documentation/render-javadoc.gradle'
> line: 450
>
> * What went wrong:
> Execution failed for task ':lucene:core:renderJavadoc'.
> > No value present
>
> * Try:
> Run with --stacktrace option to get the stack trace. Run with --info
> or --debug option to get more log output. Run with --scan to get full
> insights.
>
> * Get more help at https://help.gradle.org
>
> BUILD FAILED in 3s
> 44 actionable tasks: 17 executed, 27 up-to-date
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>
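For context, the "javadoc package file" Ilan mentions is usually the per-package package-info.java (Lucene's javadoc checks expect package-level docs, and a missing one tends to fail with unhelpful messages, echoing the cryptic error above). A minimal one, with an illustrative package name, looks like:

/**
 * One-sentence description of what the classes in this package do. Without a
 * package-info.java like this, package-level javadoc checks typically fail.
 */
package org.apache.lucene.example; // illustrative package name, not a real Lucene package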


Re: Welcome Greg Miller as Lucene committer

2021-05-31 Thread Ilan Ginzburg
Congrats Greg!

On Sun, May 30, 2021 at 4:35 PM Greg Miller  wrote:

> Thanks everyone! I'm honored to have been nominated and look forward
> to continuing to work with all of you on Lucene! I'm incredibly
> grateful for everyone that has helped me so far. There's a lot to
> learn in Lucene and this community has been a fantastic help ramping
> up, providing thorough PR feedback/ideas/etc. and simply been a great
> group of people to collaborate with.
>
> As far as a brief bio goes, I live in the Seattle area and work for
> Amazon's "Product Search" team, which I joined in January of this
> year. I'm a naturally curious person and find myself fascinated by
> data structure / algorithm problems, so diving into Lucene has been
> really fun! I'm also an avid runner (mostly marathons but right now
> I'm training for my first one-mile race on a track), and love to
> travel with my wife and daughter (although that's been on "pause" for
> obvious reasons for the past year+). My biggest accomplishment of 2021
> so far has been teaching my daughter to ride a bike, but being
> nominated as a Lucene committer is a close second :)
>
> Thanks again everyone and looking forward to continuing to work with all
> of you!
>
> Cheers,
> -Greg
>
> On Sat, May 29, 2021 at 7:59 PM Michael McCandless
>  wrote:
> >
> > Welcome Greg!
> >
> > Mike
> >
> > On Sat, May 29, 2021 at 3:47 PM Adrien Grand  wrote:
> >>
> >> I'm pleased to announce that Greg Miller has accepted the PMC's
> invitation to become a committer.
> >>
> >> Greg, the tradition is that new committers introduce themselves with a
> brief bio.
> >>
> >> Congratulations and welcome!
> >>
> >>
> >> --
> >> Adrien
> >
> > --
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: Separate git repo(s) for Solr modules

2021-05-04 Thread Ilan Ginzburg
As with any dependency on any project, you update the dependency project
first then consume the updated dependency in Solr.

If the idea is to be able to modify Lucene and Solr in parallel, then the
project split is counterproductive.

From the Solr perspective, Lucene and ZooKeeper are really two “similar”
dependencies and IMO we should think about them in that way.

Ilan

On Tue 4 May 2021 at 09:45, Noble Paul  wrote:

> @Houston
>
> So, Are you suggesting we should not do that ?
>
> On Tue, May 4, 2021 at 2:35 PM Houston Putman 
> wrote:
>
>> In the future we won't be able to “work on both at the same time”, once
>> Lucene 9 is cut. Why not pull that bandaid now?
>>
>> On Mon, May 3, 2021 at 11:32 PM Noble Paul  wrote:
>>
>>> I'm still struggling to understand the workflow when I'm working on a
>>> feature that spans lucene and solr.
>>>
>>> I'm yet to see an argument against sub-modules
>>>
>>> On Wed, Feb 17, 2021 at 3:18 AM Jason Gerlowski 
>>> wrote:
>>>
 > Shoving such a component into lucene-solr repo makes no sense, given
 its branching strategy is based on master / branch_8x

 I get how this could cause issues - I def hadn't thought much about
 multi-version support and branching.  But I don't think moving plugins
 to a separate repo solves that problem for us.  If our first class
 plugins are set up to have different release "lines" that don't line
 up with major Solr versions, it's only a matter of time before two of
 those plugins have branch requirements that conflict.  Unless I'm
 missing something here?

 > I'd prefer that a module only leave our "contribs" area when the
 concerns/limitations become real.  Doing it prematurely could lead to
 atrophy of the module

 +1 to David's comment.   I def hadn't considered the branching-scheme
 issues that Ishan brought up in his last reply, and they seem like
 valid concerns to me.  But the risk and the downsides of "atrophy" are
 serious enough that I'd vote to not risk them until this starts to
 cause us issues in practice.  Even if, for now, that means we won't be
 able to build a single plugin jar that supports (e.g.) 3 major Solr
 versions.  IMO that's a much smaller loss.

 On Tue, Feb 16, 2021 at 9:40 AM David Smiley 
 wrote:
 >
 > On Tue, Feb 16, 2021 at 8:38 AM Eric Pugh <
 ep...@opensourceconnections.com> wrote:
 >>
 >> Testing across multiple versions is always very difficult ;-).  I
 recently saw this very interesting approach to using our Dockerized Solr’s
 to test a component against a number of previous versions of Solr.
 https://github.com/querqy/querqy/pull/147. I’m hopeful it could be a
 model for other packages to adopt.
 >
 >
 > Thanks for the link to that Querqy PR.  That is *very* similar to
 what I do at work (minus multi-version testing), and also similar to how I
 test multiple versions in one of my pet projects by using a CI build matrix
 of a configurable dependency.  I didn't know Testcontainer.org has its own
 Solr module -- https://www.testcontainers.org/modules/solr/ but we
 implemented that ourselves; not hard.
 >
 >>
 >> Trying to maintain across multiple versions is kind of a Sisyphean
 task, and I don’t think plays to open source communities strengths.   It’s
 hard enough to keep all components of Solr up to date with the latest
 Lucene and the latest Solr….  (Try out wt=xlsx Response Writer, it’s
 currently broken on master) .  Reminds me of the Apache Gump project ;-)
 >>
 >> If something is really going to be backcompatible across certain
 versions, then maybe having it’s own repo makes sense,
 >
 >
 > I'd prefer that a module only leave our "contribs" area when the
 concerns/limitations become real.  Doing it prematurely could lead to
 atrophy of the module
 >
 >>
 >> but I suspect it means those components may go stale….   Look at DIH
 and Velocity components that are moved over to their own repos, both are
 getting stale, and I’d argue it’s because they don’t live in the main
 project where all of us have oversight and easy access.
 >
 >
 > Agreed :-(
 >

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


>>>
>>> --
>>> -
>>> Noble Paul
>>>
>>
>
> --
> -
> Noble Paul
>


Re: Welcome Peter Gromov as Lucene committer

2021-04-06 Thread Ilan Ginzburg
Welcome Peter!

On Tue, Apr 6, 2021 at 7:48 PM Robert Muir  wrote:

> I'm pleased to announce that Peter Gromov has accepted the PMC's
> invitation to become a committer.
>
> Peter, the tradition is that new committers introduce themselves with a
> brief bio.
>
> Congratulations and welcome!
>
>


Re: Branch cleaning/ archiving

2021-03-10 Thread Ilan Ginzburg
Any risk in the script that the command:
git push ${REMOTE} cominvent/$BRANCH:refs/tags/history/branches/lucene-solr/$BRANCH
errors out in some exotic way (?) but the script continues anyway and
proceeds with the delete:
git push ${REMOTE} --delete $BRANCH


On Wed, Mar 10, 2021 at 10:35 PM Jan Høydahl  wrote:

> Ok, I took a stab at this
>
> We have 113 branches. Here is a script I prepared that will work directly
> on a git remote, first creating the tag then deleting the branch.
> https://gist.github.com/80a7eea6bacd4e32646a7958d1e9a870
>
> In the script I have added the list of branches that I propose to
> "archive".
>
> Below that there is a commented section of branches that are reported as
> "active" by GitHub that should probably stay for now
> Finally, we have some branches we might want to keep around for reference?
> I'm not sure how useful it is to keep a branch in solr.git for, say
> branch_8_8, as it will fall behind as branch_8_8 in lucene-solr.git gets
> updated. So probably archive those as well?
>
> The pure lucene branches like LUCENE-9004 won't need a tag at all I
> suppose, but it won't hurt either.
>
> Jan
>
> > 10. mar. 2021 kl. 21:16 skrev Dawid Weiss :
> >
> >> We already did this before, see list of existing tags (git tag -l)
> >
> > I know, I did it, after all... :) This was a move from subversion
> > though... slightly different. Anyway - if you guys want to proceed
> > with this, please go ahead, I don't mind. A spring cleaning is needed
> > every couple of years... If we just leave the main branch it'll be
> > very elegant. People work on their local repos these days anyway, it's
> > not like everyone pollutes the same workspace.
> >
> > D.
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: OverseerStatusTest recent failures

2021-02-21 Thread Ilan Ginzburg
I have fixed the issue. A PR is out
https://github.com/apache/lucene-solr/pull/2410/files.
Most of the work was documenting what stats are actually returned. Now
OverseerStatusCmd has more comment lines than code lines.

Will merge it shortly.

Ilan



On Sun, Feb 21, 2021 at 6:05 PM Ilan Ginzburg  wrote:

> Searching in my jenkins folder for failures of this test (label:jenkins
> "FAILED:  org.apache.solr.cloud.OverseerStatusTest.test") 26 emails match.
> Searching for all jenkins master builds emails since the first failure
> email found above (2 days ago), I see 40 messages.
> 26 over 40 is not far from the expected 50% failure rate.
> I believe the ratio in the graph you sent David (currently at 5.7%) is
> averaged over a week, and includes failures from all branches (did some
> other stats on jenkins emails that tend to confirm this assumption).
>
> On Sun, Feb 21, 2021 at 10:53 AM Ilan Ginzburg  wrote:
>
>> Yes Marcus this is the commit.
>>
>> David I would have expected 50% failures, as 50% of the runs use
>> distributed updates. I’ll try to understand better as I fix the issue.
>>
>> Ilan
>>
>> On Sun 21 Feb 2021 at 06:17, David Smiley  wrote:
>>
>>> Interesting.  Do you have a guess as to why the failures there are ~5%
>>> and not 100% reproducible?
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>>
>>> On Sat, Feb 20, 2021 at 6:41 PM Ilan Ginzburg 
>>> wrote:
>>>
>>>> Indeed the issue is due to my changes.
>>>>
>>>> In OverseerStatusCmd I've skipped some stat collection when running in
>>>> distributed cluster state updates mode because I thought these were only
>>>> stats related to cluster state updates.
>>>> Obviously that was too aggressive and some of the stats are related to
>>>> the Collection API.
>>>>
>>>> I will make sure to skip returning only the stats that are related to
>>>> cluster state updater and restore returning collection api stats (when
>>>> running in distributed cluster updates mode, otherwise all stats are
>>>> returned).
>>>>
>>>> Tomorrow...
>>>>
>>>> Ilan
>>>>
>>>> On Sun, Feb 21, 2021 at 12:22 AM Ilan Ginzburg 
>>>> wrote:
>>>>
>>>>> Thank you David for reporting this.
>>>>>
>>>>> Seems due to my recent changes. I reproduce the failure locally and
>>>>> will look at this tomorrow.
>>>>>
>>>>> With the distributed cluster state updates i've introduced a
>>>>> randomization for using either Overseer based cluster state updates or
>>>>> distributed cluster state updates in tests. This failure seems to happen 
>>>>> in
>>>>> the distributed state update case. I suspect it is due to Overseer
>>>>> returning less stats than expected by the test (which is expected: 
>>>>> Overseer
>>>>> cannot return stats about cluster state updates if it does not handle
>>>>> cluster state updates).
>>>>>
>>>>> The following line in the logs tells that the run is using distributed
>>>>> cluster state:
>>>>> 972874 INFO  (jetty-launcher-8973-thread-2) [ ]
>>>>> o.a.s.c.DistributedClusterStateUpdater Creating
>>>>> DistributedClusterStateUpdater with useDistributedStateUpdate=true. Solr
>>>>> will be using distributed cluster state updates.
>>>>>
>>>>> Ilan
>>>>>
>>>>>
>>>>> On Sat, Feb 20, 2021 at 3:00 PM David Smiley 
>>>>> wrote:
>>>>>
>>>>>> I encountered a failure from OverseerStatusTest locally.  According
>>>>>> to our test failure trends, this guy only just recently started failing
>>>>>> ~4-5% of the time, but previously was fine.  Only master branch.
>>>>>>
>>>>>>
>>>>>> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.OverseerStatusTest.test
>>>>>>
>>>>>> ~ David Smiley
>>>>>> Apache Lucene/Solr Search Developer
>>>>>> http://www.linkedin.com/in/davidwsmiley
>>>>>>
>>>>>


Re: OverseerStatusTest recent failures

2021-02-21 Thread Ilan Ginzburg
Searching in my jenkins folder for failures of this test (label:jenkins
"FAILED:  org.apache.solr.cloud.OverseerStatusTest.test") 26 emails match.
Searching for all jenkins master builds emails since the first failure
email found above (2 days ago), I see 40 messages.
26 over 40 is not far from the expected 50% failure rate.
I believe the ratio in the graph you sent David (currently at 5.7%) is
averaged over a week, and includes failures from all branches (did some
other stats on jenkins emails that tend to confirm this assumption).

On Sun, Feb 21, 2021 at 10:53 AM Ilan Ginzburg  wrote:

> Yes Marcus this is the commit.
>
> David I would have expected 50% failures, as 50% of the runs use
> distributed updates. I’ll try to understand better as I fix the issue.
>
> Ilan
>
> On Sun 21 Feb 2021 at 06:17, David Smiley  wrote:
>
>> Interesting.  Do you have a guess as to why the failures there are ~5%
>> and not 100% reproducible?
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Sat, Feb 20, 2021 at 6:41 PM Ilan Ginzburg  wrote:
>>
>>> Indeed the issue is due to my changes.
>>>
>>> In OverseerStatusCmd I've skipped some stat collection when running in
>>> distributed cluster state updates mode because I thought these were only
>>> stats related to cluster state updates.
>>> Obviously that was too aggressive and some of the stats are related to
>>> the Collection API.
>>>
>>> I will make sure to skip returning only the stats that are related to
>>> cluster state updater and restore returning collection api stats (when
>>> running in distributed cluster updates mode, otherwise all stats are
>>> returned).
>>>
>>> Tomorrow...
>>>
>>> Ilan
>>>
>>> On Sun, Feb 21, 2021 at 12:22 AM Ilan Ginzburg 
>>> wrote:
>>>
>>>> Thank you David for reporting this.
>>>>
>>>> Seems due to my recent changes. I reproduce the failure locally and
>>>> will look at this tomorrow.
>>>>
>>>> With the distributed cluster state updates i've introduced a
>>>> randomization for using either Overseer based cluster state updates or
>>>> distributed cluster state updates in tests. This failure seems to happen in
>>>> the distributed state update case. I suspect it is due to Overseer
>>>> returning less stats than expected by the test (which is expected: Overseer
>>>> cannot return stats about cluster state updates if it does not handle
>>>> cluster state updates).
>>>>
>>>> The following line in the logs tells that the run is using distributed
>>>> cluster state:
>>>> 972874 INFO  (jetty-launcher-8973-thread-2) [ ]
>>>> o.a.s.c.DistributedClusterStateUpdater Creating
>>>> DistributedClusterStateUpdater with useDistributedStateUpdate=true. Solr
>>>> will be using distributed cluster state updates.
>>>>
>>>> Ilan
>>>>
>>>>
>>>> On Sat, Feb 20, 2021 at 3:00 PM David Smiley 
>>>> wrote:
>>>>
>>>>> I encountered a failure from OverseerStatusTest locally.  According to
>>>>> our test failure trends, this guy only just recently started failing ~4-5%
>>>>> of the time, but previously was fine.  Only master branch.
>>>>>
>>>>>
>>>>> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.OverseerStatusTest.test
>>>>>
>>>>> ~ David Smiley
>>>>> Apache Lucene/Solr Search Developer
>>>>> http://www.linkedin.com/in/davidwsmiley
>>>>>
>>>>


Re: OverseerStatusTest recent failures

2021-02-21 Thread Ilan Ginzburg
Yes Marcus, this is the commit.

David, I would have expected 50% failures, as 50% of the runs use
distributed updates. I'll try to understand better as I fix the issue.

Ilan

On Sun 21 Feb 2021 at 06:17, David Smiley  wrote:

> Interesting.  Do you have a guess as to why the failures there are ~5% and
> not 100% reproducible?
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Sat, Feb 20, 2021 at 6:41 PM Ilan Ginzburg  wrote:
>
>> Indeed the issue is due to my changes.
>>
>> In OverseerStatusCmd I've skipped some stat collection when running in
>> distributed cluster state updates mode because I thought these were only
>> stats related to cluster state updates.
>> Obviously that was too aggressive and some of the stats are related to
>> the Collection API.
>>
>> I will make sure to skip returning only the stats that are related to
>> cluster state updater and restore returning collection api stats (when
>> running in distributed cluster updates mode, otherwise all stats are
>> returned).
>>
>> Tomorrow...
>>
>> Ilan
>>
>> On Sun, Feb 21, 2021 at 12:22 AM Ilan Ginzburg 
>> wrote:
>>
>>> Thank you David for reporting this.
>>>
>>> Seems due to my recent changes. I reproduce the failure locally and will
>>> look at this tomorrow.
>>>
>>> With the distributed cluster state updates i've introduced a
>>> randomization for using either Overseer based cluster state updates or
>>> distributed cluster state updates in tests. This failure seems to happen in
>>> the distributed state update case. I suspect it is due to Overseer
>>> returning less stats than expected by the test (which is expected: Overseer
>>> cannot return stats about cluster state updates if it does not handle
>>> cluster state updates).
>>>
>>> The following line in the logs tells that the run is using distributed
>>> cluster state:
>>> 972874 INFO  (jetty-launcher-8973-thread-2) [ ]
>>> o.a.s.c.DistributedClusterStateUpdater Creating
>>> DistributedClusterStateUpdater with useDistributedStateUpdate=true. Solr
>>> will be using distributed cluster state updates.
>>>
>>> Ilan
>>>
>>>
>>> On Sat, Feb 20, 2021 at 3:00 PM David Smiley  wrote:
>>>
>>>> I encountered a failure from OverseerStatusTest locally.  According to
>>>> our test failure trends, this guy only just recently started failing ~4-5%
>>>> of the time, but previously was fine.  Only master branch.
>>>>
>>>>
>>>> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.OverseerStatusTest.test
>>>>
>>>> ~ David Smiley
>>>> Apache Lucene/Solr Search Developer
>>>> http://www.linkedin.com/in/davidwsmiley
>>>>
>>>


Re: Congratulations to the new Lucene PMC Chair, Michael Sokolov!

2021-02-20 Thread Ilan Ginzburg
Congratulations, Mike!

- Ilan

On Wed, Feb 17, 2021 at 10:32 PM Anshum Gupta 
wrote:

> Every year, the Lucene PMC rotates the Lucene PMC chair and Apache Vice
> President position.
>
> This year we nominated and elected Michael Sokolov as the Chair, a
> decision that the board approved in its February 2021 meeting.
>
> Congratulations, Mike!
>
> --
> Anshum Gupta
>


Re: OverseerStatusTest recent failures

2021-02-20 Thread Ilan Ginzburg
Indeed the issue is due to my changes.

In OverseerStatusCmd I've skipped some stat collection when running in
distributed cluster state updates mode because I thought these were only
stats related to cluster state updates.
Obviously that was too aggressive and some of the stats are related to the
Collection API.

I will make sure to skip only the stats that are related to the cluster
state updater and restore returning Collection API stats (when running in
distributed cluster state updates mode; otherwise all stats are
returned).

Tomorrow...

Ilan

On Sun, Feb 21, 2021 at 12:22 AM Ilan Ginzburg  wrote:

> Thank you David for reporting this.
>
> Seems due to my recent changes. I reproduce the failure locally and will
> look at this tomorrow.
>
> With the distributed cluster state updates i've introduced a randomization
> for using either Overseer based cluster state updates or distributed
> cluster state updates in tests. This failure seems to happen in the
> distributed state update case. I suspect it is due to Overseer returning
> less stats than expected by the test (which is expected: Overseer cannot
> return stats about cluster state updates if it does not handle cluster
> state updates).
>
> The following line in the logs tells that the run is using distributed
> cluster state:
> 972874 INFO  (jetty-launcher-8973-thread-2) [ ]
> o.a.s.c.DistributedClusterStateUpdater Creating
> DistributedClusterStateUpdater with useDistributedStateUpdate=true. Solr
> will be using distributed cluster state updates.
>
> Ilan
>
>
> On Sat, Feb 20, 2021 at 3:00 PM David Smiley  wrote:
>
>> I encountered a failure from OverseerStatusTest locally.  According to
>> our test failure trends, this guy only just recently started failing ~4-5%
>> of the time, but previously was fine.  Only master branch.
>>
>>
>> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.OverseerStatusTest.test
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>


Re: OverseerStatusTest recent failures

2021-02-20 Thread Ilan Ginzburg
Thank you David for reporting this.

Seems due to my recent changes. I can reproduce the failure locally and will
look at this tomorrow.

With the distributed cluster state updates I've introduced a randomization
for using either Overseer-based cluster state updates or distributed
cluster state updates in tests. This failure seems to happen in the
distributed state update case. I suspect it is due to Overseer returning
fewer stats than expected by the test (which is expected: Overseer cannot
return stats about cluster state updates if it does not handle cluster
state updates).

The following line in the logs tells that the run is using distributed
cluster state:
972874 INFO  (jetty-launcher-8973-thread-2) [ ]
o.a.s.c.DistributedClusterStateUpdater Creating
DistributedClusterStateUpdater with useDistributedStateUpdate=true. Solr
will be using distributed cluster state updates.

Ilan


On Sat, Feb 20, 2021 at 3:00 PM David Smiley  wrote:

> I encountered a failure from OverseerStatusTest locally.  According to our
> test failure trends, this guy only just recently started failing ~4-5% of
> the time, but previously was fine.  Only master branch.
>
>
> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.OverseerStatusTest.test
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>


Re: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-19 Thread Ilan Ginzburg
Congratulations Jan!

On Thu, Feb 18, 2021 at 7:56 PM Anshum Gupta  wrote:
>
> Hi everyone,
>
> I’d like to inform everyone that the newly formed Apache Solr PMC nominated
> and elected Jan Høydahl for the position of the Solr PMC Chair and Vice
> President. This decision was approved by the board in its February 2021
> meeting.
>
> Congratulations Jan!
>
> --
> Anshum Gupta

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [DISCUSS] ConfigSet ZK to file system fallback

2021-01-24 Thread Ilan Ginzburg
An aspect that would be interesting to consider IMO is upgrade and
configuration changes.
For example, a collection in use across a Solr version upgrade might require
a different configuration (config set) for the old and new Solr versions.
Solr itself can require changes in config across updates.

Backward compatibility is the usual answer (the new code continues working
with the old config that can be updated once all nodes have been deployed)
but this imposes constraints on new code.
If there was a way for the new Solr code to "magically" use a different
config set for the collection (and for Solr config in general) there would
be more freedom to add or change features, change default behavior across
Solr versions etc.

Ilan

On Sat 23 Jan 2021 at 22:22, Gus Heck  wrote:

> I'm in agreement with Eric here that fewer ways (or at least a clearer
> default way) of supplying resources would be better. Additionally, it
> should be easy to specify that this resource that I've shared should be
> loaded on a per SolrCore or per node basis (or even better per collection
> present on the node, accessible under a standard name to replicas belonging
> to that collection?). Not many cases beyond the simplest single collection
> install few shards where you want a 1GB resource to be duplicated in memory
> across N cores running on the same node, though obviously there's ample
> cases where the 10k stop words file is meant to differ across collections.
>
> As it stands Eric's list seems like something that should be in the
> documentation somewhere just so people can properly troubleshoot where
> something they don't expect to be loaded is getting loaded from, or why
> their attempts to load something new aren't working...  especially if it
> were ordered to show the precedence of these options.
>
> As for ease of editing configurations, I've long felt that this should be
> possible via the admin UI though there's been much worry about security
> implications there. Personally, I think that those concerns are resolvable,
> but have not found time to make that case. Aside from that I think we need
> to support tooling to enable easy management of config sets rather than
> expanding the possible number of places the configurations might get loaded
> from.
>
> Several years ago I wrote a plugin for gradle that is very very basic, but
> after some configuration so that it can see zookeeper, it will happily pull
> configs down and push them up for you which is convenient for keeping
> configs under version control during development. There's LOTS to improve
> there, most especially adding support to manage multiple configs at a time,
> and I had hoped that folks would use it and have suggestions,
> contributions, but I've got no indication that anyone but me uses it. (
> https://github.com/nsoft/solr-gradle)
>
> -Gus
>
> On Fri, Jan 22, 2021 at 8:19 AM Eric Pugh 
> wrote:
>
>> There is a lot in here ;-).
>>
>> With the caveat that I don’t have recent experience that many of you do
>> with massive solr clusters, I think that we need to commit to fewer, not
>> more, ways of maintaining the supporting resources that these clusters
>> need..   I’d like to see ways of managing our Solr clusters that encourage
>> easy change and experimentation, and encourage us to separate the physical
>> layer (version of Solr, networking setup, packages used) from the logical
>> layer (individual collections and their supporting code and resources).
>>
>> I think the configSet was a huge jump forward..   My workflow is to think
>> 1) What’s unusual about this Solr setup?  What does the physical layer need
>> to be?  Special package?  Special code?   Build a Docker image.
>> 2) Fire up a three node Solr cluster, wait till it’s up and responsive
>> via checking APIs.
>> 3) Now think about my specific use case.   What collections do I need?
>> Is it just 1, or is it 5 or 10 collections.  Are they on the same configSet
>> or different.   Great, zip up the configSet and pop it into Solr via APIs.
>>
>> 4) Create the collections in the shapes I need with the APIs, and now
>> start iterating on what I need to do.  Use the APIs to create fields, or
>> set up different ParamSets.
>>
>> However, with configSets we only did half the job, because we still don’t
>> have a single well understood way of handling Jars and other resources.  We
>> have many ways of doing it.   Which generates constant user confusion and
>> contributes to the perspective that “Solr is hard to use”.
>>
>> Right now, across the Solr landscape I can think of many ways of adding
>> “external” files to my Solr:
>>
>> 1) Classic ./lib as a place to put things.
>> 2) The new to me solr.allow.unsafe.resourceloading=true approach
>> 3) The userfiles directory in Solr accessed by streaming expressions load
>> function.
>> 4) The “package store” for packages located in file store
>> 5) The blob store .system concept from before the package store
>> 6) the LTR feature store (which I guess is 

Re: Old programmers do fade away

2020-12-30 Thread Ilan Ginzburg
Hey Eric,

Sad and happy to read your message. You've been a clear voice in the Lucene
Solr community and I was always AMAZED how willing you are to help and
explain, over and over again when needed.
That's the sad part.

The happy part is that those squirrels do need to learn and the electric
fence at the top does sound reasonable. Hope tomatoes don't care too much
about lack of freedom.

Really appreciated interacting with you in my short time here. Have fun!

Ilan

On Wed, Dec 30, 2020 at 3:09 PM Erick Erickson 
wrote:

> 40 years is enough. OK, it's only been 39 1/2 years. Dear Lord, has it
> really been that long? Programming's been fun, I've gotten to solve puzzles
> every day. The art and science of programming has changed over that time.
> Let me tell you about the joys of debugging with a Z80 stack emulator that
> required that you to look on the stack for variables and trace function
> calls by knowing how to follow frame pointers. Oh the tedium! Oh the (lack
> of) speed! Not to mention that 64K of memory was all you had to work with.
> I had a co-worker who could predict the number of bytes by which the
> program would shrink based on extracting common code to functions. The
> "good old days"...weren't...
>
> I'd been thinking that I'd treat Lucene/Solr as a hobby, doing occasional
> work on it when I was bored over long winter nights. I've discovered,
> though, that I've been increasingly reluctant to crack open the code. I
> guess that after this much time, I'm ready to hang up my spurs. One major
> factor is the realization that there's so much going on with Lucene/Solr
> that simply being aware of the changes, much less trying to really
> understand them, isn't something I can do casually.
>
> I bought a welder and find myself more interested in playing with that
> than programming. Wait until you see the squirrel-proof garden enclosure
> I'm building with it. If my initial plan doesn't work, next up is an
> electric fence along the top. The laser-sighted automatic machine gun
> emplacement will take more planning...Ahhh, probably won't be able to get a
> permit from the township for that though. Do you think the police would
> notice? Perhaps I should add that the local police station is two blocks
> away and in the line of fire. But an infrared laser powerful enough to
> "pre-cook" them wouldn't be as obvious would it?
>
> Why am I so fixated on squirrels? One of the joys of gardening is fresh
> tomatoes rather than those red things they sell in the store. The squirrels
> ATE EVERY ONE OF MY TOMATOES WHILE THEY WERE STILL GREEN LAST YEAR! And the
> melons. In the words of B. Bunny: "Of course you realize this means war" (
> https://www.youtube.com/watch?v=4XNr-BQgpd0)...
>
> Then there's working in the garden and landscaping, the desk I want to
> build for my wife, travel as soon as I can, maybe seeing if some sailboats
> need crew...you get the idea.
>
> It's been a privilege to work with this group, you're some of the best and
> brightest. Many thanks to all who've generously given me their time and
> guidance. It's been a constant source of amazement to me how willing people
> are to take time out of their own life and work to help me when I've had
> questions. I owe a lot of people beers ;)
>
> I'll be stopping my list subscriptions, Slack channels (dm me if you need
> something), un-assigning any JIRAs and that kind of thing over the next
> while. If anyone's interested in taking over the BadApple report, let me
> know and I can put the code up somewhere. It takes about 10 minutes to do
> each week. I won't disappear entirely, things like the code-reformatting
> effort are nicely self-contained for instance and something I can to
> casually.
>
> My e-mail address if you need to get in touch with me is: "
> erick.erick...@gmail.com". There's a correlation between gmail addresses
> that are just a name with no numbers and a person's age... A co-worker came
> over to my desk in pre-historical times and said "there's this new mail
> service you might want to sign up for"... Like I said, 40 years is enough.
>
> Best to all,
> Erick
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: [DISCUSS] Cross Data-Center Replication in Apache Solr

2020-12-05 Thread Ilan Ginzburg
That's an interesting initiative Anshum!

I can see at least two different approaches here, your mention of SolrJ
seems to hint at the first one:
1. Get the data as it comes from the client and fork it to local and remote
data centers,
2. Create (an asynchronous) stream replicating local data center data to
remote.

Option 1 is strongly consistent but adds latency and potentially blocking
on the critical path.
Option 2 could look like remote PULL replicas, might have lower impact on
the local data center but has to deal with the remote data center always
being somewhat behind. If the client application can handle that, the
performance and efficiency gain (as well as simpler implementation? It
doesn't require another persistence layer) might be worth it...

Ilan

On Fri, Dec 4, 2020 at 5:24 PM Anshum Gupta  wrote:

> Hi everyone,
>
>
> Large scale Solr installations often require cross data-center replication
> in order to achieve data replication for both, access latency reasons as
> well as disaster recovery. In the past users have either designed their own
> solutions to deal with this or have tried to rely on the now-deprecated
> CDCR.
>
>
> It would be really good to have support for cross data-center replication
> within Solr, that is offered and supported by the community. This would
> allow the effort around this shared problem to converge.
>
>
> I’d like to propose a new solution based on my experiences at my day job.
> The key points about this approach:
>
>1. Uses an external, configurable, messaging system in the middle for
>actual replication/mirroring.
>2. We offer an abstraction and some default implementations based on
>what we can support and what users really want. An example here would be
>Kafka.
>3. This would be a separate repository allowing it to have its own
>release cadence. We shouldn’t have to release this with every Solr release
>as the overlap is just limited to SolrJ interactions.
>
>
> I’ll share a more detailed and evolving document soon with the design for
> everyone else to contribute to but wanted to share this as I’m starting to
> work on this and wanted to avoid parallel efforts towards the same end-goal.
>
> --
> Anshum Gupta
>


Re: Welcome Houston Putman to the PMC

2020-12-01 Thread Ilan Ginzburg
Congratulations Houston!

On Wed 2 Dec 2020 at 00:17, Munendra S N  wrote:

> Congratulations and welcome, Houston
>
> On Wed, Dec 2, 2020 at 3:37 AM Timothy Potter 
> wrote:
>
>> Welcome Houston!
>>
>> On Tue, Dec 1, 2020 at 2:43 PM Tomás Fernández Löbbe <
>> tomasflo...@gmail.com> wrote:
>>
>>> Welcome Houston!!
>>>
>>> On Tue, Dec 1, 2020 at 1:28 PM Anshum Gupta 
>>> wrote:
>>>
 Congratulations and welcome, Houston!

 On Tue, Dec 1, 2020 at 1:19 PM Mike Drob  wrote:

> I am pleased to announce that Houston Putman has accepted the PMC's
> invitation to join.
>
> Congratulations and welcome, Houston!
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

 --
 Anshum Gupta

>>>


Re: Welcome Julie Tibshirani as Lucene/Solr committer

2020-11-18 Thread Ilan Ginzburg
Welcome Julie and congrats!

On Thu, Nov 19, 2020 at 3:51 AM Julie Tibshirani 
wrote:

> Thank you for the warm welcome! It’s a big honor for me -- I’ve been a
> Lucene fan since the start of my software career. I’m excited to contribute
> to such a great project.
>
> I’m a developer at Elastic focused on core search features. My
> professional background is in information retrieval and data systems. I
> also have an interest in statistical computing and machine learning
> software. I’m originally from Canada but have lived in the SF Bay Area for
> many years now. Some of my favorite things…
> * Color: purple
> * Album: Siamese Dream
> * Java keyword: final
>
> Julie
>
> On Wed, Nov 18, 2020 at 6:33 PM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
>> Welcome Julie!
>>
>> On Thu, 19 Nov, 2020, 12:10 am Erick Erickson, 
>> wrote:
>>
>>> Welcome Julie!
>>>
>>> > On Nov 18, 2020, at 1:21 PM, Alexandre Rafalovitch 
>>> wrote:
>>> >
>>> > Juliet from the house of Elasticsearch meets a interesting,
>>> relevancy-aware  committer from the house of Solr.
>>> >
>>> > Such a romantic beginning. Not sure I want to know the end of that
>>> heroine's journey.
>>> >
>>> > :-)
>>> >
>>> > On Wed., Nov. 18, 2020, 12:59 p.m. Dawid Weiss, 
>>> wrote:
>>> >
>>> > Congratulations and welcome, Julie.
>>> >
>>> > I think juliet is not a bad nick at all, you just need to who -all |
>>> grep "romeo"... :)
>>> >
>>> > Dawid
>>> >
>>> > On Wed, Nov 18, 2020 at 4:08 PM Michael Sokolov 
>>> wrote:
>>> > I'm pleased to announce that Julie Tibshirani has accepted the PMC's
>>> > invitation to become a committer.
>>> >
>>> > Julie, the tradition is that new committers introduce themselves with
>>> > a brief bio.
>>> >
>>> > I think we may still be sorting out the details of your Apache account
>>> > (julie@ may have been taken?), but as soon as that has been sorted out
>>> >  and karma has been granted, you can use your new powers to add
>>> > yourself to the committers section of the Who We Are page on the
>>> > website: 
>>> >
>>> > Congratulations and welcome!
>>> >
>>> > Mike Sokolov
>>> >
>>> > -
>>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> > For additional commands, e-mail: dev-h...@lucene.apache.org
>>> >
>>>
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>


Re: Index documents in async way

2020-10-09 Thread Ilan Ginzburg
I like the idea.

Two (main) points are not clear for me:
- Order of updates: If the current leader fails (its tlog becoming
inaccessible) and another leader is elected and indexes some more,
what happens when the first leader comes back? What does it do with
its tlog and how to know which part needs to be indexed and which part
should not (more recent versions of same docs already indexed by the
other leader).
- Durability: If the tlog only exists on a single node before the
client gets an ACK and that node fails "forever", the update is lost.
We need a way to even detect that an update was lost (which might not
be obvious).

Unclear to me if your previous answers address these points.

Ilan

On Fri, Oct 9, 2020 at 3:54 PM Cao Mạnh Đạt  wrote:
>
> Thank you Tomas
>
> >Atomic updates, can those be supported? I guess yes if we can guarantee that 
> >messages are read once and only once.
> It won't be straightforward since we have multiple consumers on the tlog 
> queue. But it is possible with appropriate locking
>
> >I'm guessing we'd need to read messages in an ordered way, so it'd be a 
> >single Kafka partition per Solr shard, right? (Don't know Pulsar)
> It will likely be the case, but like I said async updates will be the first 
> piece, switching to using Kafka is going to be another area to look at
>
> >May be difficult to determine what replicas should do after a document 
> >update failure. Do they continue processing (which means if it was a 
> >transient error they'll become inconsistent) or do they stop? but if none of 
> >the replicas could process the document then they would all go to recovery?
> Good question, I had not thought about this, but I think the current model of 
> SolrCloud needs to answer this question too, i.e. the leader failed but 
> others succeeded.
>
> > maybe try to recover from other active replicas?
> I think it is totally possible
>
> > Maybe we could have a way to stream those responses out? (i.e. via another 
> > queue)? Maybe with an option to only stream out errors or something.
> It can be, but for REST users, it gonna be difficult for them
>
> >I don't think that's correct? See DUH2.doNormalUpdate.
> You're right, we actually run the update first and then write to the tlog later
>
> > How would this work in your mind with one of the distributed queues?
> For a distributed queue, basically for every commit we need to store the 
> latest consumed offset corresponding to the commit. An easy solution here can 
> be blocking everything and then doing the commit; the commit data will store 
> the latest consumed offset
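A minimal sketch of that idea: record the last consumed queue offset in the Lucene commit userData so consumption can resume from the committed position after a restart. The helper class and key name below are illustrative, not existing Solr/Lucene code; the IndexWriter and DirectoryReader calls are standard Lucene APIs.

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;

/** Illustrative helper: checkpoint the last applied queue offset inside each commit. */
public class OffsetCheckpoint {
  private static final String OFFSET_KEY = "lastConsumedOffset"; // illustrative key name

  /** Commit the index and record the offset of the last update applied before it. */
  public static void commitWithOffset(IndexWriter writer, long lastConsumedOffset) throws IOException {
    writer.setLiveCommitData(
        Map.of(OFFSET_KEY, Long.toString(lastConsumedOffset)).entrySet());
    writer.commit();
  }

  /** On restart, read the checkpoint back so queue consumption resumes where the commit left off. */
  public static long lastCommittedOffset(DirectoryReader reader) throws IOException {
    String value = reader.getIndexCommit().getUserData().get(OFFSET_KEY);
    return value == null ? -1L : Long.parseLong(value);
  }
}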
>
> On Fri, Oct 9, 2020 at 11:49 AM Tomás Fernández Löbbe  
> wrote:
>>
>> Interesting idea Đạt. The first questions/comments that come to my mind 
>> would be:
>> * Atomic updates, can those be supported? I guess yes if we can guarantee 
>> that messages are read once and only once.
>> * I'm guessing we'd need to read messages in an ordered way, so it'd be a 
>> single Kafka partition per Solr shard, right? (Don't know Pulsar)
>> * May be difficult to determine what replicas should do after a document 
>> update failure. Do they continue processing (which means if it was a 
>> transient error they'll become inconsistent) or do they stop? maybe try to 
>> recover from other active replicas? but if none of the replicas could 
>> process the document then they would all go to recovery?
>>
>> > Then the user will call another endpoint for tracking the response like 
>> > GET status_updates?trackId=,
>> Maybe we could have a way to stream those responses out? (i.e. via another 
>> queue)? Maybe with an option to only stream out errors or something.
>>
>> > Currently we are also adding to tlog first then call writer.addDoc later
>> I don't think that's correct? See DUH2.doNormalUpdate.
>>
>> > I think it won't be very different from what we are having now, since on 
>> > commit (producer threads do the commit) we rotate to a new tlog.
>> How would this work in your mind with one of the distributed queues?
>>
>> I think this is a great idea, something that needs to be deeply thought, but 
>> could make big improvements. Thanks for bringing this up, Đạt.
>>
>> On Thu, Oct 8, 2020 at 7:39 PM Đạt Cao Mạnh  wrote:
>>>
>>> > Can there be a situation where the index writer fails after the document 
>>> > was added to tlog and a success is sent to the user? I think we want to 
>>> > avoid such a situation, isn't it?
>>> > I suppose failures would be returned to the client on the async response?
>>> To make things more clear, the response for async update will be something 
>>> like this
>>> { "trackId" : "" }
>>> Then the user will call another endpoint for tracking the response like GET 
>>> status_updates?trackId=, the response will tell 
>>> whether the update is in_queue, processing, succeeded or failed. Currently we 
>>> are also adding to the tlog first and then calling writer.addDoc later.
>>> Later we can convert current sync operations by waiting until the update 
>>> gets 

Re: Solr Alpha (EA) release of Reference Branch

2020-10-07 Thread Ilan Ginzburg
TBH, a PR with more than 1400 changed files is hard to look at. How many of
us will invest a few weeks at least to really understand it? We should
assume that if we don't bring these changes piece by piece, we risk having
an unstable version of SolrCloud for a while.
When I look at some ongoing (unrelated) PRs judged too complex and being
asked to split and simplify, this one would be at least two orders of
magnitude "worse".

Maybe this means releasing 9 without this branch but with all recent
changes so we have a relatively stable release and merging the branch into
master right afterwards as the base for Solr 10, then accepting a
relatively long "baking" and bug fixing period?
A beta release of 10 might then make more sense, since it will be "release
candidate" type code and not a separate branch that has diverged.

Ilan

On Wed, Oct 7, 2020 at 13:43, Jan Høydahl  wrote:

> We want/need these improvements, for sure!
> Agree treating this whole thing as a black box is dangerous. At the same
> time I realize that this will be one or a few huge merges anyway!
> Rob’s suggestion is interesting! If at all possible to get the branch up
> to date with master, and make a PR, then it’s a great way for all of us to
> have a deeper look.
> And, who knows, perhaps end of this month there is consensus for merging
> into master sooner rather than later?
> Especially if we have had reliably passing Jenkins runs of the new branch,
> with all tests enabled, for some time!
> Risk is of course that 9.0.0 could be an unusually unstable release, but
> people don’t expect bug free x.0.0 releases. And to avoid cherry-pick hell
> for 6-12 months, perhaps that’s not a bad option after all.
>
> Jan
>
> On Oct 7, 2020 at 07:48, Robert Muir  wrote:
>
>
> On Tue, Oct 6, 2020 at 10:45 PM Anshum Gupta 
> wrote:
>
>>
>> I haven’t looked at the current ref branch recently, but the folks who
>> have looked at it, if you think that this code can be merged into master
>> even as big chunks, that’d be the most confidence building way forward.
>>
>>
> +1 for considering this approach. merge up master into the branch, and
> make a big-ass PR to merge back. let people help review (maybe improve) the
> change as a whole. It's just a big PR, some huge ones like this have been
> done in lucene before too, unofficially called "unfuck" branches (sorry if
> you are offended at my terminology). Sometimes you just fix, refactor,
> cleanup, and keep iterating and see where it can lead. sometimes you revert
> a bunch of commits because you followed the wrong rabbit-hole, etc.
> Sometimes it may seem inconvenient, but I think we can all agree It's
> important to have folks that want to not just take the small fix, but see
> where it can go and make the whole thing better. Remember Mike's flexible
> indexing branch?
>
> So why not try this way, look at actual code changes and try to get into
> the master branch? Of course Uwe is willing to point build resources at it
> either way, but if you want to maximize testing, start with the devs and
> everyone's jenkins first before throwing at users. Master branch will get
> you more testing, for sure.
>
>
>


Re: Solr Alpha (EA) release of Reference Branch

2020-10-06 Thread Ilan Ginzburg
 outlined
The biggest risk currently is the absorption of the search side async
work from master - I'm familiar with that, I've worked on it myself,
the code involved is derived from an old branch of mine, but async is
a whole different animal and trying to nail it without any downsides
to the old synchronous model is a tough nut
one that I was already battling on the dist update side, so it's good
stuff to work on and do, but it's taking some effort to get in shape

On Tue, Oct 6, 2020 at 8:00 PM Tomás Fernández Löbbe
 wrote:
>
> > Let's say we cut 9x and now there is a new master taken from the reference 
> > branch.
> I never said “make a new master”, I said merge changes in ref branch into 
> master. If things are broken into pieces like Ishan is suggesting, those 
> changes can be merged into 9.x too. I only suggested this because you felt 
> unsure about merging to master now and I guess this is due to fear of 
> introducing bugs so close to a potential 9.0 release, is that not right?
>
>
> > We will never be able to reconcile these 2 branches
> Sorry, but how is that different if we do an alpha release from the branch 
> now? What would be the process after that? Let's say people don't find issues 
> and we want to merge those changes, what’s the plan then?
>
> > Choice 1:
> I’m fine with choice 1 if that’s what you want, as long as it’s not an 
> official release for the reasons stated above.
>
>
> > I promise to do code review & cleanup as much as possible. But I'm hesitant 
> > to give a stamp of approval to make it THE official release
> What do you mean? I thought this is what you were suggesting, make an 
> official release from the reference_impl branch?
>
>
> I think Ilan’s last email is spot on, and I agree 100% with what he can 
> express much better than I can :)
>
> > Mark's descriptions in Slack go in the right way but are still too high 
> > level
> Can someone share those here? or in Jira?
>
> On Tue, Oct 6, 2020 at 5:09 AM Noble Paul  wrote:
>>
>> > I think the danger is high to treat this branch as a black box (or an "all 
>> > or nothing").
>>
>> True Ilan.  Ideally, I would like a few of us to study the code &
>> start pulling in changes we are confident of (even to 8x branch, why
>> not). We cannot burden a single developer to do everything.
>>
>> This cannot be a task just for one or 2 devs. We all will have to work
>> together to decompose the changes and digest them into master. I can
>> do my bit.
>>
>> But, I'm sure we may hit a point where certain changes cannot be
>> isolated and absorbed. We will have to collectively make a call on how
>> to absorb them.
>>
>> On Tue, Oct 6, 2020 at 9:00 PM Ishan Chattopadhyaya
>>  wrote:
>> >
>> >
>> > I'm willing to help and I believe others will too if the amount of work 
>> > for contributing is reasonable (i.e. not a three months effort).
>> >
>> > I looked into the possibility of doing so. To me, it seemed to be that it 
>> > is very hard to do so: possibly 1 year project for me. Problem is that it 
>> > is hard to pull out a particular class of improvements (say thread 
>> > management improvement) and have all tests pass with it (because tests 
>> > have gotten extensive improvements of their own) and also observe the 
>> > effect of the improvement. IIUC, every improvement to Solr seemed to 
>> > require many iterations to get the tests happy. I remember Mark telling me 
>> > that it may not even be possible for him to do something like that (i.e. 
>> > bring all changes into master as tiny pieces).
>> >
>> > What I volunteered to do, however, is to decompose roughly all the general 
>> > improvements into smaller, manageable commits. However, making sure all 
>> > tests pass at every commit point is beyond my capability.
>> >
>> > On Tue, 6 Oct, 2020, 3:10 pm Ilan Ginzburg,  wrote:
>> >>
>> >> Another option to integrate this work into the main code line would be to 
>> >> understand what changes have been made and where (Mark's descriptions in 
>> >> Slack go in the right way but are still too high level), and then port or 
>> >> even redo them in main, one by one.
>> >>
>> >> I think the danger is high to treat this branch as a black box (or an 
>> >> "all or nothing"). Using the merging itself to change our understanding 
>> >> and increase our knowledge of what was done can greatly reduce the risk.
>> >>
>> >> We do develop new features in Solr 9 without beta releasin

Re: Solr Alpha (EA) release of Reference Branch

2020-10-06 Thread Ilan Ginzburg
Another option to integrate this work into the main code line would be to
understand what changes have been made and where (Mark's descriptions in
Slack go in the right way but are still too high level), and then port or
even redo them in main, one by one.

I think the danger is high to treat this branch as a black box (or an "all
or nothing"). Using the merging itself to change our understanding and
increase our knowledge of what was done can greatly reduce the risk.

We do develop new features in Solr 9 without beta releasing them, so if we
port Mark's improvements by small chunks (and maybe in the process decide
that some should not be ported or not now) I don't see why this can't
integrate to become like other improvements done to the code. If specific
changes do require a beta release, do that release from master and pick the
right moment.

I'm willing to help and I believe others will too if the amount of work for
contributing is reasonable (i.e. not a three months effort). This requires
documenting the changes done in that branch, pointing to where these
changes happened and then picking them up one by one and porting them more
or less independently of each other. We might only port a subset of changes
by the time 9.0 is released, that's fine we can continue in following
releases.

My 2 cents...
Ilan

On Tue, Oct 6, 2020 at 09:56, Noble Paul  wrote:

> Yes, A docker image will definitely help. I wasn't trying to downplay that
>
> On Tue, Oct 6, 2020 at 6:55 PM Ishan Chattopadhyaya
>  wrote:
> >
> >
> > > Docker is not a big requirement for large scale installations. Most of
> them already have their own install scripts. Availability of docker is not
> important for them. If a user is only encouraged to install Solr because of
> a docker image , most likely they are not running a large enough cluster
> >
> > I disagree, Noble. Having a docker image is going to be useful to some
> clients, with complex use cases. Great point, David!
> >
> > On Tue, 6 Oct, 2020, 1:09 pm Ishan Chattopadhyaya, <
> ichattopadhy...@gmail.com> wrote:
> >>
> >> As I said, I'm *personally* not confident in putting such a big
> changeset into master that wasn't vetted in a real user environment widely.
> I have, in the past, done enough bad things to Solr (directly or
> indirectly), and I don't want to repeat the same. Also, I'll be very
> uncomfortable if someone else did so.
> >>
> >> Having said this, if someone else wants to port the changes over to
> master *without first getting enough real world testing*, feel free to do
> so, and I can focus my efforts elsewhere.
> >>
> >> On Tue, 6 Oct, 2020, 9:22 am Tomás Fernández Löbbe, <
> tomasflo...@gmail.com> wrote:
> >>>
> >>> I was thinking (and I haven’t fleshed it out completely but will throw
> the idea) that an alternative approach with this timeline could be to cut
> 9x branch around November/December? And then you could merge into master,
> it would have the latest  changes from master plus the ref branch changes.
> From there any nightly build could be use to help test/debug.
> >>>
> >>> That said I don’t know for sure what are the changes in the branch
> that do not belong in 9. The problem with them being 10x only is that
> backports would potentially be more difficult for all the life of 9.
> >>>
> >>> On Mon, Oct 5, 2020 at 4:54 PM Noble Paul 
> wrote:
> 
>  >I don't think it can be said what committers do and don't do with
> regards to running Solr.  All of us would answer this differently and at
> different points in time.
> 
>  " I have run it in one large cluster, so it is certified to be bug
> free/stable" I don't think it's a reasonable approach. We need as much
> feedback from our users because each of them stress Solr in a different
> way. This is not to suggest that committers are not doing testing or their
> tests are not valid. When I talk to the committers out here they say they
> do not see any performance stability issues at all. But, my client reports
> issues on a day to day basis.
> 
> 
> 
>  > Definitely publish a Docker image BTW -- it's the best way to try
> out any software.
> 
>  Docker is not a big requirement for large scale installations. Most
> of them already have their own install scripts. Availability of docker is
> not important for them. If a user is only encouraged to install Solr
> because of a docker image , most likely they are not running a large enough
> cluster
> 
> 
> 
>  On Tue, Oct 6, 2020, 6:30 AM David Smiley  wrote:
> >
> > Thanks so much for your responses Ishan... I'm getting much more
> information in this thread than my attempts to get questions answered on
> the JIRA issue months ago.  And especially,  thank you for volunteering for
> the difficult porting efforts!
> >
> > Tomas said:
> >>
> >>  I do agree with the previous comments that calling it "Solr 10"
> (even with the "-alpha") would confuse users, maybe use "reference"? or
> 

Removing Overseer

2020-10-05 Thread Ilan Ginzburg
I'm sharing the initial drop of a proposal to remove the Overseer from
SolrCloud.

https://docs.google.com/document/d/1u4QHsIHuIxlglIW6hekYlXGNOP0HjLGVX5N6inkj6Ok/

This is a structural change that I believe requires a large consensus
to be successful or even started. Feedback is most welcome and very
much awaited!

Thanks,
Ilan

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Solr Alpha (EA) release of Reference Branch

2020-10-03 Thread Ilan Ginzburg
Thanks Ishan for the initiative!
I think that’s a good idea if it allows testing that branch, assuming some
are ready to invest what it takes and run this in production (maybe not
with user facing prod traffic?).

I do not think naming it Solr 10 is a good idea though, as it is likely
very different from what will end up being in Solr 10 (and even from what
will be in Solr 9).

I do hope we manage to port to master all these improvements!

Ilan

On Sat 3 Oct 2020 at 21:42, Ishan Chattopadhyaya 
wrote:

> Hi Devs,
>
> As you might be aware, the reference_impl branch has a lot of improvements
> that we want to see in Solr master. However, it is currently a large
> deviation from master and hence the stability and reliability (though
> improved in certain aspects) remain to be tested in real production
> environments before we gain confidence in bringing those changes to master.
>
> I propose that we do a one off preview release from that branch, say Solr
> 10 alpha (early access) or any other name that someone suggests, so that
> users could try it out and report regressions or improvements etc.
>
> I volunteer to be the RM and planning to start the process around 1
> December-15 December timeframe. Until then, we can tighten the loose ends
> on the branch and plan for such a release.
>
> Are there any thoughts, concerns, or questions?
>
> Regards,
> Ishan
>


Re: Backward compatability handling across major versions

2020-10-01 Thread Ilan Ginzburg
In my opinion, when we really need to break backward compatibility (be
it a change of API or of how features are made available, for example
Autoscaling), I think the friendly way to do it is to introduce the
new implementation first (co-existing with the old one!), deprecate
but keep the old way of doing things, and after a few releases remove
the old way from the code.

This gives users time and freedom of how to manage their transition
(and does not force a transition when upgrading to a specific
version), with both the old and new ways of doing things available for
a while.
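A tiny, generic illustration of that pattern (nothing Solr-specific, the names are made up): the old entry point survives for a few releases, marked deprecated and delegating to the new one, so callers can migrate at their own pace:

// Generic sketch of "introduce new, deprecate old, remove a few releases later".
class ReplicaPlacer {
  /** New entry point, introduced alongside the old one. */
  void placeReplicas(String collection, int numReplicas) {
    // ... new implementation ...
  }

  /** Old entry point kept (and delegating) for a few releases, then removed. */
  @Deprecated
  void assignReplicas(String collection, int numReplicas) {
    placeReplicas(collection, numReplicas);
  }
}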

Sometimes this is not possible and we have to be less friendly to our
users, but we should really try to limit these cases as much as
possible (implies discussions and exploring available options).

Ilan

On Thu, Oct 1, 2020 at 10:30 AM Noble Paul  wrote:
>
> In fact I was shocked at the cavalier attitude with which backward
> compatibility is broken. If we are going to make a backward
> incompatible change
>
> There should be a JIRA with the proper discussions
>
> * What is the change?
> * Why is the change important?
> * What is the strategy for someone who does a rolling upgrade?
> * Is it possible to avoid it?
> * Can the change be done in a backward compatible way so that the
> users are not inconvenienced
>
> On Thu, Oct 1, 2020 at 6:25 PM Ishan Chattopadhyaya
>  wrote:
> >
> > Hi Devs,
> > As per earlier discussions, we want to do a better job of handling major 
> > version upgrades, possibly support rolling upgrades wherever possible. This 
> > implies that we don't break backward compatibility without a strong reason 
> > and adequate discussion around it.
> >
> > Recently, there was a PR that attempted to sneak in a backward incompatible 
> > change to an endpoint for plugins (package management). This change was 
> > totally unrelated to the JIRA/PR and there was absolutely no discussion or 
> > even an attempt to address the upgrade strategy with that change. The 
> > attitude was a careless one, along the lines of "we can break backward 
> > compatibility in a major release". 
> > https://github.com/apache/lucene-solr/pull/1758#discussion_r494134314
> >
> > Do we have any consensus on whether we need a separate JIRA or broader 
> > discussion on any backward compatibility breaks? Or shall we let these 
> > changes be sneaked in, unless someone very carefully notices a few lines of 
> > changes in a 25-class PR?
> >
> > Looking for some suggestions here.
> > Thanks and regards,
> > Ishan
>
>
>
> --
> -
> Noble Paul
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



What is "Solr core"?

2020-10-01 Thread Ilan Ginzburg
In code review/design discussions I've seen comments made a few times
about a feature or piece of code: "it doesn't belong in [Solr] core".

What's the definition of Solr "core" other than it being an IntelliJ
module? Does core have access to things that can't be accessed from
elsewhere? (like an OS kernel that can do processor tricks that user
code is not allowed to). Or is it a dependency graph where "core"
depends on nothing outside of core, but anything outside of core can
depend on core?

In other words, what's the cost of moving "outside of core" something
that's in core, and what's the value of doing so?

Thanks,
Ilan

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Placement plugin PR commit - soon

2020-09-14 Thread Ilan Ginzburg
Advance notice:

I plan to commit to master/9.0 coming Wednesday September 16th the "Placement
plugin" PR  corresponding
to SOLR-14613 .
This will be the first drop of code for the replacement of the Autoscaling
code that was previously removed from 9.0.

This code is disabled by default; to enable it, the user must set a
configuration in ZK's /clusterprops.json.

Ilan


Re: Solr configuration options

2020-09-03 Thread Ilan Ginzburg
> > stop making life more difficult for everyone.
>
> >> In order for clusterprops.json to replace what's currently done with
> solr.xml, we'd need to introduce a mechanism to make configuration
> available out of the box
>
> > How hard is it to insert a JSON into your ZK ?
>
> > The problem with Solr developers is they only have one workflow in
> > their mind, the way they use it in their org. How many of us
> > understand the pain of restarting the entire cluster because we wish
> > to change a single config param for replica placement.
>
> > We really do not care if a newcomer will be able to edit an XML file
> > in every node to use this feature. You know why? We really do not want
> > anyone to use this feature. Most of us are paid by our respective
> > organizations and as long as it meets their needs they're totally
> > happy. Why don't you all go back, and work on your own internal fork
> > of Solr if that is all that you guys want. Why even pretend that
> > something is pluggable & offers some value to our users.
>
> > The existing autoscaling config was stored in ZK; did users complain
> > that they have difficulty in storing that file in ZK?
>
> > On Thu, Sep 3, 2020 at 7:20 PM Jan Høydahl 
> wrote:
>
> >> Let’s not do another magic overlay json file on top of xml with all the
> confusion it creates.
>
> >> +1 to stick with solr.xml for current and ongoing feature development
> and not hold up development based on things that may or may not happen in
> the future. Discussing removal of solr.xml is a typical SIP candidate?
>
> >> Jan
>
> >> On Sep 3, 2020 at 10:06, Ilan Ginzburg  wrote:
>
> >> Noble,
>
> >> In order for clusterprops.json to replace what's currently done with
> solr.xml, we'd need to introduce a mechanism to make configuration
> available out of the box (when Zookeeper is still empty). And if
> clusterprops.json is to be used in standalone mode, it also must live
> somewhere on disk as well (no Zookeeper in standalone mode).
>
> >> I believe shardHandlerFactoryConfig is in solr.xml for all nodes to
> know which shard handler to use, not for configuring a different one on
> each node.
>
> >> Priming an empty Zookeeper with an initial version of
> clusterprops.json on startup is easy (first node up pushes its local
> copy). But after this happened once, if Solr is upgraded with a new default
> clusterprops.json, it is hard (to very hard) to update the Zookeeper
> version, with a high risk of erasing configuration that the user added or
> does not want to change.
>
> >> How about a variant? Keep a local solr.xml file with default configs
> and support overriding of these configs from Zookeeper's clusterprops.json.
> This approach does not have the out-of-the-box issue mentioned above, and
> practically also solves the "updating defaults" issue: if the user cherry
> picked some solr.xml configuration values and overrode them in
> clusterprops.json, it is then his responsibility to maintain them there.
> Newly introduced configuration values in solr.xml (due to a new Solr
> version) are not impacted since they were not overridden.
>
> >> I believe this approach is not too far from a suggestion you seem to
> make to hard code default configs to get rid of solr.xml. The difference is
> that the hard coding is done in solr.xml rather than in some
> defaultSolrConfig.java class. This makes changing default configuration
> easy, not requiring recompilation, but is otherwise not conceptually
> different.
>
> >> Ilan
>
> >> On Thu, Sep 3, 2020 at 7:05 AM Noble Paul  wrote:
>
> >>> Let's take a step back and take a look at the history of Solr.
>
> >>> Long ago there was only standalone Solr with a single core.
> >>> There were 3 files:
>
> >>> * solr.xml : everything required for CoreContainer went here
> >>> * solrconfig.xml : per core configurations go here
> >>> * schema.xml: this is not relevant for this discussion
>
> >>> Now we are in the cloud world where everything lives in ZK. This also
> >>> means there are potentially 1000's of nodes reading configuration

Re: Solr configuration options

2020-09-03 Thread Ilan Ginzburg
 pushed to machines) is a
> minus. If and when autoscaling is happy again I'd like to be able to start
> an AMI in AWS pointing at zk (or similar) and have it join automatically,
> and then receive replicas to absorb load (per whatever autoscaling is
> specified), and then be able to issue a single command to a node to sunset
> the node that moves replicas back off of it (again per autoscaling
> preferences, failing if autoscaling constraints would be violated) and then
> asks the node to shut down so that the instance in AWS (or wherever) can be
> shut down safely.  This is a black friday,  new tenants/lost tenants, or
> new feature/EOL feature sort of use case.
> > >>>>>
> > >>>>> Thus IMHO all config for cloud should live somewhere in ZK. File
> system access should not be required to add/remove capacity. If multiple
> node configurations need to be supported we should have nodeTypes directory
> in zk (similar to configsets for collections), possible node specific
> configs there and an env var that can be read to determine the type (with
> some cluster level designation of a default node type). I think that would
> be sufficient to parameterize AMI stuff (or containers) by reading tags
> into env variables
> > >>>>>
> > >>>>> As for knowing what a node loaded, we really should be able to
> emit any config file we've loaded (without reference to disk or zk). They
> aren't that big and in most cases don't change that fast, so caching a
> simple copy as a string in memory (but only if THAT node loaded it) for
> verification would seem smart. Having a file on disk doesn't tell you if
> solr loaded with that version or if it's changed since solr loaded it
> either.
> > >>>>>
> > >>>>> Anyway, that's the pie in my sky...
> > >>>>>
> > >>>>> -Gus
> > >>>>>
> > >>>>> On Fri, Aug 28, 2020 at 11:51 AM Ilan Ginzburg 
> wrote:
> > >>>>>>
> > >>>>>> What I'm really looking for (and currently my understanding is
> that solr.xml is the only option) is a cluster config a Solr dev can set as
> a default when introducing a new feature for example, so that the config is
> picked out of the box in SolrCloud, yet allowing the end user to override
> it if he so wishes.
> > >>>>>>
> > >>>>>> But "cluster config" in this context with a caveat: when doing a
> rolling upgrade, nodes running new code need the new cluster config, nodes
> running old code need the previous cluster config... Having a per node
> solr.xml deployed atomically with the code as currently the case has
> disadvantages, but solves this problem effectively in a very simple way. If
> we were to move to a central cluster config, we'd likely need to introduce
> config versioning or as Noble suggested elsewhere, only write code that's
> backward compatible (w.r.t. config), deploy that code everywhere then once
> no old code is running, update the cluster config. I find this approach
> complicated from both dev and operational perspective with an unclear added
> value.
> > >>>>>>
> > >>>>>> Ilan
> > >>>>>>
> > >>>>>> PS. I've stumbled upon the loading of solr.xml from Zookeeper in
> the past but couldn't find it as I wrote my message so I thought I imagined
> it...
> > >>>>>>
> > >>>>>> It's in SolrDispatchFilter.loadNodeConfig(). It establishes a
> connection to ZK for fetching solr.xml then closes it.
> > >>>>>> It relies on system property waitForZk as the connection timeout
> (in seconds, defaults to 30) and system property zkHost as the Zookeeper
> host.
> > >>>>>>
> > >>>>>> I believe solr.xml can only end up in ZK through the use of
> ZkCLI. Then the user is on his own to manage SolrCloud version upgrades: if
> a new solr.xml is included as part of a new version of SolrCloud, the user
> having pushed a previous version into ZK will not see the update.
> > >>>>>> I wonder if putting solr.xml in ZK is a common practice.
> > >>>>>>
> > >>>>>> On Fri, Aug 28, 2020 at 4:58 PM Jan Høydahl <
> jan@cominvent.com> wrote:
> > >>>>>>>
> > >>>>>>> I interpret solr.xml as the node-local configuration for a
> single node.
> > >>>>>>> clusterprops.json is the cluster-wide configuration applying to
> all 

Re: Solr configuration options

2020-08-28 Thread Ilan Ginzburg
As for the new autoscaling you mention as an example, Gus, its
configuration goes in solr.xml as per current work in progress. This means
that if you put this solr.xml in ZK, the autoscaling config will happen as
you describe (not the adding removing nodes part, not there yet, but
reading the config and doing the right thing).

You would have to pass a few params in the image (or to the container) to
connect to ZK.

On Fri, Aug 28, 2020 at 7:55 PM Tomás Fernández Löbbe 
wrote:

> I think if you are using AMIs (or Docker), you could put the node
> configuration inside the AMI (or Docker image), as Ilan said, together with
> the binaries. Say you have a custom top-level handler (Collections, Cores,
> Info, whatever), which takes some arguments and it's configured in solr.xml
> and you are doing an upgrade, you probably want your old nodes (running
> with your old AMI/Docker image with old jars) to keep the old configuration
> and your new nodes to use the new.
>
> On Fri, Aug 28, 2020 at 10:42 AM Gus Heck  wrote:
>
>> Putting solr.xml in zookeeper means you can add a node simply by starting
>> solr pointing to the zookeeper, and ensure a consistent solr.xml for the
>> new node if you've customized it. Since I rarely (never) hit use cases
>> where I need different per node solr.xml, I generally advocate putting it
>> in ZK, I'd say heterogeneous node configs is the special case for advanced
>> use here.  I'm a fan of a (hypothetical future) world where nodes can be
>> added/removed simply without need for local configuration. It would be
>> desirable IMHO to have a smooth node add and remove process and having to
>> install a file into a distribution manually after unpacking it (or having
>> coordinate variations of config to be pushed to machines) is a minus. If
>> and when autoscaling is happy again I'd like to be able to start an AMI in
>> AWS pointing at zk (or similar) and have it join automatically, and then
>> receive replicas to absorb load (per whatever autoscaling is specified),
>> and then be able to issue a single command to a node to sunset the node
>> that moves replicas back off of it (again per autoscaling preferences,
>> failing if autoscaling constraints would be violated) and then asks the
>> node to shut down so that the instance in AWS (or wherever) can be shut
>> down safely.  This is a black friday,  new tenants/lost tenants, or new
>> feature/EOL feature sort of use case.
>>
>> Thus IMHO all config for cloud should live somewhere in ZK. File system
>> access should not be required to add/remove capacity. If multiple node
>> configurations need to be supported we should have nodeTypes directory in
>> zk (similar to configsets for collections), possible node specific configs
>> there and an env var that can be read to determine the type (with some
>> cluster level designation of a default node type). I think that would be
>> sufficient to parameterize AMI stuff (or containers) by reading tags into
>> env variables
>>
>> As for knowing what a node loaded, we really should be able to emit any
>> config file we've loaded (without reference to disk or zk). They aren't
>> that big and in most cases don't change that fast, so caching a simple copy
>> as a string in memory (but only if THAT node loaded it) for verification
>> would seem smart. Having a file on disk doesn't tell you if solr loaded
>> with that version or if it's changed since solr loaded it either.
>>
>> Anyway, that's the pie in my sky...
>>
>> -Gus
>>
>> On Fri, Aug 28, 2020 at 11:51 AM Ilan Ginzburg 
>> wrote:
>>
>>> What I'm really looking for (and currently my understanding is that
>>> solr.xml is the only option) is *a cluster config a Solr dev can set as
>>> a default* when introducing a new feature for example, so that the
>>> config is picked out of the box in SolrCloud, yet allowing the end user to
>>> override it if he so wishes.
>>>
>>> But "cluster config" in this context *with a caveat*: when doing a
>>> rolling upgrade, nodes running new code need the new cluster config, nodes
>>> running old code need the previous cluster config... Having a per node
>>> solr.xml deployed atomically with the code as currently the case has
>>> disadvantages, but solves this problem effectively in a very simple way. If
>>> we were to move to a central cluster config, we'd likely need to introduce
>>> config versioning or as Noble suggested elsewhere, only write code that's
>>> backward compatible (w.r.t. config), deploy that code everywhere then once
>>> no old code is running, update the cluster confi

Re: Solr configuration options

2020-08-28 Thread Ilan Ginzburg
What I'm really looking for (and currently my understanding is that solr.xml
is the only option) is *a cluster config a Solr dev can set as a default* when
introducing a new feature for example, so that the config is picked out of
the box in SolrCloud, yet allowing the end user to override it if he so
wishes.

But "cluster config" in this context *with a caveat*: when doing a rolling
upgrade, nodes running new code need the new cluster config, nodes running
old code need the previous cluster config... Having a per node
solr.xml deployed
atomically with the code as currently the case has disadvantages, but
solves this problem effectively in a very simple way. If we were to move to
a central cluster config, we'd likely need to introduce config versioning
or as Noble suggested elsewhere, only write code that's backward compatible
(w.r.t. config), deploy that code everywhere then once no old code is
running, update the cluster config. I find this approach complicated from
both dev and operational perspective with an unclear added value.

Ilan

PS. I've stumbled upon the loading of solr.xml from Zookeeper in the past
but couldn't find it as I wrote my message so I thought I imagined it...

It's in SolrDispatchFilter.loadNodeConfig(). It establishes a connection to
ZK for fetching solr.xml then closes it.
It relies on system property waitForZk as the connection timeout (in
seconds, defaults to 30) and system property zkHost as the Zookeeper host.
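Roughly, and only as a simplified sketch (this is not the actual SolrDispatchFilter code; error handling is omitted), the logic amounts to:

// Simplified sketch of "load solr.xml from ZK when zkHost is set", falling back to the local file.
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.solr.common.cloud.SolrZkClient;

class NodeConfigLoaderSketch {
  byte[] loadSolrXml(Path solrHome) throws Exception {
    String zkHost = System.getProperty("zkHost");
    int waitForZkSecs = Integer.getInteger("waitForZk", 30);
    if (zkHost != null) {
      try (SolrZkClient zk = new SolrZkClient(zkHost, waitForZkSecs * 1000)) {
        if (zk.exists("/solr.xml", true)) {
          return zk.getData("/solr.xml", null, null, true); // node config fetched from ZK
        }
      }
    }
    return Files.readAllBytes(solrHome.resolve("solr.xml")); // local file otherwise
  }
}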

I believe solr.xml can only end up in ZK through the use of ZkCLI. Then the
user is on his own to manage SolrCloud version upgrades: if a new solr.xml
is included as part of a new version of SolrCloud, the user having pushed a
previous version into ZK will not see the update.
I wonder if putting solr.xml in ZK is a common practice.

On Fri, Aug 28, 2020 at 4:58 PM Jan Høydahl  wrote:

> I interpret solr.xml as the node-local configuration for a single node.
> clusterprops.json is the cluster-wide configuration applying to all nodes.
> solrconfig.xml is of course per core etc
>
> solr.in.sh is the per-node ENV-VAR way of configuring a node, and many of
> those are picked up in solr.xml (other in bin/solr).
>
> I think it is important to keep a file-local config file which can only be
> modified if you have shell access to that local node, it provides an extra
> layer of security.
> And in certain cases a node may need a different configuration from
> another node, i.e. during an upgrade.
>
> I put solr.xml in zookeeper. It may have been a mistake, since it may not
> make all that much sense to load solr.xml which is a node-level file, from
> ZK. But if it uses var substitutions for all node-level stuff, it will
> still work since those vars are pulled from local properties when parsed
> anyway.
>
> I’m also somewhat against hijacking clusterprops.json as a general purpose
> JSON config file for the cluster. It was supposed to be for simple
> properties.
>
> Jan
>
> > On Aug 28, 2020 at 14:23, Erick Erickson  wrote:
> >
> > Solr.xml can also exist on Zookeeper, it doesn’t _have_ to exist
> locally. You do have to restart to have any changes take effect.
> >
> > Long ago in a Solr far away solr.xml was where all the cores were
> defined. This was before “core discovery” was put in. Since solr.xml had to
> be there anyway and was read at startup, other global information was added
> and it’s lived on...
> >
> > Then clusterprops.json came along as a place to put, well, cluster-wide
> properties so having solr.xml too seems awkward. Although if you do have
> solr.xml locally to each node, you could theoretically have different
> settings for different Solr instances. Frankly I consider this more of a
> bug than a feature.
> >
> > I know there have been some talk about removing solr.xml entirely, but
> I’m not sure what the thinking is about what to do instead. Whatever we do
> needs to accommodate standalone. We could do the same trick we do now, and
> essentially move all the current options in solr.xml to clusterprops.json
> (or other ZK node) and read it locally for stand-alone. The API could even
> be used to change it if it was stored locally.
> >
> > That still leaves the chicken-and-egg problem if connecting to ZK in the
> first place.
> >
> >> On Aug 28, 2020, at 7:43 AM, Ilan Ginzburg  wrote:
> >>
> >> I want to ramp-up/discuss/inventory configuration options in Solr.
> Here's my understanding of what exists and what could/should be used
> depending on the need. Please correct/complete as needed (or point to
> documentation I might have missed).
> >>
> >> There are currently 3 sources of general configuration I'm aware of:
> >>  • Collection specific config bootstrapped by file solrconfig.xml
> and

Solr configuration options

2020-08-28 Thread Ilan Ginzburg
I want to ramp-up/discuss/inventory configuration options in Solr. Here's
my understanding of what exists and what could/should be used depending on
the need. Please correct/complete as needed (or point to documentation I
might have missed).


*There are currently 3 sources of general configuration I'm aware of:*

   - Collection specific config bootstrapped by file *solrconfig.xml* and
   copied into the initial (_default) then subsequent Config Sets in
   Zookeeper.
   - Cluster wide config in Zookeeper */clusterprops.json* editable
   globally through Zookeeper interaction using an API. Not bootstrapped by
   anything (i.e. does not exist until the user explicitly creates it)
   - Node config file *solr.xml* deployed with Solr on each node and loaded
   when Solr starts. Changes to this file are per node and require node
   restart to be taken into account.

The Collection specific config (file solrconfig.xml then in Zookeeper
/configs/**/solrconfig.xml) allows Solr devs to set
reasonable defaults (the file is part of the Solr distribution). Content
can be changed by users as they create new Config Sets persisted in
Zookeeper.

Zookeeper's /clusterprops.json can be edited through the collection admin
API CLUSTERPROP. If users do not set anything there, the file doesn't even
exist in Zookeeper, therefore Solr devs cannot use it to set a default
cluster config, there's no clusterprops.json file in the Solr distrib like
there's a solrconfig.xml.

File solr.xml is used by Solr devs to set some reasonable default
configuration (parametrized through property files or system properties).
There's no API to change that file; users would have to edit/redeploy the
file on each node and restart the Solr JVM on that node for the new config
to be taken into account.



*Based on the above, my vision (or mental model) of what to use depending
on the need:*

solrconfig.xml is the only per collection config. IMO it does
its job correctly: Solr devs can set defaults, users tailor the content to
what they need for new config sets. It's the only option for per collection
config anyway.

The real hesitation could be between solr.xml and Zookeeper
/clusterprops.json. What should go where?

For user configs (anything the user does to the Solr cluster AFTER it was
deployed and started), /clusterprops.json seems to be the obvious choice
and offers the right abstractions (global config, no need to worry about
individual nodes, all nodes pick up configs and changes to configs
dynamically).
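For example, a cluster property is set through the Collections API CLUSTERPROP action and every node then picks the value up from ZK without a restart; a minimal client sketch (urlScheme is just a sample property name):

// Sets a cluster property via the Collections API; it ends up in ZK's /clusterprops.json.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SetClusterPropExample {
  public static void main(String[] args) throws Exception {
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest req = HttpRequest.newBuilder(URI.create(
        "http://localhost:8983/solr/admin/collections?action=CLUSTERPROP&name=urlScheme&val=https"))
        .GET()
        .build();
    System.out.println(client.send(req, HttpResponse.BodyHandlers.ofString()).body());
  }
}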

For configs that need to be available without requiring user intervention
or that are needed before the connection to ZK is established, *there's currently no
other choice than using solr.xml*. Such configuration obviously include
parameters that are needed to connect to ZK (timeouts, credential provider
and hopefully one day an option to either use direct ZK interaction code or
Curator code), but also configuration of general features that should be
the default without requiring users to opt in yet allowing them to easily
opt out by editing solr.xml before deploying to their cluster (in the
future, this could include which Lucene version to load in Solr for
example).

To summarize:

   - Collection specific config? --> solrconfig.xml
   - User provided cluster config once SolrCloud is running? --> ZK
   /clusterprops.json
   - Solr dev provided cluster config? --> solr.xml


Going forward, some (but only some!) of the config that currently can only
live in solr.xml could be made to go to /clusterprops.json or another ZK
based config file. This would require adding code to create that ZK file
upon initial cluster start (to not force the user to push it) and devise a
mechanism (likely a script, could be tricky though) to update that file in
ZK when a new release of Solr is deployed and a previous version of that
file already exists. Not impossible tasks, but not trivial ones either.
Whatever the needs of such an approach are, it might be easier to keep the
existing solr.xml as a file and allow users to define overrides in
Zookeeper for the configuration parameters from solr.xml that make sense to
be overridden in ZK (obviously ZK credentials or connection timeout do not
make sense in that context, but defining the shard handler implementation
class does since it is likely loaded after a node managed to connect to ZK).
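As a trivial illustration of that override idea (hypothetical, such a mechanism does not exist in Solr today), the merge itself is just "the ZK value wins when present":

// Hypothetical sketch: node-local solr.xml supplies defaults, clusterprops.json overrides them.
import java.util.HashMap;
import java.util.Map;

class NodeConfigMergeSketch {
  static Map<String, String> effectiveConfig(Map<String, String> fromSolrXml,
                                             Map<String, String> fromClusterProps) {
    Map<String, String> merged = new HashMap<>(fromSolrXml); // defaults shipped with the node
    merged.putAll(fromClusterProps);                         // cluster-wide overrides win
    return merged;
  }
}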

Some config will have to stay in a local Node file system file and only
there no matter what: Zookeeper timeout definition or any node
configuration that is needed before the node connects to Zookeeper.


Re: RoadMap?

2020-08-11 Thread Ilan Ginzburg
Maybe also add “in progress”? So items do not disappear suddenly from the
page when work really starts on them?

On Tue 11 Aug 2020 at 17:15, Gus Heck  wrote:

> Cool, since I brought it up, I can volunteer to help manage the page. We
> should get jira issue links in there wherever possible. Do we want to build
> an initial list and have some sort of Proposed/Planned workflow so readers
> can have confidence (or appropriate lack of confidence) in what they see
> there? voting on things seems like too much but maybe folks who care watch
> the page, and if something is on there for a week without objection it can
> be called accepted? If a discussion starts here it can be marked
> "Considering" so... something like this:
>
> 4 states: Proposed, Considering, Planned, Rejected
>
> Workflow like this:
> Proposed ---(no objection 1 wk) --> Planned
> Proposed ---(discussion)--> Considering
> Considering (agreement) --> Planned
> Considering (deferred) ---> Proposed (later release)
> Considering (unsuitable) -> Rejected
> Considering (promoted) ---> Proposed (earlier release)
> Planned (difficulty found) ---> Considering
>
> Anything in "Considering" should have an active dev list thread, and if it
> didn't happen on the list it didn't happen :). Any of that (or differences
> of opinion during Considering) can be overridden by a formal vote of course
>
> -Gus
>
>
>
>
> On Tue, Aug 11, 2020 at 10:29 AM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
>> I've created a placeholder document here:
>> https://cwiki.apache.org/confluence/display/SOLR/Roadmap
>> Let us put in all our items there.
>>
>> On Tue, Aug 11, 2020 at 4:45 PM Jan Høydahl 
>> wrote:
>>
>>> Let’s revive this email thread about Roadmap.
>>>
>>> With so many large initiatives going on, and the TLP split also, I think
>>> it makes perfect sense with a Roadmap.
>>>
>>> I know we’re not used to that kind of thing - we tend to just let things
>>> play out as it happens to land in various releases, but this time is
>>> special, and I think we’d benefit from more coordination. I don’t know how
>>> to enforce such coordination though, other than appealing to all committers
>>> to endorse the roadmap and respect it when they merge things. We may not be
>>> able to set a release date for 9.0 right now, but we may be able to define
>>> preconditions and scope certain features to 9.0 or 9.1 rather than 8.7 or
>>> 8.8 - that kind of coarse-grained decisions. We also may need a person that
>>> «owns» the Roadmap confluence page and actively promotes it, tries to keep
>>> it up to date and reminds the rest of us about its existence. A roadmap
>>> must NOT be a brake slowing us down, but a tool helping us avoid silly
>>> mistakes.
>>>
>>> Jan
>>>
>>> > On Jul 5, 2020 at 02:39, Noble Paul  wrote:
>>> >
>>> > I think the logical thing to do today is completely rip out all
>>> > autoscaling code as it exists today.
>>> > Let's deprecate that in 8.7 and build something for "assign-strategy".
>>> > Autoscaling, if required, should not be a part of Solr.
>>> >
>>> > On Fri, Jul 3, 2020 at 5:48 PM Jan Høydahl 
>>> wrote:
>>> >>
>>> >> +1
>>> >>
>>> >> Why don’t we make a Roadmap wiki page as Cassandra suggests, and
>>> indicate what major things need to happen when.
>>> >> Perhaps if we can get the Solr TLP and git-split ball rolling as a
>>> pre-9.0 task, then perhaps 8.8 could be the last joint release (6.6, 7.7,
>>> 8.8 hehe)?
>>> >> That would enable Lucene to ship 9.0 without waiting for a ton of
>>> alpha-quality Solr features, and Solr could have its own Roadmap wiki.
>>> >>
>>> >> Jan
>>> >>
>>> >> On Jul 3, 2020 at 09:19, Dawid Weiss  wrote:
>>> >>
>>> >>> I totally expect some things to bubble up when we try to release
>>> with Gradle, the tarball being one. I don’t think that’s a very big issue,
>>> but if you have lots of “not very big” issues they do add up.
>>> >>
>>> >> Adding a tarball is literally 3-5 lines of code (you add a task that
>>> builds a tarball or a zip file from the outputs of solr/packaging toDir
>>> task)... The bigger issue with gradle is that somebody has to step up and
>>> try to identify any other issues and/or missing bits when trying to do a
>>> full release cycle.
>>> >>
>>> >> D.
>>> >
>>> > --
>>> > -
>>> > Noble Paul
>>> >
>>> > -
>>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> > For additional commands, e-mail: 

Re: Naming of non-SolrCloud clusters in the Ref Guide

2020-08-06 Thread Ilan Ginzburg
Both "legacy" and "SolrCloud" clusters are search server clusters. Seen
from far enough, they look the same.

In "legacy" the management code is elsewhere (developed by the client
operating the cluster, running on other machines using a different logic and
potentially another DB than Zookeeper) whereas in "SolrCloud" the
management code is embedded in the search server(s) code and it happens
that (currently) this code relies on Zookeeper.

I see SolrCloud as a "managed cluster" vs. legacy that would be "Self
managed" by the client, or "U manage" (non managed when looking at it from
the Solr codebase perspective).

Same idea as coordinated vs uncoordinated basically. I don't know why but I
prefer "managed".

Ilan

On Thu, Aug 6, 2020 at 5:49 PM Cassandra Targett 
wrote:

> On Aug 6, 2020, 10:22 AM -0500, Gus Heck , wrote:
>
> WRT the name "uncoordinated mode" I fear it could be read (or even become
> known as) as "clumsy mode" which is humorous but possibly not what we're
> going for :)
>
>
> I had also considered “non-coordinated”, and prefer it but couldn’t
> articulate why. The association of “uncoordinated" with clumsiness might be
> what was bugging me.
>
>  I'd perhaps suggest Cluster mode for SolrCloud though I'm not entirely
> sure if Legacy Solr (in current parlance) is not a "cluster" too, cluster
> being a somewhat vague term. However Clustered Mode and Legacy Mode seem
> more on target. I think "Legacy" could be changed since we're not really
> planning on abandoning it (are we?), but
>
>
> One can have a cluster and not run SolrCloud. I think from an operations
> perspective, several servers all running Solr is considered a cluster, no
> matter what tools are being used to get them to talk to each other.
>
> I think “Legacy” (also used today already in some contexts) is problematic
> because there aren’t plans to abandon it. Also “Legacy replication” is
> pretty close to exactly what PULL replicas use to poll leaders and pull new
> index segments when needed. IOW, it’s not “legacy”, it’s very actively
> being used in a growing number of clusters. That might be an implementation
> detail users aren’t aware of, but I feel the term is really lacking mostly
> in that it just doesn’t say anything besides “it’s older”.
>
> the adjective there SHOULD communicate reduced functionality because there
> are plenty of features that are cloud (cluster) only.
>
>
> In my view, the reduced functionality of non-SolrCloud clusters is mostly
> around coordination of requests, leader election, configs, and other
> similar automated activities one does manually otherwise. So, I feel that
> sort of proves my point - a word that conveys lack of coordination is a
> good option for what it’s called. If there is a better antonym for
> “coordinated”, I’m all for considering it but haven’t yet been able to
> think of/find one.
>
> I think it’s important to think about what differentiates the two ways of
> managing a Solr cluster and derive the naming from that. What features of
> SolrCloud don’t exist in the non-SolrCloud approach? What words help us
> generalize those gaps and can any of them be an appropriate name?
>
>
> -Gus
>
>
> On Thu, Aug 6, 2020 at 10:27 AM Cassandra Targett 
> wrote:
>
> The work in SOLR-14702 has left us with some awkward phrasing (which is
> still better than what it was) around non-SolrCloud clusters that I've
> offered to help fix.
>
>
> I think we've struggled for years to find a good name for non-SolrCloud
> clusters and we've used a number of variations: "legacy replication" (which
> it isn't, since PULL replicas use the same thing), "Standalone mode" (which
> it isn't because it's a cluster), now "leader/follower mode" (which could
> be confusing because SolrCloud has leaders).
>
>
> Yesterday I thought about what really differentiates a SolrCloud cluster
> and a non-SolrCloud cluster and it occurred to me that a key difference is
> the former is coordinated by ZooKeeper, while the latter is not. That led
> me to think that perhaps "coordinated mode" can someday be a better
> replacement for the term "SolrCloud", while "uncoordinated mode" could be a
> replacement today for all these other non-SolrCloud mode variations.
>
>
> I've opened https://issues.apache.org/jira/browse/SOLR-14716 and will
> create a branch for work in progress, but before I forge too far ahead, I
> want to draw attention to it first to give a chance for discussion so we're
> in agreement.
>
>
> Thanks,
>
> Cassandra
>
>
>
> --
>
> http://www.needhamsoftware.com (work)
>
> http://www.the111shift.com (play)
>
>


Re: Approach for a new Autoscaling framework

2020-07-26 Thread Ilan Ginzburg
Varun, you're correct.
This PR was built based on what's needed for creation (easiest starting
point for me and likely most urgent need). It's still totally WIP and
following steps include building the API required for move and other
placement based needs, then also everything related to triggers (see the
Jira).

Collection API commands (Solr provided implementation, not a plug-in) will
build the requests they need, then call the plug-in (custom one or a default
one), and use the returned "work items" (more types of work items will be
introduced of course) to do the job (know where to place or where to move
or what to remove or add etc.)

Ilan

On Sun, Jul 26, 2020 at 04:13, Varun Thacker  wrote:

> Hi Ilan,
>
> I like where we're going with
> https://github.com/apache/lucene-solr/pull/1684 . Correct me if I am
> wrong, but my understanding of this PR is we're defining the interfaces for
> creating policies
>
> What's not clear to me is how will existing collection APIs like
> create-collections/add-replica etc make use of it? Is that something that
> has been discussed somewhere that I could read up on?
>
>
>
> On Sat, Jul 25, 2020 at 2:03 PM Ilan Ginzburg  wrote:
>
>> Thanks Gus!
>> This makes a lot of sense but significantly increases IMO the scope and
>> effort to define an "Autoscaling" framework interface.
>>
>> I'd be happy to try to see what concepts could be shared and how a
>> generic plugin facade could be defined.
>>
>> What are the other types of plugins that would share such a unified
>> approach? Do they already exist under another form or are just projects at
>> this stage, like Autoscaling plugins?
>>
>> But... Assuming this is the first "facade" layer to be defined between
>> Solr and external code, it might be hard to make it generic and get it
>> right. There's value in starting simple, understanding the tradeoffs and
>> generalizing later.
>>
>> Also I'd like to make sure we're not paying a performance "genericity
>> tax" in Autoscaling for unneeded features.
>>
>> Ilan
>>
>> Le sam. 25 juil. 2020 à 16:02, Gus Heck  a écrit :
>>
>>> Scanned through the PR and read some of this thread. I likely have
missed much other discussion, so forgive me if I'm dredging up some things
>>> that are already discussed elsewhere.
>>>
>>> The idea of designing the interfaces defining what information is
>>> available seems good here, but I worry that it's too auto-scaling focused.
>>> In my imagination, I would see solr having a standard informational
>>> interface that is useful to any plugin of any sort. Autoscaling should be
>>> leveraging that and we should be enhancing that to enable autoscaling. The
>>> current state of  the system is one key type of information, but another
>>> type of information that should exist within solr and be exposed to plugins
>>> (including autoscaling) is events. When a new node joins there should be an
>>> event for example so that plugins can listen for that rather than
>>> incessantly polling and comparing the list of 100 nodes to a cached list of
>>> 100 nodes.
>>>
>>> In the PR I see a bunch of classes all off in a separate package, which
>>> looks like an autoscaling fiefdom which will be tempted if not forced to
>>> duplicate lots of stuff relative to other plugins and/or core.
>>>
>>> As a side note I would think the metrics system could be a plugin that
>>> leverages the same set of informational interfaces
>>>
>>> So there should be 3 parts to this as I imagine it.
>>>
>>> 1) Enhancements to the **plugin system** that make information about the
>>> cluster available solr to ALL plugins
>>> 2) Enhancements to the **plugin system** API's provided to ALL plugins
>>> that allow them to mutate solr safely.
>>> 3) A plugin that we intend to support for our users currently using auto
>>> scaling utilizes the enhanced information to provide a similar level of
>>> functionality as is *promised* by our current documentation of autoscaling,
>>> there might be some gaps or differences but we should be discussing what
>>> they are and providing recommended workarounds for users that relied on
>>> those promises to the users. Even if there were cases where we failed to
>>> deliver, if there were at least some conditions under which we could
>>> deliver the promised functionality those should be supported. Only if we
>>> never were able to deliver and it never worked under any circumstance
>>> 

Re: Approach for a new Autoscaling framework

2020-07-25 Thread Ilan Ginzburg
fort should
> grow it enough to move autoscaling onto it without dropping (much)
> functionality that we've previously published.
>
> -Gus
>
> On Fri, Jul 24, 2020 at 4:40 PM Jan Høydahl  wrote:
>
>> Not clear to me what type of "alternative proposal" you're thinking of Jan
>>
>>
>> That would be the responsibility of Noble and others who have concerns to
>> detail - and try convince other peers.
>> It’s hard for me as a spectator to know whether to agree with Noble
>> without a clear picture of what the alternative API or approach would look
>> like.
>> I’m often a fan of loosely typed APIs since they tend to cause less
>> boilerplate code, but strong typing may indeed be a sound choice in this
>> API.
>>
>> Jan Høydahl
>>
>> On Jul 24, 2020 at 01:44, Ilan Ginzburg  wrote:
>>
>> 
>> In my opinion we have to (and therefore will) ship at least a basic prod
>> ready implementation on top of the API that does simple things (not sure
>> about rack, but for example balance cores and disk size without co-locating
>> replicas of same shard on same node).
>> Without such an implementation, I suspect adoption will be low. Moreover,
>> it's always a lot more friendly to start coding from a working example than
>> from scratch.
>>
>> Not clear to me what type of "alternative proposal" you're thinking of
>> Jan. Alternative API proposal? Alternative approach to replace Autoscaling?
>>
>> Ilan
>>
>> Ilan
>>
>> On Fri, Jul 24, 2020 at 12:11 AM Jan Høydahl 
>> wrote:
>>
>>> Important discussion indeed.
>>>
>>> I don’t have time to dive deep into the PR or make up my mind whether
>>> there is a simpler and more future proof way of designing these APIs. But I
>>> understand that autoscaling is a complex beast and it is important we get
>>> it right.
>>>
>>> One question regarding having to write code vs config. Is the plan to
>>> ship some very simple light weight default placement rules ootb that gives
>>> 80% of users what they need with simple config, or would every user need to
>>> write code to e.g. spread replicas across hosts/racks? I’d be interested in
>>> seeing an alternative proposal laid out, perhaps not in code but with a
>>> design that can be compared and discussed.
>>>
>>> Jan Høydahl
>>>
>>> 23. jul. 2020 kl. 17:53 skrev Houston Putman :
>>>
>>> 
>>> I think this is a valid thing to discuss on the dev list, since this
>>> isn't just about code comments.
>>> It seems to me that Ilan wants to discuss the philosophy around how to
>>> design plugins and the interfaces in Solr which the plugins will talk to.
>>> This is broad and affects much more than just the Autoscaling framework.
>>>
>>> As a community & product, we have so far agreed that Solr should be
>>> lighter weight and additional features should live in plugins that are
>>> managed separately from Solr itself.
>>> At that point we need to think about the lifetime and support of these
>>> plugins. People love to refactor stuff in the solr core, which before
>>> plugins wasn't a large issue.
>>> However if we are now intending for many customers to rely on plugins,
>>> then we need to come up with standards and guarantees so that these plugins
>>> don't:
>>>
>>>- Stall people from upgrading Solr (minor or major versions)
>>>- Hinder the development of Solr Core
>>>- Cause us more headaches trying to keep multiple repos of plugins
>>>up to date with recent versions of Solr
>>>
>>>
>>> I am not completely sure where I stand right now, but this is definitely
>>> something that we should be thinking about when migrating all of this
>>> functionality to plugins.
>>>
>>> - Houston
>>>
>>> On Thu, Jul 23, 2020 at 9:27 AM Ishan Chattopadhyaya 
>>> wrote:
>>>
>>>> I think we should move the discussion back to the PR because it has
>>>> more context and inline comments are possible. Having this discussion in 4
>>>> places (jira, pr, slack and dev list) is very hard to keep track of.
>>>>
>>>> On Thu, 23 Jul, 2020, 5:57 pm Ilan Ginzburg, 
>>>> wrote:
>>>>
>>>>> [I’m moving a discussion from the PR
>>>>> <https://github.com/apache/lucene-solr/pull/1684> for SOLR-14613
>>>>> <https://issues.apache.org/jira/bro

Re: Welcome Mike Drob to the PMC

2020-07-24 Thread Ilan Ginzburg
Congratulations Mike, happy to hear that!

Ilan

On Fri, Jul 24, 2020 at 9:56 PM Anshum Gupta  wrote:

> I am pleased to announce that Mike Drob has accepted the PMC's invitation
> to join.
>
> Congratulations and welcome, Mike!
>
> --
> Anshum Gupta
>


Re: Approach for a new Autoscaling framework

2020-07-23 Thread Ilan Ginzburg
In my opinion we have to (and therefore will) ship at least a basic prod
ready implementation on top of the API that does simple things (not sure
about rack, but for example balance cores and disk size without co-locating
replicas of same shard on same node).
Without such an implementation, I suspect adoption will be low. Moreover,
it's always a lot more friendly to start coding from a working example than
from scratch.

Not clear to me what type of "alternative proposal" you're thinking of Jan.
Alternative API proposal? Alternative approach to replace Autoscaling?

Ilan

Ilan

On Fri, Jul 24, 2020 at 12:11 AM Jan Høydahl  wrote:

> Important discussion indeed.
>
> I don’t have time to dive deep into the PR or make up my mind whether
> there is a simpler and more future proof way of designing these APIs. But I
> understand that autoscaling is a complex beast and it is important we get
> it right.
>
> One question regarding having to write code vs config. Is the plan to ship
> some very simple light weight default placement rules ootb that gives 80%
> of users what they need with simple config, or would every user need to
> write code to e.g. spread replicas across hosts/racks? I’d be interested in
> seeing an alternative proposal laid out, perhaps not in code but with a
> design that can be compared and discussed.
>
> Jan Høydahl
>
> 23. jul. 2020 kl. 17:53 skrev Houston Putman :
>
> 
> I think this is a valid thing to discuss on the dev list, since this isn't
> just about code comments.
> It seems to me that Ilan wants to discuss the philosophy around how to
> design plugins and the interfaces in Solr which the plugins will talk to.
> This is broad and affects much more than just the Autoscaling framework.
>
> As a community & product, we have so far agreed that Solr should be
> lighter weight and additional features should live in plugins that are
> managed separately from Solr itself.
> At that point we need to think about the lifetime and support of these
> plugins. People love to refactor stuff in the solr core, which before
> plugins wasn't a large issue.
> However if we are now intending for many customers to rely on plugins,
> then we need to come up with standards and guarantees so that these plugins
> don't:
>
>- Stall people from upgrading Solr (minor or major versions)
>- Hinder the development of Solr Core
>- Cause us more headaches trying to keep multiple repos of plugins up
>to date with recent versions of Solr
>
>
> I am not completely sure where I stand right now, but this is definitely
> something that we should be thinking about when migrating all of this
> functionality to plugins.
>
> - Houston
>
> On Thu, Jul 23, 2020 at 9:27 AM Ishan Chattopadhyaya 
> wrote:
>
>> I think we should move the discussion back to the PR because it has more
>> context and inline comments are possible. Having this discussion in 4
>> places (jira, pr, slack and dev list) is very hard to keep track of.
>>
>> On Thu, 23 Jul, 2020, 5:57 pm Ilan Ginzburg,  wrote:
>>
>>> [I’m moving a discussion from the PR
>>> <https://github.com/apache/lucene-solr/pull/1684> for SOLR-14613
>>> <https://issues.apache.org/jira/browse/SOLR-14613> to the dev list for
>>> a wider audience. This is about replacing the now (in master) gone
>>> Autoscaling framework with a way for clients to write their customized
>>> placement code]
>>>
>>> It took me a long time to write this mail and it's quite long, sorry.
>>> Please anybody interested in the future of Autoscaling (not only those I
>>> cc'ed) do read it and provide feedback. Very impacting decisions have to be
>>> made now.
>>>
>>> Thanks Noble for your feedback.
>>> I believe it is important that we are aligned on what we build here,
>>> esp. at the early defining stages (now).
>>>
>>> Let me try to elaborate on your concerns and provide in general the
>>> rationale behind the approach.
>>>
>>> *> Anyone who wishes to implement this should not require to learn a lot
>>> before even getting started*
>>> For somebody who knows Solr (what is a Node, Collection, Shard, Replica)
>>> and basic notions related to Autoscaling (getting variables representing
>>> current state to make decisions), there’s not much to learn. The framework
>>> uses the same concepts, often with the same names.
>>>
>>> *> I don't believe we should have a set of interfaces that duplicate
>>> existing classes just for this functionality.*
>>> Where appropriate we can have existing classes be the implementations
>>&

Approach for a new Autoscaling framework

2020-07-23 Thread Ilan Ginzburg
[I’m moving a discussion from the PR
 for SOLR-14613
 to the dev list for a
wider audience. This is about replacing the now (in master) gone
Autoscaling framework with a way for clients to write their customized
placement code]

It took me a long time to write this mail and it's quite long, sorry.
Please anybody interested in the future of Autoscaling (not only those I
cc'ed) do read it and provide feedback. Very impacting decisions have to be
made now.

Thanks Noble for your feedback.
I believe it is important that we are aligned on what we build here, esp.
at the early defining stages (now).

Let me try to elaborate on your concerns and provide in general the
rationale behind the approach.

*> Anyone who wishes to implement this should not require to learn a lot
before even getting started*
For somebody who knows Solr (what is a Node, Collection, Shard, Replica)
and basic notions related to Autoscaling (getting variables representing
current state to make decisions), there’s not much to learn. The framework
uses the same concepts, often with the same names.

*> I don't believe we should have a set of interfaces that duplicate
existing classes just for this functionality.*
Where appropriate we can have existing classes be the implementations for
these interfaces and be passed to the plugins, that would be perfectly ok.
The proposal doesn’t include implementations at this stage, therefore
there’s no duplication, or not yet... (we must get the interfaces right and
agreed upon before implementation). If some interface methods in the
proposal have a different name from equivalent methods in internal classes
we plan to use, of course let's rename one or the other.

Existing internal abstractions are most of the time concrete classes and
not interfaces (Replica, Slice, DocCollection, ClusterState). Making these
visible to contrib code living elsewhere is making future refactoring hard
and contrib code will most likely end up reaching to methods it shouldn’t
be using. If we define a clean set of interfaces for plugins, I wouldn’t
hesitate to break external plugins that reach out to other internal Solr
classes, but will make everything possible to keep the API backward
compatible so existing plugins can be recompiled without change.

*> 24 interfaces to do this is definitely over engineering*
I don’t consider the number of classes or interfaces a metric of complexity
or of engineering quality. There are sample plugin implementations to serve
as a base for plugin writers (and for us defining this framework) and I
believe the process is relatively simple. Trying to do the same things with
existing Solr classes might prove a lot harder (but might be worth the
effort for comparison purposes to make sure we agree on the approach? For
example, getting sister replicas of a given replica in the proposed API is:
replica.getShard().getReplicas().
Doing so with the internal classes likely involves getting the DocCollection
and Slice names from the Replica, then getting the DocCollection from the
cluster state, then the Slice by name, and finally calling getReplicas() on
the Slice). I consider that the role of this new framework is
to make life as easy as possible for writing placement code and the like,
make life easy for us to maintain it, make it easy to write a simulation
engine (should be at least an order of magnitude simpler than the previous
one), etc.
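
To make the comparison concrete, here is roughly what that navigation looks
like with the internal classes (a sketch only: the classes are from
org.apache.solr.common.cloud, and exact accessor names may vary slightly
between versions), versus the single replica.getShard().getReplicas() call of
the proposed API:

import java.util.Collection;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;

// Sister replicas of a replica, using today's internal classes:
// ClusterState -> DocCollection -> Slice -> replicas.
static Collection<Replica> sisterReplicas(ClusterState clusterState,
                                          String collectionName,
                                          String shardName) {
  DocCollection coll = clusterState.getCollection(collectionName);
  Slice slice = coll.getSlice(shardName);
  return slice.getReplicas();
}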

An example regarding readability and number of interfaces: rather than
defining an enum with runtime annotation for building its instances
(Variable.Type) and then very generic access methods, the proposal defines a
specific interface for each “variable type” (called properties). Rather than
concatenating strings to specify the data to return from a remote node
(based on snitches, see doc), the proposal is explicit and strongly typed
(see the example of getting a specific system property from a node). This
definitely does increase the number of interfaces, but reduces IMO the
effort to code to these abstractions and provides a lot more compile-time
checking.
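
To illustrate the style difference (purely illustrative, these are not the
interface names used in the PR):

import java.util.Map;

class TypedVsStringly {
  // Property style: a dedicated, explicit interface per variable type, so the
  // compiler checks both what is requested and what comes back.
  interface FreeDiskProperty { long getFreeDiskGB(); }
  interface SyspropProperty  { String getSystemPropertyValue(); }

  // Snitch style: what to fetch and what comes back are both stringly typed,
  // and every caller casts and hopes the key was spelled correctly.
  void snitchStyle(Map<String, Object> nodeValues) {
    long freeDiskGB = ((Number) nodeValues.get("freedisk")).longValue();
    String zone = (String) nodeValues.get("sysprop.availability_zone");
  }
}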

Re: 8.6.1 Release

2020-07-22 Thread Ilan Ginzburg
Shouldn't we add a note right away to 8.6 notifying of the issue?

On Wed, Jul 22, 2020 at 8:08 PM Atri Sharma  wrote:

> +1, thanks Houston.
>
> On Wed, Jul 22, 2020 at 10:51 PM Houston Putman 
> wrote:
> >
> > If we agree that this warrants a patch release, I volunteer to do the
> release.
> >
> > I do think a patch release is reasonable even if users have to take an
> action when upgrading from 8.6.0. I imagine most users haven't upgraded to
> 8.6.0 yet, so if we make the patch now we will make life easier for
> everyone that upgrades between now and when 8.7 is released.
> >
> > On Wed, Jul 22, 2020 at 12:50 PM Atri Sharma  wrote:
> >>
> >> Ignore this, I misread your email.
> >>
> >> On Wed, Jul 22, 2020 at 9:11 PM Atri Sharma  wrote:
> >> >
> >> > Should we not revert the change so that users upgrading from 8.6 to
> >> > 8.6.1 get the earlier default policy?
> >> >
> >> > On Wed, Jul 22, 2020 at 9:09 PM Houston Putman <
> houstonput...@gmail.com> wrote:
> >> > >
> >> > > +1
> >> > >
> >> > > Question about the change. Since this patch added a default
> autoscaling policy, if users upgrade to 8.6 and then 8.6.1, does the
> default autoscaling policy stay once they have upgraded? If so we probably
> want to include instructions in the release notes on how to fix this issue
> once upgrading.
> >> > >
> >> > > - Houston
> >> > >
> >> > > On Wed, Jul 22, 2020 at 1:53 AM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
> >> > >>
> >> > >> Hi,
> >> > >> There was a performance regression identified in 8.6.0 release due
> to SOLR-12845. I think it is serious enough to warrant an immediate bug fix
> release.
> >> > >>
> >> > >> I propose an 8.6.1 release. Unfortunately, I'll be unable to
> volunteer for this release owing to some other commitments, however
> Andrzej mentioned in Slack that he might be able to volunteer for this post
> 27th.
> >> > >>
> >> > >> Are there any thoughts/concerns regarding this?
> >> > >> Regards,
> >> > >> Ishan
> >> >
> >> > --
> >> > Regards,
> >> >
> >> > Atri
> >> > Apache Concerted
> >>
> >>
> >>
> >> --
> >> Regards,
> >>
> >> Atri
> >> Apache Concerted
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>
>
>
> --
> Regards,
>
> Atri
> Apache Concerted
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: 8.6.1 Release

2020-07-22 Thread Ilan Ginzburg
I didn't look at the issue, but if it is due to a default inefficient
policy, instead of a new release (that as Houston points out will not even
solve the issue), can't we communicate a workaround, namely a way to reset
the default policy to some other value after 8.6 deploy that would make the
problem disappear?

But maybe the issue is more than config?

Ilan

On Wed, Jul 22, 2020 at 5:46 PM Houston Putman 
wrote:

> +1
>
> Question about the change. Since this patch added a default autoscaling
> policy, if users upgrade to 8.6 and then 8.6.1, does the default
> autoscaling policy stay once they have upgraded? If so we probably want to
> include instructions in the release notes on how to fix this issue once
> upgrading.
>
> - Houston
>
> On Wed, Jul 22, 2020 at 1:53 AM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
>> Hi,
>> There was a performance regression identified in 8.6.0 release due to
>> SOLR-12845. I think it is serious enough to warrant an immediate bug fix
>> release.
>>
>> I propose an 8.6.1 release. Unfortunately, I'll be unable to volunteer for
>> this release owing to some other commitments, however Andrzej mentioned in
>> Slack that he might be able to volunteer for this post 27th.
>>
>> Are there any thoughts/concerns regarding this?
>> Regards,
>> Ishan
>>
>


Re: SolrCloud upgrade process

2020-06-30 Thread Ilan Ginzburg
If there could be a way to force the new version to continue writing in the
previous format for a while, that would allow switching to writing the new
format once all nodes have been upgraded (or more likely when the cluster
admin decides so).

Ilan

On Tue, Jun 30, 2020 at 9:34 PM David Smiley  wrote:

> I was considering the process of a Solr upgrade to a SolrCloud cluster
> within a minor version (e.g. 8.3 -> 8.4).
>
> A concern I have is the implication of new Lucene index formats.  Lucene
> 8.4 bumped the Codec version because of postings being written differently
> to be more SIMD friendly --
> https://issues.apache.org/jira/browse/LUCENE-9027
> Lucene 8.4 will read an index created with Lucene 8.3 -- great; but Lucene
> 8.3 obviously can't read an index created with Lucene 8.4.  I'm not picking
> on this specific JIRA/change; it could be many others.  There's another
> coming in 8.6.
>
> We've got some documentation on the upgrade process:
>
> https://github.com/apache/lucene-solr/blob/master/solr/solr-ref-guide/src/upgrading-a-solr-cluster.adoc
> The instructions describe a rolling upgrading of each node one at a time.
> Makes sense.  However, it's possible for a shard on an already upgraded
> node to become leader, have some documents written to it, and then a
> replica on a non-upgraded node might end up replicating segments from the
> leader.  This is possible with all replica types, though I think more
> likely with TLOG & PULL.  I am not sure if there are any protections for
> this (e.g. in replication handler / index fetcher); there should be.  I
> think that SolrCloud should prevent a replica from becoming a leader if
> there exists another replica (for the same shard) that has a lower Solr
> version.
>
> I can think of two work-arounds:
> (A) shut down the whole cluster to do the upgrade (forced down time)
> (B) initiate read-only status for all collections
> https://github.com/apache/lucene-solr/blob/master/solr/solr-ref-guide/src/collection-management.adoc#L258
>  and
> also be careful not to create new collections during this time.  Then do
> the rolling upgrade as described in the docs above, and then remove the
> read-only status.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
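
On workaround (B): the read-only flag is a collection attribute, so flipping
it on before the rolling upgrade and off afterwards can be scripted. A
minimal sketch (host, port and collection name are placeholders, and this
assumes a Solr version whose MODIFYCOLLECTION supports the readOnly
attribute):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Put one collection into read-only mode; call again with readOnly=false
// once every node runs the new version.
public class ReadOnlyToggle {
  public static void main(String[] args) throws Exception {
    String url = "http://localhost:8983/solr/admin/collections"
        + "?action=MODIFYCOLLECTION&collection=myCollection&readOnly=true";
    HttpResponse<String> rsp = HttpClient.newHttpClient().send(
        HttpRequest.newBuilder(URI.create(url)).GET().build(),
        HttpResponse.BodyHandlers.ofString());
    System.out.println(rsp.statusCode() + " " + rsp.body());
  }
}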


Re: Should Solr expose OS specific file paths?

2020-06-26 Thread Ilan Ginzburg
I believe output from Solr (logs or returned strings representing paths)
should conform to the host platform convention.
Solr should accept either convention as input regardless of the platform
it's running on.

Ilan

On Fri, Jun 26, 2020 at 6:11 PM David Smiley  wrote:

> I started a conversation in Slack with Jason but think Slack is too narrow
> of an audience so I'm going to copy-paste it here.  We want your opinions!
> Here's a copy of it; I hope this doesn't look terrible on the mailing list.
>
> David Smiley  Today at 11:49 AM
> 
> I'd like to debate a point relating to Solr's APIs and file paths it might
> expose, as it relates to the operating system Solr runs on.
> 6 replies
> --
> David Smiley   20 minutes ago
> 
> Apparently there is a streaming expression that can expose file paths
> rooted at a "/userfiles" dir on Solr
> David Smiley   17 minutes ago
> 
> I have a strong opinion that any service exposing (returns) file paths to
> other machines should do so in a way that is operating-system neutral.
> David Smiley   14 minutes ago
> 
> Thus if you're going to list the contents of a directory tree of all files
> at once, then you do it like "foo/bar/baz.txt" always, even if the server
> happens to be running on Windows
> David Smiley   13 minutes ago
> 
> The client parsing this should not have to know what OS the server is
> running in order to parse it.  It should just assume '/' separator.
> gerlowskija   13 minutes ago
> 
> Whereas in discussing this with David - I have the opposite opinion.  I
> think we should take-in/return paths in an OS-specific way.Because that’s
> what existing APIs expect currently (CoreAdmin, etc.), and because I
> imagine an admin setting up their cluster on Windows, well, expects to use
> Windows filepaths.
>


Re: Review your squash-merge message to remove duplicate text

2020-06-24 Thread Ilan Ginzburg
I could only git show the last id in your email David.

That means that for most squash and merge the dialog box should be left
empty, as the PR title should already have the relevant info (Jira ID +
short description), right? And when the PR title does not contain this
info, we should edit it prior to commit.
Making sure I understand you correctly.

Ilan

On Thu, Jun 25, 2020 at 6:00 AM David Smiley  wrote:

> Fellow committers,
>
> When hitting that squash-merge button in GitHub UI, the first line of the
> dialog for the commit message is set apart from the rest of the message.
> Then it's followed by a multi-line text box for the remaining lines.  The
> beginning line of the rest of the message is often duplicated or
> effectively redundant with that first title line.  Please edit your message
> so that the redundancy is gone because the whole thing winds up in the
> commit message, which is ultimately rather untidy/unclean.
>
> Most commits are fine.  Here are some examples of the problem I speak of
> by 3 separate individuals:
>
> 57a892f896f543913d6b22a81577f69184cd94b6 
> 25428013fb0ed8f8fdbebdef3f1d65dea77129c2
> 26075fc1dc06766a9d2af8bd5dd14243c0463a6b
>
> Thanks,
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>


Re: The moment you've all been waiting for PLEASE READ, Gradle builds will start failing on warnings on 9x!

2020-06-24 Thread Ilan Ginzburg
What I mean is that when creating a PR, one has to (more precisely "can")
check the box that ant precommit was run.
I'm suggesting we either replace this by gradle precommit or add gradle
precommit in the instructions.

I know I was following all the items in the checklist when submitting a PR
only to see gradle precommit then fail; it would be more friendly for
newcomers to know what to do right away.

I'm not suggesting doing anything differently than what you describe, just
a way to make it clearer to everybody what the expectations are.

Ilan

On Wed, Jun 24, 2020 at 10:26 PM Erick Erickson 
wrote:

> Ilan:
>
> Failing on newly introduced warnings is much more draconian than having to
> run
> “gradlew check -x test” (or precommit). Compilation will fail. So I don’t
> think there’s
> any reason to add a note, it’ll smack you right in the face.
>
> “gradle check -x test” is preferred now to “gradle precommit". “gradle
> check” by
> itself will do both precommit+ and testing all in one go. “gradle helpAnt”
> will show
> you most equivalent tasks to what we’re all familiar with from ant for
> retraining
> purposes ;). The Gradle precommit task was put there for “back compat”.
> Since
> “check” depends on “precommit”, check may contain, now or in future, more
> than precommit. IDK whether they’re identical at this point, Dawid put in
> that magic.
>
> Hmmm we might want to add a note that “gradlew check” does both. I think I
> might
> add that in the help text for gradle though..
>
> precommit has never really been optional, or at least people who don’t run
> it and
> check in code that fails precommit were expected to fix it immediately.
> And _all_
> of us have done that at one point or another. I have to say that Gradle is
> much
> faster, and just being to do “gradle check” and go do something else for a
> while has
> made it much more likely that I’ll run it more often.
>
> Erick
>
> > On Jun 24, 2020, at 2:32 PM, Ilan Ginzburg  wrote:
> >
> > Thank you Erick! This is useful and saves time (I was able to set up
> gradle with the assistance you gave me a while ago).
> >
> > I guess that also means Gradle precommit is no longer optional and
> likely the text initializing PR's descriptions should mention that in some
> way...
> >
> > On Wed, Jun 24, 2020 at 9:13 AM Atri Sharma  wrote:
> > Thank you so much, Erick!
> >
> > On Wed, Jun 24, 2020 at 2:48 AM Erick Erickson 
> wrote:
> > >
> > > As of my push a few minutes ago, Gradle compiling on 9x  WILL FAIL if
> there are any warnings in the code. See LUCENE-9411. I’ve finally finished
> suppressing over 8,000 warnings in Solr, so could check this in. Many
> thanks to Dawid for helping me with the Gradle parts. The goal now is to
> not add any _more_ SuppressWarnings if at all possible. I hope we can start
> taking the suppressions out when we’re working on code, so when working on
> code please consider removing some of them.
> > >
> > > I was hoping that we could also fail ant builds, but there are some
> tricky dependencies in third party code that weren’t easy to resolve in the
> ant world due to licensing issues, if you’re interested in details, see the
> JIRA or ping me on Slack. One consequence of this is that 8x will NOT fail
> on warnings, neither will Ant builds on 9x. If someone wants to try working
> that out, please feel free but I’m just really tired of banging my head
> against that wall.
> > >
> > > So please, Please, PLEASE start compiling 9x with Gradle or cover your
> ears to keep from hearing me complain. And I’ve been taking lessons from my
> 3 1/2 year old grandson on doing that LOUDLY.
> > >
> > > About SuppressWarnings. There were so many of them that there was no
> hope of actually fixing the underlying causes in one go. I’ve enhanced the
> BadApples report to start reporting on the number of SuppressWarnings in
> each file week to week when they increase or decrease. I’ll be nudging
> people if the number of SuppressWarnings starts going up, starting Monday.
> I can’t help but think understanding generics will be improved by working
> through new warnings.
> > >
> > > A couple of side notes for IntelliJ users (IDK about other IDEs, but
> I’d be surprised if there weren’t similar capabilities):
> > >
> > > - When you just open the project, Gradle is automatically configured.
> There’s no need to execute the “gradlew idea” task.
> > >
> > > - You can execute tasks in IntelliJ _really easily_ by clicking on
> them in the gradle window, it’s on the extreme right. It seems much more
> robust than trying the same thing in Ant.
> > >
>

Re: The moment you've all been waiting for PLEASE READ, Gradle builds will start failing on warnings on 9x!

2020-06-24 Thread Ilan Ginzburg
Thank you Erick! This is useful and saves time (I was able to set up gradle
with the assistance you gave me a while ago).

I guess that also means Gradle precommit is no longer optional and likely
the text initializing PR's descriptions should mention that in some way...

On Wed, Jun 24, 2020 at 9:13 AM Atri Sharma  wrote:

> Thank you so much, Erick!
>
> On Wed, Jun 24, 2020 at 2:48 AM Erick Erickson 
> wrote:
> >
> > As of my push a few minutes ago, Gradle compiling on 9x  WILL FAIL if
> there are any warnings in the code. See LUCENE-9411. I’ve finally finished
> suppressing over 8,000 warnings in Solr, so could check this in. Many
> thanks to Dawid for helping me with the Gradle parts. The goal now is to
> not add any _more_ SuppressWarnings if at all possible. I hope we can start
> taking the suppressions out when we’re working on code, so when working on
> code please consider removing some of them.
> >
> > I was hoping that we could also fail ant builds, but there are some
> tricky dependencies in third party code that weren’t easy to resolve in the
> ant world due to licensing issues, if you’re interested in details, see the
> JIRA or ping me on Slack. One consequence of this is that 8x will NOT fail
> on warnings, neither will Ant builds on 9x. If someone wants to try working
> that out, please feel free but I’m just really tired of banging my head
> against that wall.
> >
> > So please, Please, PLEASE start compiling 9x with Gradle or cover your
> ears to keep from hearing me complain. And I’ve been taking lessons from my
> 3 1/2 year old grandson on doing that LOUDLY.
> >
> > About SuppressWarnings. There were so many of them that there was no
> hope of actually fixing the underlying causes in one go. I’ve enhanced the
> BadApples report to start reporting on the number of SuppressWarnings in
> each file week to week when they increase or decrease. I’ll be nudging
> people if the number of SuppressWarnings starts going up, starting Monday.
> I can’t help but think understanding generics will be improved by working
> through new warnings.
> >
> > A couple of side notes for IntelliJ users (IDK about other IDEs, but I’d
> be surprised if there weren’t similar capabilities):
> >
> > - When you just open the project, Gradle is automatically configured.
> There’s no need to execute the “gradlew idea” task.
> >
> > - You can execute tasks in IntelliJ _really easily_ by clicking on them
> in the gradle window, it’s on the extreme right. It seems much more robust
> than trying the same thing in Ant.
> >
> > -- The “assemble” task will bring up a convenient window showing errors
> (including warnings) that you can click on and get right to the offending
> code. “classes” and “testClasses” are also very useful tasks to execute in
> this context.
> >
> > - The “inspections” in IntelliJ point out a lot of things, but not
> anything with SuppressWarnings. It may be worth coming to consensus on
> which inspections are worth enabling. And perhaps distributing a
> configuration. For instance, do we really care for inspections reporting
> “blah could be final”? They’re highlighted in yellow in my setup, and I’ve
> done nothing special. Spend some time looking at those when you’re working
> on code… the number of “method may return null” inspections is scary. Have
> we’ve ever had the released code generate an NPE or anything like that
> .
> >
> > - Please do NOT suppress the _inspections_ in IntelliJ. One of the
> choices IntelliJ offers is to suppress an inspection, and it adds a
> “suppressInspection” comment to the source code specific to IntelliJ. This
> is different than Javas’s SuppressWarnings, and we shouldn’t include
> comments in the code specific to a particular IDE.
> >
> > - The motivation here is that we need all the help from the compiler we
> can get when it comes to as large and complex a code base as Lucene/Solr.
> Yes, it feels constraining. Yes, it means we won’t feel as productive
> because we have to take time to address things we’ve been ignoring. The
> leap of faith is that if we spend a bit of time up front, we can avoid
> having to spend a lot _more_ time fixing errors later in the release cycle.
> The time it takes to fix a problem goes up exponentially the farther down
> the cycle it’s caught. Fixing something when developing may take T minutes.
> Some time later when test start failing, it takes T*X. And when you
> consider community-wide implications of releasing code, getting feedback
> from the field, filing a JIRA, trying to reproduce the problem, checking
> the code, and pushing a fix, the cost of fixing something after it’s
> released goes up enormously. I’m not saying that addressing all the
> complaints something like IntelliJ’s inspections show will magically make
> it unnecessary to make point releases, but avoiding just a few is a win.
> 
> >
> > Erick
> > -
> > To unsubscribe, e-mail: 

Re: Welcome Ilan Ginzburg as Lucene/Solr committer

2020-06-22 Thread Ilan Ginzburg
Thank you, merci, תודה for the trust and the welcome, Noble and everybody!

I’m based in France near Grenoble, a flat city and high tech hub surrounded by
mountains.

For the past 7 years I’ve been working on Search at Salesforce. Currently
focusing on SolrCloud scaling.
I also work on making Solr nodes stateless, separating compute from storage
to better fit Public Cloud environments (see
Activate '18 <https://youtu.be/UeTFpNeJ1Fo>, Activate '19
<https://youtu.be/6fE5KvOfb6A>, SOLR-13101
<https://github.com/apache/lucene-solr/tree/jira/SOLR-13101> WIP).

Past employers include EMC/Documentum, HP Labs Palo Alto, Intel. Earlier
still I created the Apple 2 game Saracen (old timers here might remember?).

Other activities include a lot of paragliding, cycling, hiking, drumming (
here <https://youtu.be/-tZnpr3_VXU> during COVID) and a few stints working
as a photographer.

I hold a MA in business administration and a PhD in computer science
(parallel computing).

I’m extremely happy to join the Lucene/Solr community, the future looks
exciting!

Ilan

On Mon 22 Jun 2020 at 15:22, Joel Bernstein  wrote:

> Welcome Ilan!
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Mon, Jun 22, 2020 at 9:11 AM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Welcome Ilan!
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Sun, Jun 21, 2020 at 5:44 AM Noble Paul  wrote:
>>
>>> Hi all,
>>>
>>> Please join me in welcoming Ilan Ginzburg as the latest Lucene/Solr
>>> committer.
>>> Ilan, it's tradition for you to introduce yourself with a brief bio.
>>>
>>> Congratulations and Welcome!
>>> Noble
>>>
>>


SolrCloud Autoscaling implementation docs?

2020-06-18 Thread Ilan Ginzburg
Are there any docs, notes or Jiras with actual discussions of Autoscaling
internal implementation classes such as Suggestion, Suggester and
subclasses, Violation, Variable and its implementations, Clause,
ComputedType and the like?

I did get how Policy.Session works, now I need to understand how
Autoscaling makes placement (or movement) decisions.

Thanks,
Ilan


Re: [VOTE] Lucene logo contest

2020-06-16 Thread Ilan Ginzburg
A is cleaner and more modern but C is a lot friendlier and "warmer" (and
less pretentious).
Depending on what the logo is expected to convey, A or C.

On Tue, Jun 16, 2020 at 3:12 PM Dawid Weiss  wrote:

> A is nice and modern... but I still like the current logo better, so
> for me it's "C".
>
> Dawid
>
> On Tue, Jun 16, 2020 at 12:08 AM Ryan Ernst  wrote:
> >
> > Dear Lucene and Solr developers!
> >
> > In February a contest was started to design a new logo for Lucene [1].
> That contest concluded, and I am now (admittedly a little late!) calling a
> vote.
> >
> > The entries are labeled as follows:
> >
> > A. Submitted by Dustin Haver [2]
> >
> > B. Submitted by Stamatis Zampetakis [3] Note that this has several
> variants. Within the linked entry there are 7 patterns and 7 color
> palettes. Any vote for B should contain the pattern number, like B1 or B3.
> If a B variant wins, we will have a followup vote on the color palette.
> >
> > C. The current Lucene logo [4]
> >
> > Please vote for one of the three (or nine depending on your
> perspective!) above choices. Note that anyone in the Lucene+Solr community
> is invited to express their opinion, though only Lucene+Solr PMC cast
> binding votes (indicate non-binding votes in your reply, please). This vote
> will close one week from today, Mon, June 22, 2020.
> >
> > Thanks!
> >
> > [1] https://issues.apache.org/jira/browse/LUCENE-9221
> > [2]
> https://issues.apache.org/jira/secure/attachment/12999548/Screen%20Shot%202020-04-10%20at%208.29.32%20AM.png
> > [3]
> https://issues.apache.org/jira/secure/attachment/12997768/zabetak-1-7.pdf
> > [4]
> https://lucene.apache.org/theme/images/lucene/lucene_logo_green_300.png
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: StringBuffer usage

2020-06-06 Thread Ilan Ginzburg
From
https://stackoverflow.com/questions/355089/difference-between-stringbuilder-and-stringbuffer

“StringBuffer is synchronized, StringBuilder is not.”

So if threading and Java memory model requirements are managed already by
the code, StringBuilder will be more efficient. Otherwise StringBuffer is
the right choice.
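
A tiny illustration of the practical difference:

public class BuilderVsBuffer {
  public static void main(String[] args) {
    // StringBuilder: no synchronization, the right default when the instance
    // never leaves the current thread (the overwhelmingly common case).
    StringBuilder sb = new StringBuilder();
    sb.append("event=").append(42);

    // StringBuffer: every call synchronizes on the instance, which only pays
    // off when several threads genuinely append to the same object.
    StringBuffer buf = new StringBuffer();
    buf.append("event=").append(42);

    System.out.println(sb + " | " + buf);
  }
}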

Ilan

On Sat 6 Jun 2020 at 22:48, Erick Erickson  wrote:

> When is there a good reason to use StringBuffer rather than StringBuilder?
> While going through some of the warnings I happened to run across a few of
> these. I haven’t changed them, and at least one (AuditEvent) has a comment
> about back-compat and is quite recent…
>
> Otherwise I thought StringBuilder was preferred. Assuming the one in
> AuditEvent is required, can/should we exclude only that one (and any other
> legitimate ones) from ForbiddenAPI check and add StringBuffer to the check?
>
> Worth a JIRA? Or has this been discussed before and we just live with it?
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: Why MultiThreadedOCPTest (sometimes) fails

2020-05-30 Thread Ilan Ginzburg
Erick,

Serializing isn't really an option here, we want to test that execution
order is *not* submission order...

I believe if we wanted to verify the property that this test seems to
assert: "when there are more tasks than the number of executor threads and
all are blocked on a single lock, then a task enqueued afterwards and
requiring another available lock will be run right away", then maybe a unit
test of OverseerTaskProcessor/OverseerCollectionConfigSetProcessor would be
appropriate. But doing so would require refactoring: the run() method of
OverseerTaskProcessor is 200 lines.

I'll likely go with an "overkill" option. It will still not be free from
timing issues, but hopefully less vulnerable. And I'll try to identify when
test preconditions do not hold, in order to fail with an informative
message such as "slow task finished execution too soon, cannot test running
tasks in parallel".

Will create a new Jira.

Thanks for your week-end feedback,
Ilan

On Sat, May 30, 2020 at 11:26 PM Erick Erickson 
wrote:

> Ilan:
>
> Please raise a new JIRA and attach any fixes there, a Git PR or patch
> whichever you prefer. Your analysis will get lost in the noise for
> SOLR-12801. And thanks for digging! We usually title them something like
> “harden MultiThreadedOCPTest” or “fix failing MultiThreadedOCPTest” or
> similar.
>
> I haven’t really looked at the code you’re analyzing, it’s the weekend
> after all ;). So ignore this question if it makes no sense. Anyway, I’m
> always suspicious of using delays for just this kind of reason; depending
> on the hardware something may fail sometimes. Is there any way to make the
> sequencing more of a positive lock? Unfortunately that can be tricky, I’ve
> wound up doing things like serializing all tasks which…er…isn’t a good fix
> ;). But it sounds like your “overkill” section is along those lines?
>
> Up to you.
>
> Meanwhile, I’ve started beasting that test on my spare machine. If you’re
> curious about what that is:
> https://gist.github.com/markrmiller/dbdb792216dc98b018ad
>
> But don’t worry about that part. The point is I can run the test multiple
> times, with N running in parallel, hopefully making my machine exhibit the
> problem sometime over the night. If I can get it to fail before but not
> after your fix, it provides some reassurance that your fix is working. Not
> totally certain of course. Otherwise, we’ll just commit your fixes and see
> if Hoss’ rollups stop showing it.
>
> Thanks again!
> Erick
>
>
>
> > On May 30, 2020, at 1:42 PM, Ilan Ginzburg  wrote:
> >
> > MultiThreadedOCPTest
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Why MultiThreadedOCPTest (sometimes) fails

2020-05-30 Thread Ilan Ginzburg
Following Erick’s BadApple report, I looked at MultiThreadedOCPTest.test().
I've found a failure in testFillWorkQueue() in Jenkins logs (not able to
reproduce locally).

This test enqueues a large number of tasks (115, more than the 100
Collection API parallel executors) to the Collection API queue for a
collection COLL_A, then observes a short delay and enqueues a task for
another collection COLL_B.
It verifies that the COLL_B task (that does not require the same lock as
the COLL_A tasks) completes before the third (?) COLL_A task.

*Test failures happen for a disarmingly simple reason:* when enqueues are
slowed down enough, the first 3 tasks on COLL_A complete even before the
COLL_B task gets enqueued!

In the failed Jenkins test execution, the COLL_B task enqueue happened
1275ms after the enqueue of the first COLL_A, leaving plenty of time for a
few (and possibly all) COLL_A tasks to complete.

I suggest two changes (is adding a PR to SOLR-12801 the right way to do it?
Will somebody merge it from there?):

   - Make the “blocking” COLL_A task longer to execute (increase from 1 to
   2 seconds) to compensate for slow enqueues. Hopefully 2 seconds is
   sufficient… If it’s not, we can increase it more later.
   - Verify the COLL_B task (a 1ms task) finishes before the *first* COLL_A
   task (the long running one) and not the 3rd. This would be a better
   indication that even though the collection queue was filled with tasks
   waiting for a busy lock, a non competing task was picked and executed right
   away.

There would still be a grey area: what if the enqueue of the COLL_B task
happened before the first COLL_A task even started to execute? If we wanted
to deal with that, we could enqueue all COLL_A tasks, make the second COLL_A
task long running (and not the first), enqueue the COLL_B task once the
first COLL_A task has completed and verify the COLL_B task completes before
the second (long running) COLL_A task. I believe that's slightly overkill
(yet easier to implement than to describe, so can include that as well in
the PR if deemed useful).
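
For intuition, a toy sketch with plain JDK executors (not the actual
Collection API machinery): the property being asserted is simply that a
short, independent task submitted after the long blocking one still finishes
first, provided the long task is still running when the short one arrives:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OrderingSketch {
  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    long start = System.nanoTime();
    pool.submit(() -> { sleep(2000); print(start, "long COLL_A-style task done"); });
    sleep(100); // the short delay before the COLL_B-style enqueue
    pool.submit(() -> { sleep(1); print(start, "short COLL_B-style task done"); });
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
  }

  static void sleep(long ms) {
    try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
  }

  static void print(long start, String msg) {
    System.out.printf("%4d ms  %s%n", (System.nanoTime() - start) / 1_000_000, msg);
  }
}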

Ilan


Re: BadApple report

2020-05-25 Thread Ilan Ginzburg
Thanks, that helps. I'll try to have a look at some of the failures related
to areas I know.

Ilan

On Mon, May 25, 2020 at 7:07 PM Erick Erickson 
wrote:

> Ilan:
>
> That’s, unfortunately, not an easy question. Hoss’s rollups are here:
> http://fucit.org/solr-jenkins-reports/failure-report.html which show the
> rates, but not where they came from.
>
> Here’s an example of a failure from Jenkins, if you follow the link you
> can see the full output, (click “console output”, then “full log”):
> https://jenkins.thetaphi.de/job/Lucene-Solr-8.x-Linux/3181/. I usually
> see the individual ones go by by subscribing to “bui...@lucene.apache.org
> ”.
>
> Otherwise, what I often do is use Mark Miller’s “beasting” script to see
> if I can get it to reproduce locally and go from there:
>
> https://gist.github.com/markrmiller/dbdb792216dc98b018ad
>
> It’s all complicated by the fact that the failures are intermittent.
>
> Best,
> Erick
>
> > On May 25, 2020, at 11:22 AM, Ilan Ginzburg  wrote:
> >
> > Where are the test failure details?
> >
> > On Mon, May 25, 2020 at 4:47 PM Erick Erickson 
> wrote:
> > Here’s the summary:
> >
> > Raw fail count by week totals, most recent week first (corresponds to
> bits):
> > Week: 0  had  113 failures
> > Week: 1  had  103 failures
> > Week: 2  had  102 failures
> > Week: 3  had  343 failures
> >
> >
> > Failures in Hoss' reports for the last 4 rollups.
> >
> > There were 511 unannotated tests that failed in Hoss' rollups. Ordered
> by the date I downloaded the rollup file, newest->oldest. See above for the
> dates the files were collected
> > These tests were NOT BadApple'd or AwaitsFix'd
> >
> > Failures in the last 4 reports..
> >Report   Pct runsfails   test
> >  0123   0.7 1593 40  BasicDistributedZkTest.test
> >  0123   2.1 1518 28  MultiThreadedOCPTest.test
> >  0123   0.7 1613 14  RollingRestartTest.test
> >  0123   7.1 1635 44
> ScheduledTriggerIntegrationTest.testScheduledTrigger
> >  0123   2.4 1614 17
> SearchRateTriggerTest.testWaitForElapsed
> >  0123   0.2 1614  6
> ShardSplitTest.testSplitShardWithRuleLink
> >  0123   0.5 1577  5
> SolrCloudReportersTest.testExplicitConfiguration
> >  0123   0.7 1560 19  TestInPlaceUpdatesDistrib.test
> >  0123   1.0 1566 17  TestPackages.testPluginLoading
> >  0123   0.8 1598  7
> TestQueryingOnDownCollection.testQueryToDownCollectionShouldFailFast
> >  0123   0.7 1598  8  TestSimScenario.testAutoAddReplicas
> > 
> >
> >
> > Full report:
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: BadApple report

2020-05-25 Thread Ilan Ginzburg
Where are the test failure details?

On Mon, May 25, 2020 at 4:47 PM Erick Erickson 
wrote:

> Here’s the summary:
>
> Raw fail count by week totals, most recent week first (corresponds to
> bits):
> Week: 0  had  113 failures
> Week: 1  had  103 failures
> Week: 2  had  102 failures
> Week: 3  had  343 failures
>
>
> Failures in Hoss' reports for the last 4 rollups.
>
> There were 511 unannotated tests that failed in Hoss' rollups. Ordered by
> the date I downloaded the rollup file, newest->oldest. See above for the
> dates the files were collected
> These tests were NOT BadApple'd or AwaitsFix'd
>
> Failures in the last 4 reports..
>Report   Pct runsfails   test
>  0123   0.7 1593 40  BasicDistributedZkTest.test
>  0123   2.1 1518 28  MultiThreadedOCPTest.test
>  0123   0.7 1613 14  RollingRestartTest.test
>  0123   7.1 1635 44
> ScheduledTriggerIntegrationTest.testScheduledTrigger
>  0123   2.4 1614 17
> SearchRateTriggerTest.testWaitForElapsed
>  0123   0.2 1614  6
> ShardSplitTest.testSplitShardWithRuleLink
>  0123   0.5 1577  5
> SolrCloudReportersTest.testExplicitConfiguration
>  0123   0.7 1560 19  TestInPlaceUpdatesDistrib.test
>  0123   1.0 1566 17  TestPackages.testPluginLoading
>  0123   0.8 1598  7
> TestQueryingOnDownCollection.testQueryToDownCollectionShouldFailFast
>  0123   0.7 1598  8  TestSimScenario.testAutoAddReplicas
> 
>
>
> Full report:
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org


Re: Gradle precommit checks

2020-05-20 Thread Ilan Ginzburg
Thanks for your detailed response. I agree. Simplicity brings value.
And I agree that performance degradation of format method vs. native concat
is negligible in the warn context.

I need to look at the trace method to see the difference between the two
calls (because I do believe that ("text" + obj) is equivalent to ("text" +
obj.toString()). Is this assumption wrong?)

Will run gradle precommit locally to stop being annoyed by these details.

Ilan


On Wed, May 20, 2020 at 8:49 PM Erick Erickson  wrote:

> Ilan:
>
> Technically it may be true (although frankly I don’t know for sure), but
> practically at warn level any extra work isn’t worth the confusion.
>
>  What do I mean by that? Well, we built up a huge debt with all sorts of
> debug and info and even trace messages that either concatenated strings or
> called methods or… See LUCENE-7788 for the amount of code that had to be
> changed.
>
> So when I wrote the checker, I decided to flag all the concatenations in
> the code reasoning that for warn level messages, any extra work was so
> trivial that you couldn’t really measure it overall, and it’d be worth it
> to keep from “broken window" issues. I.e. someone sees log.warn(“whatever”
> + “whenever”) and thinks it’s OK to use that pattern for debug, trace or
> info.
>
> I don’t particularly care if warn messages are a little inefficient on the
> theory that there should be so few of them that it’s not measurable.
>
> All that said, this is absolutely a judgement call that I made trying to
> balance the technical with the practical. Given the number of people who
> contribute to the code, I think it’s worthwhile to keep certain patterns
> out of the code. Especially given how obscure logging costs are. The
> difference between 'log.trace(“message {}”, object.toString())’ and
> 'log.trace(“message {}”, object)’ for instance is unknown to a _lot_ of
> developers. Including me before I started looking at logging in general ;)
>
> Best,
> Erick
>
> > On May 20, 2020, at 1:05 PM, Ilan Ginzburg  wrote:
> >
> > This might have been discussed previously but since I'm seeing this
> behavior...
> >
> > Gradle precommit check does not allow code such as:
> > log.warn("Only in tree one: " + t1);
> >
> > And forces changing it into:
> > log.warn("Only in tree one: {}", t1);
> >
> > I do understand such constraints for debug level logs to not pay the
> concatenation cost if the log is not output (unless the log is inside an if
> block testing debug level), but for logs generally turned on (which I
> assume warn logs are), this is counter productive: the second form is less
> efficient during execution than the first one.
> >
> > Ilan
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Gradle precommit checks

2020-05-20 Thread Ilan Ginzburg
This might have been discussed previously but since I'm seeing this
behavior...

Gradle precommit check does not allow code such as:
log.warn("Only in tree one: " + t1);

And forces changing it into:
log.warn("Only in tree one: {}", t1);

I do understand such constraints for debug level logs to not pay the
concatenation cost if the log is not output (unless the log is inside an if
block testing debug level), but for logs generally turned on (which I
assume warn logs are), this is counterproductive: the second form is less
efficient during execution than the first one.
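
Concretely, the forms in question (slf4j, which is what Solr uses); the last
variant is the usual guard when building the message itself is expensive:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingForms {
  private static final Logger log = LoggerFactory.getLogger(LoggingForms.class);

  void example(Object t1) {
    // Flagged by the check: the String is built even when WARN is filtered out.
    // log.warn("Only in tree one: " + t1);

    // Accepted form: the message is only assembled, and t1.toString() only
    // invoked, when WARN is actually enabled.
    log.warn("Only in tree one: {}", t1);

    // For expensive arguments at finer levels, an explicit guard skips the
    // work entirely when the level is off.
    if (log.isDebugEnabled()) {
      log.debug("Only in tree one: {}", t1);
    }
  }
}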

Ilan


Re: Question on changes to /admin/zookeeper handler

2020-05-17 Thread Ilan Ginzburg
Answering to myself: this is used from the Admin UI - Cloud - Graph page.
Therefore need to stay consistent (including the "Json within Json"
format for collection data), and reflect any URL changes in
services.js.

On Sun, May 17, 2020 at 4:30 PM Ilan Ginzburg  wrote:
>
> I'm in the process of removing everything /clusterstate.json from Solr
> 9.0 (https://issues.apache.org/jira/browse/SOLR-12823).
>
> There's a choice to be made regarding /admin/zookeeper endpoint
> (ZookeeperInfoHandler).
>
> Looking at http://localhost:8985/solr/admin/zookeeper?path=/clusterstate.json
> (with or without =true) there's not much info in this call when
> all collections are stateFormat=2.
>
> But 
> http://localhost:8985/solr/admin/zookeeper?path=/clusterstate.json=true=graph
> is a way to get a (paginated?) list of all collections (including
> stateFormat=2 collections).
>
> With /clusterstate.json gone, shall we continue to support the same
> URL (the last one) to list all collections (obviously minus znode
> specific info about  /clusterstate.json) or maybe make things a bit
> more explicit, for example
> http://localhost:8985/solr/admin/zookeeper/listcollections to provide
> that output?
> I prefer the second approach, cleaner than pretending that
> /clusterstate.json still exists.
>
> Other options I've missed here?
>
> Thanks,
> Ilan

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Question on changes to /admin/zookeeper handler

2020-05-17 Thread Ilan Ginzburg
I'm in the process of removing everything /clusterstate.json from Solr
9.0 (https://issues.apache.org/jira/browse/SOLR-12823).

There's a choice to be made regarding /admin/zookeeper endpoint
(ZookeeperInfoHandler).

Looking at http://localhost:8985/solr/admin/zookeeper?path=/clusterstate.json
(with or without =true) there's not much info in this call when
all collections are stateFormat=2.

But 
http://localhost:8985/solr/admin/zookeeper?path=/clusterstate.json=true=graph
is a way to get a (paginated?) list of all collections (including
stateFormat=2 collections).

With /clusterstate.json gone, shall we continue to support the same
URL (the last one) to list all collections (obviously minus znode
specific info about  /clusterstate.json) or maybe make things a bit
more explicit, for example
http://localhost:8985/solr/admin/zookeeper/listcollections to provide
that output?
I prefer the second approach, cleaner than pretending that
/clusterstate.json still exists.

Other options I've missed here?

Thanks,
Ilan

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [jira] [Commented] (SOLR-12823) remove clusterstate.json in Lucene/Solr 8.0

2020-05-12 Thread Ilan Ginzburg
I could work on a PR to remove it in Solr 9, unless you think it's
super tricky work.

Ilan

On Tue, May 12, 2020 at 3:32 PM Erick Erickson  wrote:
>
> Definitely +1, but I won’t have the bandwidth to help.
>
> > On May 12, 2020, at 9:03 AM, Jan Høydahl (Jira)  wrote:
> >
> >
> >[ 
> > https://issues.apache.org/jira/browse/SOLR-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105402#comment-17105402
> >  ]
> >
> > Jan Høydahl commented on SOLR-12823:
> > 
> >
> > Any plan to remove clusterstate.json for Solr 9? And perhaps let Overseer 
> > fail if it is non-empty, i.e. if not all collections are migrated to format 
> > 2?
> >
> >> remove clusterstate.json in Lucene/Solr 8.0
> >> ---
> >>
> >>Key: SOLR-12823
> >>URL: https://issues.apache.org/jira/browse/SOLR-12823
> >>Project: Solr
> >> Issue Type: Task
> >>   Reporter: Varun Thacker
> >>   Priority: Major
> >>
> >> clusterstate.json is an artifact of a pre 5.0 Solr release. We should 
> >> remove that in 8.0
> >> It stays empty unless you explicitly ask to create the collection with the 
> >> old "stateFormat" and there is no reason for one to create a collection 
> >> with the old stateFormat.
> >> We should also remove the "stateFormat" argument in create collection
> >> We should also remove MIGRATESTATEVERSION as well
> >>
> >>
> >
> >
> >
> > --
> > This message was sent by Atlassian Jira
> > (v8.3.4#803005)
> >
> > -
> > To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: issues-h...@lucene.apache.org
> >
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Overseer documentation

2020-04-24 Thread Ilan Ginzburg
Thanks David, Mike and Shalin. Glad you find this useful!

Mike, do you mind pointing out here or in the doc which of the bugs
you saw in production? That way I can start filing Jiras for actual
issues.

Ilan

On Fri, Apr 24, 2020 at 4:01 AM Shalin Shekhar Mangar
 wrote:
>
> This is good stuff Ilan. Thank you for writing and sharing with us. I intend 
> to take a deeper look at this next week.
>
> On Wed, Apr 22, 2020 at 2:36 AM Ilan Ginzburg  wrote:
>>
>> Hello Solr devs,
>>
>> This is my first post here. I work at Salesforce in France, we're
>> adopting SolrCloud and we need it to scale more than it currently
>> does.
>>
>> I've looked at Overseer and documented my understanding. I'm sharing
>> the result, it might help others and is a way to get feedback (I might
>> have misunderstood some things) and/or collaboration on continuing
>> documenting the implementation. Basically I started writing the doc I
>> wanted to find.
>>
>> In the process, I believe I've identified what may be a few bugs
>> (there's a section listing them at the beginning). I've found these by
>> reading code (not running code), so take with a grain of salt.
>> I plan to file Jiras for those bugs that do seem real and are
>> important enough, and then also start working on some to help
>> fix/improve.
>>
>> https://docs.google.com/document/d/1KTHq3noZBVUQ7QNuBGEhujZ_duwTVpAsvN3Nz5anQUY/
>>
>> This is WIP. Please do not hesitate to provide feedback/leave comments.
>>
>> Thanks,
>> Ilan
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: using S3 as the Directory for Solr

2020-04-24 Thread Ilan Ginzburg
Hi Rahul,

I don't have a direct answer to your question as I don't know of any
S3 based Directory implementation. Such an implementation would likely
be more complex than an HDFS one.

Reason is S3 has eventual consistency. When an S3 file is updated you
might still read the old content for a while, and similarly a
"directory listing" in S3 might not immediately show recently added
files. This basically requires storing metadata elsewhere (see for
example https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-fs.html
and 
https://blog.cloudera.com/introducing-s3guard-s3-consistency-for-apache-hadoop/).

At Salesforce we are working on storing SolrCloud search indexes on
S3/GCS in a way that allows these indexes to be shared between
replicas. We use Zookeeper to deal with S3's eventual consistency.
See Activate presentation last year
https://www.youtube.com/watch?v=6fE5KvOfb6A, and code at
https://github.com/apache/lucene-solr/tree/jira/SOLR-13101 (SHARED
replica type is a good entry point for looking at the changes).

In a nutshell, we use nodes' local disks as a cache that can be lost
when a node fails, and read/write segments from/to S3 as needed.
Queries are then always served from local disk and indexing always
happens locally then pushed to S3.

Could a solution to your use case be built instead using S3 based
backup/restore? Would require the right data partitioning (to get
reasonable restore then query latency for cold data and reasonable
backup time for modified indexes) and likely a friendly indexing
pipeline that can resubmit data indexed since last backup...

Ilan

On Fri, Apr 24, 2020 at 5:06 AM dhurandar S  wrote:
>
> Hi Jan,
>
> Thank you for your reply. The reason we are looking for S3 is since the 
> volume is close to 10 Petabytes.
> We are okay to have higher latency of say twice or thrice that of placing 
> data on the local disk. But we have a requirement to have long-range data and 
> providing Seach capability on that.  Every other storage apart from S3 turned 
> out to be very expensive at that scale.
>
> Basically I want to replace
>
> -Dsolr.directoryFactory=HdfsDirectoryFactory \
>
>  with S3 based implementation.
>
>
> regards,
> Rahul
>
>
>
>
>
> On Thu, Apr 23, 2020 at 3:12 AM Jan Høydahl  wrote:
>>
>> Hi,
>>
>> Is your data so partitioned that it makes sense to consider splitting up
>> in multiple collections and make some arrangement that will keep only
>> a few collections live at a time, loading index files from S3 on demand?
>>
>> I cannot see how an S3 directory would be able to effectively cache files
>> in S3 and what units the index files would be stored as?
>>
>> Have you investigated EFS as an alternative? That would look like a
>> normal filesystem to Solr but might be cheaper storage wise, but much
>> slower.
>>
>> Jan
>>
>> > 23. apr. 2020 kl. 06:57 skrev dhurandar S :
>> >
>> > Hi,
>> >
>> > I am looking to use S3 as the place to store indexes. Just how Solr uses
>> > HdfsDirectory to store the index and all the other documents.
>> >
>> > We want to provide a search capability that is okay to be a little slow but
>> > cheaper in terms of the cost. We have close to 2 petabytes of data on which
>> > we want to provide the Search using Solr.
>> >
>> > Are there any open-source implementations around using S3 as the Directory
>> > for Solr ??
>> >
>> > Any recommendations on this approach?
>> >
>> > regards,
>> > Rahul
>>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Overseer documentation

2020-04-21 Thread Ilan Ginzburg
Hello Solr devs,

This is my first post here. I work at Salesforce in France, we're
adopting SolrCloud and we need it to scale more than it currently
does.

I've looked at Overseer and documented my understanding. I'm sharing
the result, it might help others and is a way to get feedback (I might
have misunderstood some things) and/or collaboration on continuing
documenting the implementation. Basically I started writing the doc I
wanted to find.

In the process, I believe I've identified what may be a few bugs
(there's a section listing them at the beginning). I've found these by
reading code (not running code), so take with a grain of salt.
I plan to file Jiras for those bugs that do seem real and are
important enough, and then also start working on some to help
fix/improve.

https://docs.google.com/document/d/1KTHq3noZBVUQ7QNuBGEhujZ_duwTVpAsvN3Nz5anQUY/

This is WIP. Please do not hesitate to provide feedback/leave comments.

Thanks,
Ilan

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org