from:"Alexandre Rafalovitch"

Re: Announcing githubsearch!

2024-02-19 Thread Alexandre Rafalovitch

The next step towards full meta would be writing a Lucene introduction
book, using code examples from this project and the dataset it
uses.

Regards,
Alex

On Mon, Feb 19, 2024, 11:40 a.m. Michael McCandless <
luc...@mikemccandless.com> wrote:

> Hi Team,
>
> ~1.5 years ago (August 2022) we migrated our Lucene issue tracking from
> Jira to GitHub. Thank you Tomoko for all the hard work doing such a
> complex, multi-phased, high-fidelity migration!
>
> I finally finished also migrating jirasearch to GitHub:
> githubsearch.mikemccandless.com. It was tricky because GitHub issues/PRs
> are fundamentally more complex than Jira's data model, and the GitHub REST
> API is also quite rich / heavily normalized. All of the source code for
> githubsearch lives here.
> The UI remains its barebones self ;)
>
> Githubsearch is dog food for us: it showcases Lucene (currently 9.8.0),
> and many of its fun features like infix autosuggest, block join queries
> (each comment is a sub-document on the issue/PR), DrillSideways faceting,
> near-real-time indexing/searching, synonyms (try “oome”), expressions,
> non-relevance and blended-relevance sort, etc.  (This old blog post goes
> into detail.)  Plus, it’s meta-fun to use Lucene to search its own issues,
> to help us be more productive in improving Lucene!  Nicely recursive.
>
> In addition to good ol’ searching by text, githubsearch has some new/fun
> features:
>
>- Drill down to just PRs or issues
>- Filter by “review requested” for a given user: poor Adrien has 8
>(open) now (sorry)! Or see your mentions (Robert is mentioned in 27 open
>issues/PRs). Or PRs that you reviewed (Uwe has reviewed 9 still-open
>PRs). Or issues and PRs where a user has had any involvement at all
>(Dawid has interacted on 197 issues/PRs).
>- Find still-open PRs that were created by a New Contributor (an author
>who has no changes merged into our repository) or Contributor
>(non-committer who has had some changes merged into our repository) or
>Member
>- Here are the uber-stale (last touched more than a month ago) open
>PRs by outside contributors. We should ideally keep this at 0, but it’s
>83 now!
>- “Link to this search” to get a short-er, more permanent URL (it is
>NOT a URL shortener, though!)
>- Save named searches you frequently run (they just save to local
>cookie state on that one browser)
>
> I’m sure there are exciting bugs, feedback/patches welcome!  If you see
> problems, please reply to this email or file an issue here.
>
> Note that jirasearch remains running, to search Solr, Tika and Infra
> issues.
>
> Happy Searching,
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>

Re: Soften Jira's note when opening new issues?

2021-09-21 Thread Alexandre Rafalovitch

Looks great to me.

Thank you for listening.
   Alex

On Tue, 21 Sept 2021 at 03:23, Adrien Grand  wrote:
>
> I think you made a good point, Alexandre. Would something like this read 
> better:
>
> ```
> This project has a user mailing list and an IRC channel for support. If you 
> are looking for support, or if you are not sure whether the behavior that you 
> are observing is expected or not, please discuss it there first.
> ```
>
> On Mon, Sep 20, 2021 at 2:22 PM Alexandre Rafalovitch  
> wrote:
>>
>> +1.
>> Ideally, the final version could still be broken into several shorter
>> sentences, so that readers do not need to be programmers to parse the
>> deeply nested, if totally logical, structure.
>>
>> On Mon., Sep. 20, 2021, 4:33 a.m. Adrien Grand,  wrote:
>>>
>>> Hello,
>>>
>>> Jira gives the following note when opening an issue:
>>>
>>> ```
>>> This project has a user mailing list and an IRC channel for support. Please 
>>> ensure that you have discussed your problem using one of those resources 
>>> BEFORE creating this ticket.
>>> ```
>>>
>>> This can be quite intimidating for someone who has never worked with us 
>>> before, and we don't apply this logic to ourselves; for instance, I feel 
>>> free to open JIRAs without discussing them first on IRC or dev@l.a.o. Given 
>>> that we are not seeing much irrelevant traffic on JIRA, I'd like to soften 
>>> the message to something like below:
>>>
>>> ```
>>> If you are looking for support, or if you are not sure whether the behavior 
>>> that you are observing is expected or not, please discuss your problem on 
>>> the user mailing-list instead before creating a ticket.
>>> ```
>>>
>>> What do you think?
>>>
>>> --
>>> Adrien
>
>
>
> --
> Adrien

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Soften Jira's note when opening new issues?

2021-09-20 Thread Alexandre Rafalovitch

+1.
Ideally, the final version could still be broken into several shorter
sentences, so that readers do not need to be programmers to parse the
deeply nested, if totally logical, structure.

On Mon., Sep. 20, 2021, 4:33 a.m. Adrien Grand,  wrote:

> Hello,
>
> Jira gives the following note when opening an issue:
>
> ```
> This project has a user mailing list and an IRC channel for support.
> Please ensure that you have discussed your problem using one of those
> resources BEFORE creating this ticket.
> ```
>
> This can be quite intimidating for someone who has never worked with us
> before, and we don't apply this logic to ourselves; for instance, I feel
> free to open JIRAs without discussing them first on IRC or dev@l.a.o.
> Given that we are not seeing much irrelevant traffic on JIRA, I'd like to
> soften the message to something like below:
>
> ```
> If you are looking for support, or if you are not sure whether the
> behavior that you are observing is expected or not, please discuss your
> problem on the user mailing-list instead before creating a ticket.
> ```
>
> What do you think?
>
> --
> Adrien
>

Re: Welcome Greg Miller as Lucene committer

2021-06-02 Thread Alexandre Rafalovitch

Welcome Greg,

Great to have you.

Regards,
   Alex.

On Sun, 30 May 2021 at 10:35, Greg Miller  wrote:
>
> Thanks everyone! I'm honored to have been nominated and look forward
> to continuing to work with all of you on Lucene! I'm incredibly
> grateful for everyone that has helped me so far. There's a lot to
> learn in Lucene and this community has been a fantastic help ramping
> up, providing thorough PR feedback/ideas/etc. and simply been a great
> group of people to collaborate with.
>
> As far as a brief bio goes, I live in the Seattle area and work for
> Amazon's "Product Search" team, which I joined in January of this
> year. I'm a naturally curious person and find myself fascinated by
> data structure / algorithm problems, so diving into Lucene has been
> really fun! I'm also an avid runner (mostly marathons but right now
> I'm training for my first one-mile race on a track), and love to
> travel with my wife and daughter (although that's been on "pause" for
> obvious reasons for the past year+). My biggest accomplishment of 2021
> so far has been teaching my daughter to ride a bike, but being
> nominated as a Lucene committer is a close second :)
>
> Thanks again everyone and looking forward to continuing to work with all of 
> you!
>
> Cheers,
> -Greg
>
> On Sat, May 29, 2021 at 7:59 PM Michael McCandless
>  wrote:
> >
> > Welcome Greg!
> >
> > Mike
> >
> > On Sat, May 29, 2021 at 3:47 PM Adrien Grand  wrote:
> >>
> >> I'm pleased to announce that Greg Miller has accepted the PMC's invitation 
> >> to become a committer.
> >>
> >> Greg, the tradition is that new committers introduce themselves with a 
> >> brief bio.
> >>
> >> Congratulations and welcome!
> >>
> >>
> >> --
> >> Adrien
> >
> > --
> > Mike McCandless
> >
> > http://blog.mikemccandless.com

Re: Who has access to Google Analytics for Lucene site?

2021-03-19 Thread Alexandre Rafalovitch

We have already discussed and agreed on removing it when the time
comes. This discussion is specifically about trying to regain access
to the analytics that we have already collected, at a higher granularity
than the current Infra stats seem to provide.

Even if we took analytics out today, throwing away years of potential
insights seems wasteful. I have reviewed the linked discussion, and it
does not seem to present any additional arguments that contradict the
current position.

Regards,
   Alex.

On Fri, 19 Mar 2021 at 00:12, Justin Mclean  wrote:
>
> Hi,
>
> As discussed above, it's probably best not to use Google Analytics, as the
> ASF currently discourages its use. Please see:
> https://issues.apache.org/jira/browse/LEGAL-470
>
> Privacy is likely to ask projects to remove it in the near future.
>
> Infra should be able to get you download stats if you need those.
>
> Thanks,
> Justin

Re: Who has access to Google Analytics for Lucene site?

2021-03-03 Thread Alexandre Rafalovitch

What is missing is the disaggregated statistics (referrers to specific
pages, etc.). And possibly a lot more: I just pulled a couple of
examples off the top of my head. I am not a GA specialist; it is just
one of many things I do in my overall job. The specific metrics
available will actually depend on the version of the GA tracker being
run, on the options enabled in GA Admin, etc.

And if people don't see the value in having more detailed statistics,
I will not waste my time on doing it. I have no commercial interest
riding on the decision.

My understanding was that the analytics was in place but nobody was
volunteering to leverage it, so we were paying the "information
leakage tax" without getting anything out of it. I've offered to solve
the "nobody volunteered" part so that we can, at least, have a fully
informed discussion.

This conversation feels like it is veering towards a formal vote on the
"information leakage tax". If that's actually what we want to do, I am
+0 on keeping it for at least 3 months for Lucene and +1 for keeping it
for Solr, with a review at the end of that period.

Regards,
   Alex.

On Wed, 3 Mar 2021 at 09:41, Robert Muir  wrote:
>
> I'm not trying to come across as anti-analytics, i'm not. But I feel a lot of 
> those questions can be answered by the aggregate stats already provided by 
> apache (presumably from httpd access_log), without adding 
> privacy-invading-google-tracker javascripts and cookies. So, while your 
> answers are good, they don't justify google analytics in my eyes.
>
> As an example, let's look at
> https://uls.apache.org/exports/lucene.apache.org.yaml and consider your list
> 1. You can see breakdown of pageviews and "visitors" by day. I don't know how 
> they determine unique "visitor" since it isn't cookie tracking: maybe some 
> combo of (IP address, TLS session ID, user agent), but whatever they have is 
> good enough for me.
> 2. I can see most popular pages and your 6.6 ref guide stuff
> 3. Top referrers gives you a rough idea of where people are coming from 
> (including internal referrers). So people are clicking links on those pages.
> 4. see #1.
> 5. see #3. Google provides no additional magic here, this is referer (sic) 
> header either way.
> 6. i think the download process is actually hacked up/convoluted just to 
> force some GA tracking. At least i know if i disable javascript, the download 
> buttons still work.
> 7. what is missing?
>
>
> On Wed, Mar 3, 2021 at 9:15 AM Alexandre Rafalovitch  
> wrote:
>>
>> I block any analytics I can find. I am with you on the overall positioning. 
>> And yes, the absolute numbers lie.
>>
>> At the same time, we can get a lot of relative numbers and trends that are 
>> valuable in other ways.
>>
>> For example:
>> 1) Do the social media announcements of new releases drive people to
>> download Solr?
>> 2) Which Ref Guide pages (if we had GA there) are most popular, and why can't
>> we convince users to use the latest version instead of 6.6 (looking at
>> referrals)? My specific peeve is that I think the URPs page should be a lot
>> more visible; I would love to test my assumptions by seeing whether people
>> discover that page, relative to other pages.
>> 3) What is the page flow on the website? Are there any pages that are
>> completely invisible because of how we linked to them? Are there super
>> popular pages that are completely out of date?
>> 4) Do we see increases or decreases in traffic matching specific events?
>> 5) Is there a specific partner/agency site that is driving a lot of
>> attention to Solr; can we replicate that with others?
>> 6) Do we even count downloads in GA? GA tracks only HTML pages by
>> default.
>> 7) If any of this is valuable but we want to pull out GA anyway, this would
>> help us know what tracking information to request from Apache Infra.
>>
>> In general, these kinds of questions are the domain of a Developer
>> Relations role. The Lucene/Solr project does not have one as such, which
>> may explain why not many people understand the value of modern analytics
>> solutions. I am offering my time to make the value of analytics concrete,
>> so that we make the next decision based on reality rather than on our
>> collective imagination of what analytics actually does or does not do.
>>
>> Regards,
>>Alex.
>>
>>
>>
>>
>> On Wed., Mar. 3, 2021, 8:40 a.m. Robert Muir,  wrote:
>>>
>>>
>>>
>>> On Wed, Mar 3, 2021 at 8:35 AM Michael Sokolov  wrote:
>>>>
>>>> Before you look, should we have a betting pool on the number of
>>>> downloads/day

Re: Who has access to Google Analytics for Lucene site?

2021-03-03 Thread Alexandre Rafalovitch

I block any analytics I can find. I am with you on the overall positioning.
And yes, the absolute numbers lie.

At the same time, we can get a lot of relative numbers and trends that are
valuable in other ways.

For example:
1) Do the social media announcements of new releases drive people to
download Solr?
2) Which Ref Guide pages (if we had GA there) are most popular, and why
can't we convince users to use the latest version instead of 6.6 (looking
at referrals)? My specific peeve is that I think the URPs page should be a
lot more visible; I would love to test my assumptions by seeing whether
people discover that page, relative to other pages.
3) What is the page flow on the website? Are there any pages that are
completely invisible because of how we linked to them? Are there super
popular pages that are completely out of date?
4) Do we see increases or decreases in traffic matching specific events?
5) Is there a specific partner/agency site that is driving a lot of
attention to Solr; can we replicate that with others?
6) Do we even count downloads in GA? GA tracks only HTML pages by
default.
7) If any of this is valuable but we want to pull out GA anyway, this
would help us know what tracking information to request from Apache
Infra.

In general, these kinds of questions are the domain of a Developer
Relations role. The Lucene/Solr project does not have one as such, which
may explain why not many people understand the value of modern analytics
solutions. I am offering my time to make the value of analytics concrete,
so that we make the next decision based on reality rather than on our
collective imagination of what analytics actually does or does not do.

Regards,
   Alex.

On Wed., Mar. 3, 2021, 8:40 a.m. Robert Muir,  wrote:

>
>
> On Wed, Mar 3, 2021 at 8:35 AM Michael Sokolov  wrote:
>
>> Before you look, should we have a betting pool on the number of
>> downloads/day? I will arrange for a bottle of some excellent liquid to
>> be sent to the closest guess at the number of redirects to the mirror
>> sites, as determined by Alexandre. Also, has it been increasing over
>> the last year? Finally, if we can predict these trends using activity
>> on the main apache site, maybe we don't need to track independently.
>>
>
> Why do we even care?
>
> How many users are downloading lucene tgz from the site versus using an
> artifact in maven repositories (via maven, gradle, etc)? How many users are
> downloading solr tgz from the site versus using solr official image from
> docker hub?
>
> I'm just asking these questions to try to understand the need for the
> google tracking.
>
>

Re: Who has access to Google Analytics for Lucene site?

2021-03-03 Thread Alexandre Rafalovitch

I am offering to look at the numbers, if I can get access.

We can do that for a couple of months and then take it out.

I am not sure whether I have negated Rob's position here in terms of full
propositional logic, though, as he used 'or'.

I do agree that there is no point in analytics that is not used. So, let's
use it and get a clear picture of its value.

Regards,
Alex

On Wed., Mar. 3, 2021, 7:46 a.m. Ishan Chattopadhyaya, <
ichattopadhy...@gmail.com> wrote:

> +1 Rob
>
> On Wed, 3 Mar, 2021, 5:55 pm Uwe Schindler,  wrote:
>
>> Hi,
>>
>> sorry, I just noticed that the account disappeared from my google
>> analytics profile.
>>
>> It was set up by Grant Ingersoll; maybe he can give us access again. If it
>> is no longer there, we lost the data, but we can recreate one.
>>
>> Uwe
>>
>> -
>> Uwe Schindler
>> Achterdiek 19, D-28357 Bremen
>> https://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>> *From:* Uwe Schindler 
>> *Sent:* Wednesday, March 3, 2021 1:10 PM
>> *To:* dev@lucene.apache.org
>> *Subject:* Re: Who has access to Google Analytics for Lucene site?
>>
>> Hi,
>> I have access.
>> Uwe
>>
>> On March 3, 2021 8:10:29 AM UTC, "Jan Høydahl" <
>> jan@cominvent.com> wrote:
>>
>> Hi,
>>
>> Who has access to the Lucene site GA account? If it is dead in the water,
>> I'd like to set up a new one for Lucene as well.
>>
>> I plan to publish the new web sites today; it would be nice to track and
>> graph the traffic ramp-up.
>>
>> Jan
>>
>> --
>> Uwe Schindler
>> Achterdiek 19, 28357 Bremen
>> https://www.thetaphi.de
>>
>

Re: Review request - New Solr website

2021-03-02 Thread Alexandre Rafalovitch

Did we reach any consensus on Google Analytics (or other) tracking?
It would have been nice to track the moment of the split.

Regards,
   Alex.

On Tue, 2 Mar 2021 at 16:44, Jan Høydahl  wrote:
>
> We fixed JavaDoc and RefGuide on Solr side 
> (https://issues.apache.org/jira/browse/SOLR-15177) and redirects on Lucene 
> side (https://issues.apache.org/jira/browse/SOLR-15171) -- hopefully.
>
> The Lucene redirects can be tested right now, e.g. 
> https://lucene-new.staged.apache.org/solr/guide/8_8/ redirects to 
> https://solr.apache.org/guide/8_8/ which currently does not exist but will 
> once we publish.
>
> I plan to deploy the new Solr site tomorrow, Wednesday, and then the new
> Lucene site afterwards.
> After the deploy we can do rapid commits to fix whatever comes up.
>
> Feel free to continue to comment here, or simply commit fixes to the 
> main/solr and main/lucene branches at https://github.com/apache/lucene-site
>
> Jan
>
> On 2 Mar 2021 at 20:16, Uwe Schindler wrote:
>
> Hi,
>
> Javadocs and refguide do not work in staging sites. They can only be seen on 
> production servers. We can’t test those, but I am confident, Jan’s new 
> htaccess rules will work (they point to the special subversion production 
> repo, still shared by lucene/solr for the time being).
>
> Uwe
>
> -
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> From: Jan Høydahl 
> Sent: Tuesday, March 2, 2021 7:25 PM
> To: dev@lucene.apache.org
> Subject: Re: Review request - New Solr website
>
> Thanks for reviewing. Yes, the javadocs and refguide are not part of the site 
> git repo, but will materialize through .htaccess rules once the site is 
> published to the final location.
>
> Jan
>
>
> On 2 Mar 2021 at 17:17, Michael McCandless wrote:
>
> Thank you Jan!  I clicked around a bit and it looks awesome, but I hit a 
> broken link for the Solr javadocs "Latest Release" link on this page: 
> https://lucene-solrtlp.staged.apache.org/resources.html -- likely this is not 
> an issue because
> once this is moved to the right location, that link should "just work" maybe?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Mar 2, 2021 at 10:32 AM Jan Høydahl  wrote:
>
> On the new "Project" page 
> (https://lucene-solrtlp.staged.apache.org/whoweare.html) I have attempted to 
> make some kind of TLP landing page, framing "what" and "who we are".
>
> The path I have taken so far is, instead of duplicating a list of names
> that will get out of date, to link to the official roster at
> https://projects.apache.org/committee.html?solr for the authoritative list of
> PMC members and committers. It will steal away the opportunity for new
> committers to have something to commit on day 1, but I can live with that.
>
> The missing piece is the "Emeritus" list. So let me throw out some questions:
> * The ASF does not operate with emeritus PMC or committers. Do we need to?
> * If we want to stick to a notion of "emeritus" committers, what should the 
> initial list of emeritus Solr committers be?
>
> Please also proof-read that entire page, it is brand new and English is not 
> my 1st language.
>
> Jan
>
> > On 1 Mar 2021 at 09:56, Jan Høydahl wrote:
> >
> > Hi,
> >
> > I have been working on https://issues.apache.org/jira/browse/SOLR-14499 to 
> > prepare the separate website for Solr.
> > I believe the work is practically done, and would like a broader review 
> > before I actually publish the changes.
> >
> > The staging site which will eventually be solr.apache.org is at 
> > https://lucene-solrtlp.staged.apache.org/
> > The staging site which shows the lucene site without Solr is at 
> > https://lucene-new.staged.apache.org/
> >
> > Any feedback is welcome, here or in the JIRA issue. I intend to publish the 
> > new sites in a couple of days.
> >
> > Jan
>

Re: Review request - New Solr website

2021-03-01 Thread Alexandre Rafalovitch

> I'm not aware that Confluence is going away, is it?
Sorry, I meant our continual attempts to reduce its importance in
favour of other options and to reduce the number of (out-of-date) pages
it has. But, if there are no better options...

Regards,
   Alex.

On Mon, 1 Mar 2021 at 17:30, Jan Høydahl  wrote:
>
> Thanks for feedback Alexandre.
>
> These sound like good proposals, but are probably better suited as
> follow-up cleanup after the site move.
> Or you can make a PR against the "main/solr" branch if you want to merge it 
> in from day one.
>

>
> Jan
>
> > On 1 Mar 2021 at 23:16, Alexandre Rafalovitch wrote:
> >
> > Looks good.
> >
> > I wonder if there is a way to bring the Reference Guide more
> > prominently to the home page. Maybe even in the top "Learn More"
> > section or even a section of its own right after.
> >
> > I also wonder if the book section is so out-of-date (in terms of Solr)
> > that it should retreat to the Documentation page.
> >
> > We are also still linking to CWiki in the Social Proof section (Visit
> > Solr's Public Servers listing page to learn more). I wonder if we
> > should eliminate that link, at least from the home page, as part of the
> > CWiki purge. On the other hand, that seems like a useful page and not
> > completely out of date.
> >
> > Regards,
> >   Alex.
> >
> > On Mon, 1 Mar 2021 at 03:56, Jan Høydahl  wrote:
> >>
> >> Hi,
> >>
> >> I have been working on https://issues.apache.org/jira/browse/SOLR-14499 to 
> >> prepare the separate website for Solr.
> >> I believe the work is practically done, and would like a broader review 
> >> before I actually publish the changes.
> >>
> >> The staging site which will eventually be solr.apache.org is at 
> >> https://lucene-solrtlp.staged.apache.org/
> >> The staging site which shows the lucene site without Solr is at 
> >> https://lucene-new.staged.apache.org/
> >>
> >> Any feedback is welcome, here or in the JIRA issue. I intend to publish 
> >> the new sites in a couple of days.
> >>
> >> Jan

Re: Review request - New Solr website

2021-03-01 Thread Alexandre Rafalovitch

Looks good.

I wonder if there is a way to bring the Reference Guide more
prominently to the home page. Maybe even in the top "Learn More"
section or even a section of its own right after.

I also wonder if the book section is so out-of-date (in terms of Solr)
that it should retreat to the Documentation page.

We are also still linking to CWiki in the Social Proof section (Visit
Solr's Public Servers listing page to learn more). I wonder if we
should eliminate that link, at least from the home page, as part of the
CWiki purge. On the other hand, that seems like a useful page and not
completely out of date.

Regards,
   Alex.

On Mon, 1 Mar 2021 at 03:56, Jan Høydahl  wrote:
>
> Hi,
>
> I have been working on https://issues.apache.org/jira/browse/SOLR-14499 to 
> prepare the separate website for Solr.
> I believe the work is practically done, and would like a broader review 
> before I actually publish the changes.
>
> The staging site which will eventually be solr.apache.org is at 
> https://lucene-solrtlp.staged.apache.org/
> The staging site which shows the lucene site without Solr is at 
> https://lucene-new.staged.apache.org/
>
> Any feedback is welcome, here or in the JIRA issue. I intend to publish the 
> new sites in a couple of days.
>
> Jan

Re: JIRA issues to close?

2021-02-18 Thread Alexandre Rafalovitch

You may enjoy using JiraSearch for this kind of thing. It has very
nice "Updated Ago >x" and "Last comment user" facets, and 'shift-click'
multi-select on status:
http://jirasearch.mikemccandless.com/

For bulk changes, from the screen you are on, click Tools near the
top-right of the screen and select "screen" or "1000"; that's the
beginning of the process. Make sure to uncheck the "notify everybody"
box on the last (second-to-last?) screen, or a lot of people will
be upset.

But also, we have had a less than fantastic history with bulk closing
(we had to reopen/revert). So, it may be better to focus on a very small,
clearly "done" subset and/or leave quick "can this be closed?" notes
with a follow-up to close.

Regards,
   Alex.

On Thu, 18 Feb 2021 at 14:43, Eric Pugh  wrote:
>
> I’m still learning my way around JIRA, and I was looking for a Solr JIRA that 
> I swear I had seen open recently.   In trying to query for open/active JIRAs, 
> I crafted this query, which returned a lot of tickets that I *think* should 
> be CLOSED at this point:
>
> https://issues.apache.org/jira/browse/SOLR-13394?jql=project%20%3D%20SOLR%20AND%20status%20in%20(Open%2C%20Reopened%2C%20Resolved%2C%20%22Patch%20Available%22)%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20in%20releasedVersions()
>
> There are 1102 tickets that appear to be old, already dealt-with tickets
> that should be CLOSED.
>
> I wanted to point that out, and see if someone could move them to CLOSED, or 
> point me to how to do the Bulk Close that I’ve seen happen periodically over 
> the years.
>
> Eric
>
> ___
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
> http://www.opensourceconnections.com | My Free/Busy
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
>
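The JQL in the search URL above is percent-encoded, which makes it hard to review before running a bulk change on the results. A minimal sketch, assuming Python 3 and only its standard library, that decodes the `jql` query parameter so the exact selection criteria are readable (the `decode_jql` helper name is just for illustration):

```python
from urllib.parse import urlsplit, parse_qs

# The search URL from the message above; the JQL lives in the "jql" query parameter.
URL = (
    "https://issues.apache.org/jira/browse/SOLR-13394?jql=project%20%3D%20SOLR"
    "%20AND%20status%20in%20(Open%2C%20Reopened%2C%20Resolved%2C%20%22Patch"
    "%20Available%22)%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion"
    "%20in%20releasedVersions()"
)


def decode_jql(url: str) -> str:
    """Return the percent-decoded value of the 'jql' query parameter (hypothetical helper)."""
    query = urlsplit(url).query
    # parse_qs percent-decodes values: %20 -> ' ', %3D -> '=', %2C -> ',', %22 -> '"'.
    return parse_qs(query)["jql"][0]


if __name__ == "__main__":
    print(decode_jql(URL))
```

Decoded, the query selects SOLR tickets resolved as Fixed, with a released fix version, whose status is still one of Open, Reopened, Resolved or Patch Available rather than Closed.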

Re: Confusion over the term "paramsets" and "Request Parameters API" in Ref Guide

2021-02-08 Thread Alexandre Rafalovitch

So, we may have a problem with the way we use "configset" too. We
are using it to mean a template from which a new collection can be
created (default, techproducts, etc.).

But also, if I remember correctly, we allow sharing configuration
between live cores using the configSet parameter in core.properties.
That is not a template use, because both cores may modify the same
files through the API (oops). Though, now that I look at the
documentation, we seem to have collapsed those two meanings. This
collapse may actually be wrong:
https://lucene.apache.org/solr/guide/8_8/defining-core-properties.html

For paramsets, the cool thing is that you can refer to them for update
operations as well. So, they could be used for A/B indexing tests,
etc.

Regards,
   Alex.

On Mon, 8 Feb 2021 at 13:03, Erik Hatcher  wrote:
>
>
>
> > On Feb 6, 2021, at 9:48 AM, Eric Pugh  
> > wrote:
> >
> >  “paramsets”,  which I think is a really powerful feature that most people 
> > don’t know about.
>
> Agreed on that!   (see the old example/files for my early paramset usage as I 
> explored that cool capability)
>
> > I’m thinking that we rename request-parameters-api.adoc —> paramsets.adoc, 
> > and rewrite the page to highlight that this feature is called “paramset”, 
> > in the same way we use the term “configset”.
>
> +1

Re: Old programmers do fade away

2020-12-30 Thread Alexandre Rafalovitch

Erick,

I kept hoping we would meet again at a future conference and have an
extended version of the talk we had the first time we met. It was very
valuable, but I felt I only got a glimpse of what was possible.
Perhaps, one day, I can travel near your actual "nest" and buy you a
beer or two and listen to the true war stories at the search coalface.

Until then, I wish you luck with the furry rats. We have some in our
backyard, but since we haven't - yet - started growing things, I view
them with amusement rather than anger. But, next summer, I will
probably follow in your footsteps too. I wonder if Tesla coils are more DIY
than the laser-guided heat rays. Though heat rays on lower settings
could be quite nice in Montreal winter, I am sure.

Regards,
   Alex.
P.S. Good old days! When one had to tell the disassembler that the
next instruction was "probably" the start of a string as one tried to
hack Xonix and Arkanoid levels!
P.P.S. If you need any listening material while you garden, I suspect
you will enjoy the hardware/software discussion podcast "On the
metal": https://oxide.computer/podcast/

On Wed, 30 Dec 2020 at 09:09, Erick Erickson  wrote:
>
> 40 years is enough. OK, it's only been 39 1/2 years. Dear Lord, has it really 
> been that long? Programming's been fun, I've gotten to solve puzzles every 
> day. The art and science of programming has changed over that time. Let me 
> tell you about the joys of debugging with a Z80 stack emulator that required 
> you to look on the stack for variables and trace function calls by 
> knowing how to follow frame pointers. Oh the tedium! Oh the (lack of) speed! 
> Not to mention that 64K of memory was all you had to work with. I had a 
> co-worker who could predict the number of bytes by which the program would 
> shrink based on extracting common code to functions. The "good old 
> days"...weren't...
>
> I'd been thinking that I'd treat Lucene/Solr as a hobby, doing occasional 
> work on it when I was bored over long winter nights. I've discovered, though, 
> that I've been increasingly reluctant to crack open the code. I guess that 
> after this much time, I'm ready to hang up my spurs. One major factor is the 
> realization that there's so much going on with Lucene/Solr that simply being 
> aware of the changes, much less trying to really understand them, isn't 
> something I can do casually.
>
> I bought a welder and find myself more interested in playing with that than 
> programming. Wait until you see the squirrel-proof garden enclosure I'm 
> building with it. If my initial plan doesn't work, next up is an electric 
> fence along the top. The laser-sighted automatic machine gun emplacement will 
> take more planning...Ahhh, probably won't be able to get a permit from the 
> township for that though. Do you think the police would notice? Perhaps I 
> should add that the local police station is two blocks away and in the line 
> of fire. But an infrared laser powerful enough to "pre-cook" them wouldn't be 
> as obvious would it?
>
> Why am I so fixated on squirrels? One of the joys of gardening is fresh 
> tomatoes rather than those red things they sell in the store. The squirrels 
> ATE EVERY ONE OF MY TOMATOES WHILE THEY WERE STILL GREEN LAST YEAR! And the 
> melons. In the words of B. Bunny: "Of course you realize this means war" 
> (https://www.youtube.com/watch?v=4XNr-BQgpd0)...
>
> Then there's working in the garden and landscaping, the desk I want to build 
> for my wife, travel as soon as I can, maybe seeing if some sailboats need 
> crew...you get the idea.
>
> It's been a privilege to work with this group, you're some of the best and 
> brightest. Many thanks to all who've generously given me their time and 
> guidance. It's been a constant source of amazement to me how willing people 
> are to take time out of their own life and work to help me when I've had 
> questions. I owe a lot of people beers ;)
>
> I'll be stopping my list subscriptions, Slack channels (dm me if you need 
> something), un-assigning any JIRAs and that kind of thing over the next 
> while. If anyone's interested in taking over the BadApple report, let me know 
> and I can put the code up somewhere. It takes about 10 minutes to do each 
> week. I won't disappear entirely, things like the code-reformatting effort 
> are nicely self-contained, for instance, and something I can do casually.
>
> My e-mail address if you need to get in touch with me is: 
> "erick.erick...@gmail.com". There's a correlation between gmail addresses 
> that are just a name with no numbers and a person's age... A co-worker came 
> over to my desk in pre-historical times and said "there's this new mail 
> service you might want to sign up for"... Like I said, 40 years is enough.
>
> Best to all,
> Erick
>

Re: SOLR: Why do we have a CHANGES.txt/md to maintain?

2020-11-28 Thread Alexandre Rafalovitch

It is a kind of a side note, but server-based Jira product is going away
soon-ish.

I hope somebody at Apache has a plan forward. Especially since cloud Jira
is apparently much worse right now.

Regards,
   Alex

On Sun., Nov. 29, 2020, 12:32 a.m. David Smiley,  wrote:

> After recently proposing per-module CHANGES.md... I think I'd actually
> rather not have any CHANGES file at all to maintain.  I'd rather go to JIRA
> with a bit better hygiene for metadata like components==contrib/module, and
> have some convenient links sprinkled about so that it's a convenient click
> away from each module.  This proposal may not be as compelling for Lucene
> which has no solr-upgrade-notes.adoc file.
>
> Maintaining this CHANGES file (or files) is a pain.  Formatting it just-so
> & conversion to HTML & other scripts manipulating it in dev-tools (e.g. add
> version), and branch syncing.  It's commonly a source of merge conflicts
> more than any other file.  It's an annoying step with GitHub PRs in
> particular.  Why do we bother?  Instead, on releases, provide a JIRA link
> to display all fixed issues grouped by issue type.  We could export it to a
> file for direct inclusion in the distribution.  JIRA even has a feature for
> this -- here's a direct link for 8.7:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310230=12348463
>  Notice the HTML version at the bottom.  It could be dumped into the
> release binaries.
> Issue summaries tend to be much shorter than CHANGES.txt bullets but I
> think that's okay because it's not the only information available for those
> who want to know more.  Remember there is also all the other metadata in
> JIRA a user can examine, there are commit messages, sometimes PRs, and
> there's solr-upgrade-notes.adoc which ought to be the starting point for
> someone interested in a release.
>
> It's been argued that contributors should get attribution here but we
> could maintain a separate contributors file to acknowledge people by name
> for inclusion with the Solr distribution -- one that has a link to JIRA and
> GitHub even.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
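The JIRA-link idea above amounts to grouping a release's fixed issues by issue type and rendering the groups as text that could be dumped into the distribution. A minimal sketch of that grouping, assuming the issues have already been exported into simple records. The `DEMO-n` keys, the summaries and the `release_notes` helper are hypothetical placeholders. They are not JIRA's API or its actual release-note format:

```python
from collections import defaultdict

# Hypothetical export of issues fixed in one release; keys and summaries are placeholders.
ISSUES = [
    {"key": "DEMO-1", "type": "Bug", "summary": "Fix NPE in example handler"},
    {"key": "DEMO-2", "type": "Improvement", "summary": "Speed up example faceting"},
    {"key": "DEMO-3", "type": "Bug", "summary": "Correct example docs"},
]


def release_notes(version, issues):
    """Render issues grouped by issue type, loosely imitating a release-notes page."""
    grouped = defaultdict(list)
    for issue in issues:
        grouped[issue["type"]].append(issue)

    lines = [f"Release Notes - Solr - Version {version}"]
    for issue_type in sorted(grouped):  # issue types in alphabetical order
        lines.append("")
        lines.append(issue_type)
        # Keys are sorted as plain strings, which is enough for this small example.
        for issue in sorted(grouped[issue_type], key=lambda i: i["key"]):
            lines.append(f"  * [{issue['key']}] - {issue['summary']}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(release_notes("8.7", ISSUES))
```

Real data would come from JIRA rather than a hard-coded list; writing the returned string to a file corresponds to the "export it for direct inclusion in the distribution" step.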

Re: Solr: Separate CHANGES.txt for Docker, SolrJ, Contribs, ...

2020-11-24 Thread Alexandre Rafalovitch

Absolutely.

What I was trying to say is that when it comes to implementation,
there may be a choice of strategies to do so within the same scope. A
strategy that aligns better with something that - with more work -
eventually becomes true structured data may have more long-term value
than a strategy that does not.

Hope that makes sense.

Regards,
   Alex.

On Tue, 24 Nov 2020 at 12:13, David Smiley  wrote:
>
> I'd rather not scope-creep my proposal here further.  Granted I ventured into 
> TXT -> Markdown.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Tue, Nov 24, 2020 at 9:37 AM Alexandre Rafalovitch  
> wrote:
>>
>> And - afterthought - if there is an easily parsable format, the parser
>> could even run at commit time on GitHub to make sure that issue
>> numbers are correct, names are included and formatting is not broken.
>>
>> Regards,
>>Alex.
>>
>> On Mon, 23 Nov 2020 at 19:38, Alexandre Rafalovitch  
>> wrote:
>> >
>> > Should we switch to a structured format instead of the current format, which
>> > tools struggle to convert?
>> >
>> > Something that one could push into Solr would have been nice...
>> >
>> > Regards,
>> >  Alex
>> >
>> > On Mon., Nov. 23, 2020, 4:47 p.m. David Smiley,  wrote:
>> >>
>> >> I pushed a commit to a PR for the prometheus exporter that includes a 
>> >> CHANGES.md
>> >> https://github.com/apache/lucene-solr/pull/1972/commits/bec84ce2a1d60480ce0c54b78e83a70f83e7b058
>> >> and likewise for a commit to a PR for the docker module:
>> >> https://github.com/apache/lucene-solr/pull/2083/commits/540f8117153d12bd13441326035820f97084878a
>> >>
>> >> * I chose the Markdown format.  This is an opportune time to switch.  
>> >> This meant changing " 9.0 " to "9.0" then "==" beneath it, 
>> >> but otherwise, no changes!
>> >> * I chose to start this for 9.0.  Any changes prior to 9.0 I think should 
>> >> continue to do things as we have been doing things historically.
>> >> * I considered updating dev-tools/scripts/addVersion.py but ultimately 
>> >> elected not to.  I think the rate of changes in each module will be low 
>> >> enough that it's not a big deal to maintain it manually.  Plus, I confess 
>> >> I'm less motivated to touch Python ;-) but I'd be more than happy to see 
>> >> someone automate this.
>> >>
>> >> If this is agreeable, Solr's master CHANGES.txt ought to have references 
>> >> to CHANGES.md for contribs & Docker.
>> >>
>> >> ~ David Smiley
>> >> Apache Lucene/Solr Search Developer
>> >> http://www.linkedin.com/in/davidwsmiley
>> >>
>> >>
>> >> On Mon, Nov 23, 2020 at 11:56 AM Houston Putman  
>> >> wrote:
>> >>>
>> >>> +1
>> >>>
>> >>> I think that having separate CHANGES.txt files for the different parts 
>> >>> of Solr would be great. If you are looking for certain changes you would 
>> >>> generally know which module to go to.
>> >>>
>> >>>> Some items that have a more sweeping impact would be listed in both
>> >>>
>> >>>
>> >>> I am ambivalent on having a separate CHANGES.txt for SolrJ, as long as 
>> >>> major changes are included in the main CHANGES.txt. In general it's easy 
>> >>> to add an entry to every applicable CHANGES.txt, no matter which module 
>> >>> the change was made in.
>> >>>
>> >>> - Houston
>> >>>
>> >>> On Sat, Nov 21, 2020 at 1:34 AM David Smiley  wrote:
>> >>>>
>> >>>> What of Docker changes?  And beyond direct changes to Dockerfile + 
>> >>>> scripts, it could feature particular notable changes to the server that 
>> >>>> are particularly noteworthy... like hypothetical improvements to solr 
>> >>>> home / core root dir etc. configuration.
>> >>>>
>> >>>> Even if Contribs/Modules are not separated out of the repo *yet* (even 
>> >>>> if they hypothetically never leave), I think it's desirable to separate 
>> >>>> their CHANGES.txt in master now.
>> >>>>
>> >>>> RE SolrJ -- I know it's used heavily in the server side; this one is 
>> >

Re: Solr: Separate CHANGES.txt for Docker, SolrJ, Contribs, ...

2020-11-24 Thread Alexandre Rafalovitch

And - afterthought - if there is an easily parsable format, the parser
could even run at commit time on GitHub to make sure that issue
numbers are correct, names are included and formatting is not broken.
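As a rough illustration of such a commit-time check, assume a hypothetical one-line entry shape `* SOLR-12345: Description. (Contributor Name)`. This is loosely modeled on existing CHANGES entries and is not an agreed format. Under that assumption, a parser could flag malformed bullets. It can only check the shape of an issue key. Confirming that the number refers to a real issue would need a JIRA lookup, which is not shown:

```python
import re

# Assumed (hypothetical) entry shape, one bullet per line:
#   * SOLR-12345: Short description. (Contributor One, Contributor Two)
# Several issue keys may be listed, separated by ", ".
ENTRY_RE = re.compile(
    r"^\* (?P<issues>(?:SOLR|LUCENE)-\d+(?:, (?:SOLR|LUCENE)-\d+)*): "
    r"(?P<text>.+?) \((?P<names>[^()]+)\)$"
)
SHAPE_ERROR = "does not match '* ISSUE-N: text (names)'"


def validate_entry(line):
    """Return the problems found in one CHANGES bullet; an empty list means it looks fine."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return [SHAPE_ERROR]
    problems = []
    names = [name.strip() for name in match.group("names").split(",")]
    if any(not name for name in names):
        problems.append("empty contributor name")
    if not match.group("text").strip():
        problems.append("empty description")
    return problems


def validate_lines(lines):
    """Yield (line_number, problem) for every bullet line that fails validation."""
    for number, line in enumerate(lines, start=1):
        if line.startswith("* "):
            for problem in validate_entry(line):
                yield number, problem


if __name__ == "__main__":
    sample = ["9.0", "===", "* SOLR-12345: Fix a thing. (Jane Doe)", "* Fix without issue key"]
    for number, problem in validate_lines(sample):
        print(f"line {number}: {problem}")
```

A CI job could run `validate_lines` over each changed CHANGES file and fail the build if anything is yielded. How that job is wired into GitHub is left open here.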

Regards,
   Alex.

On Mon, 23 Nov 2020 at 19:38, Alexandre Rafalovitch  wrote:
>
> Should we switch to a structured format instead of the current format, which
> tools struggle to convert?
>
> Something that one could push into Solr would have been nice...
>
> Regards,
>  Alex
>
> On Mon., Nov. 23, 2020, 4:47 p.m. David Smiley,  wrote:
>>
>> I pushed a commit to a PR for the prometheus exporter that includes a 
>> CHANGES.md
>> https://github.com/apache/lucene-solr/pull/1972/commits/bec84ce2a1d60480ce0c54b78e83a70f83e7b058
>> and likewise for a commit to a PR for the docker module:
>> https://github.com/apache/lucene-solr/pull/2083/commits/540f8117153d12bd13441326035820f97084878a
>>
>> * I chose the Markdown format.  This is an opportune time to switch.  This 
>> meant changing " 9.0 " to "9.0" then "==" beneath it, but 
>> otherwise, no changes!
>> * I chose to start this for 9.0.  Any changes prior to 9.0 I think should 
>> continue to do things as we have been doing things historically.
>> * I considered updating dev-tools/scripts/addVersion.py but ultimately 
>> elected not to.  I think the rate of changes in each module will be low 
>> enough that it's not a big deal to maintain it manually.  Plus, I confess 
>> I'm less motivated to touch Python ;-) but I'd be more than happy to see 
>> someone automate this.
>>
>> If this is agreeable, Solr's master CHANGES.txt ought to have references to 
>> CHANGES.md for contribs & Docker.
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Mon, Nov 23, 2020 at 11:56 AM Houston Putman  
>> wrote:
>>>
>>> +1
>>>
>>> I think that having separate CHANGES.txt files for the different parts of 
>>> Solr would be great. If you are looking for certain changes you would 
>>> generally know which module to go to.
>>>
>>>> Some items that have a more sweeping impact would be listed in both
>>>
>>>
>>> I am ambivalent on having a separate CHANGES.txt for SolrJ, as long as 
>>> major changes are included in the main CHANGES.txt. In general it's easy to 
>>> add an entry to every applicable CHANGES.txt, no matter which module the 
>>> change was made in.
>>>
>>> - Houston
>>>
>>> On Sat, Nov 21, 2020 at 1:34 AM David Smiley  wrote:
>>>>
>>>> What of Docker changes?  And beyond direct changes to Dockerfile + 
>>>> scripts, it could feature particular notable changes to the server that 
>>>> are particularly noteworthy... like hypothetical improvements to solr home 
>>>> / core root dir etc. configuration.
>>>>
>>>> Even if Contribs/Modules are not separated out of the repo *yet* (even if 
>>>> they hypothetically never leave), I think it's desirable to separate their 
>>>> CHANGES.txt in master now.
>>>>
>>>> RE SolrJ -- I know it's used heavily in the server side; this one is more 
>>>> debatable than the others and I don't have a strong opinion.  Some items 
>>>> that have a more sweeping impact (e.g. HTTP2) would be listed in both but 
>>>> the difference is that the SolrJ side would have a more user-facing 
>>>> purpose, mentioning SolrClient subclasses that are pertinent to draw 
>>>> attention to compatibility or new classes users should know about.  This 
>>>> kind of stuff is maybe too detailed to bother putting in 
>>>> solr-upgrade-notes.adoc but would not be to SolrJ's dedicated CHANGES.txt. 
>>>>  On server CHANGES.txt, we tend to be vague.  If SolrJ is changed for 
>>>> something that has more to do with server-side (e.g. SOLR-14691 "Metrics 
>>>> Reporting Should Avoid Creating Objects" which changed some utils in 
>>>> SolrJ), then it ought not to be listed in SolrJ's proposed CHANGES.txt.  
>>>> Admittedly there may be more cumulative CHANGES.txt maintenance between 
>>>> the two.
>>>>
>>>> ~ David Smiley
>>>> Apache Lucene/Solr Search Developer
>>>> http://www.linkedin.com/in/davidwsmiley
>>>>
>>>>
>>>> On Fri, Nov 20, 2020 at 9:17 PM Ishan Chattopadhyaya 
>>>>  wrote:
>>>>>

Re: Solr: Separate CHANGES.txt for Docker, SolrJ, Contribs, ...

2020-11-23 Thread Alexandre Rafalovitch

Should we switch to a structured format instead of the current format,
which tools struggle to convert?

Something that one could push into Solr would have been nice...
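One way to read "something that one could push into Solr": if each entry were a structured record, the same data could both be rendered into CHANGES files and sent to Solr's JSON update endpoint as an array of documents. The following is a minimal sketch. The `ChangeEntry` fields, the `id` scheme and the field names are all assumptions. The field names were chosen to fit dynamic-field suffixes such as `*_s`, `*_ss` and `*_t` from Solr's default configset:

```python
import json
from dataclasses import dataclass, field


@dataclass
class ChangeEntry:
    """Hypothetical structured CHANGES entry; field names are illustrative only."""
    version: str
    section: str  # e.g. "New Features", "Bug Fixes"
    issues: list
    text: str
    contributors: list = field(default_factory=list)


def to_solr_json(entries):
    """Serialize entries as a JSON array of Solr documents (one document per entry)."""
    docs = [
        {
            "id": f"{e.version}/{e.issues[0]}",  # assumes the first issue key is unique per version
            "version_s": e.version,
            "section_s": e.section,
            "issues_ss": e.issues,
            "text_t": e.text,
            "contributors_ss": e.contributors,
        }
        for e in entries
    ]
    return json.dumps(docs, indent=2)


if __name__ == "__main__":
    entries = [ChangeEntry("9.0", "Bug Fixes", ["SOLR-12345"], "Fix a thing.", ["Jane Doe"])]
    print(to_solr_json(entries))
```

The printed JSON could then be POSTed with `Content-Type: application/json` to a collection's update handler, for example `/solr/<collection>/update?commit=true`. The collection name is a placeholder and the example entry is made up.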

Regards,
 Alex

On Mon., Nov. 23, 2020, 4:47 p.m. David Smiley,  wrote:

> I pushed a commit to a PR for the prometheus exporter that includes a
> CHANGES.md
>
> https://github.com/apache/lucene-solr/pull/1972/commits/bec84ce2a1d60480ce0c54b78e83a70f83e7b058
> and likewise for a commit to a PR for the docker module:
>
> https://github.com/apache/lucene-solr/pull/2083/commits/540f8117153d12bd13441326035820f97084878a
>
> * I chose the Markdown format.  This is an opportune time to switch.  This
> meant changing " 9.0 " to "9.0" then "==" beneath it, but
> otherwise, no changes!
> * I chose to start this for 9.0.  Any changes prior to 9.0 I think should
> continue to do things as we have been doing things historically.
> * I considered updating dev-tools/scripts/addVersion.py but ultimately
> elected not to.  I think the rate of changes in each module will be low
> enough that it's not a big deal to maintain it manually.  Plus, I confess
> I'm less motivated to touch Python ;-) but I'd be more than happy to see
> someone automate this.
>
> If this is agreeable, Solr's master CHANGES.txt ought to have references
> to CHANGES.md for contribs & Docker.
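The manual step mentioned above, adding a new version section to each module's CHANGES.md, is small enough to script. Here is a hypothetical helper. It is not the logic of `dev-tools/scripts/addVersion.py`. It assumes that each CHANGES.md lists versions newest-first as Markdown Setext headings, where a version number is underlined with `=`:

```python
import re

VERSION_RE = re.compile(r"^\d+(\.\d+)+$")  # e.g. "9.0", "9.1.2"
UNDERLINE_RE = re.compile(r"^=+\s*$")      # Setext level-1 underline


def version_headings(lines):
    """Return (index, version) for each version title underlined with '='."""
    return [
        (i, lines[i].strip())
        for i in range(len(lines) - 1)
        if VERSION_RE.match(lines[i].strip()) and UNDERLINE_RE.match(lines[i + 1])
    ]


def add_version(markdown, version, placeholder="(No changes)"):
    """Insert an empty section for `version` above the newest existing version section."""
    lines = markdown.splitlines()
    headings = version_headings(lines)
    if any(title == version for _, title in headings):
        return markdown  # section already present: leave the file unchanged
    section = [version, "=" * len(version), "", placeholder, ""]
    if headings:
        at = headings[0][0]
        new_lines = lines[:at] + section + lines[at:]
    else:
        # No version sections yet: append, separated from any preamble by a blank line.
        new_lines = lines + ([""] if lines and lines[-1].strip() else []) + section
    return "\n".join(new_lines).rstrip("\n") + "\n"


if __name__ == "__main__":
    print(add_version("9.0\n===\n\n* SOLR-12345: Fix a thing. (Jane Doe)\n", "9.1"))
```

Because the helper returns the text unchanged when the version already exists, it can be run over every module's CHANGES.md without creating duplicate sections.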
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, Nov 23, 2020 at 11:56 AM Houston Putman 
> wrote:
>
>> +1
>>
>> I think that having separate CHANGES.txt files for the different parts of
>> Solr would be great. If you are looking for certain changes you would
>> generally know which module to go to.
>>
>> Some items that have a more sweeping impact would be listed in both
>>
>>
>> I am ambivalent on having a separate CHANGES.txt for SolrJ, as long as
>> major changes are included in the main CHANGES.txt. In general it's easy to
>> add an entry to every applicable CHANGES.txt, no matter which module the
>> change was made in.
>>
>> - Houston
>>
>> On Sat, Nov 21, 2020 at 1:34 AM David Smiley  wrote:
>>
>>> What of Docker changes?  And beyond direct changes to Dockerfile +
>>> scripts, it could feature particular notable changes to the server that are
>>> particularly noteworthy... like hypothetical improvements to solr home /
>>> core root dir etc. configuration.
>>>
>>> Even if Contribs/Modules are not separated out of the repo *yet* (even
>>> if they hypothetically never leave), I think it's desirable to separate
>>> their CHANGES.txt in master now.
>>>
>>> RE SolrJ -- I know it's used heavily in the server side; this one is
>>> more debatable than the others and I don't have a strong opinion.  Some
>>> items that have a more sweeping impact (e.g. HTTP2) would be listed in both
>>> but the difference is that the SolrJ side would have a more user-facing
>>> purpose, mentioning SolrClient subclasses that are pertinent to draw
>>> attention to compatibility or new classes users should know about.  This
>>> kind of stuff is maybe too detailed to bother putting in
>>> solr-upgrade-notes.adoc but would not be to SolrJ's dedicated CHANGES.txt.
>>> On server CHANGES.txt, we tend to be vague.  If SolrJ is changed for
>>> something that has more to do with server-side (e.g. SOLR-14691 "Metrics
>>> Reporting Should Avoid Creating Objects" which changed some utils in
>>> SolrJ), then it ought not to be listed in SolrJ's proposed CHANGES.txt.
>>> Admittedly there may be more cumulative CHANGES.txt maintenance between the
>>> two.
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>>
>>> On Fri, Nov 20, 2020 at 9:17 PM Ishan Chattopadhyaya <
>>> ichattopadhy...@gmail.com> wrote:
>>>
>>>> I think whatever we don't ship in the main tarball today should stay
>>>> separate. Going forward, when we stop shoving the extra modules (contribs)
>>>> into the main distro, we can separate out their changelogs. However, I feel
>>>> SolrJ changes should stay with Solr changes since it is also used heavily
>>>> in the server side.
>>>>
>>>> On Sat, 21 Nov, 2020, 3:39 am David Smiley,  wrote:
>>>>
>>>>> I was about to merge a PR pertaining to Solr's new Docker module when
>>>>> it occurred to me that I ought to add a CHANGES.txt entry.  But, for Solr
>>>>> users (which includes me and everyone reading this), it's annoying to have
>>>>> to go to Solr's all-encompassing CHANGES.txt to find Docker upgrade
>>>>> notes, which is a distinct way of running Solr.  I think the same could be
>>>>> said for our contribs, and perhaps even SolrJ, which is another distinct
>>>>> consumable.  The idea of separated CHANGES.txt aligns well with contribs
>>>>> being further isolated (see both the discussion on separate git repos for
>>>>> them, and also the discussion of getting rid of "dist" (each contrib's jar
>>>>> goes in its own folder; keeps to itself)).
>>>>>
>>>>> Solr's root /CHANGES.txt could at the

Re: Welcome Julie Tibshirani as Lucene/Solr committer

2020-11-18 Thread Alexandre Rafalovitch

Juliet from the house of Elasticsearch meets an interesting,
relevancy-aware committer from the house of Solr.

Such a romantic beginning. Not sure I want to know the end of that
heroine's journey.

:-)

On Wed., Nov. 18, 2020, 12:59 p.m. Dawid Weiss, 
wrote:

>
> Congratulations and welcome, Julie.
>
> I think juliet is not a bad nick at all, you just need to who -all | grep
> "romeo"... :)
>
> Dawid
>
> On Wed, Nov 18, 2020 at 4:08 PM Michael Sokolov 
> wrote:
>
>> I'm pleased to announce that Julie Tibshirani has accepted the PMC's
>> invitation to become a committer.
>>
>> Julie, the tradition is that new committers introduce themselves with
>> a brief bio.
>>
>> I think we may still be sorting out the details of your Apache account
>> (julie@ may have been taken?), but as soon as that has been sorted out
>>  and karma has been granted, you can use your new powers to add
>> yourself to the committers section of the Who We Are page on the
>> website: 
>>
>> Congratulations and welcome!
>>
>> Mike Sokolov
>>

Re: Welcome Julie Tibshirani as Lucene/Solr committer

2020-11-18 Thread Alexandre Rafalovitch

Congratulations Julie and welcome,

I guess you will have to do a follow-up post to your "Finding a home"
entry now :-) 
https://www.elastic.co/blog/culture-finding-a-home-and-career-in-the-open-source-community

Regards,
   Alex.

On Wed, 18 Nov 2020 at 10:07, Michael Sokolov  wrote:
>
> I'm pleased to announce that Julie Tibshirani has accepted the PMC's
> invitation to become a committer.
>
> Julie, the tradition is that new committers introduce themselves with
> a brief bio.
>
> I think we may still be sorting out the details of your Apache account
> (julie@ may have been taken?), but as soon as that has been sorted out
>  and karma has been granted, you can use your new powers to add
> yourself to the committers section of the Who We Are page on the
> website: 
>
> Congratulations and welcome!
>
> Mike Sokolov
>

Re: [DISCUSS] Solr Operator grant to Apache Lucene

2020-10-23 Thread Alexandre Rafalovitch

What would this look like in practice if it is adopted/accepted, given
the Lucene and Solr split as an additional wrinkle?

I assume this is a donation to the Solr project, so it will be an
apache/solr-operator project, similar to how it is currently
apache/lucene-solr? Would the committers be the same, or is it a new
set? Or is it more like a first-party package?

How does this interplay with Docker, which IS in-project?

I am neither plus nor minus on this, just raising the questions that I
feel would benefit from being clarified.

Regards,
   Alex.

On Fri, 23 Oct 2020 at 03:32, Anshum Gupta  wrote:
>
> Hi everyone,
>
> Recently, Bloomberg reached out to us to donate the Solr Operator[1] codebase 
> to the Apache Lucene project.
>
> Built on the Kube Builder framework, Solr Operator would help in 
> standardizing the way SolrCloud clusters are managed in Kubernetes. This will 
> allow the community to converge and share best practices around managing 
> SolrCloud in k8s world.
>
> The PMC has spent the last few weeks discussing the merits and concerns 
> around the grant and intends to move forward with it unless there are 
> concerns that the community has around it.
>
> Thanks to Tim, here’s a detailed document around the design of Solr Operator, 
> this should answer most questions around the technicality of the project - 
> https://docs.google.com/document/d/1uQiJcE7kW5c6iEl9zG1Ve9MTEUGY7OHnMHX8PuTqpY8/edit?usp=sharing
>
> I’d also like to summarize the PMC discussions to help avoid walking down 
> the same path again.
>
> Q: Why is having an operator important for the project?
> A: In today’s world of cloud native technologies, Kubernetes is an essential 
> part of most modern platforms. A Kubernetes Operator allows the users of 
> Apache Solr to deploy SolrCloud clusters on k8s while allowing the people who 
> understand the system to codify our collective knowledge about how SolrCloud 
> should be operated.
>
> Q: Do we want to maintain the Kubernetes operator as part of the Apache 
> Lucene project?
> A: Yes, the operator will become an essential part of Solr as K8s adoption 
> grows. Instead of pointing users at third party documentation and supporting 
> projects, it would be good to have something that is supported by the Solr 
> community. Also, as a separate repository, with a release cadence that 
> doesn’t restrict Lucene/Solr releases, the Kubernetes Operator will create a 
> lot of value for users.
>
> Q: Have we reviewed the design of the operator before accepting the grant?
> A: The project has a lot of commits from Houston, who’s an existing 
> committer. Also, Tim (Timothy Potter) has gone through the code and has PRs. 
> His document above also provides a lot of insight into the operator for the 
> rest of us. Overall, the code seems good and the code is in reasonable shape 
> to be accepted and improved.
>
> Q: Should we allow the Operator to be incubated as its own project instead? 
> If not, why?
> A: This was considered, but after discussing the pros and cons around having 
> the operator come in via the incubator, it was decided otherwise.
>
> Q: This is written in a different language i.e. Go. How do we handle that? 
> Can we not find something in Java instead?
> A: Go is the de-facto language for Kubernetes. We would not get the same 
> amount of tooling and support for Kubernetes in Java as Go. As this is the 
> right language to move forward with the operator, all of us running SolrCloud 
> in K8s will be learning and working with it anyways. We will also certainly 
> get questions around it from users, and it makes sense for us to lead that 
> instead of catch up. This way we will also attract more contributors who know 
> Go and Kubernetes in the future.  Most importantly, a separate repository 
> will allow us to keep things easy to manage.
>
> Q: What about the operator release cadence?
> A: The operator and Lucene/Solr would have independent release cadence.
>
> We would like to give the community a week i.e. until 30th of October, 2020 
> to discuss this so the PMC can make an informed decision.
>
> Looking forward to a healthy discussion.
>
> [1] https://github.com/bloomberg/solr-operator
>
> --
> Anshum Gupta


Re: Automatically delete merged branches (by Github)?

2020-10-19 Thread Alexandre Rafalovitch

On Mon, 19 Oct 2020 at 14:24, Uwe Schindler  wrote:

> Nice possibility on top: You can force push as often as you like! ☕
>

Up until the point you create a PR. After that, I think it gets confused
and doubles the entries. Unless I messed up one of my PRs even worse than I
thought.

+1 on doing changes in the personal fork of the repo.

Regards,
   Alex.

Re: Solr Alpha (EA) release of Reference Branch

2020-10-04 Thread Alexandre Rafalovitch

(The comments below are in the context of +1 of getting this working out.)

When we say "let users try", do we mean actual public with a release
published on our website?

Because I can see publishing version 10, however it is tagged
(alpha, whatever), completely confusing people about the upcoming 9
version and delaying its adoption, especially combined with the
cleanups that we already put in 9.0. Maybe we could release it to the
committer community first and dogfood it "internally"?

And if the issue with naming it 'not 10.x' is one piece of code
(package manager), maybe we can one-off patch that instead. Or hack
the version to be something ridiculous like 42 (the answer to
everything...) instead of something that is psychologically feasible.
I recall this dual version confusion happening before in other
communities and it really messed things up. Python is a recent
example, but I seem to recall other similar events for
products/communities that no longer exist (hopefully for other
reasons).

And yes, all the questions of forward-porting are there as well,
if/once this succeeds.

Regards,
   Alex.

On Sun, 4 Oct 2020 at 11:34, Varun Thacker  wrote:
>
> Hi Ishan,
>
> Let's say Solr 10 ( or whatever name gets picked ) turns out stable enough in 
> the alpha phase - What would the next step be?
>
> Would we bring back all the changes to master? Do you have a sense of how 
> that would end up playing out? Could it be brought in chunks or would it have 
> to be wholesale?
>
> Also, do you know what features in the reference branch have been removed 
> because they were unstable? The features/bug-fixes in master 
> that haven't made it to the reference branch would be easier to find out.
>
> On Sat, Oct 3, 2020 at 10:17 PM Ishan Chattopadhyaya 
>  wrote:
>>
>> Erick, I'll answer your questions shortly.
>>
>> On Sun, 4 Oct, 2020, 10:33 am Ishan Chattopadhyaya, 
>>  wrote:
>>>
>>> Agree, Noble. Let's not worry about the naming too much. We can discuss 
>>> that later as well, or in a separate thread.
>>>
>>> On Sun, 4 Oct, 2020, 10:06 am Noble Paul,  wrote:

>>>> +1 Ishan
>>>>
>>>> It's important that the branch gets some real-world testing and
>>>> feedback. At this point we cannot be 100% sure about the stability of
>>>> that branch to port all the changes to master.
>>>>
>>>> Users don't care whether it is called Solr 9/Solr 10 or even Mark's Solr or even a
>>>> "Crazy Solr". As long as all the tests pass and they can do an upgrade
>>>> of their existing cluster to that release, that IS Solr. I think we do
>>>> not need to worry too much about it now. If/when we reach a point
>>>> where we have a new stable release of Solr that is 100% compatible
>>>> with our other branch, we can resume this discussion.
>>>>
>>>> As Ilan said, we may get real feedback from our users deploying it on
>>>> production-scale but non-critical deployments. Our JUnit tests are not
>>>> good enough to uncover stability issues.
>>>>
>>>> Let's focus on making all the tests pass and get this into the hands of our
>>>> users.
>>>>
>>>> On Sun, Oct 4, 2020 at 8:01 AM Uwe Schindler  wrote:
>>>> >
>>>> > Is the branch ready for Jenkins testing?
>>>> >
>>>> > If yes and "gradlew check" works, I really would like to set it up.
>>>> >
>>>> > Uwe
>>>> >
>>>> > On October 3, 2020 7:42:22 PM UTC, Ishan Chattopadhyaya
>>>> > wrote:
>>>> >>
>>>> >> Hi Devs,
>>>> >>
>>>> >> As you might be aware, the reference_impl branch has a lot of
>>>> >> improvements that we want to see in Solr master. However, it is
>>>> >> currently a large deviation from master and hence the stability and
>>>> >> reliability (though improved in certain aspects) remain to be tested
>>>> >> in real production environments before we gain confidence in bringing
>>>> >> those changes to master.
>>>> >>
>>>> >> I propose that we do a one-off preview release from that branch, say
>>>> >> Solr 10 alpha (early access) or any other name that someone suggests,
>>>> >> so that users could try it out and report regressions or improvements
>>>> >> etc.
>>>> >>
>>>> >> I volunteer to be the RM and plan to start the process in the
>>>> >> 1 December to 15 December timeframe. Until then, we can tighten the loose
>>>> >> ends on the branch and plan for such a release.
>>>> >>
>>>> >> Are there any thoughts, concerns, questions?
>>>> >>
>>>> >> Regards,
>>>> >> Ishan
>>>> >
>>>> >
>>>> > --
>>>> > Uwe Schindler
>>>> > Achterdiek 19, 28357 Bremen
>>>> > https://www.thetaphi.de
>>>>
>>>> --
>>>> -
>>>> Noble Paul

Re: Should ChildDocTransformerFactory's limit be local or global for deep-nested documents?

2020-10-03 Thread Alexandre Rafalovitch

I did not create a ticket (got distracted). Feel free to make one and
add me to watchers. I will be happy to test it with my dataset.

Thanks,
   Alex.

On Sat, 3 Oct 2020 at 15:23, Bar Rotstein  wrote:
>
> Hey,
> Was a ticket opened?
>
> I'd gladly tackle that one if it hasn't been assigned yet.
>
> Thanks in advance,
> Bar
> On Fri, Oct 2, 2020 at 3:13 PM David Smiley  wrote:
>>
>> I think that's a bug!  Good catch!
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Thu, Oct 1, 2020 at 11:38 PM Alexandre Rafalovitch  
>> wrote:
>>>
>>> I am indexing a deeply nested structure and am trying to return it
>>> with fl=*,[child].
>>>
>>> It is supposed to have 5 children under the top element, but it
>>> returns only 4. Two hours of debugging later, I realized that the
>>> "limit" parameter is set to 10 by default and that this 10 seems to
>>> count children at ANY level, calculating them depth-first. So it was
>>> not at all obvious why the children suddenly stopped showing up.
>>>
>>> The documentation says:
>>> > The maximum number of child documents to be returned per parent document.
>>> > The default is `10`.
>>>
>>> So, is that (all nested children included in limit) what we actually
>>> mean? Or did we mean maximum number of "immediate children" for any
>>> specific document/level and the code is wrong?
>>>
>>> I can update the doc to clarify the results, but I don't know whether
>>> I am looking at the bug or the feature.
>>>
>>> Regards,
>>>Alex.
>>>

Should ChildDocTransformerFactory's limit be local or global for deep-nested documents?

2020-10-01 Thread Alexandre Rafalovitch

I am indexing a deeply nested structure and am trying to return it
with fl=*,[child].

It is supposed to have 5 children under the top element, but it
returns only 4. Two hours of debugging later, I realized that the
"limit" parameter is set to 10 by default and that those 10 seem to
count children at ANY level, calculated depth-first. So, it was not
at all obvious why the children suddenly stopped showing up.

The documentation says:
> The maximum number of child documents to be returned per parent document.
> The default is `10`.

So, is that (all nested children included in limit) what we actually
mean? Or did we mean maximum number of "immediate children" for any
specific document/level and the code is wrong?

I can update the doc to clarify the results, but I don't know whether
I am looking at the bug or the feature.

Regards,
   Alex.
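To make the two readings of "limit" concrete, here is a minimal Java (16+) sketch that models a nested document tree and applies both interpretations. The class, record, and method names are hypothetical, and this is not Solr's ChildDocTransformer code; it only assumes what is described above: a default limit of 10 and a budget that is shared by children at every level and spent depth-first. With 5 top-level children that each have 2 grandchildren, the shared budget runs out right after c4, so only 4 immediate children come back.

```java
import java.util.ArrayList;
import java.util.List;

public class ChildLimitDemo {

    /** A document with an id and its immediate child documents (hypothetical model). */
    record Doc(String id, List<Doc> children) {
        Doc(String id, Doc... kids) {
            this(id, List.of(kids));
        }
    }

    /** Builds 5 top-level children, each with 2 grandchildren (15 descendants in total). */
    static Doc sampleTree() {
        List<Doc> kids = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            kids.add(new Doc("c" + i, new Doc("c" + i + ".a"), new Doc("c" + i + ".b")));
        }
        return new Doc("top", kids);
    }

    /** "Global" reading: one budget shared by all levels, spent depth-first. */
    static List<String> globalLimit(Doc root, int limit) {
        List<String> out = new ArrayList<>();
        spend(root, new int[] {limit}, out);
        return out;
    }

    private static void spend(Doc parent, int[] budget, List<String> out) {
        for (Doc child : parent.children()) {
            if (budget[0] == 0) {
                return; // budget exhausted: later siblings are silently dropped
            }
            budget[0]--;
            out.add(child.id());
            spend(child, budget, out); // descend before visiting the next sibling
        }
    }

    /** "Local" reading: at most `limit` immediate children per parent, at every level. */
    static List<String> localLimit(Doc root, int limit) {
        List<String> out = new ArrayList<>();
        int taken = 0;
        for (Doc child : root.children()) {
            if (taken == limit) {
                break;
            }
            taken++;
            out.add(child.id());
            out.addAll(localLimit(child, limit));
        }
        return out;
    }

    public static void main(String[] args) {
        Doc top = sampleTree();
        // global: [c1, c1.a, c1.b, c2, c2.a, c2.b, c3, c3.a, c3.b, c4]
        System.out.println("global: " + globalLimit(top, 10));
        // local: all 15 descendants, including c5 and its children
        System.out.println("local:  " + localLimit(top, 10));
    }
}
```

Under the global reading, the number of top-level children you get back depends on how deep the earlier siblings are, which is what makes the truncation hard to spot.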

Re: Filestore, subsume userfiles requirement

2020-09-25 Thread Alexandre Rafalovitch

We are talking specifically about 'cat', I think. Nothing else
accesses that directory:
https://lucene.apache.org/solr/guide/8_6/stream-source-reference.html#cat

Seems standalone-ready. Very file-centric, in fact. Makes me wonder if
it is SolrCloud-ready.

Regards,
   Alex.

On Fri, 25 Sep 2020 at 12:03, Jason Gerlowski  wrote:
>
> I think some expressions are SolrCloud specific, but there are
> expressions that work in standalone
>
> e.g. https://paste.apache.org/zm3sw and https://paste.apache.org/vs35m
>
> On Fri, Sep 25, 2020 at 11:53 AM Eric Pugh
>  wrote:
> >
> > Can you do streaming expressions in standalone Solr?   I was under the 
> > impression that it required SolrCloud.
> >
> >
> > On Sep 25, 2020, at 11:45 AM, Alexandre Rafalovitch  
> > wrote:
> >
> > It is not just a directory. There is a security feature that only
> > allows reading from several file locations (configurable in solr.xml).
> > Userfiles is part of the default list, so whatever replaces it
> > will need to be explicitly named as well.
> >
> > Regards,
> >   Alex.
> >
> > On Fri, 25 Sep 2020 at 11:43, Jason Gerlowski  wrote:
> >
> >
> > You're left with the /filestore/ dir to put there what you want 
> > in standalone mode.
> >
> > That's fine by me as that's effectively what "userfiles" provides now.
> > The main difference I imagine is that the 'filestore' directory won't
> > exist at all in standalone? In that case standalone users will have to
> > know where to create the dir if they want to reference 'filestore'
> > files in their streaming expressions.  But that's not all that much
> > more to ask I guess.
> >
> > I'm curious - is the filestore unimplemented in standalone because no
> > one prioritized it, or is there a specific technical blocker?
> >
> > On Fri, Sep 25, 2020 at 11:24 AM David Smiley  wrote:
> >
> >
> > The "userfiles" thing is merely a directory.  There's no CRUD API on it.  
> > Even if the CRUD API of the file store isn't active in standalone -- I 
> > think that doesn't matter.  You're left with the /filestore/ dir 
> > to put there what you want in standalone mode.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Fri, Sep 25, 2020 at 11:00 AM Ishan Chattopadhyaya 
> >  wrote:
> >
> >
> > Standalone isn't supported.
> >
> > We need to transition away from the standalone mode and get rid of it 
> > completely.
> >
> > On Fri, 25 Sep, 2020, 7:50 pm Jason Gerlowski,  
> > wrote:
> >
> >
> > I don't know much about the new package/file-store, but it does sound
> > like a good replacement for 'userfiles' (which, as Eric says, is
> > pretty awkward to work with).
> >
> > My only concerns with using the filestore would be around its
> > availability.  Is it enabled by default (or will it be in an upcoming
> > release)?  Does it work in standalone as well as SolrCloud?  If the
> > answer to both of those questions is "yes", then it makes sense to me as
> > a replacement on the face of things.
> >
> > Jason
> >
> > On Fri, Sep 25, 2020 at 9:39 AM Eric Pugh
> >  wrote:
> >
> >
> > +1.  I’ve found the user files really useful when doing things with 
> > streaming, but it’s also awkward to reach when putting files into it.
> >
> >
> > On Sep 25, 2020, at 8:47 AM, David Smiley  wrote:
> >
> > Yes.  And I think it's high time that coreRootDirectory default to 
> > /something (e.g. "cores")
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Fri, Sep 25, 2020 at 8:42 AM Alexandre Rafalovitch  
> > wrote:
> >
> >
> > Would that file store also be under solr.home? Because if it is, and
> > the user can upload core.properties into it (as well as other things),
> > core discovery will then load it, bypassing security.
> >
> > Regards,
> >  Alex.
> >
> > On Fri, 25 Sep 2020 at 08:32, David Smiley  wrote:
> >
> >
> > I'm looking to see that we can deprecate "userfiles" and remove for 9.0
> >
> > Solr has a "userfiles" directory under Solr home that Jason added in some 
> > issue relating to streaming expressions accessing a local file.  I bet only 
> > a few of you have even heard of it.  I think the "File Store" that came 
> > along with the package manager

Re: Filestore, subsume userfiles requirement

2020-09-25 Thread Alexandre Rafalovitch

It is not just a directory. There is a security feature that only
allows reading from several file locations (configurable in solr.xml).
Userfiles is part of the default list, so whatever replaces it
will need to be explicitly named as well.

Regards,
   Alex.
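A minimal sketch of the kind of "allowed locations" check described above, assuming the rule is simply "after normalization, the requested path must lie under one of the configured roots". The class name, the example root /var/solr/userfiles, and passing the roots in code are all hypothetical; this is not Solr's implementation and it does not read solr.xml.

```java
import java.nio.file.Path;
import java.util.List;

public class AllowedPathsCheck {

    private final List<Path> allowedRoots;

    AllowedPathsCheck(List<Path> roots) {
        // Compare absolute, normalized paths so "a/../b" style tricks are resolved first.
        this.allowedRoots = roots.stream()
                .map(p -> p.toAbsolutePath().normalize())
                .toList();
    }

    /** True if the requested path lies inside one of the allowed roots. */
    boolean isAllowed(Path requested) {
        Path p = requested.toAbsolutePath().normalize();
        // Path.startsWith compares whole name elements, so a sibling directory
        // such as "userfiles2" does not count as being inside "userfiles".
        return allowedRoots.stream().anyMatch(p::startsWith);
    }

    public static void main(String[] args) {
        AllowedPathsCheck check = new AllowedPathsCheck(List.of(Path.of("/var/solr/userfiles")));
        System.out.println(check.isAllowed(Path.of("/var/solr/userfiles/data.csv")));    // true
        System.out.println(check.isAllowed(Path.of("/var/solr/userfiles/../solr.xml"))); // false
        System.out.println(check.isAllowed(Path.of("/var/solr/userfiles2/data.csv")));   // false
    }
}
```

Path.normalize() only resolves "." and ".." textually; a production check would probably also need toRealPath() so that a symlink inside an allowed root cannot point outside it. Whatever directory replaces userfiles would have to be added to such a list explicitly.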

On Fri, 25 Sep 2020 at 11:43, Jason Gerlowski  wrote:
>
> > You're left with the /filestore/ dir to put there what you want 
> > in standalone mode.
> That's fine by me as that's effectively what "userfiles" provides now.
> The main difference I imagine is that the 'filestore' directory won't
> exist at all in standalone? In that case standalone users will have to
> know where to create the dir if they want to reference 'filestore'
> files in their streaming expressions.  But that's not all that much
> more to ask I guess.
>
> I'm curious - is the filestore unimplemented in standalone because no
> one prioritized it, or is there a specific technical blocker?
>
> On Fri, Sep 25, 2020 at 11:24 AM David Smiley  wrote:
> >
> > The "userfiles" thing is merely a directory.  There's no CRUD API on it.  
> > Even if the CRUD API of the file store isn't active in standalone -- I 
> > think that doesn't matter.  You're left with the /filestore/ dir 
> > to put there what you want in standalone mode.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Fri, Sep 25, 2020 at 11:00 AM Ishan Chattopadhyaya 
> >  wrote:
> >>
> >> Standalone isn't supported.
> >>
> >> We need to transition away from the standalone mode and get rid of it 
> >> completely.
> >>
> >> On Fri, 25 Sep, 2020, 7:50 pm Jason Gerlowski,  
> >> wrote:
> >>>
> >>> I don't know much about the new package/file-store, but it does sound
> >>> like a good replacement for 'userfiles' (which, as Eric says, is
> >>> pretty awkward to work with).
> >>>
> >>> My only concerns with using the filestore would be around its
> >>> availability.  Is it enabled by default (or will it be in an upcoming
> >>> release)?  Does it work in standalone as well as SolrCloud?  If the
> >>> answer to both of those questions is "yes", then it makes sense to me as
> >>> a replacement on the face of things.
> >>>
> >>> Jason
> >>>
> >>> On Fri, Sep 25, 2020 at 9:39 AM Eric Pugh
> >>>  wrote:
> >>> >
> >>> > +1.  I’ve found the user files really useful when doing things with 
> >>> > streaming, but it’s also awkward to reach when putting files into it.
> >>> >
> >>> >
> >>> > On Sep 25, 2020, at 8:47 AM, David Smiley  wrote:
> >>> >
> >>> > Yes.  And I think it's high time that coreRootDirectory default to 
> >>> > /something (e.g. "cores")
> >>> >
> >>> > ~ David Smiley
> >>> > Apache Lucene/Solr Search Developer
> >>> > http://www.linkedin.com/in/davidwsmiley
> >>> >
> >>> >
> >>> > On Fri, Sep 25, 2020 at 8:42 AM Alexandre Rafalovitch 
> >>> >  wrote:
> >>> >>
> >>> >> Would that file store also be under solr.home? Because if it is, and
> >>> >> the user can upload core.properties into it (as well as other things),
> >>> >> core discovery will then load it, bypassing security.
> >>> >>
> >>> >> Regards,
> >>> >>   Alex.
> >>> >>
> >>> >> On Fri, 25 Sep 2020 at 08:32, David Smiley  wrote:
> >>> >> >
> >>> >> > I'm looking to see that we can deprecate "userfiles" and remove for 
> >>> >> > 9.0
> >>> >> >
> >>> >> > Solr has a "userfiles" directory under Solr home that Jason added in 
> >>> >> > some issue relating to streaming expressions accessing a local file. 
> >>> >> >  I bet only a few of you have even heard of it.  I think the "File 
> >>> >> > Store" that came along with the package manager obsoletes 
> >>> >> > "userfiles".  If you have not heard of the file store either, I 
> >>> >> > wouldn't be surprised -- it's new and its name was changed from 
> >>> >> > "package store" last minute, since it's general purpose, with 

Re: Filestore, subsume userfiles requirement

2020-09-25 Thread Alexandre Rafalovitch

Would that file store also be under solr.home? Because if it is, and
the user can upload core.properties into it (as well as other things),
core discovery will then load it, bypassing security.

Regards,
  Alex.
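To illustrate the concern, here is a simplified Java model of properties-file-based core discovery. It assumes that discovery walks the core root directory recursively and treats every directory containing a core.properties file as a core; Solr's actual locator may apply additional rules. All names, and the "filestore/uploaded" layout, are hypothetical. The point is only that a file uploaded anywhere under the walked root would be picked up just like a real core.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CoreDiscoverySketch {

    /** Returns every directory under coreRoot that contains a core.properties file. */
    static List<Path> discoverCoreDirs(Path coreRoot) throws IOException {
        try (Stream<Path> paths = Files.walk(coreRoot)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().equals("core.properties"))
                    .map(Path::getParent)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical solr home: one real core, plus a core.properties file
        // that was uploaded into a "filestore" subdirectory.
        Path solrHome = Files.createTempDirectory("solr-home");
        Path realCore = Files.createDirectories(solrHome.resolve("cores/techproducts"));
        Files.writeString(realCore.resolve("core.properties"), "name=techproducts\n");
        Path uploaded = Files.createDirectories(solrHome.resolve("filestore/uploaded"));
        Files.writeString(uploaded.resolve("core.properties"), "name=uploaded\n");

        // Both directories are reported, because discovery only looks at the file name.
        for (Path dir : discoverCoreDirs(solrHome)) {
            System.out.println(solrHome.relativize(dir));
        }
        // The temporary directory is left behind; this is only a demonstration.
    }
}
```

Keeping the file store outside the directory tree that discovery walks (or excluding it explicitly) would avoid this class of problem.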

On Fri, 25 Sep 2020 at 08:32, David Smiley  wrote:
>
> I'm looking to see that we can deprecate "userfiles" and remove for 9.0
>
> Solr has a "userfiles" directory under Solr home that Jason added in some 
> issue relating to streaming expressions accessing a local file.  I bet only a 
> few of you have even heard of it.  I think the "File Store" that came along 
> with the package manager obsoletes "userfiles".  If you have not heard of the 
> file store either, I wouldn't be surprised -- it's new and its name was 
> changed from "package store" last minute, since it's general purpose, with 
> "packages" being a directory at the root of the file store for packages.  
> It's documented as the "package store" (should be renamed) on the package 
> manager internals doc: 
> https://lucene.apache.org/solr/guide/8_6/package-manager-internals.html#package-store
> However IMO it's worthy of its own doc page as it's a very useful new 
> component of the Solr platform.  It can store "user files" (hence obsoleting 
> the userfiles dir), ML models, or basically any file that's too large to put 
> in ZK.  It'd be nice if SolrResourceLoader could resolve resources from it -- 
> an issue for another day.  That would be another avenue of use separate from 
> the configSet.  You can already upload single files to the file store :-)
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley

Re: how to run/debug solr from eclipse

2020-09-24 Thread Alexandre Rafalovitch

You would still start your Solr from the command line with bin/solr as
usual. To build that in master (Solr 9), you would run "./gradlew -p
solr/packaging assemble" and find the results in
solr/packaging/build/solr-9.0.0-SNAPSHOT/

To debug, you would connect to it as a remote debug session, and you
would pass the JVM's -agentlib debug parameters to the bin/solr command
with the -a flag; bin/solr -h shows an example. I am not sure if you
can debug multiple Solr instances running at once with the same
Eclipse instance. Maybe you have to have multiple copies of it open
with different debug ports configured.

I do this with IntelliJ Idea, but it should be similar with Eclipse.

Regards,
   Alex.
P.s. I haven't tried ./gradlew eclipse, but if you were switching
branches, there may be some junk left over. You may want to do a very
clean checkout.
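A small sketch of the setup described above: one "bin/solr start" command per instance, each with its own HTTP port (-p) and its own JDWP debug port passed through -a, so that the IDE can have one remote-debug configuration per port. The port numbering is an assumption, and other options you would normally need for several nodes (cloud mode, a separate Solr home per node) are left out. The address=*:PORT form assumes JDK 9 or later; on JDK 8 it would be just address=PORT. The code only builds and prints the commands.

```java
import java.util.List;

public class SolrDebugCommands {

    /** JDWP agent option; suspend=n means Solr starts without waiting for the IDE to attach. */
    static String jdwpOption(int debugPort) {
        return "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:" + debugPort;
    }

    /** One "bin/solr start" invocation with its own HTTP port and JVM debug port. */
    static List<String> startCommand(int httpPort, int debugPort) {
        return List.of("bin/solr", "start",
                "-p", String.valueOf(httpPort),
                "-a", jdwpOption(debugPort));
    }

    public static void main(String[] args) {
        // Hypothetical scheme: instance i listens on 8983+i and accepts a debugger on 5005+i.
        for (int i = 0; i < 2; i++) {
            System.out.println(String.join(" ", startCommand(8983 + i, 5005 + i)));
        }
    }
}
```

When pasting a printed command into a shell, quote the -a value so the "*" is not subject to globbing. To launch from Java instead, the same list could be handed to new ProcessBuilder(cmd).directory(solrDir).inheritIO().start(), where solrDir (hypothetical) is the solr/packaging/build/solr-9.0.0-SNAPSHOT directory. The IDE side is then a remote debug configuration (Eclipse "Remote Java Application", IntelliJ "Remote JVM Debug") pointing at localhost and the matching debug port.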

On Thu, 24 Sep 2020 at 14:11, uyilmaz  wrote:
>
>
> Hi all,
>
> I want to run/debug Solr inside Eclipse to debug some troubles I'm having 
> with streaming expressions. All the guides on the net explain how to do it 
> with Ant, but from what I see Solr migrated to Gradle. I tried two methods, 
> importing lucene-solr project as an existing gradle project, and running 
> ./gradlew eclipse to generate eclipse project files and importing it as a 
> regular eclipse project. There were many problems and red marks but assuming 
> I can resolve them, how do I run Solr? Is there a simple main method that I 
> can run, and how can I supply required minimal settings, especially for Cloud 
> mode?
>
> Regards
> --
> uyilmaz 
>

Re: bin/solr testing surprise with techproducts example

2020-09-24 Thread Alexandre Rafalovitch

I run ./gradlew -p solr/packaging assemble. I think that is shown when
you run ./gradlew helpWorkflow (one of many help commands added for our
projects). The result will be in solr/packaging/build/solr-9.0.0-SNAPSHOT

I need to experiment more with the ./gradlew dev command; if it does not
do a full wipe-out, that could be useful.

I also use "git worktree", as I may have 4 Jiras in progress at the
same time, plus I want to compare against a baseline master build when I
screw things up. Apparently, IntelliJ Idea's terminal will even remember
open locations in the project file, so the worktrees help to reopen
everything in issue-appropriate directories (still learning to take
advantage of that).

Regards,
   Alex.

On Thu, 24 Sep 2020 at 10:51, Erick Erickson  wrote:
>
> Christine:
>
> Quite possibly you had some remnants of an ant build hanging around from 
> bin/solr. If I start with a fresh clone and try to start from bin/solr I 
> usually get no class def errors.
>
> git clean -dxf is my friend to be absolutely sure that I have nothing lying 
> around when switching back and forth between 8x and master, although others 
> have suggested that git “worktree” is a much better alternative that I 
> haven’t explored yet. I’m sure it is, because for one thing “git clean -dxf” 
> removes any IDE files too...
>
> The correct place to run solr from should be under 
> “…/master/solr/packaging/build”, the “dev” and “assemble” targets will go 
> into different directories. “assemble” will wipe out anything that used to 
> be, the “dev” won’t, which will preserve directories, indexes and the like. 
> Definitely preferable for code change iterations.
>
> Finally, there was a helpful message telling you where the artifacts were 
> that got lost, it’ll get put back sometime. See SOLR-14888
>
> Best,
> Erick
>
> > On Sep 24, 2020, at 9:14 AM, Jason Gerlowski  wrote:
> >
> > I couldn't reproduce your error on running techproducts.  Though
> > whatever is causing it locally for you sounds a bit related to
> > SOLR-13690 maybe?
> >
> > Jason
> >
> > On Wed, Sep 23, 2020 at 11:28 AM Munendra S N  
> > wrote:
> >>
> >> The wiki has steps to build solr with gradle
> >> https://cwiki.apache.org/confluence/display/SOLR/Building+Solr+with+Gradle
> >>
> >> ./gradlew assemble or ./gradlew dev will create runnable solr instance.
> >>
> >>
> >> On Wed, Sep 23, 2020, 8:01 PM Christine Poerschke (BLOOMBERG/ LONDON) 
> >>  wrote:
> >>>
> >>> Hello everyone.
> >>>
> >>> So I was trying to locally test the small 
> >>> https://issues.apache.org/jira/browse/SOLR-11167 change on master branch 
> >>> and encountered two things:
> >>>
> >>> Question: What is the replacement for "cd solr ; ant dist server" usage?
> >>>
> >>> If there is an equivalent -- "./gradlew -p solr assembleDist" perhaps? -- 
> >>> then I'd be happy to update 
> >>> https://github.com/apache/lucene-solr/blob/master/help/ant.txt with the 
> >>> info.
> >>>
> >>> Observation: "cd solr ; bin/solr start -e techproducts" on master branch 
> >>> (but not branch_8x) gives me an error. Is this a known issue already or 
> >>> if not could someone try to reproduce the issue before a JIRA ticket is 
> >>> opened?
> >>>
> >>> ERROR: Error CREATEing SolrCore 'techproducts': Unable to create core 
> >>> [techproducts] Caused by: [schema.xml] analyzer/tokenizer: missing 
> >>> mandatory attribute 'class'
> >>>
> >>> Thanks,
> >>>
> >>> Christine
> >
>
>

Re: Apache Bug Bash

2020-09-24 Thread Alexandre Rafalovitch

That's interesting. I tracked down LUCENE-9497 from your comment and
gradle/validation/error-prone.gradle.

But it seems all the flags are disabled for now. So, is it actually
running on pre-commit (check)?

Because, for example, we also introduced a custom Doclet to check ?
cross-references recently. But error-prone supports JavaDoc analysis
as well. So, maybe these things should be consolidated.

I am guessing this is a separate discussion from MuseDev's approach as
they will primarily run the checks on PRs and show changed warnings
only (and with more tools). But it is somewhat connected at the same
time in terms of using a static analysis approach.

Regards,
   Alex.
P.s. It also means that maybe writing my own error-prone plugin and
integrating it into the project is less difficult than I estimated.

On Wed, 23 Sep 2020 at 21:04, Mike Drob  wrote:
>
> Note that error prone is part of our standard compilation already.
>
> On Wed, Sep 23, 2020 at 6:14 PM Alexandre Rafalovitch  
> wrote:
>>
>> On Wed, 23 Sep 2020 at 18:56, Tom DuBuisson  wrote:
>> > On Wed, Sep 23, 2020 at 9:11 AM Alexandre Rafalovitch  
>> > wrote:
>> >> I would be super-curious to see how well it would be able to support
>> >> Solr's gradle build with all the dark magic we seem to have in it.
>> >
>> >
>> > Perhaps I should keep it a secret and ratchet up the suspense, but I'm not 
>> > much of a showman.
>> >
>> > The Infer and FSB tools ran on Solr seemingly fine 
>> > (https://console.muse.dev/result/TomMD/lucene-solr/01EG97PRSVXT35Z1E9T3SKA9V2?search=solr=results)
>> >  but with the noise level you expect on a large project with subtle 
>> > invariants.  The error prone results are lacking so I'll investigate.
>> >
>> The results are long enough that they could benefit from faceted
>> search by source, file, error type, etc. I wish there was an
>> open-source product you could leverage for such custom structured
>> search :-) (Yes, I realize your primary interface is PR with much less
>> noise)
>>
>> >>
>> >> P.p.s. Medium term, I would love to write a custom check that
>> >> complains about missing @since Javadoc tags for anything that is
>> >> pluggable/module like, including Analyzers, UpdateRequestProcessors,
>> >> Stream Components, etc. Knowing when each individual module is
>> >> introduced is super useful for those on older versions and my previous
>> >> attempts at fixing this required standalone code that even I cannot
>> >> get to run again easily now.
>> >
>> >
>> > I know we tweeted about this, but to bring that conversation to the ML: 
>> > The fastest way to write such a check is probably with an Error Prone 
>> > plugin.  There isn't any support yet for ErrorProne plugins inside of 
>> > Muse, but this has been on our minds for a while.  If someone beats me to 
>> > making an error prone pass then I'll gladly make a way to run it.
>>
>> I have given error-prone a quick go. It works nicely when I installed
>> the IntelliJ plugin linked from their website.
>>
>> But for a custom plugin, let's just say I got lost between
>> annotation processing during plugin compilation, annotation processing
>> including plugin, module/project dependencies and IntelliJ Idea's
>> options to make it work. So, I could not get a trivial example end to
>> end in Idea's default project setup.
>>
>> But they have an example that I could import into IntelliJ Idea
>> and get it to work with the Gradle setup:
>> https://github.com/google/error-prone/tree/master/examples/plugin/gradle
>> (clone whole repo, point IntelliJ at just that directory as a project,
>> let it recognize Gradle, etc).
>>
>> So, theoretically, if you take that custom plugin and manage to figure
>> out how to apply it to a custom project, I can keep beating my head on
>> my own specialized needs in parallel.
>>
>> Anyway, the rest of this thread is probably not lucene-dev worthy. At
>> least not until there is something to show.
>>
>> Regards,
>>    Alex.

Re: Apache Bug Bash

2020-09-23 Thread Alexandre Rafalovitch

On Wed, 23 Sep 2020 at 18:56, Tom DuBuisson  wrote:
> On Wed, Sep 23, 2020 at 9:11 AM Alexandre Rafalovitch  
> wrote:
>> I would be super-curious to see how well it would be able to support
>> Solr's gradle build with all the dark magic we seem to have in it.
>
>
> Perhaps I should keep it a secret and ratchet up the suspense, but I'm not 
> much of a showman.
>
> The Infer and FSB tools ran on Solr seemingly fine 
> (https://console.muse.dev/result/TomMD/lucene-solr/01EG97PRSVXT35Z1E9T3SKA9V2?search=solr=results)
>  but with the noise level you expect on a large project with subtle 
> invariants.  The error prone results are lacking so I'll investigate.
>
The results are long enough that they could benefit from faceted
search by source, file, error type, etc. I wish there was an
open-source product you could leverage for such custom structured
search :-) (Yes, I realize your primary interface is PR with much less
noise)

>>
>> P.p.s. Medium term, I would love to write a custom check that
>> complains about missing @since Javadoc tags for anything that is
>> pluggable/module like, including Analyzers, UpdateRequestProcessors,
>> Stream Components, etc. Knowing when each individual module is
>> introduced is super useful for those on older versions and my previous
>> attempts at fixing this required standalone code that even I cannot
>> get to run again easily now.
>
>
> I know we tweeted about this, but to bring that conversation to the ML: The 
> fastest way to write such a check is probably with an Error Prone plugin.  
> There isn't any support yet for ErrorProne plugins inside of Muse, but this 
> has been on our minds for a while.  If someone beats me to making an error 
> prone pass then I'll gladly make a way to run it.

I have given error-prone a quick go. It works nicely when I installed
the IntelliJ plugin linked from their website.

But for a custom plugin, let's just say I got lost between
annotation processing during plugin compilation, annotation processing
including plugin, module/project dependencies and IntelliJ Idea's
options to make it work. So, I could not get a trivial example end to
end in Idea's default project setup.

But they have an example that I could import into IntelliJ Idea
and get it to work with the Gradle setup:
https://github.com/google/error-prone/tree/master/examples/plugin/gradle
(clone whole repo, point IntelliJ at just that directory as a project,
let it recognize Gradle, etc).

So, theoretically, if you take that custom plugin and manage to figure
out how to apply it to a custom project, I can keep beating my head on
my own specialized needs in parallel.

Anyway, the rest of this thread is probably not lucene-dev worthy. At
least not until there is something to show.

Regards,
   Alex.

Re: Apache Bug Bash

2020-09-23 Thread Alexandre Rafalovitch

I just tested this on my personal tiny Java project and it is
(committer) +1 from me. Did not test it as part of a PR process,
though would be super excited if I could retroactively graft it on my
current one somehow: https://github.com/apache/lucene-solr/pull/1863

I've also signed up for the ApacheCon code bash and confirmed my
availability as a mentor. How much time I will actually have is a question,
but I will try. Hopefully, I will end up on a Solr team; there was no
way to indicate that (Tom!).

I would be super-curious to see how well it would be able to support
Solr's gradle build with all the dark magic we seem to have in it.

Regards,
   Alex.
P.s. I suspect the full analysis of the code base would still be
rather noisy, but since that's not a default flow of
MuseDev, that should be fine.
P.p.s. Medium term, I would love to write a custom check that
complains about missing @since Javadoc tags for anything that is
pluggable/module like, including Analyzers, UpdateRequestProcessors,
Stream Components, etc. Knowing when each individual module is
introduced is super useful for those on older versions and my previous
attempts at fixing this required standalone code that even I cannot
get to run again easily now.
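For reference, a rough standalone approximation of such an @since check, written as plain Java with regular expressions rather than as an Error Prone BugChecker (an Error Prone plugin works on the compiler's syntax tree and would be far more robust). It reports "public class" declarations whose immediately preceding Javadoc is missing or lacks an @since tag. The class name is hypothetical, and the simplifications are assumptions of this sketch: no interfaces, enums, or nested static classes, and no awareness of comments or code inside string literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SinceTagCheck {

    // Optional Javadoc (group 1, never spanning past its own closing "*/"),
    // then any annotations, then a public class declaration (name in group 2).
    private static final Pattern DECL = Pattern.compile(
            "(/\\*\\*(?:(?!\\*/).)*\\*/)?\\s*"
                    + "(?:@\\w+(?:\\([^)]*\\))?\\s*)*"
                    + "public\\s+(?:(?:abstract|final)\\s+)*class\\s+(\\w+)",
            Pattern.DOTALL);

    /** Names of public classes whose Javadoc is absent or has no @since tag. */
    static List<String> classesMissingSince(String source) {
        List<String> missing = new ArrayList<>();
        Matcher m = DECL.matcher(source);
        while (m.find()) {
            String javadoc = m.group(1);
            if (javadoc == null || !javadoc.contains("@since")) {
                missing.add(m.group(2));
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String source = """
            /**
             * Tokenizes things.
             * @since 8.7
             */
            public final class A {}

            public class B {}

            /** No version info here. */
            public class C {}

            /**
             * @since 9.0
             */
            @Deprecated
            public abstract class D {}
            """;
        System.out.println(classesMissingSince(source)); // [B, C]
    }
}
```

A real check would also need to decide which classes count as "pluggable" (for example, by superclass such as an Analyzer or UpdateRequestProcessorFactory), which is where type information from the compiler, and therefore an Error Prone plugin, becomes necessary.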

On Wed, 23 Sep 2020 at 11:56, Tom DuBuisson  wrote:
>
> Lucene Developers,
>
> As part of our sponsorship of ApacheCon, our company MuseDev is doing a Bug 
> Bash for select Apache projects. We'll bring members of the ApacheCon 
> community together to find and fix a range of security and performance bugs 
> during the conference, and gamify the experience with teams, a leaderboard, 
> and prizes. The bash is open to everyone whether attending the conference or 
> not, and our whole dev team will also be participating to help fix as many 
> bugs as we can.
>
> We're seeding the bug list with results from Muse, our code analysis 
> platform, which runs as a Github App and comments on possible bugs as part of 
> the pull request workflow.  Here's an example of what it looks like:
>
> https://github.com/curl/curl/pull/5971#discussion_r490252196
>
> We explored a number of Apache projects and are reaching out because our 
> analysis through Muse found some interesting bugs that could be fixed during 
> the Bash.  If this sounds familiar it's because I've been talking a bit on 
> this mailing list about Muse already. There has already been a bug fix based 
> on the tool findings, a prior conversation "Code Analysis during CI?", and a 
> PR adding configuration information for the GitHub App.
>
> We're writing to see if you'd be interested in having your project included 
> in the Bash. Everything is set up on our end, and while we're already working 
> with the infrastructure team to get lucene-solr added (with the PR and other 
> conversation as evidence of support) it would help if you say yes on this 
> listserv as a clear signal to the Apache Infrastructure team to grant Muse 
> access to your Github mirror.
>
> We'll then make sure it's all set-up and ready for the Bash. And of course, 
> everyone on the project is most welcome to join the Bash and help us smash 
> some bugs.
>
> -Tom

Re: restlet dependencies

2020-09-23 Thread Alexandre Rafalovitch

How much harder would the use cases currently covered by managed
resources become if that module were removed?

For standalone instances, it is nearly as easy to edit the file and
reload the schema. And it will probably be more version-control
friendly than the files currently saved by the module.
What about for SolrCloud?

My feeling is that this module did not catch on; I don't think anybody
ever implemented additional managed resources, though I remember
seeing Jiras about it. So, unless there are super-special use cases, I am +1 on
deprecating it ASAP for 8.7 and removing it in 9. It will fit with the
overall theme of getting slimmer and more consistent.

Regards,
   Alex.
P.s. Also, I think the question on SolrUsers about this had limited
response and mentioned a security issue.


On Wed, 23 Sep 2020 at 10:28, Timothy Potter  wrote:
>
> I agree we should deprecate the managed resources feature, it was the first 
> thing I was asked to build by LW nearly 7 years ago, before I was a 
> committer. Restlet was already in place and I built on top of that, not sure 
> who introduced it originally (nor do I care). Clearly from the vantage point 
> of looking back, JAX-RS and Jersey won the day with REST in Java but that 
> simply wasn't the case back then. What's important is how we move forward vs. 
> bestowing judgement backed by wisdom of hindsight on decisions made many 
> years ago.
>
> In the short term, does Apache have an Artifactory (or similar) where we can 
> host the Restlet dependencies for Github to pull them from? If not, then we 
> can port the code that's using Restlet over to using JAX-RS / Jersey. 
> Personally I'd prefer we remove Managed Resources support from 9 instead of 
> porting the Restlet code but I don't know if 9 is too soon from a deprecation 
> stand point?
>
> Tim
>
>
> On Mon, Sep 21, 2020 at 11:33 PM Noble Paul  wrote:
>>
>> We should deprecate that feature and remove restlet dependency altogether
>>
>> On Mon, Sep 21, 2020 at 10:20 PM Joel Bernstein  wrote:
>> >
>> > Restlet again!!!
>> >
>> >
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> >
>> > On Mon, Sep 21, 2020 at 7:18 AM Eric Pugh 
>> >  wrote:
>> >>
>> >> Do we have a community blessed alternative to restlet already?
>> >>
>> >> On Sep 20, 2020, at 9:40 AM, Noble Paul  wrote:
>> >>
>> >> Haha.
>> >>
>> >> In fact schema APIs don't use restlet. Only the managed resources use it
>> >>
>> >> On Sat, Sep 19, 2020, 3:35 PM Ishan Chattopadhyaya 
>> >>  wrote:
>> >>>
> >> >>> If I were Talend, I'd immediately start publishing to Maven Central. If 
>> >>> I were the developer who built the schema APIs, I would never have used 
>> >>> restlet to begin with.
>> >>>
>> >>> On Sat, 19 Sep, 2020, 1:13 am Uwe Schindler,  wrote:
>> 
>>  I was thinking the same. Because GitHub does not cache the downloaded 
>>  artifacts like our jenkins servers.
>> 
>>  It seems to run it in a new VM or container every time, so it downloads 
>>  all artifacts. If I were Talend, I'd also block this.
>> 
>>  Uwe
>> 
>>  Am September 18, 2020 7:32:47 PM UTC schrieb Dawid Weiss 
>>  :
>> >
>> > I don't think it's http/https - I believe the restlet repository simply
>> > bans GitHub servers because of excessive traffic? These URLs work for
>> > me locally...
>> >
>> > Dawid
>> >
>> > On Fri, Sep 18, 2020 at 6:35 PM Christine Poerschke (BLOOMBERG/
>> > LONDON)  wrote:
>> >>
>> >>
>> >>  This sounds vaguely familiar. "http works, https does not work" and 
>> >> https://issues.apache.org/jira/browse/SOLR-13756 possibly related?
>> >>
>> >>  From: dev@lucene.apache.org At: 09/18/20 10:01:29
>> >>  To: dev@lucene.apache.org
>> >>  Subject: Re: restlet dependencies
>> >>
>> >>  I don't think it is, sadly.
>> >>  https://repo1.maven.org/maven2/org/restlet
>> >>
>> >>  The link you provided (mvnrepository) aggregates from several maven
>> >>  repositories.
>> >>
>> >>
>> >>  D.
>> >>
>> >>  On Fri, Sep 18, 2020 at 10:46 AM Ishan Chattopadhyaya
>> >>   wrote:
>> >>>
>> >>>
>> >>>  Sorry, afk, but I heard (*hearsay*) that restlet is also on maven
>> >>> central these days. Can we confirm and switch to that? Sorry if that's
>> >>> not the case.
>> >>>
>> >>>
>> >>>  On Fri, 18 Sep, 2020, 1:15 pm Dawid Weiss,  
>> >>> wrote:
>> 
>> 
>>   Just FYI: can't get PR builds on github to work recently because of this:
>>
>>   > Could not resolve all files for configuration ':solr:core:compileClasspath'.
>>   > Could not download org.restlet.ext.servlet-2.4.3.jar (org.restlet.jee:org.restlet.ext.servlet:2.4.3)
>>   > Could not get resource
>> 
>> >>

Re: Code Analysis during CI?

2020-09-23 Thread Alexandre Rafalovitch

ApacheCon is apparently running Muse-based CodeBash. Are we part of that?

Regards,
   Alex.

On Wed, 9 Sep 2020 at 05:22, Bruno Roustant  wrote:
>
> +1 for analysis within the PR workflow.
>
> On Fri, 4 Sep 2020 at 06:38, David Smiley  wrote:
>>
>> Sounds great to me!  I'm really glad to hear it works with the PR workflow, 
>> and only on the files touched in the PR.
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Thu, Sep 3, 2020 at 8:03 PM Tom DuBuisson  wrote:
>>>
>>> Tomás,
>>> Oof, thanks for the note on TOS.  I fixed the link.  The tool can be 
>>> configured and I'm happy to make things work better for your use case.  
>>> Muse is free for public repos and will remain free for open source 
>>> indefinitely.  You can try it and remove it any time - GitHub is in charge
>>> of access control and provides you as the repository owner with control via 
>>> the website.
>>>
>>> On Thu, Sep 3, 2020 at 4:37 PM Tomás Fernández Löbbe 
>>>  wrote:

 Thanks Tom. I think this could be very useful as long as it can be
 configured. (The "terms of use" link here[1] points to "google.com", so I
 couldn't check that, but they claim it's free for public repos, so...). We
 could always try it and remove it if we don't like it? What do others
 think?


 [1] https://github.com/apps/muse-dev

 On Thu, Sep 3, 2020 at 3:06 PM Tom DuBuisson  wrote:
>
> Hello Lucene/Solr folks,
>
> During Lucene development, CI is used for build and unit tests to gate
> merges.  The CI doesn't yet include any analysis tools, though their
> use has been discussed [1].  I fixed some issues flagged by Facebook's
> Infer and was prompted to bring up the topic here [2].
>
> The recent PR fixed some low-hanging fruit that was reported when I ran
> Muse [3], a GitHub app that is a platform for static analysis tools.
> Muse's platform bundles the most useful analysis tools (all open source,
> many of them developed by FANG), triggers analysis on PRs, and then
> delivers results as comments.
>
> Because of the PR-centric workflow you only see issues related to the 
> changes in the pull request.  This means that even a project where tools 
> give a daunting list of issues can still have quiet day-to-day operation. 
> Muse also has options to configure individual tools and turn tools or 
> warnings off entirely.  If there are concerns in addition to noise and 
> added mental tax on development then I'd really like to hear those 
> thoughts.
>
> Would you be up for running Muse on the lucene-solr repo?  Let me know, 
> and I hope to hear your thoughts on analysis tools either way.
>
> -Tom
>
> [1] https://issues.apache.org/jira/projects/LUCENE/issues/LUCENE-8847
> [2] https://issues.apache.org/jira/projects/SOLR/issues/SOLR-14819
> [3] Muse result on Lucene: 
> https://console.muse.dev/result/TomMD/lucene-solr/01EH5WXS6C1RH1NFYHP6ATXTZ9?tab=results
> Muse app link: https://github.com/apps/muse-dev
> [4] https://github.com/TomMD/lucene-solr/pulls
> [5] Example of muse commenting on an issue 
> https://github.com/TomMD/shiro/pull/2
>
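To make the "only the files touched in the PR" behaviour concrete, here is a minimal Java sketch of file-level scoping. It is an illustration under assumptions only: the thread does not describe how Muse actually filters results (it may well work at line level), and `PrScopedFindings`, `Finding` and `scopeToPr` are hypothetical names.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class PrScopedFindings {

    // Hypothetical shape of one analyzer finding: file path, line, message.
    static final class Finding {
        final String file;
        final int line;
        final String message;

        Finding(String file, int line, String message) {
            this.file = file;
            this.line = line;
            this.message = message;
        }

        @Override
        public String toString() {
            return file + ":" + line + ": " + message;
        }
    }

    // Keep only findings in files the pull request touched; pre-existing
    // findings elsewhere in the repository are not reported.
    static List<Finding> scopeToPr(List<Finding> all, Set<String> changedFiles) {
        return all.stream()
                .filter(f -> changedFiles.contains(f.file))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Finding> all = List.of(
                new Finding("Touched.java", 10, "possible null dereference"),
                new Finding("Untouched.java", 42, "resource leak"));
        // Only Touched.java is part of this hypothetical PR.
        System.out.println(scopeToPr(all, Set.of("Touched.java")));
    }
}
```

This is why a repository with a long backlog of findings can still see quiet day-to-day operation: the backlog lives in files most PRs never touch.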

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

gradle on master is not skipping README.committers.txt

2020-09-05 Thread Alexandre Rafalovitch

It seems that the solr/licenses/README.committers.txt file is not ignored
in the gradle build and slips into the distribution package.

I did not check if this affects the 8_x branch.

Do I need to create a JIRA for this or can somebody just fix it?

Regards,
   Alex.
P.S. I found it because the file points to
http://wiki.apache.org/solr/CommitterInfo, which redirects to cwiki,
which also looks very out of date and talks about subversion and ant.
It does have a section on regenerating analysis factories, though, and I
don't know whether gradle does that. If it does, maybe that information
should go into the developer guide and we should link there directly...


Re: Name for the directory above solr.home?

2020-09-04 Thread Alexandre Rafalovitch

coreRootDirectory!

If it takes a relative path, I am good with that. Great even.

We can update the default solr.xml and fix documentation/whatever. The
rest of the files can go in solr.home.
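A minimal Java sketch of how a relative coreRootDirectory could be resolved against solr.home. It assumes Solr treats a relative value as relative to solr.home (the solr.xml documentation linked in this thread is the authority on the real behaviour); `CoreRootResolver` and `resolveCoreRoot` are hypothetical names.

```java
import java.nio.file.Path;

public class CoreRootResolver {

    // Resolve a configured coreRootDirectory against solr.home.
    // Path.resolve returns `configured` unchanged when it is absolute and
    // appends it to solrHome when it is relative.
    static Path resolveCoreRoot(Path solrHome, String configured) {
        if (configured == null || configured.isEmpty()) {
            return solrHome; // no setting: cores live directly in solr.home
        }
        return solrHome.resolve(configured).normalize();
    }
}
```

Because `Path.resolve` leaves an absolute value untouched, an absolute path such as `/mnt/cores` would still work as an override.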

I updated SIP-10 with this as the primary path to explore for the layout.

Regards,
   Alex.
P.S. server/solr should be server/solrhome (or not in server, but
"solrhome"). I wasted 20 minutes trying to figure out the
"example/example-DIH/solr/solr/conf/" path.

On Fri, 4 Sep 2020 at 11:45, David Smiley  wrote:
>
> On Fri, Sep 4, 2020 at 11:24 AM Alexandre Rafalovitch  
> wrote:
>>
>> > I really wish the default cores dir was not solr.home itself but one 
>> > directory beneath.
>> Why? That is worth exploring.
>
>
> Because core discovery looks for cores in directories that are not cores 
> (e.g. filestore)
>
>>
>>
>> I am ok - if others are - with solr.home containing
>> *) solr.in.sh
>> *) configset
>> *) logs
>> *) log and related configuration
>> *) userfiles
>> *) filestore
>> *) solr.xml
>> *) actual cores
>>
>> However, this feels to me like it makes the cores crowded and -
>> potentially - interferes/slows-down the recursive core discovery
>> mechanism. A lot of directories to stat potentially.
>
>
> Hence add a "cores" dir :-)  This is already supported, see 
> "coreRootDirectory" in solr.xml documented here: 
> https://lucene.apache.org/solr/guide/8_6/format-of-solr-xml.html#format-of-solr-xml
>
>>
>> But moving cores from under solr.home is a LOT of updates I suspect in
>> Solr, documentation and 3rd party solutions.
>
>
> I think it's worth it; we should have done this a long time ago.
>
>>
>> Personally, I would formally
>> establish a 'node' directory above solrhome and move everything but
>> solr.xml and cores up there. Including the recently added 'userfiles'
>> and 'filestore' if at all possible.
>
>
> Essentially, you are proposing that solr.home *be* the core directory _only_,
> and that some other dir holding all the other stuff currently in solr.home
> get some new special name.  I don't think we need new special dirs, though.  And I
> think it's useful that "solr.home" retain all the stuff already in it.
>
> Any thoughts on this thread Jan?  I mention you because you tend to comment 
> on such matters.
>
>>
>>
>> Regards,
>> Alex.
>>
>>
>> On Fri, 4 Sep 2020 at 11:04, David Smiley  wrote:
>> >
>> > On Fri, Sep 4, 2020 at 10:42 AM Alexandre Rafalovitch  
>> > wrote:
>> >>
>> >> I feel that maybe we don't treat it special mentally, but then it
>> >> actually is. At least with minimum custom locations configured.
>> >>
>> >> Consider:
>> >> 1) What is in "server", above "server/solr"
>> >> * resources/log4j2.xml (because of magic jetty classpath mechanism)
>> >> * logs
>> >> * NOT server/solr/configsets (but probably should be to avoid
>> >> confusion with recursive core searching algorithm or just people
>> >> trying to understand what those directories are)
>> >
>> >
>> > Correct; configsets is in solr.home
>> >
>> >>
>> >> 2) What is in "example/schemaless"
>> >> * logs (because of hardcoded magic in bin/solr)
>> >> * NOT log4j2.xml, because we simplified it but also lost an ability to 
>> >> customize
>> >>
>> >> 3) What is in "example/cloud/node1"
>> >> * logs (same magic)
>> >>
>> >> From other discussions, if you try to reproduce examples outside of
>> >> the example directory, they will log to the global log directory and
>> >> mess with each other, unless you pass on an individual log directory
>> >> location for each bin/solr command.
>> >>
>> >>
>> >> I am wondering if the example above solr.home should be the place we
>> >> try to check for:
>> >> 1) logging configuration (or the override that the latest Log4j seems to support)
>> >> 2) logs
>> >> 3) configsets available to that node
>> >>
>> >> 4) maybe solr.in.sh
>> >
>> >
>> > Starting more than one Solr node on a machine is rather unusual: it
>> > shows up in certain examples and on local dev machines, but not, I
>> > think, in "the real world" (prod).  I think it would help a bit if logs were
>> > under solr.home.  It's rare to touch logging config, but it'd be 
>>

Re: Name for the directory above solr.home?

2020-09-04 Thread Alexandre Rafalovitch

> I really wish the default cores dir was not solr.home itself but one 
> directory beneath.
Why? That is worth exploring.

I am ok - if others are - with solr.home containing
*) solr.in.sh
*) configset
*) logs
*) log and related configuration
*) userfiles
*) filestore
*) solr.xml
*) actual cores

However, this feels to me like it makes the cores crowded and -
potentially - interferes/slows-down the recursive core discovery
mechanism. A lot of directories to stat potentially.

But moving cores from under solr.home is a LOT of updates I suspect in
Solr, documentation and 3rd party solutions. Personally, I would formally
establish a 'node' directory above solrhome and move everything but
solr.xml and cores up there. Including the recently added 'userfiles'
and 'filestore' if at all possible.
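To illustrate the stat-cost concern about core discovery, here is a simplified Java sketch of recursive discovery: walk the tree, treat any directory holding a `core.properties` file as a core, and do not descend into cores. This is not Solr's actual discovery code; `CoreDiscoverySketch` and its visit counter are hypothetical, added only to show how non-core directories such as filestore or userfiles get walked in full.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class CoreDiscoverySketch {

    static final class Result {
        final List<Path> cores = new ArrayList<>();
        int dirsVisited = 0;
    }

    // Recursively look for directories containing core.properties under root.
    // A directory that is a core is not descended into; every other directory
    // (logs, filestore, userfiles, ...) is walked in full, costing extra stats.
    static Result discoverCores(Path root) throws IOException {
        Result result = new Result();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                result.dirsVisited++;
                if (Files.isRegularFile(dir.resolve("core.properties"))) {
                    result.cores.add(dir);
                    return FileVisitResult.SKIP_SUBTREE; // no cores inside a core
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return result;
    }
}
```

With cores under a dedicated directory (for example `solr.home/cores`), the walk starts there and never touches logs, filestore or userfiles.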

Regards,
Alex.


On Fri, 4 Sep 2020 at 11:04, David Smiley  wrote:
>
> On Fri, Sep 4, 2020 at 10:42 AM Alexandre Rafalovitch  
> wrote:
>>
>> I feel that maybe we don't treat it special mentally, but then it
>> actually is. At least with minimum custom locations configured.
>>
>> Consider:
>> 1) What is in "server", above "server/solr"
>> * resources/log4j2.xml (because of magic jetty classpath mechanism)
>> * logs
>> * NOT server/solr/configsets (but probably should be to avoid
>> confusion with recursive core searching algorithm or just people
>> trying to understand what those directories are)
>
>
> Correct; configsets is in solr.home
>
>>
>> 2) What is in "example/schemaless"
>> * logs (because of hardcoded magic in bin/solr)
>> * NOT log4j2.xml, because we simplified it but also lost an ability to 
>> customize
>>
>> 3) What is in "example/cloud/node1"
>> * logs (same magic)
>>
>> From other discussions, if you try to reproduce examples outside of
>> the example directory, they will log to the global log directory and
>> mess with each other, unless you pass on an individual log directory
>> location for each bin/solr command.
>>
>>
>> I am wondering if the example above solr.home should be the place we
>> try to check for:
>> 1) logging configuration (or the override that the latest Log4j seems to support)
>> 2) logs
>> 3) configsets available to that node
>>
>> 4) maybe solr.in.sh
>
>
> Starting more than one Solr node on a machine is rather unusual: it shows
> up in certain examples and on local dev machines, but not, I think, in "the
> real world" (prod).  I think it would help a bit if logs were under
> solr.home.  It's rare to touch logging config, but it'd be interesting to 
> allow a log4j2.xml in solr.home to take precedence.  configsets dir is 
> already in solr.home.  Again, it'd be interesting for a solr.in.sh in 
> solr.home to take precedence if it is defined.  All this would help make a 
> solr.home a complete place for a Solr node if you want to isolate it from 
> other nodes -- be it for "example"/tutorial reasons, or for keeping multiple 
> side-projects together.  In addition to all this, I really wish the default 
> cores dir was not solr.home itself but one directory beneath.
>
>>
>> This would certainly make example setups easy and less magical.
>
>
> Yes; that'd be nice.
>
>>
>> I don't know if this is the right answer. I specifically don't know if
>> this will mess up cloud setups.
>
>
> SolrCloud isn't special with regards to this discussion (I think).
>
>>
>> But I see a pattern that is
>> unacknowledged and think that maybe acknowledging would allow us to
>> have a more general solution than magic in bin/solr. And maybe just in
>> time for the Solr in Docker final decisions.
>
>
> Yeah; I hope you like my suggestions above.
>
>>
>> Regards,
>>Alex.
>> P.s. To make it more interesting, I am also confused about pid files
>> going into bin directory. I bet docker image puts them somewhere else.
>>
>> On Fri, 4 Sep 2020 at 01:04, David Smiley  wrote:
>> >
>> > The parent directory of Solr home is not special.  Solr Home's location is (as
>> > you know) configurable.  Its default location is different too, since
>> > it's in /var/solr for both the Docker & service install script.
>> >
>> > ~ David Smiley
>> > Apache Lucene/Solr Search Developer
>> > http://www.linkedin.com/in/davidwsmiley
>> >
>> >
>> > On Tue, Aug 25, 2020 at 8:31 PM Alexandre Rafalovitch  
>> > wrote:
>> >>
>> >> Hello,
>> >>
>> >> What do we call the directory above the solr.home?
>> >> E.g.

Re: Name for the directory above solr.home?

2020-09-04 Thread Alexandre Rafalovitch

I feel that maybe we don't treat it special mentally, but then it
actually is. At least with minimum custom locations configured.

Consider:
1) What is in "server", above "server/solr"
* resources/log4j2.xml (because of magic jetty classpath mechanism)
* logs
* NOT server/solr/configsets (but probably should be to avoid
confusion with recursive core searching algorithm or just people
trying to understand what those directories are)
2) What is in "example/schemaless"
* logs (because of hardcoded magic in bin/solr)
* NOT log4j2.xml, because we simplified it but also lost an ability to customize
3) What is in "example/cloud/node1"
* logs (same magic)

From other discussions, if you try to reproduce examples outside of
the example directory, they will log to the global log directory and
mess with each other, unless you pass on an individual log directory
location for each bin/solr command.

I am wondering if the example above solr.home should be the place we
try to check for:
1) logging configuration (or the override that the latest Log4j seems to support)
2) logs
3) configsets available to that node
4) maybe solr.in.sh

This would certainly make example setups easy and less magical.
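The lookup proposed above (check the node-level directory first, then fall back to the shared default) could be sketched in Java as follows. The directory names in the usage note are illustrative assumptions, and `NodeDirLookup`/`firstExisting` are hypothetical helpers, not existing bin/solr or Solr code.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

public class NodeDirLookup {

    // Return the first candidate directory that contains fileName as a regular file.
    // Candidates are ordered from most specific (a per-node directory) to least
    // specific (the shared installation default).
    static Optional<Path> firstExisting(String fileName, List<Path> candidates) {
        for (Path dir : candidates) {
            Path candidate = dir.resolve(fileName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

For example, `firstExisting("log4j2.xml", List.of(nodeDir, serverDir.resolve("resources")))` would prefer a per-node logging config and otherwise use the one in server/resources; the same call shape works for `solr.in.sh`.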

I don't know if this is the right answer. I specifically don't know if
this will mess up cloud setups. But I see a pattern that is
unacknowledged and think that maybe acknowledging would allow us to
have a more general solution than magic in bin/solr. And maybe just in
time for the Solr in Docker final decisions.

Regards,
   Alex.
P.s. To make it more interesting, I am also confused about pid files
going into bin directory. I bet docker image puts them somewhere else.

On Fri, 4 Sep 2020 at 01:04, David Smiley  wrote:
>
> The parent directory of Solr home is not special.  Solr Home's location is (as you
> know) configurable.  Its default location is different too, since it's in
> /var/solr for both the Docker & service install script.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Tue, Aug 25, 2020 at 8:31 PM Alexandre Rafalovitch  
> wrote:
>>
>> Hello,
>>
>> What do we call the directory above the solr.home?
>> E.g.
>> - "schemaless" in example/schemaless/solr/gettingstarted
>> - "node1" in example/cloud/node1/solr/gettingstarted_shard1_replica_n2
>> - "solr" in server/solr/book/conf
>>
>> In solr.in.cmd we "may be" calling it "solr start dir"
>> In bin/solr, we call it SOLR_SERVER_DIR
>> In install_solr_service.sh, we sort-of call it SOLR_VAR_DIR "Directory
>> for live/writable Solr files", which is not quite the same because
>> apparently pid files go there, while for distribution they seem to go
>> into bin
>>
>> I guess it mostly matters when you have multiple Solr instances
>> running on the same machine and you need to separate log locations,
>> etc.
>>
>> Regards,
>>   Alex.
>>
>>


Re: Annoying but harmless exceptions due to filepermissions when running tests

2020-09-02 Thread Alexandre Rafalovitch

There is a flag to disable the package manager. Can that code path avoid
creating the directory? Or maybe it does already.

Then tests that don't specifically test that could turn the disable flag on.
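A minimal Java sketch of this suggestion: only attempt to create the directory when the feature is enabled, and otherwise keep the warn-and-continue behaviour described later in this thread. The thread does not name the flag, so it is passed in as a plain boolean; `FeatureDirs` and `maybeCreate` are hypothetical, and java.util.logging stands in for Solr's logger. `SecurityException` is caught too, on the assumption that the test framework's security policy is what rejects the write.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.logging.Logger;

public class FeatureDirs {

    private static final Logger LOG = Logger.getLogger(FeatureDirs.class.getName());

    // Create solrHome/<name> only when the feature that needs it is enabled.
    // Returns the directory if it is usable, or empty if the feature is off
    // or the directory could not be created.
    static Optional<Path> maybeCreate(Path solrHome, String name, boolean featureEnabled) {
        if (!featureEnabled) {
            return Optional.empty(); // nothing is touched on disk, so nothing to warn about
        }
        Path dir = solrHome.resolve(name);
        try {
            Files.createDirectories(dir); // no-op if it already exists as a directory
            return Optional.of(dir);
        } catch (IOException | SecurityException e) {
            // Warn and carry on without the feature.
            LOG.warning("Unable to create [" + dir + "]. Features requiring this directory may fail: " + e);
            return Optional.empty();
        }
    }
}
```

Tests that don't exercise packages would call this with the flag off, so they would neither touch the read-only solr home nor log the WARN.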

Regards,
   Alex

On Wed., Sep. 2, 2020, 9:41 p.m. Noble Paul,  wrote:

> The filestore dir is where the packages live. If we move it to another
> location, existing installations might fail. So, it's a backward
> incompatible change.
>
> What are our options?
>
> Is it possible to have these directories precreated in the distro/code to
> ensure that these warnings don't appear?
>
> On Thu, Sep 3, 2020, 4:58 AM Erick Erickson 
> wrote:
>
>> Oh bother. Somehow I thought I’d looked and only a handful of tests
>> reported this.
>>
>> So I looked again and I _wish_ I was able to blame drugs cause you’re
>> right, there are over a thousand of them.
>>
>> Never mind...
>>
>> > On Sep 1, 2020, at 8:55 PM, Chris Hostetter 
>> wrote:
>> >
>> >
>> > : Hmmm, that’s kind of a dilemma then. Are you saying that
>> > : the test can see that the directory appears writable then tries
>> > : to write to it then gets tripped up by the framework?
>> > :
>> > : Seems to me that a test that tries to write, thinks it can and then
>> > : can’t should fail anyway.
>> > :
>> > : Well, I don’t think there are very many tests that have this problem
>> > : anyway, so maybe I can examine them one-by-one and not
>> > : introduce new failures...
>> >
>> > You keep using the phrase "the test" in the context of (trying to) create
>> > these directories ("userfiles" and "filestore") ... the "test CODE" isn't
>> > making any choices about trying to write those files -- the choice is
>> > being made by the "CoreContainer CODE".
>> >
>> > These features were added with the explicit implementation choice to _TRY_
>> > to write the "userfiles" (and/or "filestore") directory to "solr home" IF
>> > POSSIBLE, and if so then enable a bunch of features -- if NOT then log a
>> > WARNing and don't enable those features.
>> >
>> > So what you're seeing here isn't an artifact/result of any particular
>> > choices "a test" or "the test" makes -- it's a conscious choice of the
>> > developer who added this feature to solr.  These WARN messages that show
>> > up in tests where the solr home dir isn't writable (which is actually the
>> > vast majority of tests because of how the test framework works) are the
>> > same types of WARN messages that a "real" solr deployment might get if
>> > their solr home dir isn't writable (ie: maybe they use ${solr.data.dir} to
>> > point to a diff drive).
>> >
>> >
>> >
>> > :
>> > : > On Aug 31, 2020, at 1:29 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>> > : >
>> > : >
>> > : > Some tests "create" a new solr home dir and copy config files there, but
>> > : > you'll see this type of WARN logging for any test that just uses the test
>> > : > configs "in place" because of how the code is designed to _try_ and create
>> > : > a userfiles directory in the solr home if it's writable.
>> > : >
>> > : >
>> > : > : Date: Sat, 29 Aug 2020 09:25:17 -0400
>> > : > : From: Erick Erickson 
>> > : > : Reply-To: dev@lucene.apache.org
>> > : > : To: dev@lucene.apache.org
>> > : > : Subject: Re: Annoying but harmless exceptions due to filepermissions when running tests
>> > : > :
>> > : > : Well, as Uwe and I discussed offline, he’s right and I’m wrong.
>> > : > :
>> > : > : In CoreContainer [364] there’s code like this:
>> > : > :
>> > : > : Path userFilesPath = solrHome.resolve("userfiles"); // TODO make configurable on cfg?
>> > : > : try {
>> > : > :   Files.createDirectories(userFilesPath); // does nothing if already exists
>> > : > : } catch (Exception e) {
>> > : > :   log.warn("Unable to create [{}].  Features requiring this directory may fail.", userFilesPath, e);
>> > : > : }
>> > : > :
>> > : > : So I assumed it would happen on most every test, at least in cloud mode.
>> > : > : But when I tried to make it happen on a different test, there was no exception.
>> > : > :
>> > : > : I’ll have to poke some more to see what’s really happening…
>> > : > :
>> > : > : Never Mind….
>> > : > :
>> > : > : > On Aug 29, 2020, at 8:59 AM, Uwe Schindler  wrote:
>> > : > : >
>> > : > : > Hi,
>> > : > : >
>> > : > : > this is a bug in the test! It should never ever modify files outside its sandbox. It should not even modify files in build directory. It is fully valid to reject those writes - has nothing to do with users, it's just forbidden by the test framework. Modifying this file is harmful, as it may affect later tests.
>> > : > : >
>> > : > : > So the correct way is to copy those files to the solr container running inside test's sandbox. That's one of the main advantages of the Test sandbox: No write access anywhere outside the test, see policy file.
>> > : > : >
>> > : > : > Uwe
>> > : > : >
>> > : > : > -
>> > : > : > Uwe Schindler
>> > : > : >

master/gradle seems to expand JIRA IDs in documentation filepath references

2020-09-01 Thread Alexandre Rafalovitch

Hi,
I am doing some work in a branch named after a JIRA issue, and gradle refuses
to build because it expands a variable and then seems to expand the
SOLR-XYZ in the path into a JIRA link:

Specifically:
*[Lucene Documentation](${project.luceneDocUrl}/index.html) in
solr/site.index.template.md (or maybe the .xsl version)
becomes:
https://issues.apache.org/jira/browse/SOLR-14792)-velocity/lucene/build/documentation//index.html">Lucene
Documentation

The file path is:
/Users/arafalov/Projects/solr/working-branches/SOLR-14792-velocity/build

This is undesirable expanded part:
[SOLR-14792](https://issues.apache.org/jira/browse/SOLR-14792)

This does not happen on master branch, as it does not have that
pattern in the URL.
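One plausible mechanism (an assumption, not confirmed against the actual build scripts) is a text substitution that turns any JIRA key into a Markdown link, applied after the file path has already been substituted in. The Java sketch below reproduces that with a hypothetical `JiraLinkifySketch` class and shows one possible guard: skip keys that sit inside a path segment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JiraLinkifySketch {

    // Naive: any JIRA key between word boundaries, including one inside a path.
    static final Pattern NAIVE = Pattern.compile("\\b(?:SOLR|LUCENE)-\\d+\\b");

    // Guarded: reject keys preceded by '/', a word char or '-', or followed by
    // a word char or '-', so "/SOLR-14792-velocity/" is left alone.
    static final Pattern GUARDED = Pattern.compile("(?<![/\\w-])(?:SOLR|LUCENE)-\\d+(?![\\w-])");

    // Replace every match with a Markdown link to the JIRA issue.
    static String linkify(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String key = m.group();
            String link = "[" + key + "](https://issues.apache.org/jira/browse/" + key + ")";
            m.appendReplacement(sb, Matcher.quoteReplacement(link));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Applied to the branch path above, `NAIVE` produces exactly the `[SOLR-14792](https://issues.apache.org/jira/browse/SOLR-14792)` fragment, while `GUARDED` leaves the path alone. The trade-off is that a key glued to a hyphenated word in prose is no longer linked either.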

I can work around it for now by running
./gradlew check -x :solr:checkBrokenLinks , though I don't know how
much I am shortcutting this way.

Regards,
   Alex.


Re: SIP-10: Solr 9 examples: Can we use Ref Guide as a dogfood example?

2020-09-01 Thread Alexandre Rafalovitch

That Jeopardy set looks very dubious: the content was collected by
scraping and is available on various sharing sites (including Mega!). I
would not feel comfortable working with that in our context.

There are other dataset sources. I like the ones that Data is Plural
newsletter collects: https://tinyletter.com/data-is-plural (full list
at: 
https://docs.google.com/spreadsheets/d/1wZhPLMCHKJvwOkP4juclhjFgqIY8fQFMemwKL2c64vk/edit#gid=0
). Again, copyright is important and I think having a local copy is
important too, for at least tutorial purposes.

But I wish we could figure out a way to include the RefGuide. It is
just so much more of a triple-bottom-line solution than some random other
dataset. We could do a graph of cross-references in the guide, figure
out how to extract java path references, etc.
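As a starting point for the cross-reference graph idea, here is a Java sketch that pulls inter-page links out of AsciiDoc source. It assumes the guide's pages link to each other with the `<<page.adoc#anchor,Label>>` form; other forms such as `xref:` macros are not handled, and `XrefGraph` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XrefGraph {

    // Matches <<target.adoc#anchor,Label>> or <<target.adoc,Label>>;
    // group 1 captures the target page name without the .adoc suffix.
    private static final Pattern XREF =
            Pattern.compile("<<([\\w.-]+)\\.adoc(?:#[^,>]*)?(?:,[^>]*)?>>");

    // Build page -> referenced pages from (page name -> AsciiDoc source).
    static Map<String, Set<String>> build(Map<String, String> pages) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : pages.entrySet()) {
            Set<String> targets = new LinkedHashSet<>();
            Matcher m = XREF.matcher(e.getValue());
            while (m.find()) {
                targets.add(m.group(1));
            }
            graph.put(e.getKey(), targets);
        }
        return graph;
    }
}
```

The resulting adjacency map could be indexed as a multi-valued field per page, which would give the example both a link graph and facetable metadata.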

Anyway, it is not something that is super-urgent. I don't even know
whether our new build processes can be augmented to do this. I guess
it is a bit similar to how we run tests.

I just wanted to get a strong yay/nay on the idea. So far it feels
like I got one strong yay, one caution and one soft nay.

Regards,
   Alex.



On Tue, 1 Sep 2020 at 12:28, Jan Høydahl  wrote:
>
> What about 200.000 Jeopardy questions in JSON format?
> https://www.reddit.com/r/datasets/comments/1uyd0t/20_jeopardy_questions_in_a_json_file/
> I downloaded the file in a few seconds, and it also has some structured 
> content, e.g.
>
>   {
>     "category": "NOVELS",
>     "air_date": "2005-01-27",
>     "question": "'Even the epilogue is lengthy in this 1869 Tolstoy epic; it comes out in 2 parts &, in our copy, is 105 pages long'",
>     "value": "$400",
>     "answer": "War and Peace",
>     "round": "Jeopardy!",
>     "show_number": "4699"
>   },
>   {
>     "category": "BRIGHT IDEAS",
>     "air_date": "2005-01-27",
>     "question": "'In 1948 scientists at Bristol-Meyers \"buffered\" this medicine for the first time'",
>     "value": "$400",
>     "answer": "aspirin",
>     "round": "Jeopardy!",
>     "show_number": "4699"
>   },
>
> Lots of docs. Enough free-text to learn some analysis, enough metadata for 
> some meaningful facets / filters…
>
> As long as we only provide a URL and not re-distribute the content, licensing 
> is less of a concern.
>
> Jan
>
> On 1 Sep 2020 at 15:59, Alexandre Rafalovitch  wrote:
>
> I've thought of providing instructions. But for good indexing, we
> should use adoc format as source, rather than html (as Cassandra's
> presentation showed), so that means dependencies to build by user to
> get asciidoctor library. And the way to get content, so either git
> clone or download the whole source and unpack and figure out the
> directory locations. It feels messy. Then, it may as well be an
> external package or even an external independent project. And
> therefore, it would lose value as a shipped tutorial material.
>
> We could also discuss actually shipping the Solr Reference Guide with
> Solr now that the release cycles align, but that would actually not
> help my sub-project too much, again because of adoc vs. html formats.
>
> In terms of other datasets:
> *) I could just stay with limited full-text in the one I am thinking
> of. The bulk download mode allows for fields such as Occupation,
> Company and Vehicle model which are 2-7 words long. That's about the
> same length as current examples we ship. It does not allow for a
> meaningful discussion about longer-text issues such as
> length-normalization, but we don't have those now anyway.
> *) I could use a public domain book and break it into parts. From
> somewhere like https://standardebooks.org/ . But there is a question
> about licensing and also whether we will be able to show interesting
> effects with that.
> *) I was also told that there is Wikipedia, but again, would we just
> include a couple of articles at random? What's the license?
> *) It is possible to index Stack Overflow questions, either from the
> feed (DIH was doing that) or as a download. I think the license was
> compatible.
> *) I could augment the dataset with some mix of the above, like a
> "favourite quote" field with random book sentences. This feels like
> fun, but possibly a whole separate project of its own.
>
> Anyway, I am open to further thoughts. It is quite likely I missed something.
>
> Regards,
>   Alex.
>
>
> On Tue, 1 Sep 2020 at 03:10, Jan Høydahl  wrote:
>
>
> I’d rather ship a tutorial and tooling that expla

Re: SIP-10: Solr 9 examples: Can we use Ref Guide as a dogfood example?

2020-09-01 Thread Alexandre Rafalovitch

I've thought of providing instructions. But for good indexing, we
should use the adoc format as the source, rather than html (as Cassandra's
presentation showed), which means users would need build dependencies to
get the asciidoctor library. And then there is getting the content: either a git
clone or downloading the whole source, unpacking it, and figuring out the
directory locations. It feels messy. At that point, it may as well be an
external package or even an external independent project. And
therefore, it would lose value as shipped tutorial material.

We could also discuss actually shipping the Solr Reference Guide with
Solr now that the release cycles align, but that would actually not
help my sub-project too much, again because of adoc vs. html formats.

In terms of other datasets:
*) I could just stay with limited full-text in the one I am thinking
of. The bulk download mode allows for fields such as Occupation,
Company and Vehicle model which are 2-7 words long. That's about the
same length as current examples we ship. It does not allow for a
meaningful discussion about longer-text issues such as
length-normalization, but we don't have those now anyway.
*) I could use a public domain book and break it into parts. From
somewhere like https://standardebooks.org/ . But there is a question
about licensing and also whether we will be able to show interesting
effects with that.
*) I was also told that there is Wikipedia, but again, would we just
include a couple of articles at random? What's the license?
*) It is possible to index Stack Overflow questions, either from the
feed (DIH was doing that) or as a download. I think the license was
compatible.
*) I could augment the dataset with some mix of the above, like a
"favourite quote" field with random book sentences. This feels like
fun, but possibly a whole separate project of its own.
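For reference, "length normalization" here means the document-length term in BM25, the family of scoring Lucene uses by default. Below is the textbook form; Lucene's implementation differs in constant factors, and this block is only meant to show where document length enters:

```latex
\mathrm{score}(q, d) = \sum_{t \in q} \mathrm{IDF}(t) \cdot
  \frac{f(t, d)\,(k_1 + 1)}{f(t, d) + k_1 \left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)}
```

Here $f(t, d)$ is the frequency of term $t$ in document $d$, $|d|$ is the document's length, $\mathrm{avgdl}$ is the average length across the collection, and $b$ controls how strongly length is normalized ($b = 0$ turns it off; Lucene's defaults are $k_1 = 1.2$, $b = 0.75$).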

Anyway, I am open to further thoughts. It is quite likely I missed something.

Regards,
   Alex.


On Tue, 1 Sep 2020 at 03:10, Jan Høydahl  wrote:
>
> I’d rather ship a tutorial and tooling that explains how to index the 
> ref-guide, than shipping a binary index.
> What other full-text datasets have you considered as candidates for 
> getting-started examples?
>
> Jan
>
> On 1 Sep 2020 at 05:53, Alexandre Rafalovitch  wrote:
>
> I did not say it was trivial, but I also did not quite mention the previous 
> research.
>
> https://github.com/arafalov/solr-refguide-indexing/blob/master/src/com/solrstart/refguide/Indexer.java
>
> It uses the official AsciidoctorJ library directly. Not sure if that's just the JRuby
> version of Asciidoctor we currently use to build. But this should only affect
> the development process, not the final built package.
>
> I think I am more trying to figure out what people think about shipping an 
> actual core with the distribution. That is something I haven't seen done 
> before. And may have issues I did not think of.
>
> Regards,
> Alex
>
> On Mon., Aug. 31, 2020, 10:11 p.m. Gus Heck,  wrote:
>>
>> Some background to consider before committing to that... it might not be as 
>> trivial as you think. (I've often thought it ironic that we don't have real 
>> search for our ref guide... )
>>
>> https://www.youtube.com/watch?v=DixlnxAk08s
>>
>> -Gus
>>
>> On Mon, Aug 31, 2020 at 2:06 PM Ishan Chattopadhyaya 
>>  wrote:
>>>
>>> I love the idea of making the ref guide itself as an example dataset. That 
>>> way, we won't need to ship anything separately. Python's beautiful soup can 
>>> extract text from the html pages. I'm sure there may be such things in Java
>>> too (can Tika do this?).
>>>
>>> On Mon, 31 Aug, 2020, 11:18 pm Alexandre Rafalovitch,  
>>> wrote:
>>>>
>>>> Hi,
>>>> I need a sanity check.
>>>>
>>>> I am in the planning stages for the new example datasets to ship with
>>>> Solr 9. The one I am looking at is great for structured information,
>>>> but is quite light on full-text content. So, I am thinking of how
>>>> important that is and what other sources could be used.
>>>>
>>>> One - only slightly - crazy idea is to use Solr Reference Guide itself
>>>> as a document source. I am not saying we need to include the guide
>>>> with Solr distribution, but:
>>>> 1) I could include a couple of sample pages
>>>> 2) I could index the whole guide (with custom Java-code) during the
>>>> final build and we could ship the full index (with stored=false) with
>>>> Solr, which then basically becomes a local search for the remote guide
>>>> (with absolute URLs).
>>>>
>>>> Either way would allow us to also explore what a goo

Re: SIP-10: Solr 9 examples: Can we use Ref Guide as a dogfood example?

2020-08-31 Thread Alexandre Rafalovitch

I did not say it was trivial, but I also did not quite mention the previous
research.

https://github.com/arafalov/solr-refguide-indexing/blob/master/src/com/solrstart/refguide/Indexer.java

It uses the official AsciidoctorJ library directly. Not sure if that's just
the JRuby version of Asciidoctor we currently use to build. But this should
only affect the development process, not the final built package.

I think I am more trying to figure out what people think about shipping an
actual core with the distribution. That is something I haven't seen
done before. And may have issues I did not think of.

Regards,
Alex

On Mon., Aug. 31, 2020, 10:11 p.m. Gus Heck,  wrote:

> Some background to consider before committing to that... it might not be
> as trivial as you think. (I've often thought it ironic that we don't have
> real search for our ref guide... )
>
> https://www.youtube.com/watch?v=DixlnxAk08s
>
> -Gus
>
> On Mon, Aug 31, 2020 at 2:06 PM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
>> I love the idea of using the ref guide itself as an example dataset.
>> That way, we won't need to ship anything separately. Python's Beautiful
>> Soup can extract text from the HTML pages. I'm sure there may be such
>> things in Java too (can Tika do this?).
>>
>> On Mon, 31 Aug, 2020, 11:18 pm Alexandre Rafalovitch, 
>> wrote:
>>
>>> Hi,
>>> I need a sanity check.
>>>
>>> I am in the planning stages for the new example datasets to ship with
>>> Solr 9. The one I am looking at is great for structured information,
>>> but is quite light on full-text content. So, I am thinking of how
>>> important that is and what other sources could be used.
>>>
>>> One - only slightly - crazy idea is to use Solr Reference Guide itself
>>> as a document source. I am not saying we need to include the guide
>>> with Solr distribution, but:
>>> 1) I could include a couple of sample pages
>>> 2) I could index the whole guide (with custom Java-code) during the
>>> final build and we could ship the full index (with stored=false) with
>>> Solr, which then basically becomes a local search for the remote guide
>>> (with absolute URLs).
>>>
>>> Either way would allow us to also explore what a good search
>>> configuration could look like for the Ref Guide for when we are
>>> actually ready to move beyond its current "headings-only" javascript
>>> search. Actually, done right, same/similar tool could also feed
>>> subheadings into the javascript search.
>>>
>>> Like I said, sanity check?
>>>
>>> Regards,
>>>Alex.
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>

Re: Approach towards solving split package issues?

2020-08-31 Thread Alexandre Rafalovitch

It was causing issues when I was building the Solr Start resource website, too.

So +1 on sorting it out.

Regards,
   Alex.

On Mon., Aug. 31, 2020, 5:50 p.m. Tomoko Uchida, <
tomoko.uchida.1...@gmail.com> wrote:

> Hello devs,
>
> we have lots of package name conflicts (shared package names) between
> modules in the Lucene/Solr source tree. It is not only annoying for
> devs/users but also bad practice since Java 9 (to my understanding), and,
> as some of you may know, we already have some problems with Javadocs due
> to these split packages. I've been curious about the issue for a while.
> My questions are: Q1: How can we solve the issue in an organized way?
> Q2: How many of us are really interested in it?
>
> To break down Q1,
> - Is a JIRA needed for building a grand design and organizing sub-tasks?
> We have a couple of issues (e.g. LUCENE-9317 and LUCENE-9319) about it and
> I had been playing around with them before, but I feel like an umbrella
> ticket would be needed.
> - When should we start, and what should the target version be? My feeling
> is that the right moment to start is after cutting branch_9x, and 10.0.0 is
> a suitable target. Does this make sense?
> - Are there any other tasks/concerns to be considered except for just
> renaming packages?
>
> Regarding Q2,
> I know some of us have deep knowledge and thoughts in this topic, but for
> now I am not sure how many of you have the will to give help or take time
> for that.
> It can't be a one-man effort. The more people understand and can
> contribute to the build, the more healthy it will be. (I borrowed this
> phrase from Gradle build issue LUCENE-9077).
>
> I don't intend to rush into making a decision, my purpose here is to
> collect information to see if I can handle it before opening a JIRA.
>
> Thanks,
> Tomoko
>
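To make the split-package problem concrete: the Java module system (Java 9+) does not allow the same package to live in two modules that are loaded together, so a natural first step is listing which packages are shared. This is a minimal sketch, assuming the package list for each module has already been collected (for example by walking each module's source tree); the class name is hypothetical, and the module and package names in `main` are made up for illustration, not a survey of the actual Lucene/Solr tree.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SplitPackageFinder {

    /**
     * Returns every package name that appears in more than one module,
     * mapped to the sorted set of modules that contain it.
     */
    public static Map<String, Set<String>> findSplitPackages(Map<String, Set<String>> packagesByModule) {
        // Invert module -> packages into package -> modules.
        Map<String, Set<String>> modulesByPackage = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : packagesByModule.entrySet()) {
            for (String pkg : e.getValue()) {
                modulesByPackage.computeIfAbsent(pkg, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        // Keep only packages owned by two or more modules: these are the split packages.
        modulesByPackage.values().removeIf(modules -> modules.size() < 2);
        return modulesByPackage;
    }

    public static void main(String[] args) {
        // Hypothetical module/package layout, for illustration only.
        Map<String, Set<String>> layout = Map.of(
            "lucene-core", Set.of("org.apache.lucene.index", "org.apache.lucene.search"),
            "lucene-misc", Set.of("org.apache.lucene.index", "org.apache.lucene.misc"),
            "lucene-sandbox", Set.of("org.apache.lucene.search", "org.apache.lucene.sandbox"));
        findSplitPackages(layout).forEach((pkg, modules) ->
            System.out.println(pkg + " -> " + modules));
    }
}
```

Each reported package is then a candidate for renaming in one of its modules, which is the bulk of the work the thread is asking about.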

SIP-10: Solr 9 examples: Can we use Ref Guide as a dogfood example?

2020-08-31 Thread Alexandre Rafalovitch

Hi,
I need a sanity check.

I am in the planning stages for the new example datasets to ship with
Solr 9. The one I am looking at is great for structured information,
but is quite light on full-text content. So, I am thinking of how
important that is and what other sources could be used.

One - only slightly - crazy idea is to use the Solr Reference Guide itself
as a document source. I am not saying we need to include the guide
with the Solr distribution, but:
1) I could include a couple of sample pages
2) I could index the whole guide (with custom Java-code) during the
final build and we could ship the full index (with stored=false) with
Solr, which then basically becomes a local search for the remote guide
(with absolute URLs).

Either way would allow us to also explore what a good search
configuration could look like for the Ref Guide for when we are
actually ready to move beyond its current "headings-only" javascript
search. Actually, done right, the same or a similar tool could also feed
subheadings into the javascript search.

Like I said, sanity check?

Regards,
   Alex.
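As a rough illustration of the "feed subheadings into the javascript search" idea, a build step could pull section headings out of the generated HTML pages and turn each one into an absolute link. This is a minimal sketch that assumes headings are rendered as `<h2 id="...">` / `<h3 id="...">` elements; the class name, the regex, and the example URL are assumptions, and a real HTML parser (or Tika, as suggested in the thread) would be sturdier than regular expressions. It uses a record, so it needs Java 16+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RefGuideHeadings {

    /** One search entry: heading text plus an absolute link to that section. */
    public record Entry(String title, String url) {}

    // Matches <h2 id="...">...</h2> and <h3 id="...">...</h3>; assumes generated, well-formed HTML.
    private static final Pattern HEADING =
        Pattern.compile("<h([23])\\s+id=\"([^\"]+)\"[^>]*>(.*?)</h\\1>", Pattern.DOTALL);

    // Crude stripper for inline markup inside a heading (e.g. <code>).
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    public static List<Entry> extract(String html, String pageUrl) {
        List<Entry> entries = new ArrayList<>();
        Matcher m = HEADING.matcher(html);
        while (m.find()) {
            String title = TAG.matcher(m.group(3)).replaceAll("").trim();
            // Absolute URL, so the entries stay usable when shipped apart from the guide itself.
            entries.add(new Entry(title, pageUrl + "#" + m.group(2)));
        }
        return entries;
    }
}
```

The same entries could go into a Solr core (with only the URL and title stored) or be serialized for the existing client-side search.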

With ant removed, will GitHub PR job always fail 1st check?

2020-08-28 Thread Alexandre Rafalovitch

For pull requests, GitHub runs both the Ant and Gradle precommit checks.

Now that Ant is gone, it is probably safe to remove that check as
well. It will always fail at the "Ivy bootstrap" phase.

Regards,
   Alex.

Re: Solr configuration options

2020-08-28 Thread Alexandre Rafalovitch

This is way above my head, but I wonder if we could dogfood any of
this with a future SolrCloud example? At the moment, it sets up 2-4
nodes, 1 collection, any number of shards/replicas. And it does it by
cloning directories and some magic in bin/solr to ensure logs don't step
on each other's toes.

If we have an idea of what this should look like and an example we
actually ship, we could probably make it much more concrete.

Regards,
   Alex.


On Fri, 28 Aug 2020 at 15:12, Gus Heck  wrote:
>
> Sure, of course someone has to set up the first one; that should be an initial
> collaboration with devops, and one can never escape that. Mount points can be
> established in an automated fashion and named by convention. My yearning is 
> to make the devops side of it devops based (provide machines that look like X 
> where all the "X things" are attributes familiar to devops people such as 
> CPUs/mounts/RAM/etc.) and the Solr side of it controlled by those who are 
> experts in Solr to the greatest extent possible. So my desire is that Solr 
> specific stuff go in ZK and machine definitions be controlled by devops. Once 
> the initial setup for type X is done then the solr guy says to devops pls 
> give me 3 more of type X (zk locations are a devops thing btw, they might 
> move zk as they see fit) and when they start, the nodes join the cluster. 
> Solr guy does his thing, twiddles configs to make it hum (within limits, of 
> course, some changes require machine level changes), occasionally requests 
> reboots, and when he doesn't need the machines he says... you can turn off 
> machine A, B and C now. Solr guy doesn't care if it's AMI or docker or that 
> new Flazllebarp thing that devops seem to like for no clear reason other than 
> it's sold to them by TABS (TinyAuspexBananaSoft Inc) who threw it in when 
> they sold them a bunch of other stuff...
>
> The config is packaged with the code because there's no better way for a lot 
> of software out there. Use of Zk to serve up configuration gives us the 
> opportunity to do better (well I think it sounds better YMMV of course).
>
> -Gus
>
> On Fri, Aug 28, 2020 at 2:43 PM Tomás Fernández Löbbe  
> wrote:
>>
>> As for AMIs, you have to do it at least once, right? Or are you thinking of
>> someone using a pre-existing AMI? I see your point for the case of someone
>> using the official Solr image as-is without any volume mounts, I guess. I'm
>> wondering if trying to put node configuration inside ZooKeeper is another
>> thing where we try to solve things inside Solr that the industry already
>> solved differently (AMIs, Docker images are exactly about packaging code and
>> config)
>>
>> On Fri, Aug 28, 2020 at 11:11 AM Gus Heck  wrote:
>>>
>>> Which means whoever wants to make changes to solr needs to be 
>>> able/willing/competent to make AMI/dockers/etc ... and one has to manage 
>>> versions of those variants as opposed to managing versions of config files.
>>>
>>> On Fri, Aug 28, 2020 at 1:55 PM Tomás Fernández Löbbe 
>>>  wrote:

 I think if you are using AMIs (or Docker), you could put the node 
 configuration inside the AMI (or Docker image), as Ilan said, together 
 with the binaries. Say you have a custom top-level handler (Collections, 
 Cores, Info, whatever), which takes some arguments and it's configured in 
 solr.xml and you are doing an upgrade, you probably want your old nodes 
 (running with your old AMI/Docker image with old jars) to keep the old 
 configuration and your new nodes to use the new.

 On Fri, Aug 28, 2020 at 10:42 AM Gus Heck  wrote:
>
> Putting solr.xml in zookeeper means you can add a node simply by starting 
> solr pointing to the zookeeper, and ensure a consistent solr.xml for the 
> new node if you've customized it. Since I rarely (never) hit use cases
> where I need a different solr.xml per node, I generally advocate putting it
> in ZK; I'd say heterogeneous node configs are the special case for
> advanced use here. I'm a fan of a (hypothetical future) world where
> nodes can be added/removed simply without need for local configuration.
> It would be desirable IMHO to have a smooth node add and remove process,
> and having to install a file into a distribution manually after unpacking
> it (or having to coordinate variations of config pushed to machines)
> is a minus. If and when autoscaling is happy again I'd like to be able to
> start an AMI in AWS pointing at zk (or similar) and have it join 
> automatically, and then receive replicas to absorb load (per whatever 
> autoscaling is specified), and then be able to issue a single command to 
> a node to sunset the node that moves replicas back off of it (again per 
> autoscaling preferences, failing if autoscaling constraints would be 
> violated) and then asks the node to shut down so that the instance in AWS 
> (or wherever) can be shut down safely.

Do we actually merge the GitHub pull requests?

2020-08-28 Thread Alexandre Rafalovitch

I am working on finalizing
https://github.com/apache/lucene-solr/pull/1794 (DIH removal) and when
I looked at it (before conflicts), it did not allow me to merge.

It said instead:
"Only those with write access to this repository can merge pull requests."

Did I have to do some extra linking of my personal and apache
identities? I am already an apache org member and I am pretty sure
(not 100%) that I pushed directly into GitHub repo before. But now
when I am doing it via PR, it seems different.

Can anybody else merge and I can't? Or do we just not do merges?

Regards,
   Alex.

Re: Solr configuration options

2020-08-28 Thread Alexandre Rafalovitch

I am not sure I understand the deeper point of the question, but just
in terms of files
*) Per core: solrconfig.xml can get overridden with config-api and the
overrides go into configoverlay.json
*) Per core: core.properties can store some configuration values that
are used in variable substitutions
*) Per solr.xml: solrcore.properties:
https://lucene.apache.org/solr/guide/6_6/configuring-solrconfig-xml.html#Configuringsolrconfig.xml-solrcore.properties
(never used it myself, apparently deprecated, standalone only)

Regards,
   Alex.
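For the first bullet, a Config API override looks roughly like this. The sketch only builds the request with Java 11+ `java.net.http` and does not send it; the base URL `http://localhost:8983/solr` and the collection name `techproducts` are placeholders, the class name is hypothetical, and `updateHandler.autoCommit.maxTime` is used as an example of a property the Config API's `set-property` command accepts.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ConfigOverlayRequest {

    /**
     * Builds (but does not send) a Config API request overriding one
     * solrconfig.xml property; Solr persists such overrides in configoverlay.json.
     */
    public static HttpRequest setProperty(String solrBase, String collection, String name, long value) {
        // Minimal hand-built JSON; a real client would use a JSON library and escape the name.
        String body = "{\"set-property\": {\"" + name + "\": " + value + "}}";
        return HttpRequest.newBuilder(URI.create(solrBase + "/" + collection + "/config"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }
}
```

Sending it with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against a running node should leave the override in that collection's configoverlay.json, while solrconfig.xml itself stays untouched.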

On Fri, 28 Aug 2020 at 07:59, Ilan Ginzburg  wrote:
>
> I want to ramp-up/discuss/inventory configuration options in Solr. Here's my 
> understanding of what exists and what could/should be used depending on the 
> need. Please correct/complete as needed (or point to documentation I might 
> have missed).
>
> There are currently 3 sources of general configuration I'm aware of:
>
> Collection specific config bootstrapped by file solrconfig.xml and copied 
> into the initial (_default) then subsequent Config Sets in Zookeeper.
> Cluster wide config in Zookeeper /clusterprops.json editable globally through 
> Zookeeper interaction using an API. Not bootstrapped by anything (i.e. does 
> not exist until the user explicitly creates it)
> Node config file solr.xml deployed with Solr on each node and loaded when 
> Solr starts. Changes to this file are per node and require node restart to be 
> taken into account.
>
> The Collection specific config (file solrconfig.xml then in Zookeeper 
> /configs//solrconfig.xml) allows Solr devs to set reasonable 
> defaults (the file is part of the Solr distribution). Content can be changed 
> by users as they create new Config Sets persisted in Zookeeper.
>
> Zookeeper's /clusterprops.json can be edited through the collection admin API 
> CLUSTERPROP. If users do not set anything there, the file doesn't even exist 
> in Zookeeper, therefore Solr devs cannot use it to set a default cluster
> config: there's no clusterprops.json file in the Solr distrib like there's a
> solrconfig.xml.
>
> File solr.xml is used by Solr devs to set some reasonable default 
> configuration (parametrized through property files or system properties). 
> There's no API to change that file; users would have to edit/redeploy the
> file on each node and restart the Solr JVM on that node for the new config to
> be taken into account.
>
> Based on the above, my vision (or mental model) of what to use depending on 
> the need:
>
> solrconfig.xml is the only per collection config. IMO it does its job 
> correctly: Solr devs can set defaults, users tailor the content to what they 
> need for new config sets. It's the only option for per collection config 
> anyway.
>
> The real hesitation could be between solr.xml and Zookeeper 
> /clusterprops.json. What should go where?
>
> For user configs (anything the user does to the Solr cluster AFTER it was 
> deployed and started), /clusterprops.json seems to be the obvious choice and 
> offers the right abstractions (global config, no need to worry about 
> individual nodes, all nodes pick up configs and changes to configs 
> dynamically).
>
> For configs that need to be available without requiring user intervention or 
> needed before the connection to ZK is established, there's currently no other 
> choice than using solr.xml. Such configuration obviously includes parameters
> that are needed to connect to ZK (timeouts, credential provider and hopefully
> one day an option to either use direct ZK interaction code or Curator code),
> but also configuration of general features that should be the default without
> requiring users to opt in, yet allowing them to easily opt out by editing
> solr.xml before deploying to their cluster (in the future, this could include 
> which Lucene version to load in Solr for example).
>
> To summarize:
>
> Collection specific config? --> solrconfig.xml
> User provided cluster config once SolrCloud is running? --> ZK 
> /clusterprops.json
> Solr dev provided cluster config? --> solr.xml
>
>
> Going forward, some (but only some!) of the config that currently can only 
> live in solr.xml could be made to go to /clusterprops.json or another ZK 
> based config file. This would require adding code to create that ZK file upon 
> initial cluster start (to not force the user to push it) and devise a 
> mechanism (likely a script, could be tricky though) to update that file in ZK 
> when a new release of Solr is deployed and a previous version of that file 
> already exists. Not impossible tasks, but not trivial ones either. Whatever 
> the needs of such an approach are, it might be easier to keep the existing 
> solr.xml as a file and allow users to define overrides in Zookeeper for the 
> configuration parameters from solr.xml that make sense to be overridden in ZK 
> (obviously ZK credentials or connection timeout do not make sense in that 
> context, but defining the shard handler implementation

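The "keep solr.xml as a file, allow overrides in Zookeeper" option above could behave like a two-layer lookup. This is a hypothetical sketch, not existing Solr code: the class, the key names, and the choice of which keys are off-limits are all illustrative, modeled on the point that ZK connection settings cannot sensibly come from ZK itself.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical "solr.xml defaults + ZooKeeper overrides" lookup:
 * a ZK value wins unless the key must be known before ZK is reachable.
 */
public class LayeredNodeConfig {

    // Illustrative keys only: settings needed to connect to ZK cannot be overridden from ZK.
    private static final Set<String> NOT_OVERRIDABLE = Set.of("zkClientTimeout", "zkCredentialsProvider");

    private final Map<String, String> solrXmlDefaults;
    private final Map<String, String> zkOverrides;

    public LayeredNodeConfig(Map<String, String> solrXmlDefaults, Map<String, String> zkOverrides) {
        this.solrXmlDefaults = Map.copyOf(solrXmlDefaults);
        this.zkOverrides = Map.copyOf(zkOverrides);
    }

    /** ZK override if allowed and present, else the solr.xml default, else empty. */
    public Optional<String> get(String key) {
        if (!NOT_OVERRIDABLE.contains(key) && zkOverrides.containsKey(key)) {
            return Optional.of(zkOverrides.get(key));
        }
        return Optional.ofNullable(solrXmlDefaults.get(key));
    }
}
```

Keeping the disallowed keys in one explicit set makes the "what makes sense to override in ZK" decision reviewable in a single place.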
Re: RoadMap?

2020-08-27 Thread Alexandre Rafalovitch

OK, sounds like a UIMA repeat to me.

+1 to take it out and point at one of those other solutions in CHANGES
or whatever.

Regards,
   Alex.

On Thu, 27 Aug 2020 at 20:45, Erick Erickson  wrote:
>
> CDCR does work, kind of. But it requires extensive care and feeding and, as 
> Ishan says, it’s _very_ easy to shoot yourself in the foot. Or run out of 
> disk space. Or get to a state where you have to replicate the index. And 
> “bi-directional” means you can go from A -> B _or_ B -> A, but you can’t 
> index to both A and B at once. Anyone who’s using it invariably rolls their 
> own monitoring to make sure it’s still running. You want “fire and forget” 
> functionality, but that’s not where CDCR is at.
>
> The consequence of not having the monitoring in place is that the tlogs fill 
> up, and then your index can become corrupt. Yes, it’s fixable, but there’s 
> always problem N+1...
>
> I think CDCR could be made acceptable _if_ someone was willing to own it and 
> devote a lot of time to maintenance. But nobody is stepping up to do it, 
> certainly not me. And it’s a side issue, Solr is a search engine. There are 
> solutions out there that are built from the start to deal with keeping 
> separate DCs in sync. Let’s use those rather than a “kinda works” solution.
>
> One of the problems with Solr is that it’s become a hodgepodge of peripheral 
> stuff that somebody found useful at some point. And in a number of instances, 
> capabilities were added to Solr when no other tools were available. But the 
> state of the art has progressed, and it’s time to jettison older stuff...
>
> The advantage of CDCR is that it is all contained in Solr, no outside 
> packages required. The disadvantage is that it has very few people willing to
> work on it.
>
> So I’m for taking it out of Solr. My prediction is that if it’s made a 
> package, it’ll languish and at some point become unusable with the 
> then-current version of Solr. And nobody who complains will be willing to 
> devote the time and effort to making it work with Solr X.Y.
>
> FWIW...
>
>
> > On Aug 27, 2020, at 7:50 PM, Ishan Chattopadhyaya 
> >  wrote:
> >
> > It does start. It is broken because it is fraught with dangers of users 
> > shooting themselves in their feet. Some context here: 
> > https://issues.apache.org/jira/browse/SOLR-14616?focusedCommentId=17153129=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17153129
> >
> > On Fri, Aug 28, 2020 at 4:52 AM Alexandre Rafalovitch  
> > wrote:
> > If CDCR is actively broken (does not start?), then isn't it
> > effectively deprecated from the last version that did not work? And if
> > it is not going to be maintained, then isn't the 'latest' version just
> > whichever one we have not yet deleted it from? A broken feature is
> > only worth keeping if we ever plan to fix it.
> >
> > We have been through the same with UIMA, if I recall. It was broken
> > for a bit and then when I pulled it, ONE person got all upset.
> > SOLR-11694
> >
> > Regards,
> >Alex
> > Ps. I don't know the degree of 'broken' of this specific feature. So,
> > I am mostly talking practical principles here.
> >
> > On Thu, 27 Aug 2020 at 19:03, Ishan Chattopadhyaya
> >  wrote:
> > >
> > > > I find it highly depressing that we can't, *in a major release*, manage 
> > > > to get rid of our deprecations -- particularly for code that has a new 
> > > > home and is packaged in a form that is trivial to install (thanks to 
> > > > our new awesome package manager).
> > >
> > > I'm not sure why you think "we can't". I can't even remember a single 
> > > committer standing in the way of removing those *that already have a 
> > > package*. However, there's a backlash against removing CDCR even though 
> > > there is no one volunteering to support it (as a package) and it is 
> > > clearly broken, which is what totally puzzles me. 
> > > https://issues.apache.org/jira/browse/SOLR-14616
> > >
> > > On Fri, Aug 28, 2020 at 4:19 AM Alexandre Rafalovitch 
> > >  wrote:
> > >>
> > >> Well, I have created SOLR-14783 (Remove DIH from 9.0) and am busily
> > >> learning magic gradle commands to make that happen without leaving
> > >> behind random crumbs.  Once that lands, I will do Jira search on all
> > >> DIH still-open tasks after that and close them pointing to the said
> > >> Jira.
> > >>
> > >> So, I guess somebody better -1 the Jira if they really want that one
> > >> to stay until ... ? And then

Re: RoadMap?

2020-08-27 Thread Alexandre Rafalovitch

If CDCR is actively broken (does not start?), then isn't it
effectively deprecated from the last version that did not work? And if
it is not going to be maintained, then isn't the 'latest' version just
whichever one we have not yet deleted it from? A broken feature is
only worth keeping if we ever plan to fix it.

We have been through the same with UIMA, if I recall. It was broken
for a bit and then when I pulled it, ONE person got all upset.
SOLR-11694

Regards,
   Alex
Ps. I don't know the degree of 'broken' of this specific feature. So,
I am mostly talking practical principles here.

On Thu, 27 Aug 2020 at 19:03, Ishan Chattopadhyaya
 wrote:
>
> > I find it highly depressing that we can't, *in a major release*, manage to 
> > get rid of our deprecations -- particularly for code that has a new home 
> > and is packaged in a form that is trivial to install (thanks to our new 
> > awesome package manager).
>
> I'm not sure why you think "we can't". I can't even remember a single 
> committer standing in the way of removing those *that already have a 
> package*. However, there's a backlash against removing CDCR even though there 
> is no one volunteering to support it (as a package) and it is clearly broken, 
> which is what totally puzzles me. 
> https://issues.apache.org/jira/browse/SOLR-14616
>
> On Fri, Aug 28, 2020 at 4:19 AM Alexandre Rafalovitch  
> wrote:
>>
>> Well, I have created SOLR-14783 (Remove DIH from 9.0) and am busily
>> learning magic gradle commands to make that happen without leaving
>> behind random crumbs. Once that lands, I will do a Jira search for all
>> still-open DIH tasks and close them, pointing to the said Jira.
>>
>> So, I guess somebody better -1 the Jira if they really want that one
>> to stay until ... ? And then read very carefully through SIP-10, of
>> which this is just the first step.
>>
>> In general, maybe we can manage so many new features and so much cleanup
>> in 9 that Solr TLP will look like a great Big Bang moment...
>>
>> And it will probably take a little longer to achieve that, so the -
>> effective - deprecation schedule would still be ok.
>>
>> Regards,
>>Alex.
>>
>> On Thu, 27 Aug 2020 at 18:35, David Smiley  wrote:
>> >>
>> >> It has been proposed on the list to NOT rip out all deprecations in 9.0, 
>> >> but allow users to upgrade to 9.0 with e.g. SolrCell still available, and 
>> >> then have yet some time to change their processes to adapt to the new way 
>> >> of doing stuff. I like that proposal. Sure, 9.0 will remove lots of 
>> >> deprecated code, but I think it is a mistake to do all of the proposed 
>> >> removals at once. We can spread removals out in 9.x releases, after users 
>> >> have had a few releases with a choice between old and new and the new 
>> >> alternative is solid.
>> >
>> >
>> > I find it highly depressing that we can't, *in a major release*, manage to 
>> > get rid of our deprecations -- particularly for code that has a new home 
>> > and is packaged in a form that is trivial to install (thanks to our new 
>> > awesome package manager).  I'm sympathetic to waiting to delete until 
>> > *after* there is an actual package ready at that time (rather than just 
>> > the promise of one).
>> >
>> > Also, users are generally cautious about performing a major version upgrade.
>> > There's time.
>> >
>> > ~ David Smiley
>> > Apache Lucene/Solr Search Developer
>> > http://www.linkedin.com/in/davidwsmiley
>> >
>> >
>> > On Wed, Aug 12, 2020 at 4:06 AM Jan Høydahl  wrote:
>> >>
>> >> I edited the page to introduce the (super important) Solr TLP split into 
>> >> the roadmap.
>> >> Also added a rough timeframe and a «major theme» for each release above 
>> >> the issue table.
>> >> I added 8.8 and 9.1 as I think it is important to track what gets done 
>> >> just before 9.0 and what can be deferred to after 9.0.
>> >>
>> >> It has been proposed on the list to NOT rip out all deprecations in 9.0, 
>> >> but allow users to upgrade to 9.0 with e.g. SolrCell still available, and 
>> >> then have yet some time to change their processes to adapt to the new way 
>> >> of doing stuff. I like that proposal. Sure, 9.0 will remove lots of 
>> >> deprecated code, but I think it is a mistake to do all of the proposed 
>> >> removals at once. We can spread removals out in 9.x releases, after users 
>>

Re: RoadMap?

2020-08-27 Thread Alexandre Rafalovitch

Well, I have created SOLR-14783 (Remove DIH from 9.0) and am busily
learning magic gradle commands to make that happen without leaving
behind random crumbs. Once that lands, I will do a Jira search for all
still-open DIH tasks and close them, pointing to the said Jira.

So, I guess somebody better -1 the Jira if they really want that one
to stay until ... ? And then read very carefully through SIP-10, of
which this is just the first step.

In general, maybe we can manage so many new features and so much cleanup
in 9 that Solr TLP will look like a great Big Bang moment...

And it will probably take a little longer to achieve that, so the -
effective - deprecation schedule would still be ok.

Regards,
   Alex.

On Thu, 27 Aug 2020 at 18:35, David Smiley  wrote:
>>
>> It has been proposed on the list to NOT rip out all deprecations in 9.0, but 
>> allow users to upgrade to 9.0 with e.g. SolrCell still available, and then 
>> have yet some time to change their processes to adapt to the new way of 
>> doing stuff. I like that proposal. Sure, 9.0 will remove lots of deprecated 
>> code, but I think it is a mistake to do all of the proposed removals at 
>> once. We can spread removals out in 9.x releases, after users have had a few 
>> releases with a choice between old and new and the new alternative is solid.
>
>
> I find it highly depressing that we can't, *in a major release*, manage to 
> get rid of our deprecations -- particularly for code that has a new home and 
> is packaged in a form that is trivial to install (thanks to our new awesome 
> package manager).  I'm sympathetic to waiting to delete until *after* there 
> is an actual package ready at that time (rather than just the promise of one).
>
> Also, users are generally cautious about performing a major version upgrade.
> There's time.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Wed, Aug 12, 2020 at 4:06 AM Jan Høydahl  wrote:
>>
>> I edited the page to introduce the (super important) Solr TLP split into the 
>> roadmap.
>> Also added a rough timeframe and a «major theme» for each release above the 
>> issue table.
>> I added 8.8 and 9.1 as I think it is important to track what gets done just 
>> before 9.0 and what can be deferred to after 9.0.
>>
>> It has been proposed on the list to NOT rip out all deprecations in 9.0, but 
>> allow users to upgrade to 9.0 with e.g. SolrCell still available, and then 
>> have yet some time to change their processes to adapt to the new way of 
>> doing stuff. I like that proposal. Sure, 9.0 will remove lots of deprecated 
>> code, but I think it is a mistake to do all of the proposed removals at 
>> once. We can spread removals out in 9.x releases, after users have had a few 
>> releases with a choice between old and new and the new alternative is solid.
>>
>> Thanks Gus for taking ownership and suggesting a process! Feel free to 
>> rework what I edited into whatever structure you see fit.
>>
>> Jan
>>
>> 11. aug. 2020 kl. 18:51 skrev Gus Heck :
>>
>> I was thinking that level of detail is in the Jira... I don't see any reason 
>> for things to disappear (in fact rejected should go in a rejected list for 
>> future reference.)
>>
>> On Tue, Aug 11, 2020 at 12:04 PM Ilan Ginzburg  wrote:
>>>
>>> Maybe also add “in progress”? So items do not disappear suddenly from the 
>>> page when work really starts on them?
>>>
>>> On Tue 11 Aug 2020 at 17:15, Gus Heck  wrote:

 Cool, since I brought it up, I can volunteer to help manage the page. We 
 should get jira issue links in there wherever possible. Do we want to 
 build an initial list and have some sort of Proposed/Planned workflow so 
 readers can have confidence (or appropriate lack of confidence) in what 
 they see there? Voting on things seems like too much, but maybe folks who
 care can watch the page, and if something is on there for a week without
 objection it can be called accepted? If a discussion starts here it can be 
 marked "Considering" so... something like this:

 4 states: Proposed, Considering, Planned, Rejected

 Workflow like this:
 Proposed ---(no objection 1 wk) --> Planned
 Proposed ---(discussion)--> Considering
 Considering (agreement) --> Planned
 Considering (deferred) ---> Proposed (later release)
 Considering (unsuitable) -> Rejected
 Considering (promoted) ---> Proposed (earlier release)
 Planned (difficulty found) ---> Considering

 Anything in "Considering" should have an active dev list thread, and if it 
 didn't happen on the list it didn't happen :). Any of that (or differences 
 of opinion during Considering) can be overridden by a formal vote, of course.

 -Gus
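Gus's proposed workflow is a small state machine, and writing the arrows down as an explicit transition table makes it easy to see, for instance, that nothing leaves Rejected. A minimal sketch with hypothetical class and method names; it does not model the "a formal vote can override anything" escape hatch.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class RoadmapWorkflow {

    public enum State { PROPOSED, CONSIDERING, PLANNED, REJECTED }

    // Allowed transitions, one entry per arrow in Gus's list.
    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        // no objection for 1 week -> Planned; discussion starts -> Considering
        ALLOWED.put(State.PROPOSED, EnumSet.of(State.PLANNED, State.CONSIDERING));
        // agreement -> Planned; deferred/promoted -> Proposed (other release); unsuitable -> Rejected
        ALLOWED.put(State.CONSIDERING, EnumSet.of(State.PLANNED, State.PROPOSED, State.REJECTED));
        // difficulty found -> back to Considering
        ALLOWED.put(State.PLANNED, EnumSet.of(State.CONSIDERING));
        // Rejected items stay on a rejected list for future reference.
        ALLOWED.put(State.REJECTED, EnumSet.noneOf(State.class));
    }

    public static boolean canMove(State from, State to) {
        return ALLOWED.get(from).contains(to);
    }

    /** Returns the new state, or throws if the move is not part of the workflow. */
    public static State move(State from, State to) {
        if (!canMove(from, to)) {
            throw new IllegalStateException(from + " -> " + to + " is not in the workflow");
        }
        return to;
    }
}
```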

 On Tue, Aug 11, 2020 at 10:29 AM Ishan Chattopadhyaya 
  wrote:
>
> I've created a placeholder

Current workflow for steps before committing changes

2020-08-27 Thread Alexandre Rafalovitch

Hello,

So, what's the current post-changes pre-commit workflow for master (9)?

Do I run gradlew precommit? Does that include actual tests or need
those separately? Do I need to run ant precommit as well?

I am mostly removing things, but need to make sure no dangling
references will cause issues.

Regards,
   Alex.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

SIP-10 Improve Getting Started experience

2020-08-26 Thread Alexandre Rafalovitch

Dear all,

Based on the discussion in SOLR-14726, Slack, and in many other JIRA
issues, I am proposing to spearhead the Big Cleanup focused on
improving the getting-started experience, targeting Solr 9 only. This is
mostly about examples, but it also touches quite heavily on default
configuration files and - less heavily - on the directory layout
issues.

As this is bigger than any of my previous contributions to the project
and is quite cross-cutting, I've made a Solr Improvement Proposal.

https://cwiki.apache.org/confluence/display/SOLR/SIP-10+Improve+Getting+Started+experience

Regards,
Alex.
Ps. As a side note, in many of the discussions, people say "I would
prefer if we did X", even though "X" is not yet complete, fully
tested, or even applicable to the standalone/cloud/whatever aspect of
the discussion. Those preferences are still welcome, but I am begging
for explicit qualification of their current availability for the
critical path. This proposal is an attempt to minimize dependence on
actual new code features.
Pps. I am also very aware that the proposal implies lots of
semi-related changes and I will potentially end up stepping on
people's toes. And I don't yet know the proper flow of work to do this
(individual Jiras, combined Jiras, separate branch(es), etc). I would
really need advice or a mentor or even a group of mentors to make this
happen in a community-minded fashion.

Name for the directory above solr.home?

2020-08-25 Thread Alexandre Rafalovitch

Hello,

What do we call the directory above the solr.home?
E.g.
- "schemaless" in example/schemaless/solr/gettingstarted
- "node1" in example/cloud/node1/solr/gettingstarted_shard1_replica_n2
- "server" in server/solr/book/conf

In solr.in.cmd we "may be" calling it "solr start dir".
In bin/solr, we call it SOLR_SERVER_DIR.
In install_solr_service.sh, we sort-of call it SOLR_VAR_DIR ("Directory
for live/writable Solr files"), which is not quite the same because
apparently pid files go there, while for the distribution they seem to go
into bin.

I guess it mostly matters when you have multiple Solr instances
running on the same machine and you need to separate log locations,
etc.

Regards,
  Alex.
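One way to see what the three examples have in common: each path ends in a core's instance directory, its parent is solr.home, and the name being asked about is one level above that. A minimal sketch assuming that layout (cores as direct children of solr.home); the helper name is hypothetical, and for `server/solr/book/conf` the core instance directory is `server/solr/book`.

```python
from pathlib import PurePosixPath


def dir_above_solr_home(core_instance_dir: str) -> str:
    """Return the name of the directory one level above solr.home, assuming
    the given core instance directory sits directly inside solr.home."""
    solr_home = PurePosixPath(core_instance_dir).parent  # e.g. example/schemaless/solr
    return solr_home.parent.name                         # the unnamed directory
```

Applied to the three examples, this yields "schemaless", "node1" and "server" respectively.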

Re: First Issue Label

2020-08-05 Thread Alexandre Rafalovitch

I think we already had some. But nobody really jumped on it. Still, if
somebody wants to monitor it, it can be restarted.

Regards,
Alex

On Wed., Aug. 5, 2020, 11:04 p.m. Marcus Eagan, 
wrote:

> Community,
>
> In the vein of being more developer friendly, I think we should create a first
> issue label. In my experience, that label has been a great way to get
> newcomers involved in projects new to them.
>
> I've seen it in a number of Apache projects that I have contributed to,
> proprietary projects, and in CNCF projects.
>
> Please let me know what you think about a first issue label to make it
> easier for people not necessarily in the community looking to join to do so
> in the future.
>
> Thanks,
> --
> Marcus Eagan
>
>

Re: Deprecate Schemaless Mode?

2020-08-05 Thread Alexandre Rafalovitch

As David said, I did a lot of breaking apart of default configuration
and it is a bit of a mess in there. (if anybody wants to review the
breakdown for Solr 6:
https://www.slideshare.net/arafalov/rebuilding-solr-6-examples-layer-by-layer-lucenesolrrevolution-2016,
slide 19 is the kicker)

I certainly agree with others that said that it is very hard for a
user to figure out what a 'production' schema should look like and
they just keep the one we give, including the schemaless part and all.
This seems to crop up on the User list over and over again.

My +1 is on SOLR-11741 (Offline training mode) and on it being an
explicit configuration to let users define their own
chain/type-widening sequence. So, the user would throw a subset (or
all) of the data at a separate end-point and receive back the
suggested schema addition commands to support the data. Perhaps this
learning mode should not live in a default schema either, but in a
kitchen sink one that also has all the extra type definitions
(separate discussion, especially since DIH and 5 DIH schemas are going
away as well).

Regards,
   Alex.
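The offline "guess, then review" idea (SOLR-11741, and Jan's /schema-guess suggestion further down this thread) can be sketched as: guess the narrowest kind per value, widen per field across all sample docs, and emit Schema API `add-field` commands for a human to review. This is a hypothetical illustration only: the guessing rules, the widening order, and the kind-to-type mapping are assumptions, although `plong`, `pdouble`, `boolean` and `text_general` are field type names in the _default configSet.

```python
import re

# Hypothetical guessing rules for illustration; Solr's actual data-driven
# update chain is configured in solrconfig.xml and differs in detail.
_LONG = re.compile(r"[+-]?\d+")
_DOUBLE = re.compile(r"[+-]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][+-]?\d+)?")

# Guessed kinds mapped to _default field type names (assumed mapping).
SOLR_TYPE = {"boolean": "boolean", "long": "plong",
             "double": "pdouble", "string": "text_general"}


def guess_kind(value) -> str:
    """Return the narrowest kind that can represent a single value."""
    text = str(value).strip()
    if text.lower() in ("true", "false"):
        return "boolean"
    if _LONG.fullmatch(text):
        return "long"
    if _DOUBLE.fullmatch(text):
        return "double"
    return "string"


def widen(a: str, b: str) -> str:
    """Combine two kinds: long + double widens to double, any other mix to string."""
    if a == b:
        return a
    if {a, b} == {"long", "double"}:
        return "double"
    return "string"


def suggest_add_field_commands(docs):
    """Learn one kind per field from sample docs and return Schema API
    add-field commands for a human to review before applying them."""
    kinds = {}
    for doc in docs:
        for field, value in doc.items():
            kind = guess_kind(value)
            kinds[field] = widen(kinds[field], kind) if field in kinds else kind
    return [{"add-field": {"name": field, "type": SOLR_TYPE[kind]}}
            for field, kind in sorted(kinds.items())]
```

For example, sample docs where `price` is "10" in one and "10.5" in another produce a `pdouble` field, because long and double widen to double; nothing is written to the schema until the user applies the commands.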

On Wed, 5 Aug 2020 at 01:01, David Smiley  wrote:
>
> Thanks for starting this thread Marcus!  For a historical note, the current 
> _default configSet being "data driven" (aka "schemaless", a worse name) is 
> largely because of SOLR-10272. Maybe I should have fought harder against it 
> then.  I threatened to veto but I was placated by it being easily disabled.  
> And it's true; you can disable it, and there are some loud warnings on the 
> CLI so... yeah.
>
> I think my views most align with Gus.  The name "default" is suggestive of 
> good settings you ought to change if you know what you are doing.  Perhaps 
> there simply can be no reasonable "default" for a search platform.  There 
> might be "basic minimal blah blah" etc. that _is_ the default choice if you 
> don't specify it but naming the configSet itself as "default" gives too much 
> blessing to it.  I've seen too many configs with tons of stuff that were 
> there because it was inherited, and then it's hard to guess what's _actually_ 
> being used.  Alexandre Rafalovitch had done some great work in figuring out how 
> to minimize configs.  There's more to do there.
>
> I'd be happy to see basically any change though; even a simple change from 
> opt-out to opt-in to "data driven" URPs.  I don't like the status quo.
>
> BTW I've also seen people try to take "bin/solr -e cloud" to production :-(   
> "Hey look, this is how a tutorial told me to run SolrCloud" (so the logic 
> goes).
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Tue, Aug 4, 2020 at 2:24 PM Jan Høydahl  wrote:
>>
>> Learning mode won’t work if you have 10 existing collections and want to 
>> create #11. We could rather have a SchemaLearningUpdateHandler so people 
>> could explicitly post documents to, say, /schema-guess to modify the schema. 
>> We could even have this implicit. Then the _default config would have just 
>> _root_, id and a few more, and if you want guessing you first send a number 
>> of docs to the /schema-guess endpoint and then inspect what you got in the 
>> schema browser. That handler could support a param =true which would wipe the 
>> schema to start guessing from scratch.
>>
>> Jan Høydahl
>>
>> 4. aug. 2020 kl. 15:30 skrev Gus Heck :
>>
>> 
>> Interesting read. Might have changed now that we have authentication 
>> capabilities... but let's not thread jack :)
>>
>> On Tue, Aug 4, 2020 at 8:28 AM Erick Erickson  
>> wrote:
>>>
>>> Having the admin UI allow uploads may not be secure. When I had a similar 
>>> idea a long time ago it got shot down, see the discussion at: 
>>> https://issues.apache.org/jira/browse/SOLR-5287.
>>>
>>> I _think_ this is a different issue if the configs have to be residing on 
>>> the system, not coming in from outside, just FYI...
>>>
>>> > On Aug 3, 2020, at 7:03 PM, Gus Heck  wrote:
>>> >
>>> >
>>> >
>>> > On Mon, Aug 3, 2020 at 5:03 PM Erick Erickson  
>>> > wrote:
>>> > Gus’s point about implementing something before removing it is well 
>>> > taken, but we can deprecate it immediately without removing it. Gus’s 
>>> > point about dynamic fields not being found until later in the cycle is 
>>> > well taken, but not enough to persuade me.
>>> >
>>> > Fair enough :)
>>> >
>>> > I’m not enthusiastic about multiple getting started schemas. The whole 
>>> > motivation behind schemaless is that the user doesn’t need to know about 
>>> > schemas to get started. By providing multiple “getting started” schemas 
>>> > we require them to become aware of schemas again.
>>> >
>>> > Here's my theory (which may or may not be persuasive :) )
>>> >
>>> > My thinking in that suggestion is that the majority of the problem is due 
>>> > to the fact that people new to a technology will tend to latch onto the 
>>> > defaults that come with something as being something that should be held 
>>> > onto until you

Re: [VOTE] Solr to become a top-level Apache project (TLP)

2020-05-18 Thread Alexandre Rafalovitch

+1 (committer)

Regards,
   Alex.

On Tue, 12 May 2020 at 03:37, Dawid Weiss  wrote:
>
> Dear Lucene and Solr developers!
>
> According to an earlier [DISCUSS] thread on the dev list [2], I am
> calling for a vote on the proposal to make Solr a top-level Apache
> project (TLP) and separate Lucene and Solr development into two
> independent entities.

Re: Solr Admin UI Refresh 2020

2020-04-06 Thread Alexandre Rafalovitch

I always wondered if Solr could benefit from Language Server Protocol:
https://microsoft.github.io/language-server-protocol/ , at least for
the Query screen. That would allow us to integrate with a bunch
of tools automatically rather than having to build a great query
implementation ourselves.

But I don't know how feasible or relevant this is, so mostly just
throwing it out there in case others also thought of it and/or if it
will seem promising as a line of thought.

Regards,
   Alex.

On Mon, 6 Apr 2020 at 10:53, Jan Høydahl  wrote:
>
> Thanks for kickstarting this and bringing some fresh blood and enthusiasm :)
>

Re: Welcome Eric Pugh as a Lucene/Solr committer

2020-04-06 Thread Alexandre Rafalovitch

Congratulations. That's awesome news.

Regards,
 Alex

On Mon., Apr. 6, 2020, 8:21 a.m. Jan Høydahl,  wrote:

> Hi all,
>
> Please join me in welcoming Eric Pugh as the latest Lucene/Solr committer!
>
> Eric has been part of the Solr community for over a decade, as a code
> contributor, book author, company founder, blogger and mailing list
> contributor! We look forward to his future contributions!
>
> Congratulations and welcome! It is a tradition to introduce yourself with
> a brief bio, Eric.
>
> Jan Høydahl
>
>

Re: [VOTE] Release Lucene/Solr 8.4.0 RC1

2019-12-18 Thread Alexandre Rafalovitch

If there is a respin, maybe the change notes (for Solr) can be fixed too:
1) The first entry under Upgrade Notes is missing a JIRA number and
probably does not need to include the full class path.
2) The Optimization section can probably be skipped, as it just has a
"no changes" entry.
3) A couple of JIRAs are missing contributor attributions (is that
compulsory? I don't know.)

Also, one entry under Lucene's "Build" section is missing a JIRA number.

Regards,
   Alex.

On Wed, 18 Dec 2019 at 14:20, Noble Paul  wrote:
>
> I'm so sorry to come at this moment and tell you that one of the
> critical bug fixes I made to master was not ported to 8x and 8.4.
>
> This breaks a critical functionality of package loading feature.
> Is it possible to do a respin?
> https://github.com/apache/lucene-solr/commit/f98555854cbdb9d396c34a93fde9c1610df74882#diff-04f78a3e0960a743f6b4267e2d0f7f49
>
>
> On Thu, Dec 19, 2019 at 6:11 AM Adrien Grand  wrote:
> >
> > Hi Gus,
> >
> > If the test is flaky, would you mind annotating it with "@BadApple"? It 
> > will make sure this test doesn't get in the way of building or voting on a 
> > release until the test is fixed.
> >
> > On Wed, Dec 18, 2019 at 3:52 PM Gus Heck  wrote:
> >>
> >> Hi Mikhail, this is a known flaky test (mine, it's on my to-do list). 
> >> It seems to have got slightly more flaky recently, possibly because other 
> >> tests have got better at using up CPU? The flake here is that the code in 
> >> the test didn't manage to wait long enough before running the assertion. 
> >> This failure does not represent an issue with anything other than the test.
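The failure mode described here (asserting before the expected state has been reached) is commonly addressed by polling with a deadline instead of sleeping for a fixed time. A language-neutral sketch in Python, since the real test is Java: `wait_for` is a hypothetical helper, not part of Solr's test framework, and the default timeout and interval are arbitrary.

```python
import time


def wait_for(condition, timeout_s=10.0, interval_s=0.1):
    """Poll condition() until it returns True or timeout_s elapses.

    Returns True as soon as the condition holds, False on timeout, so the
    caller can assert on the result with a meaningful message.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

In a test, an assertion like `expected:<10> but was:<9>` would become something like `assert wait_for(lambda: doc_count() == 10, timeout_s=30)`, where `doc_count` stands in for whatever query the test actually runs.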
> >>
> >> On Wed, Dec 18, 2019 at 1:06 AM Mikhail Khludnev  wrote:
> >>>
> >>> I've got
> >>>
> >>>   2> NOTE: reproduce with: ant test -Dtestcase=DimensionalRoutedAliasUpdateProcessorTest
> >>>   -Dtests.method=testTimeCat -Dtests.seed=D05700662AF3B95B
> >>>   -Dtests.locale=en-GB -Dtests.timezone=Australia/North
> >>>   -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
> >>>
> >>> [00:42:59.083] FAILURE 29.4s J1 | DimensionalRoutedAliasUpdateProcessorTest.testTimeCat <<<
> >>>    > Throwable #1: java.lang.AssertionError: expected:<10> but was:<9>
> >>>    >    at __randomizedtesting.SeedInfo.seed([D05700662AF3B95B:E9AF41D56AD2F530]:0)
> >>>    >    at org.junit.Assert.fail(Assert.java:88)
> >>>    >    at org.junit.Assert.failNotEquals(Assert.java:834)
> >>>    >    at org.junit.Assert.assertEquals(Assert.java:645)
> >>>    >    at org.junit.Assert.assertEquals(Assert.java:631)
> >>>    >    at org.apache.solr.update.processor.DimensionalRoutedAliasUpdateProcessorTest.assertCatTimeInvariants(DimensionalRoutedAliasUpdateProcessorTest.java:678)
> >>>    >    at org.apache.solr.update.processor.DimensionalRoutedAliasUpdateProcessorTest.testTimeCat(DimensionalRoutedAliasUpdateProcessorTest.java:196)
> >>>
> >>>
> >>> which didn't reproduce for me when I retried.
> >>>
> >>> +0
> >>>
> >>> On Tue, Dec 17, 2019 at 9:23 PM Adrien Grand  wrote:
> 
>  Please vote for release candidate 1 for Lucene/Solr 8.4.0
> 
>  The artifacts can be downloaded from:
>  https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.4.0-RC1-revc91d36f50efb62e55bcc3a1adc0442b207018670
> 
>  You can run the smoke tester directly with this command:
> 
>  python3 -u dev-tools/scripts/smokeTestRelease.py \
>  https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.4.0-RC1-revc91d36f50efb62e55bcc3a1adc0442b207018670
> 
>  The vote will be open for at least 3 working days, i.e. until 2019-12-20 
>  19:00 UTC.
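The "at least 3 working days" deadline can be checked with a simple weekday counter. A minimal sketch: it skips Saturdays and Sundays only, ignores public holidays and time of day, and takes the start date from the quoted "Tue, Dec 17, 2019" header.

```python
from datetime import date, timedelta


def add_working_days(start: date, days: int) -> date:
    """Return the date `days` weekdays after `start`, skipping Sat/Sun.

    Public holidays and time of day are deliberately ignored.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 .. Friday=4
            remaining -= 1
    return current
```

`add_working_days(date(2019, 12, 17), 3)` gives `date(2019, 12, 20)`, which matches the stated close date; a vote opened on a Friday would instead close on the following Wednesday.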
> 
>  [ ] +1  approve
>  [ ] +0  no opinion
>  [ ] -1  disapprove (and reason why)
> 
>  Here is my +1
> 
>  --
>  Adrien
> >>>
> >>>
> >>>
> >>> --
> >>> Sincerely yours
> >>> Mikhail Khludnev
> >>
> >>
> >>
> >> --
> >> http://www.needhamsoftware.com (work)
> >> http://www.the111shift.com (play)
> >
> >
> >
> > --
> > Adrien
>
>
>
> --
> -
> Noble Paul
>

(Solr) Can we include an example dataset under CC license?

2019-11-02 Thread Alexandre Rafalovitch

Hi,

Somebody more familiar with license situation may be able to help me here.

I think it would be nice if Solr shipped with a more comprehensive
example. I have used and love the generated names from
https://www.fakenamegenerator.com/ . They have good fields, but also
support different languages and scripts (Chinese, Russian, Arabic),
which would be good to showcase Solr's language handling.

They have a license for the generated content:
https://www.fakenamegenerator.com/license.php . It is dual licensed
under GPLv3 or Creative Commons Attribution-Share Alike 3.0 United
States.

Can we ship the content under either of those licenses? Or does it
absolutely have to be public domain or Apache license?

Regards,
   Alex.

Re: Duplicate jira emails for watched issues?

2019-11-01 Thread Alexandre Rafalovitch

Yes, I love the watched feature as well and filter all watched emails
into a separate folder. But the filter stopped working, and I thought
it was because LUCENE's mail template was somehow different from SOLR's,
so my filters were not triggering.

It turned out that I was just getting LUCENE notifications in those
days, and the cause was actually different: enabling LDAP switched the
target email address from my personal one to the Apache one, so my
filter was being bypassed.

Regards,
   Alex.

On Sat, 2 Nov 2019 at 00:26, Uwe Schindler  wrote:
>
> That’s exactly the same for me.
>
>
>
> All issues go (by filter) to the folder of the issues mailing list. Those I 
> am directly involved or those I explicitly watch go to my personal inbox. So 
> the stuff important to me is getting correct attention.
>
>
>
> Uwe
>
>
>
> -
>
> Uwe Schindler
>
> Achterdiek 19, D-28357 Bremen
>
> https://www.thetaphi.de
>
> eMail: u...@thetaphi.de
>
>
>
> From: David Smiley 
> Sent: Friday, November 1, 2019 2:21 PM
> To: Solr/Lucene Dev 
> Subject: Re: Duplicate jira emails for watched issues?
>
>
>
> Personally, I like this behavior very much.  When I "watch" an issue in JIRA, 
> I want to ensure I am more aware of this issue than some random JIRA I have 
> never even seen before.  I also like auto-watch.  I can manually un-watch an 
> issue if I choose to step away from an issue; though it's very rare for me to 
> take this step.
>
>
> ~ David Smiley
>
> Apache Lucene/Solr Search Developer
>
> http://www.linkedin.com/in/davidwsmiley
>
>
>
>
>
> On Wed, Oct 30, 2019 at 3:08 PM Alexandre Rafalovitch  
> wrote:
>
> Hi,
>
> Is anybody else getting duplicate JIRA notification emails for the
> issues they are watching (but possibly have not commented on)?
>
> I seem to get two emails: one goes to iss...@apache.org and is
> marked as part of the issues list (and I guess I am BCCed), and
> another goes from the commenter directly to me with no list markings.
>
> This _may_ only be happening for LUCENE Jiras.
>
> This second email makes it very hard to create filtering rules, and so
> every comment on those issues brings the conversation back to my inbox.
>
> Does anybody see/deal with this?
>
> Thanks,
>   Alex.
>

Duplicate jira emails for watched issues?

2019-10-30 Thread Alexandre Rafalovitch

Hi,

Is anybody else getting duplicate JIRA notification emails for the
issues they are watching (but possibly have not commented on)?

I seem to get two emails: one goes to iss...@apache.org and is
marked as part of the issues list (and I guess I am BCCed), and
another goes from the commenter directly to me with no list markings.

This _may_ only be happening for LUCENE Jiras.

This second email makes it very hard to create filtering rules, and so
every comment on those issues brings the conversation back to my inbox.

Does anybody see/deal with this?

Thanks,
  Alex.

Re: Which is the most convenient technique to convert NSF to PST file?

2019-10-29 Thread Alexandre Rafalovitch

This is spam, right? Designed to show in the web archives. We should -
at least - block the user.

On the other hand, if somebody was actually interested in extracting
information from Lotus Notes, I do have an open-source tool for that.
https://github.com/arafalov/Lotus-Notes-Exporter

On Tue, 29 Oct 2019 at 21:29, kellyjohnson1  wrote:
>
>  As per the suggestions and experience of the majority of Lotus Notes users,
> the most convenient technique to  convert NSF to PST

Re: [VOTE] Release Lucene/Solr 8.3.0 RC1

2019-10-21 Thread Alexandre Rafalovitch

Super minor documentation note:
In the HTML changes file, the parsing of (I guess) the SOLR-12368 entry
makes the author information appear as a separate bullet point.
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.3.0-RC1-revd796eca84dbabe3ae9b3c27afc01ef3bee35acb1/solr/changes/Changes.html#v8.3.0.improvements

Regards,
   Alex.

On Mon, 21 Oct 2019 at 13:51, Ishan Chattopadhyaya
 wrote:
>
> Please vote for release candidate 1 for Lucene/Solr 8.3.0
>
> The artifacts can be downloaded from:
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.3.0-RC1-revd796eca84dbabe3ae9b3c27afc01ef3bee35acb1
>
> You can run the smoke tester directly with this command:
>
> python3 -u dev-tools/scripts/smokeTestRelease.py \
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.3.0-RC1-revd796eca84dbabe3ae9b3c27afc01ef3bee35acb1
>
> The vote will be open for at least 3 working days, i.e. until
> 2019-10-24 18:00 UTC.
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
> Here is my +1
>

Access to SOLR-13158

2019-10-06 Thread Alexandre Rafalovitch

Hi,

I am unable to see SOLR-13158 (security issue). I am guessing it was
supposed to be released in 8.1.2 (as per the release notes), which became
8.2 and is now released.

I can't tell whether I cannot see it because:
1) its permissions were not fixed due to the 8.1.2/8.2.0 confusion,
2) it is protected and only the PMC can see it (so by design), or
3) it is protected and a committer should see it, but my LDAP link is
messed up (which may be the case; I can't tell).

Hopefully it is 2) and no actions are required. Maybe somebody with
higher/different privileges can resolve this puzzle for me.

Regards,
   Alex.

Re: Rethinking how we publish the Solr Ref Guide

2019-09-18 Thread Alexandre Rafalovitch

+1 on the suggested process. +1 on PDF just being too big, though it
is fun to quote the page count.

An additional idea piggy-backing on this is that in step 4, we could
also automatically build a local example/index that links to the
public version. So, people could search the guide locally and that
would link to the known public URLs for the real HTML.

Regards,
   Alex.

On Wed, 18 Sep 2019 at 12:07, Cassandra Targett  wrote:
>
> The delays getting the Ref Guides for 8.x releases out have caused me to 
> think a bit about the Ref Guide publication process. It seems clear others 
> aren't able to pick up the process when I can't and I’m sure there are a 
> million individual reasons for that so I don't intend to shame or blame 
> anyone, but a process that relies on a single person in a community our size 
> isn’t a very good one. And, if I think about _why_ we have a process like we 
> have today [1], I’m not sure it makes a ton of sense any longer.
>
> So, I propose making some radical changes. My ideas here require a shift from 
> thinking of the Guide as a release artifact like the binaries to thinking of 
> it similar to how we treat javadocs. These ideas also allow us to finally get 
> to the goal of unifying these currently separate processes.
>
> 1. Make the HTML version the “official” version.
> -- What to do with the PDF is TBD after that decision, see below.
>
> 2. Stop voting for the Ref Guide release as a separate VOTE thread.
>
> 3. Jenkins jobs are already created when a release branch is cut. We can 
> change these jobs so they always automatically push the HTML version to the 
> website, although before the version binaries are released the pages would 
> still have a DRAFT watermark across them [2].
> -- By ASF policy, release artifacts must be produced on a machine controlled 
> by the committer. However, the point here is that the Ref Guide would no 
> longer be a release artifact, so I think that gets us around that rule? If 
> anyone sees this differently that would change things here a little bit.
> -- I know other projects have similar Jenkins->publish workflows, but I’m not 
> sure exactly what’s involved in setting it up. Might need to discuss with the 
> Infra team and other changes may be required depending.
> -- The goal, though, is to automate this as much as possible.
>
> 4. When a VOTE has passed, a simple step could be added to the release 
> process to run a Jenkins job to regenerate the HTML pages without the current 
> DRAFT watermark and automatically push them to the production website.
> -- Since we usually leave branch jobs configured-but-disabled for a little 
> bit in case a patch release is necessary, typos or other things fixed 
> “post-release" could be fixed and the Ref Guide Jenkins job would just push 
> new commits to the branch to the live production site.
> -- These updates would be done without the DRAFT status, since the Ref Guide 
> in that branch is now considered “live”.
> -- This part of the idea would allow us to more easily backport any docs 
> changes and re-publish the Guide without having to do a new vote, which we 
> would need today. This might be rare, but it is a question that comes up from 
> time-to-time. I feel that if the publication process was easier, we might fix 
> things retroactively more often.
>
> 5. Some tooling would be nice to automate parts of the copy edit process I do 
> today, so it can be run by any committer at any point in the process. This 
> can follow on later. I have a list.
>
> So, that's the idea in a nutshell - thoughts?
>
> [1] Current release process: 
> https://lucene.apache.org/solr/guide/8_1/how-to-contribute.html#ref-guide-publication-process
> [2] Example of DRAFT watermark (it's all CSS, it could look however we want 
> it to): 
> https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-8.x/javadoc/
>
> PS, As for the PDF, I believe there are mixed opinions about it. Some rely on 
> it, others only use it when they need it (portability, easier to search, 
> etc.), others don’t ever look at it. The fact is it’s over 1600 pages, and 
> that’s just really too big. Joel is about to add a significant number of new 
> images as part of a new "visual" guide (see SOLR-13105), which will make it 
> even longer and bigger. Trying to split it to make it smaller would bring in 
> a lot of complexity with how to deal with links between pages that end up in 
> different PDF files (believe me, I've done it before). And finally, it holds 
> us back a little - some things we could do with HTML/JS can't be done in PDF. 
> I’d be fine continuing to produce it, just not as our main artifact. We could 
> have Jenkins push that also to the SVN dist/dev repo where it currently lives.
>
> Cassandra
>

Re: [POLL] Should notifications of NEW Jira issues go to dev@?

2019-09-18 Thread Alexandre Rafalovitch

Either one of "Created" notifications works for me and would be very nice:
> [X] A mail to dev@ for every new JIRA
> [X] One daily digest mail per day with a list of new JIRAs

I sort of had that workflow already with GMail rules, so it would be
nice to have it more explicitly. And, the subscribed issues still
arrive directly anyway, so nothing needs to be changed there.

Regards,
   Alex.

On Wed, 18 Sep 2019 at 05:10, Jan Høydahl  wrote:
>
> Hi,
>
> The transition to issues@ and builds@ lists (LUCENE-8951) is now completed, 
> and I already enjoy a quieter dev@ folder!
>
> I'd like to check with all of you whether there is interest in getting 
> notified here at dev@ about NEW Jira issues being created. Currently there is an 
> average of 4 new issues per day. The main motivation for this would be for 
> those who want to follow new development but not all the details/discussions. 
> We could easily configure JIRA to send all [Created] mails to dev@ in 
> addition to issues@. Or we could try to have one daily digest mail of new 
> issues, whether that's a small bot or a feature in JIRA (don't know). Let's 
> do a poll:
>
> [ ] Leave it as is - I like quiet
> [ ] A mail to dev@ for every new JIRA
> [ ] One daily digest mail per day with a list of new JIRAs
> [ ] Other (explain): ___
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
>

[jira] [Commented] (SOLR-13593) Allow to look-up analyzer components by their SPI names in field type configuration

2019-08-29 Thread Alexandre Rafalovitch (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918841#comment-16918841
 ] 

Alexandre Rafalovitch commented on SOLR-13593:
--

Do we have a list of all those *names* somewhere, accessible to a non-developer?
How would a user find additional documentation on the properties for those names,
or compose their own chain?

Previously, they searched/checked Javadoc by the class name. What would they do 
now?

> Allow to look-up analyzer components by their SPI names in field type 
> configuration
> ---
>
> Key: SOLR-13593
> URL: https://issues.apache.org/jira/browse/SOLR-13593
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: SOLR-13593-add-spi-ReversedWildcardFilterFactory.patch, 
> SOLR-13593.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Now each analysis factory has an explicitly documented SPI name, which is stored 
> in the static "NAME" field (LUCENE-8778).
>  Solr uses factories' simple class name in schema definition (like 
> class="solr.WhitespaceTokenizerFactory"), but we should be able to also use 
> more concise SPI names (like name="whitespace").
> e.g.:
> {code:xml}
> 
>   
> 
>  />
> 
>   
> 
> {code}
> would be
> {code:xml}
> 
>   
> 
> 
> 
>   
> 
> {code}
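Alex's question (how to get from a short name back to something searchable in the Javadoc) can be approached by building a reverse index from factory class names. A rough sketch, assuming the older convention where the name was derived by stripping the package and the factory suffix and lower-casing the rest; per the issue text, names now come from each factory's static `NAME` field and may differ from this derivation. In Java, the factory base classes already expose static lookups such as `TokenizerFactory.availableTokenizers()` and `TokenFilterFactory.availableTokenFilters()`.

```python
# Assumed suffix list; the longer "TokenFilterFactory" and "CharFilterFactory"
# are checked before the generic "FilterFactory".
_SUFFIXES = ("TokenizerFactory", "TokenFilterFactory", "CharFilterFactory", "FilterFactory")


def legacy_spi_name(class_name: str) -> str:
    """Approximate the pre-LUCENE-8778 naming convention (not authoritative)."""
    simple = class_name.rsplit(".", 1)[-1]  # drop "solr." or the Java package
    for suffix in _SUFFIXES:
        if simple.endswith(suffix) and len(simple) > len(suffix):
            return simple[: -len(suffix)].lower()
    raise ValueError(f"not an analysis factory class name: {class_name}")


def name_to_class(class_names):
    """Map each short name back to the class name a user would look up in the Javadoc."""
    return {legacy_spi_name(cn): cn for cn in class_names}
```

For example, `solr.WhitespaceTokenizerFactory` maps to `whitespace`, the same short name used in the issue description.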



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

[jira] [Closed] (SOLR-9894) Tokenizer work randomly

2019-08-14 Thread Alexandre Rafalovitch (JIRA)



 [ 
https://issues.apache.org/jira/browse/SOLR-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Rafalovitch closed SOLR-9894.
---

> Tokenizer work randomly
> ---
>
> Key: SOLR-9894
> URL: https://issues.apache.org/jira/browse/SOLR-9894
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 6.2.1
> Environment: solrcloud 6.2.1(3 solr nodes)
> OS:linux
> RAM:8G
>Reporter: 王海涛
>Priority: Critical
>  Labels: patch
> Attachments: step1.png, step2.png, step3.png, step4.png
>
>
> My schema.xml has a fieldType as follows:
> 
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false"/>
>class="org.wltea.pinyin.solr5.PinyinTokenFilterFactory" pinyinAll="true" 
> minTermLength="2"/> 
>   
>   
>   
>class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true"/>
>  
>   
>   
> Note:
>   the index tokenizer has useSmart="false"
>   the query tokenizer has useSmart="true"
> But when I send a query request with the q parameter,
> the query tokenizer sometimes uses useSmart=true
> and sometimes useSmart=false.
> That is a serious problem!
> I suspect the problem is caused by a tokenizer cache:
> when I query, the tokenizer should use useSmart=true,
> but it has cached the wrong tokenizer, the one created by the IndexWriter 
> with useSmart=false.
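The reporter's hypothesis is that a cache shared by index-time and query-time analysis returns a tokenizer built with the other side's settings. The minimal Python sketch below shows that failure mode and the usual fix, which is to key the cache on the full configuration rather than the class alone. `Tokenizer` and both cache functions are hypothetical stand-ins. This is not IK Analyzer's or Solr's code, and the cause of this particular report was never confirmed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tokenizer:
    """Stand-in for a tokenizer whose output depends on a use_smart flag."""
    use_smart: bool


_cache_by_class = {}
_cache_by_config = {}


def get_tokenizer_buggy(cls_name: str, use_smart: bool) -> Tokenizer:
    # Keyed by class name only: whichever side (index or query) asks first
    # decides use_smart for every later caller.
    if cls_name not in _cache_by_class:
        _cache_by_class[cls_name] = Tokenizer(use_smart)
    return _cache_by_class[cls_name]


def get_tokenizer_fixed(cls_name: str, use_smart: bool) -> Tokenizer:
    # Keyed by class name plus every configuration argument.
    key = (cls_name, use_smart)
    if key not in _cache_by_config:
        _cache_by_config[key] = Tokenizer(use_smart)
    return _cache_by_config[key]


IK = "org.wltea.analyzer.lucene.IKTokenizerFactory"
get_tokenizer_buggy(IK, use_smart=False)         # indexing creates it first
print(get_tokenizer_buggy(IK, use_smart=True))   # Tokenizer(use_smart=False): wrong
print(get_tokenizer_fixed(IK, use_smart=True))   # Tokenizer(use_smart=True)
```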




Re: Separate dev mailing list for automated mails?

2019-08-08 Thread Alexandre Rafalovitch

I apply the following (gmail) rules, just in case it helps somebody.
With this combination, I am able to track human conversations
reasonably well.

Human conversation:
Matches: from:(-g...@apache.org) subject:(-[jira]) list:
Do this: Skip Inbox, Apply label "ML/Lucene-dev"

All JIRA issues, regardless of other filters
Matches: subject:([jira] {SOLR- LUCENE-}) list:"dev.lucene.apache.org"
Do this: Skip Inbox, Apply label "ML/Lucene-jira", Never send it to Spam

New JIRA issues (that I check to see if I want to track/comment before
I remove the label)
Matches: subject:("[Created]") list:()
Do this: Skip Inbox, Apply label "ML/Lucene-Jira-Interesting", Never
send it to Spam

Updates on JIRA issues from me (I already know them)
Matches: from:(Alexandre Rafalovitch (JIRA) )
Do this: Skip Inbox, Mark as read, Star it, Apply label "Solr-Jiras"

All JIRA issues I am involved in or marked to track
Matches: from:(j...@apache.org) to:(arafa...@gmail.com)
Do this: Skip Inbox, Apply label "Solr-Jiras"

Delete JENKINS stuff, as I am currently not contributing
Matches: subject:([JENKINS]) list:()
Do this: Delete it

Git emails that I am not really tracking right now, but do keep
Matches: from:(g...@apache.org) list:()
Do this: Skip Inbox, Mark as read, Apply label "ML/Lucene-GitBox",
Never send it to Spam

Moderation emails I help with
Matches: subject:(MODERATE for solr-u...@lucene.apache.org)
Do this: Skip Inbox, Apply label "Solr-Moderate"

Matches: list:""
Do this: Skip Inbox, Apply label "ML/SolrUsers"
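For readers who want the interplay between these rules spelled out, the minimal Python sketch below approximates the ones that can be expressed without the elided addresses. It is hypothetical in several ways. The sender address is a placeholder because the real one is elided in the archive. Every rule is assumed to apply to the dev list, since several `list:` values are also elided. Matching is reduced to case-sensitive substring tests, which is much cruder than Gmail's search syntax.

```python
from dataclasses import dataclass, field

# Placeholder address: the real one is elided in the archived message.
GIT_SENDER = "git-notifications@example.org"    # hypothetical
DEV_LIST = "dev.lucene.apache.org"


@dataclass
class Message:
    sender: str
    subject: str
    list_id: str = ""
    labels: set = field(default_factory=set)
    deleted: bool = False


def apply_filters(msg: Message) -> Message:
    """Apply every matching rule in turn; as with Gmail filters, labels accumulate.

    Rules whose criteria are elided in the archive (own address, users list)
    are left out of this sketch.
    """
    if msg.list_id != DEV_LIST:
        return msg
    is_jira = "[jira]" in msg.subject
    if msg.sender != GIT_SENDER and not is_jira:
        msg.labels.add("ML/Lucene-dev")               # human conversation
    if is_jira and ("SOLR-" in msg.subject or "LUCENE-" in msg.subject):
        msg.labels.add("ML/Lucene-jira")              # all JIRA issues
    if "[Created]" in msg.subject:
        msg.labels.add("ML/Lucene-Jira-Interesting")  # new issues to triage
    if "[JENKINS]" in msg.subject:
        msg.deleted = True                            # CI mail is dropped
    if msg.sender == GIT_SENDER:
        msg.labels.add("ML/Lucene-GitBox")            # git mail, kept for reference
    return msg
```

Because several rules can match the same message, a new JIRA issue ends up with both the general JIRA label and the "Interesting" label. This matches the workflow of removing the second label once the issue has been triaged.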

Regards,
Alex.

On Wed, 7 Aug 2019 at 07:54, David Smiley  wrote:
>
> It's a problem.  I am mentoring a colleague who is stressed with the prospect 
> of keeping up with our community because of the volume of email, and so it's 
> a serious barrier to community involvement.  I too have email filters to help 
> me, and it took some time to work out a system.  Could we share our filter 
> descriptions and workflows for this?  I'm sure I could learn from your 
> approaches, and new collaborators would appreciate the advice.
>
> I think automated builds (Jenkins/CI) could warrant its own list.  Separate 
> lists would make setting up email filters easier in general.
>
> I like the idea of a list like dev, but one that does not include JIRA comments 
> or GH code review comments, and does not include Jenkins/CI.  This would be a 
> good, lightweight way for potential contributors to get 
> involved.  If they are involved or interested in specific issues, they can 
> "watch" / "subscribe" to JIRA/GH issues and consequently they will get direct 
> notifications from those systems.  Then people who choose to get more 
> involved, like us, can subscribe to the other list(s).
>
> We do have instances where "ASF subversion and git services" can be excessive 
> due to feature branches that ought not to generate JIRA posts to unrelated 
> issues, and I think we should work to prevent that.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Wed, Aug 7, 2019 at 7:01 AM Tomoko Uchida  
> wrote:
>>
>> Hi
>>
>> +1 for separated list(s) for JIRA/Github updates and Jenkins jobs.
>> While I myself have no trouble sorting the mail thanks to
>> Gmail filters, I know a user (an external dev) who unsubscribed from this
>> list. One reason was the volume of mail :)
>>
>> Tomoko
>>
>> 2019年8月7日(水) 8:17 Jan Høydahl :
>> >
>> > Hi
>> >
>> > The mail volume on dev@ is fairly high, between 2500 and 3500 emails/month.
>> > To break down the numbers last month, see 
>> > https://lists.apache.org/trends.html?dev@lucene.apache.org:lte=1M:
>> >
>> > Top 10 participants:
>> > -GitBox: 420 emails
>> > -ASF subversion and git services (JIRA): 351 emails
>> > -Apache Jenkins Server: 261 emails
>> > -Policeman Jenkins Server: 234 emails
>> > -Munendra S N (JIRA): 134 emails
>> > -Joel Bernstein (JIRA): 84 emails
>> > -Tomoko Uchida (JIRA): 77 emails
>> > -Jan Høydahl (JIRA): 52 emails
>> > -Andrzej Bialecki (JIRA): 47 emails
>> > -Adrien Grand (JIRA): 46 emails
>> >
>> > I have especially noticed how every single GitHub PR review comment 
>> > triggers its own email instead of one email per review session.
>> > Also, every commit/push triggers an email since a bot adds a comment to 
>> > JIRA for it.
>> >
>> > Personally I think the ratio of notifications vs human emails is a bit too 
>> > high. I fear external devs who just want to fo

[jira] [Commented] (SOLR-13652) Remove update from initParams in example solrconfig files that only mention "df"

2019-07-25 Thread Alexandre Rafalovitch (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892870#comment-16892870
 ] 

Alexandre Rafalovitch commented on SOLR-13652:
--

There are two references to update there: one plain "update" and one wildcard 
("/update/**"). Is the proposal to remove both?

I wonder if the wildcard option covers some use case that we no longer 
remember.

> Remove update from initParams in example solrconfig files that only mention 
> "df"
> 
>
> Key: SOLR-13652
> URL: https://issues.apache.org/jira/browse/SOLR-13652
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Priority: Minor
>  Labels: easyfix, newbie
>
> At least some of the solrconfig files we ship have this entry:
>  <initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse,update">
>    <lst name="defaults">
>      <str name="df">text</str>
>    </lst>
>  </initParams>
> which has led at least one user to wonder if there's some kind of automatic 
> way to have the df field populated for updates. I don't even know how you'd 
> send an update that didn't have a specific field. We should remove the 
> "update/**".
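To see why the bare `update` entry and the `/update/**` wildcard behave differently, the minimal Python sketch below matches the path list quoted in the issue against handler names segment by segment. It assumes that `*` matches exactly one path segment and that a trailing `**` matches zero or more remaining segments. That is our reading, not a transcription of Solr's matcher, so check the Ref Guide before relying on edge cases.

```python
def path_matches(pattern: str, handler: str) -> bool:
    """Match one initParams path pattern against a handler name.

    Assumed semantics: '*' matches exactly one segment, and a trailing '**'
    matches zero or more remaining segments.
    """
    pat, name = pattern.split("/"), handler.split("/")
    for i, seg in enumerate(pat):
        if seg == "**":
            return i == len(pat) - 1  # only a trailing '**' is supported here
        if i >= len(name) or (seg != "*" and seg != name[i]):
            return False
    return len(pat) == len(name)


def matching_patterns(path_attr: str, handler: str) -> list:
    """Return the comma-separated patterns in a path attribute that match."""
    return [p for p in path_attr.split(",") if path_matches(p.strip(), handler)]


PATHS = "/update/**,/query,/select,/tvrh,/elevate,/spell,/browse,update"
print(matching_patterns(PATHS, "/update/json/docs"))  # ['/update/**']
print(matching_patterns(PATHS, "/update"))            # ['/update/**']
print(matching_patterns(PATHS, "update"))             # ['update']
```

Under these assumptions, `/update/**` already covers `/update` and all of its sub-paths. The bare `update` entry only matches a handler literally named `update` with no leading slash.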




Re: Solr or Lucene stickers?

2019-07-10 Thread Alexandre Rafalovitch

I have a Solr one. I think there may have been a Lucene one as well.

No idea how to get them outside of ApacheCon though.

Regards,
 Alex

On Wed, Jul 10, 2019, 10:57 PM David Smiley, 
wrote:

> Does anyone know if Lucene or Solr stickers are made available anywhere,
> perhaps by the ASF?  Not only would I like some but some colleagues of mine
> inquired.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>

[jira] [Commented] (LUCENE-8883) CHANGES.txt: Auto add issue categories on new releases

2019-07-09 Thread Alexandre Rafalovitch (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881276#comment-16881276
 ] 

Alexandre Rafalovitch commented on LUCENE-8883:
---

[~dsmiley] I was referring more to the fact that the change entries have 
several possible formats, but they are only implicit. It would be nice to 
make them explicit, so people can copy/paste/fill in the blanks. But I realize 
this may belong better in the internal documentation that we will have 
once the Wiki is migrated to the new Guide infrastructure.

> CHANGES.txt: Auto add issue categories on new releases
> --
>
> Key: LUCENE-8883
> URL: https://issues.apache.org/jira/browse/LUCENE-8883
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/build
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-8883.patch
>
>
> As I write this, looking at Solr's CHANGES.txt for 8.2 I see we have some 
> sections: "Upgrade Notes", "New Features", "Bug Fixes", and "Other Changes".  
> There is no "Improvements" section, so, unsurprisingly, the New Features 
> category has issues that ought to be listed as improvements.  I think the 
> order varies as well.  I propose that on new releases, the initial state of 
> the next release in CHANGES.txt have these sections.  They can easily be 
> removed at the upcoming release if they end up empty, or they could stay 
> empty.  It seems addVersion.py is the code that sets this up and it could be 
> enhanced.
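To illustrate the proposal, the hypothetical Python helpers below create the skeleton for the next release with every category heading pre-filled, then drop the sections that are still empty at release time. They are not taken from addVersion.py. The function names, the heading list (the four sections named above plus "Improvements") and its order, the `=====` banner, and the `(No changes)` placeholder are all assumptions made for the sketch.

```python
CATEGORIES = ["Upgrade Notes", "New Features", "Improvements", "Bug Fixes", "Other Changes"]
PLACEHOLDER = "(No changes)"  # assumed marker for a section with no entries yet


def release_skeleton(version: str, categories=CATEGORIES) -> str:
    """Build the initial block for a new release, one empty section per category."""
    lines = [f"==================  {version} ==================", ""]
    for cat in categories:
        lines += [cat, "-" * len(cat), PLACEHOLDER, ""]
    return "\n".join(lines)


def drop_empty_sections(text: str) -> str:
    """At release time, remove sections that still hold only the placeholder."""
    lines = text.split("\n")
    out, i = [], 0
    while i < len(lines):
        is_empty_section = (
            lines[i]
            and i + 2 < len(lines)
            and lines[i + 1] == "-" * len(lines[i])
            and lines[i + 2] == PLACEHOLDER
        )
        if is_empty_section:
            i += 3
            if i < len(lines) and lines[i] == "":
                i += 1  # drop the blank line that separated it as well
            continue
        out.append(lines[i])
        i += 1
    return "\n".join(out)
```

An explicit placeholder makes "empty" unambiguous for the cleanup step. Without it, an empty section could not be told apart from one whose entries were accidentally deleted.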




[jira] [Commented] (SOLR-13571) Make recent RefGuide rank well in Google

2019-06-28 Thread Alexandre Rafalovitch (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-13571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874944#comment-16874944
 ] 

Alexandre Rafalovitch commented on SOLR-13571:
--

We could definitely do a sitemap.

But also, we could update the redirect list and see if that makes a big 
difference. I had a quick look in the infra repo and it seems to be two files 
(solr_id_to_new.map.txt and solr_name_to_new.map.txt). These seem to correspond 
to the ones we generated in SOLR-10595. So perhaps we just need to review those 
files for target file name changes (they may be 99% the same) and ask Infra to 
refresh the files with the new URL base of 8.1.

Also, if we could get access to Google's Webmaster Tools, that would be nice. 
Access can be verified by publishing a file to the server; can we do that 
outside of a full publication process?

Finally, if we republish 6.6 with an additional canonical header pointing to 
the latest version (or 8.1 or whatever), this may also refocus the search 
ranking. The work for that would probably be the same as that required to redo 
the maps.


> Make recent RefGuide rank well in Google
> 
>
> Key: SOLR-13571
> URL: https://issues.apache.org/jira/browse/SOLR-13571
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Jan Høydahl
>Priority: Major
>
> Spinoff from SOLR-13548
> The old Confluence ref-guide has a lot of pages pointing to it, and all of 
> that link karma is delegated to the {{/solr/guide/6_6/}} html ref guide, 
> making it often rank top. However we'd want newer content to rank high. See 
> these comments for some first ideas.




[jira] [Commented] (SOLR-13571) Make recent RefGuide rank well in Google

2019-06-27 Thread Alexandre Rafalovitch (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-13571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874253#comment-16874253
 ] 

Alexandre Rafalovitch commented on SOLR-13571:
--

I guess one place to start thinking this through is how important it is that 
users find the reference manual. For comparison, Stack Overflow (and the rest 
of its network) focuses more on being discovered by Google than on its internal 
search engine. Obviously, they have to, as that's where the money and attention 
are. But it is still an interesting, explicit goalpost.

For us, if the users cannot find a relevant reference guide page quickly, they 
may
* think a particular feature does not exist
* join and ask on the User Mailing list
* discover the reference guide in general and browse through it
* discover the reference guide and use our - still limited - internal search

None of the options above seem optimal compared to leveraging the public search 
engine. But then, we have to worry about SEO. Clearly, the current SEO works 
well enough to get us to the 6.6 version of the guide and - very importantly - 
to a somewhat relevant page. Switching that to be a single target page would be 
easier for us, but may cost a lot of SEO. And, frankly, I am not at all sure 
that our guide is SEO-friendly enough on its own. I just did a search for 
MappingCharFilterFactory (as an example) and the 6.6 RefGuide is at the top, 
followed by (old) Javadoc, the (old) Wiki, two source-code class links, and then 
random websites and blogs. A link to the latest version does not appear in the 
first couple of pages at all (though a 7.x clone of the RefGuide on a Chinese 
community site does).

I suspect that Google detects the multiple guide versions as duplicate content 
and therefore displays only one version, and the 6.6 version has more weight due 
to the redirects. But if we remove/collapse that link, I am not sure the 
correct/latest version of the manual will be picked up. This feels risky to me. 

I don't know what the optimal solution is, given the limited resources 
available for this part of the project. I am just really worried that lost 
Google ranking is hard to get back. Perhaps, as a minimum step, we could just 
refresh the URL map periodically to point at whatever the latest version is.





[jira] [Commented] (LUCENE-8883) CHANGES.txt: Auto add issue categories on new releases

2019-06-26 Thread Alexandre Rafalovitch (JIRA)



[ 
https://issues.apache.org/jira/browse/LUCENE-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873367#comment-16873367
 ] 

Alexandre Rafalovitch commented on LUCENE-8883:
---

I wonder if we could also include templates for the various ways to write the 
actual change entry (single JIRA, multiple JIRAs, multiple users, hat tip, 
username vs real name, etc.). Ideally we would be able to fully parse the README 
entries, instead of hacking them apart with regexps, as currently happens for 
the HTML conversion.
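Making the entry formats explicit could start with one formatter that every variant goes through, plus a parser that inverts it exactly. The hypothetical Python sketch below covers a few of the cases listed above: one or several issues, several contributors, and an optional `via` credit for the committer. The layout `* ISSUE[, ISSUE]: text (names[ via committer])` is an assumption for illustration, not a documented CHANGES.txt grammar. The point is that when entries come from a fixed template, a single strict pattern can parse them back without heuristics.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

ENTRY_RE = re.compile(
    r"^\* (?P<issues>[A-Z]+-\d+(?:, [A-Z]+-\d+)*): (?P<text>.+) \((?P<people>[^()]+)\)$"
)


@dataclass
class Entry:
    issues: List[str]
    text: str
    contributors: List[str]
    committer: Optional[str] = None  # assumed "via" credit convention

    def format(self) -> str:
        people = ", ".join(self.contributors)
        if self.committer:
            people += f" via {self.committer}"
        return f"* {', '.join(self.issues)}: {self.text} ({people})"


def parse_entry(line: str) -> Entry:
    """Inverse of Entry.format for lines that follow the template exactly."""
    m = ENTRY_RE.match(line)
    if not m:
        raise ValueError(f"not a well-formed entry: {line!r}")
    people, committer = m.group("people"), None
    if " via " in people:
        people, committer = people.rsplit(" via ", 1)
    return Entry(m.group("issues").split(", "), m.group("text"), people.split(", "), committer)
```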





[jira] [Commented] (SOLR-13548) Migrate Solr's Moin wiki to Confluence

2019-06-20 Thread Alexandre Rafalovitch (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-13548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868563#comment-16868563
 ] 

Alexandre Rafalovitch commented on SOLR-13548:
--

I think discarding the Google juice would be very hurtful. My suggestion would 
instead be to update the redirects to point to whatever the latest guide is. This 
may (or may not) involve comparing the redirect map to the current 8.x 
pages list.

The alternative/additive option could be to regenerate all the previous guide 
versions and add [canonical 
link|https://en.wikipedia.org/wiki/Canonical_link_element] to the latest one 
(where they match). 

And yes, the "latest" URL could be nice. It may even help with the canonical 
references by providing a stable endpoint.
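The canonical-link idea could work like this: when an old guide version is regenerated, each page whose file name also exists in the latest guide gets a `<link rel="canonical">` element pointing at the latest copy. The Python helpers below are a hypothetical sketch. They assume pages keep the same file name across versions, which holds only where the pages match, and they use the `/solr/guide/8_1/` URL layout seen elsewhere on this list.

```python
from typing import Optional, Set

GUIDE_BASE = "https://lucene.apache.org/solr/guide"


def canonical_tag(page: str, latest: str, latest_pages: Set[str]) -> Optional[str]:
    """Canonical <link> for an old-version page, or None if the latest guide lacks it."""
    if page not in latest_pages:
        return None  # renamed or removed page: leave it without a canonical link
    return f'<link rel="canonical" href="{GUIDE_BASE}/{latest}/{page}">'


def add_canonical(html: str, tag: str) -> str:
    """Insert the tag just before the first </head> (assumes lowercase markup)."""
    return html.replace("</head>", f"{tag}\n</head>", 1)
```

If a stable "latest" URL existed, passing `latest="latest"` would keep the tags valid across releases without regenerating old versions again.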

> Migrate Solr's Moin wiki to Confluence
> --
>
> Key: SOLR-13548
> URL: https://issues.apache.org/jira/browse/SOLR-13548
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: website
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Attachments: SolrCwikiPages.txt, SolrMoinTitles.txt, 
> create_dummy_confluence_pages.py
>
>
> We have a deadline at the end of June to migrate the Moin wiki to Confluence.
> This Jira will track migration of Solr's [https://wiki.apache.org/solr/] over 
> to [https://cwiki.apache.org/confluence/display/SOLR]
> The old Confluence space currently hosts the old Reference Guide for version 
> 6.5 before we moved to asciidoc. This will be overwritten.
> Steps:
>  # Delete all pages in current SOLR space
>  ## Q: Can we do a bulk delete ourselves or do we need to ask INFRA?
>  # The rules in {{.htaccess}} which redirect to the 6.6 guide will remain as 
> is
>  # Run the migration tool at 
> [https://selfserve.apache.org|https://selfserve.apache.org/]
>  # Add a clearly visible link from front page to the ref guide for people 
> landing there for docs
> After migration we'll clean up and weed out what is not needed, and then 
> start moving developer-centric content into the main git repo (which will be 
> covered in other JIRAs)




[jira] [Commented] (SOLR-13548) Migrate Solr's Moin wiki to Confluence

2019-06-20 Thread Alexandre Rafalovitch (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-13548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868446#comment-16868446
 ] 

Alexandre Rafalovitch commented on SOLR-13548:
--

For the SolrMoin titles: if we could get a similar listing for another project, 
all the shared names are probably the system pages. That may be an easy way to 
cull at least 30% of them...

I can also confirm that all 7 Russian pages are just System information (Help 
mostly).
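The culling idea amounts to a set intersection. In the hypothetical Python sketch below, each file is assumed to hold one page title per line. SolrMoinTitles.txt is attached to the issue, but its layout and the other project's listing are assumptions. Any title shared with the other project's wiki is treated as a likely MoinMoin system page.

```python
from pathlib import Path


def read_titles(path: str) -> set:
    """Read one page title per line, ignoring blank lines (assumed file layout)."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def cull_shared(solr_titles: set, other_titles: set):
    """Split Solr's titles into (likely real pages, likely system pages)."""
    shared = solr_titles & other_titles
    return solr_titles - shared, shared
```

Shared titles that are real content in both wikis (a front page, for example) would be swept up too, so the shared set is a list to review rather than a list to delete blindly.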





[jira] [Commented] (SOLR-13548) Migrate Solr's Moin wiki to Confluence

2019-06-20 Thread Alexandre Rafalovitch (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-13548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868444#comment-16868444
 ] 

Alexandre Rafalovitch commented on SOLR-13548:
--

A related but tangential comment: our strongest Google results for the Reference 
Guide all show version 6.6. I never understood why, but I guess the Confluence 
redirects are the reason. I am not sure whether removing those redirects will 
help surface the latest content or bury the Solr Reference Guide even further.





Re: Lucene/Solr Developer content

2019-06-17 Thread Alexandre Rafalovitch

Does it need to go to Confluence? I know Apache built an export tool,
but is there a way to dump the whole thing into a text archive? Or
both :-)

I am wondering if this could be a good opportunity for dog-fooding.
Load the wiki export into Solr, cross-match against RefGuide, manually
inspect the differences, etc. I already have the code for pulling the
Ref-Guide into Solr split at the lowest-header level. A similar thing
could be done for Wiki and then we do "more like this" or "similarity"
or some such.
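Splitting a document at its headings, so that each indexed Solr document is one section, could look like the hypothetical Python sketch below. It assumes AsciiDoc-style `=` headings, as used in the Ref Guide sources, and emits one record per section with its heading trail. It is not Alex's existing code, which is not shown here. It also ignores the fact that lines inside AsciiDoc listing blocks could look like headings.

```python
import re
from typing import Dict, List

HEADING = re.compile(r"^(=+)\s+(.*\S)\s*$")


def split_sections(text: str) -> List[Dict[str, str]]:
    """Split AsciiDoc text into one record per heading (any level).

    Each record carries its heading trail, so a leaf section can be indexed
    on its own while keeping its context.
    """
    records: List[Dict[str, str]] = []
    trail: List[str] = []
    body: List[str] = []

    def flush() -> None:
        content = "\n".join(body).strip()
        if trail and content:
            records.append({"path": " > ".join(trail), "body": content})

    for line in text.splitlines():
        m = HEADING.match(line)
        if m:
            flush()
            body.clear()
            level = len(m.group(1))
            trail[:] = trail[: level - 1] + [m.group(2)]  # replace deeper levels
        else:
            body.append(line)
    flush()
    return records
```

Records like these could then be compared across sources (wiki export vs RefGuide) with More Like This or a similarity query, as suggested above.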

Also, how much non-Javadoc information actually exists for Lucene? How
many pages would a Lucene branch have? An honest question; I never
really paid attention to the documentation proportions.

Regards,
   Alex.


On Mon, 17 Jun 2019 at 17:32, David Smiley  wrote:
>
> Great plan, Jan!
>
> A sticky bit of this, I think, is how to remove old stuff.  It's easy to keep 
> content around forever, but it gets stale and clutters up the better 
> content.  Maybe if I/someone wants to remove content, we send out a proposal 
> to the list with links for easy peer review of what's at stake, and with a 
> justification.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, Jun 17, 2019 at 4:22 PM Joel Bernstein  wrote:
>>
>> +1 for more asciidoc guides. I find these to be extremely useful anytime I 
>> run across these on projects.
>>
>> I'd be happy to add developer level docs in Streaming Expressions / Math 
>> Expressions.
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>>
>> On Mon, Jun 17, 2019 at 4:18 PM Jan Høydahl  wrote:
>>>
>>> Hi devs,
>>>
>>> Today we have mainly two sources of developer documentation (apart from 
>>> Javadoc and refGuide):
>>>
>>> * The websites. Very short instructions and linking to WIKI for in-depth
>>> * The old Moin wikis at wiki.apache.org with more details
>>>
>>> The old Moin wiki is being discontinued soon, and I plan to migrate that 
>>> content to Confluence this week, see 
>>> https://issues.apache.org/jira/browse/LUCENE-8858 and 
>>> https://issues.apache.org/jira/browse/SOLR-13548
>>>
>>> So the first step will be to just start using Confluence instead of Moin. 
>>> Help appreciated with the cleanup once the first migration is done in the 
>>> two JIRAs above. A LOT of the content in old WIKIs is outdated and a big 
>>> cleanup once this is in Confluence is highly needed!
>>>
>>>
>>> Someone has also suggested moving most developer resources found in the 
>>> WIKI into the main Git code tree, so you have them right there with your git 
>>> clone. What I want to discuss here in more detail is what that would look 
>>> like and what info to move over.
>>>
>>> One idea is to create one or more AsciiDoc guides in the source tree, e.g.:
>>>
>>> * /dev-docs : Common info i.e. Git, Pull requests, building, doing releases 
>>> etc. Publish in TLP site
>>> * /lucene/dev-guide : Lucene-specific developer content. Publish in Lucene 
>>> web site
>>> * /solr/dev-guide : Solr-specific developer content. Publish in Solr web 
>>> site
>>>
>>> These will be built with Jekyll by Jenkins, into nice HTML guides and 
>>> published to the web sites.
>>>
>>> There may be other ways to do this as well, such as creating a new git repo 
>>> for dev docs, but I think people have good experience from Solr's ref-guide 
>>> with keeping code and docs in sync. What do you think?
>>>
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>>
>>>
>>>


Re: VOTE: Apache Solr Reference Guide for Solr 8.1

2019-06-12 Thread Alexandre Rafalovitch

OK, I just ran a public link checker (a random one:
https://www.drlinkcheck.com) and it found 27 dead links, for various
reasons; some are not actually dead.

But one very interesting thing it did find is that the Solr package
"org.apache.solr.api" is, for some reason, not present on the public
Javadoc site, even though it is clearly in the index. A very easy way to
see that is at:
https://lucene.apache.org/solr/8_1_1/solr-core/overview-summary.html
(the 3rd package from the top). It just redirects to "not-found". Same
for all of the classes in that package.

Very strange. It affects the older 8.0 release as well, I think. So
perhaps something in the release process catches it: a wrong regexp or
some such.

Regards,
   Alex.

On Wed, 12 Jun 2019 at 16:30, Cassandra Targett  wrote:
>
> There is a link validation that runs as part of the build to be sure Javadoc 
> links will resolve, but there is no regular mechanism of link checking that 
> occurs as far as I’m aware. Not that it shouldn’t happen, someone just needs 
> to make it happen.
>
> Cassandra


Re: VOTE: Apache Solr Reference Guide for Solr 8.1

2019-06-12 Thread Alexandre Rafalovitch

A question (not a vote unfortunately). Do we run Link Checkers on
release or even once in a while?

I found one dead link on
https://lucene.apache.org/solr/guide/8_1/errata.html: the
self-reference uses 8.1 instead of 8_1 in the URL. But the issue is
probably bigger.
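A lightweight check for the specific problem found here can be written with only the Python standard library. The problem is guide URLs that use a dotted version (`8.1`) where the published path uses an underscore (`8_1`). This is a hypothetical sketch, not part of the Solr build. It extracts `href` values with `html.parser` and flags guide links whose version segment contains a dot. It fetches nothing, so it complements rather than replaces a real link checker.

```python
import re
from html.parser import HTMLParser
from typing import List

GUIDE_LINK = re.compile(r"/solr/guide/(\d+\.\d+)/")


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def dotted_guide_links(html: str) -> List[str]:
    """Return guide links that use '8.1'-style versions instead of '8_1'."""
    parser = HrefCollector()
    parser.feed(html)
    return [h for h in parser.hrefs if GUIDE_LINK.search(h)]


def suggested_fix(href: str) -> str:
    return GUIDE_LINK.sub(lambda m: f"/solr/guide/{m.group(1).replace('.', '_')}/", href)
```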

Regards.
   Alex.

On Wed, 12 Jun 2019 at 10:47, Cassandra Targett  wrote:
>
> Please vote to release the Solr Reference Guide for 8.1
>
> The PDF artifacts can be downloaded from:
> https://dist.apache.org/repos/dist/dev/lucene/solr/ref-guide/apache-solr-ref-guide-8.1-RC1/
>
> $ cat apache-solr-ref-guide-8.1.pdf.sha512
> cc76882fb3061fa03d1aa291d9705c1df17f948ff47f3f7d6a18e8ddef907f1c74f078ed482f7f5e04b7c6779a88ad85297cd31ae03570db2acc5930ba2feaf0
>   apache-solr-ref-guide-8.1.pdf
>
> The HTML version is also available: https://lucene.apache.org/solr/guide/8_1
>
> Here's my +1.
>
> Thanks,
> Cassandra


[jira] [Resolved] (SOLR-13540) Is it possible configure a single data-config.xml file for all the environments?

2019-06-12 Thread Alexandre Rafalovitch (JIRA)



 [ 
https://issues.apache.org/jira/browse/SOLR-13540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Rafalovitch resolved SOLR-13540.
--
Resolution: Invalid

This is not the correct use of the issue tracker for the Solr project. We use it 
to track bugs and issues.

The correct place for this question is the Solr Users mailing list, especially 
because what you are asking is possible in several different ways and is worth 
a discussion.

In short:
* DIH is no longer recommended for production environments (it never really 
was)
* You can pass environment variables into most of Solr's configuration, and 
IIRC you can define them on the command line, in solrconfig.xml, or in 
core.properties
* You may also be able to use JDBC connection pools and standard Java mechanisms 
to separate environments (this may or may not work with recent Solr versions, 
due to the bundling of Jetty)
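Solr's configuration files support property placeholders of the form `${name:default}`, which is the mechanism behind the second point. The Python sketch below only mimics that substitution to show how one file could serve several environments. The property names (`db.url`, `db.user`) are hypothetical. Whether DIH's data-config.xml resolves placeholders the same way in 5.5 is an assumption to check against the Ref Guide. Solr's own resolver is Java code with more rules than shown here.

```python
import re
from typing import Mapping

PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def substitute(text: str, props: Mapping[str, str]) -> str:
    """Replace ${name} / ${name:default} placeholders using props."""

    def repl(m: re.Match) -> str:
        name, default = m.group(1), m.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(f"no value for property {name!r} and no default")

    return PLACEHOLDER.sub(repl, text)


TEMPLATE = '<dataSource url="${db.url:jdbc:postgresql://localhost/dev}" user="${db.user}"/>'
```

In practice the per-environment values would come from `-D` system properties on the Solr command line or from core.properties. The file itself then stays identical across development, quality, and production.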

> Is it possible configure a single data-config.xml file for all the 
> environments?
> 
>
> Key: SOLR-13540
> URL: https://issues.apache.org/jira/browse/SOLR-13540
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.5
>Reporter: Hugo Rodriguez
>Priority: Major
>
> Hi
> I need to configure a single data-config.xml file in Solr for SAS AML 7.1. I 
> have three environments (development, quality, and production), and, as you 
> know, the first lines in a data-config.xml file are for the database connection 
> (database name, database server, port, user, password, etc.). Given this, is 
> it possible to configure only one file (data-config.xml) that dynamically 
> connects to the right database in each environment?
> Thanks for all your answers
> Best regards
> Hugo Rodriguez R




[jira] [Closed] (SOLR-13540) Is it possible configure a single data-config.xml file for all the environments?

2019-06-12 Thread Alexandre Rafalovitch (JIRA)



 [ 
https://issues.apache.org/jira/browse/SOLR-13540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Rafalovitch closed SOLR-13540.


> Is it possible configure a single data-config.xml file for all the 
> environments?
> 
>
> Key: SOLR-13540
> URL: https://issues.apache.org/jira/browse/SOLR-13540
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.5
>Reporter: Hugo Rodriguez
>Priority: Major
>
> Hi
> I need to configure a single data-config.xml file in Solr for SAS AML 7.1. I
> have three environments: development, quality and production, and, as you know,
> the first lines in a data-config.xml file are for the connection to a database
> (database name, database server, port, user, password, etc.). Given this, is it
> possible to configure only one file (data-config.xml) that
> dynamically connects to each of the databases in all environments?
> Thanks for all your answers
> Best regards
> Hugo Rodriguez R




[jira] [Commented] (SOLR-13534) Dynamic loading of jars from a url

2019-06-11 Thread Alexandre Rafalovitch (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-13534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861574#comment-16861574
 ] 

Alexandre Rafalovitch commented on SOLR-13534:
--

Well, all the security worries are based on the URL somehow being injected
into the system. So, the URL could point at a non-jar file that will trigger
some negative reaction (escalation, crash, denial of service, etc.). E.g. if the
target file is a 200GB video...
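One way to blunt the huge-video scenario is to cap how many bytes are read and to reject anything that does not look like a jar as early as possible. A minimal sketch, assuming the remote jar arrives as a readable binary stream; `read_jar_capped`, `RejectedDownload` and the limit are hypothetical, and checking the `PK\x03\x04` zip signature only rules out obvious non-archives. It does not make the content trustworthy.

```python
import io
from typing import BinaryIO

# Local-file-header signature that a jar (a zip archive with entries) begins with.
ZIP_MAGIC = b"PK\x03\x04"


class RejectedDownload(Exception):
    """Raised when the remote payload is too large or does not look like a jar."""


def read_jar_capped(stream: BinaryIO, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read a jar from stream, aborting early on oversize or non-zip content."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf.extend(chunk)
        if len(buf) > max_bytes:
            # Stop as soon as the cap is crossed instead of draining the whole body.
            raise RejectedDownload(f"payload exceeds {max_bytes} bytes")
        if len(buf) >= len(ZIP_MAGIC) and not buf.startswith(ZIP_MAGIC):
            raise RejectedDownload("payload does not start with a zip signature")
    if not buf.startswith(ZIP_MAGIC):
        raise RejectedDownload("payload too short or not a zip archive")
    return bytes(buf)


if __name__ == "__main__":
    fake_video = io.BytesIO(b"\x00\x00\x00\x18ftypmp42" + b"\x00" * 1024)
    try:
        read_jar_capped(fake_video, max_bytes=50 * 1024 * 1024)
    except RejectedDownload as err:
        print("rejected:", err)
```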




> Dynamic loading of jars from a url
> --
>
> Key: SOLR-13534
> URL: https://issues.apache.org/jira/browse/SOLR-13534
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Dynamic loading is possible from the {{.system}} collection. It's much easier to
> host the jars on a remote service and load them from there. This way the user
> should have no problem loading jars when the {{.system}} collection is not
> available for some reason.
> The steps should look as follows:
>  # get the hash of your jar file.  {{openssl dgst -sha512 }}
>  # upload it to your hosting service. Say the location is
> {{http://host:port/my-jar/location}}
>  # create a runtime lib entry for the collection as follows
> {code:java}
> curl http://localhost:8983/solr/techproducts/config \
>   -H 'Content-type:application/json' -d '{
>   "add-runtimelib": { "name":"jarblobname",
>     "sha512":"e94bb3990b39aacdabaa3eef7ca6102d96fa46766048da50269f25fd41164440a4e024d7a7fb0d5ec328cd8322bb65f5ba7886e076a8f224f78cb310fd45896d",
>     "url":"http://host:port/my-jar/location"}
> }'
> {code}
> To update the jar, repeat the steps and use the {{update-runtimelib}} command to
> update the sha512 hash.
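The hashing and payload steps quoted above can be scripted. A sketch, assuming the JSON shape shown in the curl example (the `add-runtimelib` / `update-runtimelib` commands were a proposal at the time, so treat the field names as provisional); `sha512_hex` yields the same hex digest that `openssl dgst -sha512` prints, and `runtimelib_command` is a hypothetical helper.

```python
import hashlib
import json


def sha512_hex(data: bytes) -> str:
    """Lowercase hex SHA-512, the same digest `openssl dgst -sha512` prints."""
    return hashlib.sha512(data).hexdigest()


def runtimelib_command(name: str, jar_bytes: bytes, url: str, update: bool = False) -> str:
    """Build the JSON body for the proposed add-runtimelib / update-runtimelib command."""
    op = "update-runtimelib" if update else "add-runtimelib"
    return json.dumps({op: {"name": name, "sha512": sha512_hex(jar_bytes), "url": url}})


if __name__ == "__main__":
    jar = b"PK\x03\x04 placeholder jar bytes"  # stand-in for the real jar's bytes
    print(runtimelib_command("jarblobname", jar, "http://host:port/my-jar/location"))
```

The printed string is the kind of body passed to `curl -d` in the example above.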




[jira] [Commented] (SOLR-13534) Dynamic loading of jars from a url

2019-06-11 Thread Alexandre Rafalovitch (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-13534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861563#comment-16861563
 ] 

Alexandre Rafalovitch commented on SOLR-13534:
--

I feel SHA-512 is good for verification, but it still leaves an attack vector
in whatever downloads the archive before the actual verification step.
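The concern is that the bytes land on disk, and may be acted on, before the hash check runs. A sketch of one mitigation, assuming the download is exposed as a binary stream: hash while streaming into a temporary file next to the destination, and rename it into place only once the SHA-512 matches, so an unverified file never appears under the final name. `fetch_verified` and its parameters are hypothetical.

```python
import hashlib
import io
import os
import tempfile
from typing import BinaryIO


def fetch_verified(stream: BinaryIO, expected_sha512: str, dest_path: str,
                   max_bytes: int, chunk_size: int = 64 * 1024) -> str:
    """Stream into a temp file while hashing; install at dest_path only if SHA-512 matches."""
    digest = hashlib.sha512()
    total = 0
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Temp file in the destination directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            while True:
                chunk = stream.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
                if total > max_bytes:
                    raise ValueError(f"download exceeds {max_bytes} bytes")
                digest.update(chunk)
                tmp.write(chunk)
        if digest.hexdigest() != expected_sha512.lower():
            raise ValueError("sha512 mismatch, refusing to install the jar")
        os.replace(tmp_path, dest_path)  # atomic rename on POSIX within one filesystem
        return dest_path
    except BaseException:
        # Never leave unverified bytes behind, whatever went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    payload = b"PK\x03\x04 placeholder jar bytes"
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "lib.jar")
        good = hashlib.sha512(payload).hexdigest()
        print(fetch_verified(io.BytesIO(payload), good, target, max_bytes=1024))
```

This narrows the window Alex describes but does not remove it: the downloader itself still parses untrusted network input.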

Also, perhaps there should be an INFO log message or similar if this is enabled.

And I am guessing that if the local copies of the jars are missing, they will
all be loaded from the remote location on startup. Is there an issue with
multiple entries hitting the same URL?

> Dynamic loading of jars from a url
> --
>
> Key: SOLR-13534
> URL: https://issues.apache.org/jira/browse/SOLR-13534
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Dynamic loading is possible from the {{.system}} collection. It's much easier to
> host the jars on a remote service and load them from there. This way the user
> should have no problem loading jars when the {{.system}} collection is not
> available for some reason.
> The steps should look as follows:
>  # get the hash of your jar file.  {{openssl dgst -sha512 }}
>  # upload it to your hosting service. Say the location is
> {{http://host:port/my-jar/location}}
>  # create a runtime lib entry for the collection as follows
> {code:java}
> curl http://localhost:8983/solr/techproducts/config \
>   -H 'Content-type:application/json' -d '{
>   "add-runtimelib": { "name":"jarblobname",
>     "sha512":"e94bb3990b39aacdabaa3eef7ca6102d96fa46766048da50269f25fd41164440a4e024d7a7fb0d5ec328cd8322bb65f5ba7886e076a8f224f78cb310fd45896d",
>     "url":"http://host:port/my-jar/location"}
> }'
> {code}
> To update the jar, repeat the steps and use the {{update-runtimelib}} command to
> update the sha512 hash.




[jira] [Commented] (SOLR-13266) /update/json/docs should support the JSON record format

2019-06-11 Thread Alexandre Rafalovitch (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-13266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861378#comment-16861378
 ] 

Alexandre Rafalovitch commented on SOLR-13266:
--

We support [JSONLines|http://jsonlines.org/]. What exactly is the difference?
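For reference, the two formats differ in framing. JSON Lines puts exactly one JSON value on each line, while an RFC 7464 JSON text sequence prefixes every value with the ASCII Record Separator (0x1E) and ends it with a line feed, so a single record may span several lines. A minimal sketch of both parsers; the function names are hypothetical, and the RFC's recovery rules for truncated records are not handled.

```python
import json
from typing import Any

RS = "\x1e"  # ASCII Record Separator, the RFC 7464 record prefix


def parse_json_lines(text: str) -> list[Any]:
    """JSON Lines: exactly one JSON value per newline-terminated line."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


def parse_json_seq(text: str) -> list[Any]:
    """RFC 7464 JSON text sequence: every value is preceded by RS and followed by LF."""
    records = []
    for chunk in text.split(RS):
        if chunk.strip():  # skip the empty piece before the first RS
            records.append(json.loads(chunk))
    return records


if __name__ == "__main__":
    jsonl = '{"id":"1"}\n{"id":"2"}\n'
    # In a JSON text sequence a record may span several lines.
    seq = RS + '{"id":"1"}\n' + RS + '{\n  "id": "2"\n}\n'
    print(parse_json_lines(jsonl))
    print(parse_json_seq(seq))
```

Passing `seq` to `parse_json_lines` raises `json.JSONDecodeError`: RS is not JSON whitespace, and the second record is split across lines.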

> /update/json/docs should support the JSON record format
> ---
>
> Key: SOLR-13266
> URL: https://issues.apache.org/jira/browse/SOLR-13266
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Priority: Major
>
> This is a standard [JSON format|https://tools.ietf.org/html/rfc7464] that
> Solr does not support




Re: No email notifications from JIRA when attaching a patch

2019-06-10 Thread Alexandre Rafalovitch

Maybe this would be helpful in the meantime:
http://jirasearch.mikemccandless.com/search.py?chg=dds==project=Lucene=0=49414=recentlyUpdated=list=pf1nc4synql5=project%3ASolr=allUsers%3AAdrien+Grand=attachments%3APatch=

Regards,
   Alex.

On Mon, 10 Jun 2019 at 04:26, Adrien Grand  wrote:
>
> Hello,
>
> I have been bitten a couple more times by it; I can try to find which
> JIRA issues exactly if that helps.


Re: Sanity check: JIRA search discrepancy when logged-in/anon

2019-06-08 Thread Alexandre Rafalovitch

Ok,

I've done it for all currently public issues. It was a bit
nerve-wracking to modify 4k+ issues (1k at a time) :-)

Notes:
1) It is a bit hard to find all Public issues, as Security Level is not
in the dropdowns. So, the search query has to be in the advanced mode
and be: project = SOLR AND Level = "Public"
2) At most 1000 issues can be changed at a time (under Tools/Bulk Change)
3) Check Edit
4) Set Security Level to None, scroll down all the way and uncheck
Send email, then submit
5) Confirm
6) Wait (approx. 7 min per 1000)
7) Acknowledge
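Steps 1, 2 and 6 amount to a filter plus fixed-size batching. A small planning sketch, assuming the 1000-issue Bulk Change cap and the roughly 7 minutes per batch observed above; the `Issue` records and keys are hypothetical stand-ins for search results, and nothing here talks to Jira.

```python
from dataclasses import dataclass
from typing import Optional

BULK_LIMIT = 1000       # Bulk Change cap noted in step 2
MINUTES_PER_BATCH = 7   # rough duration noted in step 6


@dataclass
class Issue:
    key: str
    security_level: Optional[str]  # None means the field is empty


def plan_batches(issues: list[Issue]) -> list[list[str]]:
    """Select issues with Security Level "Public" (the JQL filter) and chunk them for Bulk Change."""
    public = [issue.key for issue in issues if issue.security_level == "Public"]
    return [public[i:i + BULK_LIMIT] for i in range(0, len(public), BULK_LIMIT)]


def estimated_minutes(batches: list[list[str]]) -> int:
    """Coarse estimate that charges every batch the full per-1000 time."""
    return len(batches) * MINUTES_PER_BATCH


if __name__ == "__main__":
    issues = [Issue(f"SOLR-{n}", "Public") for n in range(1, 4201)]
    issues.append(Issue("SOLR-90001", None))
    batches = plan_batches(issues)
    print(len(batches), "batches, about", estimated_minutes(batches), "minutes")
```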

I am not sure I have the right to update the release instructions, but,
as David said, this would probably be a good place to keep them in
check. Possibly in the same step that does the final issue cleanup
(something about versions, I think).

Regards,
   Alex.

On Fri, 7 Jun 2019 at 09:26, David Smiley  wrote:
>
> Wow that's annoying!
>
> Perhaps we could add this bulk edit task as part of the release process 
> towards the end?  It's not strictly release-related, but it's as good a time as
> any to do cleanup while the RM is following a script.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Fri, Jun 7, 2019 at 8:52 AM Cassandra Targett  
> wrote:
>>
>> What you’re seeing is exactly the security level situation. Recall that it 
>> occurs with all issues that have a value in the Security Level field, 
>> whether that value is Public or Private. For all queries, if you are not 
>> logged in, you only see issues that have an empty value in the Security 
>> Level field. You can see the issues just fine if you know the ID, but they 
>> will not appear in query results unless you are logged in.
>>
>> The 14 you see for 7.4 when not logged in have no Security Level set at all. 
>> All 10 of the 7.3.1 issues you can’t see unless you are logged in have a 
>> value in the Security Level field, Public as it happens.
>>
>> One of the several draft emails I haven’t had time to send is a suggestion 
>> that we just simply do a bulk edit for all Public issues to remove the value 
>> entirely, and periodically do the same. I’d do it as often as I have time 
>> and remember to do it, but any of us could do it as long as we agree it’s a 
>> good idea.
>>
>> Cassandra
>> On Jun 7, 2019, 5:53 AM -0500, Alexandre Rafalovitch , 
>> wrote:
>>
>> Hi,
>>
>> It seems that something is not right with JIRA. It seems somehow
>> related to minor (x.y.Z) releases when looking for them in Anon mode.
>> And issues that target them.
>>
>> Reproduction:
>> 1) In Anon window, go to: https://issues.apache.org/jira/browse/SOLR-12202
>> 2) Click (to new window) on 7.3.1 release, which should show all
>> issues fixed in 7.3.1. I get nothing in anon and 10 in logged-in
>> 3) Click (to new window) on 7.4 release, and I see 14, instead of 169
>>
>> Same happens with https://issues.apache.org/jira/browse/SOLR-13255
>> (part of 7.7.1 release).
>>
>> I thought maybe this was connected to the Security Level situation, but
>> both of these are marked public explicitly.
>>
>> Does anybody else see this and/or know whether this is some kind of
>> known issue?
>>
>> Regards,
>> Alex.
>>
>>
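Cassandra's explanation reduces to a simple rule, which also reproduces the 7.4 numbers from the original report (14 issues visible anonymously versus 169 when logged in). A sketch that models only that rule, assuming the remaining 155 issues carry the Public value; it ignores real permission checks for Private issues and direct access by issue ID, and the function names are hypothetical.

```python
from typing import Optional


def visible_in_search(security_level: Optional[str], logged_in: bool) -> bool:
    """Anonymous searches only return issues whose Security Level field is empty,
    even when the value is "Public"."""
    # Simplification: assumes a logged-in user may see every level, including Private.
    return logged_in or security_level is None


def count_visible(levels: list[Optional[str]], logged_in: bool) -> int:
    return sum(visible_in_search(level, logged_in) for level in levels)


if __name__ == "__main__":
    # 7.4 split from the thread: 14 with no level, the rest assumed to be Public.
    release_7_4 = [None] * 14 + ["Public"] * 155
    print("anonymous:", count_visible(release_7_4, logged_in=False))
    print("logged in:", count_visible(release_7_4, logged_in=True))
```

Clearing the field, as in the bulk edit, moves an issue from the second group into the first, which is why the counts converge afterwards.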


Sanity check: JIRA search discrepancy when logged-in/anon

2019-06-07 Thread Alexandre Rafalovitch

Hi,

It seems that something is not right with JIRA. It seems somehow
related to minor (x.y.Z) releases when looking for them in Anon mode.
And issues that target them.

Reproduction:
1) In Anon window, go to: https://issues.apache.org/jira/browse/SOLR-12202
2) Click (to new window) on 7.3.1 release, which should show all
issues fixed in 7.3.1. I get nothing in anon and 10 in logged-in
3) Click (to new window) on 7.4 release, and I see 14, instead of 169

Same happens with https://issues.apache.org/jira/browse/SOLR-13255
(part of 7.7.1 release).

I thought maybe this was connected to the Security Level situation, but
both of these are marked public explicitly.

Does anybody else see this and/or know whether this is some kind of
known issue?

Regards,
   Alex.


[jira] [Commented] (SOLR-11724) Cdcr Bootstrapping does not cause "index copying" to follower nodes on Target

2019-06-07 Thread Alexandre Rafalovitch (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-11724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858464#comment-16858464
 ] 

Alexandre Rafalovitch commented on SOLR-11724:
--

This issue is marked as part of 7.3.1, but was it actually? It is the only
issue marked as unfinished against the completed 7.3.1 release.

> Cdcr Bootstrapping does not cause "index copying" to follower nodes on Target
> -
>
> Key: SOLR-11724
> URL: https://issues.apache.org/jira/browse/SOLR-11724
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: CDCR
>Reporter: Amrit Sarkar
>Assignee: Varun Thacker
>Priority: Major
> Fix For: 7.3.1, 7.4, 8.0
>
> Attachments: SOLR-11724.patch, SOLR-11724.patch, SOLR-11724.patch, 
> SOLR-11724.patch
>
>
> Please find the discussion on:
> http://lucene.472066.n3.nabble.com/Issue-with-CDCR-bootstrapping-in-Solr-7-1-td4365258.html
> If we index a significant number of documents into Source, stop indexing and then start
> CDCR, bootstrapping only copies the index to the leader node of each shard of the
> collection, and followers never receive the documents / index until and
> unless at least one document is inserted again on the source, which propagates to the
> target, and the target collection then triggers index replication to followers.
> This behavior needs to be addressed properly, either at the target
> collection or while bootstrapping.




Re: Welcome Michael Sokolov as Lucene/ Solr committer

2019-05-14 Thread Alexandre Rafalovitch

Welcome aboard and congratulations.

Regards,
Alex


On Mon, May 13, 2019, 3:49 PM Dawid Weiss,  wrote:

> > I am pretty sure my first interaction with the Apache Solr/Lucene
> community was back in 2012,
>
> Yeah... I really don't know how it happened you haven't been
> invited earlier. Everyone just kind of assumed you
> have committer rights already! :)
>
>
> D.
>
> On Mon, May 13, 2019 at 9:23 PM Michael Sokolov 
> wrote:
> >
> > Thanks Dawid, and thank you to everyone who voted to grant me access
> > to this awesome project!
> >
> > I spent many years building full text search web applications serving
> > large texts (especially dictionaries, encyclopedias, and academic
> > journals). I cut my teeth with AltaVista back in 1998, and tried many
> > other search engines before finally coming around to Solr/Lucene.
> >
> > I am pretty sure my first interaction with the Apache Solr/Lucene
> > community was back in 2012, when I was looking to solve a performance
> > problem we encountered highlighting gigantic documents. Since then
> > I've worked on many projects involving Solr and Lucene, and
> > ElasticSearch, and made various contributions, implemented some of my
> > own extensions, made a separate XML query engine based on Solr (Lux -
> > no longer active), went to a few Lucene/Solr Revolutions (spoke at
> > one), and always in the back of my mind was the idea of contributing
> > more actively and becoming a full participant in this thriving open
> > source project. Now I'm really excited that it has come to pass, and look
> > forward to digging in even deeper, and helping to keep this thing
> > going.
> >
> > -Mike
> >
> > On Mon, May 13, 2019 at 3:12 PM Dawid Weiss 
> wrote:
> > >
> > > Hello everyone,
> > >
> > > Please join me in welcoming Michael Sokolov as Lucene/ Solr committer!
> > >
> > > Many of you probably know Mike as he's been around for quite a while
> > > -- answering questions, reviewing patches, providing insight and
> > > actively working on new code.
> > >
> > > Congratulations and welcome! It is a tradition to introduce yourself
> > > with a brief bio, Mike.
> > >
> > > Dawid
>
>
>

Re: Welcome Tomoko Uchida as Lucene/Solr committer

2019-04-09 Thread Alexandre Rafalovitch

Welcome Tomoko,

Watching the Luke issue, I really felt your sense of patience and
collaboration. Awesome to have you as a committer.

Regards,
   Alex.

On Mon, 8 Apr 2019 at 11:21, Uwe Schindler  wrote:
>
> Hi all,
>
> Please join me in welcoming Tomoko Uchida as the latest Lucene/Solr committer!
>
> She has been working on https://issues.apache.org/jira/browse/LUCENE-2562 for 
> several years with awesome progress and finally we got the fantastic Luke as 
> a branch on ASF JIRA: 
> https://gitbox.apache.org/repos/asf?p=lucene-solr.git;a=shortlog;h=refs/heads/jira/lucene-2562-luke-swing-3
> Looking forward to the first release of Apache Lucene 8.1 with Luke bundled 
> in the distribution. I will take care of merging it to master and 8.x 
> branches together with her once she gets her ASF account.
>
> Tomoko also helped with the Japanese and Korean Analyzers.
>
> Congratulations and Welcome, Tomoko! Tomoko, it's traditional for you to 
> introduce yourself with a brief bio.
>
> Uwe & Robert (who nominated Tomoko)
>
> -
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>
>


Re: Lucene/Solr and Java versions, what we know

2019-03-27 Thread Alexandre Rafalovitch

Side note: Isn't the wiki going away? I saw a message on the Commons list 4 days ago:

"Infra is decommissioning the MoinMoin wiki software that runs the wiki.a.o
system in May.  That means all the content there needs to be migrated to
new systems if it's still relevant."

Regards,
 Alex

On Tue, Mar 26, 2019, 11:39 AM Erick Erickson, 
wrote:

> So I assume everyone thinks I’ve nailed it perfectly with this page?
> https://wiki.apache.org/solr/SolrJavaVersions. ‘cause I haven’t seen much
> feedback.
>
> Look, we give _no_ guidance at this point about whether Lucene/Solr even
> work on Java X. Well, I guess we’re saying Solr 9 works with Java 11. Or
> at least it will since it’s about to be required.
>
> I don’t particularly care if we say “If you’re upgrading Java, use Java
> 11” for Lucene/Solr 8x or 7x or 6x for that matter. Let’s just get our
> collective act together and give some guidance.
>
>
>
>
>
>
>

[jira] [Commented] (SOLR-10329) Rebuild Solr examples

2019-03-26 Thread Alexandre Rafalovitch (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801563#comment-16801563
 ] 

Alexandre Rafalovitch commented on SOLR-10329:
--

The JIRA itself is still open; anybody can work on it. Any member of the community
will see the proposed patches and will be able to comment on them.

Specifically, for GSoC, I am not mentoring this year, so this is probably not 
the best JIRA to jump on for that purpose.

> Rebuild Solr examples
> -
>
> Key: SOLR-10329
> URL: https://issues.apache.org/jira/browse/SOLR-10329
> Project: Solr
>  Issue Type: Wish
>  Components: examples
>    Reporter: Alexandre Rafalovitch
>Priority: Major
>  Labels: gsoc2017
>
> Apache Solr ships with a number of examples. They evolved from a kitchen-sink
> example and are rather large. When new Solr features are added, they are
> often shoehorned into the most appropriate example and sometimes are not
> represented at all.
> Often, for new users, it is hard to tell what part of an example is relevant,
> what part is default and what part is demonstrating something completely
> different.
> It would take significant (and very appreciated) effort to review all the
> examples and rebuild them to provide a clean way to showcase best practices
> around base and most recent features.
> Specific issues are around kitchen-sink vs. minimal examples, a better approach
> to "schemaless" mode, and creating examples and datasets that allow creating
> both "hello world" and more advanced tutorials.




Re: [NOTICE] Mandatory migration of git repositories to gitbox.apache.org

2019-01-12 Thread Alexandre Rafalovitch

I have created https://issues.apache.org/jira/browse/INFRA-17631

Regards,
   Alex.

On Fri, 11 Jan 2019 at 16:01, David Smiley  wrote:
>
> On Fri, Jan 11, 2019 at 3:14 PM Steve Rowe  wrote:
>>
>> +1 to ask Infra for an auto redirect for the links in all the existing JIRA 
>> comments.
>>
>
> +1 to that!
>
> Please post the JIRA INFRA link here so we can follow.  Alex, if you're too
> busy to get to it, then I will.  Hopefully just a <=15min thing.
>
> ~ David
> --
> Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book: 
> http://www.solrenterprisesearchserver.com


[jira] [Closed] (SOLR-11393) Unable to index field names in JSON

2019-01-11 Thread Alexandre Rafalovitch (JIRA)



 [ 
https://issues.apache.org/jira/browse/SOLR-11393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Rafalovitch closed SOLR-11393.


> Unable to index field names in JSON
> ---
>
> Key: SOLR-11393
> URL: https://issues.apache.org/jira/browse/SOLR-11393
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 6.6.1
>Reporter: Cheburakshu
>Priority: Major
>
> I am not able to index documents with the field names below in a JSON doc.
> config_os_version
> location_region
> custom_var_v2
> deleted
> I get the error below:
> ERROR: [doc=29128e37-c6d9-4d2b-814e-1d42f84be9b5] Error adding field
> 'location_region'='test' msg=For input string: "test"
> The input given in admin UI /update endpoint is
> {"location_region":"test"}
> The same error was encountered for other field names as well.



