Re: Survey on ManagedResources feature

2020-08-18 Thread Noble Paul
So, it's not very different from directly reading a file from ZK?

what benefit do you get by using the ManagedResourceStorage?
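
The pattern described in the quoted thread below amounts to programming against a minimal storage interface rather than a concrete backend. A self-contained sketch of that idea (hypothetical classes, not Solr's actual API; Solr's real `ManagedResourceStorage.newStorageIO(...)` plays the selector role, returning a ZooKeeper-backed or file-based implementation depending on how the node runs):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class StorageDemo {
    // Hypothetical minimal analogue of ManagedResourceStorage.StorageIO:
    // one contract, pluggable backends (a file backend for standalone here;
    // a ZooKeeper-backed implementation would satisfy the same contract
    // in SolrCloud).
    interface StorageIO {
        OutputStream openOutputStream(String resourceName) throws IOException;
        InputStream openInputStream(String resourceName) throws IOException;
    }

    static class FileStorageIO implements StorageIO {
        private final Path dir;
        FileStorageIO(Path dir) { this.dir = dir; }
        public OutputStream openOutputStream(String name) throws IOException {
            return Files.newOutputStream(dir.resolve(name));
        }
        public InputStream openInputStream(String name) throws IOException {
            return Files.newInputStream(dir.resolve(name));
        }
    }

    public static void main(String[] args) throws IOException {
        StorageIO io = new FileStorageIO(Files.createTempDirectory("managed"));
        // Plugin code only sees the StorageIO contract, never the backend.
        try (OutputStream out = io.openOutputStream("config.json")) {
            out.write("{\"k\":1}".getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = io.openInputStream("config.json")) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

The benefit over reading a file from ZK directly, as Matthias describes it, is that the plugin code stays identical in SolrCloud and standalone mode.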

On Sun, Aug 16, 2020 at 7:08 PM Matthias Krueger  wrote:
>
>
> In a custom SolrRequestHandler#handleRequest something like this:
>
> final ManagedResourceStorage.StorageIO storageIO =
> ManagedResourceStorage.newStorageIO(core.getCoreDescriptor().getCollectionName(),
> resourceLoader, new NamedList<>());
>
> And then using
>
> storageIO.openOutputStream(resourceName)
>
> to store some (well-known) resources.
>
> Matt
>
>
> On 15.08.20 11:38, Noble Paul wrote:
> >> I use ManagedResource#StorageIO and its implementations as a convenient way 
> >> to abstract away the underlying config storage when creating plugins that 
> >> need to support both SolrCloud and Solr Standalone.
> > Can you give us some more details on how you use it?
> >
> > On Sat, Aug 15, 2020 at 7:32 PM Noble Paul  wrote:
> >>> As authentication is plugged into the SolrDispatchFilter I would assume 
> >>> that you would need to be authenticated to read/write Managed Resources
> >> I'm talking about the authorization plugins
> >>
> >> On Fri, Aug 14, 2020 at 10:20 PM Matthias Krueger  wrote:
> >>>
> >>> As authentication is plugged into the SolrDispatchFilter I would assume 
> >>> that you would need to be authenticated to read/write Managed Resources 
> >>> but no authorization is checked (i.e. any authenticated user can 
> >>> read/write them), correct?
> >>>
> >>> Anyway, I came across Managed Resources in at least two scenarios:
> >>>
> >>> The LTR plugin is using them for updating model/features.
> >>> I use ManagedResource#StorageIO and its implementations as a convenient 
> >>> way to abstract away the underlying config storage when creating plugins 
> >>> that need to support both SolrCloud and Solr Standalone.
> >>>
> >>> IMO an abstraction that allows distributing configuration (ML models, 
> >>> configuration snippets, external file fields...) that exceeds the typical 
> >>> ZK size limits to SolrCloud while also supporting Solr Standalone would 
> >>> be nice to have.
> >>>
> >>> Matt
> >>>
> >>>
> >>> On 12.08.20 02:08, Noble Paul wrote:
> >>>
> >>> The endpoint is served by Restlet, so your rules are not going to be 
> >>> honored. The rules work only if it is served by a Solr request handler.
> >>>
> >>> On Wed, Aug 12, 2020, 12:46 AM Jason Gerlowski  
> >>> wrote:
>  Hey Noble,
> 
>  Can you explain what you mean when you say it's not secured?  Just for
>  those of us who haven't been following the discussion so far?  On the
>  surface of things users taking advantage of our RuleBasedAuth plugin
>  can secure this API like they can any other HTTP API.  Or are you
>  talking about some other security aspect here?
> 
>  Jason
> 
>  On Tue, Aug 11, 2020 at 9:55 AM Noble Paul  wrote:
> > Hi all,
> > The endpoint for Managed Resources is not secured, so it needs to be
> > fixed or eliminated.
> >
> > I would like to know what is the level of adoption for that feature
> > and if it is a critical feature for users.
> >
> > Another possibility is to offer a replacement for the feature using a
> > different API.
> >
> > Your feedback will help us decide on what a potential solution should be.
> >
> > --
> > -
> > Noble Paul
> >>
> >>
> >> --
> >> -
> >> Noble Paul
> >
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>


-- 
-
Noble Paul




Re: When zero offsets are not bad - a.k.a. multi-token synonyms yet again

2020-08-18 Thread Roman Chyla
Hi Mike,

I'm sorry, the problem all along has been related to a
word-delimiter filter factory. This is embarrassing, but I have to
admit it publicly and self-flagellate.

A word-delimiter filter is used to split tokens; these are then used
to find multi-token synonyms (hence the connection). In my desire to
simplify, I omitted that detail when writing my first email.

I went to generate the stack trace:

```
assertU(adoc("id", "603", "bibcode", "xx603",
    "title", "THE HUBBLE constant: a summary of the HUBBLE SPACE TELESCOPE program"));
```

stage:indexer term=xx603 pos=1 type=word offsetStart=0 offsetEnd=13
stage:indexer term=acr::the pos=1 type=ACRONYM offsetStart=0 offsetEnd=3
stage:indexer term=hubble pos=1 type=word offsetStart=4 offsetEnd=10
stage:indexer term=acr::hubble pos=0 type=ACRONYM offsetStart=4 offsetEnd=10
stage:indexer term=constant pos=1 type=word offsetStart=11 offsetEnd=20
stage:indexer term=summary pos=1 type=word offsetStart=23 offsetEnd=30
stage:indexer term=hubble pos=1 type=word offsetStart=38 offsetEnd=44
stage:indexer term=syn::hubble space telescope pos=0 type=SYNONYM
offsetStart=38 offsetEnd=60
stage:indexer term=syn::hst pos=0 type=SYNONYM offsetStart=38 offsetEnd=60
stage:indexer term=space pos=1 type=word offsetStart=45 offsetEnd=50
stage:indexer term=telescope pos=1 type=word offsetStart=51 offsetEnd=60
stage:indexer term=program pos=1 type=word offsetStart=61 offsetEnd=68

That worked; only the next one failed:

```
assertU(adoc("id", "605", "bibcode", "xx604",
    "title", "MIT and anti de sitter space-time"));
```


stage:indexer term=xx604 pos=1 type=word offsetStart=0 offsetEnd=13
stage:indexer term=mit pos=1 type=word offsetStart=0 offsetEnd=3
stage:indexer term=acr::mit pos=0 type=ACRONYM offsetStart=0 offsetEnd=3
stage:indexer term=syn::massachusetts institute of technology pos=0
type=SYNONYM offsetStart=0 offsetEnd=3
stage:indexer term=syn::mit pos=0 type=SYNONYM offsetStart=0 offsetEnd=3
stage:indexer term=anti pos=1 type=word offsetStart=8 offsetEnd=12
stage:indexer term=syn::ads pos=0 type=SYNONYM offsetStart=8 offsetEnd=28
stage:indexer term=syn::anti de sitter space pos=0 type=SYNONYM
offsetStart=8 offsetEnd=28
stage:indexer term=syn::antidesitter spacetime pos=0 type=SYNONYM
offsetStart=8 offsetEnd=28
stage:indexer term=de pos=1 type=word offsetStart=13 offsetEnd=15
stage:indexer term=sitter pos=1 type=word offsetStart=16 offsetEnd=22
stage:indexer term=space pos=1 type=word offsetStart=23 offsetEnd=28
stage:indexer term=time pos=1 type=word offsetStart=29 offsetEnd=33
stage:indexer term=spacetime pos=0 type=word offsetStart=23 offsetEnd=33
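
The failure is the last token above: `spacetime` is emitted with startOffset=23 after `time` already started at offset 29, and the indexer requires that startOffsets never go backwards within a field (matching `startOffset=23,endOffset=33,lastStartOffset=29` in the error that follows). A self-contained sketch of that invariant (illustrative, not Lucene's actual code):

```java
public class OffsetCheck {
    // Each token is {startOffset, endOffset}, in emission order.
    // Mirrors the indexer's rule: offsets non-negative, end >= start,
    // and startOffset must never decrease from one token to the next.
    static boolean offsetsValid(int[][] tokens) {
        int lastStart = -1;
        for (int[] t : tokens) {
            int start = t[0], end = t[1];
            if (start < 0 || end < start || start < lastStart) {
                return false;
            }
            lastStart = start;
        }
        return true;
    }

    public static void main(String[] args) {
        // Tail of the doc 603 stream: startOffsets 38, 38, 38, 45, 51, 61
        int[][] doc603 = {{38, 44}, {38, 60}, {38, 60}, {45, 50}, {51, 60}, {61, 68}};
        // Tail of the doc 605 stream: "spacetime" restarts at 23 after "time" at 29
        int[][] doc605 = {{23, 28}, {29, 33}, {23, 33}};
        System.out.println(offsetsValid(doc603)); // prints true
        System.out.println(offsetsValid(doc605)); // prints false
    }
}
```

The catenated word-delimiter output `spacetime` spans 23..33 but is positioned after the last part, `time`, which is exactly the backwards jump the check rejects.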

```
325677 ERROR
(TEST-TestAdsabsTypeFulltextParsing.testNoSynChain-seed#[ADFAB495DA8F6F40])
[] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: Exception writing document id
605 to the index; possible analysis error: startOffset must be
non-negative, and endOffset must be >= startOffset, and offsets must
not go backwards startOffset=23,endOffset=33,lastStartOffset=29 for
field 'title'
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:242)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:1002)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:1233)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$2(DistributedUpdateProcessor.java:1082)
at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1082)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:694)
at 
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:261)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:188)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2551)
at 
org.apache.solr.servlet.DirectSolrConnection.request(DirectSolrConnection.java:125)
at org.apache.solr.util.TestHarness.update(TestHarness.java:285)
at 
org.apache.solr.util.BaseTestHarness.checkUpdateStatus(BaseTestHarness.java:274)
at org.apache.solr.util.BaseTestHarness.validateUpdate(BaseTestHarness.java:244)
at org.apache.solr.SolrTestCaseJ4.checkUp
```

Atomic updates, copyField and stored=true

2020-08-18 Thread Erick Erickson
It _finally_ occurred to me to ask why we have the restriction that the 
destination of a copyField must have stored=false. I understand what currently 
happens when that's not the case: you get repeats. 

What I wondered is why we can’t detect that a field is the destination of a 
copyField and _not_ pull the stored values out of it during atomic updates?

Or do we run afoul of things in tlog retrieval or RTG?

Is this a silly idea or should I raise a JIRA?
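
For concreteness, the restriction under discussion concerns schemas along these lines (a hypothetical fragment; field names are illustrative):

```xml
<!-- Hypothetical schema fragment illustrating the restriction. -->
<field name="title" type="text_general" indexed="true" stored="true"/>
<!-- For atomic updates to behave, the copyField destination must have
     stored="false": if it were stored, each atomic update would re-read
     the destination's stored values and copyField would add the copied
     value again, so the field accumulates repeats. -->
<field name="title_all" type="text_general" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="title" dest="title_all"/>
```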



Re: Migrating to Cloudbees

2020-08-18 Thread Uwe Schindler
Thank you Cassandra,

I quickly looked into the config. Seems easy altogether. I can possibly also 
set this up for Javadocs.

The nice thing with Javadocs is that we can better configure linking between 
Solr and Lucene, so this is a good thing.

I will try to set something up if I have some time.

Uwe

Am August 18, 2020 7:54:39 PM UTC schrieb Cassandra Targett 
:
>Follow-up on this - we fixed the Solr Ref Guide builds last week, but
>there was an outstanding issue which was the Content Security Policy on
>Cloudbees is too stringent to display the Ref Guide’s CSS and JS. It
>blocked all the content basically, rendering them unreadable.
>
>Infra helped us straighten it out by setting up the ability for us to
>push the artifacts of Ref Guide builds to a new server they’ve recently
>set up to host nightly builds. I’ve updated all the Ref Guide jobs to
>do that and fixed their descriptions to point to the new locations. You
>can find them at https://nightlies.apache.org/Lucene/.
>
>The Javadocs for both Lucene and Solr also suffer from the same limited
>CSP, but the Javadocs seem to be able to mostly recover from it. It is
>possible to push them to the nightlies server for the full JS-enabled
>experience if we choose.
>
>Infra is also quite open (enthusiastic?) for people to use this server,
>so if there is any interest in pushing other build artifacts out to it
>as a regular place to get pre-release builds, we’re welcome to do so. I
>can help, or you can look at one of the Ref Guide jobs for an example.
>
>Cassandra
>On Aug 7, 2020, 12:17 PM -0500, Ishan Chattopadhyaya
>, wrote:
>> Thanks for your work, Uwe. I would love to run a public Jenkins
>server soon (maybe by September), would like to try out your scripts
>:-)
>>
>> > On Fri, Aug 7, 2020 at 10:12 PM David Smiley 
>wrote:
>> > > Sweet!  Thanks Uwe!
>> > > ~ David Smiley
>> > > Apache Lucene/Solr Search Developer
>> > > http://www.linkedin.com/in/davidwsmiley
>> > >
>> > >
>> > > > On Thu, Aug 6, 2020 at 5:52 PM Uwe Schindler 
>wrote:
>> > > > > Thanks Erick!
>> > > > >
>> > > > > I hope the remaining issues sort out quite soon.
>> > > > >
>> > > > > For the release managers: As I did a more scripted, automatic
>migration using the Jenkins REST API (otherwise the 50 jobs we have
>would have been a disaster to migrate), I already have a plan to reuse
>that script to allow the release manager to create clones of all
>"master" jobs, preconfigured for the release branch. All you need is
>Lucene PMC status and a Jenkins API token, and then you will be able to
>start a script that creates all release branch jobs in a few seconds 😊
>> > > > >
>> > > > > Uwe
>> > > > >
>> > > > > -
>> > > > > Uwe Schindler
>> > > > > Achterdiek 19, D-28357 Bremen
>> > > > > https://www.thetaphi.de
>> > > > > eMail: u...@thetaphi.de
>> > > > >
>> > > > > > -Original Message-
>> > > > > > From: Erick Erickson 
>> > > > > > Sent: Thursday, August 6, 2020 11:39 PM
>> > > > > > To: dev@lucene.apache.org
>> > > > > > Subject: Migrating to Cloudbees
>> > > > > >
>> > > > > > If nobody has expressed their _extreme_ gratitude to Uwe,
>infra (and helpers?)
>> > > > > > for the migration, I hereby rectify that!!
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de

Re: Migrating to Cloudbees

2020-08-18 Thread Cassandra Targett
Follow-up on this - we fixed the Solr Ref Guide builds last week, but there was 
an outstanding issue which was the Content Security Policy on Cloudbees is too 
stringent to display the Ref Guide’s CSS and JS. It blocked all the content 
basically, rendering them unreadable.

Infra helped us straighten it out by setting up the ability for us to push the 
artifacts of Ref Guide builds to a new server they’ve recently set up to host 
nightly builds. I’ve updated all the Ref Guide jobs to do that and fixed their 
descriptions to point to the new locations. You can find them at 
https://nightlies.apache.org/Lucene/.

The Javadocs for both Lucene and Solr also suffer from the same limited CSP, 
but the Javadocs seem to be able to mostly recover from it. It is possible to 
push them to the nightlies server for the full JS-enabled experience if we 
choose.

Infra is also quite open (enthusiastic?) for people to use this server, so if 
there is any interest in pushing other build artifacts out to it as a regular 
place to get pre-release builds, we’re welcome to do so. I can help, or you can 
look at one of the Ref Guide jobs for an example.

Cassandra
On Aug 7, 2020, 12:17 PM -0500, Ishan Chattopadhyaya 
, wrote:
> Thanks for your work, Uwe. I would love to run a public Jenkins server soon 
> (maybe by September), would like to try out your scripts :-)
>
> > On Fri, Aug 7, 2020 at 10:12 PM David Smiley  wrote:
> > > Sweet!  Thanks Uwe!
> > > ~ David Smiley
> > > Apache Lucene/Solr Search Developer
> > > http://www.linkedin.com/in/davidwsmiley
> > >
> > >
> > > > On Thu, Aug 6, 2020 at 5:52 PM Uwe Schindler  wrote:
> > > > > Thanks Erick!
> > > > >
> > > > > I hope the remaining issues sort out quite soon.
> > > > >
> > > > > For the release managers: As I did a more scripted, automatic 
> > > > > migration using the Jenkins REST API (otherwise the 50 jobs we have 
> > > > > would have been a disaster to migrate), I already have a plan to 
> > > > > reuse that script to allow the release manager to create clones of 
> > > > > all "master" jobs, preconfigured for the release branch. All you need 
> > > > > is Lucene PMC status and a Jenkins API token, and then you will be 
> > > > > able to start a script that creates all release branch jobs in a few 
> > > > > seconds 😊
> > > > >
> > > > > Uwe
> > > > >
> > > > > -
> > > > > Uwe Schindler
> > > > > Achterdiek 19, D-28357 Bremen
> > > > > https://www.thetaphi.de
> > > > > eMail: u...@thetaphi.de
> > > > >
> > > > > > -Original Message-
> > > > > > From: Erick Erickson 
> > > > > > Sent: Thursday, August 6, 2020 11:39 PM
> > > > > > To: dev@lucene.apache.org
> > > > > > Subject: Migrating to Cloudbees
> > > > > >
> > > > > > If nobody has expressed their _extreme_ gratitude to Uwe, infra 
> > > > > > (and helpers?)
> > > > > > for the migration, I hereby rectify that!!
> > > > >
> > > > >
> > > > >


2020-08 Committer virtual meeting

2020-08-18 Thread David Smiley
Hello fellow committers,

I'd like to organize another virtual Lucene/Solr committer meeting this
month.  I created a meeting notes page in confluence here:
https://cwiki.apache.org/confluence/display/LUCENE/2020-08+Committer+meeting
It has some topics I'd like to talk about, some copied from last month that
might be worth following up on, and I'm hoping others might add to the
tentative agenda as well.  As usual there are many topics to discuss.  I
suppose if we have these meetings more often, I'll be less compelled to
raise seemingly all topics.

When exactly is this?  Perhaps next Thursday or maybe later. I'm using a
"Doodle poll" to determine an optimal time slot.  For the link to the poll,
go to the ASF Slack, #lucene-dev or #solr-dev channel, and you will see
it.  You could also email me directly for it.

For this virtual committer meeting and future ones:

   - This is in the spirit of committer meetings co-located with
   conferences.  ASF policy says that no "decisions" can be made in such a
   venue.  We make decisions on this dev list and indirectly via JIRA out in
   the open and with the opportunity for anyone to comment.
   - Who:  Committer-only or by invitation
   - Video chat with option of audio dial-in.  This time I will use Google
   Hangout.
   - Recorded for those invited only.  I'll dispose of the recording a week
   after.  The intention is for those who cannot be there due to a scheduling
   conflict to see/hear what was said.  I have the ability to do this
   recording via Salesforce's G-Suite subscription.
   - Published notes:  I (or someone) will take written meeting notes that
   are ultimately published for anyone to see (not restricted to those
   invited).  They will be transmitted to the dev list.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley