I'm in agreement with Eric here that fewer ways (or at least a clearer
default way) of supplying resources would be better. Additionally, it
should be easy to specify whether a resource I've shared should be loaded
on a per-SolrCore or per-node basis (or, even better, per collection
present on the node, accessible under a standard name to the replicas
belonging to that collection?). Beyond the simplest single-collection
install with few shards, there aren't many cases where you want a 1 GB
resource duplicated in memory across N cores running on the same node,
though there are obviously ample cases where a 10k stop words file is
meant to differ across collections.
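
The per-core vs. per-node distinction could look something like the
following minimal Java sketch of a node-wide cache, where the scope a
resource is shared at decides its cache key, so a NODE-scoped resource is
loaded once however many cores ask for it. NodeSharedResources, Scope and
get are hypothetical names, not an existing Solr API, and a real version
would also have to drop entries when cores or collections are unloaded.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch, not a Solr API: share a loaded resource across cores
// at a chosen scope so a NODE-scoped resource is loaded once per node.
final class NodeSharedResources {

    /** How widely a loaded resource is shared on this node. */
    enum Scope { CORE, COLLECTION, NODE }

    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    /** Narrower scopes fold more of the requesting core's identity into the key. */
    static String key(Scope scope, String collection, String coreName, String resource) {
        switch (scope) {
            case NODE:
                return "node/" + resource;
            case COLLECTION:
                return "collection/" + collection + "/" + resource;
            default:
                return "core/" + coreName + "/" + resource;
        }
    }

    /**
     * Returns the shared instance for the scoped key. ConcurrentHashMap's
     * computeIfAbsent applies the loader at most once per key, so every
     * caller asking for the same key receives the same object.
     */
    @SuppressWarnings("unchecked")
    <T> T get(Scope scope, String collection, String coreName,
              String resource, Function<String, T> loader) {
        return (T) cache.computeIfAbsent(
                key(scope, collection, coreName, resource),
                k -> loader.apply(resource));
    }
}
```

Two cores from different collections asking for the same NODE-scoped
resource get one shared instance, while the same request at COLLECTION
scope loads one copy per collection. Other updates to the map may block
while a long-running load is in progress, which is fine for a sketch but
worth revisiting for multi-gigabyte resources.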

As it stands, Eric's list seems like something that should be in the
documentation somewhere, if only so people can troubleshoot where
something they don't expect is being loaded from, or why their attempts
to load something new aren't working, especially if the list were ordered
to show the precedence of these options.
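
Ordering that list by precedence would also make it straightforward to
report where a resource actually came from. A minimal Java 16+ sketch,
assuming each location is modelled as a named source checked in a fixed
order; ResourceResolver, Source and MapSource are hypothetical names, and
the sources are in-memory stand-ins rather than real ZK or file system
lookups.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: walk resource sources in a fixed precedence order and
// report which one supplied the resource. Names and order are illustrative.
final class ResourceResolver {

    /** A named place a resource might come from (a ZK path, a local directory, ...). */
    interface Source {
        String name();
        Optional<byte[]> find(String resourceName);
    }

    /** The winning source's name plus the resource bytes. */
    record Resolved(String sourceName, byte[] bytes) {}

    /** In-memory stand-in for a real source, enough to show precedence. */
    record MapSource(String name, Map<String, byte[]> files) implements Source {
        public Optional<byte[]> find(String resourceName) {
            return Optional.ofNullable(files.get(resourceName));
        }
    }

    private final List<Source> sources;

    ResourceResolver(List<? extends Source> sourcesInPrecedenceOrder) {
        this.sources = List.copyOf(sourcesInPrecedenceOrder);
    }

    /** The first source that has the resource wins; empty if none has it. */
    Optional<Resolved> resolve(String resourceName) {
        for (Source source : sources) {
            Optional<byte[]> hit = source.find(resourceName);
            if (hit.isPresent()) {
                return Optional.of(new Resolved(source.name(), hit.get()));
            }
        }
        return Optional.empty();
    }
}
```

With the sources ordered as the ZK configSet, then the same-named configSet
on the file system, then _default on the file system (the fallback David
proposes below, not current Solr behaviour), resolve("schema.xml") reports
the ZK copy when both exist, while a file present only in _default is
reported as coming from _default.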

As for ease of editing configurations, I've long felt that this should be
possible via the admin UI, though there's been much worry about the
security implications. Personally, I think those concerns are resolvable,
but I have not found time to make that case. Aside from that, I think we
need to support tooling that makes config sets easy to manage rather than
expanding the number of places configurations might be loaded from.
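
As one illustration of such tooling, a minimal Java 11+ sketch that zips a
local config set directory and builds (without sending) a POST to the
ConfigSets API's UPLOAD action, which accepts the config set as a zip body.
ConfigSetPush and its methods are hypothetical helpers, not part of Solr or
of the Gradle plugin mentioned below, and the base URL is whatever your
node exposes, e.g. http://localhost:8983/solr.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper (not part of Solr or solr-gradle): package a local
// config set directory and prepare a ConfigSets API UPLOAD request for it.
final class ConfigSetPush {

    /** Zips every regular file under dir, with entry names relative to dir. */
    static byte[] zipDirectory(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Path file : files) {
                // Zip entry names always use '/', whatever the OS separator is.
                String entryName = dir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(entryName));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    /** Builds, but does not send, a POST to {base}/admin/configs?action=UPLOAD. */
    static HttpRequest uploadRequest(String solrBaseUrl, String configSetName, byte[] zip) {
        String name = URLEncoder.encode(configSetName, StandardCharsets.UTF_8);
        URI uri = URI.create(solrBaseUrl + "/admin/configs?action=UPLOAD&name=" + name);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofByteArray(zip))
                .build();
    }
}
```

Sending the request is one call with
HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
collections can then be created on the uploaded config set via the
Collections API's CREATE action with collection.configName.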

Several years ago I wrote a very basic plugin for Gradle that, once it is
configured to see ZooKeeper, will happily pull configs down and push them
up for you, which is convenient for keeping configs under version control
during development. There's LOTS to improve there, most especially adding
support for managing multiple configs at a time, and I had hoped that folks
would use it and offer suggestions and contributions, but I have no
indication that anyone but me uses it.
(https://github.com/nsoft/solr-gradle)

-Gus

On Fri, Jan 22, 2021 at 8:19 AM Eric Pugh <ep...@opensourceconnections.com>
wrote:

> There is a lot in here ;-).
>
> With the caveat that I don’t have the recent experience many of you have
> with massive Solr clusters, I think that we need to commit to fewer, not
> more, ways of maintaining the supporting resources that these clusters
> need. I’d like to see ways of managing our Solr clusters that encourage
> easy change and experimentation, and encourage us to separate the physical
> layer (version of Solr, networking setup, packages used) from the logical
> layer (individual collections and their supporting code and resources).
>
> I think the configSet was a huge jump forward. My workflow is:
> 1) What’s unusual about this Solr setup? What does the physical layer need
> to be? Special package? Special code? Build a Docker image.
> 2) Fire up a three-node Solr cluster and wait until it’s up and responsive,
> checking via the APIs.
> 3) Now think about my specific use case. What collections do I need? Is
> it just 1, or is it 5 or 10 collections? Are they on the same configSet or
> different ones? Great, zip up the configSet and pop it into Solr via the
> APIs.
> 4) Create the collections in the shapes I need with the APIs, and now
> start iterating on what I need to do. Use the APIs to create fields, or
> set up different ParamSets.
>
> However, with configSets we only did half the job, because we still don’t
> have a single, well-understood way of handling JARs and other resources.
> We have many ways of doing it, which generates constant user confusion and
> contributes to the perception that “Solr is hard to use”.
>
> Right now, across the Solr landscape, I can think of many ways of adding
> “external” files to my Solr:
>
> 1) Classic ./lib as a place to put things.
> 2) The (new to me) solr.allow.unsafe.resourceloading=true approach.
> 3) The userfiles directory in Solr, accessed by the streaming expressions
> load function.
> 4) The “package store” for packages, located in the file store.
> 5) The .system blob store concept from before the package store.
> 6) The LTR feature store (which I guess is backed by ZK, but could be on
> disk as well by jumping through more hoops).
> 7) Layering stuff in directly via Docker build files.
>
> These are each a little different, with varying levels of support.
>
> Let’s figure out how we can include a resource that is 10 KB, 1 MB or 1 GB
> without having to think about ZooKeeper or any of the other implementation
> details of backing it. Let’s figure out where the package manager is
> letting us down and keep working on it.
>
>
>
> On Jan 22, 2021, at 12:16 AM, David Smiley <dsmi...@apache.org> wrote:
>
> Summary: I've been contemplating a simple enhancement to how SolrCloud
> resolves files in a configSet: when a file isn't in ZooKeeper, fall back
> to the same-named configSet on the file system (which SolrCloud normally
> ignores today). A further fallback to _default on the file system could be
> useful as well. ZK would always remain the mutable space, whether you edit
> the schema, configOverlay.json, or anything else.
>
> My primary motivation is to make upgrades to plugins, configs, or Solr
> itself easier in some scenarios (certainly not all!). Imagine that you've
> got configOverlay.json (with some handlers defined), params.json, and
> schema.xml in ZK, and solrconfig.xml on the file system, plus a partial
> XML file of schema field types that schema.xml pulls in via xi:include.
> Assume that you use a custom Solr Docker image that includes custom
> plugins and has this configSet baked in. One day you add some new token
> filters, add a new Lucene merge policy, and remove an outdated update
> request processor. You make the plugin code changes and the xi:included
> field type changes, edit solrconfig.xml, build all of this into your
> company's latest Solr Docker image, and deploy it with Kubernetes. Those
> changes can be safe to deploy without touching any ZK-resident configSet.
> Other changes might not be (e.g. removing a field type that is still
> referenced, or changing text analysis incompatibly enough to require a
> re-index), but my point is that some are, and this would make them easier.
>
> An additional motivation is storing large, relatively static, common
> resources on the file system. Where I work, I've got over a gigabyte of
> them :-). This can be worked around with
> solr.allow.unsafe.resourceloading=true, but... it'd be nice not to have to
> resort to that.
>
> Another benefit would be making it easier to separate one's own
> configuration from that of the _default configSet you took from Solr when
> starting a new project. Resolving the differences and then doing Solr
> upgrades was a common task for me, both as a consultant and for my own
> Solr upgrades. Granted, this is possible today, but perhaps if this
> overlay approach were emphasized and embraced more, it would lead to this
> outcome. It's still a problem that a starter solrconfig.xml and schema.xml
> are either too bare-bones or say too much, but improving that is a
> separate issue for Solr.
>
> A related, probably secondary, issue: if the SolrCloud configSet ZK node
> were optional instead of required (i.e. the configSet is assumed to be
> entirely on the file system), it would bring other benefits. It would
> allow users to use the "file store" or some network-mounted storage (NFS)
> as the configSet location. The biggest PITA anyone notices when first
> exploring SolrCloud is that configs are fundamentally not on the file
> system despite appearing to be there; it's all in ZK. And there's no super
> convenient way to edit the configuration, not even a web UI. Using the
> file system for configSets would be especially nice for local SolrCloud
> experimentation in Docker, speeding it up by eliminating an annoying
> configSet deployment step.
>
> I plan to file an issue, of course, but I think this deserves a dev list
> discussion.
>
> I know the new package manager could help with my primary motivating use
> case, but I think there are too many obstacles there at present. A file
> system fallback is a simple thing by comparison.
>
> Question:  Does the k8s Solr Operator do anything to make configSet &
> plugin upgrades better?
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> _______________________
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467
> | http://www.opensourceconnections.com | My Free/Busy
> <http://tinyurl.com/eric-cal>
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
