Re: Programmatic Basic Auth on CloudSolrClient

2021-03-04 Thread Mark H. Wood
On Wed, Mar 03, 2021 at 10:34:50AM -0800, Tomás Fernández Löbbe wrote:
> As far as I know the current OOTB options are system properties or
> per-request (which would allow you to use different credentials per collection, but
> probably not ideal if you do different types of requests from different
> parts of your code). A workaround (which I've used in the past) is to have
> a custom client that overrides and sets the credentials in the "request"
> method (you can put whatever logic there to identify which credentials to
> use). I recently created https://issues.apache.org/jira/browse/SOLR-15154
> and https://issues.apache.org/jira/browse/SOLR-15155 to try to address this
> issue in future releases.

I have not tried it, but could you not:

1. set up an HttpClient with an appropriate CredentialsProvider;
2. pass it to HttpSolrClient.Builder.withHttpClient();
3. pass that Builder to LBHttpSolrClient.Builder.withHttpSolrClientBuilder();
4. pass *that* Builder to CloudSolrClient.Builder.withLBHttpSolrClientBuilder()?

Now you have control of the CredentialsProvider and can have it return
whatever credentials you wish, so long as you still have a reference
to it.
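
An untested sketch, in case it helps (builder and package names as I
recall them from the SolrJ 7/8 javadoc; the user name, password and ZK
host are whatever you pass in):

  import java.util.Collections;
  import java.util.Optional;
  import org.apache.http.auth.AuthScope;
  import org.apache.http.auth.UsernamePasswordCredentials;
  import org.apache.http.impl.client.BasicCredentialsProvider;
  import org.apache.http.impl.client.CloseableHttpClient;
  import org.apache.http.impl.client.HttpClients;
  import org.apache.solr.client.solrj.impl.CloudSolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.impl.LBHttpSolrClient;

  public class CredentialedCloudClientFactory {
      // Keep this reference; swap credentials in it later as needed.
      private final BasicCredentialsProvider credentials =
              new BasicCredentialsProvider();

      public CloudSolrClient build(String zkHost, String user, String password) {
          credentials.setCredentials(AuthScope.ANY,
                  new UsernamePasswordCredentials(user, password));

          // Wrap the provider in an HttpClient for SolrJ to use underneath.
          CloseableHttpClient httpClient = HttpClients.custom()
                  .setDefaultCredentialsProvider(credentials)
                  .build();

          return new CloudSolrClient.Builder(
                          Collections.singletonList(zkHost), Optional.empty())
                  .withLBHttpSolrClientBuilder(new LBHttpSolrClient.Builder()
                          .withHttpSolrClientBuilder(new HttpSolrClient.Builder()
                                  .withHttpClient(httpClient)))
                  .build();
      }
  }

Whether the credentials are actually offered on each request depends
on how Solr configures its internal HttpClient, so treat this as a
starting point rather than a recipe.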

> On Wed, Mar 3, 2021 at 5:42 AM Subhajit Das  wrote:
> 
> >
> > Hi There,
> >
> > Is there any way to programmatically set basic authentication credentials
> > on CloudSolrClient?
> >
> > The only documentation available is to use system properties. This is not
> > useful if two collections require two separate sets of credentials and they
> > are accessed in parallel.
> > Thanks in advance.
> >

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Tangent: old Solr versions

2020-10-28 Thread Mark H. Wood
On Tue, Oct 27, 2020 at 04:25:54PM -0500, Mike Drob wrote:
> Based on the questions that we've seen over the past month on this list,
> there are still users with Solr on 6, 7, and 8. I suspect there are still
> Solr 5 users out there too, although they don't appear to be asking for
> help - likely they are in set it and forget it mode.

Oh, there are quite a few instances of Solr 4 out there as well.  Many
of them will be moving to v7 or v8, probably starting in the next 6-12
months.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Solr queries slow down over time

2020-09-25 Thread Mark H. Wood
On Fri, Sep 25, 2020 at 11:49:22AM +0530, Goutham Tholpadi wrote:
> I have around 30M documents in Solr, and I am doing repeated *:* queries
> with rows=1, and changing start to 0, 1, 2, and so on, in a
> loop in my script (using pysolr).
> 
> At the start of the iteration, the calls to Solr were taking less than 1
> sec each. After running for a few hours (with start at around 27M) I found
> that each call was taking around 30-60 secs.
> 
> Any pointers on why the same fetch of 1 records takes much longer now?
> Does Solr need to load all the 27M before getting the last 1 records?

I and many others have run into the same issue.  Yes, each windowed
query starts fresh, having to find at least enough records to satisfy
the query, walking the list to discard the first 'start' worth of
them, and then returning the next 'rows' worth.  So as 'start' increases,
the work required of Solr increases and the response time lengthens.

> Is there a better way to do this operation using Solr?

Another answer in this thread gives links to resources for addressing
the problem, and I can't improve on those.

I can say that when I switched from start= windowing to cursormark, I
got a very nice improvement in overall speed and did not see the
progressive slowing anymore.  A query loop that ran for *days* now
completes in under five minutes.  In some way that I haven't quite
figured out, a cursormark tells Solr where in the overall document
sequence to start working.

So yes, there *is* a better way.
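
For the archives, the SolrJ version of the loop looks roughly like
this (untested as written; the core URL, page size and sort field are
examples -- pysolr takes the same cursorMark/sort parameters):

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.params.CursorMarkParams;

  public class CursorWalk {
      public static void main(String[] args) throws Exception {
          try (SolrClient solr = new HttpSolrClient.Builder(
                  "http://localhost:8983/solr/mycollection").build()) {
              SolrQuery query = new SolrQuery("*:*");
              query.setRows(1000);
              // The sort must be stable, so it has to include the
              // uniqueKey field (here "id") as a tiebreaker.
              query.setSort("id", SolrQuery.ORDER.asc);

              String cursorMark = CursorMarkParams.CURSOR_MARK_START;
              while (true) {
                  query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
                  QueryResponse response = solr.query(query);
                  // ... process response.getResults() here ...
                  String next = response.getNextCursorMark();
                  if (cursorMark.equals(next)) {
                      break;  // an unchanged cursorMark means nothing is left
                  }
                  cursorMark = next;
              }
          }
      }
  }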

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Need to update SOLR_HOME in the solr service script and getting errors

2020-09-17 Thread Mark H. Wood
On Wed, Sep 16, 2020 at 02:59:32PM +, Victor Kretzer wrote:
> My setup is two solr nodes running on separate Azure Ubuntu 18.04 LTS vms 
> using an external zookeeper assembly.
> I installed Solr 6.6.6 using the install file and then followed the steps for 
> enabling ssl. I am able to start solr, add collections and the like using 
> bin/solr script.
> 
> Example:
> /opt/solr$ sudo bin/solr start -cloud -s cloud/test2 -force
> 
> However, if I restart the machine or attempt to start solr using the 
> installed service, it naturally goes back to the default SOLR_HOME in the 
> /etc/default/solr.in.sh script: "/var/solr/data"
> 
> I've tried updating SOLR_HOME to "/opt/solr/cloud/test2"

That is what I would do.

> but then when I start the service I see the following error on the Admin 
> Dashboard:
> SolrCore Initialization Failures
> mycollection_shard1_replica1: 
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
> /opt/solr-6.6.6/cloud/test2/mycollection_shard1_replica1/data/index/write.lock
> Please check your logs for more information
> 
> I'm including what I believe to be the pertinent information from the logs 
> below:

You did well.

> I suspect this is a permission issue because the solr user created by the 
> install script isn't allowed access to  /opt/solr but I'm new to Linux and 
> haven't completely wrapped my head around the way permissions work with it. 
> Am I correct in guessing the cause of the error and, if so, how do I correct 
> this so that the service can be used to run my instances?

Yes, the stack trace actually tells you explicitly that the problem is
permissions on that file.  Follow the chain of "Caused by:" and you'll see:

  Caused by: java.nio.file.AccessDeniedException: 
/opt/solr-6.6.6/cloud/test2/mycollection_shard1_replica1/data/index/write.lock

Since, in the past, you have started Solr using 'sudo', this probably
means that write.lock is owned by 'root'.  Solr creates this file with
permissions that allow only the owner to write it.  If the service
script runs Solr as any other user (and it should!) then Solr won't be
able to open this file for writing, and because of this it won't
complete the loading of that core.

You should find out what user account is used by the service script,
and 'chown' Solr's entire working directory tree to be owned by that
user.  Then, refrain from ever running Solr as 'root' or the problem
may recur.  Use the normal service start/stop mechanism for
controlling your Solr instances.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-29 Thread Mark H. Wood
Wandering off topic, but still apropos Solr.

On Sun, Jun 28, 2020 at 12:14:56PM +0200, Ilan Ginzburg wrote:
> I disagree Ishan. We shouldn't get rid of standalone mode.
> I see three layers in Solr:
> 
>1. Lucene (the actual search libraries)
>2. The server infra ("standalone Solr" basically)
>3. Cluster management (SolrCloud)
> 
> There's value in using lower layers without higher ones.
> SolrCloud is a good solution for some use cases but there are others that
> need a search server and for which SolrCloud is not a good fit and will
> likely never be. If standalone mode is no longer available, such use cases
> will have to turn to something other than Solr (or fork and go their own
> way).

A data point:

While working to upgrade a dependent product from Solr 4 to Solr 7, I
came across a number of APIs which would have made things simpler,
neater and more reliable...except that they all are available *only*
in SolrCloud.  I eventually decided that asking thousands of sites to
run "degenerate" SolrCloud clusters (of a single instance, plus the ZK
stuff that most would find mysterious) was just not worth the gain.

So, my wish-list for Solr includes either (a) abolish standalone so
the decision is taken out of my hands, or (b) port some of the
cloud-only APIs back to the standalone layer.  I haven't spent a
moment's thought on how difficult either would be -- as I said, just a
wish.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-24 Thread Mark H. Wood
On Wed, Jun 24, 2020 at 12:45:25PM +0200, Jan Høydahl wrote:
> Master/slave and standalone are used interchangeably to mean zk-less Solr. I
> have a feeling that master/slave is the more popular of the two, but 
> personally I have been using both.

I've been trying to stay quiet and let the new-terminology issue
settle, but I had a thought.  Someone has already pointed out that the
so-called master/slave cluster is misnamed:  the so-called "master"
node doesn't order the "slaves" about and indeed has no notion of
being a master in any sense.  It acts as a servant to the "slave"
nodes, which are in charge of keeping themselves updated.

So, it's kind of odd, but I could get used to calling this mode a
"client/server cluster".

That leaves the question of what to call Solr Cloud mode, in which no
node is permanently special.  I could see calling it a "herd" or
suchlike.

Now I'll try to shut up again. :-)

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-19 Thread Mark H. Wood
On Fri, Jun 19, 2020 at 09:22:49AM -0400, j.s. wrote:
> On 6/18/20 9:50 PM, Rahul Goswami wrote:
> > So +1 on "slave" being the problematic term IMO, not "master".
> 
> but you cannot have a master without a slave, n'est-ce pas?

Well, yes.  In education:  Master of Science, Arts, etc.  In law:
Special Master (basically a judge's delegate).  See also "magistrate."
None of these has any connotation of the ownership of one person by
another.

(It's a one-way relationship:  there is no slavery without mastery,
but there are other kinds of mastery.)

But this is an emotional issue, not a logical one.  If doing X makes
people angry, and we don't want to make those people angry, then
perhaps we should not do X.

> i think it is better to use the metaphor of copying rather than one of 
> hierarchy. language has so many (unintended) consequences ...

Sensible.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-18 Thread Mark H. Wood
Primary / satellite?

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Script to check if solr is running

2020-06-05 Thread Mark H. Wood
On Thu, Jun 04, 2020 at 12:36:30PM -0400, Ryan W wrote:
> Does anyone have a script that checks if solr is running and then starts it
> if it isn't running?  Occasionally my solr stops running even if there has
> been no Apache restart.  I haven't been able to determine the root cause,
> so the next best thing might be to check every 15 minutes or so if it's
> running and run it if it has stopped.

I've used Monit for things that must be kept running:

  https://mmonit.com/monit/

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: 404 response from Schema API

2020-05-15 Thread Mark H. Wood
On Thu, May 14, 2020 at 02:47:57PM -0600, Shawn Heisey wrote:
> On 5/14/2020 1:13 PM, Mark H. Wood wrote:
> > On Fri, Apr 17, 2020 at 10:11:40AM -0600, Shawn Heisey wrote:
> >> On 4/16/2020 10:07 AM, Mark H. Wood wrote:
> >>> I need to ask Solr 4.10 for the name of the unique key field of a
> >>> schema.  So far, no matter what I've done, Solr is returning a 404.
> 
> The Luke Request Handler, normally assigned to the /admin/luke path, 
> will give you the info you're after.  On a stock Solr install, the 
> following URL would work:
> 
> /solr/admin/luke?show=schema
> 
> I have tried this on solr 4.10.4 and can confirm that the response does 
> have the information.

Thank you, for the information and especially for taking the time to test.

> Since you are working with a different context path, you'll need to 
> adjust your URL to match.
> 
> Note that as of Solr 5.0, running with a different context path is not 
> supported.  The admin UI and the more advanced parts of the startup 
> scripts are hardcoded for the /solr context.

Yes.  5.0+ isn't packaged to be run in Tomcat, as we do now, so Big
Changes are coming when we upgrade.
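
In SolrJ terms, something like this should fetch it (untested here,
and it needs a newer SolrJ than the 4.10-era classes; the
"schema"/"uniqueKeyField" key names are from memory, so check them
against your own /admin/luke output):

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.request.LukeRequest;
  import org.apache.solr.client.solrj.response.LukeResponse;
  import org.apache.solr.common.util.NamedList;

  public class UniqueKeyLookup {
      public static void main(String[] args) throws Exception {
          try (SolrClient solr = new HttpSolrClient.Builder(
                  "https://toolshed.wood.net:8443/isw6_3/solr/statistics").build()) {
              LukeRequest luke = new LukeRequest();   // goes to /admin/luke on that core
              luke.setShowSchema(true);               // same as show=schema
              LukeResponse response = luke.process(solr);

              // The unique key is reported inside the "schema" section.
              NamedList<?> schema = (NamedList<?>) response.getResponse().get("schema");
              System.out.println("uniqueKey = " + schema.get("uniqueKeyField"));
          }
      }
  }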

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: 404 response from Schema API

2020-05-14 Thread Mark H. Wood
On Thu, May 14, 2020 at 03:13:07PM -0400, Mark H. Wood wrote:
> Anyway, I'll be reading up on how to upgrade to 5.  (Hopefully not
> farther, just yet -- changes between, I think, 5 and 6 mean I'd have
> to spend a week reloading 10 years worth of data.  For now I don't
> want to go any farther than I have to, to make this work.)

Nope, my memory was faulty:  those changes happened in 5.0.  (The
schemas I've been given, used since time immemorial, are chock full of
IntField and DateField.)  I'm stuck with reloading.  Might as well go
to 8.x.  Or give up on asking Solr for the schema's uniqueKey,
configure the client with the field name and cross fingers.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: 404 response from Schema API

2020-05-14 Thread Mark H. Wood
On Fri, Apr 17, 2020 at 10:11:40AM -0600, Shawn Heisey wrote:
> On 4/16/2020 10:07 AM, Mark H. Wood wrote:
> > I need to ask Solr 4.10 for the name of the unique key field of a
> > schema.  So far, no matter what I've done, Solr is returning a 404.
> > 
> > This works:
> > 
> >curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/select'
> > 
> > This gets a 404:
> > 
> >curl 
> > 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema/uniquekey'
> > 
> > So does this:
> > 
> >curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema'
> > 
> > We normally use the ClassicIndexSchemaFactory.  I tried switching to
> > ManagedIndexSchemaFactory but it made no difference.  Nothing is
> > logged for the failed requests.
> 
>  From what I can see, the schema API handler was introduced in version 
> 5.0.  The SchemaHandler class exists in the released javadoc for the 5.0 
> version, but not the 4.10 version.  You'll need a newer version of Solr.

*sigh*  That's what I see too, when I dig through the JARs.  For some
reason, many folks believe that the Schema API existed at least as
far back as 4.2:

  
https://stackoverflow.com/questions/7247221/does-solr-has-api-to-read-solr-schema-xml

Perhaps because the _Apache Solr Reference Guide 4.10_ says so, on
page 53.

This writer thinks it worked, read-only, on 4.10.3:

  
https://stackoverflow.com/questions/33784998/solr-rest-api-for-schema-updates-returns-method-not-allowed-405

But it doesn't work here, on 4.10.4:

  curl 'https://toolshed.wood.net:8443/isw6/solr/statistics/schema?wt=json'
  14-May-2020 15:07:03.805 INFO 
[https-jsse-nio-fec0:0:0:1:0:0:0:7-8443-exec-60] 
org.restlet.engine.log.LogFilter.afterHandle 2020-05-14  15:07:03
fec0:0:0:1:0:0:0:7  -   fec0:0:0:1:0:0:0:7  8443GET 
/isw6/solr/schema   wt=json 404 0   0   0   
https://toolshed.wood.net:8443  curl/7.69.1 -

Strangely, Solr dropped the core-name element of the path!

Any idea what happened?

Anyway, I'll be reading up on how to upgrade to 5.  (Hopefully not
farther, just yet -- changes between, I think, 5 and 6 mean I'd have
to spend a week reloading 10 years worth of data.  For now I don't
want to go any farther than I have to, to make this work.)

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Solr Ref Guide Redesign coming in 8.6

2020-04-29 Thread Mark H. Wood
At first glance, I have no big issues.  It looks clean and functional,
and I like that.  I think it will work well enough for me.

This design still has a minor annoyance that I have noted in the past:
in the table of contents pane it is easy to open a subtree, but the
only way to close it is to open another one.  Obviously not a big
deal.

I'll probably spend too much time researching how to widen the
razor-thin scrollbar in the TOC panel, since it seems to be
independent of the way I spent too much time fixing the browser's own
inadequate scrollbar width. :-) Also, the thumb's color is so close to
the surrounding color that it's really hard to see.  And for some
reason when I use the mouse wheel to scroll the TOC, when it gets to
the top or the bottom the content pane starts scrolling instead, which
is surprising and mildly inconvenient.  Final picky point:  the
scrolling is *very* insensitive -- takes a lot of wheel motion to move
the panel just a bit.

(I'm aware that a lot of the things I complain about in "modern" web
sites are the things that make them "modern".  So, I'm an old fossil. :-)

Firefox 68.7.0esr, Gentoo Linux.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: 404 response from Schema API

2020-04-16 Thread Mark H. Wood
On Thu, Apr 16, 2020 at 02:00:06PM -0400, Erick Erickson wrote:
> Assuming isw6_3 is your collection name, you have
> “solr” and “isw6_3” reversed in the URL.

No.  Solr's context is '/isw6_3/solr' and the core is 'statistics'.

> Should be something like:
> https://toolshed.wood.net:8443/solr/isw6_3/schema/uniquekey
> 
> If that’s not the case you need to mention your collection. But in
> either case your collection name comes after /solr/.

Thank you.  I think that's what I have now.

> > On Apr 16, 2020, at 12:07 PM, Mark H. Wood  wrote:
> > 
> > I need to ask Solr 4.10 for the name of the unique key field of a
> > schema.  So far, no matter what I've done, Solr is returning a 404.
> > 
> > This works:
> > 
> >  curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/select'
> > 
> > This gets a 404:
> > 
> >  curl 
> > 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema/uniquekey'
> > 
> > So does this:
> > 
> >  curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema'
> > 
> > We normally use the ClassicIndexSchemaFactory.  I tried switching to
> > ManagedIndexSchemaFactory but it made no difference.  Nothing is
> > logged for the failed requests.
> > 
> > Ideas?
> > 
> > -- 
> > Mark H. Wood
> > Lead Technology Analyst
> > 
> > University Library
> > Indiana University - Purdue University Indianapolis
> > 755 W. Michigan Street
> > Indianapolis, IN 46202
> > 317-274-0749
> > www.ulib.iupui.edu
> 
> 

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




404 response from Schema API

2020-04-16 Thread Mark H. Wood
I need to ask Solr 4.10 for the name of the unique key field of a
schema.  So far, no matter what I've done, Solr is returning a 404.

This works:

  curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/select'

This gets a 404:

  curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema/uniquekey'

So does this:

  curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema'

We normally use the ClassicIndexSchemaFactory.  I tried switching to
ManagedIndexSchemaFactory but it made no difference.  Nothing is
logged for the failed requests.

Ideas?

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Optimal size for queries?

2020-04-15 Thread Mark H. Wood
On Wed, Apr 15, 2020 at 10:09:59AM +0100, Colvin Cowie wrote:
> Hi, I can't answer the question as to what the optimal size of rows per
> request is. I would expect it to depend on the number of stored fields
> being marshaled, and their type, and your hardware.

It was a somewhat naive question, but I wasn't sure how to ask a
better one.  Having thought a bit more, I expect that the eventual
solution to my problem will include a number of different changes,
including larger pages, tuning several caches, providing a progress
indicator to the user, and (as you point out below) re-thinking how I
ask Solr for so many documents.

> But using start + rows is a *bad thing* for deep paging. You need to use
> cursorMark, which looks like it was added in 4.7 originally
> https://issues.apache.org/jira/browse/SOLR-5463
> There's a description on the newer reference guide
> https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors
> and in the 4.10 PDF on page 305
> https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf
> 
> http://yonik.com/solr/paging-and-deep-paging/

Thank you for the links.  I think these will be very helpful.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Optimal size for queries?

2020-04-10 Thread Mark H. Wood
I need to pull a *lot* of records out of a core, to be statistically
analyzed and the stat.s presented to the user, who is sitting at a
browser waiting.  So far I haven't seen a way to calculate the stat.s
I need in Solr itself.  It's difficult to know the size of the total
result, so I'm running the query repeatedly and windowing the results
with 'start' and 'rows'.  I just guessed that a window of 1000
documents would be reasonable.  We currently have about 48GB in the
core.

The product uses Solr 4.10.  Yes, I know that's very old.

What I got is that every three seconds or so I get another 1000
documents, totalling around 500KB per response.  For a user request
for a large range, this is taking way longer than the user's browser
is willing to wait.  The single CPU on my test box is at 99%
continuously, and Solr's memory use is around 90% of 8GB.  The test
hardware is a VMWare guest on an 'Intel(R) Xeon(R) Gold 6150 CPU @
2.70GHz'.

A sample query:

0:0:0:0:0:0:0:1 - - [10/Apr/2020:13:34:18 -0400] "GET 
/solr/statistics/select?q=*%3A*&rows=1000&fq=%2Btype%3A0+%2BbundleName%3AORIGINAL+%2Bstatistics_type%3Aview&fq=%2BisBot%3Afalse&fq=%2Btime%3A%5B2018-01-01T05%3A00%3A00Z+TO+2020-01-01T04%3A59%3A59Z%5D&sort=time+asc&start=867000&wt=javabin&version=2
 HTTP/1.1" 200 497475 "-" 
"Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0"

As you can see, my test was getting close to 1000 windows.  It's still
going.  I don't know how far along that is.

So I'm wondering:

o  how can I do better than guessing that 1000 is a good window size?
   How big a response is too big?

o  what else should I be thinking about?

o  given that my test on a full-sized copy of the live data has been
   running for an hour and is still going, is it totally impractical
   to expect that I can improve the process enough to give a response
   to an ad-hoc query while-you-wait?

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: How do *you* restrict access to Solr?

2020-03-19 Thread Mark H. Wood
On Mon, Mar 16, 2020 at 11:43:10AM -0400, Ryan W wrote:
> On Mon, Mar 16, 2020 at 11:40 AM Walter Underwood 
> wrote:
> 
> > Also, even if you prevent access to the admin UI, a request to /update can
> > delete
> > all the content. It is really easy. This Gist shows how.
> >
> > https://gist.github.com/nz/673027/313f70681daa985ea13ba33a385753aef951a0f3
> 
> 
> 
> This seems important.  In other words, my work isn't necessarily done if
> I've secured the graphical UI.  I can't just visit the admin UI page to see
> if my efforts are successful.

It is VERY IMPORTANT.  You are correct.  The Admin. GUI is just a
convenience layer over extensive REST APIs.  You need to secure access
to the APIs, not just the admin. application that runs on top of them.

If all use is from the local host, then running Solr only on the
loopback address will keep outsiders from connecting to any part of
it.

If other internal hosts need access, then I would run Solr only on an
RFC1918 (non-routed) address, and set up the Solr host's firewall to
grant access to Solr's port (8983 by default) only from permitted hosts.

  https://tools.ietf.org/html/rfc1918

Who/what needs access to Solr?  Do you need to grant different levels
of access to specific groups of users?  Then you need something like
Role-Based Access Control.  This is true even if access is only
internal or even just from the same host.  Address-based controls only
divide the universe between those who can do nothing to your Solr and
those who can do *everything* to your Solr.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Dependency log4j-slf4j-impl for solr-core:7.5.0 causing a number of build problems

2020-01-17 Thread Mark H. Wood
On Thu, Jan 16, 2020 at 03:13:17PM +, Wolf, Chris (ELS-CON) wrote:
> --- original message ---
> It looks to me as though solr-core is not the only artifact with that
> dependency.  The first thing I would do is examine the output of 'mvn
> dependency:tree' to see what has dragged log4j-slf4j-impl in even when
> it is excluded from solr-core. 
> --- end of original message ---
> 
> Hi, that's the first thing I did and *only* solr-core is pulling in 
> log4j-slf4j-impl, but there is more weirdness to this.  When I build as a WAR 
> project, then version 2.11.0 of in log4j-slf4j-impl is pulled in which 
> results in "multiple implementations" warning and is non-fatal.  
> 
> However, when building as a spring-boot executable jar, for some reason, it 
> pulls in version 2.7 rather than 2.11.0 resulting in fatal 
> "ClassNotFoundException: org.apache.logging.log4j.util.ReflectionUtil"

For the version problem, I would try adding something like:

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-slf4j-impl</artifactId>
        <version>2.11.0</version>
      </dependency>
    </dependencies>
  </dependencyManagement>

to pin down the version no matter what is pulling it in.  Not ideal,
since you want to be rid of this dependency altogether, but at least
it may allow the spring-boot artifact to run, until the other problem
is sorted.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Dependency log4j-slf4j-impl for solr-core:7.5.0 causing a number of build problems

2020-01-16 Thread Mark H. Wood
On Thu, Jan 16, 2020 at 02:03:06AM +, Wolf, Chris (ELS-CON) wrote:
[snip]
> There are several issues:
> 
>   1.  I don’t want log4j-slf4j-impl at all
>   2.  Somehow the version of “log4j-slf4j-impl” being used for the build is 
> 2.7 rather than the expected 2.11.0
>   3.  Due to the version issue, the app croaks with ClassNotFoundException: 
> org.apache.logging.log4j.util.ReflectionUtil
> 
> For issue #1, I tried:
>   <dependency>
>     <groupId>org.apache.solr</groupId>
>     <artifactId>solr-core</artifactId>
>     <version>7.5.0</version>
>     <exclusions>
>       <exclusion>
>         <groupId>org.apache.logging.log4j</groupId>
>         <artifactId>log4j-slf4j-impl</artifactId>
>       </exclusion>
>     </exclusions>
>   </dependency>
> 
> 
> All to no avail, as that dependency ends up in the packaged build - for WAR, 
> it’s version 2.11.0, so even though it’s a bad build, the app runs, but for 
> building a spring-boot executable JAR with embedded webserver, for some 
> reason, it switches log4j-slf4j-impl from version 2.11.0  to 2,7 (2.11.0  
> works, but should not even be there)
> 
> I also tried this:
> https://docs.spring.io/spring-boot/docs/current/maven-plugin/examples/exclude-dependency.html
> 
> …that didn’t work either.
> 
> I’m thinking that solr-core should have added a classifier of “provided” for 
> “log4j-slf4j-impl”, but that’s conjecture of a possible solution going 
> forward, but does anyone know how I can exclude  “log4j-slf4j-impl”  from a 
> spring-boot build?

It looks to me as though solr-core is not the only artifact with that
dependency.  The first thing I would do is examine the output of 'mvn
dependency:tree' to see what has dragged log4j-slf4j-impl in even when
it is excluded from solr-core.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Get Solr to notice new core without restarting?

2019-12-13 Thread Mark H. Wood
I have a product which comes with several empty Solr cores already
configured and laid out, ready to be copied into place where Solr can
find them.  Is there a way to get Solr to notice new cores without
restarting it?  Is it likely there ever will be?  I'm one of the
people who test and maintain the product, so I'm always creating and
destroying instances.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Active directory integration in Solr

2019-11-20 Thread Mark H. Wood
On Mon, Nov 18, 2019 at 03:08:51PM +, Kommu, Vinodh K. wrote:
> Does anyone know that Solr has any out of the box capability to integrate 
> Active directory (using LDAP) when security is enabled? Instead of creating 
> users in security.json file, planning to use users who already exists in 
> active directory so they can use their individual credentials rather than 
> defining in Solr. Did anyone came across similar requirement? If so was there 
> any working solution?

Searching for "solr authentication ldap" turned up this:

https://risdenk.github.io/2018/11/20/apache-solr-hadoop-authentication-plugin-ldap.html

ADS also uses Kerberos, and Solr has a Kerberos authN plugin.  Would
that help?

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Anyway to encrypt admin user plain text password in Solr

2019-11-14 Thread Mark H. Wood
On Thu, Nov 14, 2019 at 11:35:47AM +, Kommu, Vinodh K. wrote:
> We store the plain text password in basicAuth.conf file. This is a normal 
> file & we are securing it only with 600 file permissions so that others 
> cannot read it. We also run various solr APIs in our custom script for 
> various purposes using curl commands which needs admin user credentials to 
> perform operations. If admin credentials details from basicAuth.conf file or 
> from curl commands are exposed/compromised, eventually any person within the 
> organization who knows credentials can login to admin UI and perform any 
> read/write operations. This is a concern and auditing issue as well.

If the password is encrypted, then the decryption key must be supplied
before the password can be used.  This leads to one of two unfortunate
situations:

o  The user must enter the decryption key every time.  This defeats
   the purpose of storing credentials at the client.

   - or -

o  The decryption key is stored at the client, making it a new secret
   that must be protected (by encrypting it? you see where this is
   going)

There is no way around this.  If the client system stores a full set
of credentials, then anyone with sufficient access to the client
system can get everything he needs to authenticate an identity, no
matter what you do.  If the client system does not store a full set of
credentials, then the user must supply at least some of them whenever
they are needed.  The best one can usually do is to reduce the
frequency at which some credential must be entered manually.

Solr supplies several authentication mechanisms besides BasicAuth.
Would one of those serve?

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: Enumerating cores via SolrJ

2019-08-13 Thread Mark H. Wood
On Fri, Aug 09, 2019 at 03:45:21PM -0600, Shawn Heisey wrote:
> On 8/9/2019 3:07 PM, Mark H. Wood wrote:
> > Did I miss something, or is there no way, using SolrJ, to enumerate
> > loaded cores, as:
> > 
> >curl 'http://solr.example.com:8983/solr/admin/cores?action=STATUS'
> > 
> > does?
> 
> This code will do so.  I tested it.
[snip]

Thank you.  That was just the example I needed.
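
For the archives, the shape of it is roughly this (untested as
written here; the base URL is an example):

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.request.CoreAdminRequest;
  import org.apache.solr.client.solrj.response.CoreAdminResponse;
  import org.apache.solr.common.params.CoreAdminParams.CoreAdminAction;

  public class ListCores {
      public static void main(String[] args) throws Exception {
          try (SolrClient solr = new HttpSolrClient.Builder(
                  "http://solr.example.com:8983/solr").build()) {
              CoreAdminRequest status = new CoreAdminRequest();
              status.setAction(CoreAdminAction.STATUS);   // same as action=STATUS
              CoreAdminResponse response = status.process(solr);

              // getCoreStatus() is a NamedList keyed by core name.
              for (int i = 0; i < response.getCoreStatus().size(); i++) {
                  System.out.println(response.getCoreStatus().getName(i));
              }
          }
      }
  }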

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Enumerating cores via SolrJ

2019-08-09 Thread Mark H. Wood
Did I miss something, or is there no way, using SolrJ, to enumerate
loaded cores, as:

  curl 'http://solr.example.com:8983/solr/admin/cores?action=STATUS'

does?

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: SOLR Suggester returns either the full field value or single terms only

2019-06-20 Thread Mark H. Wood
On Wed, Jun 19, 2019 at 12:20:43PM -0700, ppunet wrote:
> As the SuggesterComponent provides the 'entire content' of the field in the
> suggestions. How is it possible to have Suggester to return only part of the
> content of the field, instead of the entire content, which in my scenario
> quite long?

Possibly worthless newbie suggestion:  could you use highlighting to
locate the text that triggered the suggestion, and just chop off
leading and trailing context down to a reasonable length surrounding
the match?  Kind of like you'd see in a printed KWIC index:  give as
much context as will fit the available space, and don't worry about
the rest.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Proper type(s) for adding a DatePointField value [was: problems with indexing documents]

2019-04-04 Thread Mark H. Wood
One difficulty is that the documentation of
SolrInputDocument.addField(String, Object) is not at all specific.
I'm aware of SOLR-2298 and I accept that the patch is an improvement,
but still...

  @param value Value of the field, should be of same class type as
  defined by "type" attribute of the corresponding field in
  schema.xml.

The corresponding <field>'s 'type' attribute is an arbitrary label
referencing the 'name' attribute of a <fieldType>.  It could be
"boysenberry" or "axolotl".  So we need to look at the 'class'
attribute of the fieldType?  So, if I have in my schema:

  <fieldType name="..." class="solr.DatePointField" ... />
  <field name="created" type="..." ... />

then I need to pass an instance of DatePointField?

  myDoc.addField("created", new DatePointField(bla bla));

That doesn't seem right, but go ahead and surprise me.

But I *know* that it accepts a properly formatted String value for a
field using DatePointField.  So, how can I determine the set of Java
types that is accepted as a new field value for a field whose field
type's class attribute is X?  And where should I have read that?

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: problems with indexing documents

2019-04-02 Thread Mark H. Wood
I'm also working on this with Bill.

On Tue, Apr 02, 2019 at 09:44:16AM +0800, Zheng Lin Edwin Yeo wrote:
> Previously, did you index the date in the same format as you are using now,
> or in the Solr format of "-MM-DDTHH:MM:SSZ"?

As may be seen from the sample code:

> > doc.addField ( "date", new java.util.Date() );

we were not using a string format at all, but passing a java.util.Date
object.  In the past this was interpreted successfully and correctly.
After upgrading, we get an error:

> > Invalid Date String: 'Sun Jul 31 19:00:00 CDT 2016'

which suggests to me that something in or below
SolrInputDocument.addField(String, Object) is applying Date.toString()
to the Object, which yields a string format that Solr does not
understand.

I am dealing with this by trying to hunt down all the places where
Date was passed to addField, and explicitly convert it to a String in
Solr format.  But we would like to know if there is a better way, or
at least what I did wrong.
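
A minimal sketch of the conversion I mean (using the "date" field
from the code above; Instant.toString() yields the ISO-8601 UTC form,
e.g. "2016-08-01T00:00:00Z", which the date field types accept):

  import java.util.Date;
  import org.apache.solr.common.SolrInputDocument;

  public class DateFieldExample {
      public static void main(String[] args) {
          SolrInputDocument doc = new SolrInputDocument();
          // Convert the Date to UTC ISO-8601 text instead of letting it
          // fall through to Date.toString().
          doc.addField("date", new Date().toInstant().toString());
          System.out.println(doc);
      }
  }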

The SolrJ documentation says nothing about how the field value Object
is handled.  It does say that it should match the schema, but I can
find no table showing what Java object types "match" the stock schema
fieldtype classes such as DatePointField.  I would naively suppose that
j.u.Date is a particularly *good* match for DatePointField.  What have
I missed?

> On Tue, 2 Apr 2019 at 00:32, Bill Tantzen  wrote:
> 
> > In a legacy application using Solr 4.1 and solrj, I have always been
> > able to add documents with TrieDateField types using java.util.Date
> > objects, for instance,
> >
> > doc.addField ( "date", new java.util.Date() );
> >
> > having recently upgraded to Solr 7.7, and updating my schema to
> > leverage DatePointField as my type, that code no longer works,  it
> > throws an exception with an error like:
> >
> > Invalid Date String: 'Sun Jul 31 19:00:00 CDT 2016'
> >
> > I understand that this String is not what solr expects, but in lieu of
> > formatting the correct String, is there no longer a way to pass in a
> > simple Date object?  Was there some kind of implicit conversion taking
> > place earlier that is no longer happening?
> >
> > In fact, in the some of the example code that come with the solr
> > distribution, (SolrExampleTests.java), document timestamp fields are
> > added using the same AddField call I am attempting to use, so I am
> > very confused.
> >
> > Thanks for any advice!
> >
> > Regards,
> > Bill
> >

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu

