Re: wordpress anyone?

2021-03-04 Thread dmitri maziuk

On 2021-03-03 10:24 PM, Gora Mohanty wrote:

... there does seem to be another plugin that is

open-source,and hosted on Github: https://wordpress.org/plugins/solr-power/


I saw it, they lost me at

"you'll need access to a functioning Solr 3.6 instance for the plugin to 
work as expected. This plugin does not support other versions of Solr."


Dima



wordpress anyone?

2021-03-03 Thread dmitri maziuk

Hi all,

does anyone use Solr with WP? It seems there is one for-pay-only 
offering and a few defunct projects from a decade ago... a great web 
search engine is only useful if it can actually be used from a client.


So has anyone heard about any active WP integration projects other than 
wpsolr.com?


Dima


Re: R: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

2021-02-23 Thread dmitri maziuk

On 2021-02-23 1:53 AM, Danilo Tomasoni wrote:

Thank you all for the suggestions,
The OS is not windows, it's centos, a colleague thinks that even on linux 
defragmenting can improve performance about 2X because it keeps the data 
contiguous on disk.


You may want to check which filesystem you're using and read up on XFS 
vs EXT4.


FWIW we've had reasonable success with ZFS on Linux (the binary drivers 
for CentOS 6, and somewhat less so 7, are on GitHub), with effectively 
RAID-10'ed HDDs and a regular SSD for read & write caching.


Either way, check with `df` first: if you're more than ~75% full, you 
need a bigger disk no matter what else you do.
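Something like this is enough for the quick check (the path is an example; point it at whatever filesystem holds your Solr data directory):

```shell
# Print how full the filesystem holding the index is and warn past ~75%
# (the path is an example; substitute your Solr data directory).
usage=$(df --output=pcent /var/solr | tail -n 1 | tr -dc '0-9')
if [ "$usage" -gt 75 ]; then
    echo "filesystem ${usage}% full - get a bigger disk first"
else
    echo "filesystem ${usage}% full - ok"
fi
```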


Dima


Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

2021-02-22 Thread dmitri maziuk

On 2021-02-22 11:18 AM, Shawn Heisey wrote:

The OS automatically uses unallocated memory to cache data on the disk. 
  Because memory is far faster than any disk, even SSD, it performs better.


Depends on the OS. From "defragmenting solrdata folder" I suspect the OP 
is on Windows, whose filesystems and memory management do not always 
work the way the Unix textbook says.


Dima


Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

2021-02-22 Thread dmitri maziuk

On 2021-02-22 1:52 AM, Danilo Tomasoni wrote:

Hello all,
we are running a solr instance with around 41 MLN documents on a SATA class 10 
disk with around 10.000 rpm.
We are experiencing very slow query responses (in the order of hours..) with an 
average of 205 segments.
We made a test with a normal pc and an SSD disk, and there the same solr 
instance with the same data and the same number of segments was around 45 times 
faster.


What is your actual hardware and OS, as opposed to "normal pc"?

Dima


Re: DIH

2021-01-21 Thread dmitri maziuk

On 2021-01-20 6:26 PM, Joshua Wilder wrote:

Please reconsider the removal of the DIH from future versions. The repo
it's been moved to is a ghost town with zero engagement from Rohit (or
anyone). Not sure how 'moving' it caused it to now only support MariaDB but
that appears to be the case. The current implementation is fast, easy to
work with and just works. Please, please and thank you!



*NOTE* that "only MariaDB" is a misnomer: you need a JDBC JAR for the 
driver, and MariaDB's is the only one the DIH can redistribute. For 
other databases you just need to provide your own.
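For example, a sketch of a data-config using a driver JAR you supply yourself (the PostgreSQL driver, connection URL, credentials, and field names below are all placeholders):

```xml
<!-- Sketch: DIH data-config with a hand-supplied JDBC driver.
     Drop e.g. postgresql-42.x.jar into the core's lib/ directory first;
     all connection details and fields here are placeholders. -->
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/mydb"
              user="solr" password="secret"/>
  <document>
    <entity name="doc" query="SELECT id, title FROM documents">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>
```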


Dima



Re: Apache Solr in High Availability Primary and Secondary node.

2021-01-11 Thread Dmitri Maziuk

On 1/11/2021 12:30 PM, Walter Underwood wrote:

Use a load balancer. We’re in AWS, so we use an AWS ALB.

If you don’t have a failure-tolerant load balancer implementation, the site has 
bigger problems than search.


That is the point: you have Amazon doing that for you, some of us do it 
ourselves, and it wasn't clear (to me anyway) whether the OP was asking 
about that.


Dima


Re: Apache Solr in High Availability Primary and Secondary node.

2021-01-11 Thread Dmitri Maziuk

On 1/11/2021 11:25 AM, Walter Underwood wrote:

There are all sorts of problems with the primary/secondary approach. How do you 
know
the secondary is working? How do you deal with cold caches on the secondary 
when it
suddenly gets lots of load?

Instead, size the cluster with the number of hosts you need, then add one. Send 
traffic
to all of them. If any of them goes down, you have the capacity to handle the 
traffic.
This is called “N+1 provisioning”.


Where do you send your Solr queries? If you have an HTTP server at an IP 
address that answers them, that's a single point of failure unless you 
put it on a heartbeat-managed cluster IP. (I tend to prefer ucarp to 
Pacemaker for that, as the latter is bloated and too cumbersome for 
simple active/passive setups, but that's OT.)


Dima



Re: how to check num found

2021-01-04 Thread Dmitri Maziuk

On 1/4/2021 11:25 AM, Chris Hostetter wrote:


Can't you just configure nagios to do a "negative match" against
numFound=0 ? ... ie: "if response matches 'numFound=0' fail the check."

(IIRC there's an '--invert-regex' option for this)


Nothing's ever simple: apparently the standard plugin does not want to 
talk SSL to our IIS, so they are not using it; they're using curl_http 
instead, which of course does not have that option.


;)
Dima


how to check num found

2020-12-28 Thread Dmitri Maziuk

Hi all,

we're doing periodic database reloads from external sources and I'm 
trying to figure out how to monitor for errors. E.g. I'd run a query 
'?q=FOO:BAR&rows=0' and check if "numFound" > 0; that'd tell me if the 
reload succeeded.


The check is done using nagios curl plugin, and while it can match a 
string in the response, the "> 0" check would require writing an extra 
parser -- it's a simple enough two-liner, but I'd rather not add more 
moving pieces if I can help it.


The best I can figure so far is
```
fl=result:if(gt(docfreq(FOO,BAR),0),"YES","NO")&rows=1
```
-- returns '"result":"NO"' that our nagios plugin can look for.
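(For the record, if a tiny parser ever becomes acceptable after all, the "> 0" check is only a few lines of Python; the JSON shape below is the standard wt=json select response:)

```python
import json


def num_found(response_body: str) -> int:
    """Pull numFound out of a Solr JSON select response body."""
    return json.loads(response_body)["response"]["numFound"]


def reload_succeeded(response_body: str) -> bool:
    """True if the query matched at least one document."""
    return num_found(response_body) > 0
```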

Is there a better/simpler way?

TIA
Dima


Re: CPU and memory circuit breaker documentation issues

2020-12-19 Thread Dmitri Maziuk

On 12/18/2020 11:26 AM, Walter Underwood wrote:
...

I’ll file a bug and submit a patch to use a larger batch of entries with 
data in the same format. How do I fix the documentation?


CPU load may be good if your process is CPU-bound. If you're stuck on 
iowait in your $data filesystem and not "actively running", it's not 
clear that OperatingSystemMXBean.getSystemCPULoad() will account for 
that. IIRC load average will, so you may in fact be better off using 
that divided by number of cores (which you can also get from the bean).
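A sketch of that metric (using the stdlib rather than the MXBean, just to show the arithmetic):

```python
import os


def normalized_load() -> float:
    """1-minute load average divided by core count. Unlike a pure CPU
    figure, load average on Linux also counts tasks blocked in
    uninterruptible I/O wait, so iowait-bound processes show up in it."""
    load1, _, _ = os.getloadavg()
    return load1 / os.cpu_count()
```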


Dima


Re: DIH and UUIDProcessorFactory

2020-12-17 Thread Dmitri Maziuk

On 12/17/2020 4:05 PM, Alexandre Rafalovitch wrote:

Try with the explicit URP chain too. It may work as well.


Actually in this case we're just making sure uniqueKey is in fact unique 
in all documents, so the default is what we want.

For this particular dataset I may at some future point look into 
generating ID as a hash of some unique tuple or other, but then I expect 
we'll still want to keep the UUID fallback.
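That hash-of-a-tuple idea would be something like this (a hypothetical sketch; the namespace and separator are arbitrary choices, and the field values are made up):

```python
import uuid

# Hypothetical: derive a stable document ID from a tuple of fields that
# is unique per record, so re-imports of the same row produce the same
# key. uuid5 is deterministic for a given namespace + name.
_NS = uuid.NAMESPACE_URL


def doc_id(*fields: str) -> str:
    """Deterministic UUID string built from the given field values."""
    return str(uuid.uuid5(_NS, "|".join(fields)))
```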


Dima


Re: DIH and UUIDProcessorFactory

2020-12-17 Thread Dmitri Maziuk

On 12/12/2020 4:36 PM, Shawn Heisey wrote:

On 12/12/2020 2:30 PM, Dmitri Maziuk wrote:
Right, ```Every update request received by Solr is run through a chain 
of plugins known as Update Request Processors, or URPs.```


The part I'm missing is whether DIH's '/dataimport' handler counts as an 
"Update Request"; my reading is it doesn't, and the URP chain applies 
only to '/update'.

If you define an update chain as default, then it will be used for all 
updates made where a different chain is not specifically requested.


I have used this personally to have my custom update chain apply even 
when the indexing comes from DIH.  I know for sure that this works on 
4.x and 5.x versions; it should work on newer versions as well.




Confirmed w/ 8.7.0: I finally got to importing the one DB where I need 
this, and UUIDs are there with the default URP chain.
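For the archives, the chain ended up looking roughly like this in solrconfig.xml (a sketch; the uniqueKey field name 'id' is an example):

```xml
<!-- Sketch: a default URP chain that fills in a UUID for the uniqueKey
     field when a document arrives without one. Because default="true",
     it applies to DIH imports as well as /update requests. -->
<updateRequestProcessorChain name="uuid" default="true">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```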


Thank you
Dima




Re: DIH and UUIDProcessorFactory

2020-12-12 Thread Dmitri Maziuk

On 12/12/2020 2:50 PM, Shawn Heisey wrote:

The only way I know of to use an update processor chain with DIH is to 
set 'default="true"' when defining the chain.


I did manage to find an example with the default attribute, in javadocs:

https://lucene.apache.org/solr/5_0_0/solr-core/org/apache/solr/update/processor/UpdateRequestProcessorChain.html 


Right, ```Every update request received by Solr is run through a chain 
of plugins known as Update Request Processors, or URPs.```


The part I'm missing is whether DIH's '/dataimport' handler counts as an 
"Update Request"; my reading is it doesn't, and the URP chain applies 
only to '/update'.

Dmitri


DIH and UUIDProcessorFactory

2020-12-12 Thread Dmitri Maziuk

Hi everyone,

is there an easy way to use the stock UUID generator with DIH? We have a 
hand-written one-liner class we use as DIH entity transformer but I 
wonder if there's a way to use the built-in UUID generator class instead.


From the TFM it looks like there isn't, is that correct?

TIA,
Dmitri


Re: data import handler deprecated?

2020-11-30 Thread Dmitri Maziuk

On 11/30/2020 7:50 AM, David Smiley wrote:

Yes, absolutely to what Eric said.  We goofed on news / release highlights
on how to communicate what's happening in Solr.  From a Solr insider point
of view, we are "deprecating" because strictly speaking, the code isn't in
our codebase any longer.  From a user point of view (the audience of news /
release notes), the functionality has *moved*.


Just FYI, there is the dih 8.7.0 jar in 
repo1.maven.org/maven2/org/apache/solr -- whereas the github build is on 
8.6.0.


Dima



Re: data import handler deprecated?

2020-11-29 Thread Dmitri Maziuk

On 11/29/2020 10:32 AM, Erick Erickson wrote:


And I absolutely agree with Walter that the DB is often where
the bottleneck lies. You might be able to
use multiple threads and/or processes to query the
DB if that’s the case and you can find some kind of partition
key.


IME the difficult part has always been dealing with incremental updates. 
If we were to roll our own, my vote would be for a database trigger that 
does a POST in whichever language the DBMS likes.


But this has not been a part of our "solr 6.5 update" project until now.

Thanks everyone,
Dima


Re: data import handler deprecated?

2020-11-28 Thread Dmitri Maziuk

On 11/28/2020 5:48 PM, matthew sporleder wrote:


...  The bottom of
that github page isn't hopeful however :)


Yeah, "works with MariaDB" is a particularly bad way of saying "BYO JDBC 
JAR" :)


It's a more general question though: what is the path forward for users 
with data in two places? Hope that a community-maintained plugin will 
still be there tomorrow? Dump our tables to CSV (and POST them) and 
roll our own delta-update logic? Or are we to choose one datastore and 
drop the other?


Dima


data import handler deprecated?

2020-11-28 Thread Dmitri Maziuk

Hi all,

trying to set up solr-8.7.0, contrib/dataimporthandler/README.txt says 
this module is deprecated as of 8.6 and scheduled for removal in 9.0.


How do we pull data out of our relational database in 8.7+?

TIA
Dima