JVM Heap Memory Increase (SOLR CLOUD)

2018-04-20 Thread Doss
We have a Solr (7.0.1) cloud of 3 Linux VMs, each with 4 CPUs and 90 GB RAM, with a
ZooKeeper (3.4.11) ensemble running on the same machines. We have 130 cores with an
overall size of 45 GB. No sharding; almost all VMs have the same copy of the data.
These nodes are behind a load balancer.

http://lucene.472066.n3.nabble.com/SOLR-Cloud-1500-threads-are-in-TIMED-WAITING-status-td4383636.html
- Refer to this thread for the merge and commit configs

JVM Heap Size: 15GB

Optimize: once daily

GC Config: default

-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:+UseGCLogFileRotation
-XX:+UseParNewGC
-XX:-OmitStackTraceInFastThrow
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:ConcGCThreads=4
-XX:GCLogFileSize=20M
-XX:MaxTenuringThreshold=8
-XX:NewRatio=3
-XX:NumberOfGCLogFiles=9
-XX:OnOutOfMemoryError=/home/solr/bin/oom_solr.sh 8980 /data/solr/server/logs
-XX:ParallelGCThreads=4

ISSUE: After running for 8 to 9 hours, JVM heap usage keeps increasing. If we run
an optimize at that point, I see a 3 to 3.5 GB reduction, but running an optimize
during the day would be a problem. On the other hand, if the heap fills up, an OOM
exception occurs and the cloud crashes.

I read somewhere that G1GC gives better results, but Solr experts don't
encourage using it.
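For what it's worth, a hedged sketch of a G1 starting point that often gets
passed around for Solr on Java 8 (illustrative values only, to be validated
against your own GC logs, not advice from this thread):

-XX:+UseG1GC
-XX:+ParallelRefProcEnabled
-XX:G1HeapRegionSize=8m
-XX:MaxGCPauseMillis=250
-XX:InitiatingHeapOccupancyPercent=75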

What else can we do to resolve this issue?

Thanks,
Doss.


Re: Run solr server using Java program

2018-04-20 Thread Shawn Heisey

On 4/20/2018 6:01 AM, rameshkjes wrote:

Using SolrJ, I am able to access the Solr core, but I still need to go to the
command prompt to execute the command for the Solr instance. Is there a way to
do that?


I saw that you entered the IRC channel previously and asked the same 
question, but I got no response from you when I tried to get more 
information.  And then you left the channel shortly afterwards.


Checking IRC this morning, it looks like you logged into the channel 
again hours before I woke up for the day.  And a little later, I tried 
to speak to you there, but your connection timed out before I was done 
typing my message.  This is what I said:


10:16 <@elyograg> Ramesh_: there is EmbeddedSolrServer.  But this is not
  recommended for production.  It cannot be made fault
  tolerant, and it is only accessible from the program that
  started it.

Solr is best used as a service.  There is a service installer script 
included with Solr that works on typical non-Windows operating systems.  
You mentioned solr.cmd on IRC, so I'm guessing you're on Windows.  You're 
on your own for getting a service running there.  I recommend looking at 
NSSM.


EmbeddedSolrServer can work well for tests.
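
A minimal SolrJ sketch of embedded use for tests (the solr home path and core
name are hypothetical):

import java.nio.file.Paths;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class EmbeddedExample {
    public static void main(String[] args) throws Exception {
        // The solr home directory must contain solr.xml and the "mycore" core
        EmbeddedSolrServer solr =
            new EmbeddedSolrServer(Paths.get("/path/to/solr/home"), "mycore");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        solr.add(doc);
        solr.commit();
        // Query the embedded core and print the hit count
        System.out.println(
            solr.query(new SolrQuery("*:*")).getResults().getNumFound());
        solr.close();
    }
}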

Thanks,
Shawn



Re: SolrCloud design question

2018-04-20 Thread Shawn Heisey

On 4/20/2018 5:38 AM, Bernd Fehling wrote:

Thanks Alessandro for the info.

I am currently in the phase of finding the right setup with shards,
nodes, replicas and so on.
I have decided to begin with 5 hosts and want to set up 1 collection with 5
shards, starting with 2 replicas per shard.

But the next design question is: should each replica get its own instance?

What will give better performance, all replicas in one Java instance or
one instance for each replica?


Erick's reply is pretty complete.  A shorter version: unless the heap 
required is huge, only run one Solr instance per server.  A huge heap 
requirement is the only good reason I can think of for more than one 
Solr instance on a server.  One Solr instance can handle many cores 
(shard replicas).


Exactly what heap size should be considered "huge" is a subject for 
debate.  I would say that if you have to go above 31GB (the point at which 
the JVM can no longer use compressed object pointers), that's a 
definite candidate for splitting into two instances, but there might be 
reasons for drawing the line at a lower value.


A tangent: ideally, the servers will be bare metal, not virtual 
machines.  There's nothing technically wrong with VMs; most of my reasons 
for preferring bare metal servers are non-technical.  Executive and 
sales/marketing people tend to view a machine running VMs as a 
bottomless resource and expect it to do far more than it is capable of 
doing.  If the physical host is not oversubscribed, then VMs can be a 
great option.


If you do use VMs, then you should ensure that the VMs hosting different 
replicas are running on completely separate physical hardware.  The idea 
is to ensure that if one hardware chassis fails, all of your shards 
still have at least one replica running.


Thanks,
Shawn



Re: Running an analyzer chain in an update request processor

2018-04-20 Thread Walter Underwood
I’m back.

I think I’m following the steps in Erik Hatcher’s slides: 
https://www.slideshare.net/erikhatcher/solr-indexing-and-analysis-tricks

With a few minor changes, like using getIndexAnalyzer() because getAnalyzer() 
is gone, and I’ve pulled the subroutine code into the main processAdd function.

Any ideas about the cause of this error?

java.lang.ClassCastException: Cannot cast 
jdk.internal.dynalink.beans.StaticClass to java.lang.Class
at 
java.lang.invoke.MethodHandleImpl.newClassCastException(MethodHandleImpl.java:361)
at 
java.lang.invoke.MethodHandleImpl.castReference(MethodHandleImpl.java:356)
at 
jdk.nashorn.internal.scripts.Script$Recompilation$37$104A$\^eval\_.processAdd(:15)

This is the code up through line 15:

// Generate minhashes using the "minhash" analyzer chain
var analyzer = req.getCore().getLatestSchema().getFieldTypeByName('minhash').getIndexAnalyzer();
var hashes = [];
var token_stream = analyzer.tokenStream(null, new java.io.StringReader(question));
var term_att = token_stream.getAttribute(Packages.org.apache.lucene.analysis.tokenattributes.CharTermAttribute);
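
A likely cause of the ClassCastException above is the last line: in Nashorn, a
Packages.* expression evaluates to a StaticClass, while getAttribute() expects
a java.lang.Class. A hedged sketch of a workaround (untested here):

// Java.type(...) gives a type object whose .class property is the java.lang.Class
var CharTermAttribute = Java.type("org.apache.lucene.analysis.tokenattributes.CharTermAttribute");
var term_att = token_stream.getAttribute(CharTermAttribute.class);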

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 7, 2018, at 9:50 AM, Walter Underwood  wrote:
> 
> As I think more about this, we should have a signature processor that uses 
> minhash. The MD5 signature processor was really easy to use.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org 
> http://observer.wunderwood.org/  (my blog)
> 
>> On Apr 7, 2018, at 4:55 AM, Emir Arnautović wrote:
>> 
>> Hi Walter,
>> I did this sample processor for the purpose of having doc values on analysed 
>> field: https://github.com/od-bits/solr-multivaluefield-processor
>> 
>> (+ related blog: 
>> http://www.od-bits.com/2018/02/solr-docvalues-on-analysed-field.html )
>> 
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/ 
>> 
>> 
>> 
>> 
>>> On 6 Apr 2018, at 23:46, Walter Underwood wrote:
>>> 
>>> Is there an easy way to define an analyzer chain in schema.xml then run it 
>>> in an update request processor?
>>> 
>>> I want to run a chain ending in the minhash token filter, then take those 
>>> minhashes, convert them to hex, and put them in a string field. I’d like 
>>> the values stored.
>>> 
>>> It seems like this could all work in an update request processor. Grab the 
>>> text from one field, run it through the chain, format the output tokens and 
>>> add them to the field for hashes.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org 
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>> 
> 



Re: Run solr server using Java program

2018-04-20 Thread Alessandro Benedetti
To do what?
If you mean to start a Solr server instance, you have the solr.sh script (or
the Windows starter).
You can set up your automation stack to be able to start Solr with one click.
SolrJ is a client, which means you need Solr up and running first.
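
For example, from a standard Solr install (the port is illustrative):

bin/solr start -p 8983        (Linux/macOS)
bin\solr.cmd start -p 8983    (Windows)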

Cheers

On Fri, 20 Apr 2018, 16:51 rameshkjes,  wrote:

> Using SolrJ, I am able to access the Solr core, but I still need to go to the
> command prompt to execute the command for the Solr instance. Is there a way to
> do that?
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: custom response writer which extends RawResponseWriter fails when shards > 1

2018-04-20 Thread Chris Hostetter

Invariant really means "invariant" ... nothing can change them.

In the case of "wt" this may seem weird and unhelpful, but the code that 
handles defaults/appends/invariants is ignorant of what the params are.

Since you're writing custom code anyway, my suggestion would be that 
perhaps you could make your custom ResponseWriter delegate to the javabin 
ResponseWriter if/when you see that this is an "isShard=true" request?
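
A hedged sketch of that suggestion (the class name, the "content" field, and
the raw branch are illustrative stubs, not Lee's actual writer):

import java.io.IOException;
import java.io.OutputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.BinaryQueryResponseWriter;
import org.apache.solr.response.SolrQueryResponse;

public class ShardAwareRawWriter implements BinaryQueryResponseWriter {

    @Override
    public void init(NamedList args) {}

    @Override
    public void write(OutputStream out, SolrQueryRequest req, SolrQueryResponse rsp)
            throws IOException {
        if (req.getParams().getBool("isShard", false)) {
            // Internal distributed request: answer in javabin so the
            // aggregator node can collate the per-shard results.
            BinaryQueryResponseWriter javabin = (BinaryQueryResponseWriter)
                req.getCore().getQueryResponseWriter("javabin");
            javabin.write(out, req, rsp);
        } else {
            // Stub for the custom raw output sent to external clients
            out.write(String.valueOf(rsp.getValues().get("content"))
                .getBytes(StandardCharsets.UTF_8));
        }
    }

    @Override
    public void write(Writer writer, SolrQueryRequest req, SolrQueryResponse rsp)
            throws IOException {
        writer.write(String.valueOf(rsp.getValues().get("content")));
    }

    @Override
    public String getContentType(SolrQueryRequest req, SolrQueryResponse rsp) {
        return req.getParams().getBool("isShard", false)
            ? "application/octet-stream" : "text/plain";
    }
}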



: Date: Thu, 19 Apr 2018 18:42:58 +0100
: From: Lee Carroll 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: custom response writer which extends RawResponseWriter fails when
:  shards > 1
: 
: Hi,
: 
: I rewrote all of my tests to use SolrCloudTestCase rather than SolrTestCaseJ4
: and was able to replicate the response writer issue and debug with a sharded
: collection. It turned out the issue was not really with my response writer
: but rather with my config.
: 
: [requestHandler XML stripped by the list archive; it configured a "content"
: field and set wt as an invariant]
: 
: In cloud mode, having wt as an invariant breaks the collation of results
: from shards. Now I'm sure this is a common mistake which I've repeated
: (blush), but I do sort of want to implement my request handler in this way.
: Is there a way to have a request handler support a single response writer
: but still work in cloud mode?
: 
: Could this be considered a bug?
: 
: Lee C
: 
: On 18 April 2018 at 13:13, Mikhail Khludnev  wrote:
: 
: > Injecting headers might require deeper customisation, up to establishing your
: > own filter or so.
: > Speaking of your own wt, there might be some issues: usually it's no big deal
: > to use one wt (like wt=csv) for responding to the user query while wt=javabin
: > is used in the internal communication between the aggregator and the slaves,
: > as happens in a wt=csv query.
: >
: > On Wed, Apr 18, 2018 at 2:19 PM, Lee Carroll wrote:
: >
: > > Inventive. I need to control the content-type of the response from the
: > > document field value. I have the actual content field and the content-type
: > > field to use configured in the response writer. I've just noticed that the
: > > xslt transformer allows you to do this but not controlled by document
: > > values. I may also need to set some headers based on content-type and
: > > perhaps content size; accept ranges comes to mind. Although I might be
: > > getting ahead of myself.
: > >
: > >
: > >
: > > On 18 April 2018 at 12:05, Mikhail Khludnev  wrote:
: > >
: > > > well ..
: > > > what if
: > > > http://localhost:8983/solr/images/select?fl=content&q=id:1&rows=1&wt=csv&csv.separator=&csv.null=null
: > > > ?
: > > >
: > > > On Wed, Apr 18, 2018 at 1:18 PM, Lee Carroll <lee.a.carr...@googlemail.com> wrote:
: > > >
: > > > > sorry, cut-n-paste error, I'd get
: > > > >
: > > > > {
: > > > >   "responseHeader":{
: > > > > "zkConnected":true,
: > > > > "status":0,
: > > > > "QTime":0,
: > > > > "params":{
: > > > >   "q":"*:*",
: > > > >   "fl":"content",
: > > > >   "rows":"1"}},
: > > > >   "response":{"numFound":1,"start":0,"docs":[
: > > > >   {
: > > > > "content":"my-content-value"}]
: > > > >   }}
: > > > >
: > > > >
: > > > > but you get my point
: > > > >
: > > > >
: > > > >
: > > > > On 18 April 2018 at 11:13, Lee Carroll wrote:
: > > > >
: > > > > > for http://localhost:8983/solr/images/select?fl=content&q=id:1&rows=1
: > > > > >
: > > > > > I'd get
: > > > > >
: > > > > > {
: > > > > >   "responseHeader":{
: > > > > > "zkConnected":true,
: > > > > > "status":0,
: > > > > > "QTime":1,
: > > > > > "params":{
: > > > > >   "q":"*:*",
: > > > > >   "_":"1524046333220"}},
: > > > > >   "response":{"numFound":1,"start":0,"docs":[
: > > > > >   {
: > > > > > "id":"1",
: > > > > > "content":"my-content-value",
: > > > > > "*content-type*":"text/plain"}]
: > > > > >   }}
: > > > > >
: > > > > > when i want
: > > > > >
: > > > > > my-content-value
: > > > > >
: > > > > >
: > > > > >
: > > > > > On 18 April 2018 at 10:55, Mikhail Khludnev wrote:
: > > > > >
: > > > > >> Lee, from this description I don't see why it can't be addressed by
: > > > > >> fl,rows params. What makes it different from the typical Solr usage?
: > > > > >>
: > > > > >>
: > > > > >> On Wed, Apr 18, 2018 at 12:31 PM, Lee Carroll <lee.a.carr...@googlemail.com> wrote:
: > > > > >>
: > > > > >> > Sure, we want to return a single field's value for the top matching
: > > > > >> > document for a given query. Bare content rather than a full search
: > > > > >> > result listing.
: > > > > >> >
: > > > > >> > To be concrete:
: > > > > >> >
: > > > > >> > For a schema of fields id [unique key], content[stored], content-type[

RE: Specialized Solr Application

2018-04-20 Thread Allison, Timothy B.
>1) the toughest pdfs to identify are those that are partly
searchable (text) and partly not (image-based text).  However, I've
found that such documents tend to exist in clusters.
Agreed.  We should do something better in Tika to identify image-only pages on 
a page-by-page basis, and then ship those with very little text to tesseract.  
We don't currently do this.

>3) I have indexed other repositories and noticed some silent
failures (mostly for large .doc documents).  Wish there was some way
to log these errors so it would be obvious what documents have been
excluded.
Agreed on the Solr side.  You can run `java -jar tika-app.jar -J -t -i <input_dir>
-o <extracts_dir>` and then run tika-eval on the <extracts_dir> to count 
exceptions, even exceptions in embedded documents, which are now silently 
ignored. ☹
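
A sketch of that two-step flow (hedged: the Profile task and the -extracts/-db
option names are assumptions; check the usage output of your tika-eval version):

java -jar tika-app.jar -J -t -i <input_dir> -o <extracts_dir>
java -jar tika-eval-X.Y.jar Profile -extracts <extracts_dir> -db profile_db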

>   4) I still don't understand the use of tika.eval - is that an
application that you run against a collection or what?
Currently, it is set up to run against a directory of extracts (text+metadata 
extracted from pdfs/word/etc).  It will give you info about # of exceptions, 
lang id, and some other statistics that can help you get a sense of how well 
content extraction worked.  It wouldn't take much to add an adapter that would 
have it run against Solr to run the same content statistics.

>5) I've seen reference to tika-server - but I have no idea on how
that tool might be usefully applied.
 We have to harden it, but the benefit is that you isolate the Tika process in 
its own JVM so that it can't harm Solr.  By harden, I mean we need to spawn a 
child process and set a parent process that will kill and restart it on OOM or 
permanent hang.  We don't have that yet.  Tika very rarely runs into serious, 
show-stopping problems (kill -9 just might solve your problem).  If you only 
have a few tens of thousands of docs, you aren't likely to run into these 
problems.  If you're processing a few million, especially noisy things that 
come off the internet, you're more likely to run into these kinds of problems.

>6) Adobe Acrobat Pro apparently has a batch mode suitable for
flagging unsearchable (that is, image-based) pdf files and fixing them.
 Great.  If you have commercial tools available, use them.  IMHO, we have a 
ways to go on our OCR integration with PDFs.

>7) Another problem I've encountered is documents that are themselves
a composite of other documents (like an email thread).  The problem
is that a hit on such a document doesn't tell you much about the
true relevance of each contained document.  You have to do a
laborious manual search to figure it out.


Agreed.  Concordance search can be useful for making sense of large documents: 
https://github.com/mitre/rhapsode  The other 
thing that can be useful for handling genuine attachments (PDFs inside of 
email) is to treat the embedded docs as their own standalone/child docs (see 
the GitHub link and SOLR-7229).


>8) Is there a way to return the size of a matching document (which,
I think, would help identify non-searchable/image documents)?
Not that I'm aware of, but that's one of the stats calculated by tika-eval: 
length of extracted string, number of tokens, number of alphabetic tokens, 
number of "common words" (I took the top 20k most common words from Wikipedia 
dumps per language)... and others.

Cheers,

Tim


Re: Solr group by works only after the filter queries

2018-04-20 Thread Erick Erickson
group.sort? See: https://lucene.apache.org/solr/guide/6_6/result-grouping.html
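
For illustration (a hedged sketch: "actionid" stands in for whatever field
holds the action number in your schema), a grouped query with a per-group
sort looks like:

http://localhost:8983/solr/test/select?q=*:*&group=true&group.field=requestid&group.sort=actionid+desc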

Best,
Erick

On Fri, Apr 20, 2018 at 12:57 AM, krishnakittu528 wrote:
> HI team,
>
> My simple doc as follows.
>
> My simple docs are as follows.
>
> [XML tags were stripped by the list archive; the docs, with inferred field
> names, were:]
>
>   requestid  actionid  status
>   1          1         open
>   1          2         open
>   1          3         closed
>   2          1         open
>   2          2         open
>   2          3         closed
>   2          4         open
>
> I need to find the closed docs that have the max action present
> in each request. But when I tried the query, it returned both the closed docs.
> I used the following query:
>
> *http://localhost:8983/solr/test/select?fq=status:closed&group=true&group.field=requestid*
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solr group by works only after the filter queries

2018-04-20 Thread krishnakittu528
Hi team,

My simple docs are as follows.



[XML tags were stripped by the list archive; the docs, with inferred field
names, were:]

  requestid  actionid  status
  1          1         open
  1          2         open
  1          3         closed
  2          1         open
  2          2         open
  2          3         closed
  2          4         open


I need to find the closed docs that have the max action present
in each request. But when I tried the query, it returned both the closed docs.
I used the following query:

*http://localhost:8983/solr/test/select?fq=status:closed&group=true&group.field=requestid*



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Run solr server using Java program

2018-04-20 Thread rameshkjes
Using SolrJ, I am able to access the Solr core, but I still need to go to the
command prompt to execute the command for the Solr instance. Is there a way to
do that?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Boosting on matching results

2018-04-20 Thread Erick Erickson
Function queries have things like term frequencies, document
frequencies and the like that might be helpful; see:
https://lucene.apache.org/solr/guide/6_6/function-queries.html
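
As a hedged illustration of the kind of thing that is possible (note that
docfreq() counts across the whole index, not the current result set, so it
only approximates this use case; the cutoff is arbitrary), with edismax one
could try something like:

boost=if(gt(docfreq(sku_fashion,'men'),1000),1.5,1.0)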

Best,
Erick

On Fri, Apr 20, 2018 at 3:58 AM, Ugo Matrangolo  wrote:
> Hi,
>
> is it possible to boost a document based on how many of the 'same kind' are
> in the current search result?
>
> An example:
>
> I'm looking at 'red dress' and this is the current situation on the facet
> counts:
>
>   "facet_counts": {
>     "facet_queries": {},
>     "facet_fields": {
>       "sku_fashion": [
>         "children", 994,
>         "home", 9,
>         "men", 245,
>         "women-apparel", 2582,
>         "women-jewelry-access", 3,
>         "women-shoes-handbags", 2
>       ]
>     },
>
> For this user a personalisation signal is going to make me blindly boost
> all the items in the `men` fashion, but it looks like they are not worth
> being pushed up given that they are less than 8% of the entire result set
> (they are probably junk that is better not to show to the user).
>
> The problem is that I have no idea how to access this info from the
> function query I use to re-score the documents based on the personalisation
> signals.
>
> Ideally, I would love to access the above info and kill the personalisation
> signal telling me to boost the `men` fashion.
>
> Any idea?
>
> Best
> Ugo


Re: SolrCloud design question

2018-04-20 Thread Erick Erickson
bq: should each replica get its own instance

By "instance" here I'm assuming you mean a JVM, i.e. running multiple
JVMs on a single physical node (host).

"It Depends"(tm) of course. Each JVMs have some overhead. What I've
usually found is that a better
question is "how much heap do I need to allocate?" The most common
performance issue
I see is GC-related, especially when it comes to "solr runs fine,
except occasionally we see long pauses"
which can result from stop-the-world GC pauses. This can lead to all
sorts of issues, like followers
going into recovery and the like.

There's also some consideration for how many CPUs etc. are on each node.
So here's my rule of thumb on where to start: start with, say, a 16G heap
and run as many JVMs per node as it takes to accommodate the number of
replicas you need. So say you have 4 replicas/node and they all run fine
in one 16G heap: use one JVM.

OTOH, if you discover that you need 64G to run all 4, consider 2 or even 4 JVMs.

All bounded by how beefy your machines are. If you have 256G RAM then 4 JVMs
is reasonable. If you have 16G of physical RAM and 2 CPUs, well, you better
only count on 1 JVM ;)

How much heap do you need? Nobody knows until you stress test. Here's a
blog in case you haven't seen it before:
https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Finally, I use 16G as a starting point 'cause you have to start somewhere.
I've seen heaps range from 4G to 80G (this latter with Azul Zing). If you
have a test setup you can see where your sweet spot is.

Oh, one more thing. If you want to precisely control where each replica
lands, the collections CREATE command can take createNodeSet=EMPTY, which
sets up the collection state in ZooKeeper but does not add _any_ replicas.
You then place each one with ADDREPLICA and the "node" parameter, as
sketched below. That said, unless you're hosting a bunch of different
collections, it's usually just fine to let Solr place the replicas where
it wants; it tries to distribute them evenly. And then there are the
replica placement rules you can specify...
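
A sketch of that flow with the Collections API over HTTP (host, collection,
and config names are hypothetical):

http://host1:8983/solr/admin/collections?action=CREATE&name=coll&collection.configName=conf1&numShards=5&replicationFactor=1&createNodeSet=EMPTY
http://host1:8983/solr/admin/collections?action=ADDREPLICA&collection=coll&shard=shard1&node=host1:8983_solr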

Best,
Erick

On Fri, Apr 20, 2018 at 4:38 AM, Bernd Fehling wrote:
> Thanks Alessandro for the info.
>
> I am currently in the phase of finding the right setup with shards,
> nodes, replicas and so on.
> I have decided to begin with 5 hosts and want to set up 1 collection with 5
> shards, starting with 2 replicas per shard.
>
> But the next design question is: should each replica get its own instance?
>
> What will give better performance, all replicas in one Java instance or
> one instance for each replica?
>
> What is your opinion?
>
> Regards
> Bernd
>
>
> Am 20.04.2018 um 12:17 schrieb Alessandro Benedetti:
>> Unless you use recent Solr 7.x features where replicas can have different
>> properties[1], each replica is functionally the same at Solr level.
>> Zookeeper will elect a leader among them ( so temporary a replica will have
>> more responsibilities ) but (R1-R2-R3) does not really exist at Solr level.
>> It will just be Shard1 (ReplicaHost1, ReplicaHost2, ReplicaHost3).
>>
>> So you can't really shuffle anything at this level.
>>
>>
>>
>>
>> -
>> ---
>> Alessandro Benedetti
>> Search Consultant, R&D Software Engineer, Director
>> Sease Ltd. - www.sease.io
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>


Re: need help on search on last name + middle initial

2018-04-20 Thread Wendy2
The issue was resolved.

*I created a new fieldType* (its XML did not survive the list archive).

*A reference:*
https://opensourceconnections.com/blog/2013/08/21/name-search-in-solr/



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: need help on search on last name + middle initial

2018-04-20 Thread Wendy2
Hi Shawn,

The issue got resolved :-)  Thank you very much for your help!!

*I created a new fieldType* (its XML did not survive the list archive).

*A reference:*
https://opensourceconnections.com/blog/2013/08/21/name-search-in-solr/



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: How to protect middle initials during search

2018-04-20 Thread Wendy2
Hi Alessandro,

Thank you very much for your reply!

I got the issue resolved based on the suggestion from the article below:
https://opensourceconnections.com/blog/2013/08/21/name-search-in-solr/

*I created a new fieldType* (its XML did not survive the list archive).

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: SolrCloud design question

2018-04-20 Thread Bernd Fehling
Thanks Alessandro for the info.

I am currently in the phase of finding the right setup with shards,
nodes, replicas and so on.
I have decided to begin with 5 hosts and want to set up 1 collection with 5
shards, starting with 2 replicas per shard.

But the next design question is: should each replica get its own instance?

What will give better performance, all replicas in one Java instance or
one instance for each replica?

What is your opinion?

Regards
Bernd


Am 20.04.2018 um 12:17 schrieb Alessandro Benedetti:
> Unless you use recent Solr 7.x features where replicas can have different
> properties[1], each replica is functionally the same at Solr level.
> Zookeeper will elect a leader among them ( so temporary a replica will have
> more responsibilities ) but (R1-R2-R3) does not really exist at Solr level.
> It will just be Shard1 (ReplicaHost1, ReplicaHost2, ReplicaHost3).
> 
> So you can't really shuffle anything at this level.
> 
> 
> 
> 
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> 


Boosting on matching results

2018-04-20 Thread Ugo Matrangolo
Hi,

is it possible to boost a document based on how many of the 'same kind' are
in the current search result?

An example:

I'm looking at 'red dress' and this is the current situation on the facet
counts:

  "facet_counts": {

"facet_queries": {},

"facet_fields": {

  "sku_fashion": [

"children",

994,

"home",

9,

"men",

245,

"women-apparel",

2582,

"women-jewelry-access",

3,

"women-shoes-handbags",

2

  ]

},

For this user a personalisation signal is going to make me blindly boost
all the items in the `men` fashion, but it looks like they are not worth
being pushed up given that they are less than 8% of the entire result set
(they are probably junk that is better not to show to the user).

The problem is that I have no idea how to access this info from the
function query I use to re-score the documents based on the personalisation
signals.

Ideally, I would love to access the above info and kill the personalisation
signal telling me to boost the `men` fashion.

Any idea?

Best
Ugo


Re: Run solr server using Java program

2018-04-20 Thread Alessandro Benedetti
There are various client API to use Apache Solr[1], in your case what you
need is SolrJ[2] .

Cheers

[1] https://lucene.apache.org/solr/guide/7_3/client-apis.html
[2] https://lucene.apache.org/solr/guide/7_3/using-solrj.html#using-solrj
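
A minimal SolrJ sketch against a running Solr (the base URL and core name are
hypothetical):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class QueryExample {
    public static void main(String[] args) throws Exception {
        // SolrJ talks to an already-running Solr; it does not start one
        try (HttpSolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycore").build()) {
            QueryResponse rsp = client.query(new SolrQuery("*:*"));
            System.out.println("Found " + rsp.getResults().getNumFound() + " docs");
        }
    }
}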



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: SolrCloud design question

2018-04-20 Thread Alessandro Benedetti
Unless you use recent Solr 7.x features where replicas can have different
properties[1], each replica is functionally the same at Solr level.
Zookeeper will elect a leader among them ( so temporary a replica will have
more responsibilities ) but (R1-R2-R3) does not really exist at Solr level.
It will just be Shard1 (ReplicaHost1, ReplicaHost2, ReplicaHost3).

So you can't really shuffle anything at this level.




-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: How to protect middle initials during search

2018-04-20 Thread Alessandro Benedetti
Hi Wendy,
I recommend properly configuring your analysis chain.
You can start by posting it here and we can help.

Generally speaking, you should use the analysis tool in the Solr admin UI to
first verify that the analysis chain is configured as you expect; then you can
move on to modelling the query appropriately.

Cheers




-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html