ValueSource with long BinaryField

2019-02-07 Thread Jason
Hi,
I'm seeing very long search response times, specifically when sorting by a ValueSource.
Each doc stores a double array of length 4,096 in a BinaryField.
At search time I sort by a function, implemented in a custom ValueSource,
that calculates the distance between two double arrays.
The response time is too long.
I think that both reading the binary values and decoding base64 take too long.
Is there a good way to improve the response time?

Custom ValueSource snippet is below.

public class VectorValueSource extends ValueSource {

  private final java.util.Base64.Decoder decoder = java.util.Base64.getDecoder();

  public double distance(double[] qVector, double[] dVector) {
    double result = 0;
    for (int i = 0; i < qVector.length; i++) {
      double v = qVector[i] - dVector[i];
      result += v * v;
    }
    return Math.sqrt(result);
  }

  @Override
  public FunctionValues getValues(Map context, LeafReaderContext readerContext) throws IOException {
    final FieldInfo fieldInfo = readerContext.reader().getFieldInfos().fieldInfo(field);

    if (fieldInfo != null && fieldInfo.getDocValuesType() == DocValuesType.BINARY) {
      final BinaryDocValues binaryValues = DocValues.getBinary(readerContext.reader(), field);
      final Bits docsWithField = DocValues.getDocsWithField(readerContext.reader(), field);

      return new FunctionValues() {

        @Override
        public double doubleVal(int doc) {
          BytesRef target = null;
          if (binaryValues.get(doc).length > 0) {
            target = binaryValues.get(doc);

            VectorUtils.byte2Double(decoder.decode(target.bytes), dFeature, false);
            return distance(qFeature, dFeature);
          }
          else {
            return maxDistance;
          }
        }
      };
    }
  }
}
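
One possible direction, sketched under assumptions rather than measured: if the vector were indexed as raw big-endian bytes instead of base64 text, it could be read with java.nio.ByteBuffer and the Base64 step would disappear. VectorCodec and its methods are hypothetical names, not part of Lucene or Solr. The offset and length parameters stand in for the offset and length fields of a BytesRef, whose bytes array may be a larger shared buffer.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not a Lucene/Solr class.
final class VectorCodec {

    // Index-time side: pack a double[] into 8 bytes per value (big-endian).
    static byte[] encode(double[] vector) {
        ByteBuffer buf = ByteBuffer.allocate(vector.length * Double.BYTES);
        for (double v : vector) {
            buf.putDouble(v);
        }
        return buf.array();
    }

    // Search-time side: read doubles straight from the given byte range into
    // a caller-supplied array, without an intermediate byte[] copy.
    static void decode(byte[] bytes, int offset, int length, double[] out) {
        if (length != out.length * Double.BYTES) {
            throw new IllegalArgumentException("unexpected vector size: " + length);
        }
        ByteBuffer.wrap(bytes, offset, length).asDoubleBuffer().get(out);
    }

    // Same Euclidean distance as in the snippet above.
    static double distance(double[] q, double[] d) {
        double sum = 0;
        for (int i = 0; i < q.length; i++) {
            double diff = q[i] - d[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

In doubleVal this would be called as VectorCodec.decode(target.bytes, target.offset, target.length, dFeature). Reusing one output array across calls is only safe if a single thread uses it.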



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr collapse result repeat in 6.6.5 cloud example techproducts.

2019-02-07 Thread Joel Bernstein
Do you have more than one shard? Collapse requires that all docs in the
same collapse group be co-located on the same shard.

Grouping, I believe, does not require this in some scenarios.
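
One way to get that co-location, assuming the collection uses Solr's default compositeId router, is to prefix each document id with the collapse field value and a "!" separator, so that docs sharing a prefix hash to the same shard. The helper below is a hypothetical sketch that only builds such ids; it does not talk to Solr.

```java
// Hypothetical helper for illustration; not a SolrJ class.
final class RoutedIds {

    // Builds an id of the form "<groupValue>!<docId>" for the compositeId router.
    static String routedId(String groupValue, String docId) {
        if (groupValue.indexOf('!') >= 0) {
            throw new IllegalArgumentException("group value must not contain '!'");
        }
        return groupValue + "!" + docId;
    }
}
```

With the techproducts data this would turn id 0812521390 in genre fantasy into fantasy!0812521390. Existing docs would have to be reindexed under the new ids for the routing to take effect.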


Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Feb 7, 2019 at 4:07 PM 刘正  wrote:

> I tried sending this request to the techproducts collection:
>
> {code}
>
> select?fl=id,genre_s&fq={!collapse%20field=genre_s}&indent=on&q=genre_s:*&wt=json
> {code}
>
> and I get this response:
>
> {code:json}
> {
>   "responseHeader":{
> "zkConnected":true,
> "status":0,
> "QTime":6,
> "params":{
>   "q":"genre_s:*",
>   "indent":"on",
>   "fl":"id,genre_s",
>   "fq":"{!collapse field=genre_s}",
>   "wt":"json"}},
>   "response":{"numFound":3,"start":0,"maxScore":1.0,"docs":[
>   {
> "id":"0812521390",
> "genre_s":"fantasy"},
>   {
> "id":"0553573403",
> "genre_s":"fantasy"},
>   {
> "id":"0553293354",
> "genre_s":"scifi"}]
>   }}
> {code}
>
> When I make the request with grouping:
>
> {code}
>
> select?fl=id,genre_s&group.field=genre_s&group.limit=1&group=true&indent=on&q=genre_s:*&wt=json
> {code}
>
> I get this response:
>
> {code:json}
> {
>   "responseHeader":{
> "zkConnected":true,
> "status":0,
> "QTime":9,
> "params":{
>   "q":"genre_s:*",
>   "indent":"on",
>   "fl":"id,genre_s",
>   "group.limit":"1",
>   "wt":"json",
>   "group.field":"genre_s",
>   "group":"true"}},
>   "grouped":{
> "genre_s":{
>   "matches":10,
>   "groups":[{
>   "groupValue":"fantasy",
>   "doclist":{"numFound":8,"start":0,"maxScore":1.0,"docs":[
>   {
> "id":"0553573403",
> "genre_s":"fantasy"}]
>   }},
> {
>   "groupValue":"scifi",
>   "doclist":{"numFound":2,"start":0,"docs":[
>   {
> "id":"0553293354",
> "genre_s":"scifi"}]
>   }}]}}}
> {code}
>


Solr collapse result repeat in 6.6.5 cloud example techproducts.

2019-02-07 Thread 刘正
I tried sending this request to the techproducts collection:

{code}
select?fl=id,genre_s&fq={!collapse%20field=genre_s}&indent=on&q=genre_s:*&wt=json
{code}

and I get this response:

{code:json}
{
  "responseHeader":{
"zkConnected":true,
"status":0,
"QTime":6,
"params":{
  "q":"genre_s:*",
  "indent":"on",
  "fl":"id,genre_s",
  "fq":"{!collapse field=genre_s}",
  "wt":"json"}},
  "response":{"numFound":3,"start":0,"maxScore":1.0,"docs":[
  {
"id":"0812521390",
"genre_s":"fantasy"},
  {
"id":"0553573403",
"genre_s":"fantasy"},
  {
"id":"0553293354",
"genre_s":"scifi"}]
  }}
{code}

When I make the request with grouping:

{code}
select?fl=id,genre_s&group.field=genre_s&group.limit=1&group=true&indent=on&q=genre_s:*&wt=json
{code}

I get this response:

{code:json}
{
  "responseHeader":{
"zkConnected":true,
"status":0,
"QTime":9,
"params":{
  "q":"genre_s:*",
  "indent":"on",
  "fl":"id,genre_s",
  "group.limit":"1",
  "wt":"json",
  "group.field":"genre_s",
  "group":"true"}},
  "grouped":{
"genre_s":{
  "matches":10,
  "groups":[{
  "groupValue":"fantasy",
  "doclist":{"numFound":8,"start":0,"maxScore":1.0,"docs":[
  {
"id":"0553573403",
"genre_s":"fantasy"}]
  }},
{
  "groupValue":"scifi",
  "doclist":{"numFound":2,"start":0,"docs":[
  {
"id":"0553293354",
"genre_s":"scifi"}]
  }}]}}}
{code}


Re: Why solr sends a request for a metrics every minute?

2019-02-07 Thread levtannen
Jan, 

After I suppressed the metrics messages,
I found that there are other messages. They also come once a minute, but
only on one of the 3 computers. Could you please explain what these
messages mean and why only one computer produces them?

Best wishes.

2019-02-07 20:18:37.089 INFO  (commitScheduler-18-thread-1) [   ]
o.a.s.u.SolrIndexWriter Calling setCommitData with
IW:org.apache.solr.update.SolrIndexWriter@28a1b6e5
commitCommandVersion:1624842664242315264
2019-02-07 20:18:37.093 INFO  (commitScheduler-16-thread-1) [   ]
o.a.s.u.DirectUpdateHandler2 start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2019-02-07 20:18:37.093 INFO  (qtp689401025-19) [c:.system s:shard1
r:core_node4 x:.system_shard1_replica_n3]
o.a.s.u.p.LogUpdateProcessorFactory [.system_shard1_replica_n3] 
webapp=/solr path=/update
params={update.distrib=FROMLEADER&distrib.from=http://usahubslvcvw122.usa.doj.gov:8983/solr/.system_shard1_replica_n1/&wt=javabin&version=2}{add=[rrd|solr.collection.NJ-A-documents
(1624842664240218112)]} 0 2
2019-02-07 20:18:37.093 INFO  (commitScheduler-16-thread-1) [   ]
o.a.s.u.SolrIndexWriter Calling setCommitData with
IW:org.apache.solr.update.SolrIndexWriter@4107d3ff commitCommandVersion:0
2019-02-07 20:18:37.093 INFO  (qtp689401025-547) [c:.system s:shard1
r:core_node2 x:.system_shard1_replica_n1]
o.a.s.u.p.LogUpdateProcessorFactory [.system_shard1_replica_n1] 
webapp=/solr path=/update
params={wt=javabin&version=2}{add=[rrd|solr.collection.NJ-A-documents
(1624842664240218112)]} 0 7
2019-02-07 20:18:37.096 INFO  (qtp689401025-19) [c:.system s:shard1
r:core_node4 x:.system_shard1_replica_n3]
o.a.s.u.p.LogUpdateProcessorFactory [.system_shard1_replica_n3] 
webapp=/solr path=/update
params={update.distrib=FROMLEADER&distrib.from=http://usahubslvcvw122.usa.doj.gov:8983/solr/.system_shard1_replica_n1/&wt=javabin&version=2}{add=[rrd|solr.collection.OHN-B-cases
(1624842664248606720)]} 0 0
2019-02-07 20:18:37.096 INFO  (qtp689401025-576) [c:.system s:shard1
r:core_node2 x:.system_shard1_replica_n1]
o.a.s.u.p.LogUpdateProcessorFactory [.system_shard1_replica_n1] 
webapp=/solr path=/update
params={wt=javabin&version=2}{add=[rrd|solr.collection.OHN-B-cases
(1624842664248606720)]} 0 2
2019-02-07 20:18:37.099 INFO  (qtp689401025-19) [c:.system s:shard1
r:core_node4 x:.system_shard1_replica_n3]
o.a.s.u.p.LogUpdateProcessorFactory [.system_shard1_replica_n3] 
webapp=/solr path=/update
params={update.distrib=FROMLEADER&distrib.from=http://usahubslvcvw122.usa.doj.gov:8983/solr/.system_shard1_replica_n1/&wt=javabin&version=2}{add=[rrd|solr.collection.NM-A-documents
(1624842664251752448)]} 0 0
2019-02-07 20:18:37.099 INFO  (qtp689401025-580) [c:.system s:shard1
r:core_node2 x:.system_shard1_replica_n1]
o.a.s.u.p.LogUpdateProcessorFactory [.system_shard1_replica_n1] 
webapp=/solr path=/update
params={wt=javabin&version=2}{add=[rrd|solr.collection.NM-A-documents
(1624842664251752448)]} 0 1
2019-02-07 20:18:37.101 INFO  (qtp689401025-19) [c:.system s:shard1
r:core_node4 x:.system_shard1_replica_n3]
o.a.s.u.p.LogUpdateProcessorFactory [.system_shard1_replica_n3] 
webapp=/solr path=/update
params={update.distrib=FROMLEADER&distrib.from=http://usahubslvcvw122.usa.doj.gov:8983/solr/.system_shard1_replica_n1/&wt=javabin&version=2}{add=[rrd|solr.collection.KYE-B-documents
(1624842664253849600)]} 0 0
2019-02-07 20:18:37.101 INFO  (qtp689401025-21) [c:.system s:shard1
r:core_node2 x:.system_shard1_replica_n1]
o.a.s.u.p.LogUpdateProcessorFactory [.system_shard1_replica_n1] 
webapp=/solr path=/update
params={wt=javabin&version=2}{add=[rrd|solr.collection.KYE-B-documents
(1624842664253849600)]} 0 1


 



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr relevancy score different on replicated nodes

2019-02-07 Thread Erick Erickson
Optimization is safe. The large segment is irrelevant. You'll
lose a little parallelization, but on an index with this few
documents I doubt you'll notice.

As of Solr 7.5, optimize will respect the max segment size,
which defaults to 5G, but you're well under that limit.

Best,
Erick

On Sun, Feb 3, 2019 at 11:54 PM Ashish Bisht  wrote:
>
> Thanks Erick and everyone. We are checking on the stats cache.
>
> I noticed stats skew again and optimized the index to correct it, as
> per these documents:
>
> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
> and
> https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/
>
> I wanted to check on the points below, considering we want the stats skew
> to be corrected.
>
> 1. Once optimized, the single segment won't be naturally merged easily. As we
> might be doing a manual optimize every time, I expect that at a certain point
> in the future we will have a single large segment. What impact is this large
> segment going to have?
> Our index is ~30k documents, i.e. files with content (segment size <1GB as of now).
>
> 2. Do you recommend optimizing in these situations? It would probably
> be done only when the stats skew. Is it safe?
>
> Regards
> Ashish
>
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Relevancy Score Calculation

2019-02-07 Thread Erick Erickson
Why do you think that would help? This sounds like an XY
problem, you are asking how to do X because you think
it'll help with problem Y but haven't told us what Y is.

At any rate, it would require a code change, in many places.
It's unlikely to be worth the effort anyway, because the
term _frequencies_ would still include counts from deleted docs.

99% of the time in my experience, worrying excessively about
this kind of detail about score calculations is wasted effort,
but it's hard to recommend one way or the other without knowing
what "Y" is above.
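
To get a feel for how far deleted docs can move a score, here is a small sketch assuming BM25 similarity, whose idf in Lucene is ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)). IdfSketch is a hypothetical name and the numbers in the usage note are made up for illustration.

```java
// Illustration only: the idf term of BM25 as a standalone function.
final class IdfSketch {

    // docFreq: number of docs containing the term;
    // docCount: total number of docs used for the statistic.
    static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }
}
```

Comparing idf(100, 1200) with idf(100, 1000) shows the effect of counting 200 deleted docs in the total. Note that in a real index docFreq would also still include deleted docs, which is the point made above.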

Best,
Erick

On Mon, Feb 4, 2019 at 12:08 AM Ashish Bisht  wrote:
>
> Hi,
>
> Currently the score is calculated based on "Max Doc" instead of "Num Docs". Is
> it possible to change it to "Num Docs" (i.e. without deleted docs)? Will it
> require a code change or some config change?
>
> Regards
> Ashish
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: [CDCR]Unable to locate core

2019-02-07 Thread Tim
So it looks like I'm having an issue with this fix:
https://issues.apache.org/jira/browse/SOLR-11724

So I've messed around with this for a while and every time the leader to
leader replica portion works fine. But the Recovery portion (implemented as
part of the fix above) fails. 

I've run a few tests and every time the recovery portion kicks off, it sends
the recovery command to the node which has the leader for a given replica
instead of the follower. 
I've recreated the collection several times so that replicas are on
different nodes with the same results each time. It seems to be assumed that
the follower is on the same solr node as the leader. 
 
For example, if s3r10 (shard 3, replica 10) is the leader and is on node1,
while the follower s3r8 is on node2, then the core recovery command meant
for s3r8 is being sent to node1 instead of node2.





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: AIX platform: Solr goes down with java.lang.OutOfMemoryError with Open JDK 11

2019-02-07 Thread Erick Erickson
Check your ulimit for max processes and max open file handles, those
typically are places where things go weird, and the error message
isn't always that helpful. Usually we want 65K of each...

On Mon, Feb 4, 2019 at 8:25 AM Shawn Heisey  wrote:
>
> On 2/4/2019 5:53 AM, balu...@gmail.com wrote:
> > I am running Solr 7.5.0 with OpenJDK 11 on the AIX platform. When I trigger a data
> > import operation, Solr goes down with the error below on AIX, but
> > the same thing works on RHEL.
> >
> > The same Solr 7.5.0 data import operation succeeds with JDK 8 on the same AIX
> > platform.
>
> Java 11 is not qualified with any version of Solr yet.  We don't know
> whether it works or not.  Java 9 is known to work with Solr 7.x.  My
> recommendation here is to stick with Java 8 until we can find and fix
> any problems with 11.
>
> > *Error from solr.log  Solr 7.5.0 with Open JDK 11 on AIX platform:*
> >
> > /*ERROR (coreContainerWorkExecutor-2-thread-1) [   ] o.a.s.c.CoreContainer
> > Error waiting for SolrCore to be loaded on startup
> > java.lang.OutOfMemoryError: Failed to create a thread: retVal -1073741830,
> > errno 11*/
>
> This OutOfMemoryError is not actually due to memory.  Java is saying
> that it failed to create a thread.
>
> Typically this is caused by the OS restricting the number of processes a
> user is allowed to start.  Sometimes the OS might treat threads
> differently than processes so a different limit might need to be
> increased ... I have no idea whether AIX behaves that way or not.
>
> The fact that it works with Java 8 is a little odd.  Maybe Java 11
> itself creates more threads than 8 does.
>
> Thanks,
> Shawn


Re: Ignore accent in a request

2019-02-07 Thread Erick Erickson
Exactly _how_ is it "not working"?

Try building your parameters _up_ rather than starting with a lot, e.g.
select?defType=dismax&q=je suis avarié&qf=title
^^ assumes you expect a match on title. Then:
select?defType=dismax&q=je suis avarié&qf=title subject

etc.

Because mm=757 looks really wrong. From the docs:
Defines the minimum number of clauses that must match, regardless of
how many clauses there are in total.

edismax is used much more than dismax as it's more flexible, but
that's not germane here.

Finally, try adding &debug=query to the URL to see exactly how the
query is parsed.
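
As a rough illustration of what accent folding does to a term: the sketch below is not the ASCIIFoldingFilterFactory implementation, only a plain-JDK approximation for accented Latin letters. It decomposes the string to NFD and strips the combining marks. AccentFold is a hypothetical name.

```java
import java.text.Normalizer;
import java.util.Locale;

// Illustration only; approximates lowercasing plus accent folding.
final class AccentFold {

    static String fold(String term) {
        // NFD splits "é" into "e" followed by a combining acute accent.
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        // \p{M} matches the combining marks left behind by the decomposition.
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }
}
```

If both the indexed term and the query term go through the same folding, avarié and avarie end up as the same token, which is why the analyzer has to apply at both index and query time.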

Best,
Erick

On Mon, Feb 4, 2019 at 9:09 AM SAUNIER Maxence  wrote:
>
> Hello,
>
> How can I ignore accents in the query results?
>
> Request : 
> http://*:8983/solr/***/select?defType=dismax&q=je+suis+avarié&qf=title%5e20+subject%5e15+category%5e1+content%5e0.5&mm=757
>
> I want to get docs with both avarié and avarie.
>
> I have added this to my schema:
>
>   {
> "name": "string",
> "positionIncrementGap": "100",
> "analyzer": {
>   "filters": [
> {
>   "class": "solr.LowerCaseFilterFactory"
> },
> {
>   "class": "solr.ASCIIFoldingFilterFactory"
> },
> {
>   "class": "solr.EdgeNGramFilterFactory",
>   "minGramSize": "3",
>   "maxGramSize": "50"
> }
>   ],
>   "tokenizer": {
> "class": "solr.KeywordTokenizerFactory"
>   }
> },
> "stored": true,
> "indexed": true,
> "sortMissingLast": true,
> "class": "solr.TextField"
>   },
>
> But it is not working.
>
> Thanks.


Re: Help needed with Solrcloud error messages

2019-02-07 Thread Erick Erickson
Your Solr logs on the server should have more details than just
the bare error, namely the full stack trace. Those would help
figure out what's happening.

Best,
Erick

On Mon, Feb 4, 2019 at 3:14 PM Webster Homer
 wrote:
>
> We have a number of collections in a Solrcloud.
>
> The cloud has 2 shards each with 2 replicas, 4 nodes. On one of the nodes I 
> am seeing a lot of errors in the log like this:
> 2019-02-04 20:27:11.831 ERROR (qtp1595212853-88527) [c:sial-catalog-product 
> s:shard1 r:core_node4 x:sial-catalog-product_shard1_replica2] 
> o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Error reading 
> document with docId 417762
> 2019-02-04 20:29:49.779 ERROR (qtp1595212853-87296) [c:sial-catalog-product 
> s:shard1 r:core_node4 x:sial-catalog-product_shard1_replica2] 
> o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Error reading 
> document with docId 417676
> 2019-02-04 20:23:47.505 ERROR (qtp1595212853-87538) [c:sial-catalog-product 
> s:shard1 r:core_node4 x:sial-catalog-product_shard1_replica2] 
> o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Error reading 
> document with docId 414871
>
> There are many more than these three. What does this mean?
>
> On the same node I also see problems with 2 other collections:
> ehs-catalog-qmdoc_shard1_replica2: 
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
> Error opening new searcher
> sial-catalog-category-180721_shard2_replica_n4: 
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
> Error opening new searcher
>
> Yet another replica on this node is down
>
> What could cause the error reading docId problems? Why is there a problem 
> opening a new searcher on   2 unrelated collections which just happen to be 
> on the same node? How do I go about diagnosing the problems?
>
> We've been seeing a lot of problems with solrcloud.
>
> We are on Solr 7.2
>
>


Re: Full index replication upon service restart

2019-02-07 Thread Erick Erickson
bq. We have a heavy indexing load of about 10,000 documents every 150 seconds.
Not so heavy query load.

It's unlikely that changing numRecordsToKeep will help all that much if your
maintenance window is very large. Rather, that number would have to be _very_
high.
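
A back-of-the-envelope calculation with the indexing rate from the question shows why. The 10-minute outage is a hypothetical figure, and the default numRecordsToKeep of 100 is stated as an assumption about the configuration in use.

```java
// Illustration only: how many updates a replica misses while it is down.
final class TlogEstimate {

    // Docs indexed during an outage, given a steady rate of
    // docsPerBatch documents every batchSeconds seconds.
    static long docsMissed(long docsPerBatch, long batchSeconds, long outageSeconds) {
        return docsPerBatch * outageSeconds / batchSeconds;
    }
}
```

At 10,000 docs every 150 seconds, docsMissed(10_000, 150, 600) gives 40,000 updates for a 10-minute restart, so numRecordsToKeep would have to be hundreds of times the assumed default before peer sync could cover the gap.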

7 hours is huge. How big are your indexes on disk? You're essentially going to
get a full copy from the leader for each replica, so network bandwidth may be
the bottleneck. Plus, every doc that gets indexed to the leader during sync
will be stored away in the replica's tlog (not limited by numRecordsToKeep)
and replayed after the full index replication is accomplished.

Much of the retry logic for replication has been improved starting with
Solr 7.3 and, in particular, Solr 7.5. That might address your replicas that
just fail to replicate ever, but it won't change the fact that the replicas
need a full sync anyway.

That said, by far the simplest thing would be to stop indexing during your
maintenance window if at all possible.

Best,
Erick

On Tue, Feb 5, 2019 at 9:11 PM Rahul Goswami  wrote:
>
> Hello Solr gurus,
>
> So I have a scenario where on Solr cluster restart the replica node goes
> into full index replication for about 7 hours. Both replica nodes are
> restarted around the same time for maintenance. Also, during usual times,
> if one node goes down for whatever reason, upon restart it again does index
> replication. In certain instances, some replicas just fail to recover.
>
> SolrCloud 7.2.1 cluster configuration:
>
> 16 shards - replication factor=2
>
> Per server configuration:
>
> 32GB machine - 16GB heap space for Solr
> Index size : 3TB per server
>
> autoCommit (openSearcher=false) of 3 minutes
>
> We have a heavy indexing load of about 10,000 documents every 150 seconds.
> Not so heavy query load.
>
> Reading through some of the threads on similar topic, I suspect it would be
> the disparity between the number of updates(>100) between the replicas that
> is causing this (courtesy our indexing load). One of the suggestions I saw
> was using numRecordsToKeep.
> However as Erick mentioned in one of the threads, that's a bandaid measure
> and I am trying to eliminate some of the fundamental issues that might
> exist.
>
> 1) Is the heap too small for that index size? If yes, what would be a
> recommended max heap size?
> 2) Is there a general guideline to estimate the required max heap based on
> index size on disk?
> 3) What would be a recommended autoCommit and autoSoftCommit interval ?
> 4) Any configurations that would help improve the restart time and avoid
> full replication?
> 5) Does Solr retain "numRecordsToKeep" number of  documents in tlog *per
> replica*?
> 6) The reasons for the PeerSync failures in the logs below are not completely clear to me.
> Can someone please elaborate?
>
> *PeerSync fails with* :
>
> Failure type 1:
> -
> 2019-02-04 20:43:50.018 INFO
> (recoveryExecutor-4-thread-2-processing-n:indexnode1:2_solr
> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
> s:shard11 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node45)
> [c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard11 r:core_node45
> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42]
> org.apache.solr.update.PeerSync Fingerprint comparison: 1
>
> 2019-02-04 20:43:50.018 INFO
> (recoveryExecutor-4-thread-2-processing-n:indexnode1:2_solr
> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
> s:shard11 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node45)
> [c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard11 r:core_node45
> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42]
> org.apache.solr.update.PeerSync Other fingerprint:
> {maxVersionSpecified=1624579878580912128,
> maxVersionEncountered=1624579893816721408, maxInHash=1624579878580912128,
> versionsHash=-8308981502886241345, numVersions=32966082, numDocs=32966165,
> maxDoc=1828452}, Our fingerprint: {maxVersionSpecified=1624579878580912128,
> maxVersionEncountered=1624579975760838656, maxInHash=1624579878580912128,
> versionsHash=4017509388564167234, numVersions=32966066, numDocs=32966165,
> maxDoc=1828452}
>
> 2019-02-04 20:43:50.018 INFO
> (recoveryExecutor-4-thread-2-processing-n:indexnode1:2_solr
> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
> s:shard11 c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 r:core_node45)
> [c:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66 s:shard11 r:core_node45
> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42]
> org.apache.solr.update.PeerSync PeerSync:
> core=DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42 url=
> http://indexnode1:8983/solr DONE. sync failed
>
> 2019-02-04 20:43:50.018 INFO
> (recoveryExecutor-4-thread-2-processing-n:indexnode1:8983_solr
> x:DataIndex_1C6F947C-6673-4778-847D-2DE0FDE56C66_shard11_replica_n42
> s:shard11 

Re: Allow Join over two sharded collection

2019-02-07 Thread Erick Erickson
This doesn't appear to be being actively pursued, so it's anybody's guess.

Depending on your use-case, the streaming capabilities may be an
OOB solution.

Best,
Erick

On Wed, Feb 6, 2019 at 1:22 AM mganeshs  wrote:
>
> All,
>
> Any idea, whether this will be taken care or addressed in near future ?
>
> https://issues.apache.org/jira/browse/SOLR-8297
>
> Regards,
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: How to stop a new slave from serving request until it has replicated index the first time.

2019-02-07 Thread Erick Erickson
Unless you have a specific reason to use master/slave, SolrCloud
(in this case only one shard probably) will make this kind of
thing easier. This is the equivalent of ADDREPLICA. Also,
the TLOG and PULL replica types (as of Solr 7) are something of
a hybrid of master/slave and SolrCloud.

FWIW,
Erick

On Wed, Feb 6, 2019 at 10:28 AM Shawn Heisey  wrote:
>
> On 2/6/2019 9:13 AM, Pushkar Raste wrote:
> > In the master/slave setup, as soon as I start a new slave it starts to
> > serve request. Often the searches result in no documents being found as
> > index has not been replicated yet. Is there a way to stop replica from
> > serving request (marking node unhealthy) until the index is replicated for
> > the first time.
>
> There's probably not a way that's built into Solr.  Perhaps there should
> be, but it would have to be implemented, and that might take a release
> or two.
>
> But there is a ping request handler, and if your load balancer is using
> that to determine server health, you can have the core start up with the
> ping health check disabled, and then manually enable it once it's
> really ready.
>
> Thanks,
> Shawn


Re: Accessing multiValued field from within custom function

2019-02-07 Thread Dariusz Wojtas
Hi,

Any hints on this topic?
How can I access String / Text values from a multiValued field inside a custom
function?

Best regards,
Dariusz Wojtas

On Thu, Jan 3, 2019 at 6:18 PM Dariusz Wojtas  wrote:

> Hi,
>
> I am using SOLR 7.5 in the cloud mode.
> I want to create a custom function similar to 'strdist' that works on
> multivalued fields (multiValued=true) and finds the highest matching score.
> Yes, I know the potential performance issues, but in my usecase this would
> bring a huge benefit.
>
> There is not much information on how to work with multiValued fields, but
> I have found a piece of code that might be useful. It's how SOLR standard
> functions are registered:
>
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/ValueSourceParser.java
>
> The interesting part for me starts in line 424, when the 'field' function
> is registered.
> It optionally accepts a multivalue field for min/max calculation.
> If the 2nd argument is 'min' or 'max' it tries to resolve the field as
> SchemaField.
>   SchemaField f = fp.getReq().getSchema().getField(fieldName);
>
> Now the questions are:
> 1. Is this the path I should follow? If not - are there any other ways?
> 2. How to retrieve all the actual String or Text values from a
> multivalue field, not just a single value? Some kind of a table or set of
> values. How?
> 3. Does cloud mode change anything here? In my case the whole index is on
> a single machine, but there are several replicas.
>
> Best regards,
> Dariusz Wojtas
>
>


CloudSolrClient getDocCollection

2019-02-07 Thread Hendrik Haddorp

Hi,

when I perform a query using the CloudSolrClient the code first 
retrieves the DocCollection to determine to which instance the query 
should be sent [1]. getDocCollection [2] does a lookup in a cache, which 
has a 60s expiration time [3]. When a DocCollection has to be reloaded 
this is guarded by a lock [4]. By default there are 3 locks, which can 
cause some congestion. The main question though is why does the client 
need that timeout? According to this comment [5] the code does not use a 
watch. Wouldn't it make sense to use a watch? I thought the big 
advantage of the CloudSolrClient is that it knows where to send requests, 
so that no extra hop needs to be done on the server side. Having to 
query ZooKeeper for the current state does, however, take away some of 
that advantage.
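
To make the pattern described above concrete, here is a minimal sketch of a cache with a time-to-live whose reloads are guarded by a small fixed set of locks. It is an illustration only, not the SolrJ code; ExpiringStateCache, the injected clock and the loader function are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Illustration only; not the CloudSolrClient implementation.
final class ExpiringStateCache<V> {

    private static final class Entry<V> {
        final V value;
        final long loadedAtMillis;
        Entry(V value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<String, Entry<V>> cache = new ConcurrentHashMap<>();
    private final Object[] locks;
    private final long ttlMillis;
    private final LongSupplier clock;
    private final Function<String, V> loader;

    ExpiringStateCache(int lockCount, long ttlMillis,
                       LongSupplier clock, Function<String, V> loader) {
        this.locks = new Object[lockCount];
        for (int i = 0; i < lockCount; i++) {
            locks[i] = new Object();
        }
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.loader = loader;
    }

    V get(String collection) {
        Entry<V> entry = cache.get(collection);
        if (isFresh(entry)) {
            return entry.value;
        }
        // Collections that hash to the same lock wait on each other's reloads,
        // which is where congestion can come from when there are few locks.
        Object lock = locks[Math.floorMod(collection.hashCode(), locks.length)];
        synchronized (lock) {
            entry = cache.get(collection); // another thread may have reloaded it
            if (isFresh(entry)) {
                return entry.value;
            }
            V loaded = loader.apply(collection); // e.g. a read from ZooKeeper
            cache.put(collection, new Entry<>(loaded, clock.getAsLong()));
            return loaded;
        }
    }

    private boolean isFresh(Entry<V> entry) {
        return entry != null && clock.getAsLong() - entry.loadedAtMillis < ttlMillis;
    }
}
```

With a watch, the entry would instead be refreshed when ZooKeeper reports a change, and the expiry check would not be needed.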


regards,
Hendrik

[1] 
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L849
[2] 
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L1180
[3] 
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L162
[4] 
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L1200
[5] 
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrClient.java#L821


Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-07 Thread Zheng Lin Edwin Yeo
Hi Paul,

We have tried it with the space preceding the \n, i.e. (\s*\n){2,}, using the following configuration:

<processor class="solr.RegexReplaceProcessorFactory">
   <str name="fieldName">content</str>
   <str name="pattern">(\s*\n){2,}</str>
   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
</processor>

However, we are still getting exactly the same results as in the earlier
Examples 1, 2 and 3.

As for your point 2, that the data may contain other (non-printing)
characters than \n, we have found that there are no non-printing characters.
It is just a new line with a space. You can refer to the original content in
the same examples below.


Example 1: The sentence that the above regex pattern is working correctly
*Original content in EML file:*
Dear Sir,


I am terminating
*Original content:*Dear Sir,  \n\n \n \n\n I am terminating
*Index content: *Dear Sir,  I am terminating

Example 2: The sentence that the above regex pattern is partially working
(as you can see, instead of 2 <br>, there are 4 <br>)
*Original content in EML file:*

*exalted*

*Psalm 89:17*


3 Choa Chu Kang Avenue 4
*Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3 Choa
Chu Kang Avenue 4, Singapore
*Index content: *exalted  Psalm 89:17 3 Choa
Chu Kang Avenue 4, Singapore

Example 3: The sentence that the above regex pattern is partially working
(as you can see, instead of 2 <br>, there are 4 <br>)
*Original content in EML file:*

http://www.concordpri.moe.edu.sg/








On Tue, Dec 18, 2018 at 10:07 AM
*Original content:* http://www.concordpri.moe.edu.sg/   \n\n   \n\n \n \n\n
\n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue, Dec 18, 2018
at 10:07 AM
*Index content: *http://www.concordpri.moe.edu.sg/ On
Tue, Dec 18, 2018 at 10:07 AM
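
For comparison, here is the same pattern run through plain java.util.regex outside Solr, on the assumption that the \n in the examples are real line feeds with only ordinary spaces between them. NewlineRuns is a hypothetical name, and the sketch says nothing about how the update processor itself applies the pattern.

```java
import java.util.regex.Pattern;

// Illustration only: applies (\s*\n){2,} the way String.replaceAll would.
final class NewlineRuns {

    private static final Pattern RUN = Pattern.compile("(\\s*\\n){2,}");

    // Replaces each whitespace run containing two or more line feeds
    // with a single "<br><br>".
    static String collapse(String content) {
        return RUN.matcher(content).replaceAll("<br><br>");
    }
}
```

On the text of Example 1 this yields "Dear Sir,<br><br> I am terminating", i.e. one replacement per run. If the indexed output shows more replacements than that for a single run, it may point to characters other than spaces and line feeds in the input, or to the replacement being applied more than once.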


Appreciate any other ideas or suggestions that you may have.

Thank you.

Regards,
Edwin

On Thu, 7 Feb 2019 at 22:49,  wrote:

> Hi Edwin
>
>
>
>   1.  Sorry, the pattern was wrong; the space should precede the \n, i.e.
> (\s*\n){2,}
>   2.  Perhaps in the data you have other (non printing) characters than \n?
>
>
>
> Sent from Mail for Windows 10
>
>
>
> From: Zheng Lin Edwin Yeo
> Sent: Thursday, 7 February 2019 15:23
> To: solr-user@lucene.apache.org
> Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
>
>
>
> Hi Paul,
>
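> We have tried this suggested regex pattern as follows:
> <processor class="solr.RegexReplaceProcessorFactory">
>    <str name="fieldName">content</str>
>    <str name="pattern">(\n\s*){2,}</str>
>    <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
> </processor>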
>
> But we still have exactly the same problem of Example 1,2 and 3 below.
>
> Example 1: The sentence that the above regex pattern is working correctly
> *Original content:*Dear Sir,  \n\n \n \n\n I am terminating
> *Index content: *Dear Sir,  I am terminating
>
> Example 2: The sentence that the above regex pattern is partially working
> (as you can see, instead of 2 , there are 4 )
> *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3 Choa
> Chu Kang Avenue 4, Singapore
> *Index content: *exalted  Psalm 89:17 3 Choa
> Chu Kang Avenue 4, Singapore
>
> Example 3: The sentence that the above regex pattern is partially working
> (as you can see, instead of 2 , there are 4 )
> *Original content:* http://www.concordpri.moe.edu.sg/   \n\n   \n\n \n
> \n\n
> \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue, Dec 18, 2018
> at 10:07 AM
> *Index content: *http://www.concordpri.moe.edu.sg/ On
> Tue, Dec 18, 2018 at 10:07 AM
>
> Any further suggestion?
>
> Thank you.
>
> Regards,
> Edwin
>
> On Thu, 7 Feb 2019 at 22:20,  wrote:
>
> > To avoid the «\n+\s*» matching too many \n and then failing on the {2,}
> > part you could try
> >
> >
> >
> > (\n\s*){2,}
> >
> >
> >
> > If you also want to match CRLF then
> >
> > (\r?\n\s*){2,}
> >
> >
> >
> >
> >
> > Sent from Mail for Windows 10
> >
> >
> >
> > From: Zheng Lin Edwin Yeo
> > Sent: Thursday, 7 February 2019 15:10
> > To: solr-user@lucene.apache.org
> > Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
> >
> >
> >
> > Hi Paul,
> >
> > Thanks for your reply.
> >
> > When I use this pattern:
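> > <processor class="solr.RegexReplaceProcessorFactory">
> >    <str name="fieldName">content</str>
> >    <str name="pattern">(\n+\s*){2,}</str>
> >    <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
> > </processor>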
> >
> > It works for some sentences within the same content and not for
> > others. Please see below for one that works and another
> > that does not (it partially works):
> >
> > Example 1: The sentence that the above regex pattern is working correctly
> > *Original content:*Dear Sir,  \n\n \n \n\n I am terminating
> > *Index content: *Dear Sir,  I am terminating
> >
> > Example 2: The sentence that the above regex pattern is partially working
> > (as you can see, instead of 2 , there are 4 )
> > *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3 Choa
> > Chu Kang Avenue 4, Singapore
> > *Index content: *exalted  Psalm 89:17 3 Choa
> > Chu Kang Avenue 4, Singapore
> >
> > Example 3: The sentence that the above regex pattern is 

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-07 Thread paul.dodd
Hi Edwin



  1.  Sorry, the pattern was wrong; the whitespace should precede the \n, i.e. (\s*\n){2,}
  2.  Perhaps the data contains other (non-printing) characters besides \n?
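
As a rough illustration of how the two orderings differ, here is a minimal Java sketch. It assumes Solr hands the pattern to java.util.regex, where \s also matches newline characters. The class and method names are made up for the example, and the sample text is Example 1 from this thread.

```java
import java.util.regex.Pattern;

final class NewlineRunDemo {
    // Newline first, then optional whitespace: (\n\s*){2,}
    static final Pattern NEWLINE_FIRST = Pattern.compile("(\\n\\s*){2,}");
    // Optional whitespace first, then newline: (\s*\n){2,}
    static final Pattern SPACE_FIRST = Pattern.compile("(\\s*\\n){2,}");

    // Replace every matching run with two <br> tags.
    static String collapse(Pattern p, String text) {
        return p.matcher(text).replaceAll("<br><br>");
    }

    public static void main(String[] args) {
        String sample = "Dear Sir,  \n\n \n \n\n I am terminating";
        System.out.println(collapse(NEWLINE_FIRST, sample));
        System.out.println(collapse(SPACE_FIRST, sample));
    }
}
```

On this input both patterns produce a single <br><br>. They differ only in which neighbouring spaces are absorbed: the newline-first form keeps the spaces before the first \n, the space-first form keeps the space after the last \n.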



From: Zheng Lin Edwin Yeo
Sent: Thursday, 7 February 2019 15:23
To: solr-user@lucene.apache.org
Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n



Hi Paul,

We have tried this suggested regex pattern as follow:

   content
   (\n\s*){2,}
   brbr


But we still have exactly the same problem of Example 1,2 and 3 below.

Example 1: The sentence that the above regex pattern is working correctly
*Original content:*Dear Sir,  \n\n \n \n\n I am terminating
*Index content: *Dear Sir,  I am terminating

Example 2: The sentence that the above regex pattern is partially working
(as you can see, instead of 2 , there are 4 )
*Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3 Choa
Chu Kang Avenue 4, Singapore
*Index content: *exalted  Psalm 89:17 3 Choa
Chu Kang Avenue 4, Singapore

Example 3: The sentence that the above regex pattern is partially working
(as you can see, instead of 2 , there are 4 )
*Original content:* http://www.concordpri.moe.edu.sg/   \n\n   \n\n \n \n\n
\n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue, Dec 18, 2018
at 10:07 AM
*Index content: *http://www.concordpri.moe.edu.sg/ On
Tue, Dec 18, 2018 at 10:07 AM

Any further suggestion?

Thank you.

Regards,
Edwin



Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-07 Thread Zheng Lin Edwin Yeo
Hi Paul,

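We have tried the suggested regex pattern as follows:

<processor class="solr.RegexReplaceProcessorFactory">
   <str name="fieldName">content</str>
   <str name="pattern">(\n\s*){2,}</str>
   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
</processor>

But we still have exactly the same problem as in Examples 1, 2 and 3 below.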

Example 1: The sentence that the above regex pattern is working correctly
*Original content:*Dear Sir,  \n\n \n \n\n I am terminating
*Index content: *Dear Sir,  I am terminating

Example 2: The sentence that the above regex pattern is partially working
(as you can see, instead of 2 <br>, there are 4 <br>)
*Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3 Choa
Chu Kang Avenue 4, Singapore
*Index content: *exalted  Psalm 89:17 3 Choa
Chu Kang Avenue 4, Singapore

Example 3: The sentence that the above regex pattern is partially working
(as you can see, instead of 2 <br>, there are 4 <br>)
*Original content:* http://www.concordpri.moe.edu.sg/   \n\n   \n\n \n \n\n
\n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue, Dec 18, 2018
at 10:07 AM
*Index content: *http://www.concordpri.moe.edu.sg/ On
Tue, Dec 18, 2018 at 10:07 AM

Any further suggestion?

Thank you.

Regards,
Edwin



Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-07 Thread paul.dodd
To avoid the «\n+\s*» matching too many \n and then failing on the {2,} part 
you could try



(\n\s*){2,}



If you also want to match CRLF then

(\r?\n\s*){2,}
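
A small Java sketch of the CRLF case, assuming the pattern is evaluated by java.util.regex. The class name and the sample text are invented for the illustration.

```java
import java.util.regex.Pattern;

final class CrlfRunDemo {
    // Matches runs that start at a \n: (\n\s*){2,}
    static final Pattern LF_ONLY = Pattern.compile("(\\n\\s*){2,}");
    // Also lets each newline be preceded by a \r: (\r?\n\s*){2,}
    static final Pattern CRLF_AWARE = Pattern.compile("(\\r?\\n\\s*){2,}");

    // Replace every matching run with two <br> tags.
    static String collapse(Pattern p, String text) {
        return p.matcher(text).replaceAll("<br><br>");
    }

    public static void main(String[] args) {
        String windowsText = "line one\r\n\r\nline two";
        // Make \r visible so a leftover carriage return is easy to spot.
        System.out.println(collapse(LF_ONLY, windowsText).replace("\r", "\\r"));
        System.out.println(collapse(CRLF_AWARE, windowsText).replace("\r", "\\r"));
    }
}
```

With the LF-only pattern the match can only begin at the first \n, so the \r in front of it stays in the text. The CRLF-aware pattern includes that leading \r in the match.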





From: Zheng Lin Edwin Yeo
Sent: Thursday, 7 February 2019 15:10
To: solr-user@lucene.apache.org
Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n



Hi Paul,

Thanks for your reply.

When I use this pattern:

   content
   (\n+\s*){2,}
   brbr


It is working for some sentence within the same content and not working for
some sentences. Please see below for the one that is working and another
that is not working (partially working):

Example 1: The sentence that the above regex pattern is working correctly
*Original content:*Dear Sir,  \n\n \n \n\n I am terminating
*Index content: *Dear Sir,  I am terminating

Example 2: The sentence that the above regex pattern is partially working
(as you can see, instead of 2 , there are 4 )
*Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3 Choa
Chu Kang Avenue 4, Singapore
*Index content: *exalted  Psalm 89:17 3 Choa
Chu Kang Avenue 4, Singapore

Example 3: The sentence that the above regex pattern is partially working
(as you can see, instead of 2 , there are 4 )
*Original content:* http://www.concordpri.moe.edu.sg/   \n\n   \n\n \n \n\n
\n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue, Dec 18, 2018
at 10:07 AM
*Index content: *http://www.concordpri.moe.edu.sg/ On
Tue, Dec 18, 2018 at 10:07 AM

We would appreciate your help to see what is wrong?

Thank you.

Regards,
Edwin



Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-07 Thread Zheng Lin Edwin Yeo
Hi Paul,

Thanks for your reply.

When I use this pattern:

<processor class="solr.RegexReplaceProcessorFactory">
   <str name="fieldName">content</str>
   <str name="pattern">(\n+\s*){2,}</str>
   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
</processor>

It works for some sentences within the same content but not for others.
Below is one example that works and two that only partially work:

Example 1: The sentence that the above regex pattern is working correctly
*Original content:*Dear Sir,  \n\n \n \n\n I am terminating
*Index content: *Dear Sir,  I am terminating

Example 2: The sentence that the above regex pattern is partially working
(as you can see, instead of 2 <br>, there are 4 <br>)
*Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3 Choa
Chu Kang Avenue 4, Singapore
*Index content: *exalted  Psalm 89:17 3 Choa
Chu Kang Avenue 4, Singapore

Example 3: The sentence that the above regex pattern is partially working
(as you can see, instead of 2 <br>, there are 4 <br>)
*Original content:* http://www.concordpri.moe.edu.sg/   \n\n   \n\n \n \n\n
\n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue, Dec 18, 2018
at 10:07 AM
*Index content: *http://www.concordpri.moe.edu.sg/ On
Tue, Dec 18, 2018 at 10:07 AM

We would appreciate your help in finding out what is wrong.

Thank you.

Regards,
Edwin



Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-07 Thread paul.dodd
You don’t say what happens, just that it is not working. I assume nothing is 
replaced? Perhaps the pattern should be



   "(\n\s*){2,}"



??
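
To show why the number of backslashes matters, here is a hedged Java sketch. It assumes the text of the pattern element reaches java.util.regex unchanged (XML itself has no backslash escapes); the class name and sample strings are made up. Any double-quote characters left inside the element would likewise be treated as literal characters to match.

```java
import java.util.regex.Pattern;

final class EscapedNewlineDemo {
    // Regex text (\\n\s*){2,}: here \\n means a literal backslash followed by the letter n.
    static final Pattern DOUBLE_BACKSLASH = Pattern.compile("(\\\\n\\s*){2,}");
    // Regex text (\n\s*){2,}: here \n means a real newline character.
    static final Pattern SINGLE_BACKSLASH = Pattern.compile("(\\n\\s*){2,}");

    // Replace every matching run with two <br> tags.
    static String collapse(Pattern p, String text) {
        return p.matcher(text).replaceAll("<br><br>");
    }

    public static void main(String[] args) {
        String realNewlines = "Dear Sir,\n\n \nI am terminating";
        String literalBackslashN = "Dear Sir,\\n\\n I am terminating";
        System.out.println(collapse(DOUBLE_BACKSLASH, realNewlines));      // left unchanged
        System.out.println(collapse(SINGLE_BACKSLASH, realNewlines));      // run is replaced
        System.out.println(collapse(DOUBLE_BACKSLASH, literalBackslashN)); // run is replaced
    }
}
```

The double-backslash form only fires on text that contains the two characters backslash and n, which is one possible reason a pattern that looks right in a tester replaces nothing in real documents.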



From: Zheng Lin Edwin Yeo
Sent: Thursday, 7 February 2019 14:08
To: solr-user@lucene.apache.org
Subject: RegexReplaceProcessorFactory pattern to detect multiple \n



Hi,

I am trying to use the RegexReplaceProcessorFactory to remove more than two
\n with any number of spaces between them (Eg: \n\n, \n \n, \n \n  \n \n),
and replace it with two .

I use the following regex pattern and it is working when I test it in
regex101.com. But it is not working when I put it inside the
RegexReplaceProcessorFactory as below:



   content
   "(\\n\s*){2,}"
   brbr

  

To explain further about my regex pattern, \s* is instructing the regex to
match any \n that have space after and {2,} is instructing the regex to
match 2 or more occurrence of such pattern (\n).

Please kindly let me know what is wrong and how should I do it?

I am using Solr 7.6.0.

Regards,
Edwin


RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-07 Thread Zheng Lin Edwin Yeo
Hi,

I am trying to use the RegexReplaceProcessorFactory to replace two or more
\n, with any number of spaces between them (e.g. \n\n, \n \n, \n \n  \n \n),
with two <br> tags.

The following regex pattern works when I test it on regex101.com, but it
does not work when I put it inside the RegexReplaceProcessorFactory as
below:

<processor class="solr.RegexReplaceProcessorFactory">
   <str name="fieldName">content</str>
   <str name="pattern">"(\\n\s*){2,}"</str>
   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
</processor>

To explain my regex pattern further: \s* tells the regex to match any
whitespace that follows a \n, and {2,} tells it to match two or more
occurrences of that group (a \n plus its trailing whitespace).

Please kindly let me know what is wrong and how I should do it.

I am using Solr 7.6.0.

Regards,
Edwin


Re: RESTORE does not create replica as defined

2019-02-07 Thread Ganesh Sethuraman
Any help on this is much appreciated.

On Wed, Feb 6, 2019 at 11:10 AM Ganesh Sethuraman 
wrote:

> Hi
>
> We are using Solr Cloud 7.2.1. We are using the backup and restore
> features, and we are finding that the restore is not working as expected.
> The restore is successful, but it does not create the desired replicas even
> though "replicationFactor" is set to 2 during restore.
>
> 1. Create a "test" collection with 8 shards and 2 replicas and add some
> data.
> 2. Create a backup for the collection with the following command:
>
>
> http://localhost:6010/solr/admin/collections?action=BACKUP&name=test_bkup&collection=test&location=/share/solrbackup/data/&async=1234
>
> 3. Then the backed-up collection is restored with replication factor 2:
> time curl '
> http://localhost:6010/solr/admin/collections?action=RESTORE&name=test_bkup&location=/share/solrbackup/data/&collection=test_restored&replicationFactor=2
> '
>
> The restore completes successfully. We see that the replication factor is
> updated correctly in the collection view, but in the Admin UI cloud view we
> see that only one replica is created.
>
> 4. After restarting Solr we still see the same issue: the collection is
> restored but the replica is not automatically created.
>
> Is this a known issue in Solr Restore?
>
> Regards,
> Ganesh
>
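
For readers who want to reproduce the steps, a minimal Java sketch that assembles the RESTORE request and a follow-up CLUSTERSTATUS request for inspecting the replicas. The host, backup name, location and collection names come from the message above; the parameter names (name, location, collection, replicationFactor) are the Collections API names as I understand them and should be checked against the reference guide for your version. The class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class CollectionsApiUrls {
    // Join parameters as key=value pairs, URL-encoding each value.
    static String url(String base, Map<String, String> params) {
        String query = params.entrySet().stream()
            .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
        return base + "/admin/collections?" + query;
    }

    static String restoreUrl(String base, String backupName, String location,
                             String collection, int replicationFactor) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "RESTORE");
        p.put("name", backupName);
        p.put("location", location);
        p.put("collection", collection);
        p.put("replicationFactor", Integer.toString(replicationFactor));
        return url(base, p);
    }

    // CLUSTERSTATUS lists the shards and replicas that actually exist.
    static String clusterStatusUrl(String base, String collection) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "CLUSTERSTATUS");
        p.put("collection", collection);
        return url(base, p);
    }

    public static void main(String[] args) {
        String base = "http://localhost:6010/solr";
        System.out.println(restoreUrl(base, "test_bkup", "/share/solrbackup/data/", "test_restored", 2));
        System.out.println(clusterStatusUrl(base, "test_restored"));
    }
}
```

Comparing the replica list in the CLUSTERSTATUS response with the requested replicationFactor gives a more precise picture than the Admin UI view alone.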


Re: What is the benefit of stored="true" in *PointFields

2019-02-07 Thread Toke Eskildsen
On Thu, 2019-02-07 at 11:24 +0900, Yasufumi Mizoguchi wrote:
> Actually, stored fields are compressed, but I believed that docValues are
> also compressed, using strategies that depend on the field's
> values/density, as the following Javadoc says.
> https://lucene.apache.org/core/7_6_0/core/org/apache/lucene/codecs/lucene70/Lucene70DocValuesFormat.html

In scenarios with low diversity in Strings (city names for example),
DocValues de-duplication can work very well. It is hard to generally
compare the size of stored vs. doc values as the strategies are very
different and the relative difference is highly dependent on content.

As for query performance, Shawn is technically correct that there will
be no impact on query performance (as long as you don't use
indexed=false, docvalues=true). But it does influence document
retrieval time. Under most circumstances the difference will be small,
but if you retrieve a large number of documents or your corpus is large
(measured in documents), it can be significant:


https://lucene.apache.org/solr/guide/7_6/docvalues.html#retrieving-docvalues-during-search

Specifically, the Solr 7 series has poor doc values performance for random
access (which is what document retrieval uses) on indexes with many
documents.

- Toke Eskildsen, Royal Danish Library




Re: change in White Space when upgrading 6.6 to 7.4

2019-02-07 Thread Matt Pearce



The default value of sow changed from true to false between 6.x and 7.x,
which is why the problem has appeared for you, and why it is solved by
setting sow=true in your defaults.


With sow=true, I would expect your query to be broken into three parts, 
and then tokenised:

ABC4856.21
AND
-field1:ABC4856.21

With sow=false, the whole query will be tokenised in one go, so one of 
the query analysers on the fields being searched is behaving differently 
depending on the string passed.


Does the parsed query (in the debugQuery=true output) give any 
indication of the differences between the two versions? What analysis is 
done on the fields being queried?
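
One way to make that comparison is to issue the same query twice with debugQuery=true, once per sow setting, and diff the parsed query in the two responses. The Java sketch below only builds the two request URLs. The host, collection name and query string are taken from the original message quoted below; rows=0 and wt=json are my own additions to keep the responses small, and the class name is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SowDebugUrls {
    // Build a /select URL that overrides sow for this request and asks for debug output.
    static String selectUrl(String base, String query, boolean sow) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return base + "/select?q=" + q
            + "&sow=" + sow
            + "&debugQuery=true&rows=0&wt=json";
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr/collection";
        String query = "ABC4856.21 AND -field1:ABC4856.21";
        System.out.println(selectUrl(base, query, true));
        System.out.println(selectUrl(base, query, false));
    }
}
```

Passing sow on the request overrides whatever the handler defaults say, so both variants can be tried against the same 7.x instance without editing solrconfig.xml.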


Thanks,
Matt


On 01/02/2019 16:55, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:

We had a problem when upgrading from Solr 6.6 to Solr 7.4 in that a query 
ceased to work.


The query was of the form 
http://localhost:8983/solr/collection/select?indent=on&q=ABC4856.21%20AND%20-field1:ABC4856.21&wt=json&rows=0

Basically finding a count of those records where there is some field which has 
"ABC4856.21", but where the field field1 does not have that string (in other words, where 
there is some field other than field1 which has "ABC4856.21")

For this particular collection, running the query against Solr 6.6 resulted in "response":{"numFound":0 
(which was correct), but running it against Solr 7.4 resulted in "response":{"numFound":21322074

After some investigation, it seemed to be a problem with the initial "ABC4856.21" being tokenized 
as "ABC4856" and "21"

We found various work-arounds such as putting quotation marks around the string or adding 
"*:" after the "q="; but the user wanted the exact same query to work in Solr 
7.4 as it had in Solr 6.6

Eventually, we found a solution by adding sow=true ("Separate On Whitespace") 
to the defaults of the Select handler in solrconfig.xml.

This solution seems to be sufficient; but we would like to be sure we 
understand the solution.

Looking at lucene.apache.org/solr/guide/7_4/tokenizers.html#standard-tokenizer 
it would seem that the period should not split the string into two tokens.

Could someone clarify how we can know which tokenizer is used when, and which 
definition of whitespace is used when?

Thanks



--
Matt Pearce
Flax - Open Source Enterprise Search
www.flax.co.uk


Re: Remove my mail from subscriptions

2019-02-07 Thread Gora Mohanty
On Thu, 7 Feb 2019 at 12:29, manohar c  wrote:

> Hi,
>  Please Remove my mail from the subscription list.
>

 Please see http://lucene.apache.org/solr/community.html#mailing-lists-irc

In a manner similar to how you subscribed, you need to send an email from
your subscribed account to solr-user-unsubscr...@lucene.apache.org in order
to unsubscribe.

Regards,
Gora