Re: NRT Real time Get with documentCache

2020-02-03 Thread Karl Stoney
Great stuff, thank you Erick.

On 04/02/2020, 00:17, "Erick Erickson"  wrote:

The documentCache shouldn’t matter at all. RTG should return the latest doc 
by maintaining a pointer into the tlogs and returning that version.

> On Feb 3, 2020, at 6:43 PM, Karl Stoney 
 wrote:
>
> Hi,
> Could anyone let me know if a real time get would return a cached, up to 
date version of a document if we enabled documentCache?
>
> Thanks
> Karl





RE: Solr 8.4.1 error

2020-02-03 Thread Srinivas Kashyap
Sorry for the interruption. This error was due to a wrong context path
mentioned in solr-jetty-context.xml, while jetty.xml was referring to /solr,
so the index was locked.
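For context, in a stock Solr install that context path lives in
server/contexts/solr-jetty-context.xml, roughly like the following (an
approximation only; the exact contents vary by version):

    <Configure class="org.eclipse.jetty.webapp.WebAppContext">
      <Set name="contextPath"><Property name="hostContext" default="/solr"/></Set>
      ...
    </Configure>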

Thanks,
Srinivas

-Original Message-
From: Srinivas Kashyap  
Sent: 04 February 2020 11:04
To: solr-user@lucene.apache.org
Subject: RE: Solr 8.4.1 error

Hi Shawn,

I did delete the data folder of the core and also ran "solr stop -all" from
the Windows command line. I see only one Solr server running on this machine,
which gets started and stopped when I do so. To confirm, I even copied my
folders to another system and tried there, but I am facing the same issue.

In solrconfig.xml, if I replace <lockType>${solr.lock.type:native}</lockType>
with <lockType>${solr.lock.type:single}</lockType>, it starts without any error.

Please let me know how to find out whether other servers are running, or
whether this is an issue with Solr 8.4.1.

Thanks,
Srinivas

-Original Message-
From: Shawn Heisey 
Sent: 04 February 2020 02:24
To: solr-user@lucene.apache.org
Subject: Re: Solr 8.4.1 error

On 2/3/2020 5:16 AM, Srinivas Kashyap wrote:
> I'm trying to upgrade to Solr 8.4.1 and am facing the below error during
> startup, and my cores are not listed in the Solr admin screen. I need your help.



> Caused by: java.nio.channels.OverlappingFileLockException
>  at sun.nio.ch.SharedFileLockTable.checkList(Unknown Source) ~[?:1.8.0_221]
>  at sun.nio.ch.SharedFileLockTable.add(Unknown Source) ~[?:1.8.0_221]
>  at sun.nio.ch.FileChannelImpl.tryLock(Unknown Source) ~[?:1.8.0_221]
>  at java.nio.channels.FileChannel.tryLock(Unknown Source) ~[?:1.8.0_221]

This appears to be saying that the index in that directory is already locked.  
Lucene can detect when the index is already locked by the same program that 
tries to lock it again, and it will say so when that happens.  The message did 
not indicate that it was the same program, so in this case, it is likely that 
you already have another copy of Solr running and that copy has the index 
directory locked.  You cannot access the same index directory from multiple 
copies of Solr unless you disable locking, and that would be a REALLY bad idea.

Thanks,
Shawn



Re: how splitting more shards impact performance

2020-02-03 Thread Shawn Heisey

On 2/3/2020 5:17 PM, ChienHua wrote:

> What query performance impact should we expect from splitting one
> collection into more shards?
>
> We expected query performance to degrade with more shards, due to the
> overhead of merging results from several shards.
>
> However, the test results are not what we expected. Any ideas or experience
> with the performance impact?


This is an often-misunderstood aspect of Solr performance.

In situations with a very high query rate, splitting into shards is 
generally going to reduce performance.  This happens because as you 
mentioned, there is overhead from merging the results.  A high query 
rate will keep all the CPUs very busy.


But in situations with a low query rate, more shards can actually make 
things faster.  This is a possibility when there is a significant 
surplus of available CPU capacity ... the subqueries for one query can 
complete concurrently, so even with the overhead of merging, the overall 
result is faster.


The size of the index can also affect this dynamic.  If you take an 
index that is way too big for a single machine and split it so it has 
shards on multiple machines, that can improve query performance 
dramatically.


Thanks,
Shawn


RE: Solr 8.4.1 error

2020-02-03 Thread Srinivas Kashyap
Hi Shawn,

I did delete the data folder of the core and also ran "solr stop -all" from
the Windows command line. I see only one Solr server running on this machine,
which gets started and stopped when I do so. To confirm, I even copied my
folders to another system and tried there, but I am facing the same issue.

In solrconfig.xml, if I replace <lockType>${solr.lock.type:native}</lockType>
with <lockType>${solr.lock.type:single}</lockType>, it starts without any error.

Please let me know how to find out whether other servers are running, or
whether this is an issue with Solr 8.4.1.

Thanks,
Srinivas

-Original Message-
From: Shawn Heisey 
Sent: 04 February 2020 02:24
To: solr-user@lucene.apache.org
Subject: Re: Solr 8.4.1 error

On 2/3/2020 5:16 AM, Srinivas Kashyap wrote:
> I'm trying to upgrade to Solr 8.4.1 and am facing the below error during
> startup, and my cores are not listed in the Solr admin screen. I need your help.



> Caused by: java.nio.channels.OverlappingFileLockException
>  at sun.nio.ch.SharedFileLockTable.checkList(Unknown Source) ~[?:1.8.0_221]
>  at sun.nio.ch.SharedFileLockTable.add(Unknown Source) ~[?:1.8.0_221]
>  at sun.nio.ch.FileChannelImpl.tryLock(Unknown Source) ~[?:1.8.0_221]
>  at java.nio.channels.FileChannel.tryLock(Unknown Source) ~[?:1.8.0_221]

This appears to be saying that the index in that directory is already locked.  
Lucene can detect when the index is already locked by the same program that 
tries to lock it again, and it will say so when that happens.  The message did 
not indicate that it was the same program, so in this case, it is likely that 
you already have another copy of Solr running and that copy has the index 
directory locked.  You cannot access the same index directory from multiple 
copies of Solr unless you disable locking, and that would be a REALLY bad idea.

Thanks,
Shawn



how splitting more shards impact performance

2020-02-03 Thread ChienHua
What query performance impact should we expect from splitting one
collection into more shards?

We expected query performance to degrade with more shards, due to the
overhead of merging results from several shards.

However, the test results are not what we expected. Any ideas or experience
with the performance impact?






Query Elevation Component

2020-02-03 Thread Sidharth Negi
Hi,

I want to use the Solr query elevation component. Let's say I want to
elevate "doc_id" when a user inputs the query "qwerty". I am able to get a
prototype to work by filling these values in elevate.xml and hitting the
Solr API with q="qwerty".
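For reference, the elevate.xml entry for that prototype looks something like
this (the id value is a placeholder):

    <elevate>
      <query text="qwerty">
        <doc id="doc_id" />
      </query>
    </elevate>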

However, in our service, where I want to plug this in, the 'q' parameter
isn't as pure and looks more like q="'qwerty' (field1:value1)
(field2:value2)".

Any suggestions on the best way to go about this?

Thanks


Re: NRT Real time Get with documentCache

2020-02-03 Thread Erick Erickson
The documentCache shouldn’t matter at all. RTG should return the latest doc by 
maintaining a pointer into the tlogs and returning that version.
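
A real-time get can also be issued from SolrJ; a minimal sketch (collection
name and document id below are placeholders):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrDocument;

    SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build();
    // getById() uses the /get (real-time get) handler, so it returns the
    // newest version of the document, consulting the update log if needed.
    SolrDocument latest = client.getById("mycollection", "doc-42");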

> On Feb 3, 2020, at 6:43 PM, Karl Stoney 
>  wrote:
> 
> Hi,
> Could anyone let me know if a real time get would return a cached, up to date 
> version of a document if we enabled documentCache?
> 
> Thanks
> Karl



NRT Real time Get with documentCache

2020-02-03 Thread Karl Stoney
Hi,
Could anyone let me know if a real time get would return a cached, up to date 
version of a document if we enabled documentCache?

Thanks
Karl


Blocking certain queries

2020-02-03 Thread John Davis
Hello,

Is there a way to block certain queries in Solr? For example, a delete for
*:*, or a known query that causes problems: can these be blocked at the Solr
server layer?


Graph Query Bug ?

2020-02-03 Thread sambasivarao giddaluri
Hi All,
Solr 8.2
Database structure: Parent -> Children, where each child has a parent referenceId.

Query: get the parent doc based on a child query.

Method 1: {!graph from=parentId to=parentId
traversalFilter='docType:parent' returnRoot=false}child.name:foo AND
child.type:name

Result : 1
Debug:
 "rawquerystring": "{!graph from=parentId to=parentId
traversalFilter='docType:parent' returnRoot=false}child.name:foo AND
child.type:name",
"querystring": "{!graph from=parentId to=parentId
traversalFilter='docType:parent' returnRoot=false}child.name:foo AND
child.type:name",
"parsedquery": "GraphQuery([[+child.name:foo
+child.type:name],parentId=parentId] [TraversalFilter:
docType:parent][maxDepth=-1][returnRoot=false][onlyLeafNodes=false][useAutn=false])",
"parsedquery_toString": "[[+child.name:foo
+child.type:name],parentId=parentId] [TraversalFilter:
docType:parent][maxDepth=-1][returnRoot=false][onlyLeafNodes=false][useAutn=false]",

Method 2: ({!graph from=parentId to=parentId
traversalFilter='docType:parent' returnRoot=false}child.name:foo AND
child.type:name)
Result : 0
Debug:
"rawquerystring": "({!graph from=parentId to=parentId
traversalFilter='docType:parent' returnRoot=false}child.name:foo AND
child.type:name)",
"querystring": "({!graph from=parentId to=parentId
traversalFilter='docType:parent' returnRoot=false}child.name:foo AND
child.type:name)",
"parsedquery": "+GraphQuery([[child.name:foo],parentId=parentId]
[TraversalFilter:
docType:parent][maxDepth=-1][returnRoot=false][onlyLeafNodes=false][useAutn=false])
+child.type:name",
"parsedquery_toString": "+[[child.name:foo],parentId=parentId]
[TraversalFilter:
docType:parent][maxDepth=-1][returnRoot=false][onlyLeafNodes=false][useAutn=false]
+child.type:name",


Any reason why it works differently?

Regards
sam


Connection spike when slight solr latency spike

2020-02-03 Thread Karl Stoney
Hey all,
When our searcher refreshes on a soft-commit, we get a slight latency spike
(p99 response times can jump to about 200ms from 100ms). However, what we see
in the upstream clients using the org.apache.solr.client.solrj SolrClient is a
big spike in outbound connections (70-80 per client, from usually around
25/26) and a much higher response time (orders of magnitude).

Naturally the increase in connections could be a symptom of the slight increase 
in latency (to maintain throughput more connections are needed), but it feels 
like we're hitting some sort of limit causing some requests to stall/get 
blocked.

Has anyone seen any behaviour like this before?

SolrClient 8.4.1
Solr 7.7

Thanks in advance
Karl


Re: Performance comparison for wildcard searches

2020-02-03 Thread Shawn Heisey

On 2/3/2020 12:06 PM, Rahul Goswami wrote:

> I am working with Solr 7.2.1 and had a question regarding the performance
> of wildcard searches.
>
> q=*:*
> vs
> q=id:*
> vs
> q=id:[* TO *]
>
> Can someone please rank them in the order of performance with the
> underlying reason?


The only one of those that is an actual wildcard search is the middle 
one.  The others are special syntax.  The first one is special syntax 
that means "all documents."  The third one is a range query with special 
syntax that means "any value to any value".


If "id" is your uniqueKey field, which seems likely, then all three of 
those queries will produce identical results, and the likely speed 
ranking will be:


*:*
id:[* TO *]
id:*

The first two are going to complete pretty quickly, and the third will 
be a LOT slower.


What Solr must do for a wildcard query is first look at the index to 
determine what terms in the index match the wildcard string.  And then 
it will construct a Lucene query internally that quite literally 
includes every single one of those terms.  Which means that if the field 
contains 10 million unique values for the id field, the constructed 
query for id:* will contain ten million values.  And each and every one 
of them will be matched individually against the index.  Getting the 
list of matching terms in the first place will probably be pretty slow, 
and then the individual matches against the index will add up quickly.
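
Conceptually, the rewrite looks like this sketch (a real index expands to
every matching term, not just three):

    q=id:*    becomes, internally:    id:doc1 OR id:doc2 OR id:doc3 ...
    (one clause per unique value of the id field)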


Thanks,
Shawn


Re: Solr 8.4.1 error

2020-02-03 Thread Shawn Heisey

On 2/3/2020 5:16 AM, Srinivas Kashyap wrote:
> I'm trying to upgrade to Solr 8.4.1 and am facing the below error during
> startup, and my cores are not listed in the Solr admin screen. I need your help.
>
> Caused by: java.nio.channels.OverlappingFileLockException
>  at sun.nio.ch.SharedFileLockTable.checkList(Unknown Source) ~[?:1.8.0_221]
>  at sun.nio.ch.SharedFileLockTable.add(Unknown Source) ~[?:1.8.0_221]
>  at sun.nio.ch.FileChannelImpl.tryLock(Unknown Source) ~[?:1.8.0_221]
>  at java.nio.channels.FileChannel.tryLock(Unknown Source) ~[?:1.8.0_221]


This appears to be saying that the index in that directory is already 
locked.  Lucene can detect when the index is already locked by the same 
program that tries to lock it again, and it will say so when that 
happens.  The message did not indicate that it was the same program, so 
in this case, it is likely that you already have another copy of Solr 
running and that copy has the index directory locked.  You cannot access 
the same index directory from multiple copies of Solr unless you disable 
locking, and that would be a REALLY bad idea.


Thanks,
Shawn


Performance comparison for wildcard searches

2020-02-03 Thread Rahul Goswami
Hello,

I am working with Solr 7.2.1 and had a question regarding the performance
of wildcard searches.

q=*:*
vs
q=id:*
vs
q=id:[* TO *]

Can someone please rank them in the order of performance with the
underlying reason?

Thanks,
Rahul


Re: KeeperErrorCode= BadVersion

2020-02-03 Thread Rajeswari Natarajan
Any thoughts on this? We are continuously publishing and have disabled
schemaless mode.

Thanks,
Rajeswari

On Wed, Jan 29, 2020 at 9:18 AM Rajeswari Natarajan 
wrote:

> Hi,
>
> We are getting the below exception. We have SolrCloud 7.6 installed and have
> commented out the below in solrconfig.xml:
>
> 
>
> What could be the reason?
>
> Thanks,
> Rajeswari
>
>
> 2020-01-17T13:03:40.84206185Z 2020-01-17 13:03:40,841 [myid:5] - INFO
> [ProcessThread(sid:5 cport:-1)::PrepRequestProcessor@653] - Got
> user-level KeeperException when processing sessionid:0x4065b74bfde04ef
> type:setData cxid:0x559 zxid:0x500551395 txntype:-1 reqpath:n/a Error
> Path:/collections/testcollection/terms/shard1 Error:KeeperErrorCode =
> BadVersion for /collections/testcollection/terms/shard1
>


Reading authenticated user value inside custom DocTransformer

2020-02-03 Thread mosheB
We are using Solr's Kerberos authentication plugin, and we are trying to
implement field-level filtering based on the authenticated user, using the
DocTransformer class:

public class FieldAclTransformerFactory extends TransformerFactory {
    @Override
    public DocTransformer create(String field, SolrParams params, SolrQueryRequest req) {
        // Resolve the authenticated user for this request.
        String user = req.getUserPrincipal().getName();
        return new FieldAclTransformer(user);
    }
}

public class FieldAclTransformer extends DocTransformer {
    String user;

    public FieldAclTransformer(String user) {
        this.user = user;
    }

    @Override
    public void transform(SolrDocument doc, int docid, float score) {
        // Filter fields according to applicative logic, based on the
        // authenticated user.
    }
}

For simplicity, we do not use an authorization plugin (here is our complete
security.json file):

{
    "authentication": {
        "class": "org.apache.solr.security.KerberosPlugin"
    }
}

During the development phase the plugin was tested against a collection with
a single shard, and everything worked as expected (Solr 8.3.1).
After moving to production, the plugin failed. While debugging we saw that the
reason is that SOME shards were getting an incorrect user from
req.getUserPrincipal().getName(): instead of the ORIGINAL user, Solr's SPN
is returned.
Our best guess is that the failing requests are the distributed requests (the
requests that are routed from the node that received the original request);
indeed, if we add distrib=false to our request the plugin doesn't fail.

So, back to the question... is this a bug in Solr, or is this just not the way
we are supposed to get the authenticated user?
Thanks.







Re: Replica type affinity

2020-02-03 Thread Jason Gerlowski
This is a bit of a guess - I haven't used this functionality before.
But to a novice the "tag" Rule Condition for "Rule Based Replica
Placement" sounds similar to the requirements you mentioned above.

https://lucene.apache.org/solr/guide/8_3/rule-based-replica-placement.html#rule-conditions
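
For illustration only (the property name and rule below are invented for the
example, following the sysprop tag described on that guide page): you could
start the ephemeral nodes with a marker system property and reference it in a
placement rule at collection-creation time, along these lines:

    bin/solr start -c -Dnodetype=ephemeral
    curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=test&numShards=2&replicationFactor=2&rule=replica:*,sysprop.nodetype:ephemeral"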

Good luck,

Jason

On Thu, Jan 30, 2020 at 1:00 PM Karl Stoney
 wrote:
>
> Hey,
> Thanks for the reply but I'm trying to have something fully automated and 
> dynamic.  For context I run solr on kubernetes, and at the moment it works 
> beautifully with autoscaling (i can scale up the kubernetes deployment and 
> solr adds replicas and removes them).
>
> I'm trying to add a new type of node though, backed by very fast but 
> ephemeral disks and the idea was to have only PULL replicas running on those 
> nodes automatically and NRT on the persistent disk instances.
>
> Might be a pipe dream but I'm striving for no manual configuration.
> 
> From: Edward Ribeiro 
> Sent: 30 January 2020 16:56
> To: solr-user@lucene.apache.org 
> Subject: Re: Replica type affinity
>
> Hi Karl,
>
> During collection creation you can specify the `createNodeSet` parameter as
> specified by the Solr Reference Guide snippet below:
>
> "createNodeSet
> Allows defining the nodes to spread the new collection across. The format
> is a comma-separated list of node_names, such as
> localhost:8983_solr,localhost:8984_solr,localhost:8985_solr.
> If not provided, the CREATE operation will create shard-replicas spread
> across all live Solr nodes.
> Alternatively, use the special value of EMPTY to initially create no
> shard-replica within the new collection and then later use the ADDREPLICA
> operation to add shard-replicas when and where required."
>
>
> There's also Collections API that you can use the node parameter of
> ADDREPLICA to specify the node that replica shard should be created on.
> See:
> https://lucene.apache.org/solr/guide/6_6/collections-api.html#CollectionsAPI-Input
> Other
> commands that can be useful are REPLACENODE, MOVEREPLICA.
>
> Edward
>
>
> On Thu, Jan 30, 2020 at 1:00 PM Karl Stoney
>  wrote:
>
> > Hey everyone,
> > Does anyone know of a way to have solr replicas assigned to specific nodes
> > by some sort of identifying value (in solrcloud).
> >
> > In summary I’m trying to have some Read only replicas only every be
> > assigned to nodes named “solr-ephemeral-x” and my nrt and masters assigned
> > to “solr-index”.
> >
> > Kind of like rack affinity in elasticsearch!
> >


Re: How to compute index size

2020-02-03 Thread David Hastings
Yup, I find the right calculation to be as much ram as the server can take,
and as much SSD space as it will hold, when you run out, buy another server
and repeat.  machines/ram/SSD's are cheap.  just get as much as you can.

On Mon, Feb 3, 2020 at 11:59 AM Walter Underwood 
wrote:

> What he said.
>
> But if you must have a number, assume that the index will be as big as
> your (text) data. It might be 2X bigger or 2X smaller. Or 3X or 4X, but
> that is a starting point. Once you start updating, the index might get as
> much as 2X bigger before merges.
>
> Do NOT try to get by with the smallest possible RAM or disk.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Feb 3, 2020, at 5:28 AM, Erick Erickson 
> wrote:
> >
> > I’ve always had trouble with that advice, that RAM size should be JVM +
> index size. I’ve seen 300G indexes (as measured by the size of the
> data/index directory) run in 128G of memory.
> >
> > Here’s the long form:
> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> >
> > But the short form is “stress test and see”.
> >
> > To answer your question, though, when people say “index size” they’re
> usually referring to the size on disk as I mentioned above.
> >
> > Best,
> > Erick
> >
> >> On Feb 3, 2020, at 4:24 AM, Mohammed Farhan Ejaz 
> wrote:
> >>
> >> Hello All,
> >>
> >> I want to size the RAM for my Solr cloud instance. The thumb rule is
> your
> >> total RAM size should be = (JVM size + index size)
> >>
> >> Now I have a simple question, How do I know my index size? A simple
> method,
> >> perhaps from the Solr cloud admin UI or an API?
> >>
> >> My assumption so far is the total segment info size is the same as the
> >> index size.
> >>
> >> Thanks & Regards
> >> Farhan
> >
>
>


Re: How to compute index size

2020-02-03 Thread Walter Underwood
What he said.

But if you must have a number, assume that the index will be as big as your 
(text) data. It might be 2X bigger or 2X smaller. Or 3X or 4X, but that is a 
starting point. Once you start updating, the index might get as much as 2X 
bigger before merges.

Do NOT try to get by with the smallest possible RAM or disk.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 3, 2020, at 5:28 AM, Erick Erickson  wrote:
> 
> I’ve always had trouble with that advice, that RAM size should be JVM + index 
> size. I’ve seen 300G indexes (as measured by the size of the data/index 
> directory) run in 128G of memory. 
> 
> Here’s the long form: 
> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> 
> But the short form is “stress test and see”.
> 
> To answer your question, though, when people say “index size” they’re usually 
> referring to the size on disk as I mentioned above.
> 
> Best,
> Erick
> 
>> On Feb 3, 2020, at 4:24 AM, Mohammed Farhan Ejaz  
>> wrote:
>> 
>> Hello All,
>> 
>> I want to size the RAM for my Solr cloud instance. The thumb rule is your
>> total RAM size should be = (JVM size + index size)
>> 
>> Now I have a simple question, How do I know my index size? A simple method,
>> perhaps from the Solr cloud admin UI or an API?
>> 
>> My assumption so far is the total segment info size is the same as the
>> index size.
>> 
>> Thanks & Regards
>> Farhan
> 



Alternative of ChildDocTransformerFactory

2020-02-03 Thread kumar gaurav
Hi Mikhail / all,

Do we have any alternative to ChildDocTransformerFactory, i.e. fl=id,[child
parentFilter=doc_type:book childFilter=doc_type:chapter limit=100]?

I am seeing a big performance impact because of it. Any suggestions?

Thanks



Regards
Kumar Gaurav


Auto-Suggest within Tier Architecture

2020-02-03 Thread Moyer, Brett
Hello,

Looking to see how others have accomplished this goal. We have a 3-tier
architecture, and Solr is deep down in T3, far from the end user. How do you
make auto-suggest calls from the internet browser through the tiers down to
Solr in T3? We essentially created steps down each tier, but I'm looking to
know what other approaches people have created. Did you put your Solr in T1?
I assume not; that would put it at risk. Thanks!

Brett Moyer


Re: Importing Large CSV File into Solr Cloud Fails with 400 Bad Request

2020-02-03 Thread Erick Erickson
I don’t quite know how TolerantUpdateProcessor works with importing CSV
files, see: https://issues.apache.org/jira/browse/SOLR-445. That is about
sending batches of docs to Solr and frankly I don’t know what path your
process will take. It’s worth a try though.
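
For reference, a tolerant chain is configured in solrconfig.xml along the
lines below and selected per request with update.chain=tolerant-chain; the
chain name and maxErrors value here are placeholders, so check the ref guide
for your version:

    <updateRequestProcessorChain name="tolerant-chain">
      <processor class="solr.TolerantUpdateProcessorFactory">
        <int name="maxErrors">-1</int> <!-- -1: report bad docs without aborting the batch -->
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>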

Otherwise, I typically go with SolrJ and send batches. That does combine with
TolerantUpdateProcessor.
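
A minimal SolrJ batching sketch (assuming your ETL yields one field map per
CSV record; the URL, collection name, and batch size are placeholders, and
error handling is omitted):

    import java.util.*;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build();
    List<SolrInputDocument> batch = new ArrayList<>();
    for (Map<String, Object> row : rows) {      // 'rows' stands in for your parsed CSV records
        SolrInputDocument doc = new SolrInputDocument();
        row.forEach(doc::addField);
        batch.add(doc);
        if (batch.size() == 1000) {             // send chunks instead of one huge request
            client.add("mycollection", batch);
            batch.clear();
        }
    }
    if (!batch.isEmpty()) client.add("mycollection", batch);
    client.commit("mycollection");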

Best,
Erick

> On Feb 3, 2020, at 10:16 AM, Joseph Lorenzini  wrote:
> 
> Hi Shawn/Erick,
> 
> This information has been very helpful. Thank you.
> 
> So I did some more investigation into our ETL process and I verified that,
> with the exception of the text I sent above, they are all obviously invalid
> dates. For example, one field value had 00 for a day, so I would guess that
> field had a non-printable character in it. So at least in the case of a
> record where a field has an invalid date, the entire import process is
> aborted. I'll adjust the ETL process to stop passing invalid dates, but this
> does lead me to a question about failure modes for importing large data sets
> into a collection. Is there any way to specify a "continue on failure" mode,
> such that Solr logs that it was unable to parse a record and why, and then
> continues on to the next record?
> 
> Thanks,
> Joe
> 
> On Sun, Feb 2, 2020 at 4:46 PM Shawn Heisey  wrote:
> 
>> On 2/2/2020 8:47 AM, Joseph Lorenzini wrote:
>>> <autoSoftCommit>
>>>   <maxTime>1000</maxTime>
>>>   <maxDocs>1</maxDocs>
>>> </autoSoftCommit>
>> 
>> That autoSoftCommit setting is far too aggressive, especially for bulk
>> indexing.  I don't know whether it's causing the specific problem you're
>> asking about here, but it's still a setting that will cause problems,
>> because Solr will constantly be doing commit operations while bulk
>> indexing is underway.
>> 
>> Erick mentioned this as well.  Greatly increasing the maxTime, and
>> removing maxDocs, is recommended.  I would recommend starting at one
>> minute.  The maxDocs setting should be removed from autoCommit as well.
>> 
>>> So I turned off two solr nodes, leaving a single solr node up. When I ran
>>> curl again, I noticed the import aborted with this exception.
>>> 
>>> Error adding field 'primary_dob'='1983-12-21T00:00:00Z' msg=Invalid Date
>> in
>>> Date Math String:'1983-12-21T00:00:00Z
>>> caused by: java.time.format.DateTimeParseException: Text
>>> '1983-12-21T00:00:00Z' could not be parsed at index 0'
>> 
>> That date string looks OK.  Which MIGHT mean there are characters in it
>> that are not visible.  Erick said that the single quote is balanced in
>> his message, which COULD mean that the character causing the problem is
>> one that deletes things when it is printed.
>> 
>> Thanks,
>> Shawn
>> 



Re: Importing Large CSV File into Solr Cloud Fails with 400 Bad Request

2020-02-03 Thread Joseph Lorenzini
Hi Shawn/Erick,

This information has been very helpful. Thank you.

So I did some more investigation into our ETL process and I verified that,
with the exception of the text I sent above, they are all obviously invalid
dates. For example, one field value had 00 for a day, so I would guess that
field had a non-printable character in it. So at least in the case of a
record where a field has an invalid date, the entire import process is
aborted. I'll adjust the ETL process to stop passing invalid dates, but this
does lead me to a question about failure modes for importing large data sets
into a collection. Is there any way to specify a "continue on failure" mode,
such that Solr logs that it was unable to parse a record and why, and then
continues on to the next record?

Thanks,
Joe

On Sun, Feb 2, 2020 at 4:46 PM Shawn Heisey  wrote:

> On 2/2/2020 8:47 AM, Joseph Lorenzini wrote:
> > <autoSoftCommit>
> >   <maxTime>1000</maxTime>
> >   <maxDocs>1</maxDocs>
> > </autoSoftCommit>
>
> That autoSoftCommit setting is far too aggressive, especially for bulk
> indexing.  I don't know whether it's causing the specific problem you're
> asking about here, but it's still a setting that will cause problems,
> because Solr will constantly be doing commit operations while bulk
> indexing is underway.
>
> Erick mentioned this as well.  Greatly increasing the maxTime, and
> removing maxDocs, is recommended.  I would recommend starting at one
> minute.  The maxDocs setting should be removed from autoCommit as well.
>
> > So I turned off two solr nodes, leaving a single solr node up. When I ran
> > curl again, I noticed the import aborted with this exception.
> >
> > Error adding field 'primary_dob'='1983-12-21T00:00:00Z' msg=Invalid Date
> in
> > Date Math String:'1983-12-21T00:00:00Z
> > caused by: java.time.format.DateTimeParseException: Text
> > '1983-12-21T00:00:00Z' could not be parsed at index 0'
>
> That date string looks OK.  Which MIGHT mean there are characters in it
> that are not visible.  Erick said that the single quote is balanced in
> his message, which COULD mean that the character causing the problem is
> one that deletes things when it is printed.
>
> Thanks,
> Shawn
>


Re: How to compute index size

2020-02-03 Thread Erick Erickson
I’ve always had trouble with that advice, that RAM size should be JVM + index 
size. I’ve seen 300G indexes (as measured by the size of the data/index 
directory) run in 128G of memory. 

Here’s the long form: 
https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

But the short form is “stress test and see”.

To answer your question, though, when people say “index size” they’re usually 
referring to the size on disk as I mentioned above.
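
If you want a number off a running instance, one way (assuming the Metrics
API available in recent Solr versions; host and port are placeholders) is:

    curl "http://localhost:8983/solr/admin/metrics?group=core&prefix=INDEX.sizeInBytes"

which reports the on-disk index size per core; summing across cores gives the
figure people usually mean.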

Best,
Erick

> On Feb 3, 2020, at 4:24 AM, Mohammed Farhan Ejaz  
> wrote:
> 
> Hello All,
> 
> I want to size the RAM for my Solr cloud instance. The thumb rule is your
> total RAM size should be = (JVM size + index size)
> 
> Now I have a simple question, How do I know my index size? A simple method,
> perhaps from the Solr cloud admin UI or an API?
> 
> My assumption so far is the total segment info size is the same as the
> index size.
> 
> Thanks & Regards
> Farhan



Solr 8.4.1 error

2020-02-03 Thread Srinivas Kashyap
Hello,

I'm trying to upgrade to Solr 8.4.1 and am facing the below error during
startup, and my cores are not listed in the Solr admin screen. I need your help.

2020-02-03 12:12:35.622 ERROR (coreContainerWorkExecutor-2-thread-1) [   ] o.a.s.c.CoreContainer Error waiting for SolrCore to be loaded on startup => org.apache.solr.common.SolrException: Unable to create core [businesscase]
        at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1313)
org.apache.solr.common.SolrException: Unable to create core [businesscase]
        at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1313) ~[?:?]
        at org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:788) ~[?:?]
        at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202) ~[metrics-core-4.0.5.jar:4.0.5]
        at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:1.8.0_221]
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:1.8.0_221]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:1.8.0_221]
        at java.lang.Thread.run(Unknown Source) [?:1.8.0_221]
Caused by: org.apache.solr.common.SolrException
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1072) ~[?:?]
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:901) ~[?:?]
        at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1292) ~[?:?]
        ... 7 more
Caused by: java.nio.channels.OverlappingFileLockException
        at sun.nio.ch.SharedFileLockTable.checkList(Unknown Source) ~[?:1.8.0_221]
        at sun.nio.ch.SharedFileLockTable.add(Unknown Source) ~[?:1.8.0_221]
        at sun.nio.ch.FileChannelImpl.tryLock(Unknown Source) ~[?:1.8.0_221]
        at java.nio.channels.FileChannel.tryLock(Unknown Source) ~[?:1.8.0_221]
        at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:126) ~[?:?]
        at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41) ~[?:?]
        at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45) ~[?:?]
        at org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:105) ~[?:?]
        at org.apache.solr.core.SolrCore.isWriterLocked(SolrCore.java:757) ~[?:?]
        at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:778) ~[?:?]
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:989) ~[?:?]
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:901) ~[?:?]
        at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1292) ~[?:?]
        ... 7 more

Any pointers would be helpful.

Thanks and regards,
Srinivas



RE: Solr fact response strange behaviour

2020-02-03 Thread Kaminski, Adi
Hi Mikhail,

Here is the code, where basically we are trying to retrieve the value of
facet counts, which is sometimes returned as an Integer and sometimes as a
Long. This gave us the ClassCastException, until the workaround of casting
via Number was applied.



if (resList != null) {
    List<Term> terms = new ArrayList<>();
    resList.stream().forEach(entry -> {
        String term = entry.get(BUCKET_VAL).toString().replace("_", SPACE);
        terms.add(new Term(term, (Long) entry.get(BUCKET_COUNT))); // line 170
    });
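
The numeric-cast workaround mentioned above amounts to replacing line 170
with something along these lines:

    terms.add(new Term(term, ((Number) entry.get(BUCKET_COUNT)).longValue()));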



Regards,

Adi



-Original Message-
From: Mikhail Khludnev 
Sent: Thursday, January 30, 2020 8:35 AM
To: solr-user 
Subject: Re: Solr fact response strange behaviour

What happens at AutoCompleteAPI.java:170?



On Wed, Jan 29, 2020 at 9:28 PM Kaminski, Adi <adi.kamin...@verint.com> wrote:

> Sure, thanks for the guidance and the assistance anyway.
>
> Here is the stack trace:
> [29/01/20 08:09:41:041 IST] [http-nio-8080-exec-2] ERROR api.BaseAPI: There was an Exception calling Solr
> java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
> at com.productcore.analytics.api.AutoCompleteAPI.lambda$mapSolrResponse$0(AutoCompleteAPI.java:170) ~[classes/:?]
> at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) ~[?:1.8.0_201]
> at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) ~[?:1.8.0_201]
> at com.productcore.analytics.api.AutoCompleteAPI.mapSolrResponse(AutoCompleteAPI.java:167) ~[classes/:?]
> at com.productcore.analytics.api.BaseAPI.execute(BaseAPI.java:48) [classes/:?]
> at com.productcore.analytics.controllers.DalController.getAutocomplete(DalController.java:205) [classes/:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_201]
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_201]
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_201]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_201]
> at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:189) [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138) [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102) [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:892) [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:797) [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87) [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1038) [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:942) [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1005) [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:908) [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:660) [tomcat-embed-core-9.0.17.jar:9.0.17]
> at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:882) [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:741) [tomcat-embed-core-9.0.17.jar:9.0.17]
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231) [tomcat-embed-core-9.0.17.jar:9.0.17]
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-9.0.17.jar:9.0.17]
> at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53) [tomcat-embed-websocket-9.0.17.jar:9.0.17]
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-9.0.17.jar:9.0.17]
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-9.0.17.jar:9.0.17]

How to compute index size

2020-02-03 Thread Mohammed Farhan Ejaz
Hello All,

I want to size the RAM for my Solr cloud instance. The thumb rule is your
total RAM size should be = (JVM size + index size)

Now I have a simple question, How do I know my index size? A simple method,
perhaps from the Solr cloud admin UI or an API?

My assumption so far is the total segment info size is the same as the
index size.

Thanks & Regards
Farhan


SOLR Data Import Handler : A command is still running...

2020-02-03 Thread Doss
We are doing an hourly data import to our index; one or two requests per day
fail with the message "A command is still running...".

1. Does it mean that the data import did not happen for the last hour?
2. If you look at the "Full Dump Started" time, it has an older date; in the
below log it is almost 13 days old. Why is that so?

userinfoindex start - Wed Jan 22 05:12:01 IST 2020
{
  "responseHeader": {"status": 0, "QTime": 0},
  "initArgs": ["defaults", ["config", "data-import.xml"]],
  "command": "full-import",
  "status": "busy",
  "importResponse": "A command is still running...",
  "statusMessages": {
    "Time Elapsed": "298:1:59.986",
    "Total Requests made to DataSource": "1",
    "Total Rows Fetched": "17426",
    "Total Documents Processed": "17425",
    "Total Documents Skipped": "0",
    "Full Dump Started": "2020-01-09 19:10:02"
  }
}

Thanks,
Doss.