Re: Search for term except within phrase

2020-07-06 Thread Emir Arnautović
Hi Stavros,
I didn’t check what’s supported in ComplexPhraseQueryParser, but it is a wrapper 
around span queries, so you should be able to do what you need: 
https://lucene.apache.org/solr/guide/7_6/other-parsers.html#complex-phrase-query-parser
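As a sketch of the span queries that parser builds on (untested, and the field 
name "text" is my assumption), the raw Lucene version of the pepper example 
from your mail would be roughly:

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.spans.*;

  // "pepper", minus occurrences inside the unwanted phrases
  SpanQuery pepper = new SpanTermQuery(new Term("text", "pepper"));
  SpanQuery chili = new SpanNearQuery(new SpanQuery[] {
      new SpanTermQuery(new Term("text", "chili")), pepper }, 0, true);
  SpanQuery hot = new SpanNearQuery(new SpanQuery[] {
      new SpanTermQuery(new Term("text", "hot")), pepper }, 0, true);
  SpanQuery sauce = new SpanNearQuery(new SpanQuery[] {
      pepper, new SpanTermQuery(new Term("text", "sauce")) }, 0, true);
  // keep only "pepper" spans that do not overlap an excluded phrase
  SpanQuery q = new SpanNotQuery(pepper, new SpanOrQuery(chili, hot, sauce));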
 


HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 7 Jul 2020, at 03:11, Stavros Macrakis  wrote:
> 
> (Sorry for sending this with the wrong subject earlier.)
> 
> How can I search for a term except when it's part of certain phrases?
> 
> For example, I might want to find documents mentioning "pepper" where it is
> not part of the phrases "chili pepper", "hot pepper", or "pepper sauce".
> 
> It does not work to search for [pepper NOT ("chili pepper" OR "hot pepper"
> OR "pepper sauce")] because that excludes all documents which mention
> "chili pepper" even if they also mention "black pepper" or the unmodified
> word "pepper". Maybe some way using synonyms?
> 
> Thanks!
> 
> -s



SSL + Solr 8.5.1 in cloud mode + Java 8

2020-07-06 Thread Natarajan, Rajeswari
Hi,

We are using Solr 8.5.1 in cloud mode with Java 8. We are enabling TLS with 
HTTP/1 (since we get a warning that SSL cannot be enabled with Java 8 + Solr 
8.5 over HTTP/2), and we get the exception below:



2020-07-07 03:58:53.078 ERROR (main) [   ] o.a.s.c.SolrCore 
null:org.apache.solr.common.SolrException: Error instantiating 
shardHandlerFactory class [HttpShardHandlerFactory]: 
java.lang.UnsupportedOperationException: X509ExtendedKeyManager only supported 
on Server
  at 
org.apache.solr.handler.component.ShardHandlerFactory.newInstance(ShardHandlerFactory.java:56)
  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:647)
  at 
org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:263)
  at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:183)
  at 
org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:134)
  at 
org.eclipse.jetty.servlet.ServletHandler.lambda$initialize$0(ServletHandler.java:751)
  at 
java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
  at 
java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
  at 
java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
  at 
java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
  at 
org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:744)
  at 
org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:360)
  at 
org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1445)
  at 
org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1409)
  at 
org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:822)
  at 
org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:275)
  at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:524)
  at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
  at 
org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:46)
  at 
org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:188)
  at 
org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:513)
  at 
org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:154)
  at 
org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:173)
  at 
org.eclipse.jetty.deploy.providers.WebAppProvider.fileAdded(WebAppProvider.java:447)
  at 
org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:66)
  at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:784)
  at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:753)
  at org.eclipse.jetty.util.Scanner.scan(Scanner.java:641)
  at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:540)
  at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
  at 
org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:146)
  at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
  at 
org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:599)
  at 
org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:249)
  at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
  at 
org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
  at org.eclipse.jetty.server.Server.start(Server.java:407)
  at 
org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:117)
  at 
org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:100)
  at org.eclipse.jetty.server.Server.doStart(Server.java:371)
  at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
  at 
org.eclipse.jetty.xml.XmlConfiguration.lambda$main$0(XmlConfiguration.java:1888)
  at java.security.AccessController.doPrivileged(Native Method)
  at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1837)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.eclipse.jetty.start.Main.invokeMain(Main.java:218)
  at org.eclipse.jetty.start.Main.start(Main.java:491)
  at org.eclipse.jetty.start.Main.main(Main.java:77)
Caused by: java.lang.RuntimeException: java.lang.UnsupportedOperationException: 
X509ExtendedKeyManager only supported on Server
  at 
org.apache.solr.
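For reference, SSL is normally switched on through the environment include 
file; a sketch of the usual solr.in.sh settings from the ref guide (keystore 
paths and passwords here are placeholders, not taken from this setup):

  SOLR_SSL_ENABLED=true
  SOLR_SSL_KEY_STORE=etc/solr-ssl.keystore.jks
  SOLR_SSL_KEY_STORE_PASSWORD=secret
  SOLR_SSL_TRUST_STORE=etc/solr-ssl.keystore.jks
  SOLR_SSL_TRUST_STORE_PASSWORD=secret
  SOLR_SSL_NEED_CLIENT_AUTH=false
  SOLR_SSL_WANT_CLIENT_AUTH=false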

Re: Tokenizing managed synonyms

2020-07-06 Thread Koji Sekiguchi

I think the question makes sense: SynonymGraphFilterFactory accepts a 
tokenizerFactory parameter,
and he asked whether the managed version of SynonymGraphFilter could accept it as well.

https://lucene.apache.org/solr/guide/8_5/filter-descriptions.html#synonym-graph-filter
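For comparison, the non-managed filter does take the parameter; a sketch 
(the field type name and synonyms file are placeholders):

  <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
              tokenizerFactory="solr.KeywordTokenizerFactory"/>
    </analyzer>
  </fieldType>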

The answer seems to be NO.

Koji


On 2020/07/07 8:18, Erick Erickson wrote:

This question doesn’t really make sense. You don’t specify tokenizers on
filters, they’re specified at the _field_ level.

You can certainly define as many field(type)s as you want, each with a different
analysis chain and those chains can be made up of whatever you want to use, and
there are lots of choices.

If you are asking to do _additional_ tokenization on the output of a synonym
filter, no.

Perhaps if you defined the problem you’re trying to solve we could make some
suggestions.

Best,
Erick


On Jul 6, 2020, at 6:43 PM, Thomas Corthals  wrote:

Hi,

Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph
Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on
some fields.

Best,

Thomas





Search for term except within phrase

2020-07-06 Thread Stavros Macrakis
(Sorry for sending this with the wrong subject earlier.)

How can I search for a term except when it's part of certain phrases?

For example, I might want to find documents mentioning "pepper" where it is
not part of the phrases "chili pepper", "hot pepper", or "pepper sauce".

It does not work to search for [pepper NOT ("chili pepper" OR "hot pepper"
OR "pepper sauce")] because that excludes all documents which mention
"chili pepper" even if they also mention "black pepper" or the unmodified
word "pepper". Maybe some way using synonyms?

Thanks!

 -s


Re: Tokenizing managed synonyms

2020-07-06 Thread Erick Erickson
This question doesn’t really make sense. You don’t specify tokenizers on
filters, they’re specified at the _field_ level.

You can certainly define as many field(type)s as you want, each with a different
analysis chain and those chains can be made up of whatever you want to use, and
there are lots of choices.

If you are asking to do _additional_ tokenization on the output of a synonym
filter, no.

Perhaps if you defined the problem you’re trying to solve we could make some
suggestions.

Best,
Erick

> On Jul 6, 2020, at 6:43 PM, Thomas Corthals  wrote:
> 
> Hi,
> 
> Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph
> Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on
> some fields.
> 
> Best,
> 
> Thomas



Re: Tokenizing managed synonyms

2020-07-06 Thread Erick Erickson
Please don’t hijack threads, start a new one when you switch topics.

> On Jul 6, 2020, at 6:52 PM, Stavros Macrakis  wrote:
> 
> How can I search for a term *except *when it's part of certain phrases?
> 
> For example, I might want to find documents mentioning "pepper" where it is
> not part of the phrases "chili pepper", "hot pepper", or "pepper sauce".
> 
> It does not work to search for [pepper NOT ("chili pepper" OR "hot pepper"
> OR "pepper sauce")] because that excludes all documents which mention
> "chili pepper" even if they *also* mention "black pepper" or the unmodified
> word "pepper". Maybe some way using synonyms?
> 
> Thanks!
> 
> -s
> 
> On Mon, Jul 6, 2020 at 6:43 PM Thomas Corthals 
> wrote:
> 
>> Hi,
>> 
>> Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph
>> Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on
>> some fields.
>> 
>> Best,
>> 
>> Thomas
>> 



Re: Tokenizing managed synonyms

2020-07-06 Thread Stavros Macrakis
How can I search for a term *except *when it's part of certain phrases?

For example, I might want to find documents mentioning "pepper" where it is
not part of the phrases "chili pepper", "hot pepper", or "pepper sauce".

It does not work to search for [pepper NOT ("chili pepper" OR "hot pepper"
OR "pepper sauce")] because that excludes all documents which mention
"chili pepper" even if they *also* mention "black pepper" or the unmodified
word "pepper". Maybe some way using synonyms?

Thanks!

 -s

On Mon, Jul 6, 2020 at 6:43 PM Thomas Corthals 
wrote:

> Hi,
>
> Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph
> Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on
> some fields.
>
> Best,
>
> Thomas
>


Tokenizing managed synonyms

2020-07-06 Thread Thomas Corthals
Hi,

Is it possible to specify a Tokenizer Factory on a Managed Synonym Graph
Filter? I would like to use a Standard Tokenizer or Keyword Tokenizer on
some fields.

Best,

Thomas


Re: Out of memory errors with Spatial indexing

2020-07-06 Thread David Smiley
I believe you are experiencing this bug: LUCENE-5056

The fix would probably involve adjusting the code in
org.apache.lucene.spatial.query.SpatialArgs#calcDistanceFromErrPct.
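Until that is fixed, loosening the grid precision on the affected field type 
should reduce the blow-up; a sketch (the attribute values are illustrative 
only, not a tested recommendation):

  <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
             geo="true" distErrPct="0.1" maxDistErr="0.01" distanceUnits="kilometers"/>

distErrPct is the knob that feeds calcDistanceFromErrPct; raising it makes 
non-point shapes grid more coarsely.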

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Jul 6, 2020 at 5:18 AM Sunil Varma  wrote:

> Hi David
> Thanks for your response. Yes, I noticed that all the data causing the issue
> were at the poles. I tried the "RptWithGeometrySpatialField" field type
> definition but get a "Spatial context does not support S2 spatial
> index" error. Setting spatialContextFactory="Geo3D", I still see the
> original OOM error.
>
> On Sat, 4 Jul 2020 at 05:49, David Smiley  wrote:
>
> > Hi Sunil,
> >
> > Your shape is at a pole, and I'm aware of a bug causing an exponential
> > explosion of needed grid squares when you have polygons super-close to the
> > pole.  Might you try S2PrefixTree instead?  I forget if this would fix it
> > or not by itself.  For indexing non-point data, I recommend
> > class="solr.RptWithGeometrySpatialField" which internally is based off a
> > combination of a coarse grid and storing the original vector geometry for
> > accurate verification:
> >   <fieldType class="solr.RptWithGeometrySpatialField"
> >              prefixTree="s2" />
> > The internally coarser grid will lessen the impact of that pole bug.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Fri, Jul 3, 2020 at 7:48 AM Sunil Varma 
> > wrote:
> >
> > > We are seeing OOM errors  when trying to index some spatial data. I
> > believe
> > > the data itself might not be valid but it shouldn't cause the Server to
> > > crash. We see this on both Solr 7.6 and Solr 8. Below is the input that
> > is
> > > causing the error.
> > >
> > > {
> > > "id": "bad_data_1",
> > > "spatialwkt_srpt": "LINESTRING (-126.86037681029909 -90.0
> > > 1.000150474662E30, 73.58164711175415 -90.0 1.000150474662E30,
> > > 74.52836551959528 -90.0 1.000150474662E30, 74.97006811540834 -90.0
> > > 1.000150474662E30)"
> > > }
> > >
> > > Above dynamic field is mapped to field type "location_rpt" (
> > > solr.SpatialRecursivePrefixTreeFieldType).
> > >
> > >   Any pointers to get around this issue would be highly appreciated.
> > >
> > > Thanks!
> > >
> >
>


Re: Null pointer exception in QueryComponent.MergeDds method

2020-07-06 Thread Mikhail Khludnev
Hi,
What's the version? What's the uniqueKey? Is it stored? What's the fl param?
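For context, the distributed merge in mergeIds relies on the uniqueKey coming 
back from every shard, which is why the questions above matter; the usual 
schema pieces would look like this (a sketch, assuming the key field is 
called id):

  <field name="id" type="string" indexed="true" stored="true"/>
  <uniqueKey>id</uniqueKey>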

On Mon, Jul 6, 2020 at 5:12 PM Jae Joo  wrote:

> I am seeing the NullPointerException in the listing below and I am
> looking for how to fix the exception.
>
> Thanks,
>
>
> NamedList sortFieldValues =
> (NamedList)(srsp.getSolrResponse().getResponse().get("sort_values"));
> if (sortFieldValues.size()==0 && // we bypass merging this response
> only if it's partial itself
> thisResponseIsPartial) { // but not the previous one!!
>   continue; //fsv timeout yields empty sort_vlaues
> }
>
>
>
> 2020-07-06 12:45:47.001 ERROR (qtp745962066-636182) [c:]]
> o.a.s.h.RequestHandlerBase java.lang.NullPointerException
> at
>
> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:914)
> at
>
> org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:613)
> at
>
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:592)
> at
>
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:431)
> at
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:198)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2576)
> at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
> at
>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> at
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at
>
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> at
>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
> at
>
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Performance in solr7.5

2020-07-06 Thread Sankar Panda
Hi Erick,
Thanks for your mail. I am seeing that the CPU is idle 91% of the time.

Below are some of the stats; I am still not able to see where it is going wrong.

Filename                Type    Size     Used     Priority
/mnt/resource/swapfile  file    2097148  1119016  -2

       total  used  free  shared  buff/cache  available
Mem:     251    51    16       2         184        196
Swap:      1     1     0

vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- ------cpu-----
 r  b    swpd     free   buff     cache    si  so   bi   bo  in  cs us sy id wa st
 0  0 1119016 16969144 200036 192793264    0   0  177  217   0   0  2  2 92  4  0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.49    0.00    1.60    4.14    0.00   91.77
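For per-device detail, the standard sysstat/procps invocations would be 
something like this (a sketch, assuming the tools are installed):

  iostat -x 5   # per-device %util, await and queue sizes, sampled every 5s
  vmstat 5      # ongoing samples; the first vmstat line is a since-boot average
  free -g       # memory and swap snapshot

Note the single vmstat report above is the since-boot average, so it can hide 
what happens during a slow query.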

Can you please point me to where I need to look?

Thanks
Sankar Panda



On Mon, Jul 6, 2020 at 12:19 AM Erick Erickson 
wrote:

> Look at your I/O stats. My bet is that you’re swapping like crazy and your
> CPU is relatively idle.
>
> 2T of index on two machines is probably simply too much data on too little
> hardware.
>
> Consider stress testing your hardware gradually, see:
>
>
> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> Best,
> Erick
>
> > On Jul 5, 2020, at 9:38 AM, Sankar Panda  wrote:
> >
> > Hi All,
> >
> > I am facing a performance issue while searching. It takes 10 mins to get the
> > results. I have 2 shards and each shard has 2 replicas; 80M documents, and the
> > index size in each shard is 2T.
> >
> > Any suggestions?
> >
> > Thanks
> > Sankar panda
>
>


Null pointer exception in QueryComponent.MergeDds method

2020-07-06 Thread Jae Joo
I am seeing the NullPointerException in the listing below and I am
looking for how to fix the exception.

Thanks,


NamedList sortFieldValues =
(NamedList)(srsp.getSolrResponse().getResponse().get("sort_values"));
if (sortFieldValues.size()==0 && // we bypass merging this response
only if it's partial itself
thisResponseIsPartial) { // but not the previous one!!
  continue; //fsv timeout yields empty sort_vlaues
}



2020-07-06 12:45:47.001 ERROR (qtp745962066-636182) [c:]]
o.a.s.h.RequestHandlerBase java.lang.NullPointerException
at
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:914)
at
org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:613)
at
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:592)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:431)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:198)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2576)
at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)


Replica goes into recovery mode in Solr 6.1.0

2020-07-06 Thread vishal patel
I am using Solr 6.1.0 with Java 8 and G1GC in production. We have 2 shards and 
each shard has 1 replica. We have 3 collections.
We do not use any caches; they are disabled in solrconfig.xml. Search and update 
requests come in frequently on our live platform.

*Our commit configuration in solrconfig.xml is below:

<autoCommit>
   <maxTime>60</maxTime>
   <maxDocs>2</maxDocs>
   <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
   <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>

*We use Near Real Time searching, so we set the below configuration in solr.in.cmd:
set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=100

*Our collection details are below:

Collection    Shard1               Shard1 Replica       Shard2               Shard2 Replica
              Docs       Size(GB)  Docs       Size(GB)  Docs       Size(GB)  Docs       Size(GB)
collection1   26913364   201       26913379   202       26913380   198       26913379   198
collection2   13934360   310       13934367   310       13934368   219       13934367   219
collection3   351539689  73.5      351540040  73.5      351540136  75.2      351539722  75.2

*My server configurations are below:

                                           Server1             Server2
CPU                                        Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz,
                                           2301 Mhz, 10 Core(s), 20 Logical Processor(s) (both servers)
HardDisk                                   3845 GB (3.84 TB)   3485 GB (3.48 TB)
Total memory (GB)                          320                 320
Shard1 allocated memory (GB)               55                  -
Shard2 Replica allocated memory (GB)       55                  -
Shard2 allocated memory (GB)               -                   55
Shard1 Replica allocated memory (GB)       -                   55
Other applications allocated memory (GB)   60                  22
Number of other applications               11                  7


Sometimes one of the replicas goes into recovery mode. Why does a replica go 
into recovery: heavy search load, heavy update/insert load, or long GC pauses? 
If it is one of these, what should we change in the configuration?
Should we add shards to address the recovery issue?
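If GC is a suspect, the pause times in the GC log are worth checking before 
changing the configuration; a sketch, assuming Java 8 GC logging to the 
default solr_gc.log:

  grep -o "real=[0-9.]*" solr_gc.log | sort -t= -k2 -n | tail

Pauses approaching the ZooKeeper session timeout are a common reason for 
replicas being put into recovery.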

Regards,
Vishal Patel



RE: Time-out errors while indexing (Solr 7.7.1)

2020-07-06 Thread Kommu, Vinodh K.
Thanks Erick & Toke for your responses on this.

Just wanted to correct a few things here about the number of docs:

Total number of documents in the entire cluster (all collections) = 6393876826 (6.3B)
Total number of documents in the 2 bigger collections (3749389864 & 1780147848) = 5529537712 (5.5B)
Total number of documents in the remaining collections = 864339114 (864M)

So all collections together do not hold 13B docs. As the numbers above show, 
the biggest collection in the cluster holds close to 3.7B docs and the second 
biggest holds up to 1.7B, whereas the remaining 20 collections hold only 864M, 
which puts the cluster total at 6.3B docs.



On the hardware side, the cluster sits on 6 Solr VMs; each VM has 170G total 
memory (with 2 Solr instances running per VM) and 16 vCPUs, and each Solr JVM 
runs with a 31G heap. The remaining memory is left for the OS disk cache and 
other OS operations. vm.swappiness on each VM is set to 0, so swap should never 
be used. Each collection is created using the rule-based replica placement API 
with 6 shards and a replication factor of 3.



One other observation concerns core placement. As mentioned above, we create 
collections using rule-based replica placement, i.e. a rule to ensure that no 
two replicas of the same shard sit on the same VM, using the following command:



curl -s -k user:password 
"https://localhost:22010/solr/admin/collections?action=CREATE&name=$SOLR_COLLECTION&numShards=${SHARDS_NO?}&replicationFactor=${REPLICATION_FACTOR?}&maxShardsPerNode=${MAX_SHARDS_PER_NODE?}&collection.configName=$SOLR_COLLECTION&rule=shard:*,replica:<2,host:*"



Variable values in the above command:

SOLR_COLLECTION = collection name
SHARDS_NO = 6
REPLICATION_FACTOR = 3
MAX_SHARDS_PER_NODE = computed from the number of Solr VMs, nodes per VM and 
total number of replicas, i.e. total replicas / number of VMs; here that is 
18 / 6 = 3 max shards per machine





Ideally, rule-based replica placement should create 3 cores per VM for each 
collection, but as the listing below shows, 2, 3 or 4 cores per collection are 
placed differently across the VMs. VM2 and VM6 apparently have more cores than 
the other VMs, so I presume this could be one reason they see more IO 
operations than the remaining 4 VMs.





That said, I believe Solr does this replica placement considering other factors 
like free disk on each VM while creating a new collection, correct? If so, is 
this replica placement across the VMs fine? If not, what is needed to correct 
it? Can an additional 210G core create more disk IO operations? If yes, would 
moving the additional core from these VMs to a VM with fewer cores make any 
difference (i.e. ensuring each VM has at most 3 shards)?
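If you do decide to rebalance, the Collections API can move a single replica 
without recreating the collection; a sketch in the same style as the CREATE 
call above (collection, shard, replica and target node names are placeholders):

curl -s -k user:password "https://localhost:22010/solr/admin/collections?action=MOVEREPLICA&collection=Collection1&shard=shard3&replica=core_node14&targetNode=vm3-host:22010_solr"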



We have also been noticing a significant surge in IO operations at the storage 
level. We are wondering whether a storage IOPS limit could be starving Solr of 
IO, or whether it is the other way around: Solr issuing more read/write 
operations and pushing the storage IOPS to its upper limit?



VM1:
176G  node1/solr/Collection2_shard5_replica_n30
176G  node2/solr/Collection2_shard2_replica_n24
176G  node2/solr/Collection2_shard3_replica_n2
177G  node1/solr/Collection2_shard6_replica_n10
208G  node1/solr/Collection1_shard5_replica_n18
208G  node2/solr/Collection1_shard2_replica_n1
1.1T  total

VM2:
176G  node2/solr/Collection2_shard4_replica_n16
176G  node2/solr/Collection2_shard6_replica_n34
177G  node1/solr/Collection2_shard5_replica_n6
207G  node2/solr/Collection1_shard6_replica_n10
208G  node1/solr/Collection1_shard1_replica_n32
208G  node2/solr/Collection1_shard5_replica_n30
210G  node1/solr/Collection1_shard3_replica_n14
1.4T  total

VM3:
175G  node2/solr/Collection2_shard2_replica_n12
177G  node1/solr/Collection2_shard1_replica_n20
208G  node1/solr/Collection1_shard1_replica_n8
208G  node2/solr/Collection1_shard2_replica_n12
209G  node1/solr/Collection1_shard4_replica_n28
976G  total

VM4:
176G  node1/solr/Collection2_shard4_replica_n28
177G  node1/solr/Collection2_shard1_replica_n8
207G  node2/solr/Collection1_shard6_replica_n22
208G  node1/solr/Collection1_shard5_replica_n6
210G  node1/solr/Collection1_shard3_replica_n26
975G  total

VM5:
176G  node2/solr/Collection2_shard3_replica_n14
177G  node1/solr/Collection2_shard5_replica_n18
177G  node2/solr/Collection2_shard1_replica_n32
208G  node1/solr/Collection1_shard2_replica_n24
210G  node1/solr/Collection1_shard3_replica_n2
210G  node2/solr/Collection1_shard4_replica_n4
1.2T  total

VM6:
177G  node1/solr/Collection2_shard3_replica_n26
177G  node1/solr/Collection2_shard4_replica_n4
177G  node2/solr/Collection2_shard2_replica_n1
178G  node2/solr/Collection2_shard6_replica_n22
207G  node2/sol

Re: Java - setting multi-valued fields

2020-07-06 Thread kumar gaurav
If this approach did not work, that means there is something wrong in the Solr
schema.

Can you share the field's schema definition?
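For comparison, a multivalued string field would normally be declared like 
this (a sketch; the field name is a placeholder):

  <field name="myMultiField" type="string" indexed="true" stored="true" multiValued="true"/>

If the field or its type is not actually multiValued, multiple values will not 
index the way the code below expects.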


Regards
Kumar Gaurav


On Wed, Jun 24, 2020 at 2:29 PM Eivind Hodneland <
eivind.hodnel...@uptimeconsulting.no> wrote:

> Hi,
>
> Thanks for your input.
> However, this approach did not work either, it gave the same result as
> previously.
>
> Is there perhaps a different approach that could be used, other methods
> etc. ?
>
>
> Uptime Consulting | Eivind Hodneland | Senior Consultant | Munchs gate 7,
> NO-0165 Oslo, Norway
> Tel: +47 22 33 71 00 | Mob: +47 971 76 083 |
> eivind.hodnel...@uptimeconsulting.no  | www.uptimeconsulting.no
> --
> Search and Big Data solutions
> Software Development
> IT outsourcing services and consultancy
>
>
>
>
>
> -Original Message-
> From: kumar gaurav 
> Sent: onsdag 17. juni 2020 19:02
> To: solr-user@lucene.apache.org
> Subject: Re: Java - setting multi-valued fields
>
> HI
>
> Example:
>
> String[] values = new String[] {"value 1", "value 2"};
>
> inputDoc.setField (multiFieldName, values);
>
>
> Can you try once to change the array to list ?
>
> List<String> values = new ArrayList<>();
>
> values.add("value 1");
>
> values.add("value 2");
>
> inputDoc.setField (multiFieldName, values);
>
>
>
> regards
>
> Kumar Gaurav
>
>
>
>
>
>
>
> On Wed, Jun 17, 2020 at 8:33 PM Eivind Hodneland <
> eivind.hodnel...@uptimeconsulting.no> wrote:
>
> > Hi,
> >
> >
> >
> > My customer has a Solr index with a large amount of fields, many of
> > these are multivalued (type=”string”, multiValued=”true”).
> >
> >
> >
> > I am having problems with setting the values for these fields in my
> > Java update processors.
> >
> > Example:
> >
> > String[] values = new String[] {"value 1", "value 2"};
> >
> > inputDoc.setField (multiFieldName, values);
> >
> >
> >
> > However, only “value 1” is present in the index after updating.
> >
> > What is the best / correct way to make this work?
> >
> >
> >
> >
> >
> >
> >
> > Uptime Consulting | Eivind Hodneland | Senior Consultant | Munchs gate
> > 7,
> > NO-0165 Oslo, Norway
> >
> > Tel: +47 22 33 71 00 | Mob: +47 971 76 083 |
> > eivind.hodnel...@uptimeconsulting.no  | www.uptimeconsulting.no
> >
> > --
> >
> > Search and Big Data solutions
> >
> > Software Development
> >
> > IT outsourcing services and consultancy
> >
> >
> >
> > [image: 4180EEB7]
> >
> >
> >
>


Re: Out of memory errors with Spatial indexing

2020-07-06 Thread Sunil Varma
Hi David
Thanks for your response. Yes, I noticed that all the data causing the issue
were at the poles. I tried the "RptWithGeometrySpatialField" field type
definition but get a "Spatial context does not support S2 spatial
index" error. Setting spatialContextFactory="Geo3D", I still see the
original OOM error.

On Sat, 4 Jul 2020 at 05:49, David Smiley  wrote:

> Hi Sunil,
>
> Your shape is at a pole, and I'm aware of a bug causing an exponential
> explosion of needed grid squares when you have polygons super-close to the
> pole.  Might you try S2PrefixTree instead?  I forget if this would fix it
> or not by itself.  For indexing non-point data, I recommend
> class="solr.RptWithGeometrySpatialField" which internally is based off a
> combination of a coarse grid and storing the original vector geometry for
> accurate verification:
>   <fieldType class="solr.RptWithGeometrySpatialField"
>              prefixTree="s2" />
> The internally coarser grid will lessen the impact of that pole bug.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Fri, Jul 3, 2020 at 7:48 AM Sunil Varma 
> wrote:
>
> > We are seeing OOM errors  when trying to index some spatial data. I
> believe
> > the data itself might not be valid but it shouldn't cause the Server to
> > crash. We see this on both Solr 7.6 and Solr 8. Below is the input that
> is
> > causing the error.
> >
> > {
> > "id": "bad_data_1",
> > "spatialwkt_srpt": "LINESTRING (-126.86037681029909 -90.0
> > 1.000150474662E30, 73.58164711175415 -90.0 1.000150474662E30,
> > 74.52836551959528 -90.0 1.000150474662E30, 74.97006811540834 -90.0
> > 1.000150474662E30)"
> > }
> >
> > Above dynamic field is mapped to field type "location_rpt" (
> > solr.SpatialRecursivePrefixTreeFieldType).
> >
> >   Any pointers to get around this issue would be highly appreciated.
> >
> > Thanks!
> >
>


Re: Searching document content and mult-valued fields

2020-07-06 Thread Emir Arnautović
Hi Shaun,
If the project content is relatively static, you could use nested documents, or 
you could play with the join query parser.
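For example, with each content document carrying the id of its parent project, 
a join from content onto projects could look like this (a sketch; the field 
names projectId and id, and the query text, are assumptions about your schema):

  q={!join from=projectId to=id}content:report

That returns project documents whose attached content matches, which can then 
be combined with a regular query on the project fields.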

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 1 Jul 2020, at 18:19, Shaun Campbell  wrote:
> 
> Hi
> 
> Been using Solr on a project now for a couple of years and is working well.
> It's just a simple index of about 20 - 25 fields and 7,000 project records.
> 
> Now there's a requirement to be able to search on the content of documents
> (web pages, Word, pdf etc) related to those projects.  My initial thought
> was to just create a new index to store the Tika'd content and just search
> on that. However, the requirement is to somehow search through both the
> project records and the content records at the same time and list the main
> project with perhaps some info on the matching content data. I tried to
> explain that you may find matching main project records but no content, and
> vice versa.
> 
> My only solution to this search problem is to either concatenate all the
> document content into one field on the main project record, and add that to
> my dismax search, and use boosting etc or to use a multi-valued field to
> store the content of each project document.  I'm a bit reluctant to do this
> as the application is running well and I'm a bit nervous about a change to
> the schema and the indexing process.  I just wondered what you thought
> about adding a lot of content to an existing schema (single or multivalued
> field) that doesn't normally store big amounts of data.
> 
> Or does anyone know of any way, I can join two searches like this together
> and two separate indexes?
> 
> Thanks
> Shaun



Re: Corrupted .cfs file

2020-07-06 Thread nettadalet
Sorry to reply just now, but you were right - the problem was that the disk
got full.
Thank you very much!



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr Float/Double multivalues fields

2020-07-06 Thread Vincenzo D'Amore
Thanks for sharing the post; I finally had the time to read it :)
It is really illuminating.
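To make the behaviour under discussion concrete, the two retrieval paths would 
be declared roughly like this (a sketch; field names are placeholders, and per 
the quoted thread below, what docValues returns may come back reordered and 
deduplicated):

  <!-- stored: values come back in insertion order, duplicates kept -->
  <field name="vals_stored" type="pfloat" indexed="true" stored="true"
         docValues="false" multiValued="true"/>
  <!-- docValues: values are returned from the docValues structure instead -->
  <field name="vals_dv" type="pfloat" indexed="true" stored="false"
         docValues="true" useDocValuesAsStored="true" multiValued="true"/>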

On Fri, Jul 3, 2020 at 1:28 PM Toke Eskildsen  wrote:

> On Fri, 2020-07-03 at 10:00 +0200, Vincenzo D'Amore wrote:
> > Hi Erick, not sure I got.
> > Does this mean that the order of values within a multivalued field:
> > - docValues=true the result will be both re-ordered and deduplicated.
> > - docValues=false the result order is guaranteed to be maintained for
> > values in the insertion-order.
> >
> > Is this correct?
>
> Sorta, but it is not the complete picture. Things get complicated when
> you mix it with stored, so that you have "stored=true docValues=true".
> There's an article about that at
>
>
> https://sease.io/2020/03/docvalues-vs-stored-fields-apache-solr-features-and-performance-smackdown.html
>
> BTW: The documentation should definitely mention that stored preserves
> order & duplicates. It is not obvious.
>
> - Toke Eskildsen, Royal Danish Library
>
>
>

-- 
Vincenzo D'Amore