Re: [CAUTION] Converting graph query to stream graph query

2019-10-15 Thread Natarajan, Rajeswari
I need to gather all the children of docid 1. The root item has a null parent.
(Sample data below)

Tried as below

nodes(graphtest,
  walk="1->parent",
  gather="docid",
  scatter="branches, leaves")
  
Response:
{
  "result-set": {
"docs": [
  {
"node": "1",
"collection": "graphtest,",
"field": "node",
"level": 0
  },
  {
"EOF": true,
"RESPONSE_TIME": 5
  }
]
  }
}

The query just gets the root item and not its children. It looks like I am missing
something obvious. Any pointers, please?

As I said earlier, the graph query below gets all the children of docid 1.

fq={!graph from=parent to=docid}docid:"1" 

Thanks,
Rajeswari



On 10/15/19, 12:04 PM, "Natarajan, Rajeswari"  
wrote:

Hi,


curl -XPOST -H 'Content-Type: application/json' 
'http://localhost:8983/solr/ggg/update' --data-binary '{
"add" : { "doc" : { "id" : "a", "docid" : "1", "name" : "Root document one" 
} },
"add" : { "doc" : { "id" : "b", "docid" : "2", "name" : "Root document two" 
} },
"add" : { "doc" : {  "id" : "c", "docid" : "3", "name" : "Root document 
three" } },
"add" : { "doc" : {  "id" : "d", "docid" : "11", "parent" : "1", "name" : 
"First level document 1, child one" } },
"add" : { "doc" : {  "id" : "e", "docid" : "12", "parent" : "1", "name" : 
"First level document 1, child two" } },
"add" : { "doc" : {  "id" : "f", "docid" : "13", "parent" : "1", "name" : 
"First level document 1, child three" } },
"add" : { "doc" : {  "id" : "g", "docid" : "21", "parent" : "2", "name" : 
"First level document 2, child one" } },
"add" : { "doc" : {  "id" : "h", "docid" : "22", "parent" : "2", "name" : 
"First level document 2, child two" } },
"add" : { "doc" : {  "id" : "j", "docid" : "121", "parent" : "12", "name" : 
"Second level document 12, child one" } },
"add" : { "doc" : {  "id" : "k", "docid" : "122", "parent" : "12", "name" : 
"Second level document 12, child two" } },
"add" : { "doc" : {  "id" : "l", "docid" : "131", "parent" : "13", "name" : 
"Second level document 13, child three" } },
"commit" : {}
}'


For the above data , the below query gets all the children of document with 
docid 1.


http://localhost:8983/solr/graphtest/select?q=*:*&fq={!graph%20from=parent%20to=docid}docid:"1"


How can I convert this query into a streaming graph query with a nodes
expression?

Thanks,
Rajeswari





Re: Solr-Cloud, join and collection collocation

2019-10-15 Thread Erick Erickson
You can certainly replicate the joined collection to every shard. It must fit 
in one shard and a replica of that shard must be co-located with every replica 
of the “to” collection.

Have you looked at streaming and “streaming expressions”? It does not have the
same problem, although it does have its own limitations.
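
For the join itself, a rough sketch with made-up collection and field names
(both streams must be sorted on the join key, and the exported fields need
docValues for the /export handler):

innerJoin(
  search(mainCollection, q="*:*", fl="id,refId", sort="refId asc", qt="/export"),
  search(joinedCollection, q="*:*", fl="refId,label", sort="refId asc", qt="/export"),
  on="refId"
)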

Best,
Erick

> On Oct 15, 2019, at 6:58 PM, Nicolas Paris  wrote:
> 
> Hi
> 
> I have several large collections that cannot fit in a standalone solr
> instance. They are split over multiple shards in solr-cloud mode.
> 
> Those collections are supposed to be joined to another collection to
> retrieve subsets. Because I am using distributed collections, I am not
> able to use the solr join feature.
> 
> For this reason, I denormalize the information by adding the joined
> collection within every collection. Naturally, when I want to update
> the joined collection, I have to update every one of the distributed
> collections.
> 
> In standalone mode, I would only have to update the joined collection.
> 
> I wonder if there is a way to overcome this limitation. For example, by
> replicating the joined collection to every shard - or some other method I am
> not aware of.
> 
> Any thoughts? 
> -- 
> nicolas



Solr-Cloud, join and collection collocation

2019-10-15 Thread Nicolas Paris
Hi

I have several large collections that cannot fit in a standalone solr
instance. They are split over multiple shards in solr-cloud mode.

Those collections are supposed to be joined to another collection to
retrieve subsets. Because I am using distributed collections, I am not
able to use the solr join feature.

For this reason, I denormalize the information by adding the joined
collection within every collection. Naturally, when I want to update
the joined collection, I have to update every one of the distributed
collections.

In standalone mode, I would only have to update the joined collection.

I wonder if there is a way to overcome this limitation. For example, by
replicating the joined collection to every shard - or some other method I am
not aware of.

Any thoughts? 
-- 
nicolas


Document Update Performance Improvement

2019-10-15 Thread Nicolas Paris
Hi

I am looking for a way to speed up the update of documents.

In my context, the update replaces one of the many existing indexed
fields and keeps the others as they are.

Right now, I am building the whole document, and replacing the existing
one by id.

I am wondering if the **atomic update feature** would speed up the process.

On the one hand, using this feature would save network traffic because only a
small subset of the document would be sent from the client to the
server. 
On the other hand, the server will have to collect the values from the
disk and reindex them. In addition, this implies storing the values for
every field (I am not storing every field) and using more space.
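
For reference, an atomic update payload looks something like this (a sketch;
the collection and field names are just examples). Only the id and the
changed field are sent, and Solr rebuilds the rest of the document from its
stored/docValues fields:

curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/mycollection/update' --data-binary '[
  {"id": "doc1", "price_i": {"set": 99}}
]'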

Also, I have read that the ConcurrentUpdateSolrServer class might be an
optimized way of updating documents.

I am using the spark-solr library to deal with solr-cloud. If something
exists to speed up the process, I would be glad to implement it in that
library.
Also, I have split the collection over multiple shards, and I admit this
speeds up the update process, but who knows?

Thoughts ?

-- 
nicolas


Solr JVM performance challenge with Updates

2019-10-15 Thread Ganesh Sethuraman
Hi Solr Users,

We are using Solr 7.2.1 with 2 nodes (245GB RAM each) and a 3-node ZK cluster
in production. We are using Java 8 with default GC settings (with
NewRatio=3) and a 15GB heap, changed to 16GB after the performance issue
mentioned below.

We have about 90 collections in this cluster (~8 shards each); about 50 of
them are actively used. About 3 collections are actively updated using SolrJ
update queries with a soft commit of 30 secs. Other collections go through
update handler batch CSV updates.

We had read timeout/slowness issues when young generation usage peaked,
as you can see in the GC report below from the problem time. After that we
increased the overall heap size to 16GB (from 15GB), and as you can see,
we did not see any further read issues.

   1. I see our heap is very large, and we are seeing high usage of the young
   generation. Is this due to the SolrJ updates (concurrent single-record updates)?
   2. Should we change NewRatio to 2 (so that the young generation gets
   larger), since we are seeing only 58% usage of the old gen? (See the sketch
   after this list.)
   3. We are also seeing a behavior where, if we restart Solr in production
   while updates are happening, one server starts up but does not have all
   collections and shards up; when we restart both servers, it comes up fine.
   Is this behavior also related to the SolrJ updates?
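
For reference, a sketch of how we would apply the NewRatio change in
solr.in.sh. Note that setting GC_TUNE replaces the default GC flags
entirely, so the values besides NewRatio are carried over from the stock
script (assumed to be the 7.x defaults):

GC_TUNE="-XX:NewRatio=2 \
  -XX:SurvivorRatio=4 \
  -XX:TargetSurvivorRatio=90 \
  -XX:MaxTenuringThreshold=8 \
  -XX:+UseConcMarkSweepGC"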



Problem GC Report
https://gceasy.io/my-gc-report.jsp?p=YXJjaGl2ZWQvMjAxOS8xMC83Ly0tMDJfc29scl9nYy5sb2cuNi5jdXJyZW50LS0xNC00My01OA===WEB



No Problem GC Report
https://gceasy.io/my-gc-report.jsp?p=YXJjaGl2ZWQvMjAxOS8xMC85Ly0tMDJfX3NvbHJfZ2MubG9nLjIuY3VycmVudC0tMjAtNDQtMjY==WEB



Thanks ,

Ganesh


Re: Position search

2019-10-15 Thread Tim Casey
If this is about a normalized query, I would put the normalization text
into a specific field.  The reason for this is you may want to search the
overall text during any form of expansion phase of searching for data.
That is, maybe you want to know the context of up to the 120th word.  At
least you have both.
Also, you may want to note which normalized fields were truncated or were
simply too small. This would give some guidance as to the bias of the
normalization.  If 95% of the fields were not truncated, there is a chance
you are not doing well at normalizing because you have a set of
particularly short messages.  So I would expect a small set of side fields
remarking this.  This would allow you to carry the measures along with the
data.

tim

On Tue, Oct 15, 2019 at 12:19 PM Alexandre Rafalovitch 
wrote:

> Is the 100 words a hard boundary or a soft one?
>
> If it is a hard one (always 100 words), the easiest is probably copy
> field and in the (unstored) copy, trim off whatever you don't want to
> search. Possibly using regular expressions. Of course, "what's a word"
> is an important question here.
>
> Similarly, you could do that with Update Request Processors and
> clone/process field even before it hits the schema. Then you could
> store the extract for highlighting purposes.
>
> Regards,
>Alex.
>
> On Tue, 15 Oct 2019 at 02:25, Kaminski, Adi 
> wrote:
> >
> > Hi,
> > What's the recommended way to search in Solr (assuming 8.2 is used) for
> specific terms/phrases/expressions while limiting the search from position
> perspective.
> > For example to search only in the first/last 100 words of the document ?
> >
> > Is there any built-in functionality for that ?
> >
> > Thanks in advance,
> > Adi
> >
> >
> > This electronic message may contain proprietary and confidential
> information of Verint Systems Inc., its affiliates and/or subsidiaries. The
> information is intended to be for the use of the individual(s) or
> entity(ies) named above. If you are not the intended recipient (or
> authorized to receive this e-mail for the intended recipient), you may not
> use, copy, disclose or distribute to anyone this message or any information
> contained in this message. If you have received this electronic message in
> error, please notify us by replying to this e-mail.
>


Re: Position search

2019-10-15 Thread Alexandre Rafalovitch
Is the 100 words a hard boundary or a soft one?

If it is a hard one (always 100 words), the easiest is probably copy
field and in the (unstored) copy, trim off whatever you don't want to
search. Possibly using regular expressions. Of course, "what's a word"
is an important question here.
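
A minimal sketch of such a copy field (names are placeholders). Instead of a
regex, this uses LimitTokenCountFilterFactory to keep only the first 100
tokens at index time:

<fieldType name="text_first100" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="body_first100" type="text_first100" indexed="true" stored="false"/>
<copyField source="body" dest="body_first100"/>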

Similarly, you could do that with Update Request Processors and
clone/process field even before it hits the schema. Then you could
store the extract for highlighting purposes.

Regards,
   Alex.

On Tue, 15 Oct 2019 at 02:25, Kaminski, Adi  wrote:
>
> Hi,
> What's the recommended way to search in Solr (assuming 8.2 is used) for 
> specific terms/phrases/expressions while limiting the search from position 
> perspective.
> For example to search only in the first/last 100 words of the document ?
>
> Is there any built-in functionality for that ?
>
> Thanks in advance,
> Adi
>
>
> This electronic message may contain proprietary and confidential information 
> of Verint Systems Inc., its affiliates and/or subsidiaries. The information 
> is intended to be for the use of the individual(s) or entity(ies) named 
> above. If you are not the intended recipient (or authorized to receive this 
> e-mail for the intended recipient), you may not use, copy, disclose or 
> distribute to anyone this message or any information contained in this 
> message. If you have received this electronic message in error, please notify 
> us by replying to this e-mail.


RE: [EXTERNAL] Re: High cpu usage when adding documents to v7.7 solr cloud

2019-10-15 Thread Peter Lancaster
Hi Oleksandr,

Thanks very much for your help. Yes, that Jira looks like exactly our problem.

I'll give that a go tomorrow.

Cheers,
Peter.

-Original Message-
From: Oleksandr Drapushko [mailto:drapus...@gmail.com]
Sent: 15 October 2019 19:52
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: High cpu usage when adding documents to v7.7 solr cloud

Hi Peter,

This bug was introduced in Solr 7.7.0. It is related to Java 8. And it was 
fixed in Solr 7.7.2.

Here are the ways to deal with it:
1. Upgrade to Solr 7.7.2
2. Patch your Solr 7.7
3. Use Java 9+

You can read more on this here:
https://issues.apache.org/jira/browse/SOLR-13349


Regards,
Oleksandr

On Tue, Oct 15, 2019 at 8:31 PM Peter Lancaster < 
peter.lancas...@findmypast.com> wrote:

> We have a solr cloud on v7.7.0 and we observe very high cpu usage when
> we're indexing new documents.
>
> The solr cloud in question has 50 shards and 2 replicas of each and
> we're using NRT. Obviously indexing takes some resources but we see
> pretty much 100% cpu usage when we're indexing documents and we
> haven't seen this before on other v6.3.0 solr clouds indexing under a
> similar load. In the
> v7.7.0 cloud we're using nested child documents but other than that
> the set-ups are quite similar.
>
> For us performance is more important than having updates reflected in
> real-time and we have configured commits as follows:
> 
> 10
> 
> 180
> 30
> false
> 
> 
> -1
> false
> 
> 
> ${solr.data.dir:}
> 
> 
>
> I can observe the problem on a test server with 3 shards without any
> replication but the same schema and solr config. If I add a simple
> document like {Id:TEST01} through the document page in the solr admin
> UI, I immediately see 100% cpu usage on one core of the test server
> and this lasts for 300 seconds - the same time as the maxTime for
> autoCommit. If I then change the maxTime to say 10 seconds, then the
> high cpu usage lasts for just 10 seconds. I can't see anything being
> logged that would indicate what solr is using the cpu for.
>
> Have we made some error in our configuration or is this behaviour
> expected in v7? It just seems really odd that it's using loads of cpu
> just to add a single document and that the high usage lasts for the
> maxTime on the autocommit. I'm guessing that whatever is making the
> single document addition so inefficient is also affecting the
> performance of our live solr cloud and contributing to the 100% cpu
> usage that we observe when adding new documents. Any help, advice or insight 
> would be appreciated.
>
> Cheers,
> Peter Lancaster | Developer
> 
> This message is confidential and may contain privileged information.
> You should not disclose its contents to any other person. If you are
> not the intended recipient, please notify the sender named above
> immediately. It is expressly declared that this e-mail does not
> constitute nor form part of a contract or unilateral obligation.
> Opinions, conclusions and other information in this message that do
> not relate to the official business of findmypast shall be understood as 
> neither given nor endorsed by it.
> 
>


This message is confidential and may contain privileged information. You should 
not disclose its contents to any other person. If you are not the intended 
recipient, please notify the sender named above immediately. It is expressly 
declared that this e-mail does not constitute nor form part of a contract or 
unilateral obligation. Opinions, conclusions and other information in this 
message that do not relate to the official business of findmypast shall be 
understood as neither given nor endorsed by it.



Converting graph query to stream graph query

2019-10-15 Thread Natarajan, Rajeswari
Hi,


curl -XPOST -H 'Content-Type: application/json' 
'http://localhost:8983/solr/ggg/update' --data-binary '{
"add" : { "doc" : { "id" : "a", "docid" : "1", "name" : "Root document one" } },
"add" : { "doc" : { "id" : "b", "docid" : "2", "name" : "Root document two" } },
"add" : { "doc" : {  "id" : "c", "docid" : "3", "name" : "Root document three" 
} },
"add" : { "doc" : {  "id" : "d", "docid" : "11", "parent" : "1", "name" : 
"First level document 1, child one" } },
"add" : { "doc" : {  "id" : "e", "docid" : "12", "parent" : "1", "name" : 
"First level document 1, child two" } },
"add" : { "doc" : {  "id" : "f", "docid" : "13", "parent" : "1", "name" : 
"First level document 1, child three" } },
"add" : { "doc" : {  "id" : "g", "docid" : "21", "parent" : "2", "name" : 
"First level document 2, child one" } },
"add" : { "doc" : {  "id" : "h", "docid" : "22", "parent" : "2", "name" : 
"First level document 2, child two" } },
"add" : { "doc" : {  "id" : "j", "docid" : "121", "parent" : "12", "name" : 
"Second level document 12, child one" } },
"add" : { "doc" : {  "id" : "k", "docid" : "122", "parent" : "12", "name" : 
"Second level document 12, child two" } },
"add" : { "doc" : {  "id" : "l", "docid" : "131", "parent" : "13", "name" : 
"Second level document 13, child three" } },
"commit" : {}
}'


For the above data , the below query gets all the children of document with 
docid 1.

http://localhost:8983/solr/graphtest/select?q=*:*&fq={!graph%20from=parent%20to=docid}docid:"1"


How can I convert this query into a streaming graph query with a nodes expression?

Thanks,
Rajeswari



RE: Position search

2019-10-15 Thread Markus Jelsma
Hello Adi,

There is no SpanLastQuery or equivalent. But you could reverse the text and use 
SpanFirstQuery. Or, perhaps easier, add a bogus term to the end of the field 
and use PhraseQuery.
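
For the first-N case, one way to reach SpanFirstQuery from Solr is the XML
query parser. A sketch (the field name is assumed, and note the term text is
not analyzed, so supply the indexed form):

q={!xmlparser}<SpanFirst end="100"><SpanTerm fieldName="content">book</SpanTerm></SpanFirst>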

Regards,
Markus
 
-Original message-
> From:Kaminski, Adi 
> Sent: Tuesday 15th October 2019 10:57
> To: solr-user@lucene.apache.org
> Subject: RE: Position search
> 
> Hi Markus,
> Thanks for the guidance.
> 
> Is there any official Solr documentation for that ? Tried some googling, only 
> some Stackoverflow / Lucene posts are available.
> 
> Also, will that approach work for the other use case of searching from end of 
> documents ?
> For example if I need to perform some term search from the end, e.g. "book" 
> in the last 30 or 100 words.
> 
> Is there SpanLastQuery ?
> 
> Thanks,
> Adi
> 
> -Original Message-
> From: Markus Jelsma 
> Sent: Tuesday, October 15, 2019 11:04 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Position search
> 
> Hello Adi,
> 
> Try SpanFirstQuery. It limits the search to within the Nth term in the field.
> 
> Regards,
> Markus
> 
> 
> 
> -Original message-
> > From:Kaminski, Adi 
> > Sent: Tuesday 15th October 2019 8:25
> > To: solr-user@lucene.apache.org
> > Subject: Position search
> >
> > Hi,
> > What's the recommended way to search in Solr (assuming 8.2 is used) for 
> > specific terms/phrases/expressions while limiting the search from position 
> > perspective.
> > For example to search only in the first/last 100 words of the document ?
> >
> > Is there any built-in functionality for that ?
> >
> > Thanks in advance,
> > Adi
> >
> >
> > This electronic message may contain proprietary and confidential 
> > information of Verint Systems Inc., its affiliates and/or subsidiaries. The 
> > information is intended to be for the use of the individual(s) or 
> > entity(ies) named above. If you are not the intended recipient (or 
> > authorized to receive this e-mail for the intended recipient), you may not 
> > use, copy, disclose or distribute to anyone this message or any information 
> > contained in this message. If you have received this electronic message in 
> > error, please notify us by replying to this e-mail.
> >
> 
> 
> This electronic message may contain proprietary and confidential information 
> of Verint Systems Inc., its affiliates and/or subsidiaries. The information 
> is intended to be for the use of the individual(s) or entity(ies) named 
> above. If you are not the intended recipient (or authorized to receive this 
> e-mail for the intended recipient), you may not use, copy, disclose or 
> distribute to anyone this message or any information contained in this 
> message. If you have received this electronic message in error, please notify 
> us by replying to this e-mail.
> 


Re: High cpu usage when adding documents to v7.7 solr cloud

2019-10-15 Thread Oleksandr Drapushko
Hi Peter,

This bug was introduced in Solr 7.7.0. It is related to Java 8. And it was
fixed in Solr 7.7.2.

Here are the ways to deal with it:
1. Upgrade to Solr 7.7.2
2. Patch your Solr 7.7
3. Use Java 9+

You can read more on this here:
https://issues.apache.org/jira/browse/SOLR-13349


Regards,
Oleksandr

On Tue, Oct 15, 2019 at 8:31 PM Peter Lancaster <
peter.lancas...@findmypast.com> wrote:

> We have a solr cloud on v7.7.0 and we observe very high cpu usage when
> we're indexing new documents.
>
> The solr cloud in question has 50 shards and 2 replicas of each and we're
> using NRT. Obviously indexing takes some resources but we see pretty much
> 100% cpu usage when we're indexing documents and we haven't seen this
> before on other v6.3.0 solr clouds indexing under a similar load. In the
> v7.7.0 cloud we're using nested child documents but other than that the
> set-ups are quite similar.
>
> For us performance is more important than having updates reflected in
> real-time and we have configured commits as follows:
> 
> 10
> 
> 180
> 30
> false
> 
> 
> -1
> false
> 
> 
> ${solr.data.dir:}
> 
> 
>
> I can observe the problem on a test server with 3 shards without any
> replication but the same schema and solr config. If I add a simple document
> like {Id:TEST01} through the document page in the solr admin UI, I
> immediately see 100% cpu usage on one core of the test server and this
> lasts for 300 seconds - the same time as the maxTime for autoCommit. If I
> then change the maxTime to say 10 seconds, then the high cpu usage lasts
> for just 10 seconds. I can't see anything being logged that would indicate
> what solr is using the cpu for.
>
> Have we made some error in our configuration or is this behaviour expected
> in v7? It just seems really odd that it's using loads of cpu just to add a
> single document and that the high usage lasts for the maxTime on the
> autocommit. I'm guessing that whatever is making the single document
> addition so inefficient is also affecting the performance of our live solr
> cloud and contributing to the 100% cpu usage that we observe when adding
> new documents. Any help, advice or insight would be appreciated.
>
> Cheers,
> Peter Lancaster | Developer
> 
> This message is confidential and may contain privileged information. You
> should not disclose its contents to any other person. If you are not the
> intended recipient, please notify the sender named above immediately. It is
> expressly declared that this e-mail does not constitute nor form part of a
> contract or unilateral obligation. Opinions, conclusions and other
> information in this message that do not relate to the official business of
> findmypast shall be understood as neither given nor endorsed by it.
> 
>


Re: solr 8.1.1 many times slower returning query results than solr 4.10.4 or solr 6.5.1

2019-10-15 Thread Russell Bahr
Hi Shawn,
I included the wrong file for solr4 and did not realize until you pointed
out the heap size.  The correct file that is setting the Java environment
is "Solr 4 tomcat setenv" I have uploaded that to the shared folder along
with the requested screenshots "Solr 4 top screenshot","Solr 6 top
screenshot","Solr 8 top screenshot".

I have also uploaded the solr.log, solr_gc.log, and solr_slow_requests.log
from a 2 hour period of time where I was running the email load test
against the solr8 implementation in which the queued tasks are taking too
long to complete.

solr_gc.log, solr_gc.log.1, solr_gc.log.2, solr.log, solr.log.10,
solr.log.6, solr.log.7, solr.log.8, solr.log.9, solr_slow_requests.log

Let me know if there is any other information that I can provide that may
help to work through this.


*Manzama*a MODERN GOVERNANCE company

Russell Bahr
Lead Infrastructure Engineer

USA & CAN Office: +1 (541) 306 3271
USA & CAN Support: +1 (541) 706 9393
UK Office & Support: +44 (0)203 282 1633
AUS Office & Support: +61 (0) 2 8417 2339

543 NW York Drive, Suite 100, Bend, OR 97703




On Tue, Oct 15, 2019 at 2:28 AM Shawn Heisey  wrote:

> On 10/14/2019 1:36 PM, Russell Bahr wrote:
> > Backend replacement of solr4 and hopefully Frontend replacement as well.
> > solr-spec 8.1.1
> > lucene-spec 8.1.1
> > Runtime Oracle Corporation OpenJDK 64-Bit Server VM 12 12+33
> > 1 collection 6 shards 5 replicas per shard 17,919,889 current documents
> (35 days worth of documents) - indexing new documents regularly throughout
> the day, deleting aged out documents nightly.
>
> Java 12 is not recommended.  It is one of the "new feature" releases
> that only gets 6 months of support.  We would recommend Java 8 or Java
> 11.  These are the versions with long term support.  Probably a good
> thing to be using OpenJDK, as the official Oracle Java now requires
> paying for a license.
>
> Solr 8 ships with settings that enable the G1GC collector instead of
> CMS, because CMS is deprecated and will disappear in a future Java
> version.  We have seen problems with this when the system is
> misconfigured as far as heap size.  When the system is properly sized,
> G1 tends to do better than CMS, but when the heap is too large or too
> small, has a tendency to amplify garbage collection problems in comparison.
>
> Looking at your solr.in.sh files for each version ... the Solr 4 install
> appears to be setting the heap to 512 megabytes.  This is definitely not
> enough for millions of documents, and if this is what the heap size is
> actually set to, would almost certainly run into memory errors
> frequently and have absolutely terrible performance.  But you are saying
> that it works well, so I don't think the heap is actually set to 512
> megabytes.  Maybe the bin/solr script has been modified directly to set
> the memory size instead of setting it in solr.in.sh where it should be
> set.
>
> Solr 6 has a heap size of just under 27 gigabytes.  Solr 8 has a heap
> size of just under 8 gigabytes.  With millions of documents, it is
> likely that 8GB of heap is not quite big enough.
>
> For each of your installations (Solr 4, Solr 6, and Solr 8) can you
> provide the screenshot described at this wiki page?
>
>
> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems#SolrPerformanceProblems-Askingforhelponamemory/performanceissue
>
> It would also be helpful to see the GC logs from Solr 8.  We would need
> at least one GC log, making sure that they cover at least a few hours,
> including the timeframe when the slow indexing and slow queries were
> observed.
>
> Thanks,
> Shawn
>


High cpu usage when adding documents to v7.7 solr cloud

2019-10-15 Thread Peter Lancaster
We have a solr cloud on v7.7.0 and we observe very high cpu usage when we're 
indexing new documents.

The solr cloud in question has 50 shards and 2 replicas of each and we're using 
NRT. Obviously indexing takes some resources but we see pretty much 100% cpu 
usage when we're indexing documents and we haven't seen this before on other 
v6.3.0 solr clouds indexing under a similar load. In the v7.7.0 cloud we're 
using nested child documents but other than that the set-ups are quite similar.

For us performance is more important than having updates reflected in real-time 
and we have configured commits as follows:

10

180
30
false


-1
false


${solr.data.dir:}
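
In plain terms: autoCommit maxTime is 300 seconds with openSearcher=false,
and soft commits are disabled. A sketch (exact values assumed, except the
300-second maxTime, which I refer to below):

<autoCommit>
  <maxTime>300000</maxTime> <!-- 300 seconds -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>-1</maxTime> <!-- assumed: soft commit disabled -->
</autoSoftCommit>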



I can observe the problem on a test server with 3 shards without any 
replication but the same schema and solr config. If I add a simple document 
like {Id:TEST01} through the document page in the solr admin UI, I immediately 
see 100% cpu usage on one core of the test server and this lasts for 300 
seconds - the same time as the maxTime for autoCommit. If I then change the 
maxTime to say 10 seconds, then the high cpu usage lasts for just 10 seconds. I 
can't see anything being logged that would indicate what solr is using the cpu 
for.

Have we made some error in our configuration or is this behaviour expected in 
v7? It just seems really odd that it's using loads of cpu just to add a single 
document and that the high usage lasts for the maxTime on the autocommit. I'm 
guessing that whatever is making the single document addition so inefficient is 
also affecting the performance of our live solr cloud and contributing to the 
100% cpu usage that we observe when adding new documents. Any help, advice or 
insight would be appreciated.

Cheers,
Peter Lancaster | Developer

This message is confidential and may contain privileged information. You should 
not disclose its contents to any other person. If you are not the intended 
recipient, please notify the sender named above immediately. It is expressly 
declared that this e-mail does not constitute nor form part of a contract or 
unilateral obligation. Opinions, conclusions and other information in this 
message that do not relate to the official business of findmypast shall be 
understood as neither given nor endorsed by it.



Atomic Updates with PreAnalyzedField

2019-10-15 Thread Oleksandr Drapushko
Hello Community,

I've discovered a data loss bug and couldn't find any mention of it. Please
confirm this bug hasn't been reported yet.


Description:

If you try to update non pre-analyzed fields in a document using atomic
updates, data in pre-analyzed fields (if there is any) will be lost. The
bug was discovered in Solr 8.2 and 7.7.2.


Steps to reproduce:

1. Index this document into techproducts
{
  "id": "a",
  "n_s": "s1",
  "pre":
"{\"v\":\"1\",\"str\":\"Alaska\",\"tokens\":[{\"t\":\"alaska\",\"s\":0,\"e\":6,\"i\":1}]}"
}

2. Query the document
{
  "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
  {
"id":"a",
"n_s":"s1",
"pre":"Alaska",
"_version_":1647475215142223872}]
  }}

3. Update using atomic syntax
{
  "add": {
"doc": {
  "id": "a",
  "n_s": {"set": "s2"}
}
  }
}

4. Observe the warning in solr log
UI:
WARN  x:techproducts_shard2_replica_n6  PreAnalyzedField  Error parsing
pre-analyzed field 'pre'

solr.log:
WARN  (qtp1384454980-23) [c:techproducts s:shard2 r:core_node8
x:techproducts_shard2_replica_n6] o.a.s.s.PreAnalyzedField Error parsing
pre-analyzed field 'pre' => java.io.IOException: Invalid JSON type
java.lang.String, expected Map
at
org.apache.solr.schema.JsonPreAnalyzedParser.parse(JsonPreAnalyzedParser.java:86)

5. Query the document again
{
  "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
  {
"id":"a",
"n_s":"s2",
"_version_":1647475461695995904}]
  }}

Result: There is no 'pre' field in the document anymore.


My thoughts on it:

1. Data loss can be prevented if the warning is replaced with an error
(re-throwing the exception). Atomic updates for such documents still won't
work, but updates will be explicitly rejected.

2. Solr tries to read the document from the index, merge it with the input
document, and re-index the result, but when it reads indexed pre-analyzed
fields the format is different, so Solr cannot parse and re-index those
fields properly.


Thank you,
Oleksandr


Re: Metrics API - Documentation

2019-10-15 Thread Andrzej Białecki
We keep all essential user documentation (and some dev docs) in the Ref Guide.

The source for the Ref Guide is checked-in under solr/solr-ref-guide, it uses a 
simple ASCII markup so adding some content should be easy. You should follow 
the same workflow as with the code (create a JIRA, and then either add a patch 
or create a PR).

> On 15 Oct 2019, at 17:33, Richard Goodman  wrote:
> 
> Many thanks both for your responses, they've been helpful.
> 
> @Andrzej - Sorry I wasn't clear on the "A latency of 1mil" as I wasn't
> aware the image wouldn't come through. But following your bullet points
> helped me present a better unit for measurement in the axis.
> 
> In regards to contributing, would absolutely love to help there, just not
> sure what the correct direction is? I wasn't sure if the web page source
> code / contributions are in the apache-lucene repository?
> 
> Thanks,
> 
> 
> On Tue, 8 Oct 2019 at 11:04, Andrzej Białecki  wrote:
> 
>> Hi,
>> 
>> Starting with Solr 7.0 all JMX metrics are actually internally driven by
>> the metrics API - JMX (or Prometheus) is just a way of exposing them.
>> 
>> I agree that we need more documentation on metrics - contributions are
>> welcome :)
>> 
>> Regarding your specific examples (btw. our mailing lists aggressively
>> strip all attachments - your graphs didn’t make it):
>> 
>> * time units in time-based counters are in nanoseconds. This is just a
>> unit of value, not necessarily precision. In this specific example
>> `ADMIN./admin/collections.totalTime` (and similarly named metrics for all
>> other request handlers) represents the total elapsed time spent processing
>> requests.
>> * time-based histograms are expressed in milliseconds, where it is
>> indicated by the “_ms” suffix.
>> * 1-, 5- and 15-min rates represent an exponentially weighted moving
>> average over that time window, expressed in events/second.
>> * handlerStart is initialised with System.currentTimeMillis() when this
>> instance of request handler is first created.
>> * details on GC, memory buffer pools, and similar JVM metrics are
>> documented in JDK documentation on Management Beans. For example:
>> 
>> https://docs.oracle.com/javase/7/docs/api/java/lang/management/GarbageCollectorMXBean.html?is-external=true
>> <
>> https://docs.oracle.com/javase/7/docs/api/java/lang/management/GarbageCollectorMXBean.html?is-external=true
>>> 
>> * "A latency of 1mil” - no idea what that is, I don’t think Solr API uses
>> this abbreviation anywhere.
>> 
>> Hope this helps.
>> 
>> —
>> 
>> Andrzej Białecki
>> 
>>> On 7 Oct 2019, at 13:41, Emir Arnautović 
>> wrote:
>>> 
>>> Hi Richard,
>>> We do not use API to collect metrics but JMX, but I believe that those
>> are the same (did not verify it in code). You can see how we handled those
>> metrics into reports/charts or even use our agent to send data to
>> Prometheus:
>> https://github.com/sematext/sematext-agent-integrations/tree/master/solr <
>> https://github.com/sematext/sematext-agent-integrations/tree/master/solr>
>>> 
>>> You can also see some links to Solr metric related blog posts in this
>> repo. If you find out that managing your own monitoring stack is
>> overwhelming, you can try our Solr integration.
>>> 
>>> HTH,
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>> 
>>> 
>>> 
 On 7 Oct 2019, at 12:40, Richard Goodman 
>> wrote:
 
 Hi there,
 
 I'm currently working on using the prometheus exporter to provide some
>> detailed insights for our Solr Cloud clusters.
 
 Using the provided template killed our prometheus server, as well as
>> the exporter due to the size of our clusters (each cluster is around 96
>> nodes, ~300 collections with 3way replication and 16 shards), so you can
>> imagine the amount of data that comes through /admin/metrics and not
>> filtering it down first.
 
 I've begun working on writing my own template to reduce the amount of
>> data being requested and it's working fine, and I'm starting to build some
>> nice graphs in Grafana.
 
 The only difficulty I'm having with this, is I'm struggling to find
>> decent documentation on the metrics themselves. I was using the resources
>> metrics reporting - metrics-api <
>> https://lucene.apache.org/solr/guide/7_7/metrics-reporting.html#metrics-api>
>> and monitoring solr with prometheus and grafana <
>> https://lucene.apache.org/solr/guide/7_7/monitoring-solr-with-prometheus-and-grafana.html>
>> but there is a lack of information on most metrics.
 
 For example:
 "ADMIN./admin/collections.totalTime":6715327903,
 I understand this is a counter, however, I'm not sure what unit this
>> would be represented when displaying it, for example:
 
 
 
 A latency of 1mil, not sure if this means milliseconds, million, etc.,
 Another example would be the GC metrics:
 

Re: Unable to log into Jira

2019-10-15 Thread Erick Erickson
I was once “Chris Erickson”, but infra straightened it out.


> On Oct 15, 2019, at 11:59 AM, Christine Poerschke (BLOOMBERG/ LONDON) 
>  wrote:
> 
> Hi Richard,
> 
> Sorry to hear you're experiencing log-in difficulties. I've opened 
> https://issues.apache.org/jira/browse/INFRA-19280 for this, hopefully it can 
> be read without logging in.
> 
> Regards,
> 
> Christine
> 
> From: solr-user@lucene.apache.org At: 10/15/19 16:31:36 To: 
> solr-user@lucene.apache.org
> Subject: Unable to log into Jira
> 
> Hey,
> 
> Sorry if this is the wrong group, I tried to email us...@infra.apache.org a
> few weeks ago but haven't heard anything.
> 
> I am unable to log into my account, with it saying my password is
> incorrect. But what is more odd is my name on the account has changed from
> Richard Goodman to Alex Goodman.
> 
> I can send a forgot username which comes through to my registered email,
> which is this one. However, if I do a forgot password, the email never
> shows up 
> 
> Does anyone know which contact to use in order to help me sort this issue
> out?
> 
> Thanks,
> 
> Richard Goodman
> 
> 



Re: Unable to log into Jira

2019-10-15 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Richard,

Sorry to hear you're experiencing log-in difficulties. I've opened 
https://issues.apache.org/jira/browse/INFRA-19280 for this, hopefully it can be 
read without logging in.

Regards,

Christine

From: solr-user@lucene.apache.org At: 10/15/19 16:31:36 To: 
solr-user@lucene.apache.org
Subject: Unable to log into Jira

Hey,

Sorry if this is the wrong group, I tried to email us...@infra.apache.org a
few weeks ago but haven't heard anything.

I am unable to log into my account, with it saying my password is
incorrect. But what is more odd is my name on the account has changed from
Richard Goodman to Alex Goodman.

I can send a forgot username which comes through to my registered email,
which is this one. However, if I do a forgot password, the email never
shows up 

Does anyone know which contact to use in order to help me sort this issue
out?

Thanks,

Richard Goodman




Re: Metrics API - Documentation

2019-10-15 Thread Richard Goodman
Many thanks both for your responses, they've been helpful.

@Andrzej - Sorry I wasn't clear on the "A latency of 1mil" as I wasn't
aware the image wouldn't come through. But following your bullet points
helped me present a better unit for measurement in the axis.

In regards to contributing, would absolutely love to help there, just not
sure what the correct direction is? I wasn't sure if the web page source
code / contributions are in the apache-lucene repository?

Thanks,


On Tue, 8 Oct 2019 at 11:04, Andrzej Białecki  wrote:

> Hi,
>
> Starting with Solr 7.0 all JMX metrics are actually internally driven by
> the metrics API - JMX (or Prometheus) is just a way of exposing them.
>
> I agree that we need more documentation on metrics - contributions are
> welcome :)
>
> Regarding your specific examples (btw. our mailing lists aggressively
> strip all attachments - your graphs didn’t make it):
>
> * time units in time-based counters are in nanoseconds. This is just a
> unit of value, not necessarily precision. In this specific example
> `ADMIN./admin/collections.totalTime` (and similarly named metrics for all
> other request handlers) represents the total elapsed time spent processing
> requests.
> * time-based histograms are expressed in milliseconds, where it is
> indicated by the “_ms” suffix.
> * 1-, 5- and 15-min rates represent an exponentially weighted moving
> average over that time window, expressed in events/second.
> * handlerStart is initialised with System.currentTimeMillis() when this
> instance of request handler is first created.
> * details on GC, memory buffer pools, and similar JVM metrics are
> documented in JDK documentation on Management Beans. For example:
>
> https://docs.oracle.com/javase/7/docs/api/java/lang/management/GarbageCollectorMXBean.html?is-external=true
> <
> https://docs.oracle.com/javase/7/docs/api/java/lang/management/GarbageCollectorMXBean.html?is-external=true
> >
> * "A latency of 1mil” - no idea what that is, I don’t think Solr API uses
> this abbreviation anywhere.
>
> Hope this helps.
>
> —
>
> Andrzej Białecki
>
> > On 7 Oct 2019, at 13:41, Emir Arnautović 
> wrote:
> >
> > Hi Richard,
> > We do not use API to collect metrics but JMX, but I believe that those
> are the same (did not verify it in code). You can see how we handled those
> metrics into reports/charts or even use our agent to send data to
> Prometheus:
> https://github.com/sematext/sematext-agent-integrations/tree/master/solr <
> https://github.com/sematext/sematext-agent-integrations/tree/master/solr>
> >
> > You can also see some links to Solr metric related blog posts in this
> repo. If you find out that managing your own monitoring stack is
> overwhelming, you can try our Solr integration.
> >
> > HTH,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> >
> >> On 7 Oct 2019, at 12:40, Richard Goodman 
> wrote:
> >>
> >> Hi there,
> >>
> >> I'm currently working on using the prometheus exporter to provide some
> detailed insights for our Solr Cloud clusters.
> >>
> >> Using the provided template killed our prometheus server, as well as
> the exporter due to the size of our clusters (each cluster is around 96
> nodes, ~300 collections with 3way replication and 16 shards), so you can
> imagine the amount of data that comes through /admin/metrics and not
> filtering it down first.
> >>
> >> I've begun working on writing my own template to reduce the amount of
> data being requested and it's working fine, and I'm starting to build some
> nice graphs in Grafana.
> >>
> >> The only difficulty I'm having with this, is I'm struggling to find
> decent documentation on the metrics themselves. I was using the resources
> metrics reporting - metrics-api <
> https://lucene.apache.org/solr/guide/7_7/metrics-reporting.html#metrics-api>
> and monitoring solr with prometheus and grafana <
> https://lucene.apache.org/solr/guide/7_7/monitoring-solr-with-prometheus-and-grafana.html>
> but there is a lack of information on most metrics.
> >>
> >> For example:
> >> "ADMIN./admin/collections.totalTime":6715327903,
> >> I understand this is a counter, however, I'm not sure what unit this
> would be represented when displaying it, for example:
> >>
> >>
> >>
> >> A latency of 1mil, not sure if this means milliseconds, million, etc.,
> >> Another example would be the GC metrics:
> >>  "gc.ConcurrentMarkSweep.count":7,
> >>  "gc.ConcurrentMarkSweep.time":1247,
> >>  "gc.ParNew.count":16759,
> >>  "gc.ParNew.time":884173,
> >> Which when displayed, doesn't give the clearest insight as to what the
> unit is:
> >>
> >>
> >> If anyone has any advice / guidance, that would be greatly appreciated.
> If there isn't documentation for the API, then this would also be something
> I'll look into help contributing with too.
> >>
> >> Thanks,
> >> --
> >> Richard Goodman
> >
>
>

-- 

Richard Goodman

Unable to log into Jira

2019-10-15 Thread Richard Goodman
Hey,

Sorry if this is the wrong group, I tried to email us...@infra.apache.org a
few weeks ago but haven't heard anything.

I am unable to log into my account, with it saying my password is
incorrect. But what is more odd is my name on the account has changed from
Richard Goodman to Alex Goodman.

I can send a forgot username which comes through to my registered email,
which is this one. However, if I do a forgot password, the email never
shows up 

Does anyone know which contact to use in order to help me sort this issue
out?

Thanks,

Richard Goodman


Re: Re: Query on autoGeneratePhraseQueries

2019-10-15 Thread Rohan Kasat
Also check the
pf, pf2, pf3 and
ps, ps2, ps3 parameters for phrase searches, for example:
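
A sketch with an example field name - pf boosts documents where the whole
query appears as a phrase in the field, and ps controls the allowed slop:

q=black company&defType=edismax&qf=name&pf=name&ps=0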

Regards,
Rohan K

On Tue, Oct 15, 2019 at 6:41 AM Audrey Lorberfeld -
audrey.lorberf...@ibm.com  wrote:

> I'm not sure how your config file is set up, but I know that the way we do
> multi-token synonyms is to have the sow (split on whitespace) parameter set
> to False while using the edismax parser. I'm not sure if this would work
> with PhraseQueries , but it might be worth a try!
>
> In our config file we do something like this:
>
> 
> 
> edismax
> 1.0
> explicit
> 100
> content_en
> w3json_en
> false
> 
>  
>
> You can read a bit about the parameter here:
> https://opensourceconnections.com/blog/2018/02/20/edismax-and-multiterm-synonyms-oddities/
>
> Best,
> Audrey
>
> --
> Audrey Lorberfeld
> Data Scientist, w3 Search
> IBM
> audrey.lorberf...@ibm.com
>
>
> On 10/15/19, 5:50 AM, "Shubham Goswami" 
> wrote:
>
> Hi kshitij
>
> Thanks for the reply!
I tried to debug it and found that the raw query (black company) was parsed
as two separate queries, black and company, and results were returned based
on the black query alone. Instead, it should have been parsed as a single
phrase query ("black company"), because I am using
autoGeneratePhraseQueries.
Do you have any idea about this? Please correct me if I am wrong.
>
> Thanks
> Shubham
>
> On Tue, Oct 15, 2019 at 1:58 PM kshitij tyagi <
> kshitij.shopcl...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Try debugging your solr query and understand how it gets parsed. Try
> using
> > "debug=true" for the same
> >
> > On Tue, Oct 15, 2019 at 12:58 PM Shubham Goswami <
> > shubham.gosw...@hotwax.co>
> > wrote:
> >
> > > *Hi all,*
> > >
> > > I am a beginner to solr framework and I am trying to implement
> > > *autoGeneratePhraseQueries* property in a fieldtype of
> > type=text_general, i
> > > kept the property value as true and restarted the solr server but
> still
> > it
> > > is not taking my two words query like(Black company) as a phrase
> without
> > > double quotes and returning the results only for Black.
> > >
> > >  Can somebody please help me to understand what am i
> missing ?
> > > Following is my Schema.xml file code and i am using solr 7.5
> version.
> > >  > > positionIncrementGap="100" multiValued="true"
> > > autoGeneratePhraseQueries="true">
> > > 
> > >   =
> > >> > ignoreCase="true"/>
> > >   
> > > 
> > > 
> > >   
> > >> > ignoreCase="true"/>
> > >> > ignoreCase="true" synonyms="synonyms.txt"/>
> > >   
> > > 
> > >   
> > >
> > >
> > > --
> > > *Thanks & Regards*
> > > Shubham Goswami
> > > Enterprise Software Engineer
> > > *HotWax Systems*
> > > *Enterprise open source experts*
> > > cell: +91-7803886288
> > > office: 0731-409-3684
> > >
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.hotwaxsystems.com=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=Zi9beGF58BzJUNUdCkeW0pwliKwq9vdTSh0V_lR0734=FhSkJBcmYw_bfHgq1enzuYQeOZwKHzlP9h4VwTZSL5E=
> > >
> >
>
>
> --
> *Thanks & Regards*
> Shubham Goswami
> Enterprise Software Engineer
> *HotWax Systems*
> *Enterprise open source experts*
> cell: +91-7803886288
> office: 0731-409-3684
>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.hotwaxsystems.com=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=Zi9beGF58BzJUNUdCkeW0pwliKwq9vdTSh0V_lR0734=FhSkJBcmYw_bfHgq1enzuYQeOZwKHzlP9h4VwTZSL5E=
>
>
> --

*Regards,Rohan Kasat*


RE: Facet Advice

2019-10-15 Thread Moyer, Brett
Hello Shawn, thanks for the reply. The results that come back are correct, but are 
we implementing the query correctly to filter by a selected facet? When I say 
wrong, it's more about the design/use of facets in the query. Is it proper to 
do fq=Tags:Retirement? Is using a multivalued field correct for facets? Why do 
you say the above are not facets?

Here is an excerpt from our JSON:

"facet_counts": {
"facet_queries": {},
"facet_fields": {
"Tags": [
"Retirement",
1260,
"Locations & People",
1149,
"Advice and Tools",
1015,
"Careers",
156,
"Annuities",
101,
"Performance",

Brett Moyer
Manager, Sr. Technical Lead | TFS Technology
  Public Production Support
  Digital Search & Discovery

8625 Andrew Carnegie Blvd | 4th floor
Charlotte, NC 28263
Tel: 704.988.4508
Fax: 704.988.4907
bmo...@tiaa.org

-Original Message-
From: Shawn Heisey  
Sent: Tuesday, October 15, 2019 5:40 AM
To: solr-user@lucene.apache.org
Subject: Re: Facet Advice

On 10/14/2019 3:25 PM, Moyer, Brett wrote:
> Hello, looking for some advice, I have the suspicion we are doing Facets all 
> wrong. We host financial information and recently "tagged" our pages with 
> appropriate Facets. We have built a Flat design. Are we going at it the wrong 
> way?
> 
> In Solr we have a "Tags" field, based on some magic we tagged each page on 
> the site with a number of the below example Facets. We have the UI team 
> sending queries in the form of 1) q=get a loan=Tags:Retirement, 2) q=get a 
> loan=Tags:Retirement AND Tags:Move Money. This restricts the resultset 
> hopefully guiding the user to their desired result. Something about it 
> doesn’t seem right. Is this right with a flat single level pattern like what 
> we have? Should each doc have multiple Fields to map to different values? Any 
> help is appreciated. Thanks!
> 
> Example Facets:
> Brokerage
> Retirement
> Open an Account
> Move Money
> Estate Planning

The queries you mentioned above do not have facets, only the q and fq 
parameters.  You also have not mentioned what about the results seems wrong to you.

If you restrict the query to only a certain value in the tag field, then facets 
will only count documents that match the full query -- users will not be able 
to see the count of documents that do NOT match the query, unless you use 
tagging/excluding with your filters.  This is part of the functionality called 
multi-select faceting.

http://yonik.com/multi-select-faceting/
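
With your Tags field, tagging/excluding looks roughly like this (sketch):

q=get a loan
&fq={!tag=tagsFilter}Tags:Retirement
&facet=true
&facet.field={!ex=tagsFilter}Tags

The fq still narrows the result set, but the Tags facet counts are computed
as if that filter were not applied, so users can still see counts for the
other tags.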

Because your message doesn't say what in the results is wrong, we can only 
guess about how to help you.  I do not know if the above information will be 
helpful or not.

Thanks,
Shawn
*
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and 
then delete it.

TIAA
*


Problems with Wildcard Queries / Own Filter

2019-10-15 Thread Björn Keil
Hello,

I am having a bit of a problem with wildcard queries and I don't know how
to pin it down yet. I have a suspect - one of the filters in the respective
search field - but I can't find an error in it.

The problem is that when I do a wildcard query:
title:todesmä*
it does return a result, but it also returns results that would match
title:todesma*. It is not supposed to do that because, due to the filter,
it's supposed to be equivalent to title:todesmae*.

The real problem is that if I search for title:todesmär* it does not find
anything at all anymore. There are titles in the index that would match
"todesmärsche" and "todesmärchen".

I have looked at the filter in a debugger, but I could not find anything wrong
with it. It's supposed to replace "ä" with "ae", which it does; it calls
termAtt.resizeBuffer() beforehand and termAtt.setLength() afterwards. The
result seems perfectly alright. What it does not change is the endOffset
attribute of the CharTermAttribute object; that's probably because it counts
bytes, not characters - I replaced a single two-byte char with two one-byte
chars, so the endOffset is the same.

Could anybody tell me whether there is anything wrong with the filter in
the attachment?
package de.example.analysis;

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * This TokenFilter replaces German umlauts and the character ß with a normalized form in ASCII characters.
 * 
 * ü => ue
 * ß => ss
 * etc.
 * 
 * This enables a sort order according DIN 5007, variant 2, the so called "phone book" sort order.
 * 
 * @see org.apache.lucene.analysis.TokenStream
 *
 */
public class GermanUmaultFilter extends TokenFilter {
	
	private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

	/**
	 * @see org.apache.lucene.analysis.TokenFilter#TokenFilter()
	 * @param input TokenStream with the tokens to filter
	 */
	public GermanUmaultFilter(TokenStream input) {
		super(input);
	}

	/**
	 * Performs the actual filtering upon request by the consumer.
	 * 
	 * @see org.apache.lucene.analysis.TokenStream#incrementToken()
	 * @return true on success, false on failure
	 */
	public boolean incrementToken() throws IOException {
		if (input.incrementToken()) {
			int countReplacements = 0;
			char[] origBuffer = termAtt.buffer();
			int origLength = termAtt.length();
			// Figure out how many replacements we need to get the size of the new buffer
			for (int i = 0; i < origLength; i++) {
if (origBuffer[i] == 'ü'
	|| origBuffer[i] == 'ä'
	|| origBuffer[i] == 'ö'
	|| origBuffer[i] == 'ß'
	|| origBuffer[i] == 'Ä'
	|| origBuffer[i] == 'Ö'
	|| origBuffer[i] == 'Ü'
) {
	countReplacements++;
}
			}
			
			// If there is a replacement create a new buffer of the appropriate length...
			if (countReplacements != 0) {
int newLength = origLength + countReplacements;
char[] target = new char[newLength];
int j = 0;
// ... perform the replacement ...
for (int i = 0; i < origLength; i++) {
	switch (origBuffer[i]) {
	case 'ä':
		target[j++] = 'a';
		target[j++] = 'e';
		break;
	case 'ö':
		target[j++] = 'o';
		target[j++] = 'e';
		break;
	case 'ü':
		target[j++] = 'u';
		target[j++] = 'e';
		break;
	case 'Ä':
		target[j++] = 'A';
		target[j++] = 'E';
		break;
	case 'Ö':
		target[j++] = 'O';
		target[j++] = 'E';
		break;
	case 'Ü':
		target[j++] = 'U';
		target[j++] = 'E';
		break;
	case 'ß':
		target[j++] = 's';
		target[j++] = 's';
		break;
	default:
		target[j++] = origBuffer[i];
	}
}
// ... make sure the attribute's buffer is large enough, copy the new buffer
// and set the length ...
termAtt.resizeBuffer(newLength);
termAtt.copyBuffer(target, 0, newLength);
termAtt.setLength(newLength);
			}
			return true;
		} else {
			return false;
		}
	}

}


Re: Re: Query on autoGeneratePhraseQueries

2019-10-15 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
I'm not sure how your config file is set up, but I know that the way we do 
multi-token synonyms is to have the sow (split on whitespace) parameter set to 
False while using the edismax parser. I'm not sure if this would work with 
PhraseQueries, but it might be worth a try! 

In our config file we do something like this: 



edismax
1.0
explicit
100
content_en
w3json_en
false

 

You can read a bit about the parameter here: 
https://opensourceconnections.com/blog/2018/02/20/edismax-and-multiterm-synonyms-oddities/
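
At query time that ends up looking something like this (a sketch, using our
qf field):

q=black company&defType=edismax&sow=false&qf=content_en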
 

Best,
Audrey

-- 
Audrey Lorberfeld
Data Scientist, w3 Search
IBM
audrey.lorberf...@ibm.com
 

On 10/15/19, 5:50 AM, "Shubham Goswami"  wrote:

Hi kshitij

Thanks for the reply!
I tried to debug it and found that the raw query (black company) was parsed as
two separate queries, black and company, and results were returned based on
the black query alone. Instead, it should have been parsed as a single phrase
query ("black company"), because I am using autoGeneratePhraseQueries.
Do you have any idea about this? Please correct me if I am wrong.

Thanks
Shubham

On Tue, Oct 15, 2019 at 1:58 PM kshitij tyagi 
wrote:

> Hi,
>
> Try debugging your solr query and understand how it gets parsed. Try using
> "debug=true" for the same
>
> On Tue, Oct 15, 2019 at 12:58 PM Shubham Goswami <
> shubham.gosw...@hotwax.co>
> wrote:
>
> > *Hi all,*
> >
> > I am a beginner to solr framework and I am trying to implement
> > *autoGeneratePhraseQueries* property in a fieldtype of
> type=text_general, i
> > kept the property value as true and restarted the solr server but still
> it
> > is not taking my two words query like(Black company) as a phrase without
> > double quotes and returning the results only for Black.
> >
> >  Can somebody please help me to understand what am i missing ?
> > Following is my Schema.xml file code and i am using solr 7.5 version.
> >  > positionIncrementGap="100" multiValued="true"
> > autoGeneratePhraseQueries="true">
> > 
> >   =
> >> ignoreCase="true"/>
> >   
> > 
> > 
> >   
> >> ignoreCase="true"/>
> >> ignoreCase="true" synonyms="synonyms.txt"/>
> >   
> > 
> >   
> >
> >
> > --
> > *Thanks & Regards*
> > Shubham Goswami
> > Enterprise Software Engineer
> > *HotWax Systems*
> > *Enterprise open source experts*
> > cell: +91-7803886288
> > office: 0731-409-3684
> > 
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.hotwaxsystems.com=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=Zi9beGF58BzJUNUdCkeW0pwliKwq9vdTSh0V_lR0734=FhSkJBcmYw_bfHgq1enzuYQeOZwKHzlP9h4VwTZSL5E=
 
> >
>


-- 
*Thanks & Regards*
Shubham Goswami
Enterprise Software Engineer
*HotWax Systems*
*Enterprise open source experts*
cell: +91-7803886288
office: 0731-409-3684

https://urldefense.proofpoint.com/v2/url?u=http-3A__www.hotwaxsystems.com=DwIBaQ=jf_iaSHvJObTbx-siA1ZOg=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M=Zi9beGF58BzJUNUdCkeW0pwliKwq9vdTSh0V_lR0734=FhSkJBcmYw_bfHgq1enzuYQeOZwKHzlP9h4VwTZSL5E=
 




Re: Solr 7.6 frequent OOM with Java 9, G1 and large heap sizes - any tests with Java 13 and the new ZGC?

2019-10-15 Thread Shawn Heisey

On 10/15/2019 2:49 AM, Vassil Velichkov (Sensika) wrote:

I've reduced the JVM heap on one of the shards to 20GB and then simulated some 
heavy load to reproduce the issue in a faster way.
The solr.log ROOT was set to TRACE level, but I can't really see anything meaningful, the 
solr.log ends @ 07:31:40.352 GMT, while the GC log shows later entries and "Pause 
Full (Allocation Failure)".
BTW, I've never seen in the GC logs any automatic attempts for Full GC. I can't 
see any OOME messages in any of the logs, only in the separate solr_oom_killer 
log, but this is the log of the killer script.

Also, to answer your previous questions:
1. We run completely stock Solr, not custom code, no plugins. 
Regardless, we never had such OOMs with Solr 4.x or Solr 6.x
2. It seems that Full GC is never triggered. In some cases in the past 
I've seen log entries for Full GC attempts, but the JVM crashes with OOM long 
before the Full GC could do anything.


The goal for good GC tuning is to avoid full GCs ever being needed.  It 
cannot be prevented entirely, especially when humongous allocations are 
involved ... but a well-tuned GC should not do them very often.
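
As a point of reference, G1 treats any single allocation larger than half of 
the region size as humongous, so one common mitigation is raising the region 
size explicitly. A sketch only; the right value depends on the heap:

-XX:+UseG1GC
-XX:G1HeapRegionSize=32m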


You have only included snippets from your logs.  We would need full logs 
for any of that information to be useful.  Attachments to the list 
rarely work, so you will need to use some kind of file sharing site.  I 
find dropbox to be useful for this, but if you prefer something else 
that works well, feel free to use it.


If the OutOfMemoryError exception is logged, it will be in solr.log. 
It is not always logged.  I will ask the Java folks if there is a way we 
can have the killer script provide the reason for the OOME.


It should be unnecessary to increase Solr's log level beyond INFO, but 
DEBUG might provide some useful info.  TRACE will be insanely large and 
I would not recommend it.


Thanks,
Shawn


Re: Query on autoGeneratePhraseQueries

2019-10-15 Thread Shubham Goswami
Hi kshitij

Thanks for the reply!
I tried to debug it and found that the raw query (black company) is parsed as
two separate queries, black and company, and the results are returned based on
the black query alone. Instead, it should have been parsed as a single phrase
query ("black company") because I am using autoGeneratePhraseQueries.
Do you have any idea about this? Please correct me if I am wrong.

Thanks
Shubham

On Tue, Oct 15, 2019 at 1:58 PM kshitij tyagi 
wrote:

> Hi,
>
> Try debugging your solr query and understand how it gets parsed. Try using
> "debug=true" for the same
>
> On Tue, Oct 15, 2019 at 12:58 PM Shubham Goswami <
> shubham.gosw...@hotwax.co>
> wrote:
>
> > *Hi all,*
> >
> > I am a beginner to the Solr framework and I am trying to implement the
> > *autoGeneratePhraseQueries* property in a fieldtype of type=text_general.
> > I set the property value to true and restarted the Solr server, but it is
> > still not taking my two-word query (Black company) as a phrase without
> > double quotes and is returning results only for Black.
> >
> > Can somebody please help me understand what I am missing?
> > Following is my Schema.xml file code and I am using Solr 7.5 version.
> > <fieldType name="text_general" class="solr.TextField"
> > positionIncrementGap="100" multiValued="true"
> > autoGeneratePhraseQueries="true">
> >   <analyzer type="index">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
> >     <filter class="solr.SynonymGraphFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> >
> > --
> > *Thanks & Regards*
> > Shubham Goswami
> > Enterprise Software Engineer
> > *HotWax Systems*
> > *Enterprise open source experts*
> > cell: +91-7803886288
> > office: 0731-409-3684
> > http://www.hotwaxsystems.com
> >
>


-- 
*Thanks & Regards*
Shubham Goswami
Enterprise Software Engineer
*HotWax Systems*
*Enterprise open source experts*
cell: +91-7803886288
office: 0731-409-3684
http://www.hotwaxsystems.com


Re: Minimum Tomcat version that supports latest Solr version

2019-10-15 Thread Shawn Heisey

On 10/15/2019 12:42 AM, vikas shinde wrote:

Dear Solr team,

Which is the latest Tomcat version that supports the latest Solr version
8.2.0?

Also provide details about previous Solr versions & their compatible Tomcat
versions.


Dominique is correct.  We do not officially support running under tomcat 
any more.  We strongly recommend running with the Jetty that is included 
as part of Solr.  We cannot guarantee compatibility with any other 
servlet container.


https://cwiki.apache.org/confluence/display/solr/WhyNoWar

If you choose to go against Solr project advice, then I would recommend 
running the latest release in one of the top two minor versions.  Right 
now that's Tomcat 8.5.x or 9.0.x.


I can't stress enough that if you choose to run a new Solr version under 
tomcat, you will be on your own for most problems.  It is NOT supported.


Thanks,
Shawn


Re: Facet Advice

2019-10-15 Thread Shawn Heisey

On 10/14/2019 3:25 PM, Moyer, Brett wrote:

Hello, looking for some advice. I have the suspicion we are doing Facets all 
wrong. We host financial information and recently "tagged" our pages with 
appropriate Facets. We have built a flat design. Are we going at it the wrong way?

In Solr we have a "Tags" field; based on some magic we tagged each page on the 
site with a number of the below example Facets. We have the UI team sending 
queries in the form of 1) q=get a loan&fq=Tags:Retirement, 2) q=get a 
loan&fq=Tags:Retirement AND Tags:Move Money. This restricts the result set, 
hopefully guiding the user to their desired result. Something about it doesn't 
seem right. Is this the right approach with a flat, single-level pattern like 
ours? Should each doc have multiple fields mapping to different values? Any 
help is appreciated. Thanks!

Example Facets:
Brokerage
Retirement
Open an Account
Move Money
Estate Planning


The queries you mentioned above do not have facets, only the q and fq 
parameters.  You also have not mentioned what in the results seems wrong to 
you.


If you restrict the query to only a certain value in the tag field, then 
facets will only count documents that match the full query -- users will 
not be able to see the count of documents that do NOT match the query, 
unless you use tagging/excluding with your filters.  This is part of the 
functionality called multi-select faceting.


http://yonik.com/multi-select-faceting/
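
As a concrete sketch using the Tags field from the question (the parameter 
values here are illustrative only):

q=get a loan&facet=true&fq={!tag=tagTags}Tags:Retirement&facet.field={!ex=tagTags}Tags

Because the fq is tagged and the facet.field excludes that tag, the Tags facet 
counts are computed as if the Tags:Retirement filter were not applied, so users 
still see counts for the other tag values.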

Because your message doesn't say what in the results is wrong, we can 
only guess about how to help you.  I do not know if the above 
information will be helpful or not.


Thanks,
Shawn


Re: Minimum Tomcat version that supports latest Solr version

2019-10-15 Thread Dominique Bejean
Hi,

Solr has not been tested with Tomcat since version 4.
Why not use the embedded Jetty server?

Regards

Dominique

On Tue, Oct 15, 2019 at 10:44 AM, vikas shinde  wrote:

> Dear Solr team,
>
> Which is the latest Tomcat version that supports the latest Solr version
> 8.2.0?
>
> Also provide details about previous Solr versions & their compatible Tomcat
> versions.
>
>
> Thanks & Regards.
> Vikas Shinde.
>


Re: solr 8.1.1 many time slower returning query results than solr 4.10.4 or solr 6.5.1

2019-10-15 Thread Shawn Heisey

On 10/14/2019 1:36 PM, Russell Bahr wrote:

Backend replacement of solr4 and hopefully Frontend replacement as well.
solr-spec 8.1.1
lucene-spec 8.1.1
Runtime Oracle Corporation OpenJDK 64-Bit Server VM 12 12+33
1 collection, 6 shards, 5 replicas per shard, 17,919,889 current documents (35 
days' worth of documents) - indexing new documents regularly throughout the day, 
deleting aged-out documents nightly.


Java 12 is not recommended.  It is one of the "new feature" releases 
that only gets 6 months of support.  We would recommend Java 8 or Java 
11.  These are the versions with long term support.  Probably a good 
thing to be using OpenJDK, as the official Oracle Java now requires 
paying for a license.


Solr 8 ships with settings that enable the G1GC collector instead of 
CMS, because CMS is deprecated and will disappear in a future Java 
version.  We have seen problems with this when the system is 
misconfigured as far as heap size.  When the system is properly sized, 
G1 tends to do better than CMS, but when the heap is too large or too 
small, it has a tendency to amplify garbage collection problems in comparison.


Looking at your solr.in.sh files for each version ... the Solr 4 install 
appears to be setting the heap to 512 megabytes.  This is definitely not 
enough for millions of documents, and if this is what the heap size is 
actually set to, would almost certainly run into memory errors 
frequently and have absolutely terrible performance.  But you are saying 
that it works well, so I don't think the heap is actually set to 512 
megabytes.  Maybe the bin/solr script has been modified directly to set 
the memory size instead of setting it in solr.in.sh where it should be set.


Solr 6 has a heap size of just under 27 gigabytes.  Solr 8 has a heap 
size of just under 8 gigabytes.  With millions of documents, it is 
likely that 8GB of heap is not quite big enough.
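
For reference, the heap is normally pinned in solr.in.sh with one of the 
following (the value below is only a placeholder, not a recommendation):

SOLR_HEAP="8g"
# or, equivalently:
SOLR_JAVA_MEM="-Xms8g -Xmx8g"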


For each of your installations (Solr 4, Solr 6, and Solr 8) can you 
provide the screenshot described at this wiki page?


https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems#SolrPerformanceProblems-Askingforhelponamemory/performanceissue

It would also be helpful to see the GC logs from Solr 8.  We would need 
at least one GC log, making sure that it covers at least a few hours, 
including the timeframe when the slow indexing and slow queries were 
observed.


Thanks,
Shawn


RE: Position search

2019-10-15 Thread Kaminski, Adi
Hi Markus,
Thanks for the guidance.

Is there any official Solr documentation for that? I tried some googling; only 
some Stack Overflow / Lucene posts are available.

Also, will that approach work for the other use case of searching from the end 
of documents?
For example, if I need to perform some term search from the end, e.g. "book" in 
the last 30 or 100 words.

Is there a SpanLastQuery?

Thanks,
Adi

-Original Message-
From: Markus Jelsma 
Sent: Tuesday, October 15, 2019 11:04 AM
To: solr-user@lucene.apache.org
Subject: RE: Position search

Hello Adi,

Try SpanFirstQuery. It limits the search to within the Nth term in the field.

Regards,
Markus



-Original message-
> From:Kaminski, Adi 
> Sent: Tuesday 15th October 2019 8:25
> To: solr-user@lucene.apache.org
> Subject: Position search
>
> Hi,
> What's the recommended way to search in Solr (assuming 8.2 is used) for 
> specific terms/phrases/expressions while limiting the search from position 
> perspective.
> For example to search only in the first/last 100 words of the document ?
>
> Is there any built-in functionality for that ?
>
> Thanks in advance,
> Adi
>
>
> This electronic message may contain proprietary and confidential information 
> of Verint Systems Inc., its affiliates and/or subsidiaries. The information 
> is intended to be for the use of the individual(s) or entity(ies) named 
> above. If you are not the intended recipient (or authorized to receive this 
> e-mail for the intended recipient), you may not use, copy, disclose or 
> distribute to anyone this message or any information contained in this 
> message. If you have received this electronic message in error, please notify 
> us by replying to this e-mail.
>


This electronic message may contain proprietary and confidential information of 
Verint Systems Inc., its affiliates and/or subsidiaries. The information is 
intended to be for the use of the individual(s) or entity(ies) named above. If 
you are not the intended recipient (or authorized to receive this e-mail for 
the intended recipient), you may not use, copy, disclose or distribute to 
anyone this message or any information contained in this message. If you have 
received this electronic message in error, please notify us by replying to this 
e-mail.


RE: Solr 7.6 frequent OOM with Java 9, G1 and large heap sizes - any tests with Java 13 and the new ZGC?

2019-10-15 Thread Vassil Velichkov (Sensika)
Hi Shawn,

I've reduced the JVM heap on one of the shards to 20GB and then simulated some 
heavy load to reproduce the issue in a faster way.
The solr.log ROOT was set to TRACE level, but I can't really see anything 
meaningful, the solr.log ends @ 07:31:40.352 GMT, while the GC log shows later 
entries and "Pause Full (Allocation Failure)".
BTW, I've never seen in the GC logs any automatic attempts for Full GC. I can't 
see any OOME messages in any of the logs, only in the separate solr_oom_killer 
log, but this is the log of the killer script.

Also, to answer your previous questions:
1. We run completely stock Solr, not custom code, no plugins. 
Regardless, we never had such OOMs with Solr 4.x or Solr 6.x
2. It seems that Full GC is never triggered. In some cases in the past 
I've seen log entries for Full GC attempts, but the JVM crashes with OOM long 
before the Full GC could do anything.
3. On a side note - it seems that when a Solr query spans multiple 
shards (our sharding is by timePublished), the HTTP connections from the 
aggregation node to the other shards frequently time out @ 60 seconds, even 
though the Solr HTTP client request timeout is set dynamically to a much 
higher value (120-1200 seconds) and we've increased the timeout values in 
solr.xml for shardHandlerFactory (socketTimeout / connTimeout) to 1200 
seconds. In such cases, when we have inter-cluster aggregation timeouts, the 
end-users get "Error retrieving data" and they usually refresh the App, 
basically re-running the heavy Solr queries over and over again. I included a 
sample from the application logs below. This usage pattern might also make 
things worse - I don't know what happens within the shards when the 
aggregation fails due to timed-out inter-shard connections. If all the shards 
keep executing the queries passed from the aggregation node, they just waste 
resources, and all subsequent query re-runs just increase the resource 
consumption.
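
For reference, here is a sketch of the solr.xml stanza being described, with 
the values in milliseconds to match the 1200-second setting mentioned above:

<solr>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">1200000</int>
    <int name="connTimeout">1200000</int>
  </shardHandlerFactory>
</solr>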

>>> SOLR.LOG (last 1 minute)
2019-10-15 07:31:12.848 DEBUG (Connection evictor) [   ] 
o.a.s.u.s.InstrumentedPoolingHttpClientConnectionManager Closing expired 
connections
2019-10-15 07:31:12.848 DEBUG (Connection evictor) [   ] 
o.a.s.u.s.InstrumentedPoolingHttpClientConnectionManager Closing connections 
idle longer than 5 MILLISECONDS
2019-10-15 07:31:12.848 DEBUG (Connection evictor) [   ] 
o.a.s.u.s.InstrumentedPoolingHttpClientConnectionManager Closing expired 
connections
2019-10-15 07:31:40.352 DEBUG (Connection evictor) [   ] 
o.a.s.u.s.InstrumentedPoolingHttpClientConnectionManager Closing expired 
connections
2019-10-15 07:31:40.352 DEBUG (Connection evictor) [   ] 
o.a.s.u.s.InstrumentedPoolingHttpClientConnectionManager Closing connections 
idle longer than 5 MILLISECONDS

>>> SOLR_GC.LOG (last 1 minute)
[2019-10-15T10:32:07.509+0300][528.164s] GC(64) Pause Full (Allocation Failure)
[2019-10-15T10:32:07.539+0300][528.193s] GC(64) Phase 1: Mark live objects
[2019-10-15T10:32:16.785+0300][537.440s] GC(64) Cleaned string and symbol 
table, strings: 23724 processed, 0 removed, symbols: 149625 processed, 0 removed
[2019-10-15T10:32:16.785+0300][537.440s] GC(64) Phase 1: Mark live objects 
9246.644ms
[2019-10-15T10:32:16.785+0300][537.440s] GC(64) Phase 2: Compute new object 
addresses
[2019-10-15T10:32:23.065+0300][543.720s] GC(64) Phase 2: Compute new object 
addresses 6279.790ms
[2019-10-15T10:32:23.065+0300][543.720s] GC(64) Phase 3: Adjust pointers
[2019-10-15T10:32:28.905+0300][549.560s] GC(64) Phase 3: Adjust pointers 
5839.647ms
[2019-10-15T10:32:28.905+0300][549.560s] GC(64) Phase 4: Move objects
[2019-10-15T10:32:28.905+0300][549.560s] GC(64) Phase 4: Move objects 0.135ms
[2019-10-15T10:32:28.921+0300][549.576s] GC(64) Using 8 workers of 8 to rebuild 
remembered set
[2019-10-15T10:32:34.763+0300][555.418s] GC(64) Eden regions: 0->0(160)
[2019-10-15T10:32:34.763+0300][555.418s] GC(64) Survivor regions: 0->0(40)
[2019-10-15T10:32:34.763+0300][555.418s] GC(64) Old regions: 565->565
[2019-10-15T10:32:34.763+0300][555.418s] GC(64) Humongous regions: 75->75
[2019-10-15T10:32:34.763+0300][555.418s] GC(64) Metaspace: 
52093K->52093K(1097728K)
[2019-10-15T10:32:34.764+0300][555.418s] GC(64) Pause Full (Allocation Failure) 
20383M->20383M(20480M) 27254.340ms
[2019-10-15T10:32:34.764+0300][555.419s] GC(64) User=56.35s Sys=0.03s 
Real=27.26s

>>> solr_oom_killer-8983-2019-10-15_07_32_34.log
Running OOM killer script for process 953 for Solr on port 8983
Killed process 953

>>> JVM GC Settings
-Duser.timezone=UTC
-XX:+ParallelRefProcEnabled
-XX:+UseG1GC
-XX:+UseLargePages
-XX:ConcGCThreads=8
-XX:G1HeapRegionSize=32m
-XX:NewRatio=3
-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/log/solr
-XX:ParallelGCThreads=8
-XX:SurvivorRatio=4
-Xlog:gc*:file=/var/log/solr/solr_gc.log:time,uptime:filecount=9,filesize=20M
-Xms20480M
-Xmx20480M
-Xss256k

>>> App Log Sample Entries
[2019-10-15 00:01:06] PRD-01-WEB-04.ERROR [0.000580]: 

Minimum Tomcat version that supports latest Solr version

2019-10-15 Thread vikas shinde
Dear Solr team,

Which is the latest Tomcat version that supports the latest Solr version
8.2.0?

Also provide details about previous Solr versions & their compatible Tomcat
versions.


Thanks & Regards.
Vikas Shinde.


Re: Query on autoGeneratePhraseQueries

2019-10-15 Thread kshitij tyagi
Hi,

Try debugging your solr query and understand how it gets parsed. Try using
"debug=true" for the same

On Tue, Oct 15, 2019 at 12:58 PM Shubham Goswami 
wrote:

> *Hi all,*
>
> I am a beginner to the Solr framework and I am trying to implement the
> *autoGeneratePhraseQueries* property in a fieldtype of type=text_general.
> I set the property value to true and restarted the Solr server, but it is
> still not taking my two-word query (Black company) as a phrase without
> double quotes and is returning results only for Black.
>
> Can somebody please help me understand what I am missing?
> Following is my Schema.xml file code and I am using Solr 7.5 version.
> <fieldType name="text_general" class="solr.TextField"
> positionIncrementGap="100" multiValued="true"
> autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.SynonymGraphFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
>
> --
> *Thanks & Regards*
> Shubham Goswami
> Enterprise Software Engineer
> *HotWax Systems*
> *Enterprise open source experts*
> cell: +91-7803886288
> office: 0731-409-3684
> http://www.hotwaxsystems.com
>


RE: Position search

2019-10-15 Thread Markus Jelsma
Hello Adi,

Try SpanFirstQuery. It limits the search to within the Nth term in the field.
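
For illustration, a minimal Lucene-level sketch; the field name and term are 
made up, and Solr's XML Query Parser exposes the same query type as a 
SpanFirst element:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class SpanFirstExample {
    // Match "book" only when it occurs within the first 100 positions of the field.
    public static Query firstHundredWords() {
        SpanTermQuery book = new SpanTermQuery(new Term("content", "book"));
        return new SpanFirstQuery(book, 100);
    }
}

Note that Lucene has no SpanLastQuery; SpanFirstQuery is a special case of 
SpanPositionRangeQuery, which also counts positions from the start of the 
field.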

Regards,
Markus

 
 
-Original message-
> From:Kaminski, Adi 
> Sent: Tuesday 15th October 2019 8:25
> To: solr-user@lucene.apache.org
> Subject: Position search
> 
> Hi,
> What's the recommended way to search in Solr (assuming 8.2 is used) for 
> specific terms/phrases/expressions while limiting the search from position 
> perspective.
> For example to search only in the first/last 100 words of the document ?
> 
> Is there any built-in functionality for that ?
> 
> Thanks in advance,
> Adi
> 
> 
> This electronic message may contain proprietary and confidential information 
> of Verint Systems Inc., its affiliates and/or subsidiaries. The information 
> is intended to be for the use of the individual(s) or entity(ies) named 
> above. If you are not the intended recipient (or authorized to receive this 
> e-mail for the intended recipient), you may not use, copy, disclose or 
> distribute to anyone this message or any information contained in this 
> message. If you have received this electronic message in error, please notify 
> us by replying to this e-mail.
> 


Query on autoGeneratePhraseQueries

2019-10-15 Thread Shubham Goswami
*Hi all,*

I am a beginner to the Solr framework and I am trying to implement the
*autoGeneratePhraseQueries* property in a fieldtype of type=text_general.
I set the property value to true and restarted the Solr server, but it is
still not taking my two-word query (Black company) as a phrase without
double quotes and is returning results only for Black.

Can somebody please help me understand what I am missing?
Following is my Schema.xml file code and I am using Solr 7.5 version.


<fieldType name="text_general" class="solr.TextField"
    positionIncrementGap="100" multiValued="true"
    autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymGraphFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


-- 
*Thanks & Regards*
Shubham Goswami
Enterprise Software Engineer
*HotWax Systems*
*Enterprise open source experts*
cell: +91-7803886288
office: 0731-409-3684
http://www.hotwaxsystems.com


Position search

2019-10-15 Thread Kaminski, Adi
Hi,
What's the recommended way to search in Solr (assuming 8.2 is used) for 
specific terms/phrases/expressions while limiting the search from position 
perspective.
For example to search only in the first/last 100 words of the document ?

Is there any built-in functionality for that ?

Thanks in advance,
Adi


This electronic message may contain proprietary and confidential information of 
Verint Systems Inc., its affiliates and/or subsidiaries. The information is 
intended to be for the use of the individual(s) or entity(ies) named above. If 
you are not the intended recipient (or authorized to receive this e-mail for 
the intended recipient), you may not use, copy, disclose or distribute to 
anyone this message or any information contained in this message. If you have 
received this electronic message in error, please notify us by replying to this 
e-mail.