Re: Upload/use a plugin JAR in ZooKeeper

2019-07-18 Thread Richard Walker
On 19 Jul 2019, at 12:02 pm, Chee Yee Lim  wrote:
> Not sure if this is the recommended way, but I managed to use plugin JARs
> with Solr Cloud.
> 
> Either include the absolute path to JAR in solrconfig.xml, or put the JAR
> in a "lib" folder relative to your instanceDir. See the following text from
> solrconfig.xml.

As I already noted in my original message of 16 July:

> I've been able to get this to work the "simple" way,
> by putting the JAR in the file system, and specifying
> basic
> 
>  
>  
> 
> values in solrconfig.xml. No problem doing it this way.

... and that this is precisely what I do _not_ want to do,
unless I have to.

I want to use a JAR file uploaded to the collection's znode,
as the user guide strongly suggests is possible.
(And also again, no, I don't want to configure/use the Blob Store.)



Re: Upload/use a plugin JAR in ZooKeeper

2019-07-18 Thread Chee Yee Lim
Not sure if this is the recommended way, but I managed to use plugin JARs
with Solr Cloud.

Either include the absolute path to JAR in solrconfig.xml, or put the JAR
in a "lib" folder relative to your instanceDir. See the following text from
solrconfig.xml.

If a "./lib" directory exists in your instanceDir, all files found in it
are included as if you had used the following syntax...

    <lib dir="./lib" />

If you modified the solrconfig.xml or other config files, remember to
upload it into Solr using the ConfigSet API or via command line tools. And
create a collection that uses the custom configset.
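
The upload step mentioned above can be sketched with the Configsets API. This is a minimal sketch, assuming a Solr node at http://localhost:8983/solr and a configset called "myconfig" (both hypothetical); it only builds the UPLOAD request and leaves sending it to the caller. It gets config files into ZooKeeper, and says nothing about whether Solr will load a JAR from there.

```python
import urllib.parse
import urllib.request


def build_configset_upload(solr_url, name, zip_bytes):
    """Build (but do not send) a Configsets API UPLOAD request.

    zip_bytes is the content of a zipped configset directory.
    """
    query = urllib.parse.urlencode({"action": "UPLOAD", "name": name})
    return urllib.request.Request(
        url=f"{solr_url.rstrip('/')}/admin/configs?{query}",
        data=zip_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


# Hypothetical host and configset name; the empty body is a placeholder.
req = build_configset_upload("http://localhost:8983/solr", "myconfig", b"")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

A collection created afterwards can then name that configset in its CREATE call.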

Hope this helps.

On Fri, 19 Jul 2019 at 09:20, Richard Walker 
wrote:

> On 16 Jul 2019, at 4:14 pm, Richard Walker 
> wrote:
> > ...
> >
> > To be specific, I'm trying to use this idea:
> >
> > "Resources and plugins may be stored:
> > • in ZooKeeper under a collection’s configset node (SolrCloud only);"
> >
> > ...
> >
> > So far, so good. But now how do I refer to the JAR in solrconfig.xml?
> > The user guide doesn't really say.
> >
> > ...
> >
> > No success at all; I only get a ClassNotFoundException
> > for the plugin class.
> >
> > ...
>
> I've now found this earlier thread:
>
>
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201701.mbox/%3ccakhkodqv-y59+7m86ogvf1feqj6ieiogp8trhl1mg5fuajl...@mail.gmail.com%3e
>
> in which the second message (from Shawn Heisey) says:
>
> > I actually do not know what the path for lib directives is relative to
> > when running SolrCloud.  Most things in a core config are relative to
> > the location of the config file itself, but in this case, the config
> > file is not on the filesystem at all, it's in zookeeper, and I don't
> > think Solr can use jars in zookeeper.
>
> So is this the definitive answer? As I suggested in my
> earlier message, the documentation in the user guide at
> https://lucene.apache.org/solr/guide/8_1/resource-and-plugin-loading.html
> strongly suggests that you _can_ use plugin JARs uploaded
> to a collection's znode.
>
> Richard.
>
>


Re: Upload/use a plugin JAR in ZooKeeper

2019-07-18 Thread Richard Walker
On 16 Jul 2019, at 4:14 pm, Richard Walker  wrote:
> ...
> 
> To be specific, I'm trying to use this idea:
> 
> "Resources and plugins may be stored:
> • in ZooKeeper under a collection’s configset node (SolrCloud only);"
> 
> ...
> 
> So far, so good. But now how do I refer to the JAR in solrconfig.xml?
> The user guide doesn't really say.
> 
> ...
> 
> No success at all; I only get a ClassNotFoundException
> for the plugin class.
> 
> ...

I've now found this earlier thread:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201701.mbox/%3ccakhkodqv-y59+7m86ogvf1feqj6ieiogp8trhl1mg5fuajl...@mail.gmail.com%3e

in which the second message (from Shawn Heisey) says:

> I actually do not know what the path for lib directives is relative to
> when running SolrCloud.  Most things in a core config are relative to
> the location of the config file itself, but in this case, the config
> file is not on the filesystem at all, it's in zookeeper, and I don't
> think Solr can use jars in zookeeper.  

So is this the definitive answer? As I suggested in my
earlier message, the documentation in the user guide at
https://lucene.apache.org/solr/guide/8_1/resource-and-plugin-loading.html
strongly suggests that you _can_ use plugin JARs uploaded
to a collection's znode.

Richard.



Re: Re: Solr edismax parser with multi-word synonyms

2019-07-18 Thread Sunil Srinivasan
Hi Erick, 
Is there any way I can get it to match documents containing at least one of the
words of the original query? i.e. 'frozen' or 'dinner' or both. (But not
partial matches of the synonyms)
Thanks,
Sunil


-----Original Message-----
From: Erick Erickson 
To: solr-user 
Sent: Thu, Jul 18, 2019 04:42 AM
Subject: Re: Solr edismax parser with multi-word synonyms


This is not a phrase query, rather it’s requiring either pair of words
to appear in the title.

You’ve told it that “frozen dinner” and “microwave food” are synonyms.
So it’s looking for both the words “microwave” and “food” in the title field,
or “frozen” and “dinner” in the title field.

You’d see the same thing with single-word synonyms, albeit a little less
confusingly.


Best,
Erick


> On Jul 18, 2019, at 1:01 AM, kshitij tyagi  
> wrote:
> 
> Hi Sunil,
> 
> 1. Because you have added "microwave food" as a multi-word synonym of
> "frozen dinner", the edismax parser finds your synonym in the file and
> treats your query as a phrase query.
> 
> This is why you are seeing the parsed query as +(((+title:microwave
> +title:food) (+title:frozen +title:dinner))); "frozen dinner" is treated
> as a phrase here.
> 
> If you want partial matches on your query, you can add frozen dinner,
> microwave food, microwave, food to your synonym file, and you will see the
> parsed query as:
> "+(((+title:microwave +title:food) title:microwave title:food
> (+title:frozen +title:dinner)))"
> Another option is to write your own custom query parser and use it as a
> plugin.
> 
> Hope this helps!!
> 
> kshitij
> 
> 
> On Thu, Jul 18, 2019 at 9:14 AM Sunil Srinivasan  wrote:
> 
>> 
>> I have enabled the SynonymGraphFilter in my field configuration in order
>> to support multi-word synonyms (I am using Solr 7.6). Here is my field
>> configuration:
>> 
>>    
>>      
>>    
>> 
>>    
>>      
>>      > synonyms="synonyms.txt"/>
>>    
>> 
>> 
>> 
>> 
>> And this is my synonyms.txt file:
>> frozen dinner,microwave food
>> 
>> Scenario 1: blue shirt (query with no synonyms)
>> 
>> Here is my first Solr query:
>> 
>> http://localhost:8983/solr/base/search?q=blue+shirt=title=edismax=on
>> 
>> And this is the parsed query I see in the debug output:
>> +((title:blue) (title:shirt))
>> 
>> Scenario 2: frozen dinner (query with synonyms)
>> 
>> Now, here is my second Solr query:
>> 
>> http://localhost:8983/solr/base/search?q=frozen+dinner=title=edismax=on
>> 
>> And this is the parsed query I see in the debug output:
>> +(((+title:microwave +title:food) (+title:frozen +title:dinner)))
>> 
>> I am wondering why the first query looks for documents containing at least
>> one of the two query tokens, whereas the second query looks for documents
>> with both of the query tokens? I would understand if it looked for both the
>> tokens of the synonyms (i.e. both microwave and food) to avoid the
>> sausagization problem. But I would like to get partial matches on the
>> original query at least (i.e. it should also match documents containing
>> just the token 'dinner').
>> 
>> Would any one know why the behavior is different across queries with and
>> without synonyms? And how could I work around this if I wanted partial
>> matches on queries that also have synonyms?
>> 
>> Ideally, I would like the parsed query in the second case to be:
>> +(((+title:microwave +title:food) (title:frozen title:dinner)))
>> 
>> I'd appreciate any help with this. Thanks!
>> 


Re: Exception while adding data in multiple threads

2019-07-18 Thread Erick Erickson
I doubt multiple threads are the issue here. This looks a lot more like you’re
using SolrJ jars on the client that do not match the version running on Solr.

Best,
Erick

> On Jul 18, 2019, at 10:50 AM, Ashish Athavale 
>  wrote:
> 
> Hi,
> 
> I am getting the exception below while adding data to Solr. I am adding data
> concurrently in 20 threads, with 100 documents per batch per thread.
> Each document contains 40 fields, and all are indexed.
> This issue occurs only when I add from multiple threads.
> 
> Can you please help out here?
> 
> Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://10.88.66.154:8983/solr: Invalid version (expected 2, but 95) or the data in not in 'javabin' format
>     at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643)
>     at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
>     at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
>     at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
>     at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:106)
>     at org.springframework.data.solr.core.SolrTemplate.lambda$saveBeans$3(SolrTemplate.java:227)
>     at org.springframework.data.solr.core.SolrTemplate$$Lambda$649/753427667.doInSolr(Unknown Source)
>     at org.springframework.data.solr.core.SolrTemplate.execute(SolrTemplate.java:167)
> 
> Regards
> Ashish Athavale | Architect
> ashish_athav...@persistent.com| Cell: 
> +91-9881137580| Tel: +91-02067034708
> Persistent Systems Ltd. |  www.persistent.com
> 
> DISCLAIMER
> ==
> This e-mail may contain privileged and confidential information which is the 
> property of Persistent Systems Ltd. It is intended only for the use of the 
> individual or entity to which it is addressed. If you are not the intended 
> recipient, you are not authorized to read, retain, copy, print, distribute or 
> use this message. If you have received this communication in error, please 
> notify the sender and delete all copies of this message. Persistent Systems 
> Ltd. does not accept any liability for virus infected mails.



Exception while adding data in multiple threads

2019-07-18 Thread Ashish Athavale
Hi,

I am getting the exception below while adding data to Solr. I am adding data
concurrently in 20 threads, with 100 documents per batch per thread.
Each document contains 40 fields, and all are indexed.
This issue occurs only when I add from multiple threads.

Can you please help out here?

Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://10.88.66.154:8983/solr: Invalid version (expected 2, but 95) or the data in not in 'javabin' format
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:106)
    at org.springframework.data.solr.core.SolrTemplate.lambda$saveBeans$3(SolrTemplate.java:227)
    at org.springframework.data.solr.core.SolrTemplate$$Lambda$649/753427667.doInSolr(Unknown Source)
    at org.springframework.data.solr.core.SolrTemplate.execute(SolrTemplate.java:167)

Regards
Ashish Athavale | Architect
ashish_athav...@persistent.com| Cell: 
+91-9881137580| Tel: +91-02067034708
Persistent Systems Ltd. |  www.persistent.com



Re: Does commitWithin override autoSoftCommit?

2019-07-18 Thread Benjamin Mellish
Thank you for the reply. We are using Solr to maintain a state and kick off
certain processing steps on the file. I am trying to tighten up this
timing.  I understand this is not a great system but it's sort of what I'm
stuck with. I've been reading over the NRT documents and adjusting my
commitWithin time down to about 1 second. I'm getting pretty good results
this way.  Is there another suggestion about how I might get a single field
with a state out of a doc in NRT without adjusting these fields?  I forgot
to mention I'm using 4.10.

On Thu, Jul 18, 2019, 8:50 AM Shawn Heisey  wrote:

> On 7/18/2019 9:37 AM, Benjamin Mellish wrote:
> > I have a solrconfig.xml file as follows:
> >
> > 
> >  
> >  2000
> >  
> >  
> >  2
> >  false
> >  
> >  
> >  ${solr.ulog.dir:}
> >  
> > 
> >
> > But I also submit records with a 'commitWithin' of 10 seconds. It seems
> > that my documents are not searchable with the soft commit every second,
> > but rather with the commitWithin or the hard commit. Is the 'commitWithin'
> > setting taking precedence over the autoSoftCommit timer?
>
> With this config and a 10 second commitWithin, the autoSoftCommit is
> what will fire first -- two seconds after the first update completes.
>
> It could be that the commit takes so long to complete that it LOOKS like
> it's the larger timeframe.  So if a commit that opens a new searcher
> takes 10 seconds or longer to complete, it will start after two seconds
> (the autoSoftCommit value), but it will take at least 12 seconds total
> for users to actually see the change.
>
> Reducing or eliminating cache warming can help commits to complete
> faster.  Sometimes it is not possible to speed up the soft commit.
> Having low values for autoSoftCommit like two seconds is not recommended.
>
> Thanks,
> Shawn
>


HowtoConfigureIntelliJ link is broken

2019-07-18 Thread Richard Goodman
Hi there,

I went to set up the repo with IntelliJ, but it was having some problems
figuring out the source folders etc., so I went to navigate to the
following link 
as I remember from the past that there were a few commands that helped. However,
it appears to be broken. I used a website archiver to retrieve the original
contents, but wasn't sure whether this had already been raised.

Thanks,

-- 

Richard Goodman|Data Infrastructure engineer

richa...@brandwatch.com






Re: Returning multiple fields in graph streaming expression response documents

2019-07-18 Thread Joel Bernstein
Hi Ahmed,

Take a look at the fetch
https://lucene.apache.org/solr/guide/8_0/stream-decorator-reference.html#fetch

It probably makes sense to allow more fields to be returned from a nodes
expression as well.
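
As a rough illustration of the fetch suggestion, the sketch below wraps a nodes() expression in fetch() and URL-encodes it for a /stream request. It is a sketch under assumptions, not a tested query: the starting address user@example.org is hypothetical (the one in the thread is obfuscated), and on="node=to" assumes that each emitted node value should be matched against the "to" field of the emails collection. Whether that is the right join key depends on the schema.

```python
import urllib.parse


def fetch_expr(collection, inner, fl, on):
    """Wrap a streaming expression in fetch() to pull extra fields."""
    return f'fetch({collection}, {inner}, fl="{fl}", on="{on}")'


# Hypothetical starting address, used only for illustration.
nodes = 'nodes(emails, walk="user@example.org->from", gather="to")'

# Assumption: match the "node" field of each tuple to "to" in the collection.
expr = fetch_expr("emails", nodes, fl="subject", on="node=to")

# Form-encoded body for a POST to the collection's /stream handler.
body = urllib.parse.urlencode({"expr": expr})
print(expr)
```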

Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Jul 17, 2019 at 3:12 AM Ahmed Adel  wrote:

> Hi,
>
> Thank you for your reply. Could you give more details on the „join“
> operation, such as what the sides of the join and the joining condition
> would be in this case?
>
> Best regards,
> A.
>
> On Tue, Jul 16, 2019 at 2:02 PM markus kalkbrenner <
> markus.kalkbren...@biologis.com> wrote:
>
> >
> >
> > You have to perform a „join“ to get more fields.
> >
> > > Am 16.07.2019 um 13:52 schrieb Ahmed Adel :
> > >
> > > Hi,
> > >
> > > How can multiple fields be returned in graph traversal streaming
> > expression
> > > response documents? For example, the following query:
> > >
> > > nodes(emails,
> > >  walk="john...@apache.org->from",
> > >  gather="to")
> > >
> > >
> > > returns these documents in the response:
> > >
> > > {
> > >   "result-set": {
> > >     "docs": [
> > >       {
> > >         "node": "sl...@campbell.com",
> > >         "collection": "emails",
> > >         "field": "to",
> > >         "level": 1
> > >       },
> > >       {
> > >         "node": "catherine.per...@enron.com",
> > >         "collection": "emails",
> > >         "field": "to",
> > >         "level": 1
> > >       },
> > >       {
> > >         "node": "airam.arte...@enron.com",
> > >         "collection": "emails",
> > >         "field": "to",
> > >         "level": 1
> > >       },
> > >       {
> > >         "EOF": true,
> > >         "RESPONSE_TIME": 44
> > >       }
> > >     ]
> > >   }
> > > }
> > >
> > > How can the query above be modified to return more document fields,
> > > "subject" for example?
> > >
> > > Best regards,
> > >
> > > A.
> >
>


Re: Does commitWithin override autoSoftCommit?

2019-07-18 Thread Shawn Heisey

On 7/18/2019 9:37 AM, Benjamin Mellish wrote:

> I have a solrconfig.xml file as follows:
>
>
>  
>  2000
>  
>  
>  2
>  false
>  
>  
>  ${solr.ulog.dir:}
>  
>
>
> But I also submit records with a 'commitWithin' of 10 seconds. It seems
> that my documents are not searchable with the soft commit every second, but
> rather with the commitWithin or the hard commit. Is the 'commitWithin'
> setting taking precedence over the autoSoftCommit timer?


With this config and a 10 second commitWithin, the autoSoftCommit is 
what will fire first -- two seconds after the first update completes.


It could be that the commit takes so long to complete that it LOOKS like 
it's the larger timeframe.  So if a commit that opens a new searcher 
takes 10 seconds or longer to complete, it will start after two seconds 
(the autoSoftCommit value), but it will take at least 12 seconds total 
for users to actually see the change.
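
The arithmetic in the paragraph above can be written out as a small model. This is a deliberately simplified sketch: it assumes the commit starts when the earlier of the two timers expires and that the change becomes visible only once the new searcher has finished opening. The function and parameter names are made up for illustration.

```python
def seconds_until_visible(auto_soft_commit_s, commit_within_s, searcher_open_s):
    """Rough time from an update until a search can see it."""
    # Whichever timer expires first starts the commit.
    commit_starts = min(auto_soft_commit_s, commit_within_s)
    # The change is visible only after the new searcher is open.
    return commit_starts + searcher_open_s


# 2 s autoSoftCommit, 10 s commitWithin, 10 s to open the searcher
print(seconds_until_visible(2, 10, 10))  # 12
```

With a slow searcher open, the total lands close to or beyond the commitWithin value, which is why it can look as though commitWithin is the timer in charge.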


Reducing or eliminating cache warming can help commits to complete 
faster.  Sometimes it is not possible to speed up the soft commit. 
Having low values for autoSoftCommit like two seconds is not recommended.


Thanks,
Shawn


Does commitWithin override autoSoftCommit?

2019-07-18 Thread Benjamin Mellish
I have a solrconfig.xml file as follows:



2000


2
false


${solr.ulog.dir:}



But I also submit records with a 'commitWithin' of 10 seconds. It seems
that my documents are not searchable with the soft commit every second, but
rather with the commitWithin or the hard commit. Is the 'commitWithin'
setting taking precedence over the autoSoftCommit timer?


Re: Correct order of mappinCharFilter, Tokenizer and GermanStemFilter

2019-07-18 Thread Shawn Heisey

On 7/18/2019 3:01 AM, Doris Peter wrote:

> So, the mappingCharFilter seems to be executed first, no matter which
> position it has in the configuration?


CharFilters are always executed first.  Then one Tokenizer, then 
Filters.  This will always be the case, even if you order the config so 
that the Tokenizer and one or more Filters are listed before CharFilter 
entries.  It's one of the quirks of analysis definitions.
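
The fixed ordering described above can be illustrated with a toy pipeline. This is only a sketch: the components are simple stand-ins written for this example, and toy_stem in particular is not the GermanStemFilter algorithm (it knows a single word). It shows that a char filter listed last in the configuration still runs first, so the stemmer never sees "häuser".

```python
def analyze(text, components):
    """Apply components in the fixed order: charFilter, tokenizer, filter."""
    order = {"charFilter": 0, "tokenizer": 1, "filter": 2}
    value = text
    # sorted() is stable, so components of one kind keep their config order.
    for kind, fn in sorted(components, key=lambda c: order[c[0]]):
        value = fn(value)
    return value


def map_umlauts(s):
    # Stand-in for a MappingCharFilter rule mapping ä to ae.
    return s.replace("ä", "ae")


def whitespace_tokenize(s):
    return s.split()


def lowercase(tokens):
    return [t.lower() for t in tokens]


def toy_stem(tokens):
    # Toy stemmer for illustration only: stems just the one word "häuser".
    return ["haus" if t == "häuser" else t for t in tokens]


config_order = [
    ("tokenizer", whitespace_tokenize),
    ("filter", lowercase),
    ("filter", toy_stem),
    ("charFilter", map_umlauts),  # listed last, still applied first
]

print(analyze("Häuser", config_order))      # ['haeuser']
print(analyze("Häuser", config_order[:3]))  # ['haus']
```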


The fix for this would be to see if there is a regular Filter that does 
what the CharFilter you're using does and use that filter instead.


If it were me, I would likely use ICUFoldingFilterFactory rather than 
MappingCharFilterFactory.  The ICU analysis components do require 
installing contrib jars into Solr.


https://lucene.apache.org/solr/guide/8_1/filter-descriptions.html#icu-folding-filter

Thanks,
Shawn


Problem with confirming e-mail address for joining the user mailing list

2019-07-18 Thread Vassil Velichkov (Sensika)
Hi,

I am trying to join the Solr User mailing list, but I can’t confirm my e-mail 
address – following the instructions in the “Confirm subscribe” e-mail, I 
replied to the confirmation request e-mail, but I have been getting 
“Undeliverable” receipts since yesterday. In short - that’s what my mail server 
returns as an error:
Remote Server returned '501 Syntax error - Badly formatted address.'

Please, find attached both the “Undeliverable: Confirmation” and the request 
for confirmation e-mails.

Is there any other way to confirm my e-mail and my willingness to join the Solr 
User mailing list?

Best regards,
Vassil Velichkov
CTO, Sensika Technologies


--- Begin Message ---
Delivery has failed to these recipients or groups:

solr-user-sc.1563444943.hlpklibaogainglfpegm-vassil.velichkov=sensika@lucene.apache.org
 
(solr-user-sc.1563444943.hlpklibaogainglfpegm-vassil.velichkov=sensika@lucene.apache.org)
A problem occurred while delivering this message to this email address. Try 
sending this message again. If the problem continues, please contact your 
helpdesk.



The following organization rejected your message: Requested.







Diagnostic information for administrators:

Generating server: DGM-XCH-01.grammasystems.com


solr-user-sc.1563444943.hlpklibaogainglfpegm-vassil.velichkov=sensika@lucene.apache.org
Requested
Remote Server returned '501 Syntax error - Badly formatted address.'


Original message headers:

Received: from DGM-XCH-01.grammasystems.com (10.0.0.33) by
 DGM-XCH-01.grammasystems.com (10.0.0.33) with Microsoft SMTP Server (TLS) id
 15.0.1367.3; Thu, 18 Jul 2019 13:24:01 +0300
Received: from DGM-XCH-01.grammasystems.com ([::1]) by
 DGM-XCH-01.grammasystems.com ([::1]) with mapi id 15.00.1367.000; Thu, 18 Jul
 2019 13:24:01 +0300
From: "Vassil Velichkov (Sensika)" 
To:

"solr-user-sc.1563444943.hlpklibaogainglfpegm-vassil.velichkov=sensika@lucene.apache.org"


Subject: Confirmation
Thread-Topic: Confirmation
Thread-Index: AdU9UupBF+ZXBemERqeUzQFTC/Cflw==
Date: Thu, 18 Jul 2019 10:24:01 +
Message-ID: 
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-ms-exchange-transport-fromentityheader: Hosted
x-originating-ip: [91.209.8.202]
Content-Type: multipart/alternative;
boundary="_000_dc2bbdb7d5604ed5b90c7f1c1e826127DGMXCH01grammasystemsco_"
MIME-Version: 1.0

Reporting-MTA: dns; DGM-XCH-01.grammasystems.com

Final-recipient: RFC822;
 solr-user-sc.1563444943.hlpklibaogainglfpegm-vassil.velichkov=sensika.com@lucene.apache.org
Action: failed
Status: 5.0.0
Remote-MTA: dns; Requested
X-Supplementary-Info: 

--- Begin Message ---

--- End Message ---
--- End Message ---
--- Begin Message ---
Hi! This is the ezmlm program. I'm managing the
solr-user@lucene.apache.org mailing list.

I'm working for my owner, who can be reached
at solr-user-ow...@lucene.apache.org.

To confirm that you would like

   vassil.velich...@sensika.com

added to the solr-user mailing list, please send
a short reply to this address:

   
solr-user-sc.1563444943.hlpklibaogainglfpegm-vassil.velichkov=sensika@lucene.apache.org

Usually, this happens when you just hit the "reply" button.
If this does not work, simply copy the address and paste it into
the "To:" field of a new message.

or click here:

mailto:solr-user-sc.1563444943.hlpklibaogainglfpegm-vassil.velichkov=sensika@lucene.apache.org

This confirmation serves two purposes. First, it verifies that I am able
to get mail through to you. Second, it protects you in case someone
forges a subscription request in your name.

Please note that ALL Apache dev- and user- mailing lists are publicly
archived.  Do familiarize yourself with Apache's public archive policy at

http://www.apache.org/foundation/public-archives.html

prior to subscribing and posting messages to solr-user@lucene.apache.org.
If you're not sure whether or not the policy applies to this mailing list,
assume it does unless the list name contains the word "private" in it.

Some mail programs are broken and cannot handle long addresses. If you
cannot reply to this request, instead send a message to
 and put the
entire address listed above into the "Subject:" line.



Re: Solr edismax parser with multi-word synonyms

2019-07-18 Thread Erick Erickson
This is not a phrase query, rather it’s requiring either pair of words
to appear in the title.

You’ve told it that “frozen dinner” and “microwave food” are synonyms.
So it’s looking for both the words “microwave” and “food” in the title field,
or “frozen” and “dinner” in the title field.

You’d see the same thing with single-word synonyms, albeit a little less
confusingly.
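
To make the boolean structure concrete, here is a small sketch that evaluates the parsed query from the thread, and the one Sunil would like instead, against a few made-up titles. It treats a title as a set of lowercased words and ignores scoring, mm and analysis, so it is only a model of the must/should logic.

```python
def parsed_query_matches(title):
    """+(((+title:microwave +title:food) (+title:frozen +title:dinner)))"""
    words = set(title.lower().split())
    # Either pair must be present in full.
    return {"microwave", "food"} <= words or {"frozen", "dinner"} <= words


def desired_query_matches(title):
    """+(((+title:microwave +title:food) (title:frozen title:dinner)))"""
    words = set(title.lower().split())
    # The synonym pair must be complete; the original words match individually.
    return {"microwave", "food"} <= words or bool({"frozen", "dinner"} & words)


# Made-up titles for illustration.
for title in ["frozen dinner tray", "dinner plate", "microwave oven"]:
    print(title, parsed_query_matches(title), desired_query_matches(title))
```

Only "dinner plate" separates the two: the parsed query rejects it, while the desired query accepts it.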


Best,
Erick


> On Jul 18, 2019, at 1:01 AM, kshitij tyagi  
> wrote:
> 
> Hi Sunil,
> 
> 1. Because you have added "microwave food" as a multi-word synonym of
> "frozen dinner", the edismax parser finds your synonym in the file and
> treats your query as a phrase query.
> 
> This is why you are seeing the parsed query as +(((+title:microwave
> +title:food) (+title:frozen +title:dinner))); "frozen dinner" is treated
> as a phrase here.
> 
> If you want partial matches on your query, you can add frozen dinner,
> microwave food, microwave, food to your synonym file, and you will see the
> parsed query as:
> "+(((+title:microwave +title:food) title:microwave title:food
> (+title:frozen +title:dinner)))"
> Another option is to write your own custom query parser and use it as a
> plugin.
> 
> Hope this helps!!
> 
> kshitij
> 
> 
> On Thu, Jul 18, 2019 at 9:14 AM Sunil Srinivasan  wrote:
> 
>> 
>> I have enabled the SynonymGraphFilter in my field configuration in order
>> to support multi-word synonyms (I am using Solr 7.6). Here is my field
>> configuration:
>> 
>>
>>  
>>
>> 
>>
>>  
>>  > synonyms="synonyms.txt"/>
>>
>> 
>> 
>> 
>> 
>> And this is my synonyms.txt file:
>> frozen dinner,microwave food
>> 
>> Scenario 1: blue shirt (query with no synonyms)
>> 
>> Here is my first Solr query:
>> 
>> http://localhost:8983/solr/base/search?q=blue+shirt=title=edismax=on
>> 
>> And this is the parsed query I see in the debug output:
>> +((title:blue) (title:shirt))
>> 
>> Scenario 2: frozen dinner (query with synonyms)
>> 
>> Now, here is my second Solr query:
>> 
>> http://localhost:8983/solr/base/search?q=frozen+dinner=title=edismax=on
>> 
>> And this is the parsed query I see in the debug output:
>> +(((+title:microwave +title:food) (+title:frozen +title:dinner)))
>> 
>> I am wondering why the first query looks for documents containing at least
>> one of the two query tokens, whereas the second query looks for documents
>> with both of the query tokens? I would understand if it looked for both the
>> tokens of the synonyms (i.e. both microwave and food) to avoid the
>> sausagization problem. But I would like to get partial matches on the
>> original query at least (i.e. it should also match documents containing
>> just the token 'dinner').
>> 
>> Would any one know why the behavior is different across queries with and
>> without synonyms? And how could I work around this if I wanted partial
>> matches on queries that also have synonyms?
>> 
>> Ideally, I would like the parsed query in the second case to be:
>> +(((+title:microwave +title:food) (title:frozen title:dinner)))
>> 
>> I'd appreciate any help with this. Thanks!
>> 



Correct order of mappinCharFilter, Tokenizer and GermanStemFilter

2019-07-18 Thread Doris Peter
Hi, 

another problem with the stemming:

Most of our texts are in German, so we use the GermanStemFilterFactory. But we 
also use MappingCharFilterFactory where we map for example ä->ae.

But of course we want the stemming to turn for example 'häuser' into 'haus', 
which the GermanStemFilterFactory should do, according to the documentation.

At the moment, my configuration looks like this:


  






  


So, Stemming before CharFilter.

But the Solr Analyzer says:

MCF 0 h a e u s e r

Stage  text     raw_bytes               start  end  positionLength  type  termFrequency  position  keyword  payload
WT     haeuser  [68 61 65 75 73 65 72]  0      6    1               word  1              1
LCF    haeuser  [68 61 65 75 73 65 72]  0      6    1               word  1              1
GSF    haeu     [68 61 65 75]           0      6    1               word  1              1         false
DPTF   haeu     [68 61 65 75]           0      6    1               word  1              1         false
WDGF   haeu     [68 61 65 75]           0      6    1               word  1              1         false

So, the mappingCharFilter seems to be executed first, no matter which
position it has in the configuration?

The Solr documentation also says it should be placed before the Tokenizer:
https://lucene.apache.org/solr/guide/7_6/charfilterfactories.html
"CharFilters can be chained like Token Filters and placed in front of a 
Tokenizer."

But if the word häuser is changed to haeuser, the stemmer doesn't stem the word 
anymore :-/

Is there a way to solve this problem?

Thanks a lot, Doris




Problems with StemFilter and Wildcards

2019-07-18 Thread Doris Peter
Hi, we have some problems with the stemming of our OCR texts:

We use the following configuration for our full-text OCR field:

 
  






  



Now it seems, the StemFilter and wildcard queries don't work together.
When I search for 

Weltkriegs I get 6 documents.

But when I search for 

Weltkrie?s I get only 1 document.

For

wel?kriegs as well, only 1 document.


It happens only with terms which are changed by the stemming filter. Is there a 
way to fix this?


Thanks a lot, Doris
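
One likely explanation, not confirmed in this thread, is that wildcard terms are not passed through the stemmer at query time, while the indexed terms were. The sketch below models that with a toy stemmer that strips one trailing "s" (an assumption made for this example, not the real German stemming rules) and a glob match against the stemmed index term.

```python
from fnmatch import fnmatchcase


def toy_stem(term):
    # Toy stand-in for a German stemmer: strips one trailing "s".
    return term[:-1] if term.endswith("s") else term


# Term as stored in the index after stemming.
indexed = toy_stem("weltkriegs")

# A plain term query is analyzed too, so it is stemmed like the indexed text.
term_hit = toy_stem("weltkriegs") == indexed

# Assumption: the wildcard pattern is lowercased but not stemmed.
wildcard_hit = fnmatchcase(indexed, "weltkrie?s")

print(indexed, term_hit, wildcard_hit)
```

Under this model the wildcard can only hit documents whose indexed term kept the final "s", which would be consistent with getting a single document back.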



Re: Solr edismax parser with multi-word synonyms

2019-07-18 Thread kshitij tyagi
Hi Sunil,

1. Because you have added "microwave food" as a multi-word synonym of
"frozen dinner", the edismax parser finds your synonym in the file and
treats your query as a phrase query.

This is why you are seeing the parsed query as +(((+title:microwave
+title:food) (+title:frozen +title:dinner))); "frozen dinner" is treated
as a phrase here.

If you want partial matches on your query, you can add frozen dinner,
microwave food, microwave, food to your synonym file, and you will see the
parsed query as:
"+(((+title:microwave +title:food) title:microwave title:food
(+title:frozen +title:dinner)))"
Another option is to write your own custom query parser and use it as a
plugin.

Hope this helps!!

kshitij


On Thu, Jul 18, 2019 at 9:14 AM Sunil Srinivasan  wrote:

>
> I have enabled the SynonymGraphFilter in my field configuration in order
> to support multi-word synonyms (I am using Solr 7.6). Here is my field
> configuration:
> 
> 
>   
> 
>
> 
>   
>synonyms="synonyms.txt"/>
> 
> 
>
> 
>
> And this is my synonyms.txt file:
> frozen dinner,microwave food
>
> Scenario 1: blue shirt (query with no synonyms)
>
> Here is my first Solr query:
>
> http://localhost:8983/solr/base/search?q=blue+shirt=title=edismax=on
>
> And this is the parsed query I see in the debug output:
> +((title:blue) (title:shirt))
>
> Scenario 2: frozen dinner (query with synonyms)
>
> Now, here is my second Solr query:
>
> http://localhost:8983/solr/base/search?q=frozen+dinner=title=edismax=on
>
> And this is the parsed query I see in the debug output:
> +(((+title:microwave +title:food) (+title:frozen +title:dinner)))
>
> I am wondering why the first query looks for documents containing at least
> one of the two query tokens, whereas the second query looks for documents
> with both of the query tokens? I would understand if it looked for both the
> tokens of the synonyms (i.e. both microwave and food) to avoid the
> sausagization problem. But I would like to get partial matches on the
> original query at least (i.e. it should also match documents containing
> just the token 'dinner').
>
> Would any one know why the behavior is different across queries with and
> without synonyms? And how could I work around this if I wanted partial
> matches on queries that also have synonyms?
>
> Ideally, I would like the parsed query in the second case to be:
> +(((+title:microwave +title:food) (title:frozen title:dinner)))
>
> I'd appreciate any help with this. Thanks!
>
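
The request URLs quoted above seem to have lost their parameter names in the archive. The sketch below builds a comparable debug request, assuming the missing names were qf, defType and debugQuery. That is a guess based on the values "title", "edismax" and "on", not something the thread states.

```python
from urllib.parse import urlencode


def debug_query_url(base, q):
    """Build an edismax request with debug output enabled."""
    # Assumed parameter names; the archived URLs only show the values.
    params = {"q": q, "qf": "title", "defType": "edismax", "debugQuery": "on"}
    return f"{base}?{urlencode(params)}"


print(debug_query_url("http://localhost:8983/solr/base/search", "frozen dinner"))
```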