RE: Delete documents from the Solr index using SolrJ

2019-11-04 Thread Peter Lancaster
You can delete documents in SolrJ by using deleteByQuery. Using this you can 
delete any number of documents from your index or all your documents depending 
on the query you specify as the parameter. How you use it is down to your 
application.

You haven't said if your application performs a full re-index, but if so you
might find it useful to index a version number with your data which you
increment each time you perform the full indexing. Then you can increment the
version, re-index the data, and delete the data for the old version number.
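The increment-and-purge cycle described above might be sketched as follows. Only the query-building helper is shown as runnable code; the field name `indexVersion_l` and the SolrJ client calls in the comments are assumptions for illustration, not the poster's actual schema:

```java
public class VersionPurge {
    /** Range query matching every document indexed under an older version number. */
    static String oldVersionQuery(long currentVersion) {
        return "indexVersion_l:[* TO " + (currentVersion - 1) + "]";
    }

    public static void main(String[] args) {
        long version = 5; // incremented before each full re-index
        // After re-indexing everything with indexVersion_l = version, purge the rest:
        //   solrClient.deleteByQuery(oldVersionQuery(version));
        //   solrClient.commit();
        System.out.println(oldVersionQuery(version));
    }
}
```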


-Original Message-
From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
Sent: 04 November 2019 15:03
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] RE: Delete documents from the Solr index using SolrJ

Thanks!
Actually, I am working on a Java web application that uses SolrJ for Solr search.
The users would be uploading/editing/deleting the docs. What I have done is
define a location/directory where the docs are stored and pass that location
for indexing.
So I am quite confused about how to carry on with the solution that you proposed.
Please guide!

-Original Message-
From: David Hastings [mailto:hastings.recurs...@gmail.com]
Sent: 04 November 2019 20:10
To: solr-user@lucene.apache.org
Subject: Re: Delete documents from the Solr index using SolrJ

Delete them by query would do the trick, unless I'm missing something significant
in what you're trying to do here. You can just pass in an XML
command:
'".$kill_query."'
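For reference, the raw XML delete command David mentions has this shape (a sketch — with SolrJ you would normally call `deleteByQuery` rather than posting XML yourself; the helper below just builds the request body):

```java
public class DeleteCommand {
    /** Builds the raw XML body accepted by Solr's /update handler.
        Real code should XML-escape the query string. */
    static String deleteByQueryXml(String query) {
        return "<delete><query>" + query + "</query></delete>";
    }

    public static void main(String[] args) {
        // e.g. POST this to the core's /update endpoint, then commit
        System.out.println(deleteByQueryXml("id:20"));
    }
}
```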

On Mon, Nov 4, 2019 at 9:37 AM Khare, Kushal (MIND) < 
kushal.kh...@mind-infotech.com> wrote:

> In my case, the id won't be the same.
> Suppose I have a doc with id 20.
> Now, its newer version's id would be either 20.1 or 22. What happens in this case?
> -Original Message-
> From: David Hastings [mailto:hastings.recurs...@gmail.com]
> Sent: 04 November 2019 20:04
> To: solr-user@lucene.apache.org
> Subject: Re: Delete documents from the Solr index using SolrJ
>
> when you add a new document using the same "id" value as another, it
> just overwrites it
>
> On Mon, Nov 4, 2019 at 9:30 AM Khare, Kushal (MIND) <
> kushal.kh...@mind-infotech.com> wrote:
>
> > Could you please let me know how to achieve that ?
> >
> >
> > -Original Message-
> > From: Jörn Franke [mailto:jornfra...@gmail.com]
> > Sent: 04 November 2019 19:59
> > To: solr-user@lucene.apache.org
> > Subject: Re: Delete documents from the Solr index using SolrJ
> >
> > I don’t understand why it is not possible.
> >
> > However, why don't you simply overwrite the existing document instead
> > of add+delete?
> >
> > > Am 04.11.2019 um 15:12 schrieb Khare, Kushal (MIND) <
> > kushal.kh...@mind-infotech.com>:
> > >
> > > Hello mates!
> > > I want to know how we can delete documents from the Solr index.
> > > Suppose, for my system, I have a document that has been indexed; now
> > > its newer version is in use, so I want to use the latest one, and for
> > > that I want the previous one to be deleted from the index.
> > > Kindly help me find a way out!
> > > I went through many articles and blogs and got the methods for
> > > deleting, but not how to apply them in practice, because it's not
> > > feasible to delete by passing ids every time in a system of around
> > > 50,000 docs.
> > > Please suggest!
> > >
> > > 
> > >
> > > The information contained in this electronic message and any
> > > attachments
> > to this message are intended for the exclusive use of the
> > addressee(s) and may contain proprietary, confidential or privileged 
> > information.
> > If you are not the intended recipient, you should not disseminate,
> > distribute or copy this e-mail. Please notify the sender immediately
> > and destroy all copies of this message and any attachments. WARNING:
> > Computer viruses can be transmitted via email. The recipient should
> > check this email and any attachments for the presence of viruses.
> > The company accepts no liability for any damage caused by any
> > virus/trojan/worms/malicious code transmitted by this email.
> > www.motherson.com

RE: [EXTERNAL] Re: High cpu usage when adding documents to v7.7 solr cloud

2019-10-15 Thread Peter Lancaster
Hi Oleksandr,

Thanks very much for your help. Yes, that Jira looks like exactly our problem.

I'll give that a go tomorrow.

Cheers,
Peter.

-Original Message-
From: Oleksandr Drapushko [mailto:drapus...@gmail.com]
Sent: 15 October 2019 19:52
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: High cpu usage when adding documents to v7.7 solr cloud

Hi Peter,

This bug was introduced in Solr 7.7.0 and is related to Java 8. It was fixed in
Solr 7.7.2.

Here are the ways to deal with it:
1. Upgrade to Solr 7.7.2
2. Patch your Solr 7.7
3. Use Java 9+

You can read more on this here:
https://issues.apache.org/jira/browse/SOLR-13349


Regards,
Oleksandr

On Tue, Oct 15, 2019 at 8:31 PM Peter Lancaster < 
peter.lancas...@findmypast.com> wrote:

> We have a solr cloud on v7.7.0 and we observe very high cpu usage when
> we're indexing new documents.
>
> The solr cloud in question has 50 shards and 2 replicas of each and
> we're using NRT. Obviously indexing takes some resources but we see
> pretty much 100% cpu usage when we're indexing documents and we
> haven't seen this before on other v6.3.0 solr clouds indexing under a
> similar load. In the
> v7.7.0 cloud we're using nested child documents but other than that
> the set-ups are quite similar.
>
> For us performance is more important than having updates reflected in
> real-time and we have configured commits as follows:
> 
> 10
> 
> 180
> 30
> false
> 
> 
> -1
> false
> 
> 
> ${solr.data.dir:}
> 
> 
>
> I can observe the problem on a test server with 3 shards without any
> replication but the same schema and solr config. If I add a simple
> document like {Id:TEST01} through the document page in the solr admin
> UI, I immediately see 100% cpu usage on one core of the test server
> and this lasts for 300 seconds - the same time as the maxTime for
> autoCommit. If I then change the maxTime to say 10 seconds, then the
> high cpu usage lasts for just 10 seconds. I can't see anything being
> logged that would indicate what solr is using the cpu for.
>
> Have we made some error in our configuration or is this behaviour
> expected in v7? It just seems really odd that it's using loads of cpu
> just to add a single document and that the high usage lasts for the
> maxTime on the autocommit. I'm guessing that whatever is making the
> single document addition so inefficient is also affecting the
> performance of our live solr cloud and contributing to the 100% cpu
> usage that we observe when adding new documents. Any help, advice or insight 
> would be appreciated.
>
> Cheers,
> Peter Lancaster | Developer
> 
> This message is confidential and may contain privileged information.
> You should not disclose its contents to any other person. If you are
> not the intended recipient, please notify the sender named above
> immediately. It is expressly declared that this e-mail does not
> constitute nor form part of a contract or unilateral obligation.
> Opinions, conclusions and other information in this message that do
> not relate to the official business of findmypast shall be understood as 
> neither given nor endorsed by it.
> 
>





High cpu usage when adding documents to v7.7 solr cloud

2019-10-15 Thread Peter Lancaster
We have a solr cloud on v7.7.0 and we observe very high cpu usage when we're 
indexing new documents.

The solr cloud in question has 50 shards and 2 replicas of each and we're using 
NRT. Obviously indexing takes some resources but we see pretty much 100% cpu 
usage when we're indexing documents and we haven't seen this before on other 
v6.3.0 solr clouds indexing under a similar load. In the v7.7.0 cloud we're 
using nested child documents but other than that the set-ups are quite similar.

For us performance is more important than having updates reflected in real-time 
and we have configured commits as follows:

10

180
30
false


-1
false


${solr.data.dir:}
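The XML tag names were stripped from the settings above by the archive. In solrconfig.xml, commit settings of this kind normally take roughly the following shape — the tag names and the 300000 ms value are assumptions inferred from the 300 s commit window described below, not the poster's exact config:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>300000</maxTime>        <!-- ms; a 300 s hard-commit window -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>-1</maxTime>            <!-- soft commits disabled -->
  </autoSoftCommit>
  <updateLog>
    <str name="dir">${solr.data.dir:}</str>
  </updateLog>
</updateHandler>
```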



I can observe the problem on a test server with 3 shards without any 
replication but the same schema and solr config. If I add a simple document 
like {Id:TEST01} through the document page in the solr admin UI, I immediately
see 100% cpu usage on one core of the test server and this lasts for 300 
seconds - the same time as the maxTime for autoCommit. If I then change the 
maxTime to say 10 seconds, then the high cpu usage lasts for just 10 seconds. I 
can't see anything being logged that would indicate what solr is using the cpu 
for.

Have we made some error in our configuration or is this behaviour expected in 
v7? It just seems really odd that it's using loads of cpu just to add a single 
document and that the high usage lasts for the maxTime on the autocommit. I'm 
guessing that whatever is making the single document addition so inefficient is 
also affecting the performance of our live solr cloud and contributing to the 
100% cpu usage that we observe when adding new documents. Any help, advice or 
insight would be appreciated.

Cheers,
Peter Lancaster | Developer




RE: Geofilt and distance measurement problems using SpatialRecursivePrefixTreeFieldType field type

2018-12-21 Thread Peter Lancaster
Hi David,

Ignore my previous reply.

I think you've supplied the answer. Yes we do need to use a space to index 
points in an rpt field, but when we do that the order is flipped from Lat,Lon 
to Lon Lat, so we need to re-index our data. In my defence that is far from 
obvious in the documentation.

Thanks again for your help.

Cheers,
Peter.
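To make the flipped ordering concrete, here are the two shapes side by side (the field names are illustrative, not from the poster's schema):

```text
LatLonType field       "53.409490,-2.979677"    lat,lon — comma-separated
RPT (location_rpt)     "-2.979677 53.409490"    lon lat ("x y") — space-separated
```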

-Original Message-
From: David Smiley [mailto:david.w.smi...@gmail.com]
Sent: 21 December 2018 04:44
To: solr-user@lucene.apache.org
Subject: Re: Geofilt and distance measurement problems using 
SpatialRecursivePrefixTreeFieldType field type

Hi Peter,

Use of an RPT field for distance sorting/boosting is to be avoided where 
possible because it's very inefficient at this specific use-case.  Simply use 
LatLonType for this task, and continue to use RPT for the filter/search 
use-case.

Also I see you putting a space between the coordinates instead of a
comma...   yet you have geo (latitude & longitude data) so this is a bit
confusing.  Do "lat,lon".  I think a space will be interpreted as "x y"
(thus reversed).  Perhaps you've mixed up the coordinates and this explains the 
error?  A quick lookup of your sample coordinates suggests to me this is likely 
the problem.  It's a common mistake.

BTW this:
maxDistErr="0.2" distanceUnits="kilometers"
means 200m accuracy (or better).  Is this what you want?  Just checking.

~ David

On Thu, Dec 13, 2018 at 6:38 AM Peter Lancaster < 
peter.lancas...@findmypast.com> wrote:

> I am currently using Solr 5.5.2 and implementing a GeoSpatial search
> that returns results within a radius in Km of a specified LatLon.
> Using a field of type solr.LatLonType and a geofilt query this gives
> good results but is much slower than our regular queries. Using a bbox
> query is faster but of course less accurate.
>
> I then attempted to use a field of type
> solr.SpatialRecursivePrefixTreeFieldType to check performance and
> because I want to be able to do searches within a polygon eventually.
> The field is defined as follows
>
> <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
> spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
> geo="true" distErrPct="0.05" maxDistErr="0.2"
> distanceUnits="kilometers" autoIndex="true"/>
>
> <field name="LatLonRPT__location_rpt" type="location_rpt" indexed="true"
> stored="true" multiValued="false" omitNorms="true" />
>
> I'm just using it to index single points right now. The problem is
> that the distance calculation is not working correctly. It seems to
> overstate the distances for differences in longitude.
>
> For example a query for
> &fl=Id,LatLonRPT__location_rpt,_dist_:geodist()&sfield=LatLonRPT__loca
> tion_rpt&pt=53.409490 -2.979677&query={!geofilt
> sfield=LatLonRPT__location_rpt pt="53.409490 -2.979677" d=25} returns
>
> {
> "Id": "HAR/CH1/80763270",
> "LatLonRPT__location_rpt": "53.2 -2.91",
> "_dist_": 24.295607
> },
> {
> "Id": "HAR/CH42/1918283949",
> "LatLonRPT__location_rpt": "53.393239 -3.028859",
> "_dist_": 5.7587695
> }
>
> The true distances for these results are 23.67 and 3.73 km and other
> results at a true distance of 17 km aren't returned within the 25 km radius.
>
> The explain has the following
>
> +IntersectsPrefixTreeQuery(IntersectsPrefixTreeQuery(fieldName=LatLonR
> +PT__location_rpt,queryShape=Circle(Pt(x=53.40949,y=-2.979677),
> d=0.2° 25.00km),detailLevel=6,prefixGridScanLevel=7))
>
> Is my set up incorrect in some way or is the
> SpatialRecursivePrefixTreeFieldType not suitable for doing radius
> searches on points in this way?
>
> Thanks in anticipation for any suggestions.
>
> Peter Lancaster.
>
> 
> 
>
> __
> 
>
> This email has been checked for virus and other malicious content
> prior to leaving our network.
> ___

RE: Geofilt and distance measurement problems using SpatialRecursivePrefixTreeFieldType field type

2018-12-21 Thread Peter Lancaster
Hi David,

Thanks for coming back to me.

When using rpt fields I believe you do need to use a space between Lat and Lon 
to indicate a point; for rpt fields commas are used to separate points in a 
polygon. See 
https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-5.5.pdf
 bottom of page 364.

We do have the Lat Lons the correct way round, and the calculations work just
fine when using a LatLonType; it's just very slow, which is why I was trying out
the rpt field type to see how it compared. Yes, it's faster using rpt, but the
distance calculations just don't seem to work.

Of course, it's not just the distance calculation that's not working; the
filtering doesn't work correctly as a consequence, so e.g. fq={!geofilt
sfield=LatLonRPT__location_rpt pt="53.409490 -2.979677" d=25} omits Lat Lons
that are, say, only 17 km away. It appears that it's just not using a correct
haversine calculation for distances.

For the moment I'm having to use a point type to filter on because rpt doesn't 
calculate distances correctly, and I'm having to use a bounding box query 
rather than a radius because the performance isn't good enough with radius.

Just wondering if there's something else wrong with our set up which is causing 
this behaviour for rpt fields.

Cheers,
Peter.


-Original Message-
From: David Smiley [mailto:david.w.smi...@gmail.com]
Sent: 21 December 2018 04:44
To: solr-user@lucene.apache.org
Subject: Re: Geofilt and distance measurement problems using 
SpatialRecursivePrefixTreeFieldType field type

Hi Peter,

Use of an RPT field for distance sorting/boosting is to be avoided where 
possible because it's very inefficient at this specific use-case.  Simply use 
LatLonType for this task, and continue to use RPT for the filter/search 
use-case.

Also I see you putting a space between the coordinates instead of a
comma...   yet you have geo (latitude & longitude data) so this is a bit
confusing.  Do "lat,lon".  I think a space will be interpreted as "x y"
(thus reversed).  Perhaps you've mixed up the coordinates and this explains the 
error?  A quick lookup of your sample coordinates suggests to me this is likely 
the problem.  It's a common mistake.

BTW this:
maxDistErr="0.2" distanceUnits="kilometers"
means 200m accuracy (or better).  Is this what you want?  Just checking.

~ David

On Thu, Dec 13, 2018 at 6:38 AM Peter Lancaster < 
peter.lancas...@findmypast.com> wrote:

> I am currently using Solr 5.5.2 and implementing a GeoSpatial search
> that returns results within a radius in Km of a specified LatLon.
> Using a field of type solr.LatLonType and a geofilt query this gives
> good results but is much slower than our regular queries. Using a bbox
> query is faster but of course less accurate.
>
> I then attempted to use a field of type
> solr.SpatialRecursivePrefixTreeFieldType to check performance and
> because I want to be able to do searches within a polygon eventually.
> The field is defined as follows
>
> <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
> spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
> geo="true" distErrPct="0.05" maxDistErr="0.2"
> distanceUnits="kilometers" autoIndex="true"/>
>
> <field name="LatLonRPT__location_rpt" type="location_rpt" indexed="true"
> stored="true" multiValued="false" omitNorms="true" />
>
> I'm just using it to index single points right now. The problem is
> that the distance calculation is not working correctly. It seems to
> overstate the distances for differences in longitude.
>
> For example a query for
> &fl=Id,LatLonRPT__location_rpt,_dist_:geodist()&sfield=LatLonRPT__loca
> tion_rpt&pt=53.409490 -2.979677&query={!geofilt
> sfield=LatLonRPT__location_rpt pt="53.409490 -2.979677" d=25} returns
>
> {
> "Id": "HAR/CH1/80763270",
> "LatLonRPT__location_rpt": "53.2 -2.91",
> "_dist_": 24.295607
> },
> {
> "Id": "HAR/CH42/1918283949",
> "LatLonRPT__location_rpt": "53.393239 -3.028859",
> "_dist_": 5.7587695
> }
>
> The true distances for these results are 23.67 and 3.73 km and other
> results at a true distance of 17 km aren't returned within the 25 km radius.
>
> The explain has the following
>
> +IntersectsPrefixTreeQuery(IntersectsPrefixTreeQuery(fieldName=LatLonR
> +PT__location_rpt,queryShape=Circle(Pt(x=53.40949,y=-2.979677),
> d=0.2° 25.00km),detailLevel=6,prefixGridScanLevel=7))
>
> Is my set up incorrect in some way or is the
> 

RE: terms not to match in a search query

2018-12-14 Thread Peter Lancaster
Hi Tanya,

I think you can have a stop filter applied to the query for your field type.


...


You should be able to use the length filter for the second part of your
question.



Cheers,
Peter.
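The snippets in the message above were lost by the archive. A rough sketch of the two filters Peter mentions — the field type name, stopword file, and length bounds here are assumptions:

```xml
<fieldType name="text_filtered" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- drop blacklisted terms such as "pvt", "ltd" at query time -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <!-- ignore tokens shorter than 3 characters -->
    <filter class="solr.LengthFilterFactory" min="3" max="255"/>
  </analyzer>
</fieldType>
```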


-Original Message-
From: Tanya Bompi [mailto:tanya.bo...@gmail.com]
Sent: 13 December 2018 19:54
To: solr-user@lucene.apache.org
Subject: terms not to match in a search query

Hi,
  If there are certain terms in the query, like "pvt" and "ltd", which I wouldn't
want to be matched against the index, is there a way to specify a list of such
words in the configuration so that they are not made part of the query?

Say, is it possible to add the terms to stopwords.txt or any other file that
could be treated as a blacklist and applied at query time?

Also, is there a configuration setting for the minimum length of words used in
matching when retrieving documents? Basically, any word of length < 3 after
tokenization should be ignored.

Kindly let me know.

Thanks,
Tanya




Geofilt and distance measurement problems using SpatialRecursivePrefixTreeFieldType field type

2018-12-13 Thread Peter Lancaster
I am currently using Solr 5.5.2 and implementing a GeoSpatial search that 
returns results within a radius in Km of a specified LatLon. Using a field of 
type solr.LatLonType and a geofilt query this gives good results but is much 
slower than our regular queries. Using a bbox query is faster but of course 
less accurate.

I then attempted to use a field of type 
solr.SpatialRecursivePrefixTreeFieldType to check performance and because I 
want to be able to do searches within a polygon eventually. The field is 
defined as follows:

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
  spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
  geo="true" distErrPct="0.05" maxDistErr="0.2"
  distanceUnits="kilometers" autoIndex="true"/>

<field name="LatLonRPT__location_rpt" type="location_rpt" indexed="true"
  stored="true" multiValued="false" omitNorms="true" />

I'm just using it to index single points right now. The problem is that the 
distance calculation is not working correctly. It seems to overstate the 
distances for differences in longitude.

For example a query for 
&fl=Id,LatLonRPT__location_rpt,_dist_:geodist()&sfield=LatLonRPT__location_rpt&pt=53.409490
 -2.979677&query={!geofilt sfield=LatLonRPT__location_rpt pt="53.409490 
-2.979677" d=25} returns

{
"Id": "HAR/CH1/80763270",
"LatLonRPT__location_rpt": "53.2 -2.91",
"_dist_": 24.295607
},
{
"Id": "HAR/CH42/1918283949",
"LatLonRPT__location_rpt": "53.393239 -3.028859",
"_dist_": 5.7587695
}

The true distances for these results are 23.67 and 3.73 km and other results at 
a true distance of 17 km aren't returned within the 25 km radius.
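As a check, the "true distances" above can be reproduced with a plain haversine calculation (mean Earth radius; the small differences from the quoted figures come down to the Earth model used):

```java
public class Haversine {
    static final double R_KM = 6371.0088; // mean Earth radius in km

    /** Great-circle distance between two lat/lon points in kilometres. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * R_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // query centre point vs the two stored points from the results above
        // message quotes 23.67 km; Solr's geodist gave 24.30
        System.out.printf("%.2f km%n", distanceKm(53.409490, -2.979677, 53.2, -2.91));
        // message quotes 3.73 km; Solr's geodist gave 5.76
        System.out.printf("%.2f km%n", distanceKm(53.409490, -2.979677, 53.393239, -3.028859));
    }
}
```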

The explain has the following

+IntersectsPrefixTreeQuery(IntersectsPrefixTreeQuery(fieldName=LatLonRPT__location_rpt,queryShape=Circle(Pt(x=53.40949,y=-2.979677),
 d=0.2° 25.00km),detailLevel=6,prefixGridScanLevel=7))

Is my set up incorrect in some way or is the 
SpatialRecursivePrefixTreeFieldType not suitable for doing radius searches on 
points in this way?

Thanks in anticipation for any suggestions.

Peter Lancaster.



RE: Filtering solr suggest results

2018-07-03 Thread Peter Lancaster
Hi Arunan,

You can use a context filter query, as described at
https://lucene.apache.org/solr/guide/6_6/suggester.html

Cheers,
Peter.

-Original Message-
From: Arunan Sugunakumar [mailto:arunans...@cse.mrt.ac.lk]
Sent: 03 July 2018 12:17
To: solr-user@lucene.apache.org
Subject: Filtering solr suggest results

Hi,

I would like to know whether it is possible to filter the suggestions returned 
by the suggest component according to a field. For example I have a list of 
books published by different publications. I want to show suggestions for a 
book title under a specific publication.

Thanks in Advance,

Arunan

*Sugunakumar Arunan*
Undergraduate - CSE | UOM

Email : aruna ns...@cse.mrt.ac.lk




RE: Query redg : diacritics in keyword search

2018-03-29 Thread Peter Lancaster
Hi,

You don't say whether the AsciiFolding filter is at index time or query time. 
In any case you can easily look at what's happening using the admin analysis 
tool which helpfully will even highlight where the analysed query and index 
token match.

That said, I'd expect what you want to work if you simply use the AsciiFolding filter on both index and query.

Cheers,
Peter.
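A sketch of what Peter suggests — the same folding filter applied at both index and query time (the field type name and tokenizer are assumptions):

```xml
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer> <!-- no type attribute: used for both indexing and querying -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- folds Carré → Carre in both the index and the query -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```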

-Original Message-
From: Paul, Lulu [mailto:lulu.p...@bl.uk]
Sent: 29 March 2018 12:03
To: solr-user@lucene.apache.org
Subject: Query redg : diacritics in keyword search

Hi,

The keyword search Carré returns values Carré and Carre (this works well, as I
added the tokenizer in the schema config to enable returning both sets of
values).

Now it looks like we want Carre to return both Carré and Carre (and this doesn't
work; Solr only returns Carre) – any ideas on how this scenario can be achieved?

Thanks & Best Regards,
Lulu Paul



**
Experience the British Library online at www.bl.uk The 
British Library’s latest Annual Report and Accounts : 
www.bl.uk/aboutus/annrep/index.html
Help the British Library conserve the world's knowledge. Adopt a Book. 
www.bl.uk/adoptabook
The Library's St Pancras site is WiFi - enabled
*
The information contained in this e-mail is confidential and may be legally 
privileged. It is intended for the addressee(s) only. If you are not the 
intended recipient, please delete this e-mail and notify the 
postmas...@bl.uk : The contents of this e-mail must 
not be disclosed or copied without the sender's consent.
The statements and opinions expressed in this message are those of the author 
and do not necessarily reflect those of the British Library. The British 
Library does not take any responsibility for the views of the author.
*
Think before you print




RE: How to escape OR or any other keyword in solr

2018-03-27 Thread Peter Lancaster
Hi Raunak,

Are you using a stop word file? That might be why you're getting 0 results 
searching for "OR".

Cheers,
Peter.

-Original Message-
From: RAUNAK AGRAWAL [mailto:agrawal.rau...@gmail.com]
Sent: 27 March 2018 07:45
To: solr-user@lucene.apache.org
Subject: How to escape OR or any other keyword in solr

I have to search for state "OR" [short form for Oregon]. When I make the query
state:OR, I get a SolrException since it is recognised as a keyword.

Now I tried with quotes ("") or //OR as well; when doing so, Solr doesn't give an
exception, but it also doesn't return any matching document.

Kindly let me know what is the workaround for this issue?

Thanks




RE: Got unexpected results.

2018-01-15 Thread Peter Lancaster
Shouldn't the query just be something like title: "to order this report" and 
then it will work.


-Original Message-
From: Sanjeet Kumar [mailto:sanjeetkumar...@gmail.com]
Sent: 15 January 2018 06:20
To: solr-user@lucene.apache.org
Subject: Got unexpected results.

Hi,

I am using Solr 6.4.2. I did a query (*title*:("to order this report"~*0*)) on a
"*text_en*" field and it matched ("title":"Forrester Research cites SAP Hybris as
a leader in B2B Order Management report").

As per my understanding, this should not match, as there is a word "Management"
between "Order" and "report". Can somebody explain this?

Thanks.




RE: Search suggester - threshold parameter

2017-11-17 Thread Peter Lancaster
Hi Ruby,

The documentation says that threshold is available for the 
HighFrequencyDictionaryFactory implementation. Since you're using 
DocumentDictionaryFactory I guess it will be ignored.

Cheers,
Peter.
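For comparison, a dictionary that does honor threshold would be configured roughly like this (the suggester name, field, and threshold value are taken from the question below; the tag names follow the stock Solr suggester config):

```xml
<lst name="suggester">
  <str name="name">mySuggester</str>
  <str name="lookupImpl">FuzzyLookupFactory</str>
  <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
  <str name="field">title</str>
  <!-- ignore terms appearing in fewer than 0.5% of documents -->
  <float name="threshold">0.005</float>
</lst>
```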

-Original Message-
From: ruby [mailto:rshoss...@gmail.com]
Sent: 17 November 2017 15:41
To: solr-user@lucene.apache.org
Subject: Search suggester - threshold parameter

Does any of the phrase suggesters in Solr 6.1 honor the threshold parameter?

I made the following changes to enable phrase suggestion in my environment. I
played with different threshold values, but it looks like the parameter is not
being used.


  
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="indexPath">suggester_fuzzy_dir</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">suggestType</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
    <float name="threshold">0.005</float>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html




RE: Phrase suggester - field limit and order

2017-11-09 Thread Peter Lancaster
Hi,

The weight field in combination with the BlenderType will determine the order, 
so yes you can control the order.

I don't think you can return only the matched phrase, but I would guess that 
highlighting would enable you to pick off the phrase that was matched in your 
client.
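For example, a sketch of your suggester with a blenderType added (position_linear is one of the supported values; position_reciprocal weighs matches nearer the start of the field more heavily):

  <lst name="suggester">
    <str name="name">AnalyzingInfixSuggester</str>
    <str name="lookupImpl">BlendedInfixLookupFactory</str>
    <str name="blenderType">position_linear</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="weightField">price</str>
    <str name="suggestAnalyzerFieldType">text_en</str>
  </lst>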

Cheers,
Peter.


-Original Message-
From: ruby [mailto:rshoss...@gmail.com]
Sent: 09 November 2017 19:29
To: solr-user@lucene.apache.org
Subject: Phrase suggester - field limit and order

I'm using the BlendedInfixLookupFactory to get phrase suggestions. It returns 
the entire field content. I've tried the others and they do the same.

  <lst name="suggester">
    <str name="name">AnalyzingInfixSuggester</str>
    <str name="lookupImpl">BlendedInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="weightField">price</str>
    <str name="suggestAnalyzerFieldType">text_en</str>
  </lst>


Is there a way to only return a fraction of the phrase containing the matched 
phrase? Also is there a way to control in which order the suggestions are 
returned?

Thanks







RE: Solr 6 and IDF

2017-08-08 Thread Peter Lancaster
Hi Webster,

If you're not worried about using the BM25 similarity then you should be able to 
continue as before: provide your own similarity class that extends 
ClassicSimilarity, override the idf method to always return 1, and then 
reference that class in your schema, e.g. (class name here is illustrative):

  <similarity class="com.example.NoIdfSimilarity"/>
As far as I know you've been able to have different similarities per field in 
solr for a while now. https://wiki.apache.org/solr/SchemaXml#Similarity
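To make that concrete, a minimal sketch of such a class (package and class name are placeholders, and the idf signature shown is the Lucene 6 one):

  package com.example;

  import org.apache.lucene.search.similarities.ClassicSimilarity;

  public class NoIdfSimilarity extends ClassicSimilarity {
      @Override
      public float idf(long docFreq, long docCount) {
          return 1.0f; // neutralize IDF: every term contributes equally
      }
  }

Compile it against the Lucene jars matching your Solr version, drop the jar into Solr's lib directory, and reference it with a <similarity> element either globally or on an individual fieldType.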

Cheers,
Peter Lancaster.


-Original Message-
From: Webster Homer [mailto:webster.ho...@sial.com]
Sent: 08 August 2017 20:39
To: solr-user@lucene.apache.org
Subject: Solr 6 and IDF

Our most common use for Solr is searching for products, not text search. My 
company is in the process of migrating away from an Endeca search engine, and to 
keep the business happy the goal is to make sure that search results from the 
two engines are fairly similar. One factor we have found that keeps a result 
from ranking as well as it did in the old system is IDF.

We are using Solr 6. After moving to it a lot of our results got better, but IDF 
still seems to deaden some results. Given that our focus is product searching I 
really don't see a need for IDF at all. Prior to Solr 6 you could suppress IDF 
by providing a custom similarity class. Looking over the newer documentation a 
lot of things have improved, but I don't see a simple way to turn off IDF in 
Solr 6's default BM25 similarity.

How do I disable IDF in Solr 6?

We also have needs for text searching, so it would be nice if we could suppress 
IDF at a field or schema level.

--


This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, you 
must not copy this message or attachment or disclose the contents to any other 
person. If you have received this transmission in error, please notify the 
sender immediately and delete the message and any attachment from your system. 
Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept 
liability for any omissions or errors in this message which may arise as a 
result of E-Mail-transmission or for damages resulting from any unauthorized 
changes of the content of this message and any attachment thereto. Merck KGaA, 
Darmstadt, Germany and any of its subsidiaries do not guarantee that this 
message is free of viruses and does not accept liability for any damages caused 
by any virus transmitted therewith.

Click http://www.emdgroup.com/disclaimer to access the German, French, Spanish 
and Portuguese versions of this disclaimer.




RE: SolrCloud - leader updates not updating followers

2017-08-08 Thread Peter Lancaster
Hi Erik,

Thanks for your quick reply. It's given me a few things to research and work on 
tomorrow.

In the meantime, in case it triggers any other thoughts, just to say that our 
autoCommit settings are:

180
30


1


When I ingest data I don't see the documents arriving at the follower in its log.

It really seems like the data isn't being sent from the leader. As I said it 
could easily be something stupid that I've done along the way but I can't see 
what it is.

Thanks again,
Peter.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 08 August 2017 18:23
To: solr-user 
Subject: Re: SolrCloud - leader updates not updating followers

This _better_ be a problem with your configuration or all my assumptions are 
false ;)

What are you autocommit settings? The documents should be forwarded to each 
replica from the leader during ingestion. However, they are not visible on the 
follower until a hard commit(openSearcher=true) or soft commit is triggered. 
Long blog on all this here:

https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

An easy way to check that docs are being sent to the follower is to tail 
solr.log and send a doc to the leader. You should see the doc arrive at the 
follower. If you do see that then it's your autocommit settings.

NOTE: Solr promises eventual consistency. Due to the fact that autocommits will 
execute at different wall-clock times on the leaders and followers you can be 
out of sync by up to your autocommit interval.

You can force a commit with a URL like 
"http://solr:port/solr/collection/update?commit=true", which commits on all 
replicas in the collection; that may also be useful for seeing whether it's an 
autocommit issue.
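As a quick diagnostic along those lines (host, port, shard and collection names below are placeholders for your own), something like this compares per-replica counts and then forces a commit:

  # Count docs on each replica directly; distrib=false bypasses cloud routing
  # so you see exactly what that core holds.
  curl "http://solr1:8983/solr/mycollection_shard1_replica1/select?q=*:*&rows=0&distrib=false"
  curl "http://solr2:8983/solr/mycollection_shard1_replica2/select?q=*:*&rows=0&distrib=false"

  # Force a hard commit across the whole collection, then re-run the counts.
  curl "http://solr1:8983/solr/mycollection/update?commit=true"

If the counts converge after the explicit commit, the replicas were receiving the documents all along and only the autocommit settings were delaying visibility.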

Best,
Erick

On Tue, Aug 8, 2017 at 9:49 AM, lancasp22  
wrote:
> Hi,
>
> I've recently created a solr cloud on solr 5.5.2 with a separate
> zookeeper cluster. I write to the cloud by posting to update/json and
> the documents appear fine in the leader.
>
> The problem I have is that new documents added to the cloud aren't
> then being automatically applied to the followers of each leader. If I
> query a core then I can get different counts depending on which
> version of the core the query ran over and the solr admin statistics
> page confirms that the followers have fewer documents and are behind the 
> leader.
>
> If I restart the solr core for a follower, it does recover quickly and
> brings itself up-to-date with the leader. Looking at the logs for the
> follower you see that on re-start it identifies the leader and gets
> the changes from that leader.
>
> I think when documents are added to the leader these should be pushed
> to the followers so maybe it's this push process that isn't being triggered.
>
> It’s quite possible that I’ve made a simple error in the set-up but
> I’m baffled as to what it is. Please can anyone advise on any
> configuration that I need to check that might be causing these symptoms.
>
> Thanks in anticipation,
> Peter Lancaster.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrCloud-leader-updates-not-updating-followers-tp4349618.html
> Sent from the Solr - User mailing list archive at Nabble.com.

