query rewriting

2017-03-05 Thread Hendrik Haddorp

Hi,

I would like to dynamically modify a query, for example by replacing a 
field name with a different one. Given how complex the query parsing is, 
duplicating it looks error-prone, so I would like to work on the Lucene 
Query object model instead. The subclasses of Query look relatively simple 
and easy to rewrite on the Lucene side, but on the Solr side this does not 
seem to be the case. Any suggestions on how this could be done?
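
For illustration, a rough sketch of that kind of rewrite on the Lucene side
(assuming Lucene 6.x; only TermQuery, BoostQuery and BooleanQuery are handled
here, and on the Solr side this would typically be wired into a custom
QParserPlugin that wraps an existing parser):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class FieldRenameRewriter {

    /** Rebuilds a query tree, replacing field "from" with field "to". */
    public static Query renameField(Query q, String from, String to) {
        if (q instanceof TermQuery) {
            Term t = ((TermQuery) q).getTerm();
            return t.field().equals(from)
                    ? new TermQuery(new Term(to, t.bytes()))
                    : q;
        }
        if (q instanceof BoostQuery) {
            BoostQuery bq = (BoostQuery) q;
            return new BoostQuery(renameField(bq.getQuery(), from, to), bq.getBoost());
        }
        if (q instanceof BooleanQuery) {
            BooleanQuery.Builder builder = new BooleanQuery.Builder();
            for (BooleanClause clause : ((BooleanQuery) q).clauses()) {
                builder.add(renameField(clause.getQuery(), from, to), clause.getOccur());
            }
            return builder.build();
        }
        // Phrase, range, wildcard, span etc. would need their own branches;
        // anything unrecognised passes through unchanged.
        return q;
    }
}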


thanks,
Hendrik


sort by function with cursor based result fetching

2017-03-05 Thread Dmitry Kan
Hi,

Solr: 4.10.2


We've noticed a potential bug with fetching results over a cursor when
sorting by a function on dynamic date fields. Filed as:
https://issues.apache.org/jira/browse/SOLR-10231

Is there an obvious reason for sorting by function not to work with
cursors? Could this have been fixed in Solr 6.x?

Thanks!
-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
Insider Solutions: https://semanticanalyzer.info


Re: FieldName as case insensitive

2017-03-05 Thread Mikhail Khludnev
Hello, Preeti.

Field names are case sensitive. Probably you need to extend the default query
parser for case insensitivity, or check the field aliasing feature in eDisMax,
IIRC.
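
For the eDisMax aliasing route, a rough SolrJ sketch (the collection name and
the lowercase alias are hypothetical, and the same parameters could just as
well be set as request handler defaults):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class LowercaseAliasExample {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();

        SolrQuery q = new SolrQuery("companyname:xyz");   // client sends the lowercase field name
        q.set("defType", "edismax");
        q.set("f.companyname.qf", "CompanyName");         // alias it to the real field

        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getResults().getNumFound());
        solr.close();
    }
}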

On Mon, Mar 6, 2017 at 9:31 AM, Preeti Bhat 
wrote:

> Hi All,
>
> Did anyone get a chance to look at this?
>
>
> Thanks and Regards,
> Preeti Bhat
>
> From: Preeti Bhat
> Sent: Friday, March 03, 2017 2:47 PM
> To: solr-user
> Subject: FieldName as case insensitive
>
> Hi All,
>
> I have a field named "CompanyName" in one of my collections. When I try to
> search CompanyName:xyz or CompanyName:XYZ it gives me results, but when I
> try companyname:xyz the search fails. Is there a way to make field names in
> Solr case insensitive, since the client is going to pass us the search
> string along with the field name?
>
>
> Thanks and Regards,
> Preeti Bhat
>
>
>
>
>
>


-- 
Sincerely yours
Mikhail Khludnev


RE: FieldName as case insensitive

2017-03-05 Thread Preeti Bhat
Hi All,

Did anyone get a chance to look at this?


Thanks and Regards,
Preeti Bhat

From: Preeti Bhat
Sent: Friday, March 03, 2017 2:47 PM
To: solr-user
Subject: FieldName as case insensitive

Hi All,

I have a field named "CompanyName" in one of my collections. When I try to 
search CompanyName:xyz or CompanyName:XYZ it gives me results, but when I try 
companyname:xyz the search fails. Is there a way to make field names in Solr 
case insensitive, since the client is going to pass us the search string 
along with the field name?


Thanks and Regards,
Preeti Bhat







Re: Use Solr Suggest to autocomplete words and suggest co-occurrences

2017-03-05 Thread Mikhail Khludnev
Hello, Georg!
Have you seen
http://blog.mikemccandless.com/2014/01/finding-long-tail-suggestions-using.html
?

On Sun, Mar 5, 2017 at 11:43 PM, Georg Sorst  wrote:

> Hi all,
>
> is there a way to get the suggester to autocomplete words and suggest
> co-occurrences instead of suggesting complete field values? The behavior I'm
> looking for is quite similar to Google, only based on index values, not
> actual queries.
>
> Let's say there are two items in the index:
>
>1. "Adidas running shoe"
>2. "Nike running shoe"
>
> Now when the user types in "running sh" the suggestions should be something
> like:
>
>- "running shoe" (completion)
>- "running shoe adidas" (completion + co-ocurrence)
>- "running shoe nike" (completion + co-ocurrence)
>
> I've actually got this running already through some abomination that abuses
> the facets built on the title field. This works surprisingly well, but I
> can't find a way to make this error-tolerant ("runing sh" with a single "n"
> should provide the same suggestions).
>
> So, any ideas on how to get the suggester to do this in an error-tolerant way?
>
> Thanks and all the best,
> Georg
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Use Solr Suggest to autocomplete words and suggest co-occurrences

2017-03-05 Thread Joel Bernstein
The significantTerms streaming expression could be useful as a
co-occurrence based suggester. It is coming in Solr 6.5 but could easily be
backported to earlier releases. This blog describes how it works:

http://joelsolr.blogspot.com/2017/02/anomaly-detection-in-solr-65.html
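
As a rough sketch only (the collection and field names are hypothetical, and
the expression parameters follow the 6.5 documentation), such an expression
can be sent to the /stream handler from SolrJ:

import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.SolrStream;
import org.apache.solr.client.solrj.io.stream.TupleStream;
import org.apache.solr.common.params.ModifiableSolrParams;

public class SignificantTermsExample {
    public static void main(String[] args) throws Exception {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("qt", "/stream");
        params.set("expr",
            "significantTerms(products, q=\"title:(running shoe)\", field=\"title\", limit=\"10\")");

        TupleStream stream = new SolrStream("http://localhost:8983/solr/products", params);
        try {
            stream.open();
            for (Tuple t = stream.read(); !t.EOF; t = stream.read()) {
                // each tuple carries the candidate term and its significance score
                System.out.println(t.getString("term") + " " + t.get("score"));
            }
        } finally {
            stream.close();
        }
    }
}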

Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, Mar 5, 2017 at 3:43 PM, Georg Sorst  wrote:

> Hi all,
>
> is there a way to get the suggester to autocomplete words and suggest
> co-occurrences instead of suggesting complete field values? The behavior I'm
> looking for is quite similar to Google, only based on index values, not
> actual queries.
>
> Let's say there are two items in the index:
>
>1. "Adidas running shoe"
>2. "Nike running shoe"
>
> Now when the user types in "running sh" the suggestions should be something
> like:
>
>- "running shoe" (completion)
>- "running shoe adidas" (completion + co-ocurrence)
>- "running shoe nike" (completion + co-ocurrence)
>
> I've actually got this running already through some abomination that abuses
> the facets built on the title field. This works surprisingly well, but I
> can't find a way to make this error-tolerant ("runing sh" with a single "n"
> should provide the same suggestions).
>
> So, any ideas on how to get the suggester to do this in an error-tolerant way?
>
> Thanks and all the best,
> Georg
>


Use Solr Suggest to autocomplete words and suggest co-occurrences

2017-03-05 Thread Georg Sorst
Hi all,

is there a way to get the suggester to autocomplete words and suggest
co-occurrences instead of suggesting complete field values? The behavior I'm
looking for is quite similar to Google, only based on index values, not
actual queries.

Let's say there are two items in the index:

   1. "Adidas running shoe"
   2. "Nike running shoe"

Now when the user types in "running sh" the suggestions should be something
like:

   - "running shoe" (completion)
   - "running shoe adidas" (completion + co-ocurrence)
   - "running shoe nike" (completion + co-ocurrence)

I've actually got this running already through some abomination that abuses
the facets built on the title field. This works surprisingly well, but I
can't find a way to make this error-tolerant ("runing sh" with a single "n"
should provide the same suggestions).

So, any ideas on how to get the suggester to do this in an error-tolerant way?

Thanks and all the best,
Georg


Re: maxwarmingSearchers and memory leak

2017-03-05 Thread SOLR4189
1) We've actually got 60 to 80 GB of index on the machine (in the image below
you can see that the index size on the machine is 82GB, because the whole
index is under /opt/solr):

2) Our commits run as follows: autoSoftCommit every 15 minutes and
autoHardCommit every 30 minutes, and our commits take only 10 seconds.

3) ConcurrentLFUCaches (that you saw in the image in the previous message)
aren't filterCaches, they are fieldValueCaches

4) Solr top:
 

5) We don't know if this is related to the problem, but all our Solr servers
are virtual servers.




--


Re: I want to contribute custom-made NLP-based Solr filters but don't know how.

2017-03-05 Thread Joel Bernstein
I believe StanfordCore is licensed under the GPL which means it will be
incompatible with the Apache License. Would it be possible to port to a
different NLP library?

Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, Mar 5, 2017 at 12:14 PM, Erick Erickson 
wrote:

> Well, you've taken the first step ;).
>
> Start by going here: https://issues.apache.org/jira/browse/SOLR/ and
> creating a logon and a JIRA.
>
> NOTE: Before you go to the trouble of creating a patch, it's perfectly
> OK to do a high-level overview of the approach you used and see what
> the feedback is. It'll be a short discussion if the licensing is
> incompatible for instance ;).
>
> After that, be ready for some discussion back and forth, reviews and
> the like and we'll see where this goes.
>
> Best,
> Erick
>
> On Sun, Mar 5, 2017 at 4:40 AM, Avtar Singh Mehra 
> wrote:
> > Hello everyone,
> > I have developed a project called WiseOwl, which is basically a fact-based
> > question answering system, and can be accessed at:
> > https://github.com/asmehra95/wiseowl
> >
> > In the process of making the project work I have developed pluggable Solr
> > filters optimised for Solr 6.3.0.
> > I would like to donate them to Solr.
> > 1. *WiseOwlStanford Filter*: It uses StanfordCoreNLP to tag named entities
> > and also normalises dates during indexing or searching. Demonstration
> > screenshots are available on the GitHub profile. But I don't know how to
> > donate them.
> >
> > If there is a way, please let me know, as it may be useful for anyone
> > doing natural language processing.
>


Re: Setting up to index multiple datastores

2017-03-05 Thread Erick Erickson
bq:  Is each shard/replica/core in fact a separate instance?

No. I'm defining "instance" here as a JVM running Solr. And be careful
here, a "shard" is made up of one or more "replicas". Those replicas
may or may not be distributed amongst separate JVMs/machines. Each
replica of a given shard has the same documents in it.

A "replica" is a specialized "core". The term "replica" is generally
confined to talking about SolrCloud.

So, in SolrCloud a "collection" is made up of one or more "shards".
Each shard is made up of one or more "replicas".
A replica is a specialized "core".
Each Solr instance can host one or more "cores". I've seen hundreds of
cores hosted by a single JVM.

bq: If I'm running on a single machine - would I then have multiple
"cores" listening on multiple ports?

No. They're each addressed by a separate URL on the same port, i.e.
http://localhost:8983/solr/core1
http://localhost:8983/solr/core2

etc.

If you have more than one JVM on a single machine, _then_ you address
them by different ports.
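
For example, with SolrJ both cores are reached through the same port and only
the path differs (core names hypothetical):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class TwoCoresOnePort {
    public static void main(String[] args) throws Exception {
        // Same JVM, same port 8983; only the core name in the path differs.
        SolrClient core1 = new HttpSolrClient.Builder("http://localhost:8983/solr/core1").build();
        SolrClient core2 = new HttpSolrClient.Builder("http://localhost:8983/solr/core2").build();

        System.out.println(core1.query(new SolrQuery("*:*")).getResults().getNumFound());
        System.out.println(core2.query(new SolrQuery("*:*")).getResults().getNumFound());

        core1.close();
        core2.close();
    }
}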

bq: If so - I'm thinking there'd be no benefit.

It Depends (tm). There's some loss since each core has some overhead.
There's some gain because certain operations (filterCache comes to
mind) operate over all the docs in a core so having one core has some
memory costs. Not to mention that scoring happens over all the docs in
a core, so the response time may be quicker with multiple cores (yes,
fq clauses help with this, but they have their own overhead).


If you're not using SolrCloud, you can use "Transient Cores" to limit
the number of cores in memory at any given point. Smaller heap
required, better performance characteristics. That presupposes that
your usage pattern is "user signs on, searches for a bit and signs
off", i.e. you're not supporting all users searching simultaneously.

Best,
Erick

On Sun, Mar 5, 2017 at 12:13 AM, Daniel Miller  wrote:
> On 3/4/2017 12:00 PM, Shawn Heisey wrote:
>>
>> On 3/3/2017 11:28 PM, Daniel Miller wrote:
>>>
>>> What I think I want is create a single collection, with a
>>> shard/replica/core per user.  Or maybe I'm wanting a separate
>>> collection per user - which would again mean a single
>>> shard/replica/core.  But it seems like each shard/replica/core is a
>>> separate instance.
>>
>> Manual sharding (implicit) is something you can do, but it does mean a
>> LOT of individual cores.  Many shards/replicas can cause just as many
>> performance issues as many collections.
>
>
> Sorry to keep hitting the same point - but I'm still not understanding.  Is
> each shard/replica/core in fact a separate instance?  If I'm running on a
> single machine - would I then have multiple "cores" listening on multiple
> ports?  If so - I'm thinking there'd be no benefit.
>
>>
>>> Without modifying Dovecot source, I can have it generate URL's like,
>>> "http://solr.server.local:8983/solr/dovecot/; (which is what I do now)
>>> or maybe, "http://solr.server.local:8983/solr/dovecot_user/; or even
>>> "http://solr.server.local:8983/solr/dovecot/dovecot_user;.  But I'm
>>> not understanding how, if possible, I can have the indexes created
>>> appropriately to support such access.  The only examples I've seen use
>>> either separate ports or ip's for listeners.
>>
>> If you use shards, the shard name would be a URL parameter, not part of
>> the URL path.  Can Dovecot do that?
>
>
> Not without modifying the source - which may indeed be appropriate. What I'm
> still not clear on (actually there's a lot...) is:
>
> Without using multiple servers for redundancy or distributed search - would
> splitting the index offer any performance benefit?  If not, there's probably
> no point in continuing and digging into Dovecot internals.
>
> Daniel
>


Re: I want to contribute custom-made NLP-based Solr filters but don't know how.

2017-03-05 Thread Erick Erickson
Well, you've taken the first step ;).

Start by going here: https://issues.apache.org/jira/browse/SOLR/ and
creating a logon and a JIRA.

NOTE: Before you go to the trouble of creating a patch, it's perfectly
OK to do a high-level overview of the approach you used and see what
the feedback is. It'll be a short discussion if the licensing is
incompatible for instance ;).

After that, be ready for some discussion back and forth, reviews and
the like and we'll see where this goes.

Best,
Erick

On Sun, Mar 5, 2017 at 4:40 AM, Avtar Singh Mehra  wrote:
> Hello everyone,
> I have developed a project called WiseOwl, which is basically a fact-based
> question answering system, and can be accessed at:
> https://github.com/asmehra95/wiseowl
>
> In the process of making the project work I have developed pluggable Solr
> filters optimised for Solr 6.3.0.
> I would like to donate them to Solr.
> 1. *WiseOwlStanford Filter*: It uses StanfordCoreNLP to tag named entities
> and also normalises dates during indexing or searching. Demonstration
> screenshots are available on the GitHub profile. But I don't know how to
> donate them.
>
> If there is a way, please let me know, as it may be useful for anyone
> doing natural language processing.


Re: Use case for the Shingle Filter

2017-03-05 Thread Ryan Josal
I thought new versions of Solr didn't split on whitespace at the query
parser anymore, so this should work?

That being said, I think I remember it having a problem coming after a
synonym filter.  IIRC, if your input is "Foo Bar" and you have a synonym
"foo <=> baz" you would get foobaz bazbar instead of foobar and bazbar.  I
wrote a custom shingler to account for that.

Ryan

On Sun, Mar 5, 2017 at 02:48 Markus Jelsma 
wrote:

> Hello - we use it for text classification and online near-duplicate
> document detection/filtering. Using shingles means you want to consider
> order in the text. It is analogous to using bigrams and trigrams when doing
> language detection, you cannot distinguish between Danish and Norwegian
> solely on single characters.
>
> Markus
>
>
>
> -Original message-
> > From:Ryan Yacyshyn 
> > Sent: Sunday 5th March 2017 5:57
> > To: solr-user@lucene.apache.org
> > Subject: Use case for the Shingle Filter
> >
> > Hi everyone,
> >
> > I was thinking of using the Shingle Filter to help solve an issue I'm
> > facing. I can see this working in the analysis panel in the Solr admin,
> but
> > not when I make my queries.
> >
> > I found out it's because of the query parser splitting up the tokens on
> > white space before passing them along.
> >
> > This made me wonder what a practical use case could be for using the
> > shingle filter?
> >
> > Any enlightenment on this would be much appreciated!
> >
> > Thanks,
> > Ryan
> >
>


Re: What is the bottleneck for an optimise operation? / solve the disk space and time issues by specifying multiple segments to optimize

2017-03-05 Thread Caruana, Matthew
Hi Rick,

We already do this with 30 eight-core machines running seven jobs each, working 
off a shared queue. See https://github.com/ICIJ/extract, which has been in 
production for almost two years; it was originally developed to OCR almost 
ten million PDFs and TIFFs from the Panama Papers.

Matthew

> On 5 Mar 2017, at 3:42 pm, Rick Leir  wrote:
> 
> Hi Matthew
> 
> OCR is something which can be parallelized outside of Solr/Tika. Do one OCR 
> task per core, and you can have all cores running at 100%. Write the OCR 
> output to a staging area in the filesystem.
> 
> cheers -- Rick
> 
> 
>> On 2017-03-03 03:00 AM, Caruana, Matthew wrote:
>> This is the current config:
>> 
>> 
>> 100
>> 1
>> <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler" />
>> <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
>>   <int name="maxMergeAtOnce">10</int>
>>   <int name="segmentsPerTier">10</int>
>> </mergePolicyFactory>
>>
>> We index in bulk, so after indexing about 4 million documents over a week 
>> (OCR takes long) we normally end up with about 60-70 segments with this 
>> configuration.
>> 
>>> On 3 Mar 2017, at 02:42, Alexandre Rafalovitch  wrote:
>>> 
>>> What do you have for merge configuration in solrconfig.xml? You should
>>> be able to tune it to - approximately - whatever you want without
>>> doing the grand optimize:
>>> https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig#IndexConfiginSolrConfig-MergingIndexSegments
>>> 
>>> Regards,
>>>   Alex.
>>> 
>>> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>>> 
>>> 
 On 2 March 2017 at 16:37, Caruana, Matthew  wrote:
 Yes, we already do it outside Solr. See https://github.com/ICIJ/extract 
 which we developed for this purpose. My guess is that the documents are 
 very large, as you say.
 
 Optimising was always an attempt to bring down the number of segments from 
 60+. Not sure how else to do that.
 
> On 2 Mar 2017, at 7:42 pm, Michael Joyner  wrote:
> 
> You can solve the disk space and time issues by specifying multiple 
> segments to optimize down to instead of a single segment.
> 
> When we reindex we have to optimize or we end up with hundreds of 
> segments and very horrible performance.
> 
> We optimize down to like 16 segments or so and it doesn't do the 3x disk 
> space thing and usually runs in a decent amount of time. (we have >50 
> million articles in one of our solr indexes).
> 
> 
>> On 03/02/2017 10:20 AM, David Hastings wrote:
>> Agreed, and the fact that it takes three times the space is part of the
>> reason it takes so long, so that 190GB index ends up writing another 380GB
>> until it compresses down and deletes the two leftover files.  It's a pretty
>> hefty operation.
>> 
>> On Thu, Mar 2, 2017 at 10:13 AM, Alexandre Rafalovitch 
>> 
>> wrote:
>> 
>>> Optimize operation is no longer recommended for Solr, as the
>>> background merges got a lot smarter.
>>> 
>>> It is an extremely expensive operation that can require up to 3-times
>>> amount of disk during the processing.
>>> 
>>> This is not to say yours is a valid question, which I am leaving to
>>> others to respond.
>>> 
>>> Regards,
>>>   Alex.
>>> 
>>> http://www.solr-start.com/ - Resources for Solr users, new and 
>>> experienced
>>> 
>>> 
 On 2 March 2017 at 10:04, Caruana, Matthew  wrote:
 I’m currently performing an optimise operation on a ~190GB index with
>>> about 4 million documents. The process has been running for hours.
 This is surprising, because the machine is an EC2 r4.xlarge with four
>>> cores and 30GB of RAM, 24GB of which is allocated to the JVM.
 The load average has been steady at about 1.3. Memory usage is 25% or
>>> less the whole time. iostat reports ~6% util.
 What gives?
 
 Running Solr 6.4.1.
> 
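
A sketch of the multi-segment optimize suggested above, using SolrJ against a
hypothetical core (the maxSegments value can also be passed as a parameter to
the update handler):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class PartialOptimize {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
        // waitFlush=true, waitSearcher=true, maxSegments=16:
        // merge down to at most 16 segments instead of a single one.
        solr.optimize(true, true, 16);
        solr.close();
    }
}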


Re: What is the bottleneck for an optimise operation? / solve the disk space and time issues by specifying multiple segments to optimize

2017-03-05 Thread Rick Leir

Hi Matthew

OCR is something which can be parallelized outside of Solr/Tika. Do one 
OCR task per core, and you can have all cores running at 100%. Write the 
OCR output to a staging area in the filesystem.


cheers -- Rick


On 2017-03-03 03:00 AM, Caruana, Matthew wrote:

This is the current config:

 
 100
 1
 <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler" />
 <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
   <int name="maxMergeAtOnce">10</int>
   <int name="segmentsPerTier">10</int>
 </mergePolicyFactory>

We index in bulk, so after indexing about 4 million documents over a week (OCR 
takes long) we normally end up with about 60-70 segments with this 
configuration.


On 3 Mar 2017, at 02:42, Alexandre Rafalovitch  wrote:

What do you have for merge configuration in solrconfig.xml? You should
be able to tune it to - approximately - whatever you want without
doing the grand optimize:
https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig#IndexConfiginSolrConfig-MergingIndexSegments

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 2 March 2017 at 16:37, Caruana, Matthew  wrote:

Yes, we already do it outside Solr. See https://github.com/ICIJ/extract which 
we developed for this purpose. My guess is that the documents are very large, 
as you say.

Optimising was always an attempt to bring down the number of segments from 60+. 
Not sure how else to do that.


On 2 Mar 2017, at 7:42 pm, Michael Joyner  wrote:

You can solve the disk space and time issues by specifying multiple segments to 
optimize down to instead of a single segment.

When we reindex we have to optimize or we end up with hundreds of segments and 
very horrible performance.

We optimize down to like 16 segments or so and it doesn't do the 3x disk space 
thing and usually runs in a decent amount of time. (we have >50 million 
articles in one of our solr indexes).



On 03/02/2017 10:20 AM, David Hastings wrote:
Agreed, and since it takes three times the space is part of the reason it
takes so long, so that 190gb index ends up writing another 380 gb until it
compresses down and deletes the two left over files.  its a pretty hefty
operation

On Thu, Mar 2, 2017 at 10:13 AM, Alexandre Rafalovitch 
wrote:


Optimize operation is no longer recommended for Solr, as the
background merges got a lot smarter.

It is an extremely expensive operation that can require up to 3-times
amount of disk during the processing.

This is not to say yours is a valid question, which I am leaving to
others to respond.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced



On 2 March 2017 at 10:04, Caruana, Matthew  wrote:
I’m currently performing an optimise operation on a ~190GB index with
about 4 million documents. The process has been running for hours.

This is surprising, because the machine is an EC2 r4.xlarge with four
cores and 30GB of RAM, 24GB of which is allocated to the JVM.

The load average has been steady at about 1.3. Memory usage is 25% or
less the whole time. iostat reports ~6% util.

What gives?

Running Solr 6.4.1.




I want to contribute custom-made NLP-based Solr filters but don't know how.

2017-03-05 Thread Avtar Singh Mehra
Hello everyone,
I have developed a project called WiseOwl, which is basically a fact-based
question answering system, and can be accessed at:
https://github.com/asmehra95/wiseowl

In the process of making the project work I have developed pluggable Solr
filters optimised for Solr 6.3.0.
I would like to donate them to Solr.
1. *WiseOwlStanford Filter*: It uses StanfordCoreNLP to tag named entities
and also normalises dates during indexing or searching. Demonstration
screenshots are available on the GitHub profile. But I don't know how to
donate them.

If there is a way, please let me know, as it may be useful for anyone
doing natural language processing.


RE: Use case for the Shingle Filter

2017-03-05 Thread Markus Jelsma
Hello - we use it for text classification and online near-duplicate document 
detection/filtering. Using shingles means you want to consider order in the 
text. It is analogous to using bigrams and trigrams when doing language 
detection, you cannot distinguish between Danish and Norwegian solely on single 
characters.
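
To make the ordering point concrete, a minimal Lucene sketch (assuming Lucene
6.x) that emits bigram shingles alongside the original tokens; the equivalent
in a Solr schema would be solr.ShingleFilterFactory in the field's analysis
chain:

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ShingleDemo {
    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer source = new WhitespaceTokenizer();
                ShingleFilter shingles = new ShingleFilter(source, 2, 2); // bigrams
                shingles.setOutputUnigrams(true);
                return new TokenStreamComponents(source, shingles);
            }
        };

        try (TokenStream ts = analyzer.tokenStream("title", "adidas running shoe")) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                // prints: adidas, "adidas running", running, "running shoe", shoe
                System.out.println(term.toString());
            }
            ts.end();
        }
    }
}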

Markus

 
 
-Original message-
> From:Ryan Yacyshyn 
> Sent: Sunday 5th March 2017 5:57
> To: solr-user@lucene.apache.org
> Subject: Use case for the Shingle Filter
> 
> Hi everyone,
> 
> I was thinking of using the Shingle Filter to help solve an issue I'm
> facing. I can see this working in the analysis panel in the Solr admin, but
> not when I make my queries.
> 
> I found out it's because of the query parser splitting up the tokens on
> white space before passing them along.
>
> This made me wonder what a practical use case could be for using the shingle
> filter?
> 
> Any enlightenment on this would be much appreciated!
> 
> Thanks,
> Ryan
> 


Re: Data Import Handler, also "Real Time" index updates

2017-03-05 Thread Damien Kamerman
You could configure the DataImportHandler not to delete at the start
(either do a delta import or set preImportDeleteQuery), and set a
postImportDeleteQuery if required.

On Saturday, 4 March 2017, Alexandre Rafalovitch  wrote:

> Commit is index-global. So if you have overlapping timelines and a commit is
> issued, it will affect all changes made up to that point.
>
> So, the aliases may be better for you. You could potentially also reload a
> core with changed solrconfig.xml settings, but that's heavy on caches.
>
> Regards,
>Alex
>
> On 3 Mar 2017 1:21 PM, "Sales"  >
> wrote:
>
>
> >
> > You have indicated that you have a way to avoid doing updates during the
> > full import.  Because of this, you do have another option that is likely
> > much easier for you to implement:  Set the "commitWithin" parameter on
> > each update request.  This works almost identically to autoSoftCommit,
> > but only after a request is made.  As long as there are never any of
> > these updates during a full import, these commits cannot affect that
> import.
>
> I had at least attempted to say that there may be a few updates that happen
> at the start of an import, i.e. while an import is running, just due to
> timing issues. Those will be detected and re-executed once the import is
> done, though. But my question here is: if an update uses commitWithin, does
> that only affect the updates that carry the parameter, or does it also soft
> commit the in-progress import? I cannot guarantee that zero updates will be
> done, as there is a timing issue at the very start of the import, so a few
> could cross over.
>
> Adding commitWithin is fine. Just want to make sure those that might
> execute for the first few seconds of an import don’t kill anything.
> >
> > No matter what is happening, you should have autoCommit (not
> > autoSoftCommit) configured with openSearcher set to false.  This will
> > ensure transaction log rollover, without affecting change visibility.  I
> > recommend a maxTime of one to five minutes for this.  You'll see 15
> > seconds as the recommended value in many places.
> >
> > https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Oh, we are fine with much longer, does not have to be instant. 10-15
> minutes would be fine.
>
> >
> > Thanks
> > Shawn
> >
>
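
A minimal SolrJ sketch of the per-request commitWithin discussed above (core
and field names hypothetical):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "42");
        doc.addField("title_s", "example");

        UpdateRequest req = new UpdateRequest();
        req.add(doc);
        req.setCommitWithin(15 * 60 * 1000); // ask Solr to make this visible within ~15 minutes
        req.process(solr);

        solr.close();
    }
}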


Re: Setting up to index multiple datastores

2017-03-05 Thread Daniel Miller

On 3/4/2017 12:00 PM, Shawn Heisey wrote:

On 3/3/2017 11:28 PM, Daniel Miller wrote:

What I think I want is create a single collection, with a
shard/replica/core per user.  Or maybe I'm wanting a separate
collection per user - which would again mean a single
shard/replica/core.  But it seems like each shard/replica/core is a
separate instance.

Manual sharding (implicit) is something you can do, but it does mean a
LOT of individual cores.  Many shards/replicas can cause just as many
performance issues as many collections.


Sorry to keep hitting the same point - but I'm still not understanding.  
Is each shard/replica/core in fact a separate instance?  If I'm running 
on a single machine - would I then have multiple "cores" listening on 
multiple ports?  If so - I'm thinking there'd be no benefit.





Without modifying Dovecot source, I can have it generate URL's like,
"http://solr.server.local:8983/solr/dovecot/; (which is what I do now)
or maybe, "http://solr.server.local:8983/solr/dovecot_user/; or even
"http://solr.server.local:8983/solr/dovecot/dovecot_user;.  But I'm
not understanding how, if possible, I can have the indexes created
appropriately to support such access.  The only examples I've seen use
either separate ports or ip's for listeners.

If you use shards, the shard name would be a URL parameter, not part of
the URL path.  Can Dovecot do that?


Not without modifying the source - which may indeed be appropriate. What 
I'm still not clear on (actually there's a lot...) is:


Without using multiple servers for redundancy or distributed search - 
would splitting the index offer any performance benefit?  If not, 
there's probably no point in continuing and digging into Dovecot internals.


Daniel