Re: Data Import Handler, also "Real Time" index updates

2017-03-05 Thread Damien Kamerman
You could configure the DataImportHandler to not delete at the start
(either do a delta-import or set preImportDeleteQuery), and set a
postImportDeleteQuery if required.
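
For example, something like this on the root entity in data-config.xml (an
untested sketch; the import_batch field and the 'batch42' value are invented,
and you would change the batch value on each run):

    <!-- Sketch only: a no-match preImportDeleteQuery suppresses the default
         *:* delete at the start of full-import; postImportDeleteQuery then
         removes docs this run did not touch. -->
    <entity name="product" pk="id"
            query="SELECT id, name, price, 'batch42' AS import_batch FROM products"
            preImportDeleteQuery="id:nothing_matches_this"
            postImportDeleteQuery="-import_batch:batch42"/>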

On Saturday, 4 March 2017, Alexandre Rafalovitch  wrote:

> Commit is index global. So if you have overlapping timelines and commit is
> issued, it will affect all changes done to that point.
>
> So, the aliases may be better for you. You could potentially also reload a
> core with changed solrconfig.xml settings, but that's heavy on caches.
>
> Regards,
>Alex
>
> On 3 Mar 2017 1:21 PM, "Sales" wrote:
>
>
> >
> > You have indicated that you have a way to avoid doing updates during the
> > full import.  Because of this, you do have another option that is likely
> > much easier for you to implement:  Set the "commitWithin" parameter on
> > each update request.  This works almost identically to autoSoftCommit,
> > but only after a request is made.  As long as there are never any of
> > these updates during a full import, these commits cannot affect that import.
>
> I had tried to say that a few updates may happen at the start of an import,
> purely due to timing, so they do occur while an import is in progress.
> Those will be detected and re-executed once the import is done, though. But
> my question here is: if an update uses commitWithin, does that affect only
> the updates that carry the parameter, or does it also soft commit the
> in-progress import? I cannot guarantee that zero updates will be done, as
> there is a timing issue at the very start of the import, so a few could
> cross over.
>
> Adding commitWithin is fine. I just want to make sure the few that might
> execute in the first seconds of an import don’t kill anything.
> >
> > No matter what is happening, you should have autoCommit (not
> > autoSoftCommit) configured with openSearcher set to false.  This will
> > ensure transaction log rollover, without affecting change visibility.  I
> > recommend a maxTime of one to five minutes for this.  You'll see 15
> > seconds as the recommended value in many places.
> >
> > https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Oh, we are fine with much longer; it does not have to be instant. 10-15
> minutes would be fine.
>
> >
> > Thanks
> > Shawn
> >
>


Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Alexandre Rafalovitch
Commit is index global. So if you have overlapping timelines and commit is
issued, it will affect all changes done to that point.

So, the aliases may be better for you. You could potentially also reload a
core with changed solrconfig.xml settings, but that's heavy on caches.

Regards,
   Alex

On 3 Mar 2017 1:21 PM, "Sales" 
wrote:


>
> You have indicated that you have a way to avoid doing updates during the
> full import.  Because of this, you do have another option that is likely
> much easier for you to implement:  Set the "commitWithin" parameter on
> each update request.  This works almost identically to autoSoftCommit,
> but only after a request is made.  As long as there are never any of
> these updates during a full import, these commits cannot affect that import.

I had tried to say that a few updates may happen at the start of an import,
purely due to timing, so they do occur while an import is in progress. Those
will be detected and re-executed once the import is done, though. But my
question here is: if an update uses commitWithin, does that affect only the
updates that carry the parameter, or does it also soft commit the in-progress
import? I cannot guarantee that zero updates will be done, as there is a
timing issue at the very start of the import, so a few could cross over.

Adding commitWithin is fine. I just want to make sure the few that might
execute in the first seconds of an import don’t kill anything.
>
> No matter what is happening, you should have autoCommit (not
> autoSoftCommit) configured with openSearcher set to false.  This will
> ensure transaction log rollover, without affecting change visibility.  I
> recommend a maxTime of one to five minutes for this.  You'll see 15
> seconds as the recommended value in many places.
>
> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Oh, we are fine with much longer; it does not have to be instant. 10-15
minutes would be fine.

>
> Thanks
> Shawn
>


Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Sales

> 
> You have indicated that you have a way to avoid doing updates during the
> full import.  Because of this, you do have another option that is likely
> much easier for you to implement:  Set the "commitWithin" parameter on
> each update request.  This works almost identically to autoSoftCommit,
> but only after a request is made.  As long as there are never any of
> these updates during a full import, these commits cannot affect that import.

I had tried to say that a few updates may happen at the start of an import,
purely due to timing, so they do occur while an import is in progress. Those
will be detected and re-executed once the import is done, though. But my
question here is: if an update uses commitWithin, does that affect only the
updates that carry the parameter, or does it also soft commit the in-progress
import? I cannot guarantee that zero updates will be done, as there is a
timing issue at the very start of the import, so a few could cross over.

Adding commitWithin is fine. I just want to make sure the few that might
execute in the first seconds of an import don’t kill anything.
> 
> No matter what is happening, you should have autoCommit (not
> autoSoftCommit) configured with openSearcher set to false.  This will
> ensure transaction log rollover, without affecting change visibility.  I
> recommend a maxTime of one to five minutes for this.  You'll see 15
> seconds as the recommended value in many places.
> 
> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Oh, we are fine with much longer; it does not have to be instant. 10-15
minutes would be fine.

> 
> Thanks
> Shawn
> 



Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Shawn Heisey
On 3/3/2017 10:17 AM, Sales wrote:
> I am not sure how best to handle this. We use the data import handler to
> re-sync all our data on a daily basis; it takes 1-2 hours depending on
> system load. It is set up to commit at the end, so the old index remains
> until it’s done, and we lose no access while the import is happening.
>
> But we now want to update certain fields in the index while still
> regenerating daily. So it would seem we might need autocommit, and
> potentially soft commit. When we enabled those, the data disappeared
> during the import, since it kept soft committing during the import
> process; I see no way to avoid soft commits during the import. But soft
> commits would appear to be needed for the (non-import) updates to the
> index.
>
> I realize the import could happen while an update is done, but we can
> actually avoid those, so that is not an issue (one or two might go
> through, but we will redo those updates once the import is done; that
> part is all handled).

Erick's solution of using aliases to swap a live index and a build index
is one very good way to go.  It does involve some additional complexity
that you may not be ready for.  Only you will know whether that's
something you can implement easily.  Collection aliasing was implemented
in Solr 4.2 by SOLR-4497, so 4.10 should definitely have it.

You have indicated that you have a way to avoid doing updates during the
full import.  Because of this, you do have another option that is likely
much easier for you to implement:  Set the "commitWithin" parameter on
each update request.  This works almost identically to autoSoftCommit,
but only after a request is made.  As long as there are never any of
these updates during a full import, these commits cannot affect that import.
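
For example, an update request carrying the parameter might look like this
(a sketch; the field names and the ten-minute window are placeholders):

    <add commitWithin="600000">
      <doc>
        <field name="id">SKU123</field>
        <field name="in_stock">false</field>
      </doc>
    </add>

The value is in milliseconds, so 600000 asks for a commit within ten minutes
of the request.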

No matter what is happening, you should have autoCommit (not
autoSoftCommit) configured with openSearcher set to false.  This will
ensure transaction log rollover, without affecting change visibility.  I
recommend a maxTime of one to five minutes for this.  You'll see 15
seconds as the recommended value in many places.
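
In solrconfig.xml that looks like this (a sketch, using a one-minute maxTime
from the range above):

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxTime>60000</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>
    </updateHandler>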

https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks
Shawn



Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Sales

> On Mar 3, 2017, at 11:30 AM, Erick Erickson  wrote:
> 
> One way to handle this (presuming SolrCloud) is collection aliasing.
> You create two collections, c1 and c2. You then have two aliases: when
> you start, "index" is aliased to c1 and "search" is aliased to c2. Now
> do your full import to "index" (and, BTW, you'd be well advised to do
> at least a hard commit with openSearcher=false during that time, or you
> risk replaying all the docs in the tlog).
> 
> When the full import is done, switch the aliases so "search" points to c1 and
> "index" points to c2. Rinse. Repeat. Your client apps always use the same
> alias; the alias switching makes whether c1 or c2 is being used transparent.
> By that I mean your user-facing app uses "search" and your indexing client
> uses "index".
> 
> You can now do your live updates to the "search" alias, which has a soft
> commit set.
> Of course, you have to have some mechanism for replaying all the live
> updates that came in while you were doing your full import into the
> "index" alias before you switch, but you say you have that handled.
> 
> Best,
> Erick
> 

Thanks. So, is this available on 4.10.4? 

If not, we used to generate another core, do the import, and swap cores, so
this is possibly similar to collection aliases; in the end, the client did
not care. I don’t see why that would not still work. It took a little effort
to automate, but not much.
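
The swap itself was just the CoreAdmin SWAP call, e.g. (hostname and core
names here are illustrative):

    http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=build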

Regarding the import and commit, we use readOnly in data-config.xml, which
sets autocommit the way I understand it. Not sure what happens with
openSearcher though. If that is not sufficient, how would I do a hard commit
with openSearcher=false during that time? Surely not by modifying the config
file?

Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Erick Erickson
One way to handle this (presuming SolrCloud) is collection aliasing.
You create two collections, c1 and c2. You then have two aliases: when
you start, "index" is aliased to c1 and "search" is aliased to c2. Now
do your full import to "index" (and, BTW, you'd be well advised to do
at least a hard commit with openSearcher=false during that time, or you
risk replaying all the docs in the tlog).
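
Such a commit can be issued per request against the update handler, e.g.
(hostname and collection name are illustrative):

    http://localhost:8983/solr/c1/update?commit=true&openSearcher=false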

When the full import is done, switch the aliases so "search" points to c1 and
"index" points to c2. Rinse. Repeat. Your client apps always use the same
alias; the alias switching makes whether c1 or c2 is being used transparent.
By that I mean your user-facing app uses "search" and your indexing client
uses "index".

You can now do your live updates to the "search" alias, which has a soft
commit set.
Of course, you have to have some mechanism for replaying all the live updates
that came in while you were doing your full import into the "index" alias
before you switch, but you say you have that handled.

Best,
Erick

On Fri, Mar 3, 2017 at 9:22 AM, Alexandre Rafalovitch
 wrote:
> On 3 March 2017 at 12:17, Sales  
> wrote:
>> When we enabled those, the data disappeared during the import, since it
>> kept soft committing during the import process,
>
> This part does not quite make sense. Could you expand on this "data
> disappeared" part so I can understand what the issue is?
>
> The main issue with "update" is that all fields (apart from pure
> copyField destinations) need to be stored, so the document can be
> reconstructed, updated, re-indexed. Perhaps you have something strange
> happening around that?
>
> Regards,
>Alex.
>
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced


Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Sales
> 
> On Mar 3, 2017, at 11:22 AM, Alexandre Rafalovitch  wrote:
> 
> On 3 March 2017 at 12:17, Sales  
> wrote:
>> When we enabled those, the data disappeared during the import, since it
>> kept soft committing during the import process,
> 
> This part does not quite make sense. Could you expand on this "data
> disappeared" part so I can understand what the issue is?
> 

So, the issue here is that the first step of the import handler is to erase
all the data, so there are no products left in the index (it would appear,
based on what we see, after the first soft commit): a search returns no
results at first, then an ever-increasing number of records while the import
is happening. We have 6 million indexed products.

I can't find a way to stop soft commits during the import?

Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Alexandre Rafalovitch
On 3 March 2017 at 12:17, Sales  wrote:
> When we enabled those, the data disappeared during the import, since it
> kept soft committing during the import process,

This part does not quite make sense. Could you expand on this "data
disappeared" part so I can understand what the issue is?

The main issue with "update" is that all fields (apart from pure
copyField destinations) need to be stored, so the document can be
reconstructed, updated, re-indexed. Perhaps you have something strange
happening around that?
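
For reference, an atomic update request looks like this (field names are just
examples; as above, every field apart from copyField destinations must be
stored for this to work):

    <add>
      <doc>
        <field name="id">SKU123</field>
        <field name="price" update="set">19.99</field>
      </doc>
    </add>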

Regards,
   Alex.


http://www.solr-start.com/ - Resources for Solr users, new and experienced


Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Sales
I am not sure how best to handle this. We use the data import handler to
re-sync all our data on a daily basis; it takes 1-2 hours depending on system
load. It is set up to commit at the end, so the old index remains until it’s
done, and we lose no access while the import is happening.

But we now want to update certain fields in the index while still
regenerating daily. So it would seem we might need autocommit, and
potentially soft commit. When we enabled those, the data disappeared during
the import, since it kept soft committing during the import process; I see no
way to avoid soft commits during the import. But soft commits would appear to
be needed for the (non-import) updates to the index.

I realize the import could happen while an update is done, but we can
actually avoid those, so that is not an issue (one or two might go through,
but we will redo those updates once the import is done; that part is all
handled).

So, what is the best way to handle “real time” updates (10-15 minutes to see
the updates in a searcher is fine), yet also allow the data import handler to
do a full clear and regen without losing products (what we index) during the
import? We don’t want searchers not seeing the data! I have not seen any
techniques for this.

Steve