Re: Indexing in one collection affect index in another collection

2019-04-04 Thread Zheng Lin Edwin Yeo

Hi all,

This issue is still surfacing in the new Solr 8.0.0.
We can't really figure out what the issue is, as it also occurs on a system
with more memory.

Does anyone have any further insights on this?

Regards,
Edwin


Re: Indexing in one collection affect index in another collection

2019-02-15 Thread Zheng Lin Edwin Yeo

Hi Shawn,

This issue is also occurring in the new Solr 7.7.0, with the same data size
of only 20 GB.

Regards,
Edwin


Re: Indexing in one collection affect index in another collection

2019-02-08 Thread Zheng Lin Edwin Yeo

Hi Shawn,

Thanks for your reply.

Although space in the OS disk cache could be the issue, we didn't face this
problem previously, notably in our other setup using Solr 6.5.1, which
contains much more data (more than 1 TB) than our current Solr 7.6.0 setup,
whose data size is only 20 GB.

Regards,
Edwin


Re: Indexing in one collection affect index in another collection

2019-02-06 Thread Shawn Heisey


On 2/6/2019 7:58 AM, Zheng Lin Edwin Yeo wrote:
> Hi everyone,
>
> Does anyone have further updates on this issue?


It is my strong belief that all the software running on this server 
OTHER than Solr is competing with Solr for space in the OS disk cache, 
and that Solr's data is getting pushed out of that cache.


Best guess is that with only one collection, the disk cache was able to 
hold onto Solr's data better, and that with another collection present, 
there's not enough disk cache space available to cache both of them 
effectively.


I think you're going to need a dedicated machine for Solr, so Solr isn't 
competing for system resources.
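As a rough illustration of the disk-cache argument above, the Python sketch below estimates how much of Solr's on-disk index could stay in the OS disk cache once the Solr heaps and other software have taken their share of RAM. The function name and every number in it are hypothetical: the thread does not give the machine's total RAM or the other programs' memory use, and the model ignores that other software's file reads also compete for the cache itself.

```python
def cacheable_fraction(total_ram_gb, solr_heaps_gb, other_usage_gb, index_size_gb):
    """Estimate the share of the index that can fit in the OS disk cache.

    Hypothetical back-of-the-envelope model: whatever RAM is not taken by
    the Solr JVM heaps or by other programs is treated as available cache.
    """
    if index_size_gb <= 0:
        raise ValueError("index_size_gb must be positive")
    free_for_cache = total_ram_gb - sum(solr_heaps_gb) - other_usage_gb
    # Clamp to [0, 1]: no headroom caches nothing; surplus caches everything.
    return max(0.0, min(1.0, free_for_cache / index_size_gb))


if __name__ == "__main__":
    # Made-up numbers: 64 GB machine, two Solr instances with 8 GB heaps,
    # 30 GB used by other software, 20 GB of index data across collections.
    print(cacheable_fraction(64, [8, 8], 30, 20))  # prints 0.9
```

A result well below 1.0 would mean part of the index has to be re-read from disk during queries, which is the mechanism suspected here.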


Thanks,
Shawn

Re: Indexing in one collection affect index in another collection

2019-02-06 Thread Zheng Lin Edwin Yeo

Hi everyone,

Does anyone have further updates on this issue?
Thank you.

Regards,
Edwin


Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Zheng Lin Edwin Yeo

Hi everyone,

We have tried doing the setup and indexing on the latest Solr 7.6.0.

However, we faced exactly the same issue as in Solr 7.5.0: searches on the
customers collection slowed down once we indexed the policies collection.

Regards,
Edwin


Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Zheng Lin Edwin Yeo

Hi Paul,

Thanks for the reply and suggestion.

Yes, we have installed RamMap, and are analyzing the results from there.
The problem we are facing is that once the query for that collection
becomes slow, it will not be fast again even after we restart Solr or the
entire machine.

Regards,
Edwin


Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Zheng Lin Edwin Yeo

Hi Shawn,

No worries, and thanks for your clarification.

We made these changes in order to use the Unified Highlighter with
hl.offsetSource = POSTINGS, and to add "light" term vectors.

The settings come from what is written in the Solr guide on highlighting,
which says the following:

*Postings*: Supported by the Unified Highlighter. Set
storeOffsetsWithPositions to true. This adds a moderate amount of extra
data to the index but it speeds up highlighting tremendously, especially
compared to analysis with longer text fields.

However, wildcard queries will fall back to analysis unless "light" term
vectors are added.

   - *with Term Vectors (light)*: Supported only by the Unified Highlighter.
     To enable this mode set termVectors to true but no other term vector
     related options on the field being highlighted.

     This adds even more data to the index than just
     storeOffsetsWithPositions but not as much as enabling all the extra
     term vector options. Term Vectors are only accessed by the highlighter
     when a wildcard query is used and will prevent a fall back to analysis
     of the stored text.

     This is definitely the fastest option for highlighting wildcard queries
     on large text fields.

Below is the link to the guide:
https://lucene.apache.org/solr/guide/7_5/highlighting.html
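For reference, a query that requests unified highlighting could be built as in the Python sketch below. hl.method=unified and hl.offsetSource are documented Solr highlighting parameters; using POSTINGS_WITH_TERM_VECTORS as the value for the "postings with light term vectors" mode is an assumption based on the guide, and the Unified Highlighter normally picks the offset source on its own, so setting it explicitly is optional. The host, port, collection and query are hypothetical.

```python
from urllib.parse import urlencode


def highlight_query_url(base_url, collection, query, field,
                        offset_source="POSTINGS_WITH_TERM_VECTORS"):
    """Build a /select URL that requests Unified Highlighter snippets."""
    params = {
        "q": query,
        "hl": "true",
        "hl.method": "unified",            # use the Unified Highlighter
        "hl.fl": field,                    # field(s) to highlight
        "hl.offsetSource": offset_source,  # optional; normally auto-detected
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local node; a wildcard query is the case where the
    # "light" term vectors are supposed to help.
    print(highlight_query_url("http://localhost:8983/solr", "customers",
                              "searchFields_tcs:polic*", "searchFields_tcs"))
```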

Regards,
Edwin


Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Shawn Heisey


On 1/29/2019 5:25 AM, Shawn Heisey wrote:
> Adding termVectors will make the index bigger.  Potentially much bigger.
> This will increase the overall RAM requirement of the server,
> especially if the server is handling software other than Solr.  Anything
> that makes the index bigger can affect performance.


I misread the change.  Apologies.  Both definitions have termVectors.  I 
didn't notice it in the second definition because it was on a different 
line than in the first one.


After figuring out what you changed, I cannot figure out what it is 
you're trying to do, and I'm not sure that the settings make sense. 
You've added/changed these three settings:


storeOffsetsWithPositions="true"
termPositions="false"
termOffsets="false"

It seems to me that the first new setting is directly contrary to the 
other two new settings.  I really have no idea what the outcome of the 
changes will be.
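For comparison, the field settings under discussion could be expressed as a Schema API command, sketched below in Python. Per the Solr Reference Guide's field property descriptions, storeOffsetsWithPositions stores offsets in the postings (the inverted index), while termPositions and termOffsets control what is stored inside the term vectors, so the three settings apply to different structures. The dynamic field name "*_tcs" and the type "text_general" are placeholders, since the real name and type were lost from the archived messages; replace-dynamic-field assumes the dynamic field already exists (add-dynamic-field would create it).

```python
import json


def replace_dynamic_field_command(name, field_type):
    """Schema API body mirroring the new searchFields_tcs settings.

    `name` and `field_type` are placeholders: the archived messages lost the
    real dynamicField name and type.
    """
    return {
        "replace-dynamic-field": {
            "name": name,
            "type": field_type,
            "stored": True,
            "multiValued": True,
            # Postings option: keep offsets next to positions in the
            # inverted index, which the POSTINGS offset source reads.
            "storeOffsetsWithPositions": True,
            # "Light" term vectors: vectors enabled, but without positions
            # or offsets stored inside them.
            "termVectors": True,
            "termPositions": False,
            "termOffsets": False,
        }
    }


if __name__ == "__main__":
    # Body for a POST to <solr>/<collection>/schema (hypothetical target).
    print(json.dumps(replace_dynamic_field_command("*_tcs", "text_general"),
                     indent=2))
```

Because these are index-time properties, documents indexed before the change keep their old structures until they are re-indexed.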


Thanks,
Shawn

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Shawn Heisey


On 1/29/2019 5:06 AM, Zheng Lin Edwin Yeo wrote:
> My guess is that the issue started after we changed our searchFields_tcs
> schema, which was:
>
> *From*:
>
> *To:*



Adding termVectors will make the index bigger.  Potentially much bigger. 
 This will increase the overall RAM requirement of the server, 
especially if the server is handling software other than Solr.  Anything 
that makes the index bigger can affect performance.
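One way to check how much a schema change grew each index is to compare the on-disk size of each core's index directory before and after re-indexing. The Python sketch below only sums file sizes; it assumes the common layout where a core keeps its Lucene files under a data/index directory, but the exact core directory names are not given in the thread and the path in the comment is hypothetical.

```python
import os
import tempfile


def dir_size_bytes(path):
    """Total size in bytes of every file under `path`, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Segment files can disappear mid-walk (e.g. after a merge).
                pass
    return total


def as_gib(nbytes):
    """Format a byte count in GiB with two decimals."""
    return f"{nbytes / 1024**3:.2f} GiB"


if __name__ == "__main__":
    # Demo on a throwaway directory; for real use, pass each core's index
    # directory (hypothetically <solr_home>/<core>/data/index).
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "segment.bin"), "wb") as f:
            f.write(b"\0" * 4096)
        print(dir_size_bytes(d), as_gib(dir_size_bytes(d)))
```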



> The above change was done in order to use the Solr-recommended Unified
> Highlighter (postings with light term vectors), which Solr's
> documentation claims is the fastest.
>
> My best guess is that Solr 7.5.0 has some bug that slows down the whole
> index and queries with the new approach (the new dynamicField schema
> above), affecting the index's OS file caching or causing some other issue.
>
> So I kindly suggest you look deeper and see whether such bugs exist.


I know almost nothing about highlighting.  I wouldn't be able to look 
for bugs.


Thanks,
Shawn

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread paul.dodd

Hi

If the reason for the difference in speed is that the index is being read from 
disk, I would expect that the first query would be slow, but subsequent queries 
on the same collection should speed up. A query on the other collection could 
then be slower. In this case I would say that this is normal behavior. The OS 
file cache cannot be relied upon to give the same results in different 
circumstances, including different software  versions.
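This cold-cache reasoning can be checked by timing the same query several times in a row, as in the Python sketch below. run_query stands for any hypothetical callable that sends one query to the affected collection, and the factor of 3 is an arbitrary threshold, not a Solr setting. A slow first call followed by fast ones matches the pattern described above; uniformly slow calls do not.

```python
import time
from statistics import median


def time_repeated(run_query, repeats=5):
    """Call run_query() `repeats` times; return wall-clock seconds per call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings


def looks_like_cold_cache(timings, factor=3.0):
    """True if the first call is much slower than the typical later call."""
    if len(timings) < 2:
        raise ValueError("need at least two timings")
    return timings[0] > factor * median(timings[1:])


if __name__ == "__main__":
    # Stand-in query: sleeps longer on the first call to mimic a cold cache.
    calls = []

    def fake_query():
        time.sleep(0.05 if not calls else 0.005)
        calls.append(1)

    t = time_repeated(fake_query, repeats=4)
    print([round(x, 3) for x in t], looks_like_cold_cache(t))
```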

You may wish to install the RamMap tool[1], [2], although you may be having the 
inverse problem to that described in [1]. You can then see how much space is 
used by the cache and other demands.

If subsequent queries are fast, then to me it does not seem like a problem for 
a development machine.  For production you may wish to store the indices in 
RAM and/or change from Windows to Linux, if it is important that all queries, 
including the first, are very fast.

Have a nice day
Paul

-----Original Message-----
From: Shawn Heisey
Sent: Tuesday, 29 January 2019 13:25
To: solr-user@lucene.apache.org
Subject: Re: Indexing in one collection affect index in another collection

On 1/29/2019 5:06 AM, Zheng Lin Edwin Yeo wrote:
> My guess is that the issue started after we changed our searchFields_tcs schema, which was:
> 
> *From*:
>  stored="true" multiValued="true" termVectors="true" termPositions="true"
> termOffsets="true"/>
> 
> *To:*
>  stored="true" multiValued="true" storeOffsetsWithPositions="true"
> termVectors="true" termPositions="false" termOffsets="false"/>

Adding termVectors will make the index bigger.  Potentially much bigger. 
  This will increase the overall RAM requirement of the server, especially if 
the server is handling software other than Solr.  Anything that makes the index 
bigger can affect performance.

> The above change was done in order to use the Solr-recommended Unified
> Highlighter (postings with light term vectors), which Solr's
> documentation claims is the fastest.
> 
> My best guess is that Solr 7.5.0 has some bug that slows down the whole
> index and queries with the new approach (the new dynamicField schema
> above), affecting the index's OS file caching or causing some other issue.
> 
> So I kindly suggest you look deeper and see whether such bugs exist.

I know almost nothing about highlighting.  I wouldn't be able to look for bugs.

Thanks,
Shawn

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread paul.dodd

References, sorry:

[1] 
https://support.microsoft.com/en-ca/help/976618/you-experience-performance-issues-in-applications-and-services-when-th
[2] https://docs.microsoft.com/en-us/sysinternals/downloads/rammap

-----Original Message-----
From: Dodd, Paul Sutton (UB)
Sent: Tuesday, 29 January 2019 13:31
To: 'solr-user@lucene.apache.org'
Subject: Re: Indexing in one collection affect index in another collection


Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Zheng Lin Edwin Yeo

Hi Shawn,

Thanks for your reply.

However, we did not delete our index when the screenshot was taken. All the
indexes are still in Solr.

My guess is that the issue started after we changed our searchFields_tcs schema, which was:

*From*:

*To:*

The above change was done in order to use the Solr-recommended Unified
Highlighter (postings with light term vectors), which Solr's documentation
claims is the fastest.

My best guess is that Solr 7.5.0 has some bug that slows down the whole index
and queries with the new approach (the new dynamicField schema above),
affecting the index's OS file caching or causing some other issue.

So I kindly suggest you look deeper and see whether such bugs exist.

Note: If you need my schema and configuration files, please refer to my
earlier messages in the same thread.

Regards,
Edwin


Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Shawn Heisey


On 1/26/2019 4:48 PM, Zheng Lin Edwin Yeo wrote:
> Thanks for your reply. Below are the replies to your email:
>
> 1) We previously tried setting the heap size to 8g when we faced the
> same issue, and changing it to 7g does not help either.
>
> 2) We are using standard disks at the moment.
>
> 3) The link below is a screenshot of the process list sorted by Commit.
> https://drive.google.com/file/d/1TzxaAqbDJwYO0aHo9GW34p2kncnylRkG/view?usp=sharing


My original thought is still the best idea I have.  I think that the 
other software on the system is heavily using the disk cache and not 
leaving enough of it for Solr's data.


From what I can tell, the other software on the system is not using 
MMAP for disk access, so the large amount of disk cache usage is not 
reflected in the "Commit" number for those programs.


In the last screenshot, the Solr instances appear to be handling very 
little index data -- the Commit number is actually *smaller* than the 
Working Set number, which will not be the case when there is a lot of 
index data.  I'm betting that at the point when that screenshot was 
taken, all the index data had been deleted, possibly in preparation for 
rebuilding the indexes.


Thanks,
Shawn

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Zheng Lin Edwin Yeo

Hi Shawn / Jan,

Does anyone have any further insights into this problem?
The same problem still happens even after we made the changes and re-indexed
all the data.

Regards,
Edwin


Re: Indexing in one collection affect index in another collection

2019-01-26 Thread Zheng Lin Edwin Yeo

Hi Shawn,

Thanks for your reply. Below are the replies to your email:

1) We previously tried setting the heap size to 8g when we faced the
same issue, and changing it to 7g does not help either.

2) We are using standard disks at the moment.

3) The link below is a screenshot of the process list sorted by Commit.
https://drive.google.com/file/d/1TzxaAqbDJwYO0aHo9GW34p2kncnylRkG/view?usp=sharing

Regards,
Edwin

On Sun, 27 Jan 2019 at 02:07, Shawn Heisey  wrote:

> On 1/26/2019 9:40 AM, Zheng Lin Edwin Yeo wrote:
> > We have tried adding -a "-XX:+AlwaysPreTouch" to the command that starts
> > Solr, but there is no noticeable difference in performance.
> >
> > As for the screenshot, I have captured another one after we added -a
> > "-XX:+AlwaysPreTouch", and it is sorted on the Working Set column.
> > Below is the link to the new screenshot:
> >
> https://drive.google.com/file/d/1YEsJxnCeRorvBRCSqeowZOu3Fpena5Mo/view?usp=sharing
>
> That would mean that it's probably not a heap issue.  You could try
> increasing the heap size on each Solr instance to 7g as a test to see
> whether it helps at all.  I'd be a little bit surprised if that helps.
>
> I can't tell much about the software other than Solr that's running on
> this machine, but my best guess at this point is that Solr index
> information is being pushed out of the disk cache by the other software
> running on the machine, making it so that when Solr needs to do a query,
> a lot of information must be read from disk instead of the cache.  Disks
> are very very slow compared to memory.  SSD is faster, but still quite a
> bit slower than main memory.
>
> What kind of disk are you using?  If it's standard disks, I don't know
> how easily you could try putting the index data on SSD.  If doing so
> makes it quite a bit faster, then my suspicion above is probably correct.
>
> A "by the way" question:  What do you see if you sort the process list
> by Commit instead?  Doing this might not reveal anything useful.  Only
> software using MMAP for file access (which Solr does by default) would
> show up near the top of that list, so it's possible that a new sort
> would not reveal anything interesting.
>
> Thanks,
> Shawn
>

Re: Indexing in one collection affect index in another collection

2019-01-26 Thread Shawn Heisey


On 1/26/2019 9:40 AM, Zheng Lin Edwin Yeo wrote:

We have tried adding -a "-XX:+AlwaysPreTouch" to the command that starts
Solr, but there is no noticeable difference in performance.

As for the screenshot, I have captured another one after we added -a
"-XX:+AlwaysPreTouch", and it is sorted on the Working Set column.
Below is the link to the new screenshot:
https://drive.google.com/file/d/1YEsJxnCeRorvBRCSqeowZOu3Fpena5Mo/view?usp=sharing


That would mean that it's probably not a heap issue.  You could try 
increasing the heap size on each Solr instance to 7g as a test to see 
whether it helps at all.  I'd be a little bit surprised if that helps.


I can't tell much about the software other than Solr that's running on 
this machine, but my best guess at this point is that Solr index 
information is being pushed out of the disk cache by the other software 
running on the machine, making it so that when Solr needs to do a query, 
a lot of information must be read from disk instead of the cache.  Disks 
are very very slow compared to memory.  SSD is faster, but still quite a 
bit slower than main memory.


What kind of disk are you using?  If it's standard disks, I don't know 
how easily you could try putting the index data on SSD.  If doing so 
makes it quite a bit faster, then my suspicion above is probably correct.


A "by the way" question:  What do you see if you sort the process list 
by Commit instead?  Doing this might not reveal anything useful.  Only 
software using MMAP for file access (which Solr does by default) would 
show up near the top of that list, so it's possible that a new sort 
would not reveal anything interesting.


Thanks,
Shawn

Re: Indexing in one collection affect index in another collection

2019-01-26 Thread Zheng Lin Edwin Yeo

Hi Shawn,

We have tried adding -a "-XX:+AlwaysPreTouch" to the command that starts
Solr, but there is no noticeable difference in performance.

As for the screenshot, I have captured another one after we added -a
"-XX:+AlwaysPreTouch", and it is sorted on the Working Set column.
Below is the link to the new screenshot:
https://drive.google.com/file/d/1YEsJxnCeRorvBRCSqeowZOu3Fpena5Mo/view?usp=sharing

Regards,
Edwin


On Sat, 26 Jan 2019 at 01:33, Shawn Heisey  wrote:

> On 1/25/2019 9:11 AM, Zheng Lin Edwin Yeo wrote:
> > As requested, below is the link to the screenshot of the resource monitor
> > of our system.
> >
> https://drive.google.com/file/d/1_-Tqhk9YYp9w8injHU4ZPSvdFJOx8A5s/view?usp=sharing
>
> The wiki page says to sort on the Working Set column.  Your screenshot
> shows it sorted by the Private column.  This might not be a problem, or
> switching it might reveal other information.  For now, I'm going to
> assume that sorting on the other column would show very similar lines in
> nearly the same order.  If you see a very different process listing when
> changing the sort column, I would like to see a new screenshot.
>
> I can see that Java is not allocating the entire allowed heap
> immediately.  This can lead to some odd problems.  See what happens to
> performance if you add this option to the command line that starts Solr:
>
> -a "-XX:+AlwaysPreTouch"
>
> This option is going to make a noticeable difference in the process
> listing.  I cannot say whether it would help, and it might make things
> worse.  If it does make things worse, then this machine needs more
> memory to do all the jobs it has been asked to do.
>
> Together the two Solr instances are accessing somewhat less than 9GB of
> index data.  But the system shows that 22GB of memory is in the disk
> cache.  This must mean that other software on the machine is accessing a
> very large amount of other data.  When multiple applications compete for
> space in the disk cache, slowdowns will happen.
>
> Thanks,
> Shawn
>

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Shawn Heisey


On 1/25/2019 9:11 AM, Zheng Lin Edwin Yeo wrote:

As requested, below is the link to the screenshot of the resource monitor
of our system.
https://drive.google.com/file/d/1_-Tqhk9YYp9w8injHU4ZPSvdFJOx8A5s/view?usp=sharing


The wiki page says to sort on the Working Set column.  Your screenshot 
shows it sorted by the Private column.  This might not be a problem, or 
switching it might reveal other information.  For now, I'm going to 
assume that sorting on the other column would show very similar lines in 
nearly the same order.  If you see a very different process listing when 
changing the sort column, I would like to see a new screenshot.


I can see that Java is not allocating the entire allowed heap 
immediately.  This can lead to some odd problems.  See what happens to 
performance if you add this option to the command line that starts Solr:


-a "-XX:+AlwaysPreTouch"

This option is going to make a noticeable difference in the process 
listing.  I cannot say whether it would help, and it might make things 
worse.  If it does make things worse, then this machine needs more 
memory to do all the jobs it has been asked to do.


Together the two Solr instances are accessing somewhat less than 9GB of 
index data.  But the system shows that 22GB of memory is in the disk 
cache.  This must mean that other software on the machine is accessing a 
very large amount of other data.  When multiple applications compete for 
space in the disk cache, slowdowns will happen.
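A rough back-of-the-envelope sketch of that reasoning, using only figures quoted in this thread: 32GB of RAM, two Solr nodes each started with -m 6g (ports 8983 and 8984), somewhat less than 9GB of index data, and 22GB shown in the disk cache. It assumes both heaps end up fully allocated (which -XX:+AlwaysPreTouch forces at startup) and treats 9GB as an upper bound on Solr's index data; the helper name is hypothetical and the result is an estimate, not a measurement.

```python
def memory_budget(total_ram_gb: float, heap_per_node_gb: float, nodes: int,
                  solr_index_gb: float, disk_cache_gb: float) -> dict:
    """Rough split of RAM between Solr heaps, Solr's index, and other data.

    Hypothetical helper: every input is an approximation taken from the
    thread, and solr_index_gb is treated as an upper bound.
    """
    heap_total = heap_per_node_gb * nodes
    # RAM left for the OS, other programs and the disk cache once every
    # Solr heap is fully allocated (e.g. with -XX:+AlwaysPreTouch).
    left_after_heaps = total_ram_gb - heap_total
    # If Solr reads at most solr_index_gb of index files, the remainder of
    # the disk cache must be holding some other software's data.
    cached_non_solr = max(disk_cache_gb - solr_index_gb, 0.0)
    return {
        "heap_total": heap_total,
        "left_after_heaps": left_after_heaps,
        "cached_non_solr_at_least": cached_non_solr,
        # True if the whole index could be cached when nothing else competes.
        "index_fits_after_heaps": solr_index_gb <= left_after_heaps,
        # True if today's cache is bigger than what full heaps would leave.
        "cache_exceeds_remaining": disk_cache_gb > left_after_heaps,
    }


if __name__ == "__main__":
    # 32GB RAM, two nodes with -m 6g, < 9GB of index, 22GB in the disk cache.
    budget = memory_budget(32.0, 6.0, 2, 9.0, 22.0)
    for key, value in budget.items():
        print(f"{key}: {value}")
```

With these inputs, about 20GB remains once both heaps are fully allocated, which is less than the 22GB currently in the cache, and at least 13GB of cached data would belong to software other than Solr. That is consistent with, though it does not prove, the disk-cache contention described above.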


Thanks,
Shawn

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Zheng Lin Edwin Yeo

Hi Jorn,

I have set the heap size to 6GB, and the system has 32GB of RAM.

The data is indexed from CSV files, so each field holds database-style data.
Only searchFields may hold more data, as it contains the important fields of
the collection, but even that is not as large as, say, the contents of emails.

The index size is currently only 3GB; shouldn't that be too small to justify
splitting the collections onto different nodes?

Regards,
Edwin

On Fri, 25 Jan 2019 at 20:49, Jörn Franke  wrote:

> Have you done a correct sizing wrt memory / CPU?
>
> Also check the data model, in case you have a lot of queried stored fields
> that may contain a lot of data.
>
> You may also split those two collections onto different nodes.
>

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Zheng Lin Edwin Yeo

Hi Shawn,

As requested, below is the link to the screenshot of the resource monitor
of our system.
https://drive.google.com/file/d/1_-Tqhk9YYp9w8injHU4ZPSvdFJOx8A5s/view?usp=sharing

Regards,
Edwin

On Fri, 25 Jan 2019 at 23:35, Shawn Heisey  wrote:

> On 1/25/2019 7:47 AM, Zheng Lin Edwin Yeo wrote:
> > Below is the command that we used to start Solr:
> >
> > cd solr-7.5.0
> > bin\solr.cmd start -cloud -p 8983 -s solrMain\node1 -m 6g -z
> > "localhost:2181,localhost:2182,localhost:2183" -Dsolr.ltr.enabled=true
> > pause
>
> Can you gather the screenshot mentioned here and send us a link to a
> file sharing site so we can view the screenshot?
>
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Process_listing_on_Windows
>
> Thanks,
> Shawn
>

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Shawn Heisey


On 1/25/2019 7:47 AM, Zheng Lin Edwin Yeo wrote:

Below is the command that we used to start Solr:

cd solr-7.5.0
bin\solr.cmd start -cloud -p 8983 -s solrMain\node1 -m 6g -z
"localhost:2181,localhost:2182,localhost:2183" -Dsolr.ltr.enabled=true
pause


Can you gather the screenshot mentioned here and send us a link to a 
file sharing site so we can view the screenshot?


https://wiki.apache.org/solr/SolrPerformanceProblems#Process_listing_on_Windows

Thanks,
Shawn

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Zheng Lin Edwin Yeo

Hi Jan,

Below is the command that we used to start Solr:

cd solr-7.5.0
bin\solr.cmd start -cloud -p 8983 -s solrMain\node1 -m 6g -z
"localhost:2181,localhost:2182,localhost:2183" -Dsolr.ltr.enabled=true
pause


We also have a replica; in this development setup, we run it on the same
PC to simulate it.
Below is the command to start the replica, which uses port 8984.

cd solr-7.5.0
bin\solr.cmd start -cloud -p 8984 -s solrMain\node2 -m 6g -z
"localhost:2181,localhost:2182,localhost:2183" -Dsolr.ltr.enabled=true
pause

Regards,
Edwin

On Fri, 25 Jan 2019 at 22:35, Jan Høydahl  wrote:

> How do you start Solr? The solr.in.cmd you sent does not contain the
> memory settings. What other parameters do you start Solr with?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Jan Høydahl

How do you start Solr? The solr.in.cmd you sent does not contain the
memory settings. What other parameters do you start Solr with?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 25 Jan 2019 at 15:28, Zheng Lin Edwin Yeo wrote:
> 
> Hi Jan,
> 
> We are using 64-bit Java, version 1.8.0_191.
> We started Solr with a 6 GB heap size.
> 
> Besides Solr, we have ZooKeeper, IIS, Google Chrome and Notepad++ running
> on the machine. There is still 22 GB of memory left on the server, out of
> the 32 GB available on the machine.
> 
> Regards,
> Edwin
> 
Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Zheng Lin Edwin Yeo

Hi Jan,

We are using 64-bit Java, version 1.8.0_191.
We started Solr with a 6 GB heap size.

Besides Solr, we have ZooKeeper, IIS, Google Chrome and Notepad++ running
on the machine. There is still 22 GB of memory left on the server, out of
the 32 GB available on the machine.

Regards,
Edwin

On Fri, 25 Jan 2019 at 21:04, Jan Høydahl  wrote:

> Which Java version? 32- or 64-bit? Do you start Solr with the default 512MB
> heap size?
> Other software running on the machine?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Jan Høydahl

Which Java version? 32- or 64-bit? Do you start Solr with the default 512MB heap size?
Other software running on the machine?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 25 Jan 2019 at 13:05, Zheng Lin Edwin Yeo wrote:
> 
> Hi Jan and Shawn,
> 
> For your info, this is another debug query.
> 
>  "debug":{
>    "rawquerystring":"johnny",
>    "querystring":"johnny",
>    "parsedquery":"searchFields_tcs:johnny",
>    "parsedquery_toString":"searchFields_tcs:johnny",
>    "explain":{
>      "192280":"\n12.8497505 = weight(searchFields_tcs:johnny in
> 75730) [SchemaSimilarity], result of:\n  12.8497505 =
> score(doc=75730,freq=4.0 = termFreq=4.0\n), product of:\n7.5108404
> = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq +
> 0.5)) from:\n  473.0 = docFreq\n  865438.0 = docCount\n
> 1.7108272 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
> b + b * fieldLength / avgFieldLength)) from:\n  4.0 =
> termFreq=4.0\n  1.2 = parameter k1\n  0.75 = parameter b\n
> 26.66791 = avgFieldLength\n  25.0 = fieldLength\n",
>    "QParser":"LuceneQParser",
>    "timing":{
>      "time":350.0,
>      "prepare":{
>        "time":0.0,
>        "query":{"time":0.0},
>        "facet":{"time":0.0},
>        "facet_module":{"time":0.0},
>        "mlt":{"time":0.0},
>        "highlight":{"time":0.0},
>        "stats":{"time":0.0},
>        "expand":{"time":0.0},
>        "terms":{"time":0.0},
>        "debug":{"time":0.0}},
>      "process":{
>        "time":348.0,
>        "query":{"time":287.0},
>        "facet":{"time":0.0},
>        "facet_module":{"time":0.0},
>        "mlt":{"time":0.0},
>        "highlight":{"time":54.0},
>        "stats":{"time":0.0},
>        "expand":{"time":0.0},
>        "terms":{"time":0.0},
>        "debug":{"time":6.0}},
>      "loadFieldValues":{
>        "time":0.0
> 
> 
> Regards,
> Edwin
> 

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Jörn Franke

Have you done a correct sizing wrt memory / CPU?

Also check the data model, in case you have a lot of queried stored fields
that may contain a lot of data.

You may also split those two collections onto different nodes.

> On 23 Jan 2019 at 18:01, Zheng Lin Edwin Yeo wrote:
> 
> Hi,
> 
> I am using Solr 7.5.0, and I am currently facing an issue: when I index
> into collection2, the indexing affects the records in collection1.
> Although the records are still intact, it seems that the termVectors
> settings get wiped out, and the index size of collection1 shrinks from
> 3.3GB to 2.1GB after I index into collection2. Also, searches in
> collection1, which were originally very fast, become very slow after the
> indexing in collection2 is done.
> 
> Has anyone faced such issues before, or does anyone have any idea what may
> have gone wrong?
> 
> Regards,
> Edwin

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Zheng Lin Edwin Yeo

Hi Jan and Shawn,

For your info, this is another debug query.

  "debug":{
    "rawquerystring":"johnny",
    "querystring":"johnny",
    "parsedquery":"searchFields_tcs:johnny",
    "parsedquery_toString":"searchFields_tcs:johnny",
    "explain":{
      "192280":"\n12.8497505 = weight(searchFields_tcs:johnny in
75730) [SchemaSimilarity], result of:\n  12.8497505 =
score(doc=75730,freq=4.0 = termFreq=4.0\n), product of:\n7.5108404
= idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq +
0.5)) from:\n  473.0 = docFreq\n  865438.0 = docCount\n
1.7108272 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
b + b * fieldLength / avgFieldLength)) from:\n  4.0 =
termFreq=4.0\n  1.2 = parameter k1\n  0.75 = parameter b\n
26.66791 = avgFieldLength\n  25.0 = fieldLength\n",
    "QParser":"LuceneQParser",
    "timing":{
      "time":350.0,
      "prepare":{
        "time":0.0,
        "query":{"time":0.0},
        "facet":{"time":0.0},
        "facet_module":{"time":0.0},
        "mlt":{"time":0.0},
        "highlight":{"time":0.0},
        "stats":{"time":0.0},
        "expand":{"time":0.0},
        "terms":{"time":0.0},
        "debug":{"time":0.0}},
      "process":{
        "time":348.0,
        "query":{"time":287.0},
        "facet":{"time":0.0},
        "facet_module":{"time":0.0},
        "mlt":{"time":0.0},
        "highlight":{"time":54.0},
        "stats":{"time":0.0},
        "expand":{"time":0.0},
        "terms":{"time":0.0},
        "debug":{"time":6.0}},
      "loadFieldValues":{
        "time":0.0


Regards,
Edwin
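The explain entry in the debug output above can be checked by hand. A minimal sketch, assuming the BM25 formulas exactly as the explain text prints them (natural log for idf, k1 = 1.2, b = 0.75), recomputes idf, tfNorm and the score from the numbers listed for document 192280; the helper names are hypothetical and are not Solr or Lucene API.

```python
import math


def bm25_idf(doc_freq: float, doc_count: float) -> float:
    """idf = log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), natural log."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq: float, k1: float, b: float,
                 field_length: float, avg_field_length: float) -> float:
    """tfNorm = (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))."""
    length_ratio = field_length / avg_field_length
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * length_ratio))


def bm25_score(freq: float, doc_freq: float, doc_count: float,
               field_length: float, avg_field_length: float,
               k1: float = 1.2, b: float = 0.75) -> float:
    """score = idf * tfNorm, the "product of" line in the explain output."""
    return (bm25_idf(doc_freq, doc_count)
            * bm25_tf_norm(freq, k1, b, field_length, avg_field_length))


if __name__ == "__main__":
    # Statistics copied from the explain entry for document 192280.
    freq, doc_freq, doc_count = 4.0, 473.0, 865438.0
    field_length, avg_field_length = 25.0, 26.66791
    print("idf    =", round(bm25_idf(doc_freq, doc_count), 4))
    print("tfNorm =", round(bm25_tf_norm(freq, 1.2, 0.75, field_length, avg_field_length), 4))
    print("score  =", round(bm25_score(freq, doc_freq, doc_count, field_length, avg_field_length), 4))
```

Run as a script, it should agree with the printed idf (7.5108404), tfNorm (1.7108272) and score (12.8497505) to about four decimal places; small differences would be expected from floating-point precision.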


On Fri, 25 Jan 2019 at 19:52, Zheng Lin Edwin Yeo 
wrote:

> Hi Jan and Shawn,
>
> Please focus on the strange issue that I described above in more detail.
> In summary:
>
> 1. Index the customers data; queries via the highlight, select, and all
> other handlers are then very fast (less than 50ms).
>
> 2. Now index the policies data; queries on policies are very fast, BUT
> queries on customers become slow.
>
> 3. If I reindex the customers data, queries on customers are again very
> fast, BUT queries on policies become slow.
>
> How can you explain this behavior?
>
> We never experienced such strange behavior before Solr 7.
>
> Regards,
> Edwin
>
> On Fri, 25 Jan 2019 at 17:06, Zheng Lin Edwin Yeo 
> wrote:
>
>> Hi Jan,
>>
>> Regarding your point that highlighting takes up most of the time in the
>> first query on the policies collection: highlighting was very fast (less
>> than 50ms) from the time the collection was indexed until the customers
>> collection was indexed, at which point it slowed down tremendously.
>>
>> Also, the slowdown does not only affect the highlight requestHandler; it
>> also affects other requestHandlers such as select and clustering. All of
>> them get a QTime of more than 500ms. This is also shown in the latest
>> debug query that we sent earlier, in which we set hl=false and fl=null.
>>
>> Any idea or explanation for this strange behavior?
>> Thank you for your support; we look forward to shedding some light on
>> this issue and resolving it.
>>
>> Regards,
>> Edwin
>>
>> On Thu, 24 Jan 2019 at 23:35, Zheng Lin Edwin Yeo 
>> wrote:
>>
>>> Hi Jan,
>>>
>>> Thanks for your reply.
>>>
>>> However, we are still getting a slow QTime of 517ms even after we set
>>> hl=false&fl=null.
>>>
>>> Below is the debug query:
>>>
>>>   "debug":{
>>> "rawquerystring":"cherry",
>>> "querystring":"cherry",
>>> "parsedquery":"searchFields_tcs:cherry",
>>> "parsedquery_toString":"searchFields_tcs:cherry",
>>> "explain":{
>>>   "46226513":"\n14.227914 = weight(searchFields_tcs:cherry in 5747763) 
>>> [SchemaSimilarity], result of:\n  14.227914 = score(doc=5747763,freq=3.0 = 
>>> termFreq=3.0\n), product of:\n9.614556 = idf, computed as log(1 + 
>>> (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n  400.0 = 
>>> docFreq\n  600.0 = docCount\n1.4798305 = tfNorm, computed as 
>>> (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / 
>>> avgFieldLength)) from:\n  3.0 = termFreq=3.0\n  1.2 = parameter 
>>> k1\n  0.75 = parameter b\n  19.397041 = avgFieldLength\n  25.0 
>>> = fieldLength\n",
>>>   "54088731":"\n13.937909 = weight(searchFields_tcs:cherry in 4840794) 
>>> [SchemaSimilarity], result of:\n  13.937909 = score(doc=4840794,freq=3.0 = 
>>> termFreq=3.0\n), product of:\n9.614556 = idf, computed as log(1 + 
>>> (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n  400.0 = 
>>> docFreq\n  600.0 = docCount\n1.4496675 = tfNorm, computed as 
>>> (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / 
>>> avgFieldLength)) from:\n

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Zheng Lin Edwin Yeo

Hi Jan and Shawn,

Please focus on the strange issue that I described above in more detail.
In summary:

1. Index the customers data; queries from the highlight, select, and all
other handlers are then very fast (less than 50ms).

2. Now index the policies data; queries on policies are very fast, BUT
queries on customers become slow.

3. If I reindex the customers data, queries on customers are again very
fast, BUT queries on policies become slow.

How can you explain this behavior?

We never experienced such strange behavior before Solr 7.

Regards,
Edwin
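To make the three steps above repeatable, one option is to record the median QTime of both collections after each indexing step. The Python sketch below assumes the JSON response format, a `/select` handler, and the `https://localhost:8983/edm/<collection>` base path that appears in the optimize URLs elsewhere in this thread; `median_qtime`, `snapshot` and the query terms are hypothetical helpers, not Solr APIs. Several distinct terms are used so that repeated requests are less likely to be answered from Solr's query result cache.

```python
import json
import statistics
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable, List

# Hypothetical base URL, modelled on the update URLs quoted in this thread.
BASE_URL = "https://localhost:8983/edm"


def http_fetch_json(url: str) -> dict:
    """Send a GET request and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def median_qtime(collection: str, queries: Iterable[str],
                 fetch_json: Callable[[str], dict] = http_fetch_json) -> float:
    """Run each query once against /select and return the median QTime in ms."""
    qtimes: List[int] = []
    for q in queries:
        params = urllib.parse.urlencode({"q": q, "wt": "json"})
        resp = fetch_json(f"{BASE_URL}/{collection}/select?{params}")
        qtimes.append(resp["responseHeader"]["QTime"])
    return statistics.median(qtimes)


def snapshot(collections: Iterable[str], queries: List[str],
             fetch_json: Callable[[str], dict] = http_fetch_json) -> Dict[str, float]:
    """Median QTime per collection; take one snapshot after each indexing step."""
    return {c: median_qtime(c, queries, fetch_json) for c in collections}
```

Calling `snapshot(["customers", "policies"], ["sherry", "cherry"])` after steps 1, 2 and 3 gives one comparable pair of numbers per step. The `fetch_json` parameter exists so the function can be exercised without a running Solr instance.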

On Fri, 25 Jan 2019 at 17:06, Zheng Lin Edwin Yeo 
wrote:

> Hi Jan,
>
> Regarding your point that highlighting takes up most of the time in the
> first query on the policies collection: highlighting was very fast (less
> than 50ms) from the time the collection was indexed until the customers
> collection was indexed, after which it slowed down tremendously.
>
> Also, the slowdown does not affect only the highlight requestHandler.
> It also affects other requestHandlers such as select and clustering.
> All of them get a QTime of more than 500ms. This is also shown in the
> latest debug query that we sent earlier, in which we set hl=false
> and fl=null.
>
> Any idea or explanation for this strange behavior?
> Thank you for your support; we look forward to shedding some light on this
> issue and resolving it.
>
> Regards,
> Edwin
>
> On Thu, 24 Jan 2019 at 23:35, Zheng Lin Edwin Yeo 
> wrote:
>
>> Hi Jan,
>>
>> Thanks for your reply.
>>
>> However, we are still getting a slow QTime of 517ms even after we set
>> hl=false&fl=null.
>>
>> Below is the debug query:
>>
>>   "debug":{
>> "rawquerystring":"cherry",
>> "querystring":"cherry",
>> "parsedquery":"searchFields_tcs:cherry",
>> "parsedquery_toString":"searchFields_tcs:cherry",
>> "explain":{
>>   "46226513":"\n14.227914 = weight(searchFields_tcs:cherry in 5747763) 
>> [SchemaSimilarity], result of:\n  14.227914 = score(doc=5747763,freq=3.0 = 
>> termFreq=3.0\n), product of:\n9.614556 = idf, computed as log(1 + 
>> (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n  400.0 = docFreq\n 
>>  600.0 = docCount\n1.4798305 = tfNorm, computed as (freq * (k1 + 
>> 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n  
>> 3.0 = termFreq=3.0\n  1.2 = parameter k1\n  0.75 = parameter b\n 
>>  19.397041 = avgFieldLength\n  25.0 = fieldLength\n",
>>   "54088731":"\n13.937909 = weight(searchFields_tcs:cherry in 4840794) 
>> [SchemaSimilarity], result of:\n  13.937909 = score(doc=4840794,freq=3.0 = 
>> termFreq=3.0\n), product of:\n9.614556 = idf, computed as log(1 + 
>> (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n  400.0 = docFreq\n 
>>  600.0 = docCount\n1.4496675 = tfNorm, computed as (freq * (k1 + 
>> 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n  
>> 3.0 = termFreq=3.0\n  1.2 = parameter k1\n  0.75 = parameter b\n 
>>  19.397041 = avgFieldLength\n  27.0 = fieldLength\n",
>> "QParser":"LuceneQParser",
>> "timing":{
>>   "time":517.0,
>>   "prepare":{
>> "time":0.0,
>> "query":{
>>   "time":0.0},
>> "facet":{
>>   "time":0.0},
>> "facet_module":{
>>   "time":0.0},
>> "mlt":{
>>   "time":0.0},
>> "highlight":{
>>   "time":0.0},
>> "stats":{
>>   "time":0.0},
>> "expand":{
>>   "time":0.0},
>> "terms":{
>>   "time":0.0},
>> "debug":{
>>   "time":0.0}},
>>   "process":{
>> "time":516.0,
>> "query":{
>>   "time":15.0},
>> "facet":{
>>   "time":0.0},
>> "facet_module":{
>>   "time":0.0},
>> "mlt":{
>>   "time":0.0},
>> "highlight":{
>>   "time":0.0},
>> "stats":{
>>   "time":0.0},
>> "expand":{
>>   "time":0.0},
>> "terms":{
>>   "time":0.0},
>> "debug":{
>>   "time":500.0}
>>
>> Regards,
>> Edwin
>>
>>
>> On Thu, 24 Jan 2019 at 22:43, Jan Høydahl  wrote:
>>
>>> Looks like highlighting takes most of the time on the first query
>>> (680ms). Your config seems to ask for a lot of highlighting here, like 100
>>> snippets of max 10 characters etc.
>>> It sounds to me like this might be a highlighting configuration problem.
>>> Try to disable highlighting (hl=false) and see if you get your speed back.
>>> Also, I see fl=* in your config, which is really asking for all fields.
>>> Are you sure you want that? It may also be slow. Try to ask for just the
>>> fields you will be using.
>>>
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>>
>>> > On 24 Jan 2019 at 14:59, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>>> >
>>> > Thanks for your

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Zheng Lin Edwin Yeo

Hi Jan,

Regarding your point that highlighting takes up most of the time in the
first query on the policies collection: highlighting was very fast (less
than 50ms) from the time the collection was indexed until the customers
collection was indexed, after which it slowed down tremendously.

Also, the slowdown does not affect only the highlight requestHandler.
It also affects other requestHandlers such as select and clustering.
All of them get a QTime of more than 500ms. This is also shown in the
latest debug query that we sent earlier, in which we set hl=false
and fl=null.

Any idea or explanation for this strange behavior?
Thank you for your support; we look forward to shedding some light on this
issue and resolving it.

Regards,
Edwin

On Thu, 24 Jan 2019 at 23:35, Zheng Lin Edwin Yeo 
wrote:

> Hi Jan,
>
> Thanks for your reply.
>
> However, we are still getting a slow QTime of 517ms even after we set
> hl=false&fl=null.
>
> Below is the debug query:
>
>   "debug":{
> "rawquerystring":"cherry",
> "querystring":"cherry",
> "parsedquery":"searchFields_tcs:cherry",
> "parsedquery_toString":"searchFields_tcs:cherry",
> "explain":{
>   "46226513":"\n14.227914 = weight(searchFields_tcs:cherry in 5747763) 
> [SchemaSimilarity], result of:\n  14.227914 = score(doc=5747763,freq=3.0 = 
> termFreq=3.0\n), product of:\n9.614556 = idf, computed as log(1 + 
> (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n  400.0 = docFreq\n  
> 600.0 = docCount\n1.4798305 = tfNorm, computed as (freq * (k1 + 
> 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n  
> 3.0 = termFreq=3.0\n  1.2 = parameter k1\n  0.75 = parameter b\n  
> 19.397041 = avgFieldLength\n  25.0 = fieldLength\n",
>   "54088731":"\n13.937909 = weight(searchFields_tcs:cherry in 4840794) 
> [SchemaSimilarity], result of:\n  13.937909 = score(doc=4840794,freq=3.0 = 
> termFreq=3.0\n), product of:\n9.614556 = idf, computed as log(1 + 
> (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n  400.0 = docFreq\n  
> 600.0 = docCount\n1.4496675 = tfNorm, computed as (freq * (k1 + 
> 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n  
> 3.0 = termFreq=3.0\n  1.2 = parameter k1\n  0.75 = parameter b\n  
> 19.397041 = avgFieldLength\n  27.0 = fieldLength\n",
> "QParser":"LuceneQParser",
> "timing":{
>   "time":517.0,
>   "prepare":{
> "time":0.0,
> "query":{
>   "time":0.0},
> "facet":{
>   "time":0.0},
> "facet_module":{
>   "time":0.0},
> "mlt":{
>   "time":0.0},
> "highlight":{
>   "time":0.0},
> "stats":{
>   "time":0.0},
> "expand":{
>   "time":0.0},
> "terms":{
>   "time":0.0},
> "debug":{
>   "time":0.0}},
>   "process":{
> "time":516.0,
> "query":{
>   "time":15.0},
> "facet":{
>   "time":0.0},
> "facet_module":{
>   "time":0.0},
> "mlt":{
>   "time":0.0},
> "highlight":{
>   "time":0.0},
> "stats":{
>   "time":0.0},
> "expand":{
>   "time":0.0},
> "terms":{
>   "time":0.0},
> "debug":{
>   "time":500.0}
>
> Regards,
> Edwin
>
>
> On Thu, 24 Jan 2019 at 22:43, Jan Høydahl  wrote:
>
>> Looks like highlighting takes most of the time on the first query
>> (680ms). Your config seems to ask for a lot of highlighting here, like 100
>> snippets of max 10 characters etc.
>> It sounds to me like this might be a highlighting configuration problem. Try
>> to disable highlighting (hl=false) and see if you get your speed back.
>> Also, I see fl=* in your config, which is really asking for all fields.
>> Are you sure you want that? It may also be slow. Try to ask for just the
>> fields you will be using.
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>> > On 24 Jan 2019 at 14:59, Zheng Lin Edwin Yeo wrote:
>> >
>> > Thanks for your reply.
>> >
>> > Below is what you requested about our Solr setup: configuration
>> > files, schema and results of debug queries.
>> >
>> > Looking forward to your advice and support with our problem.
>> >
>> > 1. System configurations
>> > OS: Windows 10 Pro 64 bit
>> > System Memory: 32GB
>> > CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 4 Core(s), 8 Logical
>> > Processor(s)
>> > HDD: 3.0 TB (free 2.1 TB)  SATA
>> >
>> > 2. solrconfig.xml of the customers and policies collections, and
>> > solr.in.cmd, which can be downloaded from the following link:
>> >
>> https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing
>> >
>> > 3. The debug queries from both collections
>> >
>> > *3.1. Debug Query From Policies (which is slow)*
>> >
>> >  "debug":{
>> >
>> >

Re: Indexing in one collection affect index in another collection

2019-01-24 Thread Zheng Lin Edwin Yeo

Hi Jan,

Thanks for your reply.

However, we are still getting a slow QTime of 517ms even after we set
hl=false&fl=null.

Below is the debug query:

  "debug":{
"rawquerystring":"cherry",
"querystring":"cherry",
"parsedquery":"searchFields_tcs:cherry",
"parsedquery_toString":"searchFields_tcs:cherry",
"explain":{
  "46226513":"\n14.227914 = weight(searchFields_tcs:cherry in
5747763) [SchemaSimilarity], result of:\n  14.227914 =
score(doc=5747763,freq=3.0 = termFreq=3.0\n), product of:\n
9.614556 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
(docFreq + 0.5)) from:\n  400.0 = docFreq\n  600.0 =
docCount\n1.4798305 = tfNorm, computed as (freq * (k1 + 1)) /
(freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
3.0 = termFreq=3.0\n  1.2 = parameter k1\n  0.75 = parameter
b\n  19.397041 = avgFieldLength\n  25.0 = fieldLength\n",
  "54088731":"\n13.937909 = weight(searchFields_tcs:cherry in
4840794) [SchemaSimilarity], result of:\n  13.937909 =
score(doc=4840794,freq=3.0 = termFreq=3.0\n), product of:\n
9.614556 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
(docFreq + 0.5)) from:\n  400.0 = docFreq\n  600.0 =
docCount\n1.4496675 = tfNorm, computed as (freq * (k1 + 1)) /
(freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
3.0 = termFreq=3.0\n  1.2 = parameter k1\n  0.75 = parameter
b\n  19.397041 = avgFieldLength\n  27.0 = fieldLength\n",
"QParser":"LuceneQParser",
"timing":{
  "time":517.0,
  "prepare":{
"time":0.0,
"query":{
  "time":0.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":0.0}},
  "process":{
"time":516.0,
"query":{
  "time":15.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":500.0}

Regards,
Edwin


On Thu, 24 Jan 2019 at 22:43, Jan Høydahl  wrote:

> Looks like highlighting takes most of the time on the first query (680ms).
> Your config seems to ask for a lot of highlighting here, like 100 snippets
> of max 10 characters etc.
> It sounds to me like this might be a highlighting configuration problem. Try
> to disable highlighting (hl=false) and see if you get your speed back.
> Also, I see fl=* in your config, which is really asking for all fields.
> Are you sure you want that? It may also be slow. Try to ask for just the
> fields you will be using.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 24 Jan 2019 at 14:59, Zheng Lin Edwin Yeo wrote:
> >
> > Thanks for your reply.
> >
> > Below is what you requested about our Solr setup: configuration
> > files, schema and results of debug queries.
> >
> > Looking forward to your advice and support with our problem.
> >
> > 1. System configurations
> > OS: Windows 10 Pro 64 bit
> > System Memory: 32GB
> > CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 4 Core(s), 8 Logical
> > Processor(s)
> > HDD: 3.0 TB (free 2.1 TB)  SATA
> >
> > 2. solrconfig.xml of the customers and policies collections, and
> > solr.in.cmd, which can be downloaded from the following link:
> >
> https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing
> >
> > 3. The debug queries from both collections
> >
> > *3.1. Debug Query From Policies (which is slow)*
> >
> >   "debug":{
> > "rawquerystring":"sherry",
> > "querystring":"sherry",
> > "parsedquery":"searchFields_tcs:sherry",
> > "parsedquery_toString":"searchFields_tcs:sherry",
> > "explain":{
> >   "31702988":"\n14.540428 = weight(searchFields_tcs:sherry in
> > 3097315) [SchemaSimilarity], result of:\n  14.540428 =
> > score(doc=3097315,freq=5.0 = termFreq=5.0\n), product of:\n
> > 8.907154 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
> > (docFreq + 0.5)) from:\n  812.0 = docFreq\n  600.0 =
> > docCount\n1.6324438 = tfNorm, computed as (freq * (k1 + 1)) /
> > (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
> > 5.0 = termFreq=5.0\n  1.2 = parameter k1\n  0.75 = parameter
> > b\n  19.397041 = avgFieldLength\n  31.0 = fieldLength\n",..
> > "QParser":"LuceneQParser",
> > "timing":{
> >   "time":681.0,
> >   "prepare":{
> > "time":0.0,
> > "query":{
> >   "time":0.0},
> > "facet":{
> >   "time":0.0},
> >

Re: Indexing in one collection affect index in another collection

2019-01-24 Thread Jan Høydahl

Looks like highlighting takes most of the time on the first query (680ms). Your
config seems to ask for a lot of highlighting here, like 100 snippets of max
10 characters etc.
It sounds to me like this might be a highlighting configuration problem. Try to
disable highlighting (hl=false) and see if you get your speed back.
Also, I see fl=* in your config, which is really asking for all fields. Are you
sure you want that? It may also be slow. Try to ask for just the fields you
will be using.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
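For comparison, a small Python sketch that builds two `/select` requests: one roughly matching the heavy highlighting described above, and one with highlighting disabled and a narrow field list. It assumes that "100 snippets of max 10 characters" corresponds to the `hl.snippets` and `hl.fragsize` parameters and that the unified highlighter is selected with `hl.method=unified`; the endpoint path and the `id,name` field list are placeholders, not values from the poster's config. `urlencode` takes care of the `&` separators and escaping that hand-written query strings tend to get wrong.

```python
from urllib.parse import urlencode

# Placeholder endpoint; the "/edm" context path mirrors the URLs in this thread.
SELECT_URL = "https://localhost:8983/edm/policies/select"

# Roughly the configuration described above: many tiny snippets over all fields.
heavy_params = {
    "q": "sherry",
    "fl": "*",
    "hl": "true",
    "hl.method": "unified",
    "hl.snippets": 100,
    "hl.fragsize": 10,
}

# Leaner variant: highlighting off, only the fields the UI displays
# ("id" and "name" are hypothetical field names), plus debug timings.
lean_params = {"q": "sherry", "fl": "id,name", "hl": "false", "debugQuery": "true"}


def select_url(params: dict) -> str:
    """Build a /select URL; urlencode joins parameters with '&' and escapes values."""
    return f"{SELECT_URL}?{urlencode(params)}"


for params in (heavy_params, lean_params):
    print(select_url(params))
```

Running both requests against the same collection, with the same `q`, isolates how much of the QTime the highlighting and field loading account for.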

> On 24 Jan 2019 at 14:59, Zheng Lin Edwin Yeo wrote:
> 
> Thanks for your reply.
> 
> Below is what you requested about our Solr setup: configuration
> files, schema and results of debug queries.
> 
> Looking forward to your advice and support with our problem.
> 
> 1. System configurations
> OS: Windows 10 Pro 64 bit
> System Memory: 32GB
> CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 4 Core(s), 8 Logical
> Processor(s)
> HDD: 3.0 TB (free 2.1 TB)  SATA
> 
> 2. solrconfig.xml of the customers and policies collections, and
> solr.in.cmd, which can be downloaded from the following link:
> https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing
> 
> 3. The debug queries from both collections
> 
> *3.1. Debug Query From Policies (which is slow)*
>
>   "debug":{
> "rawquerystring":"sherry",
> "querystring":"sherry",
> "parsedquery":"searchFields_tcs:sherry",
> "parsedquery_toString":"searchFields_tcs:sherry",
> "explain":{
>   "31702988":"\n14.540428 = weight(searchFields_tcs:sherry in
> 3097315) [SchemaSimilarity], result of:\n  14.540428 =
> score(doc=3097315,freq=5.0 = termFreq=5.0\n), product of:\n
> 8.907154 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
> (docFreq + 0.5)) from:\n  812.0 = docFreq\n  600.0 =
> docCount\n1.6324438 = tfNorm, computed as (freq * (k1 + 1)) /
> (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
> 5.0 = termFreq=5.0\n  1.2 = parameter k1\n  0.75 = parameter
> b\n  19.397041 = avgFieldLength\n  31.0 = fieldLength\n",..
> "QParser":"LuceneQParser",
> "timing":{
>   "time":681.0,
>   "prepare":{
> "time":0.0,
> "query":{
>   "time":0.0},
> "facet":{
>   "time":0.0},
> "facet_module":{
>   "time":0.0},
> "mlt":{
>   "time":0.0},
> "highlight":{
>   "time":0.0},
> "stats":{
>   "time":0.0},
> "expand":{
>   "time":0.0},
> "terms":{
>   "time":0.0},
> "debug":{
>   "time":0.0}},
>   "process":{
> "time":680.0,
> "query":{
>   "time":19.0},
> "facet":{
>   "time":0.0},
> "facet_module":{
>   "time":0.0},
> "mlt":{
>   "time":0.0},
> "highlight":{
>   "time":651.0},
> "stats":{
>   "time":0.0},
> "expand":{
>   "time":0.0},
> "terms":{
>   "time":0.0},
> "debug":{
>   "time":8.0}},
>   "loadFieldValues":{
> "time":12.0
>
> *3.2. Debug Query From Customers (which is fast because we indexed it after
> indexing policies):*
>
>   "debug":{
> "rawquerystring":"sherry",
> "querystring":"sherry",
> "parsedquery":"searchFields_tcs:sherry",
> "parsedquery_toString":"searchFields_tcs:sherry",
> "explain":{
>   "S7900271B":"\n13.191501 = weight(searchFields_tcs:sherry in
> 2453665) [SchemaSimilarity], result of:\n  13.191501 =
> score(doc=2453665,freq=3.0 = termFreq=3.0\n), product of:\n9.08604
> = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq +
> 0.5)) from:\n  428.0 = docFreq\n  3784142.0 = docCount\n
> 1.4518428 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
> b + b * fieldLength / avgFieldLength)) from:\n  3.0 =
> termFreq=3.0\n  1.2 = parameter k1\n  0.75 = parameter b\n
> 20.22558 = avgFieldLength\n  28.0 = fieldLength\n", ..
> "QParser":"LuceneQParser",
> "timing":{
>   "time":38.0,
>   "prepare":{
> "time":1.0,
> "query":{
>   "time":1.0},
> "facet":{
>   "time":0.0},
> "facet_module":{
>   "time":0.0},
> "mlt":{
>   "time":0.0},
> "highlight":{
>   "time":0.0},
> "stats":{
>   "time":0.0},
> "expand":{
>   "time":0.0},
> "terms":{
>   "time":0.0},
> "debug":{
>   "time":0.0}},
>   "process":{
> "time":36.0,
> "query":{
>   "time":1.0},
> "facet":{
>   "time":0.0},
> "facet_module":{
>

Re: Indexing in one collection affect index in another collection

2019-01-24 Thread Zheng Lin Edwin Yeo

Thanks for your reply.

Below is what you requested about our Solr setup: configuration
files, schema and results of debug queries.

Looking forward to your advice and support with our problem.

1. System configurations
OS: Windows 10 Pro 64 bit
System Memory: 32GB
CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 4 Core(s), 8 Logical
Processor(s)
HDD: 3.0 TB (free 2.1 TB)  SATA

2. solrconfig.xml of the customers and policies collections, and
solr.in.cmd, which can be downloaded from the following link:
https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing

3. The debug queries from both collections

*3.1. Debug Query From Policies (which is slow)*

  "debug":{
"rawquerystring":"sherry",
"querystring":"sherry",
"parsedquery":"searchFields_tcs:sherry",
"parsedquery_toString":"searchFields_tcs:sherry",
"explain":{
  "31702988":"\n14.540428 = weight(searchFields_tcs:sherry in
3097315) [SchemaSimilarity], result of:\n  14.540428 =
score(doc=3097315,freq=5.0 = termFreq=5.0\n), product of:\n
8.907154 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
(docFreq + 0.5)) from:\n  812.0 = docFreq\n  600.0 =
docCount\n1.6324438 = tfNorm, computed as (freq * (k1 + 1)) /
(freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
5.0 = termFreq=5.0\n  1.2 = parameter k1\n  0.75 = parameter
b\n  19.397041 = avgFieldLength\n  31.0 = fieldLength\n",..
"QParser":"LuceneQParser",
"timing":{
  "time":681.0,
  "prepare":{
"time":0.0,
"query":{
  "time":0.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":0.0}},
  "process":{
"time":680.0,
"query":{
  "time":19.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":651.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":8.0}},
  "loadFieldValues":{
"time":12.0

*3.2. Debug Query From Customers (which is fast because we indexed it after
indexing policies):*

  "debug":{
"rawquerystring":"sherry",
"querystring":"sherry",
"parsedquery":"searchFields_tcs:sherry",
"parsedquery_toString":"searchFields_tcs:sherry",
"explain":{
  "S7900271B":"\n13.191501 = weight(searchFields_tcs:sherry in
2453665) [SchemaSimilarity], result of:\n  13.191501 =
score(doc=2453665,freq=3.0 = termFreq=3.0\n), product of:\n9.08604
= idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq +
0.5)) from:\n  428.0 = docFreq\n  3784142.0 = docCount\n
1.4518428 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
b + b * fieldLength / avgFieldLength)) from:\n  3.0 =
termFreq=3.0\n  1.2 = parameter k1\n  0.75 = parameter b\n
 20.22558 = avgFieldLength\n  28.0 = fieldLength\n", ..
"QParser":"LuceneQParser",
"timing":{
  "time":38.0,
  "prepare":{
"time":1.0,
"query":{
  "time":1.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":0.0}},
  "process":{
"time":36.0,
"query":{
  "time":1.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":31.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":3.0}},
  "loadFieldValues":{
"time":13.0

Best Regards,
Edwin
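The explain strings above spell out the BM25 formula that the `SchemaSimilarity` entries apply. As a sanity check on how a score is composed, this short Python sketch recomputes the customers score for document `S7900271B` from the printed values. It only verifies the arithmetic and says nothing about why a query is slow; `bm25_idf` and `bm25_tf_norm` are illustrative names, not Lucene APIs.

```python
import math


def bm25_idf(doc_freq: float, doc_count: float) -> float:
    """idf = log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), natural log."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq: float, field_length: float, avg_field_length: float,
                 k1: float = 1.2, b: float = 0.75) -> float:
    """tfNorm = (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))."""
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * field_length / avg_field_length))


# Values copied from the "S7900271B" explanation in the customers debug output.
idf = bm25_idf(doc_freq=428.0, doc_count=3784142.0)
tf_norm = bm25_tf_norm(freq=3.0, field_length=28.0, avg_field_length=20.22558)
score = idf * tf_norm

print(f"idf={idf:.5f} tfNorm={tf_norm:.6f} score={score:.5f}")
```

Using the natural logarithm is what reproduces the printed idf of 9.08604; the small remaining differences come from Lucene computing in single precision.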

On Thu, 24 Jan 2019 at 20:57, Jan Høydahl  wrote:

> It would be useful if you could disclose the machine configuration, OS,
> memory, settings etc., as well as the Solr config, including solr.in.sh,
> solrconfig.xml etc., so we can see the whole picture of memory, GC, etc.
> You could also specify debugQuery=true on a slow search and check the
> timings section for clues. What QTime are you seeing for the slow queries in
> solr.log?
> If that does not reveal the reason, I'd connect to your Solr instance with
> a tool like jVisualVM or similar, to inspect what takes time. Or better,
> hook up to DataDog, SPM or some other cloud

Re: Indexing in one collection affect index in another collection

2019-01-24 Thread Jan Høydahl

It would be useful if you could disclose the machine configuration, OS, memory,
settings etc., as well as the Solr config, including solr.in.sh,
solrconfig.xml etc., so we can see the whole picture of memory, GC, etc.
You could also specify debugQuery=true on a slow search and check the timings
section for clues. What QTime are you seeing for the slow queries in solr.log?
If that does not reveal the reason, I'd connect to your Solr instance with a
tool like jVisualVM or similar, to inspect what takes time. Or better, hook up
to DataDog, SPM or some other cloud tool to get a full view of the system.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
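Following the suggestion to read the timings section of a `debugQuery=true` response, a small Python helper can pick out the component that dominates the `process` phase. In a parsed JSON response this section sits under `response["debug"]["timing"]`. The sample below copies the slow policies timings posted in this thread, trimmed to a few components; `slowest_component` is a hypothetical helper name.

```python
from typing import Any, Dict, Tuple


def slowest_component(timing: Dict[str, Any]) -> Tuple[str, float]:
    """Return (component, ms) for the largest entry in the 'process' phase."""
    process = timing["process"]
    # Skip the phase total ("time") and keep only per-component sections.
    components = {
        name: section["time"]
        for name, section in process.items()
        if isinstance(section, dict) and "time" in section
    }
    name = max(components, key=components.get)
    return name, components[name]


# Trimmed copy of the slow policies query's timing section from this thread.
policies_timing = {
    "time": 681.0,
    "process": {
        "time": 680.0,
        "query": {"time": 19.0},
        "facet": {"time": 0.0},
        "highlight": {"time": 651.0},
        "debug": {"time": 8.0},
    },
}

print(slowest_component(policies_timing))  # prints ('highlight', 651.0)
```

Applied to the `hl=false` timings posted later in the thread (query 15 ms, debug 500 ms), the same helper reports the `debug` component instead of `query`.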

> On 24 Jan 2019 at 13:42, Zheng Lin Edwin Yeo wrote:
> 
> Hi Shawn,
> 
> Unfortunately, your point about memory may not be valid. Please refer to my
> description of the strange behavior below (it looks much more like a BUG
> than anything explainable):
> 
> Note that we still have 18GB of free, unused memory on the server.
> 
> 1. We indexed the first collection, called customers (3.7 million records
> from CSV data); the index size is 2.09GB. A search in customers for any
> keyword returns within 50ms (QTime) when using highlighting (unified
> highlighter, postings, light term vectors).
> 
> 2. Then we indexed the second collection, called policies (6 million records
> from CSV data); the index size is 2.55GB. A search in policies for any
> keyword returns within 50ms (QTime) when using highlighting (unified
> highlighter, postings, light term vectors).
> 
> 3. But now any search in customers for any keyword (not from cache) takes
> as long as 1200ms (QTime), while searches in policies remain very fast
> (50ms).
> 
> 4. So we decided to run the force optimize command on the customers collection
> (https://localhost:8983/edm/customers/update?optimize=true=1=false).
> Surprisingly, after optimization, searches on the customers collection for
> any keyword became very fast again (less than 50ms). BUT, strangely,
> searches in the policies collection became very slow (around 1200ms) without
> any changes to the policies collection.
> 
> 5. Based on the above result, we decided to run the force optimize command on
> the policies collection
> (https://localhost:8983/edm/policies/update?optimize=true=1=false).
> More surprisingly, after optimization, searches on the policies collection for
> any keyword became very fast again (less than 50ms). BUT, more strangely,
> searches in the customers collection again became very slow (around 1200ms)
> without any changes to the customers collection.
> 
> What strange and unexpected behavior! If this is not a bug, how would you
> explain the above behavior in Solr 7.5? Could it be a bug?
> 
> We would appreciate any support or help with the situation above.
> 
> Thank you.
> 
> Regards,
> Edwin
> 
> On Thu, 24 Jan 2019 at 16:14, Zheng Lin Edwin Yeo 
> wrote:
> 
>> Hi Shawn,
>> 
>>> If the two collections have data on the same server(s), I can see this
>>> happening.  More memory is consumed when there is additional data, and
>>> when Solr needs more memory, performance might be affected.  The
>>> solution is generally to install more memory in the server.
>> 
>> I have found that even after we delete the index in collection2, the query
>> QTime for collection1 still remains slow. It does not go back to the fast
>> speed it had before we indexed collection2.
>> 
>> Regards,
>> Edwin
>> 
>> 
>> On Thu, 24 Jan 2019 at 11:13, Zheng Lin Edwin Yeo 
>> wrote:
>> 
>>> Hi Shawn,
>>> 
>>> Thanks for your reply.
>>> 
>>> The log only shows the following, and I don't see any other logs
>>> besides these.
>>> 
>>> 2019-01-24 02:47:57.925 INFO  (qtp2131952342-1330) [c:collectioin1
>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
>>> id=13245417
>>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1
>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
>>> id=13245430
>>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1
>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
>>> id=13245435
>>> 
>>> There is no change to the segments info, but the slowdown in the first
>>> collection is very drastic.
>>> Before the indexing of collection2, the collection1 query QTimes were in
>>> the range of 4 to 50 ms. However, after indexing collection2, the
>>> collection1 query QTime increases to more than 1000 ms. The data is indexed
>>> from CSV, and the size of the index is 3GB.
>>> 
>>> Regards,
>>> Edwin
>>> 
>>> 
>>> 
>>> On Thu, 24 Jan 2019 at 01:09, Shawn Heisey  wrote:
>>> 
 On 1/23/2019 10:01 AM, Zheng Lin Edwin Yeo wrote:
> I am using Solr 7.5.0, and I am currently facing an issue where, when I am
> indexing in

Re: Indexing in one collection affect index in another collection

2019-01-24 Thread Zheng Lin Edwin Yeo

Hi Shawn,

Unfortunately, your point about memory may not be valid. Please refer to my
description of the strange behavior below (it looks much more like a BUG
than anything explainable):

Note that we still have 18GB of free, unused memory on the server.

1. We indexed the first collection, called customers (3.7 million records
from CSV data); the index size is 2.09GB. A search in customers for any
keyword returns within 50ms (QTime) when using highlighting (unified
highlighter, postings, light term vectors).

2. Then we indexed the second collection, called policies (6 million records
from CSV data); the index size is 2.55GB. A search in policies for any
keyword returns within 50ms (QTime) when using highlighting (unified
highlighter, postings, light term vectors).

3. But now any search in customers for any keyword (not from cache) takes
as long as 1200ms (QTime), while searches in policies remain very fast
(50ms).

4. So we decided to run the force optimize command on the customers collection
(https://localhost:8983/edm/customers/update?optimize=true=1=false).
Surprisingly, after optimization, searches on the customers collection for
any keyword became very fast again (less than 50ms). BUT, strangely,
searches in the policies collection became very slow (around 1200ms) without
any changes to the policies collection.

5. Based on the above result, we decided to run the force optimize command on
the policies collection
(https://localhost:8983/edm/policies/update?optimize=true=1=false).
More surprisingly, after optimization, searches on the policies collection for
any keyword became very fast again (less than 50ms). BUT, more strangely,
searches in the customers collection again became very slow (around 1200ms)
without any changes to the customers collection.
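Steps 4 and 5 trigger a force merge through each collection's update handler. The archived URLs lost part of their query strings (the `=1=false` fragments), so the Python sketch below only builds a minimal, well-formed request. Reading `=1` as `maxSegments=1` is an assumption, not something the thread confirms; the remaining original parameter is unknown and left out, and the base path is copied from the URLs above.

```python
from urllib.parse import urlencode


def optimize_url(base: str, collection: str, max_segments: int = 1) -> str:
    """Build a force-merge (optimize) request for one collection's /update handler.

    maxSegments=1 is an assumption about the archived "=1" fragment.
    """
    params = {"optimize": "true", "maxSegments": max_segments}
    return f"{base}/{collection}/update?{urlencode(params)}"


print(optimize_url("https://localhost:8983/edm", "customers"))
```

Building the URL this way keeps each parameter name paired with its value, which makes it easier to see exactly what was sent when comparing the before and after timings.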

What strange and unexpected behavior! If this is not a bug, how would you
explain the above behavior in Solr 7.5? Could it be a bug?

We would appreciate any support or help with the situation above.

Thank you.

Regards,
Edwin
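Shawn's point, quoted below, is that additional data on the same server consumes more memory. Lucene reads index files through the operating system's file cache, so one rough way to put numbers on this is to add up the on-disk size of each core's index directory and compare the total with the free memory figure (18GB in this message). This Python sketch is only a coarse check: the directory paths are whatever your cores use, and the comparison ignores memory that other software on the machine is using for its own caching.

```python
import os
from typing import Iterable


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files below path (e.g. a core's data/index dir)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):  # skip broken links and special files
                total += os.path.getsize(full)
    return total


def fits_in_cache(index_dirs: Iterable[str], free_ram_bytes: int) -> bool:
    """True if the combined index size is no larger than the given free-memory figure."""
    return sum(dir_size_bytes(d) for d in index_dirs) <= free_ram_bytes
```

Passing each core's `data/index` directory together with `18 * 1024**3` gives a first indication of whether both collections can be cached at the same time.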

On Thu, 24 Jan 2019 at 16:14, Zheng Lin Edwin Yeo 
wrote:

> Hi Shawn,
>
> > If the two collections have data on the same server(s), I can see this
> > happening.  More memory is consumed when there is additional data, and
> > when Solr needs more memory, performance might be affected.  The
> > solution is generally to install more memory in the server.
>
> I have found that even after we delete the index in collection2, the query
> QTime for collection1 still remains slow. It does not go back to the fast
> speed it had before we indexed collection2.
>
> Regards,
> Edwin
>
>
> On Thu, 24 Jan 2019 at 11:13, Zheng Lin Edwin Yeo 
> wrote:
>
>> Hi Shawn,
>>
>> Thanks for your reply.
>>
>> The log only shows a list  the following and I don't see any other logs
>> besides these.
>>
>> 2019-01-24 02:47:57.925 INFO  (qtp2131952342-1330) [c:collectioin1
>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
>> id=13245417
>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1
>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
>> id=13245430
>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1
>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
>> id=13245435
>>
>> There is no change to the segments info. but the slowdown in the first
>> collection is very drastic.
>> Before the indexing of collection2, the collection1 query QTime are in
>> the range of 4 to 50 ms. However, after indexing collection2, the
>> collection1 query QTime increases to more than 1000 ms. The index are done
>> in CSV format, and the size of the index is 3GB.
>>
>> Regards,
>> Edwin
>>
>>
>>
>> On Thu, 24 Jan 2019 at 01:09, Shawn Heisey  wrote:
>>
>>> On 1/23/2019 10:01 AM, Zheng Lin Edwin Yeo wrote:
>>> > I am using Solr 7.5.0, and currently I am facing an issue of when I am
>>> > indexing in collection2, the indexing affects the records in
>>> collection1.
>>> > Although the records are still intact, it seems that the settings of
>>> the
>>> > termVecotrs get wipe out, and the index size of collection1 reduced
>>> from
>>> > 3.3GB to 2.1GB after I do the indexing in collection2.
>>>
>>> This should not be possible.  Indexing in one collection should have
>>> absolutely no effect on another collection.
>>>
>>> If logging has been left at its default settings, the solr.log file
>>> should have enough info to show what actually happened.
>>>
>>> > Also, the search in
>>> > collection1, which was originall very fast, becomes very slow after the
>>> > indexing is done is collection2.
>>>
>>> If the two collections have data on the same server(s), I can see this
>>> happening.  More memory is consumed when there is additional data, and
>>> when Solr needs more memory, performance might be affected.

Re: Indexing in one collection affect index in another collection

2019-01-24 Thread Zheng Lin Edwin Yeo

Hi Shawn,

> If the two collections have data on the same server(s), I can see this
> happening.  More memory is consumed when there is additional data, and
> when Solr needs more memory, performance might be affected.  The
> solution is generally to install more memory in the server.

I have found that even after we delete the index in collection2, the query
QTime for collection1 remains slow. It does not go back to the fast speed it
had before we indexed collection2.

Regards,
Edwin


Re: Indexing in one collection affect index in another collection

2019-01-23 Thread Zheng Lin Edwin Yeo

Hi Shawn,

Thanks for your reply.

The log only shows a list of entries like the following, and I don't see
any other logs besides these.

2019-01-24 02:47:57.925 INFO  (qtp2131952342-1330) [c:collectioin1 s:shard1
r:core_node4 x:policies_shard1_replica_n2]
o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
id=13245417
2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1 s:shard1
r:core_node4 x:policies_shard1_replica_n2]
o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
id=13245430
2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1 s:shard1
r:core_node4 x:policies_shard1_replica_n2]
o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
id=13245435
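Since solr.log should show what actually happened, a small script can re-join each entry (the excerpt above was wrapped by the mail client) and pull out the bracketed MDC fields (c: collection, s: shard, r: replica, x: core) to see which core each update really went to. A minimal sketch, assuming Solr 7's default log layout and default SolrCloud core naming (collection name, then _shardN_replica_...); the helper names are hypothetical. In the excerpt, the collection field reads collectioin1 while the core is policies_shard1_replica_n2; that may simply come from renaming before posting, so it is worth checking against the raw log rather than taking as a finding.

```python
import re

# A new log entry starts with a timestamp such as "2019-01-24 02:47:57.925".
_ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
# First bracketed block, e.g. "[c:coll s:shard1 r:core_node4 x:coll_shard1_replica_n2]".
_MDC_BLOCK = re.compile(r"\[([^\]]*)\]")


def join_wrapped(lines):
    """Re-join log entries that a mail client wrapped over several lines."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between entries
        if _ENTRY_START.match(line) or not entries:
            entries.append(line)  # a timestamp starts a new entry
        else:
            entries[-1] += " " + line  # continuation of the previous entry
    return entries


def parse_mdc(entry):
    """Return the MDC fields of one log entry as a dict, e.g. {"c": ..., "x": ...}."""
    match = _MDC_BLOCK.search(entry)
    if match is None:
        return {}
    fields = {}
    for token in match.group(1).split():
        key, sep, value = token.partition(":")
        if sep:
            fields[key] = value
    return fields


def mismatched_cores(entries):
    """Yield (collection, core) pairs whose core name does not start with the collection name."""
    for entry in entries:
        mdc = parse_mdc(entry)
        coll, core = mdc.get("c"), mdc.get("x")
        if coll and core and not core.startswith(coll + "_"):
            yield coll, core
```

Feeding it the lines of solr.log around the indexing run, e.g. list(mismatched_cores(join_wrapped(lines))), would list any updates logged against an unexpected core.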

There is no change to the segments info, but the slowdown in the first
collection is very drastic.
Before the indexing of collection2, the collection1 query QTime was in the
range of 4 to 50 ms. However, after indexing collection2, the collection1
query QTime increases to more than 1000 ms. The data is indexed from CSV,
and the size of the index is 3GB.

Regards,
Edwin



Re: Indexing in one collection affect index in another collection

2019-01-23 Thread Shawn Heisey

On 1/23/2019 10:01 AM, Zheng Lin Edwin Yeo wrote:
> I am using Solr 7.5.0, and I am currently facing an issue where indexing
> in collection2 affects the records in collection1. Although the records
> are still intact, it seems that the termVectors settings get wiped out,
> and the index size of collection1 drops from 3.3GB to 2.1GB after I do
> the indexing in collection2.

This should not be possible.  Indexing in one collection should have
absolutely no effect on another collection.

If logging has been left at its default settings, the solr.log file
should have enough info to show what actually happened.

> Also, searches in collection1, which were originally very fast, become
> very slow after the indexing in collection2 is done.

If the two collections have data on the same server(s), I can see this 
happening.  More memory is consumed when there is additional data, and 
when Solr needs more memory, performance might be affected.  The 
solution is generally to install more memory in the server.  If the 
system is working, there should be no need to increase the heap size 
when the memory size increases ... but there can be situations where the 
heap is a little bit too small, where you WOULD want to increase the 
heap size.
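The memory explanation can be pictured with a toy model: treat the OS page cache as a fixed number of pages with least-recently-used eviction, and let each pass of queries read through an index. A minimal sketch under that assumption; real page caches use more refined replacement policies than plain LRU, and real queries touch only part of an index, so this only shows the qualitative effect. The class and function names, and the page counts in the usage note, are hypothetical.

```python
from collections import OrderedDict


class PageCache:
    """Toy LRU model of the OS page cache (sizes in arbitrary page units)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # keys are (index_name, page_no), least recently used first
        self.misses = 0

    def read(self, index_name, page_no):
        key = (index_name, page_no)
        if key in self.pages:
            self.pages.move_to_end(key)  # hit: now the most recently used page
            return
        self.misses += 1  # miss: the page has to come from disk
        self.pages[key] = None
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page

    def resident(self, index_name):
        """How many pages of this index are currently cached."""
        return sum(1 for name, _ in self.pages if name == index_name)


def scan(cache, index_name, n_pages):
    """Touch every page of an index once; return how many reads missed the cache."""
    before = cache.misses
    for page_no in range(n_pages):
        cache.read(index_name, page_no)
    return cache.misses - before
```

With two indexes of 3 pages each and a cache of 5 pages, scanning customers, then policies, then customers again makes the third scan return 3: every page missed, because reading policies pushed customers out. With a cache of 6 pages the third scan returns 0. One possible reading of the optimize results elsewhere in this thread is the same effect, since an optimize rewrites a whole index and the freshly written files are the ones most likely to be sitting in the cache afterwards.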


Thanks,
Shawn

Indexing in one collection affect index in another collection

2019-01-23 Thread Zheng Lin Edwin Yeo

Hi,

I am using Solr 7.5.0, and I am currently facing an issue where indexing in
collection2 affects the records in collection1. Although the records are
still intact, it seems that the termVectors settings get wiped out, and the
index size of collection1 drops from 3.3GB to 2.1GB after I do the indexing
in collection2. Also, searches in collection1, which were originally very
fast, become very slow after the indexing in collection2 is done.
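To pin down the size change independently of the admin UI, the on-disk size of collection1's index directory can be recorded before and after indexing collection2. A minimal sketch; the directory path in the usage line is hypothetical, since the real location depends on the Solr home and core name on that server. Index files appear and disappear during merges, so it is best to measure once indexing and commits have finished.

```python
import os


def index_size_bytes(index_dir):
    """Sum the sizes of all files under a core's index directory."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except FileNotFoundError:
                pass  # file removed (e.g. by a segment merge) while we were walking
    return total


def report(index_dirs):
    """Print the size in GB of each labelled index directory."""
    for label, path in index_dirs.items():
        print(f"{label}: {index_size_bytes(path) / 1024**3:.2f} GB")
```

For example, report({"collection1": "/var/solr/data/collection1_shard1_replica_n2/data/index"}) with a hypothetical path, run once before and once after indexing collection2.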

Has anyone faced such issues before, or does anyone have any idea what may
have gone wrong?

Regards,
Edwin
