Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-03-13 Thread paul.dodd
Hi Edwin,
With \W you will also replace non-word characters such as punctuation. If
that's OK, fine. Otherwise you need to identify the whitespace characters that
are causing the problem.
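
To see which characters each class covers, single characters can be tested against \s and \W in plain Java. This is only a sketch: the class name WhitespaceClassDemo and the sample characters (including the non-breaking space U+00A0) are assumptions chosen for illustration, not characters confirmed to be in the indexed data.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of Solr.
class WhitespaceClassDemo {
    // True if the whole input is matched by the given Java regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Space, tab, newline, non-breaking space, punctuation, letter, underscore.
        String[] samples = {" ", "\t", "\n", "\u00a0", ".", ",", "-", "a", "_"};
        for (String s : samples) {
            System.out.printf("U+%04X  \\s=%b  \\W=%b%n",
                    (int) s.charAt(0), matches("\\s", s), matches("\\W", s));
        }
    }
}
```

With the default flags, \s covers space, tab, newline, vertical tab, form feed and carriage return, while \W covers everything that is not a letter, digit or underscore, so it also takes punctuation and the non-breaking space.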

From: Zheng Lin Edwin Yeo
Sent: Wednesday, 13 March 2019 03:25:39
To: solr-user@lucene.apache.org
Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n

Hi,

We have managed to resolve the issue by changing the \s to \W. The reason
could be that some of the characters between the line breaks are not plain
spaces but other kinds of white space. In our data \s did not remove those
characters, but \W removes them as well.

We have used this config, and it works.


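<processor class="solr.RegexReplaceProcessorFactory">
   <str name="fieldName">content</str>
   <str name="pattern">(\n\W*){2,}</str>
   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
   <bool name="literalReplacement">true</bool>
</processor>

<processor class="solr.RegexReplaceProcessorFactory">
   <str name="fieldName">content</str>
   <str name="pattern">(\n\W*){1,}</str>
   <str name="replacement">&lt;br&gt;</str>
   <bool name="literalReplacement">true</bool>
</processor>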
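
The two steps above can be tried outside Solr with a small Java sketch. It assumes that each processor behaves like String.replaceAll with a literal replacement, and that the "\n" shown in the thread's samples stands for a real newline. The class name NonWordCollapseDemo and the last two sample strings are made up for illustration.

```java
import java.util.regex.Matcher;

// Hypothetical stand-in for the two-step processor chain; not Solr code.
class NonWordCollapseDemo {
    static String collapse(String content) {
        // Step 1: two or more line breaks (plus any non-word characters
        // between or after them) become a double break.
        String step1 = content.replaceAll("(\\n\\W*){2,}",
                Matcher.quoteReplacement("<br><br>"));
        // Step 2: any remaining single line break becomes a single break.
        return step1.replaceAll("(\\n\\W*){1,}",
                Matcher.quoteReplacement("<br>"));
    }

    public static void main(String[] args) {
        // First sample is Example 1 from this thread.
        System.out.println(collapse("Dear Sir,  \n\n \n \n\n I am terminating"));
        System.out.println(collapse("line one\n line two"));
        // The dash after the blank line is a non-word character and is lost.
        System.out.println(collapse("a\n\n- b"));
    }
}
```

The last sample shows the side effect Paul mentions: punctuation that directly follows a line break is swallowed together with the whitespace.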


Regards,
Edwin

On Tue, 12 Mar 2019 at 10:49, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> Has anyone else faced the same issue before?
> So far all the regex patterns that we tried in this thread are not able to
> resolve the issue.
>
> Regards,
> Edwin
>
> On Fri, 8 Mar 2019 at 12:17, Zheng Lin Edwin Yeo 
> wrote:
>
>> Hi Paul,
>>
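>> Sorry, I realized there is an extra ']' in the pattern provided, which is
>> why there are so many <br> in the output.
>>
>> If we remove the extra ']', as shown in the configuration below, the
>> output is exactly the same as the previous index result.
>>
>> <processor class="solr.RegexReplaceProcessorFactory">
>>   <str name="fieldName">content</str>
>>   <str name="pattern">[ \t\x0b\f]*\r?\n</str>
>>   <str name="replacement">&lt;br&gt;</str>
>>   <bool name="literalReplacement">true</bool>
>> </processor>
>> <processor class="solr.RegexReplaceProcessorFactory">
>>   <str name="fieldName">content</str>
>>   <str name="pattern">(&lt;br&gt;[ \t\x0b\f]*){3,}</str>
>>   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>   <bool name="literalReplacement">true</bool>
>> </processor>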
>>
>> Regards,
>> Edwin
>>
>>
>>
>> On Thu, 7 Mar 2019 at 22:51, Zheng Lin Edwin Yeo 
>> wrote:
>>
>>> Hi Paul,
>>>
>>> Thanks for the reply.
>>>
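>>> For the 2nd pattern, if we use
>>> <str name="pattern">(&lt;br&gt;[ \t\x0b\f]]*){3,}</str>, as in the
>>> configuration below:
>>>
>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>   <str name="fieldName">content</str>
>>>   <str name="pattern">[ \t\x0b\f]*\r?\n</str>
>>>   <str name="replacement">&lt;br&gt;</str>
>>>   <bool name="literalReplacement">true</bool>
>>> </processor>
>>> <processor class="solr.RegexReplaceProcessorFactory">
>>>   <str name="fieldName">content</str>
>>>   <str name="pattern">(&lt;br&gt;[ \t\x0b\f]]*){3,}</str>
>>>   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>>   <bool name="literalReplacement">true</bool>
>>> </processor>
>>>
>>> it does not reduce the runs of 3 or more <br> to 2 <br>.
>>>
>>> We end up with many <br> in the output, as in the example below: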
>>>
>>>  http://www.concorded.com/  
>>> 
>>>  On Tue, Dec 18, 2018
>>>
>>>
>>> Regards,
>>> Edwin

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-03-07 Thread paul.dodd
Hi Edwin



I can’t understand why the pattern is not working or where the spaces between
the <br> are coming from. It should, however, be possible to allow for spaces
between the <br> in the second match pattern, i.e. as the 2nd pattern:

(&lt;br&gt;[ \t\x0b\f]]*){3,}
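
Note that the pattern as typed contains a doubled ']'. A quick Java check shows what that does: the character class closes at the first ']', so the class is no longer optional and the second ']' becomes a literal with the '*' attached to it. This is a sketch only; the class name StrayBracketDemo is made up, and it assumes the indexed text contains literal <br> at this point.

```java
import java.util.regex.Matcher;

// Hypothetical comparison of the pattern as typed and the intended one.
class StrayBracketDemo {
    // As typed: one mandatory whitespace character, then zero or more ']'.
    static final String WITH_STRAY = "(<br>[ \\t\\x0b\\f]]*){3,}";
    // Intended: zero or more whitespace characters after each <br>.
    static final String CORRECTED = "(<br>[ \\t\\x0b\\f]*){3,}";

    static String apply(String regex, String s) {
        return s.replaceAll(regex, Matcher.quoteReplacement("<br><br>"));
    }

    public static void main(String[] args) {
        String adjacent = "<br><br><br>";
        System.out.println(apply(WITH_STRAY, adjacent)); // left unchanged
        System.out.println(apply(CORRECTED, adjacent));  // reduced to two
    }
}
```

With the stray bracket, a run of directly adjacent <br> is never reduced, because every <br> would need to be followed by exactly one whitespace character.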



/Paul



Sent from Mail for Windows 10

From: Zheng Lin Edwin Yeo
Sent: Wednesday, 6 March 2019 16:28
To: solr-user@lucene.apache.org
Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n



Hi Paul,

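I have tried using [ \t\x0b\f]*\r?\n as the first match pattern, as in the
configuration below:

<processor class="solr.RegexReplaceProcessorFactory">
   <str name="fieldName">content</str>
   <str name="pattern">[ \t\x0b\f]*\r?\n</str>
   <str name="replacement">&lt;br&gt;</str>
   <bool name="literalReplacement">true</bool>
</processor>
<processor class="solr.RegexReplaceProcessorFactory">
   <str name="fieldName">content</str>
   <str name="pattern">(&lt;br&gt;){3,}</str>
   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
   <bool name="literalReplacement">true</bool>
</processor>

However, the result is still the same as before (previous index results),
with the 4 <br>.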

Regards,
Edwin



Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-03-06 Thread paul.dodd
Hi Edwin



You are correct re the 2nd pattern – my bad. Looking at the 4 <br>, it’s
actually the sequence «  »? So perhaps the first match pattern
could be [ \t\x0b\f]*\r?\n



i.e. [space tab vertical-tab formfeed]
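
A rough Java sketch of this first pattern, followed by a second step that caps runs of <br> at two, is below. It assumes the processors behave like String.replaceAll with a literal replacement. The class name LineEndingDemo is made up, and the non-breaking space (U+00A0) in the second sample is purely an assumption to show what happens when a character outside the class sits between two line breaks; the thread does not establish which character is actually in the data.

```java
import java.util.regex.Matcher;

// Hypothetical two-step pipeline for experimenting outside Solr.
class LineEndingDemo {
    // Step 1: a line ending, with any space/tab/vertical-tab/form-feed
    // in front of it, becomes one <br>.
    static String toBreaks(String s) {
        return s.replaceAll("[ \\t\\x0b\\f]*\\r?\\n",
                Matcher.quoteReplacement("<br>"));
    }

    // Step 2: three or more directly adjacent <br> become two.
    static String capAtTwo(String s) {
        return s.replaceAll("(<br>){3,}",
                Matcher.quoteReplacement("<br><br>"));
    }

    public static void main(String[] args) {
        // Example 1 from this thread: only plain spaces between the newlines.
        System.out.println(capAtTwo(toBreaks(
                "Dear Sir,  \n\n \n \n\n I am terminating")));
        // Assumed input with U+00A0 between the newlines: the run is broken up.
        System.out.println(capAtTwo(toBreaks("a\n\u00a0\n\u00a0\nb")));
    }
}
```

In the second sample the <br> are never adjacent, so the second step has nothing to collapse.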



Regards,

Paul



Sent from Mail for Windows 10

From: Zheng Lin Edwin Yeo
Sent: Wednesday, 6 March 2019 07:44
To: solr-user@lucene.apache.org
Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n



Hi Paul,

I have changed the second pattern to (&lt;br&gt;){3,} instead of
(&lt;br&gt;&lt;br&gt;){3,}. The pattern (&lt;br&gt;&lt;br&gt;){3,} actually
looks for 6 or more <br> rather than 3, because <br> appears twice inside
the group. That is why there were more <br> in the result: runs of fewer
than 6 <br> were not replaced, so we ended up with up to 5 <br> in the
index.
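
The counting can be checked with a few lines of Java. This is a sketch with a made-up class name (RepeatCountDemo); it builds runs of 2 to 7 adjacent <br> and applies both group patterns with String.replaceAll.

```java
import java.util.regex.Matcher;

// Hypothetical check of how many <br> each group pattern needs.
class RepeatCountDemo {
    // Builds a string of n adjacent <br>.
    static String breaks(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("<br>");
        }
        return sb.toString();
    }

    static String cap(String regex, String s) {
        return s.replaceAll(regex, Matcher.quoteReplacement("<br><br>"));
    }

    // Number of <br> in a string that consists only of <br>.
    static int count(String s) {
        return s.length() / "<br>".length();
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 7; n++) {
            int pair = count(cap("(<br><br>){3,}", breaks(n)));
            int single = count(cap("(<br>){3,}", breaks(n)));
            System.out.println(n + " in, pair-group: " + pair
                    + " out, single-group: " + single + " out");
        }
    }
}
```

The pair-group pattern leaves runs of 3, 4 and 5 untouched, and a run of 7 still ends up as 3, whereas the single-group pattern reduces every run of 3 or more to 2.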

Modified configuration:
 
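<processor class="solr.RegexReplaceProcessorFactory">
   <str name="fieldName">content</str>
   <str name="pattern">(&lt;br&gt;){3,}</str>
   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
   <bool name="literalReplacement">true</bool>
</processor>

This brings us back to the result of the previous index content,
meaning the issue of having the 4 <br> is still there.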

Regards,
Edwin




On Wed, 6 Mar 2019 at 11:37, Zheng Lin Edwin Yeo 
wrote:

> Hi Paul,
>
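> Further to my previous email, in which there was an extra "}" in the
> configuration, I have changed to the configuration below, based on your
> suggestion.
>
> <processor class="solr.RegexReplaceProcessorFactory">
>   <str name="fieldName">content</str>
>   <str name="pattern">[ \t]*\r?\n</str>
>   <str name="replacement">&lt;br&gt;</str>
>   <bool name="literalReplacement">true</bool>
> </processor>
> <processor class="solr.RegexReplaceProcessorFactory">
>   <str name="fieldName">content</str>
>   <str name="pattern">(&lt;br&gt;&lt;br&gt;){3,}</str>
>   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>   <bool name="literalReplacement">true</bool>
> </processor>
>
> However, the result that I get still has more than 2 <br>. In fact, the
> result has become worse, as you can see from the comparison below.
>
> Example 1: A sentence for which the regex pattern used to work correctly.
> With the latest pattern, it has changed from 2 <br> to 5 <br>, which is
> wrong.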
> *Original content in EML file:*
> Dear Sir,
>
>
> I am terminating
> *Original content:*Dear Sir,  \n\n \n \n\n I am terminating
> *Previous Index content: *Dear Sir,  I am terminating
> *Current Index content*:   Dear Sir,  I am terminating
>
> Example 2: A sentence for which the above regex pattern only partially works
> (as you can see, instead of 2 <br>, there are 4 <br>)
> *Original content in EML file:*
>
> *exalted*
>
> *Psalm 89:17*
>
>
> 3 Choa Chu Kang Avenue 4
> *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3 Choa
> Chu Kang Avenue 4, Singapore
> *Previous Index content: *exalted  Psalm 89:17   
> 3 Choa Chu Kang Avenue 4, Singapore
> *Current Index content*:Psalm 89:173
> Choa Chu Kang Avenue 3, Singapor4
>
> Example 3: A sentence for which the above regex pattern only partially works
> (as you can see, instead of 2 <br>, there are 4 <br>). With the latest
> configuration, there are now 5 <br>
> *Original content in EML file:*
>
> http://www.concorded.com/
>
>
>
>
>
>
>
>
> On Tue, Dec 18, 2018 at 10:07 AM
> *Original content:* http://www.concorded.com/   \n\n   \n\n \n \n\n \n\n
> \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue, Dec 18, 2018 at
> 10:07 AM
> *Previous Index content: *http://www.concorded.com/   
> On Tue, Dec 18, 2018 at 10:07 AM
> *Current Index content:* http://www.concorded.com/  
> On Tue, Dec 18, 2018 at 10:07 AM
>
>
> Regards,
> Edwin
>
> On Wed, 6 Mar 2019 at 00:29, Zheng Lin Edwin Yeo 
> wrote:
>
>> Hi Paul,
>>
>> Thank you for the reply.
>>
>> I have tried to add the following configuration according to your
>> suggestion:
>>
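>> <processor class="solr.RegexReplaceProcessorFactory">
>>   <str name="fieldName">content</str>
>>   <str name="pattern">[ \t]*\r?\n}</str>
>>   <str name="replacement">&lt;br&gt;</str>
>>   <bool name="literalReplacement">true</bool>
>> </processor>
>>
>> <processor class="solr.RegexReplaceProcessorFactory">
>>   <str name="fieldName">content</str>
>>   <str name="pattern">(&lt;br&gt;&lt;br&gt;){3,}</str>
>>   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>>   <bool name="literalReplacement">true</bool>
>> </processor>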
>>
>> However, none of the \n is being removed this time round.
>> Is the order and/or the pattern correct?
>>
>> Regards,
>> Edwin

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-03-05 Thread paul.dodd
Hi Edwin



Try this for the first pattern/replacement:

[ \t]*\r?\n

&lt;br&gt;

Now all line endings and any preceding whitespace characters should be
changed to ‘<br>’.

The second pattern/replacement should replace 3 or more ‘<br>’ sequences with
2 ‘<br>’ sequences:

(&lt;br&gt;&lt;br&gt;){3,}

&lt;br&gt;&lt;br&gt;

I hope this approach works. Sorry for not replying earlier. Best regards,

Paul





Sent from Mail for Windows 10

From: Zheng Lin Edwin Yeo
Sent: Tuesday, 5 March 2019 03:35
To: solr-user@lucene.apache.org
Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n



Hi,

For your info, this issue is occurring in the new Solr 7.7.1 as well.

Regards,
Edwin

On Mon, 25 Feb 2019 at 10:28, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> Does anyone else have other suggestions, or has anyone faced the same problem?
>
> Regards,
> Edwin
>
> On Wed, 20 Feb 2019 at 16:58, Zheng Lin Edwin Yeo 
> wrote:
>
>> Hi Paul,
>>
>> If I execute the second step first, then I only get a single <br> for
>> those that had 2 <br>.
>> For those where we originally got 4 <br>, there are now 2 <br> with a
>> space in between.
>>
>> This just changes the 2 <br> into a single <br>, since the second step
>> replaces with a single <br>.
>> It has not solved the underlying problem yet.
>>
>> Regards,
>> Edwin

Re: %solr_logs_dir% does not like spaces

2019-02-26 Thread paul.dodd
Perhaps the instances of %SOLR_LOGS_DIR% in the solr.cmd files should be quoted 
i.e. "%SOLR_LOGS_DIR%" ??



Sent from Mail for Windows 10

From: Arturas Mazeika
Sent: Tuesday, 26 February 2019 15:10
To: solr-user@lucene.apache.org
Subject: Re: %solr_logs_dir% does not like spaces



Hi Paul,

getting rid of space in "program files" is doable, you are right. One way
to do it is through

   - echo %programfiles% ==> C:\Program Files
   - echo %programfiles(x86)% ==> C:\Program Files (x86)

Getting rid of spaces in sub directories is very difficult as we use tons
of those for different components of our suite.

Any other options to set it in some XML file or something?

Cheers,
Arturas



Re: %solr_logs_dir% does not like spaces

2019-02-26 Thread paul.dodd
Looks like a bug in solr.cmd. You could try eliminating the spaces and/or 
opening an issue.



Instead of ‘Program Files (x86)’ use ‘PROGRA~2’

And don’t have spaces in your subdirectory…



NB: Depending on your Windows version you may have another alias for ‘Program
Files (x86)’; use «dir /X» to view the aliases.



Sent from Mail for Windows 10

From: Arturas Mazeika
Sent: Tuesday, 26 February 2019 14:41
To: solr-user@lucene.apache.org
Subject: %solr_logs_dir% does not like spaces



Hi All,

I am testing Solr 7.7 (and 7.6) under Windows. My aim is to write the logs
to a subdirectory whose name contains spaces, inside a directory whose name
also contains spaces.

If I set on windows:

setx /m SOLR_LOGS_DIR "f:\solr_deployment\logs"

and start a solr instance:

F:\solr_deployment\solr-7.7.0\bin\solr.cmd start -h localhost -p 8983 -s
F:\solr_deployment\solr_data -m 1g

this goes smoothly.

However, if I set the logging directory to:

setx /m SOLR_LOGS_DIR  "C:\Program Files (x86)\My Directory\Another
Directory\logs\solr"

then I get a cryptic error:

F:\solr_deployment\solr-7.7.0\bin\solr.cmd start -h localhost -p 8983 -s
F:\solr_deployment\solr_data -m 1g
Files was unexpected at this time.

If I comment out "@echo off" in both solr.cmd and solr.in.cmd, it shows that
it dies around these lines in solr.cmd:

F:\solr_deployment\solr-7.7.0\bin>IF "" == "" set STOP_KEY=solrrocks
Files was unexpected at this time.

In the solr.cmd the following block is shown:

IF "%STOP_KEY%"=="" set STOP_KEY=solrrocks

@REM This is quite hacky, but examples rely on a different log4j2.xml
@REM so that we can write logs for examples to %SOLR_HOME%\..\logs
IF [%SOLR_LOGS_DIR%] == [] (
  set "SOLR_LOGS_DIR=%SOLR_SERVER_DIR%\logs"
) ELSE (
  set SOLR_LOGS_DIR=%SOLR_LOGS_DIR:"=%
)

comments?

Cheers,
Arturas


Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-20 Thread paul.dodd
If the second step is executed first, then you will get the unwanted 4 <br>.
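
The effect of the step order can be tried with a short Java sketch using the two patterns from the configuration quoted below. It assumes the processors behave like String.replaceAll with a literal replacement; the class name StepOrderDemo and the sample input are made up. With this particular sample, running the single-newline step first leaves one <br>, which is what Edwin reports elsewhere in this thread, so the sketch does not reproduce the four-<br> output. It only shows that the two orderings are not interchangeable.

```java
import java.util.regex.Matcher;

// Hypothetical comparison of the two possible step orders.
class StepOrderDemo {
    static final String TWO_OR_MORE = "([ \\t]*\\r?\\n){2,}";
    static final String ONE_OR_MORE = "([ \\t]*\\r?\\n){1,}";

    static String step(String regex, String replacement, String s) {
        return s.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    // Intended order: collapse multiple line breaks first, then single ones.
    static String multiFirst(String s) {
        return step(ONE_OR_MORE, "<br>", step(TWO_OR_MORE, "<br><br>", s));
    }

    // Reversed order: the single-break step already consumes whole runs.
    static String singleFirst(String s) {
        return step(TWO_OR_MORE, "<br><br>", step(ONE_OR_MORE, "<br>", s));
    }

    public static void main(String[] args) {
        String sample = "a\n \n\nb";
        System.out.println(multiFirst(sample));
        System.out.println(singleFirst(sample));
    }
}
```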



Sent from Mail for Windows 10

From: Zheng Lin Edwin Yeo
Sent: Wednesday, 20 February 2019 09:29
To: solr-user@lucene.apache.org
Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n



Hi Jörn,

Do you mean the regex is not correct?

We are already using two RegexReplaceProcessorFactory steps, like the one
shown below. The output that we get is still the same.


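<processor class="solr.RegexReplaceProcessorFactory">
 <str name="fieldName">content</str>
 <str name="pattern">([ \t]*\r?\n){2,}</str>
 <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
 <bool name="literalReplacement">true</bool>
</processor>

<processor class="solr.RegexReplaceProcessorFactory">
 <str name="fieldName">content</str>
 <str name="pattern">([ \t]*\r?\n){1,}</str>
 <str name="replacement">&lt;br&gt;</str>
 <bool name="literalReplacement">true</bool>
</processor>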


Regards,
Edwin

On Wed, 20 Feb 2019 at 16:03, Jörn Franke  wrote:

> Then you need two regexprocessfactory steps

Re: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-20 Thread paul.dodd
BTW, which Java Version are you using?
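
Since the question of a tester for Java regex came up, a few lines of plain Java run on the same JVM as Solr can serve as one. This is a sketch only: the class name RegexHarness is made up, the sample is the start of Example 2 from this thread, and it assumes the "\n" shown there stands for a real newline.

```java
import java.util.regex.Matcher;

// Hypothetical local tester: prints the JVM version and applies one pattern.
class RegexHarness {
    static String apply(String regex, String literalReplacement, String input) {
        return input.replaceAll(regex, Matcher.quoteReplacement(literalReplacement));
    }

    // Makes line breaks visible when printing.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        String sample = "exalted  \n \n\n   Psalm 89:17";
        System.out.println("before: " + visible(sample));
        System.out.println("after:  " + visible(apply("(\\n\\s*){2,}", "<br><br>", sample)));
    }
}
```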



Sent from Mail for Windows 10

From: Zheng Lin Edwin Yeo
Sent: Wednesday, 20 February 2019 08:13
To: solr-user@lucene.apache.org
Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n



Hi,

Thanks for the reply.

Do you know of any regex online tool that works correctly for Java regex?
I tried to find some, but they are not working properly.

Yes, our plan is to replace more than one \n with <br><br>, and a single \n
with a single <br>.

Regards,
Edwin

On Wed, 20 Feb 2019 at 14:59, Jörn Franke  wrote:

> Solr uses Java regex matching, so i doubt there is a bug - it would then
> be in the JDK. Try out in a regex online Tool that supports Java regex for
> your solution.
>
> I believe you want to have 2 regex process factories:
> One that deals with single \n and one that deals with more than one \n
>
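> > On 20.02.2019 at 06:17, Zheng Lin Edwin Yeo wrote:
> >
> > Hi,
> >
> > We have tried the following pattern, ([ \t]*\r?\n){2,}, with this
> > configuration:
> >
> > <processor class="solr.RegexReplaceProcessorFactory">
> >   <str name="fieldName">content</str>
> >   <str name="pattern">([ \t]*\r?\n){2,}</str>
> >   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
> >   <bool name="literalReplacement">true</bool>
> > </processor>
> >
> > However, the issue is still occurring.
> >
> > Is anyone else able to help?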
> >
> > Regards,
> > Edwin
> >
> > On Fri, 15 Feb 2019 at 11:47, Zheng Lin Edwin Yeo 
> > wrote:
> >
> >> Hi,
> >>
> >> For your info, this issue is occurring in Solr 7.7.0 as well.
> >>
> >> Regards,
> >> Edwin
> >>
> >> On Tue, 12 Feb 2019 at 00:10, Zheng Lin Edwin Yeo  >
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> Should we report this as a bug in Solr?
> >>>
> >>> Regards,
> >>> Edwin
> >>>
> >>> On Fri, 8 Feb 2019 at 22:18, Zheng Lin Edwin Yeo  >
> >>> wrote:
> >>>
> >>>> Hi Paul,
> >>>>
> >>>> Regarding the regex (\n\s*){2,} that we are using: when we try it on
> >>>> https://regex101.com/, it gives the correct result for all the
> >>>> examples (i.e. all of them end up with only <br><br>, and not more
> >>>> than that, unlike what we are getting in Solr in our earlier examples).
> >>>>
> >>>> Could there be a bug in Solr?
> >>>>
> >>>> Regards,
> >>>> Edwin
> >>>>
> >>>> On Fri, 8 Feb 2019 at 00:33, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> >>>> wrote:
> 
> > Hi Paul,
> >
> > We have tried it with the whitespace preceding the \n, i.e.
> > <str name="pattern">(\s*\n){2,}</str>, with the following configuration:
> >
> > <processor class="solr.RegexReplaceProcessorFactory">
> >   <str name="fieldName">content</str>
> >   <str name="pattern">(\s*\n){2,}</str>
> >   <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
> > </processor>
> >
> > However, we are still getting exactly the same results as in the earlier
> > Examples 1, 2 and 3.
> >
> > As for your point 2, that the data may contain other (non-printing)
> > characters than \n: we have found that there are no non-printing
> > characters. It is just a new line with a space. You can refer to the
> > original content in the same examples below.
> >
> >
> > Example 1: The sentence that the above regex pattern is working
> > correctly
> > *Original content in EML file:*
> > Dear Sir,
> >
> >
> > I am terminating
> > *Original content:*Dear Sir,  \n\n \n \n\n I am terminating
> > *Index content: *Dear Sir,  I am terminating
> >
> > Example 2: A sentence for which the above regex pattern only partially
> > works (as you can see, instead of 2 <br>, there are 4 <br>)
> > *Original content in EML file:*
> >
> > *exalted*
> >
> > *Psalm 89:17*
> >
> >
> > 3 Choa Chu Kang Avenue 4
> > *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3
> > Choa Chu Kang Avenue 4, Singapore
> > *Index content: *exalted  Psalm 89:17 3
> > Choa Chu Kang Avenue 4, Singapore
> >
> > Example 3: A sentence for which the above regex pattern only partially
> > works (as you can see, instead of 2 <br>, there are 4 <br>)
> > *Original content in EML file:*
> >
> > http://www.concordpri.moe.edu.sg/
> >
> >
> >
> >
> >
> >
> >
> >
> > On Tue, Dec 18, 2018 at 10:07 AM
> > *Original content:* http://www.concordpri.moe.edu.sg/   \n\n   \n\n
> \n
> > \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue,
> Dec 18,
> > 2018 at 10:07 AM
> > *Index content: *http://www.concordpri.moe.edu.sg/   
> > On Tue, Dec 18, 2018 at 10:07 AM
> >
> >
> > Appreciate any other ideas or suggestions that you may have.
> >
> > Thank you.
> >
> > Regards,
> > Edwin
> >
> >> On Thu, 7 Feb 2019 at 22:49,  wrote:
> >>
> >> Hi Edwin
> >>
> >>
> >>
> >>  1.  Sorry, the pattern was wrong; the space should precede the \n,
> >> i.e. (\s*\n){2,}
> >>  2.  Perhaps in the data you have other (non-printing) characters
> >> than \n?
> >>
> >>
> >>
> >> Sent from Mail for Windows 10
> >>

AW: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-07 Thread paul.dodd
Hi Edwin



  1.  Sorry, the pattern was wrong; the space should precede the \n, i.e. (\s*\n){2,}
  2.  Perhaps in the data you have other (non-printing) characters than \n?
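
One way to check point 2 is to list every space-like character in the field that the default \s of java.util.regex does not cover. The sketch below is only an illustration: the class name WhitespaceInspector, the method unusualWhitespace and the sample string are hypothetical, and U+00A0 (no-break space) is just one candidate character, not something confirmed by the data in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceInspector {
    // The characters matched by the default \s in java.util.regex.
    private static final String ASCII_WS = " \t\n\u000B\f\r";

    // Returns the code points (as U+XXXX) that Java classifies as whitespace
    // or space separators but that the default \s would not match.
    static List<String> unusualWhitespace(String text) {
        List<String> found = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            boolean spaceLike = Character.isWhitespace(cp) || Character.isSpaceChar(cp);
            if (spaceLike && ASCII_WS.indexOf(cp) < 0) {
                found.add(String.format("U+%04X", cp));
            }
        });
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample: a no-break space hidden between two newlines.
        String sample = "Dear Sir, \n\u00A0\n I am terminating";
        System.out.println(unusualWhitespace(sample)); // [U+00A0]
    }
}
```

An empty list would suggest that the field really contains only plain spaces and newlines; anything else names the characters that a pattern built on \s will stop at.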



Sent from Mail for Windows 10



From: Zheng Lin Edwin Yeo
Sent: Thursday, 7 February 2019 15:23
To: solr-user@lucene.apache.org
Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n



Hi Paul,

We have tried the suggested regex pattern as follows:

   content
   (\n\s*){2,}
   brbr


But we still have exactly the same problem as in Examples 1, 2 and 3 below.

Example 1: The sentence that the above regex pattern is working correctly
*Original content:*Dear Sir,  \n\n \n \n\n I am terminating
*Index content: *Dear Sir,  I am terminating

Example 2: The sentence that the above regex pattern is partially working
(as you can see, instead of 2 <br>, there are 4 <br>)
*Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3 Choa
Chu Kang Avenue 4, Singapore
*Index content: *exalted  Psalm 89:17 3 Choa
Chu Kang Avenue 4, Singapore

Example 3: The sentence that the above regex pattern is partially working
(as you can see, instead of 2 <br>, there are 4 <br>)
*Original content:* http://www.concordpri.moe.edu.sg/   \n\n   \n\n \n \n\n
\n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue, Dec 18, 2018
at 10:07 AM
*Index content: *http://www.concordpri.moe.edu.sg/ On
Tue, Dec 18, 2018 at 10:07 AM
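
A possible explanation for getting four line breaks instead of two is that the run of newlines is interrupted by a character that \s does not match, so the pattern matches twice. The sketch below reproduces that effect under stated assumptions: the interrupting character is taken to be U+00A0 (no-break space), the replacement is taken to be <br><br>, and the class name NbspSplitDemo and the second pattern are hypothetical illustrations, not the configuration used in this thread.

```java
import java.util.regex.Pattern;

public class NbspSplitDemo {
    // Pattern from the thread: a newline plus trailing whitespace, two or more times.
    static final Pattern ASCII_ONLY = Pattern.compile("(\\n\\s*){2,}");
    // Hypothetical variant that also accepts U+00A0 between the newlines.
    static final Pattern WITH_NBSP = Pattern.compile("(\\n[\\s\\u00A0]*){2,}");

    static String collapse(Pattern p, String text) {
        return p.matcher(text).replaceAll("<br><br>");
    }

    public static void main(String[] args) {
        // Assumed input: a no-break space sits in the middle of the newline run.
        String text = "exalted\n\n\u00A0\n\nPsalm 89:17";
        // Two separate matches, so four <br> with the no-break space left between them.
        System.out.println(collapse(ASCII_ONLY, text));
        // One match over the whole run, so two <br>.
        System.out.println(collapse(WITH_NBSP, text));
    }
}
```

If the data does contain such a character, widening the character class to exactly the characters found is narrower than switching to a class that also matches punctuation.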

Any further suggestions?

Thank you.

Regards,
Edwin

On Thu, 7 Feb 2019 at 22:20,  wrote:

> To avoid the «\n+\s*» matching too many \n and then failing on the {2,}
> part you could try
>
>
>
> (\n\s*){2,}
>
>
>
> If you also want to match CRLF then
>
> (\r?\n\s*){2,}
>
>
>
>
>
> Sent from Mail for Windows 10
>
>
>
> From: Zheng Lin Edwin Yeo
> Sent: Thursday, 7 February 2019 15:10
> To: solr-user@lucene.apache.org
> Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n
>
>
>
> Hi Paul,
>
> Thanks for your reply.
>
> When I use this pattern:
> 
>content
>(\n+\s*){2,}
>brbr
> 
>
> It works for some sentences within the same content but not for
> others. Please see below for one that works and others that do not
> (they work only partially):
>
> Example 1: The sentence that the above regex pattern is working correctly
> *Original content:*Dear Sir,  \n\n \n \n\n I am terminating
> *Index content: *Dear Sir,  I am terminating
>
> Example 2: The sentence that the above regex pattern is partially working
> (as you can see, instead of 2 <br>, there are 4 <br>)
> *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3 Choa
> Chu Kang Avenue 4, Singapore
> *Index content: *exalted  Psalm 89:17 3 Choa
> Chu Kang Avenue 4, Singapore
>
> Example 3: The sentence that the above regex pattern is partially working
> (as you can see, instead of 2 <br>, there are 4 <br>)
> *Original content:* http://www.concordpri.moe.edu.sg/   \n\n   \n\n \n \n\n
> \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue, Dec 18, 2018
> at 10:07 AM
> *Index content: *http://www.concordpri.moe.edu.sg/ On
> Tue, Dec 18, 2018 at 10:07 AM
>
> We would appreciate your help in seeing what is wrong.
>
> Thank you.
>
> Regards,
> Edwin
>
> On Thu, 7 Feb 2019 at 21:24,  wrote:
>
> > You don’t say what happens, just that it is not working. I assume nothing
> > is replaced? Perhaps the pattern should be
> >
> >
> >
> >"(\n\s*){2,}"
> >
> >
> >
> > ??
> >
> >
> >
> > Sent from Mail for Windows 10
> >
> >
> >
> > From: Zheng Lin Edwin Yeo
> > Sent: Thursday, 7 February 2019 14:08
> > To: solr-user@lucene.apache.org
> > Subject: RegexReplaceProcessorFactory pattern to detect multiple \n
> >
> >
> >
> > Hi,
> >
> > I am trying to use the RegexReplaceProcessorFactory to remove two or more
> > \n with any number of spaces between them (e.g. \n\n, \n \n, \n \n  \n \n),
> > and replace them with two <br>.
> >
> > I use the following regex pattern and it is working when I test it in
> > regex101.com. But it is not working when I put it inside the
> > RegexReplaceProcessorFactory as below:
> >
> > 
> > 
> >content
> >"(\\n\s*){2,}"
> >brbr
> > 
> >   
> >
> > To explain my regex pattern further: \s* tells the regex to match any
> > whitespace that follows a \n, and {2,} tells it to match 2 or more
> > occurrences of that group (a \n plus its trailing whitespace).
> >
> > Please kindly let me know what is wrong and how I should do it.
> >
> > I am using Solr 7.6.0.
> >
> > Regards,
> > Edwin
> >
>


AW: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-07 Thread paul.dodd
To avoid the «\n+\s*» matching too many \n and then failing on the {2,} part 
you could try



(\n\s*){2,}



If you also want to match CRLF then

(\r?\n\s*){2,}
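
The difference between the two patterns above shows up on input with Windows line endings. The following sketch compares them; it assumes the replacement text is <br><br>, and the class name CrlfDemo and the sample string are hypothetical.

```java
import java.util.regex.Pattern;

public class CrlfDemo {
    // Pattern for LF-only line endings.
    static final Pattern LF_ONLY = Pattern.compile("(\\n\\s*){2,}");
    // Variant that also consumes an optional CR before each LF.
    static final Pattern CRLF_AWARE = Pattern.compile("(\\r?\\n\\s*){2,}");

    static String collapse(Pattern p, String text) {
        return p.matcher(text).replaceAll("<br><br>");
    }

    public static void main(String[] args) {
        String text = "a\r\n\r\nb";
        // CR is printed as the two characters \r so the result is visible on a terminal.
        System.out.println(collapse(LF_ONLY, text).replace("\r", "\\r"));    // a\r<br><br>b
        System.out.println(collapse(CRLF_AWARE, text).replace("\r", "\\r")); // a<br><br>b
    }
}
```

With the LF-only pattern the match starts at the first \n, so the leading carriage return is left behind in the field; the CRLF-aware pattern removes it as part of the match.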





Sent from Mail for Windows 10



From: Zheng Lin Edwin Yeo
Sent: Thursday, 7 February 2019 15:10
To: solr-user@lucene.apache.org
Subject: Re: RegexReplaceProcessorFactory pattern to detect multiple \n



Hi Paul,

Thanks for your reply.

When I use this pattern:

   content
   (\n+\s*){2,}
   brbr


It works for some sentences within the same content but not for
others. Please see below for one that works and others that do not
(they work only partially):

Example 1: The sentence that the above regex pattern is working correctly
*Original content:*Dear Sir,  \n\n \n \n\n I am terminating
*Index content: *Dear Sir,  I am terminating

Example 2: The sentence that the above regex pattern is partially working
(as you can see, instead of 2 <br>, there are 4 <br>)
*Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3 Choa
Chu Kang Avenue 4, Singapore
*Index content: *exalted  Psalm 89:17 3 Choa
Chu Kang Avenue 4, Singapore

Example 3: The sentence that the above regex pattern is partially working
(as you can see, instead of 2 <br>, there are 4 <br>)
*Original content:* http://www.concordpri.moe.edu.sg/   \n\n   \n\n \n \n\n
\n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue, Dec 18, 2018
at 10:07 AM
*Index content: *http://www.concordpri.moe.edu.sg/ On
Tue, Dec 18, 2018 at 10:07 AM

We would appreciate your help in seeing what is wrong.

Thank you.

Regards,
Edwin

On Thu, 7 Feb 2019 at 21:24,  wrote:

> You don’t say what happens, just that it is not working. I assume nothing
> is replaced? Perhaps the pattern should be
>
>
>
>"(\n\s*){2,}"
>
>
>
> ??
>
>
>
> Sent from Mail for Windows 10
>
>
>
> From: Zheng Lin Edwin Yeo
> Sent: Thursday, 7 February 2019 14:08
> To: solr-user@lucene.apache.org
> Subject: RegexReplaceProcessorFactory pattern to detect multiple \n
>
>
>
> Hi,
>
> I am trying to use the RegexReplaceProcessorFactory to remove two or more
> \n with any number of spaces between them (e.g. \n\n, \n \n, \n \n  \n \n),
> and replace them with two <br>.
>
> I use the following regex pattern and it is working when I test it in
> regex101.com. But it is not working when I put it inside the
> RegexReplaceProcessorFactory as below:
>
> 
> 
>content
>"(\\n\s*){2,}"
>brbr
> 
>   
>
> To explain my regex pattern further: \s* tells the regex to match any
> whitespace that follows a \n, and {2,} tells it to match 2 or more
> occurrences of that group (a \n plus its trailing whitespace).
>
> Please kindly let me know what is wrong and how I should do it.
>
> I am using Solr 7.6.0.
>
> Regards,
> Edwin
>


AW: RegexReplaceProcessorFactory pattern to detect multiple \n

2019-02-07 Thread paul.dodd
You don’t say what happens, just that it is not working. I assume nothing is 
replaced? Perhaps the pattern should be



   "(\n\s*){2,}"



??



Sent from Mail for Windows 10



From: Zheng Lin Edwin Yeo
Sent: Thursday, 7 February 2019 14:08
To: solr-user@lucene.apache.org
Subject: RegexReplaceProcessorFactory pattern to detect multiple \n



Hi,

I am trying to use the RegexReplaceProcessorFactory to remove two or more
\n with any number of spaces between them (e.g. \n\n, \n \n, \n \n  \n \n),
and replace them with two <br>.

I use the following regex pattern and it is working when I test it in
regex101.com. But it is not working when I put it inside the
RegexReplaceProcessorFactory as below:



   content
   "(\\n\s*){2,}"
   brbr

  

To explain my regex pattern further: \s* tells the regex to match any
whitespace that follows a \n, and {2,} tells it to match 2 or more
occurrences of that group (a \n plus its trailing whitespace).
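
One likely reason the pattern matches on regex101.com but not in Solr can be shown with plain java.util.regex. The sketch assumes the pattern text reaches the regex engine exactly as typed in the configuration, including the surrounding double quotes and the doubled backslash; the class name QuotedPatternDemo is hypothetical.

```java
import java.util.regex.Pattern;

public class QuotedPatternDemo {
    // Pattern text exactly as typed in the config: "(\\n\s*){2,}" with the quotes.
    // To the regex engine, the quotes are literal characters and \\n means
    // a literal backslash followed by the letter n.
    static final String AS_TYPED = "\"(\\\\n\\s*){2,}\"";
    // Pattern text without the quotes and with a single backslash: (\n\s*){2,}
    static final String UNQUOTED = "(\\n\\s*){2,}";

    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "Dear Sir,  \n\n \n \n\n I am terminating";
        System.out.println(found(AS_TYPED, input)); // false
        System.out.println(found(UNQUOTED, input)); // true
    }
}
```

Under that assumption, the quoted form can only match text that itself contains double quotes and a literal backslash-n sequence, which real line breaks never do.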

Please kindly let me know what is wrong and how I should do it.

I am using Solr 7.6.0.

Regards,
Edwin


AW: Indexing in one collection affect index in another collection

2019-01-29 Thread paul.dodd
Hi

If the reason for the difference in speed is that the index is being read from 
disk, I would expect that the first query would be slow, but subsequent queries 
on the same collection should speed up. A query on the other collection could 
then be slower. In this case I would say that this is normal behavior. The OS 
file cache cannot be relied upon to give the same results in different 
circumstances, including different software  versions.

You may wish to install the RamMap tool[1], [2], although you may be having the 
inverse problem to that described in [1]. You can then see how much space is 
used by the cache and other demands.

If subsequent queries are fast, then to me it does not seem like a problem for 
a development machine.  For production you may wish to store the indices in 
RAM and/or change from Windows to Linux, if it is important that all queries, 
including the first, are very fast.

Have a nice day
Paul

-Original Message-
From: Shawn Heisey
Sent: Tuesday, 29 January 2019 13:25
To: solr-user@lucene.apache.org
Subject: Re: Indexing in one collection affect index in another collection

On 1/29/2019 5:06 AM, Zheng Lin Edwin Yeo wrote:
> My guess is after we change our searchFields_tcs schema which is:
> 
> *From*:
>  stored="true" multiValued="true" termVectors="true" termPositions="true"
> termOffsets="true"/>
> 
> *To:*
>  stored="true" multiValued="true" storeOffsetsWithPositions="true"
> termVectors="true" termPositions="false" termOffsets="false"/>

Adding termVectors will make the index bigger.  Potentially much bigger.
This will increase the overall RAM requirement of the server, especially if 
the server is handling software other than Solr.  Anything that makes the index 
bigger can affect performance.

> The above change was done in order to use the Solr-recommended unified 
> highlighter (postings with light term vectors), which Solr's 
> documentation claims is the fastest.
> 
> My best guess is that Solr 7.5.0 has some bugs that slow down the whole 
> index and queries with the new approach (the new dynamicField 
> schema above), affecting the OS file caching of the index or causing other issues.
> 
> So I kindly suggest you look deeper and see whether such bugs exist.

I know almost nothing about highlighting.  I wouldn't be able to look for bugs.

Thanks,
Shawn


AW: Indexing in one collection affect index in another collection

2019-01-29 Thread paul.dodd
References, sorry:

[1] https://support.microsoft.com/en-ca/help/976618/you-experience-performance-issues-in-applications-and-services-when-th
[2] https://docs.microsoft.com/en-us/sysinternals/downloads/rammap

-Original Message-
From: Dodd, Paul Sutton (UB) 
Sent: Tuesday, 29 January 2019 13:31
To: 'solr-user@lucene.apache.org' 
Subject: AW: Indexing in one collection affect index in another collection

Hi

If the reason for the difference in speed is that the index is being read from 
disk, I would expect that the first query would be slow, but subsequent queries 
on the same collection should speed up. A query on the other collection could 
then be slower. In this case I would say that this is normal behavior. The OS 
file cache cannot be relied upon to give the same results in different 
circumstances, including different software  versions.

You may wish to install the RamMap tool[1], [2], although you may be having the 
inverse problem to that described in [1]. You can then see how much space is 
used by the cache and other demands.

If subsequent queries are fast, then to me it does not seem like a problem for 
a development machine.  For production you may wish to store the indices in 
RAM and/or change from Windows to Linux, if it is important that all queries, 
including the first, are very fast.

Have a nice day
Paul

-Original Message-
From: Shawn Heisey 
Sent: Tuesday, 29 January 2019 13:25
To: solr-user@lucene.apache.org
Subject: Re: Indexing in one collection affect index in another collection

On 1/29/2019 5:06 AM, Zheng Lin Edwin Yeo wrote:
> My guess is after we change our searchFields_tcs schema which is:
> 
> *From*:
>  stored="true" multiValued="true" termVectors="true" termPositions="true"
> termOffsets="true"/>
> 
> *To:*
>  stored="true" multiValued="true" storeOffsetsWithPositions="true"
> termVectors="true" termPositions="false" termOffsets="false"/>

Adding termVectors will make the index bigger.  Potentially much bigger.
This will increase the overall RAM requirement of the server, especially if 
the server is handling software other than Solr.  Anything that makes the index 
bigger can affect performance.

> The above change was done in order to use the Solr-recommended unified 
> highlighter (postings with light term vectors), which Solr's 
> documentation claims is the fastest.
> 
> My best guess is that Solr 7.5.0 has some bugs that slow down the whole 
> index and queries with the new approach (the new dynamicField 
> schema above), affecting the OS file caching of the index or causing other issues.
> 
> So I kindly suggest you look deeper and see whether such bugs exist.

I know almost nothing about highlighting.  I wouldn't be able to look for bugs.

Thanks,
Shawn