Re: Solr - complete setup (update)

2019-01-29 Thread Joan Moreau via dovecot

On 2019-01-30 07:33, Stephan Bosch wrote:


(forgot to CC mailing list)

Op 26/01/2019 om 20:07 schreef Joan Moreau via dovecot: 


*- Bugs so far*

-> Line 620 of the fts_solr Dovecot plugin: the size of the header is improperly calculated 
("huge header" warning for a simple email, which kills the indexing of that 
email, so basically MOST emails, as the calculation is wrong). *You can check that 
regularly in the Dovecot log file. My guess is that mixed Unicode is not properly 
handled here.*


Does this happen with specific messages? Do you have a sample message
for me? I don't see how Unicode could cause this. 


*My only guess is that it refers to some 'strlen', which is of course wrong
in the case of Unicode emails. This is just a guess.*


*But do a grep for "huge" in the Dovecot log of a busy server to find
examples.*


*(Sorry, I switched to Xapian, as Solr was creating too much trouble for
my server, so I have no more concrete examples.)*
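
The log check suggested above can be sketched as a small shell function. This is only a sketch: the default log path is an assumption (Dovecot may log to syslog or elsewhere depending on its configuration), and since the exact wording of the warning is not quoted in this thread, it simply matches "huge" case-insensitively.

```sh
#!/bin/sh
# Count log lines mentioning "huge" (case-insensitive).
# The default path below is an assumption; pass your own log file as $1.
count_huge_warnings() {
    logfile="${1:-/var/log/dovecot.log}"
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function usable under "set -e".
    grep -ci -- 'huge' "$logfile" || true
}

# Usage: count_huge_warnings /var/log/dovecot.log
```

A count that grows with nearly every delivered mail would support the claim that most messages are affected; a handful of hits would point to specific messages instead.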


-> The UID returned by Solr is to be considered as a STRING (and that is maybe the source of 
the "out of bound" errors in fts_solr dovecot, as "long" is not enough)

*This is just highly visible in Solr schema.xml. Switching it to "long" in 
schema.xml returns plenty of errors.*


I cannot reproduce this so far (see modified schema below). In a simple
test I just get the desired results and no errors logged. 


I got this with large mailboxes (where the UID does not seem acceptable to
Solr). The fault is not on the Dovecot side but in Solr: the UIDs returned
for a search are garbage instead of proper values. Declaring the field as a
string solves this.
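
One way to spot such garbage values is to check each UID returned by a search against what IMAP allows: a UID is a non-zero unsigned 32-bit number (RFC 3501). The helper below is a hypothetical sketch, not part of Dovecot or Solr, and it assumes a shell with 64-bit integer comparison.

```sh
#!/bin/sh
# is_valid_imap_uid VALUE: succeed if VALUE is a decimal number in
# 1..4294967295 (IMAP UIDs are non-zero unsigned 32-bit integers).
# Hypothetical helper for illustration only.
is_valid_imap_uid() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;    # empty, or contains a non-digit
    esac
    # Strip leading zeros so the length check below is meaningful.
    n=$(printf '%s' "$1" | sed 's/^0*//')
    [ -n "$n" ] || return 1         # the value was zero
    [ "${#n}" -le 10 ] || return 1  # more than 10 digits cannot fit in 32 bits
    [ "$n" -le 4294967295 ]
}

# Usage: is_valid_imap_uid "$uid" || echo "suspicious UID: $uid"
```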


-> Java errors: a lot of nonsense to me, as I am not an expert in Java. But with 
increased memory it no longer seems to crash, even if it complains quite a lot in the 
logs

Can you elaborate on the errors you have seen so far? When do these happen? How 
can I reproduce them?

*Honestly, I have no clue what the problems are. I just increased the memory of 
the JVM and the systems stopped crashing. Log files are huge anyway.*


What errors do you see? I see only INFO entries in my
/var/solr/logs/solr.log. Looks like Solr is pretty verbose by default
(lots of INFO output), but there must be a way to reduce that. 

*I deleted Solr, so no more logs. Maybe someone else can tell.*





Re: Solr - complete setup (update)

2019-01-29 Thread Stephan Bosch

(forgot to CC mailing list)

Op 26/01/2019 om 20:07 schreef Joan Moreau via dovecot:



*- Bugs so far*

-> Line 620 of the fts_solr Dovecot plugin: the size of the header is 
improperly calculated ("huge header" warning for a simple email, 
which kills the indexing of that email, so basically MOST 
emails, as the calculation is wrong)
*You can check that regularly in the Dovecot log file. My guess is that mixed 
Unicode is not properly handled here.*


Does this happen with specific messages? Do you have a sample message 
for me? I don't see how Unicode could cause this.




-> The UID returned by Solr is to be considered as a STRING (and that 
is maybe the source of the "out of bound" errors in 
fts_solr dovecot, as "long" is not enough)
*This is just highly visible in Solr schema.xml. Switching it to 
"long" in schema.xml returns plenty of errors.*


I cannot reproduce this so far (see modified schema below). In a simple 
test I just get the desired results and no errors logged.




-> Java errors: a lot of nonsense to me, as I am not an expert in Java. 
But with increased memory it no longer seems to crash, even if it 
complains quite a lot in the logs


Can you elaborate on the errors you have seen so far? When do these 
happen? How can I reproduce them?


*Honestly, I have no clue what the problems are. I just increased the 
memory of the JVM and the systems stopped crashing. Log files are huge 
anyway.*


What errors do you see? I see only INFO entries in my 
/var/solr/logs/solr.log. Looks like Solr is pretty verbose by default 
(lots of INFO output), but there must be a way to reduce that.


Regards,

Stephan.




id
positionIncrementGap="0"/>
autoGeneratePhraseQueries="true" positionIncrementGap="100">
generateNumberParts="1" splitOnCaseChange="1" generateWordParts="1" 
splitOnNumerics="1" catenateAll="1" catenateWords="1" preserveOriginal="1"/>
autoGeneratePhraseQueries="true">
stored="true"/>
stored="true"/>
stored="true"/>


Re: Solr - complete setup (update)

2019-01-26 Thread Joan Moreau via dovecot

*- Installation:*

-> Create a clean install using the default, (at least in the Archlinux package), and do 
a "sudo -u solr solr create -c dovecot ". The config files are then in 
/opt/solr/server/solr/dovecot/conf and datafiles in /opt/solr/server/solr/dovecot/data


On my system (Debian) these directories are wildly different (e.g. data
is under /var), but other than that, this information is OK.

Used this as a side-reference for Debian installation:
https://tecadmin.net/install-apache-solr-on-debian/

Accessed http://solr-host.tld:8983/solr/ to check whether all is OK. 


*Make sure you have a dovecot instance (not the default instance), created
with the command below:*

*solr create -c dovecot (or whatever name)*


Weirdly, rescan returns immediately here. When I perform `doveadm index INBOX` 
for my test user, I do see a lot of fts and HTTP activity.


*The Solr plugin is not coded entirely; the refresh and rescan functions are
missing:*


https://github.com/dovecot/core/blob/master/src/plugins/fts-solr/fts-backend-solr.c


static int fts_backend_solr_refresh(struct fts_backend *backend ATTR_UNUSED)
{
	return 0;
}

static int fts_backend_solr_rescan(struct fts_backend *backend)
{
	/* FIXME: proper rescan needed. for now we'll just reset the
	   last-uids */
	return fts_backend_reset_last_uids(backend);
}
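
Given that rescan currently only resets the last indexed UIDs, a practical way to trigger actual indexing is to follow it with `doveadm index`, as observed in this thread. A minimal sketch, assuming `doveadm` is on the PATH and that indexing a single mailbox is enough for a test (the user name in the usage line is a placeholder):

```sh
#!/bin/sh
# reindex_user USER [MAILBOX]: reset the FTS state, then index one mailbox.
# Both doveadm subcommands are the ones mentioned in this thread.
reindex_user() {
    user="$1"
    mailbox="${2:-INBOX}"
    doveadm fts rescan -u "$user" || return 1
    doveadm index -u "$user" "$mailbox"
}

# Usage: reindex_user testuser INBOX
```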


*- Bugs so far*

-> Line 620 of the fts_solr Dovecot plugin: the size of the header is improperly calculated 
("huge header" warning for a simple email, which kills the indexing of that 
email, so basically MOST emails, as the calculation is wrong)


*You can check that regularly in the Dovecot log file. My guess is that mixed
Unicode is not properly handled here.*
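
To illustrate the guess (it is not a confirmed diagnosis of the plugin), a byte count and a character count diverge as soon as a header contains non-ASCII UTF-8 text. The sketch below computes both without relying on the locale; the function names are made up for the example.

```sh
#!/bin/sh
# byte_len STRING: number of bytes, which is what a strlen-style count returns.
byte_len() {
    printf '%s' "$1" | LC_ALL=C wc -c | tr -d ' '
}

# char_len STRING: number of UTF-8 code points, obtained by dropping the
# continuation bytes (0x80-0xBF, octal 200-277) before counting.
char_len() {
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | LC_ALL=C wc -c | tr -d ' '
}

# "café" written with octal escapes so the script itself stays ASCII.
subject=$(printf 'caf\303\251')
echo "bytes=$(byte_len "$subject") chars=$(char_len "$subject")"
# prints: bytes=5 chars=4
```

Whether such a difference can actually trigger the "huge header" warning is exactly what remains open in this thread.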


-> The UID returned by Solr is to be considered as a STRING (and that is maybe the source of 
the "out of bound" errors in fts_solr dovecot, as "long" is not enough)


*This is just highly visible in Solr schema.xml. Switching it to "long"
in schema.xml returns plenty of errors.*

-> Java errors: a lot of nonsense to me, as I am not an expert in Java. But with increased memory it no longer seems to crash, even if it complains quite a lot in the logs


Can you elaborate on the errors you have seen so far? When do these happen? How 
can I reproduce them?


*Honestly, I have no clue what the problems are. I just increased the
memory of the JVM and the systems stopped crashing. Log files are huge
anyway.*
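
Elsewhere in this thread the heap is raised by setting SOLR_HEAP in /opt/solr/bin/solr.in.sh. A sketch of doing that idempotently follows; the helper name is hypothetical, the file location varies by distribution, and the value should be adapted to the machine.

```sh
#!/bin/sh
# set_solr_heap FILE SIZE: make FILE contain exactly one active SOLR_HEAP line.
set_solr_heap() {
    file="$1"
    size="$2"
    tmp=$(mktemp) || return 1
    # Drop any existing uncommented SOLR_HEAP assignment, keep everything else.
    # grep -v exits non-zero when nothing is left, hence "|| true".
    grep -v '^[[:space:]]*SOLR_HEAP=' "$file" > "$tmp" || true
    printf 'SOLR_HEAP="%s"\n' "$size" >> "$tmp"
    # Write back through the original file so its owner and mode are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Usage: set_solr_heap /opt/solr/bin/solr.in.sh 12288m
```

Solr has to be restarted for the new heap size to take effect.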

Re: Solr - complete setup (update)

2019-01-26 Thread Stephan Bosch




Op 26/01/2019 om 15:24 schreef Hendrik Boom:

On Sat, Jan 26, 2019 at 01:44:16PM +0100, Stephan Bosch wrote:

Hi Joan,

Op 14/01/2019 om 07:44 schreef Joan Moreau via dovecot:

Hi Stephan,

What's up with that ?

Thank you so much

On 2019-01-05 02:04, Stephan Bosch wrote:

Debian does something weird here. It doesn't use an explicit systemd unit.
It is generated from the SysV init file. I ended up setting the ulimits in
/etc/security/limits.conf for user solr.

Please make sure the changes you make don't make your Debian package
*require* systemd.  There are Debian-derived distros that avoid systemd.


Don't worry, I am not working on packaging this. I just want to know 
what the problems are and how these can be solved, so that we can update 
the wiki.


Regards,

Stephan.
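
For reference, per-user limits in /etc/security/limits.conf take the form `<domain> <type> <item> <value>`. The sketch below prints entries for the solr user; the value 65000 mirrors the LimitNOFILE/LimitNPROC figures from the systemd unit posted in this thread, and whether these limits are applied to a service started from an init script depends on the system, so treat it as a starting point only. The function name is made up for the example.

```sh
#!/bin/sh
# print_solr_limits [USER] [VALUE]: emit limits.conf entries raising the
# open-file (nofile) and process (nproc) limits for one user.
print_solr_limits() {
    user="${1:-solr}"
    value="${2:-65000}"
    for item in nofile nproc; do
        for kind in soft hard; do
            printf '%s %s %s %s\n' "$user" "$kind" "$item" "$value"
        done
    done
}

# Usage (as root): print_solr_limits solr 65000 >> /etc/security/limits.conf
```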


Re: Solr - complete setup (update)

2019-01-26 Thread Hendrik Boom
On Sat, Jan 26, 2019 at 01:44:16PM +0100, Stephan Bosch wrote:
> Hi Joan,
> 
> Op 14/01/2019 om 07:44 schreef Joan Moreau via dovecot:
> > 
> > Hi Stephan,
> > 
> > What's up with that ?
> > 
> > Thank you so much
> > 
> > On 2019-01-05 02:04, Stephan Bosch wrote:
> > 
> > > Hi,
> > > 
> > > Op 04/01/2019 om 05:36 schreef Joan Moreau via dovecot:
> > > > 
...
...
> > > > 
> > > > -> The systemd unit shall specify high ulimit for files and proc
> > > > (see below)
> 
> Debian does something weird here. It doesn't use an explicit systemd unit.
> It is generated from the SysV init file. I ended up setting the ulimits in
> /etc/security/limits.conf for user solr.

Please make sure the changes you make don't make your Debian package 
*require* systemd.  There are Debian-derived distros that avoid systemd.

-- hendrik


Re: Solr - complete setup (update)

2019-01-26 Thread Stephan Bosch

Hi Joan,

Op 14/01/2019 om 07:44 schreef Joan Moreau via dovecot:


Hi Stephan,

What's up with that ?

Thank you so much

On 2019-01-05 02:04, Stephan Bosch wrote:


Hi,

Op 04/01/2019 om 05:36 schreef Joan Moreau via dovecot:


Hi

This is the summary of my work with SOLR-Dovecot, in my *quest to 
reproduce the previously excellent work of fts_squat*



@Aki : Based on the time I have spent on this, I would love to see 
you updating the Wiki with those improvements, and adding my name 
somewhere


@All : Hope it helps








*- Installation:*

-> Create a clean install using the default, (at least in the 
Archlinux package), and do a "sudo -u solr solr create -c dovecot ". 
The config files are then in /opt/solr/server/solr/dovecot/conf and 
datafiles in /opt/solr/server/solr/dovecot/data


On my system (Debian) these directories are wildly different (e.g. data 
is under /var), but other than that, this information is OK.


Used this as a side-reference for Debian installation: 
https://tecadmin.net/install-apache-solr-on-debian/


Accessed http://solr-host.tld:8983/solr/ to check whether all is OK.



-> In /opt/solr/server/solr/dovecot/conf/solrconfig.xml:

 * around line 313, change false to 
true


 * around line 147, set 
2000 (or above)


 * around line 696 : uncomment hdr

 * around line 1127, before class="solr.UUIDUpdateProcessorFactory" name="uuid"/>, add 



 * around line 1161, delete the whole class="solr.AddSchemaFieldsUpdateProcessorFactory" 
name="add-schema-fields">


    * around line 1192, remove the whole 
... />


Applied these changes. We should probably provide an example config file 
on the Wiki that incorporates all this.. or maybe a diff.


We also need to evaluate what the merit of all of this is. I did 
something similar in my previous effort, but it was all based on getting 
an error from Solr and then removing that section of the config file 
with the assumption it wasn't needed. So far, I have little clue what 
these things are and why these things are enabled by default. As I said 
in an earlier mail, there is an option to leave some of this cruft out 
at backend initialization, but I haven't tried that yet.




-> Remove /opt/solr/server/solr/dovecot/conf/managed-schema

-> Replace "schema.xml" with the one below to reproduce fts_squat 
behavior (equivalent to "fts_squat = partial=3 full=25" in 
dovecot.conf) (note: quite a lot of trouble to replace a single-line 
setup, anyway...)


Did that too.



-> Move /opt/solr/server/solr (or the subfolder data) to a partition 
with *space*, ideally ext4 or a faster file system (it looks like Solr 
is not considering using a simple MySQL database, which would make 
sense to avoid all the fuss and let it move to a non-Java state, 
but that is another story)


Skipped that.


-> Config of dovecot.conf is as below


I also enabled debug for fts_solr.



-> The systemd unit should specify a high ulimit for files and processes 
(see below)


Debian does something weird here. It doesn't use an explicit systemd 
unit. It is generated from the SysV init file. I ended up setting the 
ulimits in /etc/security/limits.conf for user solr.




-> Increase the memory available to the Java VM (I set 12 GB as I 
have plenty of room on my server, but you may adapt it to your 
specs): in /opt/solr/bin/solr.in.sh, set SOLR_HEAP="12288m"


Skipped that.



-> As Solr complains a lot, you may consider filtering its output in 
syslog-ng or journald, as it greatly pollutes your audit files


What does it complain about and when does it happen? I haven't seen much 
logging from Solr so far.




-> (Re)start Solr (first) and Dovecot with systemctl

-> Launch a reindex ( doveadm fts rescan -u  )

-> Wait a good while to let the system re-index all your mailboxes


Weirdly, rescan returns immediately here. When I perform `doveadm index 
INBOX` for my test user, I do see a lot of fts and HTTP activity.



*- Bugs so far*

-> Line 620 of the fts_solr Dovecot plugin: the size of the header is 
improperly calculated ("huge header" warning for a simple email, 
which kills the indexing of that email, so basically MOST 
emails, as the calculation is wrong)


-> The UID returned by Solr is to be considered as a STRING (and 
that is maybe the source of the "out of bound" errors in 
fts_solr dovecot, as "long" is not enough)


-> Java errors: a lot of nonsense to me, as I am not an expert in Java. 
But with increased memory it no longer seems to crash, even if it 
complains quite a lot in the logs


Can you elaborate on the errors you have seen so far? When do these 
happen? How can I reproduce them?


Regards,

Stephan.





*---SCHEMA.XML in /opt/solr/server/solr/dovecot/conf*



id
autoGeneratePhraseQueries="true" positionIncrementGap="100">
catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1" 
generateWordParts="1" splitOnNumerics="1" catenateAll="1" 
catenateWords="1" preserveOriginal="1"/>

Re: Solr - complete setup (update)

2019-01-18 Thread Joan Moreau via dovecot

Yes, the "-property update.autoCreateFields -value false" option seems
interesting.

However, we overwrite the created schema right afterwards anyway.


On 2019-01-14 23:25, Stephan Bosch wrote:

Op 14/01/2019 om 07:44 schreef Joan Moreau via dovecot: 


Hi Stephan,

What's up with that ?

Thank you so much


Working on it, somewhat anyway.

BTW, did you see this ? :

"""
$ sudo -u solr /opt/solr/bin/solr create -c dovecot
WARNING: Using _default configset with data driven schema functionality. NOT RECOMMENDED for production use.
         To turn off: bin/solr config -c dovecot -p 8983 -action set-user-property -property update.autoCreateFields -value false
INFO  - 2019-01-14 23:19:56.831; org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL Credential Provider chain: env;sysprop

Created new core 'dovecot'
"""
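
Put together, creating the core and then applying the command suggested by the warning could look like the sketch below. The path and port are the ones shown in the output above; the wrapper function and the SOLR_BIN variable are hypothetical. As noted in this thread, replacing the schema afterwards may make the second step moot.

```sh
#!/bin/sh
# create_dovecot_core [CORE]: create a Solr core as the solr user, then turn
# off automatic field creation using the command printed by "solr create".
SOLR_BIN="${SOLR_BIN:-/opt/solr/bin/solr}"

create_dovecot_core() {
    core="${1:-dovecot}"
    sudo -u solr "$SOLR_BIN" create -c "$core" || return 1
    sudo -u solr "$SOLR_BIN" config -c "$core" -p 8983 \
        -action set-user-property \
        -property update.autoCreateFields -value false
}

# Usage: create_dovecot_core dovecot
```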

I'll be trying your steps first, but the mentioned command might at least get 
rid of some of the cruft in the default config file.

Regards,

Stephan.

On 2019-01-05 02:04, Stephan Bosch wrote:

Hi,

Op 04/01/2019 om 05:36 schreef Joan Moreau via dovecot: 
Hi


This is the summary of my work with SOLR-Dovecot, in my *quest to reproduce the 
previously excellent work of fts_squat*

@Aki : Based on the time I have spent on this, I would love to see you updating 
the Wiki with those improvements, and adding my name somewhere

@All : Hope it helps

I'll be going through the description below soon. I've recently independently 
installed fts-solr from scratch. Although this wasn't a flawless effort, I 
managed to get some basic indexing going. From this mail thread I understand 
that there are quite a few more problems than I've seen myself so far. Then 
again, I didn't perform extensive tests with actual searches.

Maybe we can turn all this into a test suite that we can run internally here at 
Dovecot. At the very least, the described Dovecot bugs need to be addressed and 
the wiki needs to be updated.

I'll get back to you.

Regards,

Stephan.

*- Installation:*

-> Create a clean install using the default, (at least in the Archlinux package), and do 
a "sudo -u solr solr create -c dovecot ". The config files are then in 
/opt/solr/server/solr/dovecot/conf and datafiles in /opt/solr/server/solr/dovecot/data

-> In /opt/solr/server/solr/dovecot/conf/solrconfig.xml:

* around line 313, change false to 
true

* around line 147, set 2000 (or above)

* around line 696 : uncomment hdr

* around line 1127, before , 
add 

* around line 1161, delete the whole 

* around line 1192, remove the whole 

-> Remove /opt/solr/server/solr/dovecot/conf/managed-schema

-> Replace "schema.xml" with the one below to reproduce fts_squat behavior (equivalent to 
"fts_squat = partial=3 full=25" in dovecot.conf) (note: quite a lot of trouble to replace a 
single-line setup, anyway...)

-> Move /opt/solr/server/solr (or the subfolder data) to a partition with 
*space*, ideally ext4 or a faster file system (it looks like Solr is not considering 
using a simple MySQL database, which would make sense to avoid all the fuss and 
let it move to a non-Java state, but that is another story)

-> Config of dovecot.conf is as below

-> The systemd unit should specify a high ulimit for files and processes (see below)

-> Increase the memory available to the Java VM (I set 12 GB as I have plenty of room on my 
server, but you may adapt it to your specs): in /opt/solr/bin/solr.in.sh, set 
SOLR_HEAP="12288m"

-> As Solr complains a lot, you may consider filtering its output in 
syslog-ng or journald, as it greatly pollutes your audit files

-> (Re)start Solr (first) and Dovecot with systemctl

-> Launch a reindex ( doveadm fts rescan -u  )

-> Wait a good while to let the system re-index all your mailboxes

*- Bugs so far*

-> Line 620 of the fts_solr Dovecot plugin: the size of the header is improperly calculated 
("huge header" warning for a simple email, which kills the indexing of that 
email, so basically MOST emails, as the calculation is wrong)

-> The UID returned by Solr is to be considered as a STRING (and that is maybe the source of 
the "out of bound" errors in fts_solr dovecot, as "long" is not enough)

-> Java errors: a lot of nonsense to me, as I am not an expert in Java. But with 
increased memory it no longer seems to crash, even if it complains quite a lot in the 
logs

*---SCHEMA.XML in /opt/solr/server/solr/dovecot/conf*



id

*-- DOVECOT.CONF*

mail_plugins = fts fts_solr

plugin {
plugin = fts fts_solr managesieve sieve

fts = solr
fts_autoindex = yes
fts_enforced = yes
fts_solr = url=http://127.0.0.1:8983/solr/dovecot/

# (replace 127.0.0.1 with your Solr server if you want to use an external server)
# (...)

}
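
When pointing Dovecot at another Solr host, it can help to read back the URL that is actually configured. A sketch under the assumption that the setting sits on a single `fts_solr = ... url=...` line as in the snippet above; the helper name and the config path in the usage line are illustrative.

```sh
#!/bin/sh
# solr_url_from_conf FILE: print the url= value of the first fts_solr line.
solr_url_from_conf() {
    sed -n 's/^[[:space:]]*fts_solr[[:space:]]*=.*url=\([^[:space:]]*\).*$/\1/p' "$1" | head -n 1
}

# Usage: solr_url_from_conf /etc/dovecot/dovecot.conf
```

Opening the printed URL's host and port under /solr/ in a browser is the same sanity check described elsewhere in this thread.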

*-- /etc/systemd/system/multi-user.target.wants/solr.service*

[Unit]
Description=Solr full text search engine
After=network.target

[Service]
Type=simple
User=solr
Group=solr
PrivateTmp=yes
WorkingDirectory=/opt/solr
LimitNOFILE=65000
LimitNPROC=65000

Re: Solr - complete setup (update)

2019-01-14 Thread Stephan Bosch




Op 14/01/2019 om 07:44 schreef Joan Moreau via dovecot:


Hi Stephan,

What's up with that ?

Thank you so much



Working on it, somewhat anyway.

BTW, did you see this ? :

"""
$ sudo -u solr /opt/solr/bin/solr create -c dovecot
WARNING: Using _default configset with data driven schema functionality. NOT RECOMMENDED for production use.
         To turn off: bin/solr config -c dovecot -p 8983 -action set-user-property -property update.autoCreateFields -value false
INFO  - 2019-01-14 23:19:56.831; org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL Credential Provider chain: env;sysprop


Created new core 'dovecot'
"""

I'll be trying your steps first, but the mentioned command might at 
least get rid of some of the cruft in the default config file.


Regards,

Stephan.



On 2019-01-05 02:04, Stephan Bosch wrote:


Hi,

Op 04/01/2019 om 05:36 schreef Joan Moreau via dovecot:


Hi

This is the summary of my work with SOLR-Dovecot, in my *quest to 
reproduce the previously excellent work of fts_squat*



@Aki : Based on the time I have spent on this, I would love to see 
you updating the Wiki with those improvements, and adding my name 
somewhere


@All : Hope it helps

I'll be going through the description below soon. I've recently 
independently installed fts-solr from scratch. Although this wasn't a 
flawless effort, I managed to get some basic indexing going. From 
this mail thread I understand that there are quite a few more 
problems than I've seen myself so far. Then again, I didn't perform 
extensive tests with actual searches.


Maybe we can turn all this into a test suite that we can run 
internally here at Dovecot. At the very least, the described Dovecot 
bugs need to be addressed and the wiki needs to be updated.


I'll get back to you.


Regards,

Stephan.





*- Installation:*

-> Create a clean install using the default, (at least in the 
Archlinux package), and do a "sudo -u solr solr create -c dovecot ". 
The config files are then in /opt/solr/server/solr/dovecot/conf and 
datafiles in /opt/solr/server/solr/dovecot/data


-> In /opt/solr/server/solr/dovecot/conf/solrconfig.xml:

 * around line 313, change false to 
true


 * around line 147, set 
2000 (or above)


 * around line 696 : uncomment hdr

 * around line 1127, before class="solr.UUIDUpdateProcessorFactory" name="uuid"/>, add 



 * around line 1161, delete the whole class="solr.AddSchemaFieldsUpdateProcessorFactory" 
name="add-schema-fields">


    * around line 1192, remove the whole 
... />


-> Remove /opt/solr/server/solr/dovecot/conf/managed-schema

-> Replace "schema.xml" with the one below to reproduce fts_squat 
behavior (equivalent to "fts_squat = partial=3 full=25" in 
dovecot.conf) (note: quite a lot of trouble to replace a single-line 
setup, anyway...)


-> Move /opt/solr/server/solr (or the subfolder data) to a partition 
with *space*, ideally ext4 or a faster file system (it looks like Solr 
is not considering using a simple MySQL database, which would make 
sense to avoid all the fuss and let it move to a non-Java state, 
but that is another story)


-> Config of dovecot.conf is as below

-> The systemd unit should specify a high ulimit for files and processes 
(see below)


-> Increase the memory available to the Java VM (I set 12 GB as I 
have plenty of room on my server, but you may adapt it to your 
specs): in /opt/solr/bin/solr.in.sh, set SOLR_HEAP="12288m"


-> As Solr complains a lot, you may consider filtering its output in 
syslog-ng or journald, as it greatly pollutes your audit files


-> (Re)start Solr (first) and Dovecot with systemctl

-> Launch a reindex ( doveadm fts rescan -u  )

-> Wait a good while to let the system re-index all your mailboxes


*- Bugs so far*

-> Line 620 of the fts_solr Dovecot plugin: the size of the header is 
improperly calculated ("huge header" warning for a simple email, 
which kills the indexing of that email, so basically MOST 
emails, as the calculation is wrong)


-> The UID returned by Solr is to be considered as a STRING (and 
that is maybe the source of the "out of bound" errors in 
fts_solr dovecot, as "long" is not enough)


-> Java errors: a lot of nonsense to me, as I am not an expert in Java. 
But with increased memory it no longer seems to crash, even if it 
complains quite a lot in the logs





*---SCHEMA.XML in /opt/solr/server/solr/dovecot/conf*



id
autoGeneratePhraseQueries="true" positionIncrementGap="100">
catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1" 
generateWordParts="1" splitOnNumerics="1" catenateAll="1" 
catenateWords="1" preserveOriginal="1"/>
autoGeneratePhraseQueries="true">
stored="true"/>
stored="true"/>
stored="true"/>
stored="true"/>


*-- DOVECOT.CONF*

mail_plugins = fts fts_solr

plugin {
plugin = fts fts_solr managesieve sieve

fts = solr
fts_autoindex = yes
fts_enforced = yes
fts_solr = 

Re: Solr - complete setup (update)

2019-01-13 Thread Joan Moreau via dovecot
Hi Stephan, 

What's up with that ? 

Thank you so much 


On 2019-01-05 02:04, Stephan Bosch wrote:


Hi,

Op 04/01/2019 om 05:36 schreef Joan Moreau via dovecot: 


Hi

This is the summary of my work with SOLR-Dovecot, in my *quest to reproduce the 
previously excellent work of fts_squat*

@Aki : Based on the time I have spent on this, I would love to see you updating 
the Wiki with those improvements, and adding my name somewhere

@All : Hope it helps

I'll be going through the description below soon. I've recently independently 
installed fts-solr from scratch. Although this wasn't a flawless effort, I 
managed to get some basic indexing going. From this mail thread I understand 
that there are quite a few more problems than I've seen myself so far. Then 
again, I didn't perform extensive tests with actual searches.

Maybe we can turn all this into a test suite that we can run internally here at 
Dovecot. At the very least, the described Dovecot bugs need to be addressed and 
the wiki needs to be updated.

I'll get back to you.

Regards,

Stephan.


*- Installation:*

-> Create a clean install using the default, (at least in the Archlinux package), and do 
a "sudo -u solr solr create -c dovecot ". The config files are then in 
/opt/solr/server/solr/dovecot/conf and datafiles in /opt/solr/server/solr/dovecot/data

-> In /opt/solr/server/solr/dovecot/conf/solrconfig.xml:

* around line 313, change false to 
true

* around line 147, set 2000 (or above)

* around line 696 : uncomment hdr

* around line 1127, before , 
add 

* around line 1161, delete the whole 

* around line 1192, remove the whole 

-> Remove /opt/solr/server/solr/dovecot/conf/managed-schema

-> Replace "schema.xml" with the one below to reproduce fts_squat behavior (equivalent to 
"fts_squat = partial=3 full=25" in dovecot.conf) (note: quite a lot of trouble to replace a 
single-line setup, anyway...)

-> Move /opt/solr/server/solr (or the subfolder data) to a partition with 
*space*, ideally ext4 or a faster file system (it looks like Solr is not considering 
using a simple MySQL database, which would make sense to avoid all the fuss and 
let it move to a non-Java state, but that is another story)

-> Config of dovecot.conf is as below

-> The systemd unit should specify a high ulimit for files and processes (see below)

-> Increase the memory available to the Java VM (I set 12 GB as I have plenty of room on my 
server, but you may adapt it to your specs): in /opt/solr/bin/solr.in.sh, set 
SOLR_HEAP="12288m"

-> As Solr complains a lot, you may consider filtering its output in 
syslog-ng or journald, as it greatly pollutes your audit files

-> (Re)start Solr (first) and Dovecot with systemctl

-> Launch a reindex ( doveadm fts rescan -u  )

-> Wait a good while to let the system re-index all your mailboxes

*- Bugs so far*

-> Line 620 of the fts_solr Dovecot plugin: the size of the header is improperly calculated 
("huge header" warning for a simple email, which kills the indexing of that 
email, so basically MOST emails, as the calculation is wrong)

-> The UID returned by Solr is to be considered as a STRING (and that is maybe the source of 
the "out of bound" errors in fts_solr dovecot, as "long" is not enough)

-> Java errors: a lot of nonsense to me, as I am not an expert in Java. But with 
increased memory it no longer seems to crash, even if it complains quite a lot in the 
logs

*---SCHEMA.XML in /opt/solr/server/solr/dovecot/conf*



id

*-- DOVECOT.CONF*

mail_plugins = fts fts_solr

plugin {
plugin = fts fts_solr managesieve sieve

fts = solr
fts_autoindex = yes
fts_enforced = yes
fts_solr = url=http://127.0.0.1:8983/solr/dovecot/

# (replace 127.0.0.1 with your Solr server if you want to use an external server)
# (...)

}

*-- /etc/systemd/system/multi-user.target.wants/solr.service*

[Unit]
Description=Solr full text search engine
After=network.target

[Service]
Type=simple
User=solr
Group=solr
PrivateTmp=yes
WorkingDirectory=/opt/solr
LimitNOFILE=65000
LimitNPROC=65000
ExecStart=/opt/solr/bin/solr start -f

[Install]
WantedBy=multi-user.target
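
To confirm that the limits from the unit actually reached the running process, the kernel's view can be read from /proc on Linux. A sketch; the helper name is made up, and finding the Solr PID (for example with `systemctl show -p MainPID solr`) is left to the reader.

```sh
#!/bin/sh
# max_open_files LIMITS_FILE: print the soft "Max open files" limit from a
# /proc/<pid>/limits style file (Linux-specific format).
max_open_files() {
    awk '/^Max open files/ { print $4; exit }' "$1"
}

# Usage: max_open_files "/proc/$PID/limits"
```

With the unit above in effect, the printed value would be expected to match LimitNOFILE.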

Re: Solr - complete setup (update)

2019-01-04 Thread Stephan Bosch

Hi,

Op 04/01/2019 om 05:36 schreef Joan Moreau via dovecot:


Hi

This is the summary of my work with SOLR-Dovecot, in my *quest to 
reproduce the previously excellent work of fts_squat*



@Aki : Based on the time I have spent on this, I would love to see you 
updating the Wiki with those improvements, and adding my name somewhere


@All : Hope it helps

I'll be going through the description below soon. I've recently 
independently installed fts-solr from scratch. Although this wasn't a 
flawless effort, I managed to get some basic indexing going. From this 
mail thread I understand that there are quite a few more problems than 
I've seen myself so far. Then again, I didn't perform extensive tests 
with actual searches.


Maybe we can turn all this into a test suite that we can run internally 
here at Dovecot. At the very least, the described Dovecot bugs need to 
be addressed and the wiki needs to be updated.


I'll get back to you.


Regards,

Stephan.





*- Installation:*

-> Create a clean install using the default, (at least in the 
Archlinux package), and do a "sudo -u solr solr create -c dovecot ". 
The config files are then in /opt/solr/server/solr/dovecot/conf and 
datafiles in /opt/solr/server/solr/dovecot/data


-> In /opt/solr/server/solr/dovecot/conf/solrconfig.xml:

 * around line 313, change false to 
true


 * around line 147, set 2000 
(or above)


 * around line 696 : uncomment hdr

 * around line 1127, before class="solr.UUIDUpdateProcessorFactory" name="uuid"/>, add 



 * around line 1161, delete the whole class="solr.AddSchemaFieldsUpdateProcessorFactory" 
name="add-schema-fields">


    * around line 1192, remove the whole name="add-unknown-fields-to-the-schema" ... />


-> Remove /opt/solr/server/solr/dovecot/conf/managed-schema

-> Replace "schema.xml" with the one below to reproduce fts_squat 
behavior (equivalent to "fts_squat = partial=3 full=25" in 
dovecot.conf) (note: quite a lot of trouble to replace a single-line 
setup, anyway...)


-> Move /opt/solr/server/solr (or the subfolder data) to a partition 
with *space*, ideally ext4 or a faster file system (it looks like Solr 
is not considering using a simple MySQL database, which would make 
sense to avoid all the fuss and let it move to a non-Java state, 
but that is another story)


-> Config of dovecot.conf is as below

-> The systemd unit should specify a high ulimit for files and processes (see 
below)


-> Increase the memory available to the Java VM (I set 12 GB as I have 
plenty of room on my server, but you may adapt it to your specs): 
in /opt/solr/bin/solr.in.sh, set SOLR_HEAP="12288m"


-> As Solr complains a lot, you may consider filtering its output in 
syslog-ng or journald, as it greatly pollutes your audit files


-> (Re)start Solr (first) and Dovecot with systemctl

-> Launch a reindex ( doveadm fts rescan -u  )

-> Wait a good while to let the system re-index all your mailboxes
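
The start order in the steps above (Solr first, then Dovecot) can be scripted so that Dovecot is only restarted once Solr answers. A sketch with several assumptions: the unit names `solr` and `dovecot`, the availability of curl, and the Solr address used in the dovecot.conf of this thread; the retry count is arbitrary.

```sh
#!/bin/sh
# restart_search_stack [URL]: restart Solr, wait until URL responds, then
# restart Dovecot. Gives up after about 30 seconds.
restart_search_stack() {
    url="${1:-http://127.0.0.1:8983/solr/}"
    systemctl restart solr || return 1
    tries=0
    # curl -f makes HTTP errors count as failures; -s keeps it quiet.
    until curl -fs -o /dev/null "$url"; do
        tries=$((tries + 1))
        [ "$tries" -lt 30 ] || return 1
        sleep 1
    done
    systemctl restart dovecot
}

# Usage (as root): restart_search_stack
```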


*- Bugs so far*

-> Line 620 of the fts_solr Dovecot plugin: the size of the header is 
improperly calculated ("huge header" warning for a simple email, which 
kills the indexing of that email, so basically MOST emails, as 
the calculation is wrong)


-> The UID returned by Solr is to be considered as a STRING (and that 
is maybe the source of the "out of bound" errors in 
fts_solr dovecot, as "long" is not enough)


-> Java errors: a lot of nonsense to me, as I am not an expert in Java. 
But with increased memory it no longer seems to crash, even if it complains 
quite a lot in the logs





*---SCHEMA.XML in /opt/solr/server/solr/dovecot/conf*



id
autoGeneratePhraseQueries="true" positionIncrementGap="100">
catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1" 
generateWordParts="1" splitOnNumerics="1" catenateAll="1" 
catenateWords="1" preserveOriginal="1"/>
autoGeneratePhraseQueries="true">
stored="true"/>
stored="true"/>
stored="true"/>
stored="true"/>


*-- DOVECOT.CONF*

mail_plugins = fts fts_solr

plugin {
plugin = fts fts_solr managesieve sieve

fts = solr
fts_autoindex = yes
fts_enforced = yes
fts_solr = url=http://127.0.0.1:8983/solr/dovecot/

# (replace 127.0.0.1 with your Solr server if you want to use an external
# server)

# (...)

}



*-- /etc/systemd/system/multi-user.target.wants/solr.service*

[Unit]
Description=Solr full text search engine
After=network.target

[Service]
Type=simple
User=solr
Group=solr
PrivateTmp=yes
WorkingDirectory=/opt/solr
LimitNOFILE=65000
LimitNPROC=65000
ExecStart=/opt/solr/bin/solr start -f

[Install]
WantedBy=multi-user.target