Re: Solr/Tika patches for dovecot 2.3.21

2024-01-03 Thread John Fawcett

On 03/01/2024 10:16, Aki Tuomi via dovecot wrote:

On 09/12/2023 15:02 EET John Fawcett  wrote:

  
Hi


I've just made updated versions of 3 patches against the latest release
2.3.21 in case they are useful to someone or might get considered for
official inclusion.

John
___

Could you please post these to https://github.com/dovecot/core against main?

Aki
___
dovecot mailing list -- dovecot@dovecot.org
To unsubscribe send an email to dovecot-le...@dovecot.org

Thanks Aki, please see

https://github.com/dovecot/core/pull/215

All three patches are in a single pull request, but each patch has a 
separate commit. I also just noticed that pull request #213 contains a fix 
for the Solr "rows" query parameter that takes a different approach. If you 
intend to merge #213, let me know and I can either remove my patch or 
adjust it, whichever is preferred; the two approaches could also co-exist.


John




Re: Solr/Tika patches for dovecot 2.3.21

2024-01-03 Thread Aki Tuomi via dovecot


> On 09/12/2023 15:02 EET John Fawcett  wrote:
> 
>  
> Hi
> 
> I've just made updated versions of 3 patches against the latest release 
> 2.3.21 in case they are useful to someone or might get considered for 
> official inclusion.
> 
> John
> 
> *dovecot-2.3.21-tika-http-auth.patch*
> 
> Allows a username and password to be specified in the fts_tika setting 
> for basic auth against the Tika server. For example:
> 
> fts_tika = https://user:password@tika_server:443/tika
> 
> *dovecot-2.3.21-solr-max-size.patch*
> 
> This is a simplified version of my previous patch. It sets a size limit 
> (configuration fts_max_size) on message bodies that are to be indexed. 
> Message bodies for messages larger than fts_max_size are not sent to 
> Solr. Defaults to zero, which means no limit. For example:
> 
> fts_max_size = 10M
> 
> *dovecot-2.3.21-solr-max-rows.patch*
> 
> When Dovecot sends a search to Solr, it uses the rows parameter. For 
> multiple-mailbox searches the value used is SOLR_MAX_MULTI_ROWS, 
> hardcoded to 10. For single-mailbox searches the value is uidnext. 
> This patch introduces an upper limit for single mailbox search using the 
> same value as SOLR_MAX_MULTI_ROWS, while leaving the existing 
> functionality of sending the uidnext value if it is smaller. This is 
> just to place a more reasonable upper bound since uidnext can get much 
> larger.
> 
> ___
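For reference, the mechanism the tika-http-auth patch relies on is standard HTTP Basic authentication: the "user:password" part of the URL is base64-encoded and sent in an Authorization header. The C sketch below shows only that transformation. It is not the code from the patch; tika_basic_auth_header() and base64_encode() are hypothetical names, and the sketch assumes the userinfo contains no percent-encoded characters.

```c
#include <stddef.h>
#include <string.h>

static const char b64_alphabet[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Standard base64 (RFC 4648) with '=' padding. dst must have room for
   4 * ((len + 2) / 3) + 1 bytes; the result is NUL-terminated. */
static void base64_encode(const unsigned char *src, size_t len, char *dst)
{
	size_t i;

	for (i = 0; i + 2 < len; i += 3) {
		unsigned int v = ((unsigned int)src[i] << 16) |
				 ((unsigned int)src[i + 1] << 8) | src[i + 2];
		*dst++ = b64_alphabet[(v >> 18) & 0x3f];
		*dst++ = b64_alphabet[(v >> 12) & 0x3f];
		*dst++ = b64_alphabet[(v >> 6) & 0x3f];
		*dst++ = b64_alphabet[v & 0x3f];
	}
	if (i < len) {
		/* one or two leftover bytes */
		unsigned int v = (unsigned int)src[i] << 16;
		if (i + 1 < len)
			v |= (unsigned int)src[i + 1] << 8;
		*dst++ = b64_alphabet[(v >> 18) & 0x3f];
		*dst++ = b64_alphabet[(v >> 12) & 0x3f];
		*dst++ = (i + 1 < len) ? b64_alphabet[(v >> 6) & 0x3f] : '=';
		*dst++ = '=';
	}
	*dst = '\0';
}

/* Hypothetical helper: turns scheme://user:password@host... into the value
   of an Authorization header, "Basic <base64(user:password)>".
   Returns 0 on success, -1 if there is no user:password part or out is
   too small. Percent-encoded characters are not decoded. */
int tika_basic_auth_header(const char *url, char *out, size_t outsize)
{
	const char *start, *at, *slash;
	size_t ulen;

	start = strstr(url, "://");
	if (start == NULL)
		return -1;
	start += 3;
	at = strchr(start, '@');
	slash = strchr(start, '/');
	/* the '@' must be in the authority part, before any path */
	if (at == NULL || (slash != NULL && at > slash))
		return -1;
	ulen = (size_t)(at - start);
	if (memchr(start, ':', ulen) == NULL)
		return -1;
	if (outsize < 6 + 4 * ((ulen + 2) / 3) + 1)
		return -1;
	memcpy(out, "Basic ", 6);
	base64_encode((const unsigned char *)start, ulen, out + 6);
	return 0;
}
```

For fts_tika = https://user:password@tika_server:443/tika this yields "Basic dXNlcjpwYXNzd29yZA==". Base64 is an encoding, not encryption, so using an https:// URL as in the example matters.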

Could you please post these to https://github.com/dovecot/core against main?

Aki
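The fts_max_size rule described in the quoted patch list (0 means no limit; anything larger than the limit is not sent to Solr) can be sketched in C as follows. This is an illustration, not the patch itself: parse_size() and fts_body_should_be_sent() are hypothetical names, and treating k/M/G as multiples of 1024 is an assumption about how a value like "10M" is interpreted, so check the Dovecot documentation for size settings.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser for values such as "10M".
   ASSUMPTION: k/M/G mean multiples of 1024. Overflow is not checked.
   Returns false on malformed input. */
bool parse_size(const char *str, uint64_t *bytes_r)
{
	char *end;
	uint64_t n;

	if (*str < '0' || *str > '9')
		return false;
	n = strtoull(str, &end, 10);
	switch (*end) {
	case '\0':
		break;
	case 'k': case 'K':
		n *= 1024ULL;
		end++;
		break;
	case 'm': case 'M':
		n *= 1024ULL * 1024;
		end++;
		break;
	case 'g': case 'G':
		n *= 1024ULL * 1024 * 1024;
		end++;
		break;
	default:
		return false;
	}
	if (*end != '\0')
		return false;
	*bytes_r = n;
	return true;
}

/* The rule described for fts_max_size: 0 disables the limit; otherwise
   anything larger than the limit is not sent to Solr. Whether "size" is
   the whole message or only the body depends on the actual patch. */
bool fts_body_should_be_sent(uint64_t size, uint64_t max_size)
{
	return max_size == 0 || size <= max_size;
}
```

Under the 1024-based assumption, fts_max_size = 10M becomes 10485760 bytes, so a 20 MB message would be skipped while a 5 MB one would still be indexed.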


Re: solr-jetty package in Debian 11 bullseye

2022-05-25 Thread Shawn Heisey

On 5/25/22 13:30, John Gateley wrote:

I am in the process of upgrading the OS on my mailserver, currently 
Debian 9, to Debian 11.


The solr-jetty package does not appear to exist in Debian 11.


The Solr package that was included in Debian and derivatives was REALLY 
ancient -- Solr 3.6.  Looks like they chose to remove it rather than 
update it.


Current Solr versions have a service installer script in the binary 
download that works on modern Linux distros.  If you need some 
assistance with it, I can help, but it would be off-list as it's not 
really Dovecot related.


Thanks,
Shawn



Re: Solr FTS - message deletes not working as expected

2021-11-19 Thread William Edwards


> On 18 Nov 2021 at 17:15, Shawn Heisey wrote:
> 
> On 11/3/21 11:45 PM, Shawn Heisey wrote:
>> Manual expunges of existing messages also are not sending a delete request 
>> to Solr.  I waited several minutes for that too. 
> 
> Update on this, since you're all on the edge of your seat waiting. :)

Thanks for reporting back to the list.

> 
> I did a Send test with TypeApp, a mail client for Android.  It behaved 
> completely as expected:  Dovecot immediately sent a delete request to Solr 
> for the temporary message in Drafts.
> 
> So I went to Mozilla forums and started a discussion about what I am seeing 
> with Thunderbird.
> 
> I did another test run, sending a message containing unique text.  Doing a 
> manual Solr query for that unique text, I saw one result before sending and 
> two results after sending.  Watching the Solr log, I saw that no delete request was sent.
> 
> Then something completely unexpected:  I needed to reboot the laptop, so I 
> quit Thunderbird, and I hadn't yet closed my ssh session to the mail server, 
> which was still tailing the solr log with a grep for delete messages.  As 
> soon as I did the quit, Solr got a delete request for the Drafts message.  I 
> did another test: I shift-deleted a message and saw that Solr did not receive 
> the delete request.  And then after I waited a while, I quit Thunderbird again.  
> Instantly got a delete request in Solr's log.
> 
> Definitely a Thunderbird problem, not dovecot.  Thanks for your patience, 
> everyone.  I will pursue it with Mozilla now.
> 
> Thanks,
> Shawn
> 
> 
> 



Re: Solr FTS - message deletes not working as expected

2021-11-18 Thread Shawn Heisey

On 11/3/21 11:45 PM, Shawn Heisey wrote:
Manual expunges of existing messages also are not sending a delete 
request to Solr.  I waited several minutes for that too. 


Update on this, since you're all on the edge of your seat waiting. :)

I did a Send test with TypeApp, a mail client for Android.  It behaved 
completely as expected:  Dovecot immediately sent a delete request to 
Solr for the temporary message in Drafts.


So I went to Mozilla forums and started a discussion about what I am 
seeing with Thunderbird.


I did another test run, sending a message containing unique text.  Doing 
a manual Solr query for that unique text, I saw one result before 
sending and two results after sending.  Watching the Solr log, I saw that 
no delete request was sent.


Then something completely unexpected:  I needed to reboot the laptop, so 
I quit Thunderbird, and I hadn't yet closed my ssh session to the mail 
server, which was still tailing the solr log with a grep for delete 
messages.  As soon as I did the quit, Solr got a delete request for the 
Drafts message.  I did another test: I shift-deleted a message and saw that 
Solr did not receive the delete request.  And then after I waited a while, I 
quit Thunderbird again.  Instantly got a delete request in Solr's log.


Definitely a Thunderbird problem, not dovecot.  Thanks for your 
patience, everyone.  I will pursue it with Mozilla now.


Thanks,
Shawn




Re: Solr FTS - message deletes not working as expected

2021-11-04 Thread Shawn Heisey

On 11/3/21 12:38 PM, Michael Slusarz wrote:

Have you tried another client?



I tried Evolution on Linux.

I can't work out how to do an expunge in Evolution.  I'm not sure that 
it CAN do it.  Whenever I delete a message, even with shift-delete, it 
is moved to the Trash folder (with delete and add sent to Solr), and 
then emptying the trash does delete from Solr.  This is exactly how 
Thunderbird works with "normal" deletes.


I use shift-delete in Thunderbird a lot.  Because when I delete a 
message, I really want it gone, I do not want another step to remember 
later -- emptying the trash folder.


The FTS behavior on sending a message with Thunderbird seems completely 
wrong.  When I send a message with Evolution, I do see a delete in the 
solr log, but there are still two results in a query for the special 
text I included in the message, so something is still not quite right.


I couldn't get Windows Mail to connect to my email account, even with 
its advanced settings.  I enter all the right information, and it just 
fails to connect.  There are only "Use SSL" checkboxes, and I have no 
idea what port/security settings it's using.  I can't tell it to use 
STARTTLS instead of mandatory TLS, or to use port 143 for imap and 587 
for submission.  I think that I did see a successful imap login in the 
mail.log from the attempt to add the account, but I had other clients 
running so it is difficult to be sure about that. I will have to close 
all my other clients (which includes an Android client) and try the 
connect again.  When the account setup fails to connect, it doesn't have 
an option to add the account anyway (which Thunderbird does have), so I 
can't edit the account settings.


The Android client doesn't seem to have an expunge option for deleting 
messages either.  It goes through Trash.


Thanks,
Shawn




Re: Solr FTS - message deletes not working as expected

2021-11-03 Thread Shawn Heisey

On 11/3/2021 11:10 PM, Shawn Heisey wrote:
Then I downloaded the source archive from the main site, extracted it, 
and the configure script was included in that.  After compiling, I found 
the .so file and moved it into place, and I will be testing it.


The new library is significantly larger than the one it replaced, 332 KB 
versus 55KB.  I'm guessing it has things like debug symbols that get 
stripped when it's packaged.


I see no change in behavior even with that section of code removed.  I 
included some unique text in the body of the message I wrote and sent, 
and after sending I find both the copy in Drafts and the copy in Sent 
when I do a manual Solr query for that unique text.  I waited several 
minutes ... I didn't hit Send and then immediately do the query.


Manual expunges of existing messages also are not sending a delete 
request to Solr.  I waited several minutes for that too.


Thanks,
Shawn


Re: Solr FTS - message deletes not working as expected

2021-11-03 Thread Shawn Heisey

On 11/3/2021 10:12 PM, Shawn Heisey wrote:
Maybe I can do a custom compile of the source code and replace the 
/usr/lib/dovecot/modules/lib21_fts_solr_plugin.so file with what the 
compile produces.  I'm going to try that, and see if it explodes. :)


Bit of a problem trying to do this.  I pulled the source code from 
github, switched to the release-2.3.17 branch, edited the source code 
file you referenced, and found that the "configure" script that is 
mentioned as the first step in the INSTALL.md file is not present.  It's 
not present in any of the other branches I looked at either.


Then I downloaded the source archive from the main site, extracted it, 
and the configure script was included in that.  After compiling, I found 
the .so file and moved it into place, and I will be testing it.


There should maybe be something in the README that describes what is 
needed to go from 'git clone' to the point where the instructions in 
INSTALL will work.


Thanks,
Shawn


Re: Solr FTS - message deletes not working as expected

2021-11-03 Thread Shawn Heisey

On 11/3/21 1:09 PM, Michael Slusarz wrote:

For Solr, there's a code path in the FTS expunge code that will silently toss 
expunge requests:

    if (ctx->last_indexed_uid == 0 ||
        uid > ctx->last_indexed_uid + 100) {
        /* don't waste time asking Solr to expunge a message that is
           highly unlikely to be indexed at this time. */
        return;
    }

So it's possible you are running into that.


Interesting.  I don't know dovecot code well enough to figure out if 
maybe *every* expunge that happens is classified by that code as 
"unlikely to be indexed at this time."  For the situations I have tried, 
I know that the message HAS been indexed and should be deleted from the 
index.


I was going to try removing that whole code construct and running it to 
see what happens, but it looks like the dovecot apt repo does not have 
source packages.  When I add a deb-src line to the dovecot repo config, 
then do "apt update" (which works without complaint) followed by 
"apt-get source dovecot-solr" it pulls source packages from the Ubuntu 
repo, not the dovecot repo, so the source I end up with is for version 
2.3.7, not the 2.3.17 that I am running.


Maybe I can do a custom compile of the source code and replace the 
/usr/lib/dovecot/modules/lib21_fts_solr_plugin.so file with what the 
compile produces.  I'm going to try that, and see if it explodes. :)


Thanks,
Shawn



Re: Solr FTS - message deletes not working as expected

2021-11-03 Thread Michael Slusarz
> On 11/03/2021 12:56 PM Shawn Heisey  wrote:
> 
> > Thunderbird does NOT necessarily process expunges immediately.  Depends on 
> > what else it is doing in the background.  So you can't click delete in the 
> > UI and not immediately see anything on the backend and definitively 
> > correlate the two.
> 
> The message is deleted by dovecot immediately.  I double checked this by 
> purging a message on my Linux client and saw the message immediately 
> disappear on my Windows client.  It happened even faster than I would 
> have expected.  IMAP seems to be a very good protocol!  I can't say it 
> definitively without more evidence, but the problem *seems* to be in 
> FTS, not imap or core dovecot.

For Solr, there's a code path in the FTS expunge code that will silently toss 
expunge requests:

    if (ctx->last_indexed_uid == 0 ||
        uid > ctx->last_indexed_uid + 100) {
        /* don't waste time asking Solr to expunge a message that is
           highly unlikely to be indexed at this time. */
        return;
    }

So it's possible you are running into that.
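To make the quoted condition easier to reason about, here it is restated as a standalone C function. ctx->last_indexed_uid becomes a plain parameter and the name solr_expunge_is_skipped() is hypothetical; how the real plugin fills in last_indexed_uid is not shown in this thread, so this only illustrates the arithmetic of the check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical standalone version of the quoted check. Returns true when
   the expunge would be silently dropped instead of being sent to Solr. */
bool solr_expunge_is_skipped(uint32_t last_indexed_uid, uint32_t uid)
{
	/* while last_indexed_uid is 0, every expunge is skipped */
	if (last_indexed_uid == 0)
		return true;
	/* UIDs more than 100 past the last indexed UID are skipped too */
	return uid > last_indexed_uid + 100;
}
```

Two cases are skipped: any expunge at all while last_indexed_uid is 0, and any UID more than 100 above the last indexed UID. So if last_indexed_uid happened to be 0 when an expunge ran, that delete would be dropped whatever the UID.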

michael


Re: Solr FTS - message deletes not working as expected

2021-11-03 Thread Shawn Heisey

On 11/3/21 12:38 PM, Michael Slusarz wrote:

Have you tried another client?


I only have two clients configured:  Thunderbird 78.13.0 on Linux and 
Thunderbird 91.2.1 on Windows.  Behavior is the same on both.


I will see if I can get another client configured.  Windows Mail is 
included on Windows 10, so I can try that.  I have a client system 
running Ubuntu, and I bet that there are a LOT of mail clients available 
there.



Thunderbird does NOT necessarily process expunges immediately.  Depends on what 
else it is doing in the background.  So you can't click delete in the UI and 
not immediately see anything on the backend and definitively correlate the two.


The message is deleted by dovecot immediately.  I double checked this by 
purging a message on my Linux client and saw the message immediately 
disappear on my Windows client.  It happened even faster than I would 
have expected.  IMAP seems to be a very good protocol!  I can't say it 
definitively without more evidence, but the problem *seems* to be in 
FTS, not imap or core dovecot.



Another option is to ensure debug logging is enabled for Dovecot so you can see what the FTS code 
is doing.  "log_debug = category=fts" and/or "mail_debug = yes" will help in 
that regard.



When I have some real time available, I will look into debug logging.  
I'm going to have to research exactly where to put those config options.


Thanks,
Shawn




Re: Solr FTS - message deletes not working as expected

2021-11-03 Thread Michael Slusarz
Have you tried another client?

Thunderbird does NOT necessarily process expunges immediately.  Depends on what 
else it is doing in the background.  So you can't click delete in the UI and 
not immediately see anything on the backend and definitively correlate the two.

Another option is to ensure debug logging is enabled for Dovecot so you can see 
what the FTS code is doing.  "log_debug = category=fts" and/or "mail_debug = 
yes" will help in that regard.
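For anyone wondering where these settings go: both are top-level (global) settings, not plugin {} settings, so a fragment like the one below in dovecot.conf, or in conf.d/10-logging.conf on an install that uses the default conf.d split, should work. The file name is an assumption based on that default layout; "doveconf -n" shows whether the values were picked up.

```
# conf.d/10-logging.conf (or directly in dovecot.conf)
mail_debug = yes
log_debug = category=fts
```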

michael


> On 11/02/2021 9:16 AM Shawn Heisey  wrote:
> 
>  
> On 10/28/21 8:00 AM, Shawn Heisey wrote:
> > Also, when I send a message with Thunderbird, which deletes the 
> > message in Drafts and adds one to Sent, I am not seeing a delete 
> > request in the Solr log.  I do see the add. So this isn't limited to 
> > just the shift-delete workflow. 
> 
> 
> I have confirmed this with multiple attempts.
> 
> I start a new message in Thunderbird.  Then I wait around for that 
> message to be auto-saved to Drafts.  When that happens, I see an "add" 
> request in the solr log.
> 
> Then I send the message.  At that point, I see another add in Solr's 
> log.  Based on the message number in the add request, I know that this 
> time the add happens in the Sent folder.  But despite the fact that 
> Thunderbird deletes the message from Drafts, Solr never sees a delete 
> request.  My dovecot version has been updated since the last time I 
> indicated what version it is.  Now it is "2:2.3.17-3+ubuntu20.04" from 
> the dovecot repo, not the ubuntu repo.  The same thing happened with 2.3.16.
> 
> As I mentioned in the first message on this thread, when shift-delete is 
> used in Thunderbird to delete messages, that also never sends a delete 
> to Solr.
> 
> Something somewhere is misbehaving.  Is it Thunderbird accessing IMAP 
> incorrectly, or is it Dovecot?
> 
> I will do a packet capture to see if maybe dovecot is sending requests 
> that are not logged by Solr.  That seems unlikely -- even bad requests 
> should result in some kind of entry in the solr log.
> 
> Thanks,
> Shawn


Re: Solr FTS - message deletes not working as expected

2021-11-02 Thread Shawn Heisey

On 10/28/21 8:00 AM, Shawn Heisey wrote:
Also, when I send a message with Thunderbird, which deletes the 
message in Drafts and adds one to Sent, I am not seeing a delete 
request in the Solr log.  I do see the add. So this isn't limited to 
just the shift-delete workflow. 



I have confirmed this with multiple attempts.

I start a new message in Thunderbird.  Then I wait around for that 
message to be auto-saved to Drafts.  When that happens, I see an "add" 
request in the solr log.


Then I send the message.  At that point, I see another add in Solr's 
log.  Based on the message number in the add request, I know that this 
time the add happens in the Sent folder.  But despite the fact that 
Thunderbird deletes the message from Drafts, Solr never sees a delete 
request.  My dovecot version has been updated since the last time I 
indicated what version it is.  Now it is "2:2.3.17-3+ubuntu20.04" from 
the dovecot repo, not the ubuntu repo.  The same thing happened with 2.3.16.


As I mentioned in the first message on this thread, when shift-delete is 
used in Thunderbird to delete messages, that also never sends a delete 
to Solr.


Something somewhere is misbehaving.  Is it Thunderbird accessing IMAP 
incorrectly, or is it Dovecot?


I will do a packet capture to see if maybe dovecot is sending requests 
that are not logged by Solr.  That seems unlikely -- even bad requests 
should result in some kind of entry in the solr log.


Thanks,
Shawn




Re: Solr FTS - message deletes not working as expected

2021-10-28 Thread Shawn Heisey

On 10/26/21 12:18 PM, Shawn Heisey wrote:
But if I use shift-delete in Thunderbird, which deletes the message 
immediately without going through Trash, things are different. 



Also, when I send a message with Thunderbird, which deletes the message 
in Drafts and adds one to Sent, I am not seeing a delete request in the 
Solr log.  I do see the add.  So this isn't limited to just the 
shift-delete workflow.


Thanks,
Shawn




Re: Solr FTS - when does indexing happen?

2021-09-05 Thread Steve Dondley




Since most people will want fts_autoindex, the wiki page should
include it in its example configuration that goes into 90-plugin.conf.
 Possibly better ... maybe it should default to "yes".


It's probably a safe bet that the developers, who are experts on these 
systems, have good reasons not to make autoindexing the default.


Re: Solr FTS - when does indexing happen?

2021-09-04 Thread Shawn Heisey

On 9/4/2021 4:52 PM, Shawn Heisey wrote:
I see something talking about autoindex, but it does not have an example 
so that I can see where it needs to go.  I cannot work it out from what 
is there.


With a little googling, I was able to figure out where it needs to go. 
And now it acts like I was expecting.


Deletes are an interesting thing with autoindex.  If I use the "Del" key 
in Thunderbird (which moves the message to the Trash), I see an 
immediate delete (from the original folder) and add (to the Trash 
folder) in Solr's log.  And if I choose the "Empty Trash" option, I see 
those deletes in Solr's log immediately.


But if I press Shift-Del in Thunderbird (which immediately deletes the 
message, bypassing Trash), then it takes about 15 seconds before the 
Solr log shows the delete request.  Is that expected?  It's not causing 
me any problems, as it's highly unlikely that I'm going to do a query 
matching a message that I deleted ten seconds ago.  I can stand to wait 
15 seconds for the index to be updated.


Dovecot version is 2:2.3.16-2+ubuntu20.04, pulled from the Dovecot 
repository.


I have been doing some fiddling with the solrconfig and schema.  I have 
more fields stored now -- added from, to, and subject.  I couldn't tell 
what the matching messages were when accessing Solr directly.


I also implemented TrimFieldUpdateProcessorFactory which trims leading 
and trailing whitespace from fields before they are indexed.  I happened 
to notice that some of the new stored fields I added had EOL characters 
in them (not sure if it was \n or \r\n).
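For reference, a solrconfig.xml update chain using TrimFieldUpdateProcessorFactory might look like the fragment below. This is a sketch of a typical setup, not Shawn's actual config: the chain name is arbitrary, default="true" makes it apply to all updates, with no field selectors the processor applies to all fields, and the final RunUpdateProcessorFactory is what actually writes the document to the index.

```
<updateRequestProcessorChain name="trim-fields" default="true">
  <processor class="solr.TrimFieldUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```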


IMHO, a rather glaring omission from the fields in Solr is a 
timestamp/date field.  Does dovecot's FTS have the ability to send that 
data?  I know that Dovecot might not use it, but it would be a very 
useful thing to have for querying the dovecot index from something other 
than dovecot.  Not something I *NEED*, just nice to have.  I haven't 
looked at the fts or fts_solr code.


Thanks,
Shawn


Re: Solr FTS - when does indexing happen?

2021-09-04 Thread Shawn Heisey

On 9/4/2021 4:06 PM, Steve Dondley wrote:
As I recall, indexing an email is triggered immediately when an email is 
received if your Dovecot settings are set properly to trigger the 
indexing. The Dovecot FTS documentation spells it out.


See 
https://doc.dovecot.org/configuration_manual/fts/solr/?highlight=fts%20user%20plugin 


There is an autoindex setting that needs to be set to "yes".


I see something talking about autoindex, but it does not have an example 
so that I can see where it needs to go.  I cannot work it out from what 
is there.


With a little googling, I was able to figure out where it needs to go. 
And now it acts like I was expecting.


Since most people will want fts_autoindex, the wiki page should include 
it in its example configuration that goes into 90-plugin.conf.  Possibly 
better ... maybe it should default to "yes".
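Since finding where the setting goes was the stumbling block, here is one way the pieces can fit together on a 2.3 install. The Solr URL and core name are placeholders, and the conf.d file names assume the default layout. The fts and fts_solr plugins also need to be loaded for the delivery path (the LDA here), not only for IMAP, for indexing on delivery to happen; loading them in the global mail_plugins should cover that unless a protocol block overrides it.

```
# conf.d/10-mail.conf: load the FTS plugins globally (so the LDA gets them too)
mail_plugins = $mail_plugins fts fts_solr

# conf.d/90-plugin.conf
plugin {
  fts = solr
  fts_solr = url=http://solr.example.com:8983/solr/dovecot/
  fts_autoindex = yes
}
```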


Thanks,
Shawn


Re: Solr FTS - when does indexing happen?

2021-09-04 Thread Steve Dondley

On 2021-09-03 12:43 PM, Shawn Heisey wrote:

I have Solr FTS on my dovecot install.  I followed the instructions on
the dovecot wiki.

How long a delay should I expect to see between new mail being
delivered with the dovecot LDA and an indexing request sent to Solr? 
Because I get a LOT of email from various mailing lists, and I do not
see any activity in Solr's log.  When I did doveadm index -A -q '*'
there was a lot of indexing activity in Solr's log, as expected.

One time I looked at the Solr index and it had been 23 hours since
its last update ... I can guarantee that I received a lot of new
messages in that time.

What do I need to look at for further troubleshooting?

I can confirm that when I issue a search in the TypeApp app on my
phone (an IMAP app for android), I see the query in Solr's logfile.

Thanks,
Shawn


DISCLAIMER: I've only set up solr once with dovecot so take these words 
with a grain of salt.


As I recall, indexing an email is triggered immediately when an email is 
received if your Dovecot settings are set properly to trigger the 
indexing. The Dovecot FTS documentation spells it out.


See 
https://doc.dovecot.org/configuration_manual/fts/solr/?highlight=fts%20user%20plugin


There is an autoindex setting that needs to be set to "yes".


Re: Solr and FTS - assertion failure [proposed patch for upper bound on rows in solr search]

2020-12-31 Thread John Fawcett
On 30/12/2020 16:04, Antonino Esposito wrote:
> Hi,
>
> in the latest weeks i'm working on the Solr integration and
> immediately i've faced the assertion failure errors, on 2.0.19, 2.2.9
> and 2.3.11.3 servers in our network.
> Reading the thread on debian ML, I realize this issue is related to
> nested MIME and it affects large mailboxes
>
> In my case, the error in dovecot.log pairs with the following on
> solr.log and it seems the rows value has the same value of the last
> UID recorded in the mailbox. 
>
> For your reference, here is the Solr logs, where *2276996170* is the
> value passed by Dovecot as rows number and it clearly don't fit with
> the rows data type.
>
> Have you had experienced the same behaviour? Is there a workaround?
> Thanks
> Antonino

Whatever the reason for this happening, it would make sense not to
supply unbounded values to solr.

The "rows" value that is passed for a lookup on a single mailbox is the
value of the uidnext for the searched mailbox. For lookups on multiple
mailboxes there is a hard coded value:

#define SOLR_MAX_MULTI_ROWS 10

If 10 is a good maximum for lookups on multiple mailboxes it could
also be a good upper bound for lookups on single mailboxes too.

My proposed patch, which stops overly large "rows" values from being sent
to Solr, is as follows. This doesn't solve the issue of why uidnext is so large in
the first place for the specific mailbox. Nevertheless I think it makes
sense both as a potential workaround to the original issue and to
incorporate it as a safeguard. If the hard-coded value is too limiting,
it could be made configurable.

diff -ur dovecot-2.3.11.3-orig/src/plugins/fts-solr/fts-backend-solr.c dovecot-2.3.11.3/src/plugins/fts-solr/fts-backend-solr.c
--- dovecot-2.3.11.3-orig/src/plugins/fts-solr/fts-backend-solr.c	2020-08-12 14:20:41.0 +0200
+++ dovecot-2.3.11.3/src/plugins/fts-solr/fts-backend-solr.c	2020-12-31 09:05:07.681897716 +0100
@@ -838,7 +838,7 @@
 
 	str = t_str_new(256);
 	str_printfa(str, "wt=xml&fl=uid,score&rows=%u&sort=uid+asc&q=%%7b!lucene+q.op%%3dAND%%7d",
-		    status.uidnext);
+		    I_MIN(status.uidnext,SOLR_MAX_MULTI_ROWS));
 	prefix_len = str_len(str);
 
 	if (solr_add_definite_query_args(str, args, and_args)) {
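The arithmetic behind the patch can be checked in isolation. The sketch below restates the I_MIN() clamp as a plain C function and adds a check against Java's int range: Solr parses the rows parameter as a Java int (maximum 2147483647), which is why a value such as 2276996170 is rejected. The function names are hypothetical, and the cap is a parameter so the sketch does not depend on the actual value of SOLR_MAX_MULTI_ROWS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical restatement of I_MIN(status.uidnext, SOLR_MAX_MULTI_ROWS):
   send uidnext unless it exceeds the cap. */
uint32_t solr_rows_for_mailbox(uint32_t uidnext, uint32_t cap)
{
	return uidnext < cap ? uidnext : cap;
}

/* Solr parses "rows" as a signed 32-bit Java int, so any value above
   INT32_MAX (2147483647) cannot be represented. */
bool solr_rows_fits_java_int(uint32_t rows)
{
	return rows <= (uint32_t)INT32_MAX;
}
```

With any cap below INT32_MAX, the clamped value always fits, which is the safeguard the patch is after; it does not explain why uidnext got that large in the first place.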

John




Re: Solr and FTS - assertion failure

2020-12-30 Thread John Fawcett
On 30/12/2020 16:04, Antonino Esposito wrote:
> Hi,
>
> in the latest weeks i'm working on the Solr integration and
> immediately i've faced the assertion failure errors, on 2.0.19, 2.2.9
> and 2.3.11.3 servers in our network.
> Reading the thread on debian ML, I realize this issue is related to
> nested MIME and it affects large mailboxes
>
> In my case, the error in dovecot.log pairs with the following on
> solr.log and it seems the rows value has the same value of the last
> UID recorded in the mailbox. 
>
> For your reference, here is the Solr logs, where *2276996170* is the
> value passed by Dovecot as rows number and it clearly don't fit with
> the rows data type.
>
> Have you had experienced the same behaviour? Is there a workaround?
> Thanks
> Antonino
>
Hi Antonino

out of curiosity, what does the dovecot log show for this issue?

John



Re: solr and dovecot 2.2.36

2020-08-18 Thread Alessio Cecchi

Hi Maciej,

version 6.6.x works fine, and 7.7.x probably also works with the schema from 
Dovecot 2.3.


Ciao

On 18/08/20 14:00, Maciej Milaszewski wrote:

Hi
I have dovecot-2.2.36.4 (director) + 5 nodes dovecot (dovecot-2.2.36.4)

What version of Solr do you recommend ?


--
Alessio Cecchi
Postmaster @ http://www.qboxmail.it
https://www.linkedin.com/in/alessice



Re: solr and dovecot 2.2.36

2020-08-18 Thread Maciej Milaszewski
Hi
I tested solr-8.6.0 but did not find a schema for 2.2.x. With version
6.6.x it works fine.


On 18.08.2020 14:59, Alessio Cecchi wrote:
>
> Hi Maciej,
>
> version 6.6.x works fine, and 7.7.x probably also works with the schema from
> Dovecot 2.3.
>
> Ciao
>
> On 18/08/20 14:00, Maciej Milaszewski wrote:
>> Hi
>> I have dovecot-2.2.36.4 (director) + 5 nodes dovecot (dovecot-2.2.36.4)
>>
>> What version of Solr do you recommend ?
>>
> -- 
> Alessio Cecchi
> Postmaster @ http://www.qboxmail.it
> https://www.linkedin.com/in/alessice



Re: solr and dovecot 2.2.36

2020-08-18 Thread Thomas Zajic
* Maciej Milaszewski, 18.08.20 14:00

> I have dovecot-2.2.36.4 (director) + 5 nodes dovecot (dovecot-2.2.36.4)
> What version of Solr do you recommend ?

Don't know about 2.2.36.4, but for 2.3.11.3 both solr-7.7.3
and solr-8.6.0 appear to work fine. I'm only running a small
setup with a handful of users, though, so YMMV.

HTH nevertheless,
Thomas


Re: solr and dovecot-2.2.36.4

2020-07-20 Thread Shawn Heisey

On 7/17/2020 3:23 AM, Maciej Milaszewski wrote:

I tried schema.xml and solrconfig.xml from a working solr-6.6.5 (dovecot) and got:
"dovecot:
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Error initializing QueryElevationComponent"


That looks like a Solr error.  This list is not really the right place 
to get help on it ... but I have a lot of general experience with Solr, 
so I can try.


The error will be a LOT longer than this, probably dozens of lines that 
show a complete java stacktrace.  I will need that additional detail to 
make any determination about what went wrong.


You can generally find the additional detail in the solr.log file or one 
of its rotated cousins.  Exactly where this log file lives will vary 
depending on how you installed Solr.


If you are looking at the error on the "Logging" tab of the Solr admin 
UI, you can open the additional detail by clicking the little "i" icon 
on the log entry.  Be aware that it will close again VERY quickly -- so 
looking at the log file is usually a better option.


When I look at the solrconfig and schema available on the wiki page you 
linked, I do not find the word "elevation" in either file. 
QueryElevationComponent should not be necessary for a dovecot index.  I 
am guessing that your config contains some reference to it, and for a 
reason that I do not know at this time, initialization is failing.


Based on the limited information available, this would be my suggested 
course of action:  Wipe and rebuild the index using the solrconfig and 
schema found on the wiki page (which contain no mention of the elevation 
component), and then ask dovecot to force a full reindex.


It is *VERY* possible that my suggested course of action will change if 
you provide additional information.


Thanks,
Shawn


Re: solr fts and removing accounts

2020-01-19 Thread azurit



Quoting Shawn Heisey:


On 1/19/2020 3:31 AM, Wojciech Puchar wrote:
I use Solr FTS indexing. It works very well, but it has one  
database per system, not per user.


Let's suppose I delete one or more e-mail users on the system.

How do I remove them from the Solr database to reclaim space?


I cannot say whether there is anything that you can do from the  
Dovecot side, but I can explain how to delete them from the Solr  
side if you like.  The end result would likely be the same either way.


I'm not sure that doing so is worthwhile.  Deleting data from a Solr  
index just marks it as deleted.  Until the index segments are  
merged, space will not be reclaimed.  And it is entirely possible  
that the segment in question might never get merged during normal  
operation.


It is possible to force the merging ... the operation is called  
"optimize" by Solr.  But it is a heavyweight operation, not one that  
should be done frequently, and in general the Solr project doesn't  
recommend doing it at all.


Deleting one user's data, even if that user has a large amount of  
email and the delete is followed by an optimize, is not likely to  
make much of a change in the size of the inverted index, just due to  
how it works. Looking over the schema for fts_solr, the large fields  
are not stored, so there will not be all that much stored data.   
Stored data in the index is compressed as of Solr 4.1, which makes  
it even smaller.


Thanks,
Shawn



We delete all of a user's data after the user is removed. It can easily be  
done with this HTTP GET request to Solr:


/solr-url/update?stream.body=user:n...@domain.tld=false
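An equivalent delete can also be sent as a JSON body to Solr's update handler. This avoids `stream.body`, which recent Solr versions disable by default. The sketch below assumes that the Dovecot schema's `user` field holds the login name exactly as Dovecot indexed it. `SOLR_URL` is a placeholder for your core's URL.

```shell
# Base URL of the Dovecot core (placeholder; adjust to your install).
SOLR_URL=${SOLR_URL:-http://127.0.0.1:8983/solr/dovecot}

# Build a JSON delete-by-query body matching every document of one user.
# Double quotes inside the user name are not escaped in this sketch.
solr_delete_user_body() {
    printf '{"delete":{"query":"user:\\"%s\\""}}' "$1"
}

# Send the delete and commit it. Documents are only marked deleted;
# disk space comes back when Lucene later merges the affected segments.
solr_delete_user() {
    curl -fsS -H 'Content-Type: application/json' \
        --data-binary "$(solr_delete_user_body "$1")" \
        "$SOLR_URL/update?commit=true"
}
```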




Re: solr fts and removing accounts

2020-01-19 Thread Shawn Heisey

On 1/19/2020 3:31 AM, Wojciech Puchar wrote:
I use Solr FTS indexing. It works very well, but it has one database per 
system, not per user.


Let's suppose I delete one or more e-mail users from the system.

How do I remove them from the Solr database to reclaim space?


I cannot say whether there is anything that you can do from the Dovecot 
side, but I can explain how to delete them from the Solr side if you 
like.  The end result would likely be the same either way.


I'm not sure that doing so is worthwhile.  Deleting data from a Solr 
index just marks it as deleted.  Until the index segments are merged, 
space will not be reclaimed.  And it is entirely possible that the 
segment in question might never get merged during normal operation.


It is possible to force the merging ... the operation is called 
"optimize" by Solr.  But it is a heavyweight operation, not one that 
should be done frequently, and in general the Solr project doesn't 
recommend doing it at all.
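Before deleting or optimizing, you can check how much of the index consists of documents that are deleted but not yet merged away. Solr's Luke handler (`/admin/luke` on a core) reports `numDocs` and `maxDoc`, and the difference between them is the number of deleted documents still taking up space. The sketch below only does the arithmetic. The core URL in the comment is a placeholder.

```shell
# Percentage of documents in the index that are deleted but not yet merged
# away: (maxDoc - numDocs) / maxDoc * 100, printed with one decimal place.
# Usage: deleted_pct NUM_DOCS MAX_DOC
deleted_pct() {
    awk -v n="$1" -v m="$2" 'BEGIN {
        if (m == 0) { print "0.0"; exit }
        printf "%.1f\n", (m - n) * 100 / m
    }'
}

# Fetch index statistics (numDocs, maxDoc, deletedDocs) for a core.
# Usage: solr_index_stats http://127.0.0.1:8983/solr/dovecot
solr_index_stats() {
    curl -fsS "$1/admin/luke?numTerms=0&wt=json"
}
```

A low percentage supports the point above: forcing a merge would reclaim little space.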


Deleting one user's data, even if that user has a large amount of email 
and the delete is followed by an optimize, is not likely to make much of 
a change in the size of the inverted index, just due to how it works. 
Looking over the schema for fts_solr, the large fields are not stored, 
so there will not be all that much stored data.  Stored data in the 
index is compressed as of Solr 4.1, which makes it even smaller.


Thanks,
Shawn


Re: Solr commit and optimize: which user?

2019-12-15 Thread John Fawcett
On 15/12/2019 17:13, John Gateley wrote:
> Hi,
>
> I have Solr FTS working with dovecot, and need to do the commit and
> optimize recommended here:
> https://wiki.dovecot.org/Plugins/FTS/Solr
>
> Should this run as root, or as each mail user? The documentation above
> implies, but doesn't state, that it is root. The docs here are similar:
> http://grimore.org/networking/dovecot/full_text_search
>
> But the docs here say it has to be done for each user:
> http://www.unixsamurai.com/dovecot-full-text-search-jetty-solr/
>
> I don't know Solr well enough to answer. Which is correct?
>
> Thanks
>
> John
>
John

The optimize and commit commands don't specify any mail user:

# Optimize should be run somewhat rarely, e.g. once a day
curl "https://<host>:<port>/solr/dovecot/update?optimize=true"
# Commit should be run pretty often, e.g. every minute
curl "https://<host>:<port>/solr/dovecot/update?commit=true"

It doesn't actually matter which cron user you run them under, so long as that 
user can execute the commands successfully. The idea (if you need them) is to 
run them globally under a single user. 
If you schedule them under more than one cron user, you are just running the same 
commands more times, not doing anything specific per user.
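Because the commands are global, one wrapper called from a single crontab is enough. Below is a minimal sketch. The function names are hypothetical, and the core URL passed in is whatever your Solr install uses.

```shell
# Build the Solr update URL for a maintenance action on one core.
# Only "commit" and "optimize" are accepted.
# Usage: solr_update_url CORE_URL ACTION
solr_update_url() {
    case "$2" in
        commit|optimize) printf '%s/update?%s=true\n' "${1%/}" "$2" ;;
        *) echo "unknown action: $2" >&2; return 1 ;;
    esac
}

# Run the action; -fsS makes curl exit non-zero and print an error
# on HTTP failures, so cron reports them.
solr_maintain() {
    url=$(solr_update_url "$1" "$2") || return 1
    curl -fsS "$url" >/dev/null
}
```

A script that calls `solr_maintain` could then appear in one crontab twice: once every minute with `commit`, and once a day with `optimize`.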

John

0 1 * * * curl http://127.0.0.1:8080/solr/update?optimize=true
2 * * * * curl http://127.0.0.1:8080/solr/update?commit=true

Copyright © UnixSamurai.com Read more at:
http://www.unixsamurai.com/dovecot-full-text-search-jetty-solr/


Re: Solr for Dovecot on Debian 9, package has correct binary?

2019-12-10 Thread Christian Kivalo via dovecot



>If I want to use Solr, do I have to build Dovecot myself? I'd prefer to
>use the Debian package (Debian 9 for now).
No. Install solr support with

apt-get install dovecot-solr
-- 
Christian Kivalo


Re: Solr, Dovecot & macOS / iOS

2019-08-13 Thread @lbutlr via dovecot
On 13 Aug 19, at 05:58 , James Brown  wrote:
> 
> b) do Mail.app and other mail clients on Macs or iOS devices perform 
> searches on their local copy of mail, or do they just send a search request to 
> the server?

Mail.app uses Spotlight on the local data, so if your users are all on Macs, then 
Solr is pointless.



Re: Solr, Dovecot & macOS / iOS

2019-08-13 Thread Jean-Daniel via dovecot


> On 13 Aug 2019 at 14:53, Sami Ketola wrote:
> 
> 
> 
>> On 13 Aug 2019, at 15.37, Jean-Daniel via dovecot wrote:
>> 
>> 
>> 
>>> On 13 Aug 2019 at 14:16, Sami Ketola via dovecot wrote:
>>> 
>>> 
>>> 
>>>> On 13 Aug 2019, at 14.58, James Brown via dovecot wrote:
>>>> 
>>>> I’m thinking of getting Solr working with my Dovecot server. Server is new 
>>>> 6-core Mac Mini, mail store of over 1/2 TB. Mailboxes with 100s of 
>>>> thousands of messages.
>>>> 
>>>> But I’m not sure if:
>>>> 
>>>> a) it will make enough of a difference and
>>> 
>>> Choose mailbox format wisely. sdbox preferred unless HFS+ has problems with 
>>> 100s of thousands of small files in same directory. If so, then use mdbox 
>>> with periodic purges.
>>> 
 
>>>> b) do Mail.app and other mail clients on Macs or iOS devices perform 
>>>> searches on their local copy of mail, or do they just send a search request 
>>>> to the server?
>>> 
>>> None of the apple devices use IMAP SEARCH. They ALL maintain and use their 
>>> own local search database on the device. Also they seem to refresh the 
>>> database every now and then redownloading all emails.
>> 
>> Do you have a source for that? My experience is that without server search 
>> support, iOS is very slow at returning results. Moreover, it keeps only the latest 
>> messages and never downloads a message until you read it.
> 
> I'm an Apple device user myself. I have a couple of iPhones, a couple of iPads 
> and a couple of MacBooks, and Mail.app on none of them uses IMAP SEARCH.
> And I cannot find any configuration option to enable it. Only the Spotlight index 
> is used. On macOS, Mail.app seems to store the indexed data in:
> 
> samik@samikworkmac:~>ls -1 Library/Mail/V6/MailData/Envelope\ Index*
> Library/Mail/V6/MailData/Envelope Index
> Library/Mail/V6/MailData/Envelope Index-shm
> Library/Mail/V6/MailData/Envelope Index-wal
> 
> If those files are removed, or Spotlight search for mail is disabled, Mail.app 
> can't find anything anymore. It does not fall back to IMAP SEARCH.
> 

My question was more about iOS. I know that macOS Mail does not rely in any way 
on remote indexing and has its own local index, but as it also stores all 
messages locally, that is an easy requirement. For iOS, which only downloads message 
metadata by default, I was not so sure. 

I access my mail server using Apple devices only, and I see some IMAP SEARCH 
requests in the Dovecot stats, but I can’t figure out where they come from. So 
you may be right.




Re: Solr, Dovecot & macOS / iOS

2019-08-13 Thread Sami Ketola via dovecot


> On 13 Aug 2019, at 15.37, Jean-Daniel via dovecot  wrote:
> 
> 
> 
>> On 13 Aug 2019 at 14:16, Sami Ketola via dovecot wrote:
>> 
>> 
>> 
>>> On 13 Aug 2019, at 14.58, James Brown via dovecot  
>>> wrote:
>>> 
>>> I’m thinking of getting Solr working with my Dovecot server. Server is new 
>>> 6-core Mac Mini, mail store of over 1/2 TB. Mailboxes with 100s of 
>>> thousands of messages.
>>> 
>>> But I’m not sure if:
>>> 
>>> a) it will make enough of a difference and
>> 
>> Choose mailbox format wisely. sdbox preferred unless HFS+ has problems with 
>> 100s of thousands of small files in same directory. If so, then use mdbox 
>> with periodic purges.
>> 
>>> 
>>> b) do Mail.app and other mail clients on Macs or iOS devices perform 
>>> searches on their local copy of mail, or do they just send a search request 
>>> to the server?
>> 
>> None of the apple devices use IMAP SEARCH. They ALL maintain and use their 
>> own local search database on the device. Also they seem to refresh the 
>> database every now and then redownloading all emails.
> 
> Do you have a source for that? My experience is that without server search 
> support, iOS is very slow at returning results. Moreover, it keeps only the latest 
> messages and never downloads a message until you read it.

I'm an Apple device user myself. I have a couple of iPhones, a couple of iPads 
and a couple of MacBooks, and Mail.app on none of them uses IMAP SEARCH.
And I cannot find any configuration option to enable it. Only the Spotlight index 
is used. On macOS, Mail.app seems to store the indexed data in:

samik@samikworkmac:~>ls -1 Library/Mail/V6/MailData/Envelope\ Index*
Library/Mail/V6/MailData/Envelope Index
Library/Mail/V6/MailData/Envelope Index-shm
Library/Mail/V6/MailData/Envelope Index-wal

If those files are removed, or Spotlight search for mail is disabled, Mail.app 
can't find anything anymore. It does not fall back to IMAP SEARCH.

Sami



Re: Solr, Dovecot & macOS / iOS

2019-08-13 Thread Jean-Daniel via dovecot



> On 13 Aug 2019 at 14:16, Sami Ketola via dovecot wrote:
> 
> 
> 
>> On 13 Aug 2019, at 14.58, James Brown via dovecot  
>> wrote:
>> 
>> I’m thinking of getting Solr working with my Dovecot server. Server is new 
>> 6-core Mac Mini, mail store of over 1/2 TB. Mailboxes with 100s of thousands 
>> of messages.
>> 
>> But I’m not sure if:
>> 
>> a) it will make enough of a difference and
> 
> Choose mailbox format wisely. sdbox preferred unless HFS+ has problems with 
> 100s of thousands of small files in same directory. If so, then use mdbox 
> with periodic purges.
> 
>> 
>> b) do Mail.app and other mail clients on Macs or iOS devices perform 
>> searches on their local copy of mail, or do they just send a search request 
>> to the server?
> 
> None of the apple devices use IMAP SEARCH. They ALL maintain and use their 
> own local search database on the device. Also they seem to refresh the 
> database every now and then redownloading all emails.

Do you have a source for that? My experience is that without server search 
support, iOS is very slow at returning results. Moreover, it keeps only the latest 
messages and never downloads a message until you read it.



Re: Solr, Dovecot & macOS / iOS

2019-08-13 Thread Sami Ketola via dovecot



> On 13 Aug 2019, at 14.58, James Brown via dovecot  wrote:
> 
> I’m thinking of getting Solr working with my Dovecot server. Server is new 
> 6-core Mac Mini, mail store of over 1/2 TB. Mailboxes with 100s of thousands 
> of messages.
> 
> But I’m not sure if:
> 
> a) it will make enough of a difference and

Choose mailbox format wisely. sdbox preferred unless HFS+ has problems with 
100s of thousands of small files in same directory. If so, then use mdbox with 
periodic purges.

> 
> b) do Mail.app and other mail clients on Macs or iOS devices perform 
> searches on their local copy of mail, or do they just send a search request to 
> the server?

None of the apple devices use IMAP SEARCH. They ALL maintain and use their own 
local search database on the device. Also they seem to refresh the database 
every now and then redownloading all emails.

> 
> I’m guessing the searches are done locally so no point in Solr?

No point if you only have Apple client devices.

Sami



Re: solr

2019-07-10 Thread Shawn Heisey via dovecot

On 7/10/2019 2:49 AM, Maciej Milaszewski IQ PL via dovecot wrote:

On the other hand, Solr replication is quite a complicated process, and a
rollback or master-slave switch in this case is a non-trivial task that
may result in inconsistency of the whole dataset.

Do you have any experience with such cases? Maybe load balancing in HAProxy
could do the job? Something like:

.
server search1 192.168.1.1:8983 check port 8983 inter 20s fastinter 2
server search2 192.168.1.2:8983 backup
.


If you are using master-slave replication, you do not want to do this. 
Say the master went down, and it took you a while to get it back up. 
Unless you change the replication config, any data indexed on the slave 
would be gone as soon as the master was started back up, because the 
index would be replicated from the master.  You are quite correct that 
switching replication roles is non-trivial.


Using a load balancer in this manner is a great option if Solr is 
running in SolrCloud mode.  SolrCloud is a true cluster -- no masters, 
no slaves.  You can send indexing or queries to any system in the cloud 
and everything works.  The only real downside is that ZooKeeper (which 
is what turns Solr into SolrCloud) requires three servers for high 
availability, not two.  The third server could have much lower specs 
than the other two, as ZooKeeper's system requirements are typically 
quite modest.
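Two servers are not enough because of ZooKeeper's majority quorum. An ensemble of n servers keeps working only while floor(n/2)+1 of them are up. The tiny sketch below shows that arithmetic, and its function names are only illustrative.

```shell
# Smallest number of ZooKeeper servers that forms a majority.
zk_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# How many servers can fail while a majority is still available.
zk_tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}
```

For n = 2 the quorum is 2, so losing either server stops the ensemble. For n = 3 the quorum is still 2, so one server may fail.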


Thanks,
Shawn


Re: solr vs fts

2019-07-05 Thread Christian Kivalo via dovecot



On 2019-07-04 13:35, Felix Zielcke via dovecot wrote:

On Thursday, 04.07.2019, 12:27 +0300, Aki Tuomi via dovecot wrote:

On 4.7.2019 12.22, Maciej Milaszewski IQ PL via dovecot wrote:
> Hi
> So would you advise using Solr or something else?
>

Using any FTS is advisable, currently suitable ones would be SOLR or
Xapian (see https://github.com/grosjo/fts-xapian)



Hi Aki,

I haven't thought about using FTS yet either, but I have followed the thread
about developing the Xapian plugin a bit.
How stable is that now?

https://wiki.dovecot.org/Plugins/FTS says above:

"The following FTS indexers (in preferred order) are supported"

but fts-xapian is listed below all others and Solr at the top.
The Solr FTS plugin is developed by the Dovecot developers and is therefore an 
"official" plugin. Xapian is an "unofficial" plugin developed by Joan 
Moreau.

Is the wiki just outdated?

Felix


--
 Christian Kivalo


Re: solr vs fts

2019-07-04 Thread David Mehler via dovecot
Hi,

Is CLucene no longer a preferred/developed indexer?

Thanks.
Dave.


On 7/4/19, Felix Zielcke via dovecot  wrote:
> On Thursday, 04.07.2019, 12:27 +0300, Aki Tuomi via dovecot wrote:
>> On 4.7.2019 12.22, Maciej Milaszewski IQ PL via dovecot wrote:
>> > Hi
>> > So would you advise using Solr or something else?
>> >
>>
>> Using any FTS is advisable, currently suitable ones would be SOLR or
>> Xapian (see https://github.com/grosjo/fts-xapian)
>>
>
> Hi Aki,
>
> I haven't thought about using FTS yet either, but I have followed the thread
> about developing the Xapian plugin a bit.
> How stable is that now?
>
> https://wiki.dovecot.org/Plugins/FTS says above:
>
> "The following FTS indexers (in preferred order) are supported"
>
> but fts-xapian is listed below all others and Solr at the top.
>
> Is the wiki just outdated?
>
> Felix
>
>


Re: solr vs fts

2019-07-04 Thread Felix Zielcke via dovecot
On Thursday, 04.07.2019, 12:27 +0300, Aki Tuomi via dovecot wrote:
> On 4.7.2019 12.22, Maciej Milaszewski IQ PL via dovecot wrote:
> > Hi
> > So would you advise using Solr or something else?
> > 
> 
> Using any FTS is advisable, currently suitable ones would be SOLR or
> Xapian (see https://github.com/grosjo/fts-xapian)
> 

Hi Aki,

I haven't thought about using FTS yet either, but I have followed the thread
about developing the Xapian plugin a bit.
How stable is that now?

https://wiki.dovecot.org/Plugins/FTS says above:

"The following FTS indexers (in preferred order) are supported"

but fts-xapian is listed below all others and Solr at the top.

Is the wiki just outdated?

Felix



Re: solr vs fts

2019-07-04 Thread Aki Tuomi via dovecot


On 4.7.2019 12.22, Maciej Milaszewski IQ PL via dovecot wrote:
>>> A few clients have 25K or more e-mails
>>>
>>> I am thinking about using Solr like this:
>>>  fts = solr
>>>  fts_solr = debug url=http://IP:8983/solr/ (Solr on an external machine)
>>>
>>> Does it make sense to use dovecot_indexes and FTS?
>>> What is the difference in performance?
>>>
>> Hi!
>>
>> Dovecot indexes are not actually related to FTS that much. Using FTS
>> usually makes sense since it speeds up IMAP fulltext searches.
>>
>> Aki
>>
> Hi
> So would you advise using Solr or something else?
>

Using any FTS is advisable, currently suitable ones would be SOLR or
Xapian (see https://github.com/grosjo/fts-xapian)

Aki



Re: solr vs fts

2019-07-04 Thread Maciej Milaszewski IQ PL via dovecot


>> A few clients have 25K or more e-mails
>>
>> I am thinking about using Solr like this:
>>  fts = solr
>>  fts_solr = debug url=http://IP:8983/solr/ (Solr on an external machine)
>>
>> Does it make sense to use dovecot_indexes and FTS?
>> What is the difference in performance?
>>
> Hi!
>
> Dovecot indexes are not actually related to FTS that much. Using FTS
> usually makes sense since it speeds up IMAP fulltext searches.
>
> Aki
>
Hi
So would you advise using Solr or something else?



Re: solr vs fts

2019-07-04 Thread Aki Tuomi via dovecot


On 4.7.2019 12.14, Maciej Milaszewski IQ PL via dovecot wrote:
> Hi
> I have a question about tuning dovecot-2.2.36.x
>
> Mail is stored on NFS storage in Maildir format under
> /home/us/usern...@domain.ltd/MAILDIR/
> I additionally use local dovecot_indexes on an SSD
> (/var/dovecot_indexes%h)
>
> A few clients have 25K or more e-mails
>
> I am thinking about using Solr like this:
>  fts = solr
>  fts_solr = debug url=http://IP:8983/solr/ (Solr on an external machine)
>
> Does it make sense to use dovecot_indexes and FTS?
> What is the difference in performance?
>
Hi!

Dovecot indexes are not actually related to FTS that much. Using FTS
usually makes sense since it speeds up IMAP fulltext searches.

Aki



Re: SOLR/Index?

2019-04-15 Thread John Fawcett via dovecot
On 15/04/2019 11:38, Larry Rosenman via dovecot wrote:
> ⌂63% [l...@thebighonker.lerctr.org:~] $ grep fts1970 mail/INBOX
> ⌂67% [l...@thebighonker.lerctr.org:~] 1 $ mail -s "test fts1970"
> l...@lerctr.org 
> test fts1970
>
> test fts1970
> .
> EOT
> [l...@thebighonker.lerctr.org:~] $ mailq
> [l...@thebighonker.lerctr.org:~] $ grep fts1970 mail/INBOX
> Subject: test fts1970
> test fts1970
> test fts1970
>
>
> Apr 15 04:29:03 thebighonker exim[49528]: 1hFxvD-000Csq-P6 <=
> l...@lerctr.org  U=ler P=local S=388
> Apr 15 04:29:03 thebighonker dovecot[2507]: lmtp(49364): Connect from
> local
> Apr 15 04:29:03 thebighonker dovecot[2507]: lmtp(l...@lerctr.org/49364
> ): save: box=INBOX, uid=175402,
> msgid= >, size=640,
> vsize=660, from=Larry Rosenman  >, subject=test fts1970, flags=()
> Apr 15 04:29:03 thebighonker dovecot[2507]: lmtp(l...@lerctr.org/49364
> ): sieve:
> msgid= >: stored mail into
> mailbox 'INBOX' (subject=test fts1970 from=l...@lerctr.org
>  size=660)
> Apr 15 04:29:03 thebighonker dovecot[2507]: lmtp(49364): Disconnect
> from local: Client has quit the connection (state=READY)
> Apr 15 04:29:03 thebighonker exim[49535]: 1hFxvD-000Csq-P6 => ler
> mailto:l...@lerctr.org>> R=localuser T=dovecot_lmtp
> S=404 C="250 2.0.0 mailto:l...@lerctr.org>>
> 6ACWMN9OtFzUwAAAu+mOrA Saved" QT=0s DT=0s
> Apr 15 04:29:03 thebighonker exim[49535]: 1hFxvD-000Csq-P6 Completed QT=0s
> Apr 15 04:29:03 thebighonker dovecot[2507]:
> indexer-worker(l...@lerctr.org/49366 ):
> Indexed 1 messages in INBOX (UIDs 175402..175402)
>
>
> ⌂81% [l...@thebighonker.lerctr.org:~] $ doveadm search mailbox INBOX 
> body 'fts1970'
> ⌂83% [l...@thebighonker.lerctr.org:~] $
>
>
> ⌂65% [l...@thebighonker.lerctr.org:~] 75 $ doveadm search -u
> l...@lerctr.org   mailbox INBOX body 'fts1970'
> a53a143be44bda5bd483bbe98eac 175402
> [l...@thebighonker.lerctr.org:~] $ doveadm index -q INBOX
> [l...@thebighonker.lerctr.org:~] $ doveadm search mailbox INBOX  body
> 'fts1970'
> [l...@thebighonker.lerctr.org:~] $ doveadm fts rescan
> [l...@thebighonker.lerctr.org:~] $ doveadm index -q INBOX
> [l...@thebighonker.lerctr.org:~] $ doveadm search mailbox INBOX  body
> 'fts1970'
> a53a143be44bda5bd483bbe98eac 175402
> [l...@thebighonker.lerctr.org:~] $ doveadm search -u l...@lerctr.org
>   mailbox INBOX body 'fts1970'
> a53a143be44bda5bd483bbe98eac 175402
> [l...@thebighonker.lerctr.org:~] $
>
> So, yes, your hypothesis is correct.
>
> Question: How can I make it consistent?  
>
> I have a script that runs on the first of the month that does
> archiving, and I have similar issues in that namespace:
> ⌂67% [l...@thebighonker.lerctr.org:~] $ cat bin/archive-mail
> #!/bin/sh
> PATH=$PATH:/usr/local/bin
> #Expects to be run after midnight on the first of the month
> #  to archive all the previous months mail
> #Date Run:
> TODAY=`date "+%Y-%m-%d"`
> #last month in YYYY/MM
> YEAR_LAST_MONTH=`date -v-1d "+%Y/%m"`
> #1st of last month as 01-Mon-YYYY
> FIRST_LAST_MONTH=`date -v-1d "+01-%b-%Y"`
> echo 'TODAY=' ${TODAY}
> echo 'YEAR_LAST_MONTH=' ${YEAR_LAST_MONTH}
> echo 'FIRST_LAST_MONTH=' ${FIRST_LAST_MONTH}
> # get a list of all the mailboxes with at least one real message
> doveadm -f tab mailbox status vsize \* 2>/dev/null |
>         sed -e 1d | sort -k 1,1 |
>         awk  'BEGIN {FS="\t"} {if ($2 > 0)  print $1}' |
> while IFS= read -r i
> do
>    echo `date` start ${i}
>    doveadm mailbox create "ARCHIVE/${YEAR_LAST_MONTH}/${i}"
>    doveadm -f tab mailbox status messages "${i}"
>    doveadm move "ARCHIVE/${YEAR_LAST_MONTH}/${i}" mailbox \
>             "${i}" BEFORE ${TODAY} SINCE ${FIRST_LAST_MONTH}
>    doveadm -f tab mailbox status messages "${i}"
>    echo `date` done  ${i}
> done
> ⌂64% [l...@thebighonker.lerctr.org:~] $
>
>
> The Exim config can be provided as well if necessary.
>
> ler & l...@lerctr.org  *ARE THE SAME MAILBOX*
>
At the moment it looks as though you have two sets of emails indexed in
Solr. One is indexed under the username (the one you are running manually
and apparently the one used by Roundcube too, but that's to be verified),
and the other is indexed by the fts_autoindex = yes option using the full
email address. Once you've got it working as you require, you may
want to clean out Solr and reindex with just one of them, just to reduce
volumes.
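One way to confirm that there are two sets is to ask Solr how many documents it holds for each value of the `user` field, using a facet query. This is a sketch that assumes the Dovecot schema's `user` field, and the core URL is a placeholder. If both `ler` and the full address appear with similar counts, the same mail has been indexed twice.

```shell
# Build a query URL that returns no documents, only per-user document counts.
# Usage: solr_user_facet_url CORE_URL
solr_user_facet_url() {
    printf '%s/select?q=*:*&rows=0&facet=true&facet.field=user&facet.limit=-1&wt=json\n' "${1%/}"
}

# Fetch the counts, e.g. solr_user_counts http://127.0.0.1:8983/solr/dovecot
solr_user_counts() {
    curl -fsS -g "$(solr_user_facet_url "$1")"
}
```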

Your setup seems to have a mix of users from MySQL and from /etc/passwd.
I'm not sure whether your MySQL users are all mapped to real users or whether
they have their own mailboxes with the domain included. Your solution will depend
on what you really need, and if the setup is working correctly you may not
want to tweak it too much or other things may 

Re: SOLR/Index?

2019-04-15 Thread Larry Rosenman via dovecot
⌂63% [l...@thebighonker.lerctr.org:~] $ grep fts1970 mail/INBOX
⌂67% [l...@thebighonker.lerctr.org:~] 1 $ mail -s "test fts1970"
l...@lerctr.org
test fts1970

test fts1970
.
EOT
[l...@thebighonker.lerctr.org:~] $ mailq
[l...@thebighonker.lerctr.org:~] $ grep fts1970 mail/INBOX
Subject: test fts1970
test fts1970
test fts1970


Apr 15 04:29:03 thebighonker exim[49528]: 1hFxvD-000Csq-P6 <= l...@lerctr.org
U=ler P=local S=388
Apr 15 04:29:03 thebighonker dovecot[2507]: lmtp(49364): Connect from local
Apr 15 04:29:03 thebighonker dovecot[2507]: lmtp(l...@lerctr.org/49364):
save: box=INBOX, uid=175402, msgid=<
e1hfxvd-000csq...@thebighonker.lerctr.org>, size=640, vsize=660, from=Larry
Rosenman , subject=test fts1970, flags=()
Apr 15 04:29:03 thebighonker dovecot[2507]: lmtp(l...@lerctr.org/49364):
sieve: msgid=: stored mail into
mailbox 'INBOX' (subject=test fts1970 from=l...@lerctr.org size=660)
Apr 15 04:29:03 thebighonker dovecot[2507]: lmtp(49364): Disconnect from
local: Client has quit the connection (state=READY)
Apr 15 04:29:03 thebighonker exim[49535]: 1hFxvD-000Csq-P6 => ler <
l...@lerctr.org> R=localuser T=dovecot_lmtp S=404 C="250 2.0.0 <
l...@lerctr.org> 6ACWMN9OtFzUwAAAu+mOrA Saved" QT=0s DT=0s
Apr 15 04:29:03 thebighonker exim[49535]: 1hFxvD-000Csq-P6 Completed QT=0s
Apr 15 04:29:03 thebighonker dovecot[2507]: indexer-worker(
l...@lerctr.org/49366): Indexed 1 messages in INBOX (UIDs 175402..175402)


⌂81% [l...@thebighonker.lerctr.org:~] $ doveadm search mailbox INBOX  body
'fts1970'
⌂83% [l...@thebighonker.lerctr.org:~] $


⌂65% [l...@thebighonker.lerctr.org:~] 75 $ doveadm search -u l...@lerctr.org
mailbox INBOX body 'fts1970'
a53a143be44bda5bd483bbe98eac 175402
[l...@thebighonker.lerctr.org:~] $ doveadm index -q INBOX
[l...@thebighonker.lerctr.org:~] $ doveadm search mailbox INBOX  body
'fts1970'
[l...@thebighonker.lerctr.org:~] $ doveadm fts rescan
[l...@thebighonker.lerctr.org:~] $ doveadm index -q INBOX
[l...@thebighonker.lerctr.org:~] $ doveadm search mailbox INBOX  body
'fts1970'
a53a143be44bda5bd483bbe98eac 175402
[l...@thebighonker.lerctr.org:~] $ doveadm search -u l...@lerctr.org  mailbox
INBOX body 'fts1970'
a53a143be44bda5bd483bbe98eac 175402
[l...@thebighonker.lerctr.org:~] $

So, yes, your hypothesis is correct.

Question: How can I make it consistent?

I have a script that runs on the first of the month that does archiving,
and I have similar issues in that namespace:
⌂67% [l...@thebighonker.lerctr.org:~] $ cat bin/archive-mail
#!/bin/sh
PATH=$PATH:/usr/local/bin
#Expects to be run after midnight on the first of the month
#  to archive all the previous months mail
#Date Run:
TODAY=`date "+%Y-%m-%d"`
#last month in YYYY/MM
YEAR_LAST_MONTH=`date -v-1d "+%Y/%m"`
#1st of last month as 01-Mon-YYYY
FIRST_LAST_MONTH=`date -v-1d "+01-%b-%Y"`
echo 'TODAY=' ${TODAY}
echo 'YEAR_LAST_MONTH=' ${YEAR_LAST_MONTH}
echo 'FIRST_LAST_MONTH=' ${FIRST_LAST_MONTH}
# get a list of all the mailboxes with at least one real message
doveadm -f tab mailbox status vsize \* 2>/dev/null |
        sed -e 1d | sort -k 1,1 |
        awk  'BEGIN {FS="\t"} {if ($2 > 0)  print $1}' |
while IFS= read -r i
do
   echo `date` start ${i}
   doveadm mailbox create "ARCHIVE/${YEAR_LAST_MONTH}/${i}"
   doveadm -f tab mailbox status messages "${i}"
   doveadm move "ARCHIVE/${YEAR_LAST_MONTH}/${i}" mailbox \
            "${i}" BEFORE ${TODAY} SINCE ${FIRST_LAST_MONTH}
   doveadm -f tab mailbox status messages "${i}"
   echo `date` done  ${i}
done
⌂64% [l...@thebighonker.lerctr.org:~] $


The Exim config can be provided as well if necessary.

ler & l...@lerctr.org *ARE THE SAME MAILBOX*


On Mon, Apr 15, 2019 at 4:05 AM John Fawcett via dovecot <
dovecot@dovecot.org> wrote:

> On 15/04/2019 10:59, Larry Rosenman via dovecot wrote:
>
> I'll run a full test when I'm back in front of a real computer rather than my
> phone (in a few hours).
>
> Get Outlook for Android <https://aka.ms/ghei36>
>
> ------
> *From:* dovecot 
>  on behalf of John Fawcett via dovecot
>  
> *Sent:* Monday, April 15, 2019 3:57:08 AM
> *To:* Dovecot Mailing List
> *Subject:* Re: SOLR/Index?
>
> On 15/04/2019 10:31, Larry Rosenman via dovecot wrote:
>
> It always shows the autoindex. And yes, built from source. I'm the
> FreeBSD port maintainer for mail/dovecot. This has been happening for
> several releases.
>
>
> --
> *From:* dovecot 
>  on behalf of John Fawcett via dovecot
>  
> *Sent:* Monday, April 15, 2019 2:06:55 AM
> *To:* dovecot@dovecot.org
> *Subject:* Re: SOLR/Index?
>
> On 15/04/2019 08:09, Larry Rosenman via dovecot wrote:
>
> Note the hits after the fts rescan/index.
>

Re: SOLR/Index?

2019-04-15 Thread John Fawcett via dovecot
On 15/04/2019 10:59, Larry Rosenman via dovecot wrote:
> I'll run a full test when I'm back in front of a real computer rather than my
> phone (in a few hours).
>
>
> 
> *From:* dovecot  on behalf of John
> Fawcett via dovecot 
> *Sent:* Monday, April 15, 2019 3:57:08 AM
> *To:* Dovecot Mailing List
> *Subject:* Re: SOLR/Index?
>  
> On 15/04/2019 10:31, Larry Rosenman via dovecot wrote:
>> It always shows the autoindex. And yes, built from source. I'm the
>> FreeBSD port maintainer for mail/dovecot. This has been happening
>> for several releases.
>>
>>
>> 
>> *From:* dovecot  on behalf of John
>> Fawcett via dovecot 
>> *Sent:* Monday, April 15, 2019 2:06:55 AM
>> *To:* dovecot@dovecot.org
>> *Subject:* Re: SOLR/Index?
>>  
>> On 15/04/2019 08:09, Larry Rosenman via dovecot wrote:
>>> Note the hits after the fts rescan/index.
>>>
>>>
>>> --------
>>> *From:* Aki Tuomi 
>>> *Sent:* Monday, April 15, 2019 12:55:07 AM
>>> *To:* Larry Rosenman; John Fawcett
>>> *Cc:* Dovecot Mailing List
>>> *Subject:* Re: SOLR/Index?
>>>  
>>>
>>>
>>> On 15.4.2019 3.33, Larry Rosenman via dovecot wrote:
>>>> ⌂72% [l...@thebighonker.lerctr.org:~] $ doveadm search mailbox
>>>> lists/freebsd/ports-commiters  body 'sysutils'
>>>> [l...@thebighonker.lerctr.org:~] $ doveadm fts rescan
>>>> [l...@thebighonker.lerctr.org:~] $ doveadm index -q
>>>> lists/freebsd/ports-commiters
>>>> ⌂64% [l...@thebighonker.lerctr.org:~] $ tail -f /var/log/maillog
>>>> Apr 14 19:30:27 thebighonker dovecot[2507]: imap-login:
>>>> Disconnected (auth failed, 1 attempts in 2 secs): user=,
>>>> method=PLAIN, rip=180.180.217.124, lip=192.147.25.65, TLS:
>>>> Connection closed, session=
>>>> Apr 14 19:30:28 thebighonker dovecot[2507]: imap-login: Login:
>>>> user=, method=PLAIN, rip=2001:470:1f0f:3ad:bb:dcff:fe50:d900,
>>>> lip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, mpid=14813, TLS,
>>>> session=
>>>> Apr 14 19:30:30 thebighonker dovecot[2507]: imap(ler/14813): Logged
>>>> out in=12412 out=66691 fhc=0 fhb=0 fbc=0 fbb=0 del=0 exp=0 trash=0
>>>> Apr 14 19:30:54 thebighonker exim[14846]: no host name found for IP
>>>> address 23.100.68.192
>>>> Apr 14 19:30:55 thebighonker exim[14846]:
>>>> H=(DaVinci-MWare.prophet21lab.com
>>>> <http://DaVinci-MWare.prophet21lab.com>) [23.100.68.192]:52130
>>>> I=[192.147.25.65]:25 sender verify defer for >>> <mailto:i...@duke.org>>: host lookup did not complete
>>>> Apr 14 19:30:55 thebighonker exim[14846]:
>>>> H=(DaVinci-MWare.prophet21lab.com
>>>> <http://DaVinci-MWare.prophet21lab.com>) [23.100.68.192]:52130
>>>> I=[192.147.25.65]:25 F=mailto:i...@duke.org>>
>>>> temporarily rejected RCPT mailto:jpo...@why.net>>:
>>>> Could not complete sender verify
>>>> Apr 14 19:31:04 thebighonker dovecot[2507]: imap-login: Login:
>>>> user=, method=PLAIN, rip=2001:470:1f0f:3ad:bb:dcff:fe50:d900,
>>>> lip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, mpid=14910, TLS,
>>>> session=
>>>> Apr 14 19:31:04 thebighonker dovecot[2507]: imap(ctr/14910): Logged
>>>> out in=169 out=1711 fhc=0 fhb=0 fbc=0 fbb=0 del=0 exp=0 trash=0
>>>> Apr 14 19:31:16 thebighonker exim[14911]: no host name found for IP
>>>> address 80.253.235.35
>>>> Apr 14 19:31:19 thebighonker dovecot[2507]:
>>>> indexer-worker(ler/14919): Indexed 1578 messages in
>>>> lists/freebsd/ports-commiters (UIDs 21067..22644)
>>>> ^C
>>>> [l...@thebighonker.lerctr.org:~] 130 $ doveadm search mailbox
>>>> lists/freebsd/ports-commiters  body 'sysutils/'
>>>
>>>
>>> Just a minor nit, but you are searching for 'sysutils' first, then
>>> 'sysutils/'. FTS does not do substring searches by default.
>>>
>>> Aki
>>>
>>>>
>>>> -- 
>>>> Larry Rosenman                     http://www.lerctr.org/~ler
>>>> <http://www.lerctr.org/~ler>
>>>>

Re: SOLR/Index?

2019-04-15 Thread Larry Rosenman via dovecot
I'll run a full test when I'm back in front of a real computer rather than my phone 
(in a few hours).



From: dovecot  on behalf of John Fawcett via 
dovecot 
Sent: Monday, April 15, 2019 3:57:08 AM
To: Dovecot Mailing List
Subject: Re: SOLR/Index?

On 15/04/2019 10:31, Larry Rosenman via dovecot wrote:
It always shows the autoindex. And yes built from sources.  I'm the FreeBSD 
port maintainer for mail/dovecot.  This has been happening for several releases.


From: dovecot <dovecot-boun...@dovecot.org> on behalf of John Fawcett via 
dovecot <dovecot@dovecot.org>
Sent: Monday, April 15, 2019 2:06:55 AM
To: dovecot@dovecot.org
Subject: Re: SOLR/Index?

On 15/04/2019 08:09, Larry Rosenman via dovecot wrote:
Note the hits after the fts rescan/index.


From: Aki Tuomi <aki.tu...@open-xchange.com>
Sent: Monday, April 15, 2019 12:55:07 AM
To: Larry Rosenman; John Fawcett
Cc: Dovecot Mailing List
Subject: Re: SOLR/Index?



On 15.4.2019 3.33, Larry Rosenman via dovecot wrote:
⌂72% [l...@thebighonker.lerctr.org:~] $ doveadm search mailbox lists/freebsd/ports-commiters  body 'sysutils'
[l...@thebighonker.lerctr.org:~] $ doveadm fts rescan
[l...@thebighonker.lerctr.org:~] $ doveadm index -q lists/freebsd/ports-commiters
⌂64% [l...@thebighonker.lerctr.org:~] $ tail -f /var/log/maillog
Apr 14 19:30:27 thebighonker dovecot[2507]: imap-login: Disconnected (auth 
failed, 1 attempts in 2 secs): user=, method=PLAIN, rip=180.180.217.124, 
lip=192.147.25.65, TLS: Connection closed, session=
Apr 14 19:30:28 thebighonker dovecot[2507]: imap-login: Login: user=, 
method=PLAIN, rip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, 
lip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, mpid=14813, TLS, 
session=
Apr 14 19:30:30 thebighonker dovecot[2507]: imap(ler/14813): Logged out 
in=12412 out=66691 fhc=0 fhb=0 fbc=0 fbb=0 del=0 exp=0 trash=0
Apr 14 19:30:54 thebighonker exim[14846]: no host name found for IP address 
23.100.68.192
Apr 14 19:30:55 thebighonker exim[14846]: 
H=(DaVinci-MWare.prophet21lab.com) 
[23.100.68.192]:52130 I=[192.147.25.65]:25 sender verify defer for 
<i...@duke.org>: host lookup did not complete
Apr 14 19:30:55 thebighonker exim[14846]: 
H=(DaVinci-MWare.prophet21lab.com) 
[23.100.68.192]:52130 I=[192.147.25.65]:25 
F=<i...@duke.org> temporarily rejected RCPT 
<jpo...@why.net>: Could not complete sender verify
Apr 14 19:31:04 thebighonker dovecot[2507]: imap-login: Login: user=, 
method=PLAIN, rip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, 
lip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, mpid=14910, TLS, 
session=
Apr 14 19:31:04 thebighonker dovecot[2507]: imap(ctr/14910): Logged out in=169 
out=1711 fhc=0 fhb=0 fbc=0 fbb=0 del=0 exp=0 trash=0
Apr 14 19:31:16 thebighonker exim[14911]: no host name found for IP address 
80.253.235.35
Apr 14 19:31:19 thebighonker dovecot[2507]: indexer-worker(ler/14919): Indexed 
1578 messages in lists/freebsd/ports-commiters (UIDs 21067..22644)
^C
[l...@thebighonker.lerctr.org:~] 130 $ doveadm search mailbox lists/freebsd/ports-commiters  body 'sysutils/'


Just minor nit, but you are searching for 'sysutils' first, then 'sysutils/'. 
FTS does not do substring searches by default.

Aki

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 (c) E-Mail: larry...@gmail.com
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

Larry

just to be sure: are you running a standard unmodified 2.3.5.1 version which 
you built from source code?

I can see that first you search for sysutils, then do a rescan and reindex 
(which is shown in the log) and then you are able to find sysutils/.

It is better when doing these tests to search for the same string before and 
after, just to eliminate too many different factors in the test.

Nevertheless I did not see your logging for what happens when you receive a 
test message containing sysutils/. Dovecot should be outputting info about 
autoindexing given your setup. Does it do that or does it give some other 
message? Can you show those logs?

John

Larry

Did you notice any difference between the logging for auto indexing and the 
logging for indexing that you triggered manually? Would you mind posting the 
auto indexing logging for a message to that same user (ler)?

best regards

John


Re: SOLR/Index?

2019-04-15 Thread John Fawcett via dovecot
On 15/04/2019 10:31, Larry Rosenman via dovecot wrote:
> It always shows the autoindex. And yes built from sources.  I'm the
> FreeBSD port maintainer for mail/dovecot.  This has been happening for
> several releases.
>
> 
> *From:* dovecot  on behalf of John
> Fawcett via dovecot 
> *Sent:* Monday, April 15, 2019 2:06:55 AM
> *To:* dovecot@dovecot.org
> *Subject:* Re: SOLR/Index?
>  
> On 15/04/2019 08:09, Larry Rosenman via dovecot wrote:
>> Note the hits after the fts rescan/index.
>>
>> 
>> *From:* Aki Tuomi 
>> *Sent:* Monday, April 15, 2019 12:55:07 AM
>> *To:* Larry Rosenman; John Fawcett
>> *Cc:* Dovecot Mailing List
>> *Subject:* Re: SOLR/Index?
>>  
>>
>>
>> On 15.4.2019 3.33, Larry Rosenman via dovecot wrote:
>>> ⌂72% [l...@thebighonker.lerctr.org:~] $ doveadm search mailbox
>>> lists/freebsd/ports-commiters  body 'sysutils'
>>> [l...@thebighonker.lerctr.org:~] $ doveadm fts rescan
>>> [l...@thebighonker.lerctr.org:~] $ doveadm index -q
>>> lists/freebsd/ports-commiters
>>> ⌂64% [l...@thebighonker.lerctr.org:~] $ tail -f /var/log/maillog
>>> Apr 14 19:30:27 thebighonker dovecot[2507]: imap-login: Disconnected
>>> (auth failed, 1 attempts in 2 secs): user=, method=PLAIN,
>>> rip=180.180.217.124, lip=192.147.25.65, TLS: Connection closed,
>>> session=
>>> Apr 14 19:30:28 thebighonker dovecot[2507]: imap-login: Login:
>>> user=, method=PLAIN, rip=2001:470:1f0f:3ad:bb:dcff:fe50:d900,
>>> lip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, mpid=14813, TLS,
>>> session=
>>> Apr 14 19:30:30 thebighonker dovecot[2507]: imap(ler/14813): Logged
>>> out in=12412 out=66691 fhc=0 fhb=0 fbc=0 fbb=0 del=0 exp=0 trash=0
>>> Apr 14 19:30:54 thebighonker exim[14846]: no host name found for IP
>>> address 23.100.68.192
>>> Apr 14 19:30:55 thebighonker exim[14846]:
>>> H=(DaVinci-MWare.prophet21lab.com) [23.100.68.192]:52130
>>> I=[192.147.25.65]:25 sender verify defer for <i...@duke.org>:
>>> host lookup did not complete
>>> Apr 14 19:30:55 thebighonker exim[14846]:
>>> H=(DaVinci-MWare.prophet21lab.com) [23.100.68.192]:52130
>>> I=[192.147.25.65]:25 F=<i...@duke.org>
>>> temporarily rejected RCPT <jpo...@why.net>:
>>> Could not complete sender verify
>>> Apr 14 19:31:04 thebighonker dovecot[2507]: imap-login: Login:
>>> user=, method=PLAIN, rip=2001:470:1f0f:3ad:bb:dcff:fe50:d900,
>>> lip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, mpid=14910, TLS,
>>> session=
>>> Apr 14 19:31:04 thebighonker dovecot[2507]: imap(ctr/14910): Logged
>>> out in=169 out=1711 fhc=0 fhb=0 fbc=0 fbb=0 del=0 exp=0 trash=0
>>> Apr 14 19:31:16 thebighonker exim[14911]: no host name found for IP
>>> address 80.253.235.35
>>> Apr 14 19:31:19 thebighonker dovecot[2507]:
>>> indexer-worker(ler/14919): Indexed 1578 messages in
>>> lists/freebsd/ports-commiters (UIDs 21067..22644)
>>> ^C
>>> [l...@thebighonker.lerctr.org:~] 130 $ doveadm search mailbox
>>> lists/freebsd/ports-commiters  body 'sysutils/'
>>
>>
>> Just minor nit, but you are searching for 'sysutils' first, then
>> 'sysutils/'. FTS does not do substring searches by default.
>>
>> Aki
>>
>>>
>>> -- 
>>> Larry Rosenman                     http://www.lerctr.org/~ler
>>> Phone: +1 214-642-9640 (c)     E-Mail: larry...@gmail.com
>>> US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
>
> Larry
>
> just to be sure: are you running a standard unmodified 2.3.5.1 version
> which you built from source code?
>
> I can see that first you search for sysutils, then do a rescan and
> reindex (which is shown in the log) and then you are able to find
> sysutils/.
>
> It is better when doing these tests to search for the same string
> before and after, just to eliminate too many different factors in the
> test.
>
> Nevertheless I did not see your logging for what happens when you
> receive a test message containing sysutils/. Dovecot should be
> outputting info about autoindexing given your setup. Does it do that or
> does it give some other message? Can you show those logs?
>
> John
>
Larry

Did you notice any difference between the logging for auto indexing and
the logging for indexing that you triggered manually? Would you mind
posting the auto indexing logging for a message to that same user (ler)?

best regards

John



Re: SOLR/Index?

2019-04-15 Thread Larry Rosenman via dovecot
It always shows the autoindex. And yes built from sources.  I'm the FreeBSD 
port maintainer for mail/dovecot.  This has been happening for several releases.


From: dovecot  on behalf of John Fawcett via 
dovecot 
Sent: Monday, April 15, 2019 2:06:55 AM
To: dovecot@dovecot.org
Subject: Re: SOLR/Index?

On 15/04/2019 08:09, Larry Rosenman via dovecot wrote:
Note the hits after the fts rescan/index.


From: Aki Tuomi <aki.tu...@open-xchange.com>
Sent: Monday, April 15, 2019 12:55:07 AM
To: Larry Rosenman; John Fawcett
Cc: Dovecot Mailing List
Subject: Re: SOLR/Index?



On 15.4.2019 3.33, Larry Rosenman via dovecot wrote:
⌂72% [l...@thebighonker.lerctr.org:~] $ doveadm search mailbox lists/freebsd/ports-commiters  body 'sysutils'
[l...@thebighonker.lerctr.org:~] $ doveadm fts rescan
[l...@thebighonker.lerctr.org:~] $ doveadm index -q lists/freebsd/ports-commiters
⌂64% [l...@thebighonker.lerctr.org:~] $ tail -f /var/log/maillog
Apr 14 19:30:27 thebighonker dovecot[2507]: imap-login: Disconnected (auth 
failed, 1 attempts in 2 secs): user=, method=PLAIN, rip=180.180.217.124, 
lip=192.147.25.65, TLS: Connection closed, session=
Apr 14 19:30:28 thebighonker dovecot[2507]: imap-login: Login: user=, 
method=PLAIN, rip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, 
lip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, mpid=14813, TLS, 
session=
Apr 14 19:30:30 thebighonker dovecot[2507]: imap(ler/14813): Logged out 
in=12412 out=66691 fhc=0 fhb=0 fbc=0 fbb=0 del=0 exp=0 trash=0
Apr 14 19:30:54 thebighonker exim[14846]: no host name found for IP address 
23.100.68.192
Apr 14 19:30:55 thebighonker exim[14846]: 
H=(DaVinci-MWare.prophet21lab.com) 
[23.100.68.192]:52130 I=[192.147.25.65]:25 sender verify defer for 
<i...@duke.org>: host lookup did not complete
Apr 14 19:30:55 thebighonker exim[14846]: 
H=(DaVinci-MWare.prophet21lab.com) 
[23.100.68.192]:52130 I=[192.147.25.65]:25 
F=<i...@duke.org> temporarily rejected RCPT 
<jpo...@why.net>: Could not complete sender verify
Apr 14 19:31:04 thebighonker dovecot[2507]: imap-login: Login: user=, 
method=PLAIN, rip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, 
lip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, mpid=14910, TLS, 
session=
Apr 14 19:31:04 thebighonker dovecot[2507]: imap(ctr/14910): Logged out in=169 
out=1711 fhc=0 fhb=0 fbc=0 fbb=0 del=0 exp=0 trash=0
Apr 14 19:31:16 thebighonker exim[14911]: no host name found for IP address 
80.253.235.35
Apr 14 19:31:19 thebighonker dovecot[2507]: indexer-worker(ler/14919): Indexed 
1578 messages in lists/freebsd/ports-commiters (UIDs 21067..22644)
^C
[l...@thebighonker.lerctr.org:~] 130 $ doveadm search mailbox lists/freebsd/ports-commiters  body 'sysutils/'


Just minor nit, but you are searching for 'sysutils' first, then 'sysutils/'. 
FTS does not do substring searches by default.

Aki

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 (c) E-Mail: larry...@gmail.com
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

Larry

just to be sure: are you running a standard unmodified 2.3.5.1 version which 
you built from source code?

I can see that first you search for sysutils, then do a rescan and reindex 
(which is shown in the log) and then you are able to find sysutils/.

It is better when doing these tests to search for the same string before and 
after, just to eliminate too many different factors in the test.

Nevertheless I did not see your logging for what happens when you receive a 
test message containing sysutils/. Dovecot should be outputting info about 
autoindexing given your setup. Does it do that or does it give some other 
message? Can you show those logs?

John


Re: SOLR/Index?

2019-04-15 Thread John Fawcett via dovecot
On 15/04/2019 08:09, Larry Rosenman via dovecot wrote:
> Note the hits after the fts rescan/index.
>
> 
> *From:* Aki Tuomi 
> *Sent:* Monday, April 15, 2019 12:55:07 AM
> *To:* Larry Rosenman; John Fawcett
> *Cc:* Dovecot Mailing List
> *Subject:* Re: SOLR/Index?
>  
>
>
> On 15.4.2019 3.33, Larry Rosenman via dovecot wrote:
>> ⌂72% [l...@thebighonker.lerctr.org:~] $ doveadm search mailbox
>> lists/freebsd/ports-commiters  body 'sysutils'
>> [l...@thebighonker.lerctr.org:~] $ doveadm fts rescan
>> [l...@thebighonker.lerctr.org:~] $ doveadm index -q
>> lists/freebsd/ports-commiters
>> ⌂64% [l...@thebighonker.lerctr.org:~] $ tail -f /var/log/maillog
>> Apr 14 19:30:27 thebighonker dovecot[2507]: imap-login: Disconnected
>> (auth failed, 1 attempts in 2 secs): user=, method=PLAIN,
>> rip=180.180.217.124, lip=192.147.25.65, TLS: Connection closed,
>> session=
>> Apr 14 19:30:28 thebighonker dovecot[2507]: imap-login: Login:
>> user=, method=PLAIN, rip=2001:470:1f0f:3ad:bb:dcff:fe50:d900,
>> lip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, mpid=14813, TLS,
>> session=
>> Apr 14 19:30:30 thebighonker dovecot[2507]: imap(ler/14813): Logged
>> out in=12412 out=66691 fhc=0 fhb=0 fbc=0 fbb=0 del=0 exp=0 trash=0
>> Apr 14 19:30:54 thebighonker exim[14846]: no host name found for IP
>> address 23.100.68.192
>> Apr 14 19:30:55 thebighonker exim[14846]:
>> H=(DaVinci-MWare.prophet21lab.com) [23.100.68.192]:52130
>> I=[192.147.25.65]:25 sender verify defer for <i...@duke.org>:
>> host lookup did not complete
>> Apr 14 19:30:55 thebighonker exim[14846]:
>> H=(DaVinci-MWare.prophet21lab.com) [23.100.68.192]:52130
>> I=[192.147.25.65]:25 F=<i...@duke.org>
>> temporarily rejected RCPT <jpo...@why.net>:
>> Could not complete sender verify
>> Apr 14 19:31:04 thebighonker dovecot[2507]: imap-login: Login:
>> user=, method=PLAIN, rip=2001:470:1f0f:3ad:bb:dcff:fe50:d900,
>> lip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, mpid=14910, TLS,
>> session=
>> Apr 14 19:31:04 thebighonker dovecot[2507]: imap(ctr/14910): Logged
>> out in=169 out=1711 fhc=0 fhb=0 fbc=0 fbb=0 del=0 exp=0 trash=0
>> Apr 14 19:31:16 thebighonker exim[14911]: no host name found for IP
>> address 80.253.235.35
>> Apr 14 19:31:19 thebighonker dovecot[2507]:
>> indexer-worker(ler/14919): Indexed 1578 messages in
>> lists/freebsd/ports-commiters (UIDs 21067..22644)
>> ^C
>> [l...@thebighonker.lerctr.org:~] 130 $ doveadm search mailbox
>> lists/freebsd/ports-commiters  body 'sysutils/'
>
>
> Just minor nit, but you are searching for 'sysutils' first, then
> 'sysutils/'. FTS does not do substring searches by default.
>
> Aki
>
>>
>> -- 
>> Larry Rosenman                     http://www.lerctr.org/~ler
>> Phone: +1 214-642-9640 (c)     E-Mail: larry...@gmail.com
>> US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

Larry

just to be sure: are you running a standard unmodified 2.3.5.1 version
which you built from source code?

I can see that first you search for sysutils, then do a rescan and
reindex (which is shown in the log) and then you are able to find sysutils/.

It is better when doing these tests to search for the same string before
and after, just to eliminate too many different factors in the test.

Nevertheless I did not see your logging for what happens when you
receive a test message containing sysutils/. Dovecot should be outputting
info about autoindexing given your setup. Does it do that or does it
give some other message? Can you show those logs?

John
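The before/after comparison John suggests can be made mechanical. A minimal Python sketch, assuming `doveadm search` prints one `<mailbox-guid> <uid>` pair per line, as in the search output quoted in this thread; `parse_doveadm_search`, `run_search` and `compare_runs` are hypothetical helper names, and only the `doveadm search [-u user] mailbox ... body ...` form already used in the thread is invoked.

```python
from __future__ import annotations

import subprocess


def parse_doveadm_search(output: str) -> set[tuple[str, int]]:
    """Parse `doveadm search` output: one '<mailbox-guid> <uid>' pair per line."""
    hits: set[tuple[str, int]] = set()
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank or unexpected lines
        hits.add((parts[0], int(parts[1])))
    return hits


def run_search(mailbox: str, term: str, user: str | None = None) -> str:
    """Run the same search command used in the thread and return its stdout."""
    cmd = ["doveadm", "search"]
    if user:
        cmd += ["-u", user]
    cmd += ["mailbox", mailbox, "body", term]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def compare_runs(before: str, after: str) -> dict[str, set[tuple[str, int]]]:
    """Diff two search outputs taken with the SAME term before and after reindexing."""
    b, a = parse_doveadm_search(before), parse_doveadm_search(after)
    return {"only_after": a - b, "only_before": b - a, "both": a & b}
```

For example, capture `run_search("lists/freebsd/ports-commiters", "sysutils/")` once before `doveadm fts rescan` / `doveadm index` and once after; a non-empty `only_after` set lists the messages the FTS index was missing.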



Re: SOLR/Index?

2019-04-15 Thread Larry Rosenman via dovecot
Note the hits after the fts rescan/index.


From: Aki Tuomi 
Sent: Monday, April 15, 2019 12:55:07 AM
To: Larry Rosenman; John Fawcett
Cc: Dovecot Mailing List
Subject: Re: SOLR/Index?



On 15.4.2019 3.33, Larry Rosenman via dovecot wrote:
⌂72% [l...@thebighonker.lerctr.org:~] $ doveadm search mailbox lists/freebsd/ports-commiters  body 'sysutils'
[l...@thebighonker.lerctr.org:~] $ doveadm fts rescan
[l...@thebighonker.lerctr.org:~] $ doveadm index -q lists/freebsd/ports-commiters
⌂64% [l...@thebighonker.lerctr.org:~] $ tail -f /var/log/maillog
Apr 14 19:30:27 thebighonker dovecot[2507]: imap-login: Disconnected (auth 
failed, 1 attempts in 2 secs): user=, method=PLAIN, rip=180.180.217.124, 
lip=192.147.25.65, TLS: Connection closed, session=
Apr 14 19:30:28 thebighonker dovecot[2507]: imap-login: Login: user=, 
method=PLAIN, rip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, 
lip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, mpid=14813, TLS, 
session=
Apr 14 19:30:30 thebighonker dovecot[2507]: imap(ler/14813): Logged out 
in=12412 out=66691 fhc=0 fhb=0 fbc=0 fbb=0 del=0 exp=0 trash=0
Apr 14 19:30:54 thebighonker exim[14846]: no host name found for IP address 
23.100.68.192
Apr 14 19:30:55 thebighonker exim[14846]: 
H=(DaVinci-MWare.prophet21lab.com) 
[23.100.68.192]:52130 I=[192.147.25.65]:25 sender verify defer for 
<i...@duke.org>: host lookup did not complete
Apr 14 19:30:55 thebighonker exim[14846]: 
H=(DaVinci-MWare.prophet21lab.com) 
[23.100.68.192]:52130 I=[192.147.25.65]:25 
F=<i...@duke.org> temporarily rejected RCPT 
<jpo...@why.net>: Could not complete sender verify
Apr 14 19:31:04 thebighonker dovecot[2507]: imap-login: Login: user=, 
method=PLAIN, rip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, 
lip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, mpid=14910, TLS, 
session=
Apr 14 19:31:04 thebighonker dovecot[2507]: imap(ctr/14910): Logged out in=169 
out=1711 fhc=0 fhb=0 fbc=0 fbb=0 del=0 exp=0 trash=0
Apr 14 19:31:16 thebighonker exim[14911]: no host name found for IP address 
80.253.235.35
Apr 14 19:31:19 thebighonker dovecot[2507]: indexer-worker(ler/14919): Indexed 
1578 messages in lists/freebsd/ports-commiters (UIDs 21067..22644)
^C
[l...@thebighonker.lerctr.org:~] 130 $ doveadm search mailbox lists/freebsd/ports-commiters  body 'sysutils/'


Just minor nit, but you are searching for 'sysutils' first, then 'sysutils/'. 
FTS does not do substring searches by default.

Aki

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 (c) E-Mail: larry...@gmail.com
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106


Re: SOLR/Index?

2019-04-14 Thread Aki Tuomi via dovecot

On 15.4.2019 3.33, Larry Rosenman via dovecot wrote:
> ⌂72% [l...@thebighonker.lerctr.org:~] $ doveadm search mailbox
> lists/freebsd/ports-commiters  body 'sysutils'
> [l...@thebighonker.lerctr.org:~] $ doveadm fts rescan
> [l...@thebighonker.lerctr.org:~] $ doveadm index -q
> lists/freebsd/ports-commiters
> ⌂64% [l...@thebighonker.lerctr.org:~] $ tail -f /var/log/maillog
> Apr 14 19:30:27 thebighonker dovecot[2507]: imap-login: Disconnected
> (auth failed, 1 attempts in 2 secs): user=, method=PLAIN,
> rip=180.180.217.124, lip=192.147.25.65, TLS: Connection closed,
> session=
> Apr 14 19:30:28 thebighonker dovecot[2507]: imap-login: Login:
> user=, method=PLAIN, rip=2001:470:1f0f:3ad:bb:dcff:fe50:d900,
> lip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, mpid=14813, TLS,
> session=
> Apr 14 19:30:30 thebighonker dovecot[2507]: imap(ler/14813): Logged
> out in=12412 out=66691 fhc=0 fhb=0 fbc=0 fbb=0 del=0 exp=0 trash=0
> Apr 14 19:30:54 thebighonker exim[14846]: no host name found for IP
> address 23.100.68.192
> Apr 14 19:30:55 thebighonker exim[14846]:
> H=(DaVinci-MWare.prophet21lab.com) [23.100.68.192]:52130
> I=[192.147.25.65]:25 sender verify defer for <i...@duke.org>:
> host lookup did not complete
> Apr 14 19:30:55 thebighonker exim[14846]:
> H=(DaVinci-MWare.prophet21lab.com) [23.100.68.192]:52130
> I=[192.147.25.65]:25 F=<i...@duke.org>
> temporarily rejected RCPT <jpo...@why.net>:
> Could not complete sender verify
> Apr 14 19:31:04 thebighonker dovecot[2507]: imap-login: Login:
> user=, method=PLAIN, rip=2001:470:1f0f:3ad:bb:dcff:fe50:d900,
> lip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, mpid=14910, TLS,
> session=
> Apr 14 19:31:04 thebighonker dovecot[2507]: imap(ctr/14910): Logged
> out in=169 out=1711 fhc=0 fhb=0 fbc=0 fbb=0 del=0 exp=0 trash=0
> Apr 14 19:31:16 thebighonker exim[14911]: no host name found for IP
> address 80.253.235.35
> Apr 14 19:31:19 thebighonker dovecot[2507]: indexer-worker(ler/14919):
> Indexed 1578 messages in lists/freebsd/ports-commiters (UIDs 21067..22644)
> ^C
> [l...@thebighonker.lerctr.org:~] 130 $ doveadm search mailbox
> lists/freebsd/ports-commiters  body 'sysutils/'


Just minor nit, but you are searching for 'sysutils' first, then
'sysutils/'. FTS does not do substring searches by default.

Aki
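To illustrate Aki's point: full-text search matches whole indexed tokens, not arbitrary substrings, so both the string you search for and the way the indexed text was split into tokens matter. The two analyzers below are toy stand-ins (assumptions), not Solr's actual tokenizers or the field types in dovecot's Solr schema, and the message body is made up.

```python
import re


def whitespace_tokens(text: str) -> set[str]:
    """Toy analyzer: lowercase, split on whitespace only."""
    return set(text.lower().split())


def word_tokens(text: str) -> set[str]:
    """Toy analyzer: lowercase, split on every non-alphanumeric character."""
    return {t for t in re.split(r"[^0-9a-z]+", text.lower()) if t}


def token_match(query: str, text: str, analyzer) -> bool:
    """FTS-style match: every query token must be an indexed token (AND semantics)."""
    q = analyzer(query)
    return bool(q) and q <= analyzer(text)


def substring_match(query: str, text: str) -> bool:
    """What users often expect instead: a plain substring test."""
    return query.lower() in text.lower()


body = "Update sysutils/htop to 2.2.0"  # made-up commit subject
for term in ("sysutils", "sysutils/", "sysutil"):
    print(term,
          substring_match(term, body),
          token_match(term, body, whitespace_tokens),
          token_match(term, body, word_tokens))
```

With the whitespace analyzer neither `sysutils` nor `sysutils/` matches, because the indexed token is `sysutils/htop`; with the word analyzer both match, while the substring `sysutil` never matches as a token.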

>
> -- 
> Larry Rosenman                     http://www.lerctr.org/~ler
> Phone: +1 214-642-9640 (c)     E-Mail: larry...@gmail.com
> US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106


Re: SOLR/Index?

2019-04-14 Thread Larry Rosenman via dovecot
⌂72% [l...@thebighonker.lerctr.org:~] $ doveadm search mailbox
lists/freebsd/ports-commiters  body 'sysutils'
[l...@thebighonker.lerctr.org:~] $ doveadm fts rescan
[l...@thebighonker.lerctr.org:~] $ doveadm index -q
lists/freebsd/ports-commiters
⌂64% [l...@thebighonker.lerctr.org:~] $ tail -f /var/log/maillog
Apr 14 19:30:27 thebighonker dovecot[2507]: imap-login: Disconnected (auth
failed, 1 attempts in 2 secs): user=, method=PLAIN,
rip=180.180.217.124, lip=192.147.25.65, TLS: Connection closed,
session=
Apr 14 19:30:28 thebighonker dovecot[2507]: imap-login: Login: user=,
method=PLAIN, rip=2001:470:1f0f:3ad:bb:dcff:fe50:d900,
lip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, mpid=14813, TLS,
session=
Apr 14 19:30:30 thebighonker dovecot[2507]: imap(ler/14813): Logged out
in=12412 out=66691 fhc=0 fhb=0 fbc=0 fbb=0 del=0 exp=0 trash=0
Apr 14 19:30:54 thebighonker exim[14846]: no host name found for IP address
23.100.68.192
Apr 14 19:30:55 thebighonker exim[14846]: H=(DaVinci-MWare.prophet21lab.com)
[23.100.68.192]:52130 I=[192.147.25.65]:25 sender verify defer for
<i...@duke.org>: host lookup did not complete
Apr 14 19:30:55 thebighonker exim[14846]: H=(DaVinci-MWare.prophet21lab.com)
[23.100.68.192]:52130 I=[192.147.25.65]:25 F=<i...@duke.org> temporarily
rejected RCPT <jpo...@why.net>: Could not complete sender verify
Apr 14 19:31:04 thebighonker dovecot[2507]: imap-login: Login: user=,
method=PLAIN, rip=2001:470:1f0f:3ad:bb:dcff:fe50:d900,
lip=2001:470:1f0f:3ad:bb:dcff:fe50:d900, mpid=14910, TLS,
session=
Apr 14 19:31:04 thebighonker dovecot[2507]: imap(ctr/14910): Logged out
in=169 out=1711 fhc=0 fhb=0 fbc=0 fbb=0 del=0 exp=0 trash=0
Apr 14 19:31:16 thebighonker exim[14911]: no host name found for IP address
80.253.235.35
Apr 14 19:31:19 thebighonker dovecot[2507]: indexer-worker(ler/14919):
Indexed 1578 messages in lists/freebsd/ports-commiters (UIDs 21067..22644)
^C
[l...@thebighonker.lerctr.org:~] 130 $ doveadm search mailbox
lists/freebsd/ports-commiters  body 'sysutils/'
8097632f69627b5b5895bbe98eac 21077
8097632f69627b5b5895bbe98eac 21082
8097632f69627b5b5895bbe98eac 21083
8097632f69627b5b5895bbe98eac 21086
8097632f69627b5b5895bbe98eac 21118
8097632f69627b5b5895bbe98eac 21119
8097632f69627b5b5895bbe98eac 21121
8097632f69627b5b5895bbe98eac 21124
8097632f69627b5b5895bbe98eac 21125
8097632f69627b5b5895bbe98eac 21126
8097632f69627b5b5895bbe98eac 21127
8097632f69627b5b5895bbe98eac 21128
8097632f69627b5b5895bbe98eac 21141
8097632f69627b5b5895bbe98eac 21142
8097632f69627b5b5895bbe98eac 21168
8097632f69627b5b5895bbe98eac 21175
8097632f69627b5b5895bbe98eac 21180
8097632f69627b5b5895bbe98eac 21184
8097632f69627b5b5895bbe98eac 21186
8097632f69627b5b5895bbe98eac 21188
8097632f69627b5b5895bbe98eac 21195
8097632f69627b5b5895bbe98eac 21196
8097632f69627b5b5895bbe98eac 21198
8097632f69627b5b5895bbe98eac 21292
8097632f69627b5b5895bbe98eac 21312
8097632f69627b5b5895bbe98eac 21313
8097632f69627b5b5895bbe98eac 21323
8097632f69627b5b5895bbe98eac 21330
8097632f69627b5b5895bbe98eac 21344
8097632f69627b5b5895bbe98eac 21345
8097632f69627b5b5895bbe98eac 21348
8097632f69627b5b5895bbe98eac 21353
8097632f69627b5b5895bbe98eac 21354
8097632f69627b5b5895bbe98eac 21359
8097632f69627b5b5895bbe98eac 21367
8097632f69627b5b5895bbe98eac 21368
8097632f69627b5b5895bbe98eac 21369
8097632f69627b5b5895bbe98eac 21370
8097632f69627b5b5895bbe98eac 21371
8097632f69627b5b5895bbe98eac 21380
8097632f69627b5b5895bbe98eac 21390
8097632f69627b5b5895bbe98eac 21391
8097632f69627b5b5895bbe98eac 21392
8097632f69627b5b5895bbe98eac 21393
8097632f69627b5b5895bbe98eac 21394
8097632f69627b5b5895bbe98eac 21395
8097632f69627b5b5895bbe98eac 21439
8097632f69627b5b5895bbe98eac 21440
8097632f69627b5b5895bbe98eac 21480
8097632f69627b5b5895bbe98eac 21518
8097632f69627b5b5895bbe98eac 21538
8097632f69627b5b5895bbe98eac 21539
8097632f69627b5b5895bbe98eac 21593
8097632f69627b5b5895bbe98eac 21610
8097632f69627b5b5895bbe98eac 21612
8097632f69627b5b5895bbe98eac 21615
8097632f69627b5b5895bbe98eac 21682
8097632f69627b5b5895bbe98eac 21696
8097632f69627b5b5895bbe98eac 21697
8097632f69627b5b5895bbe98eac 21700
8097632f69627b5b5895bbe98eac 21701
8097632f69627b5b5895bbe98eac 21710
8097632f69627b5b5895bbe98eac 21743
8097632f69627b5b5895bbe98eac 21856
8097632f69627b5b5895bbe98eac 21858
8097632f69627b5b5895bbe98eac 21882
8097632f69627b5b5895bbe98eac 21883
8097632f69627b5b5895bbe98eac 21886
8097632f69627b5b5895bbe98eac 21887
8097632f69627b5b5895bbe98eac 21900
8097632f69627b5b5895bbe98eac 21910
8097632f69627b5b5895bbe98eac 21918
8097632f69627b5b5895bbe98eac 21930
8097632f69627b5b5895bbe98eac 21931
8097632f69627b5b5895bbe98eac 21955
8097632f69627b5b5895bbe98eac 21971
8097632f69627b5b5895bbe98eac 21986
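The `indexer-worker` line in the log above says the FTS index was missing a contiguous block of UIDs, and every hit returned after reindexing falls inside that block. A minimal Python sketch that checks this mechanically; the regular expression is written against the single log line quoted in this thread (an assumption: other Dovecot versions may word it differently), and the function names are hypothetical.

```python
from __future__ import annotations

import re

INDEXED_RE = re.compile(
    r"indexer-worker\((?P<user>[^/)]+)/(?P<pid>\d+)\): "
    r"Indexed (?P<count>\d+) messages in (?P<mailbox>.+) "
    r"\(UIDs (?P<first>\d+)\.\.(?P<last>\d+)\)"
)


def parse_indexed_line(line: str) -> dict | None:
    """Extract user, mailbox, message count and UID range from an indexer-worker line."""
    m = INDEXED_RE.search(line)
    if m is None:
        return None
    return {
        "user": m["user"],
        "mailbox": m["mailbox"],
        "count": int(m["count"]),
        "first_uid": int(m["first"]),
        "last_uid": int(m["last"]),
    }


def all_in_range(uids, first: int, last: int) -> bool:
    """True if every UID lies in the inclusive range [first, last]."""
    return all(first <= uid <= last for uid in uids)
```

For the line above this yields UIDs 21067..22644; 22644 - 21067 + 1 = 1578, matching "Indexed 1578 messages", and all listed hits (21077 through 21986) pass `all_in_range`.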

Re: SOLR/Index?

2019-04-14 Thread John Fawcett via dovecot
On 15/04/2019 01:39, Larry Rosenman via dovecot wrote:
>
> full solr.log at:
> https://www.lerctr.org/~ler/solr.log
>
> The search DOES make it to SOLR:
> ⌂77% [l...@thebighonker.lerctr.org:~] 130 $ grep sysutils
> /var/log/solr/solr.log
> 2019-04-14 18:31:34.749 INFO  (qtp349420578-7538) [   x:dovecot]
> o.a.s.c.S.Request [dovecot]  webapp=/solr path=/select
> params={q={!lucene+q.op%3DAND}(hdr:sysutils\/+OR+body:sysutils\/)=uid,score=uid+asc=%2Bbox:8097632f69627b5b5895bbe98eac+%2Buser:ler=22644=xml}
> hits=0 status=0 QTime=460
>
> Picture showing subjects:
> https://www.lerctr.org/~ler/sysutils_mail.png
>
> What else?
>
> I'm happy to provide access.
>
> -- 
> Larry Rosenman                     http://www.lerctr.org/~ler
> Phone: +1 214-642-9640 (c)     E-Mail: larry...@gmail.com
> US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

Larry

So the search is returning no hits, as you said. But can you show that
there is data in the index that should match?

doveadm search -u u...@example.com mailbox inbox body "sysutils/"

Can you do a controlled test and send yourself a test message with that
string and show the solr log where it is being inserted into the index
and then search for it with doveadm (just to rule out roundcube for the
moment) and show solr log for that search?

John
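John's controlled test starts with a message whose content is known in advance. A minimal Python sketch using the standard `email` package; the addresses are placeholders and `make_fts_test_message` is a hypothetical helper. Delivery is deliberately left out: the message should go through whatever path normally feeds the mailbox (MTA or LDA), since that is where autoindexing is supposed to happen.

```python
import uuid
from email.message import EmailMessage


def make_fts_test_message(to_addr: str, from_addr: str, marker: str = "sysutils/"):
    """Build a plain-text message containing `marker` plus a unique token.

    The token makes it easy to find this exact message in the Solr log and
    with `doveadm search ... body <token>` afterwards.
    """
    token = uuid.uuid4().hex
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = f"FTS test {token}"
    msg.set_content(f"Controlled FTS test.\nSearch string: {marker}\nToken: {token}\n")
    return msg, token


if __name__ == "__main__":
    msg, token = make_fts_test_message("user@example.com", "tester@example.com")
    print(token)
    print(msg.as_string())
```

Searching for the token rather than only for `sysutils/` separates "the message was never indexed" from "the search string does not tokenize the way you expect".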




Re: SOLR/Index?

2019-04-14 Thread Larry Rosenman via dovecot
full solr.log at:
https://www.lerctr.org/~ler/solr.log

The search DOES make it to SOLR:
⌂77% [l...@thebighonker.lerctr.org:~] 130 $ grep sysutils
/var/log/solr/solr.log
2019-04-14 18:31:34.749 INFO  (qtp349420578-7538) [   x:dovecot]
o.a.s.c.S.Request [dovecot]  webapp=/solr path=/select
params={q={!lucene+q.op%3DAND}(hdr:sysutils\/+OR+body:sysutils\/)=uid,score=uid+asc=%2Bbox:8097632f69627b5b5895bbe98eac+%2Buser:ler=22644=xml}
hits=0 status=0 QTime=460

Picture showing subjects:
https://www.lerctr.org/~ler/sysutils_mail.png

What else?

I'm happy to provide access.

-- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 (c) E-Mail: larry...@gmail.com
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106


Re: SOLR/Index?

2019-04-14 Thread John Fawcett via dovecot
On 15/04/2019 01:15, Larry Rosenman via dovecot wrote:
> Given all the discussion on FTS/Solr, etc, I have a question:
>
> I have autoindex set, and searching in roundcube most of the time does
> NOT find things,
> HOWEVER if I do:
> doveadm fts rescan
> doveadm index
>
> I can find things in the mailboxes.
>
> WHY?
>
> (doveconf -n attached).
>
> -- 
> Larry Rosenman                     http://www.lerctr.org/~ler
> Phone: +1 214-642-9640 (c)     E-Mail: larry...@gmail.com
> US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

Larry

have you been able to check your solr logs to see if the queries that
find nothing actually arrive at solr? What is logged for those queries?

Some clients (I don't know if Roundcube is among them) don't send the
query in some circumstances.

Also, when email arrives in dovecot, is the automatic indexing working? Do
you see solr logging for adding those messages?

John
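Checking whether the failing searches reach Solr at all, and with how many hits, can be scripted against the Solr request log. A minimal Python sketch; the field names (`path`, `hits`, `status`, `QTime`) are taken from the Solr log line Larry posted in this thread, the helper names are hypothetical, and the log format may differ between Solr versions.

```python
from __future__ import annotations

import re


def parse_solr_request(line: str) -> dict | None:
    """Pull path, hits, status and QTime out of a Solr request log line."""
    path = re.search(r"\bpath=(\S+)", line)
    if path is None:
        return None  # not a request log line
    result = {"path": path.group(1)}
    for key in ("hits", "status", "QTime"):
        m = re.search(rf"\b{key}=(\d+)", line)
        result[key] = int(m.group(1)) if m else None  # None if the field is absent
    return result


def zero_hit_selects(lines):
    """Return parsed /select requests that came back with hits=0."""
    out = []
    for line in lines:
        r = parse_solr_request(line)
        if r and r["path"] == "/select" and r["hits"] == 0:
            out.append(r)
    return out
```

Run over `/var/log/solr/solr.log` while reproducing a failing client search: no matching `/select` entry suggests the query never reached Solr, while `hits=0` means it arrived but the index had nothing for it.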



Re: [PATCH] Re: Solr connection timeout hardwired to 60s

2019-04-14 Thread Shawn Heisey via dovecot

On 4/14/2019 7:59 AM, John Fawcett via dovecot wrote:

> From dovecot's point of view I can see the following as potentially useful
> features:
>
> 1) a configurable batch size would make it possible to tune the number of emails
> per request and help stay under the 60 seconds hard coded http request
> timeout. A configurable http timeout would be less useful, since this
> will potentially run into other timeouts on solr side.


Even if several thousand emails are sent per batch, unless they're 
incredibly large, I can't imagine indexing them taking more than a few 
seconds.  Does dovecot send attachments to Solr as well as the email 
itself?  Hopefully it doesn't.  If it does, then you would want a 
smaller batch size.
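Shawn's remark that very large messages would call for smaller batches suggests capping each update request by total size as well as by message count. A minimal Python sketch of such batching; `max_count` and `max_bytes` are hypothetical parameters, not existing dovecot settings, and this is not dovecot's fts-solr code.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical representation of a message queued for indexing: (uid, body bytes).
Message = Tuple[int, bytes]


def batches(messages: Iterable[Message], max_count: int, max_bytes: int) -> Iterator[List[Message]]:
    """Group messages into update requests capped by count AND total body size.

    A single message larger than max_bytes still goes out, alone in its batch.
    """
    batch: List[Message] = []
    size = 0
    for uid, body in messages:
        n = len(body)
        # Flush the current batch if adding this message would break either limit.
        if batch and (len(batch) >= max_count or size + n > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append((uid, body))
        size += n
    if batch:
        yield batch
```

Capping by bytes keeps a run of large messages from producing one oversized request, while the count cap still bounds requests made of many small messages.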


But if the heap size for Solr is not big enough, that can cause major 
delays no matter what requests are being sent, because Java will be 
spending most of its time doing garbage collection.


I'm also assuming that the Solr server is on the same LAN as dovecot and 
that transferring the update data does not take a long time.


Thanks,
Shawn


Re: Solr connection timeout hardwired to 60s

2019-04-14 Thread John Fawcett via dovecot
On 14/04/2019 17:55, John Fawcett via dovecot wrote:
> The solr server is a small test virtual machine with 0.2 (shared) vCPU
> and 0.6MB of memory and non SSD storage. It can index around 2000 emails
> per minute when there is no other activity. Average email size is about
> 45Kb. I'm not indexing attachments.
>
> John
>
I more than doubled the performance by using a 0.5 shared vCPU and SSD
storage: approx. 4500 emails per minute.

John



Re: Solr connection timeout hardwired to 60s

2019-04-14 Thread John Fawcett via dovecot
On 14/04/2019 17:16, Peter Mogensen via dovecot wrote:
> sorry... I got distracted halfway and forgot to put a meaningful
> subject so the archive could figure out the thread. - resending.
>
> On 4/14/19 4:04 PM, dovecot-requ...@dovecot.org wrote:
>
>>> Solr ships with autoCommit set to 15 seconds and openSearcher set to
>>> false on the autoCommit.  The autoSoftCommit setting is not enabled by
>>> default, but depending on how the index was created, Solr might try to
>>> set autoSoftCommit to 3 seconds ... which is WAY too short.
> I just run with the default. 15s autoCommit and no autoSoftCommit
>
>>> This thread says that dovecot is sending explicit commits.
> I see explicit /update requests with softCommit and waitSearcher=true in a
> tcpdump.
>
>>> One thing
>>> that might be happening to exceed 60 seconds is an extremely long
>>> commit, which is usually caused by excessive cache autowarming, but
>>> might be related to insufficient memory.  The max heap setting on an
>>> out-of-the-box Solr install (5.0 and later) is 512MB.  That's VERY
>>> small, and it doesn't take much index data before a much larger heap
>>> is required.
> I run with
>
> SOLR_JAVA_MEM="-Xmx8g -Xms2g"
>
>> I looked into the code (version 2.3.5.1):
> This is 2.2.35. I must admit I haven't checked the source differences
> from 2.3.x.
>
>> I imagine that one of the reasons dovecot sends softCommits is that
>> without autoindex active, and even if mailboxes are periodically indexed
>> from cron, the last emails received will be indexed at the moment of the
>> search.
> I expect dovecot has to, because of its default behavior of only
> bringing the index up to date just before a search. So it has to wait for
> the index result to be available if any new mails have been indexed.
>
>> 1) a configurable batch size would make it possible to tune the number of
>> emails per request and help stay under the 60-second hard-coded http
>> request timeout. A configurable http timeout would be less useful, since
>> this will potentially run into other timeouts on the solr side.
> Being able to configure it is great.
> But I don't think it solves much. I recompiled with 100 as the batch size
> and it still ended in timeouts.
> Then I recompiled with a 10 min timeout and now I see all the batches
> completing, with processing times mostly between 1 and 2 minutes
> (so all would have failed).
>
> To me it looks like Solr just takes too long to index. This is no
> small machine. It's a 20 core Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
> and for this test it's not doing anything else, so I'm a bit surprised
> that even with only a few users this takes so long.
>
> /Peter
>
>
Peter

I suppose you could go with a batch size of 50. If it's linear, you
could still keep under the default 60-second http request timeout :-)

I'm now testing with solr settings autoCommit 15 seconds and autoSoftCommit
60 seconds, with dovecot sending no softCommits and a batch size of 500.

I've set up

    /usr/local/bin/doveadm index -A "*"

in crontab every 5 minutes, so that indexes stay mostly up to date and
the amount of mail not yet visible in the index when searches are done
is kept to a minimum.

The solr server is a small test virtual machine with 0.2 (shared) vCPU
and 0.6GB of memory and non-SSD storage. It can index around 2000 emails
per minute when there is no other activity. Average email size is about
45KB. I'm not indexing attachments.

John



Re: [PATCH] Re: Solr connection timeout hardwired to 60s

2019-04-14 Thread John Fawcett via dovecot
On 14/04/2019 16:04, Aki Tuomi via dovecot wrote:
>
>> On 14 April 2019 16:59 John Fawcett via dovecot <dovecot@dovecot.org> wrote:
>>
>>
>> On 13/04/2019 17:16, Shawn Heisey via dovecot wrote:
>>> On 4/13/2019 4:29 AM, John Fawcett via dovecot wrote:
>>>> If this value was made configurable people could set it to what they
>>>> want. However the underlying problem is likely on solr configuration.
>>> The Jetty that is included in Solr has its idle timeout set to 50
>>> seconds.  But in practice, I have not seen this timeout trigger ...
>>> and if the OP is seeing a 60 second timeout, then the 50 second idle
>>> timeout in Jetty must not be occurring.
>>> There may be a socket timeout configured on inter-server requests --
>>> distributed queries or the load balancing that SolrCloud does.  I can
>>> never remember whether this is the case by default.  I think it is.
>>>> If there is an issue on initial indexing, where you are not really
>>>> concerned about quick visibility but just getting things into the index
>>>> efficiently, a better approach would be for the dovecot plugin not to send
>>>> any commit or softCommit (or waitSearcher either) and that should speed
>>>> things up. You'd need to configure solr with a long autoSoftCommit
>>>> maxTime and a reasonable autoCommit maxTime, which you could then
>>>> reconfigure when the load was done.
>>>
>>> Solr ships with autoCommit set to 15 seconds and openSearcher set to
>>> false on the autoCommit.  The autoSoftCommit setting is not enabled by
>>> default, but depending on how the index was created, Solr might try to
>>> set autoSoftCommit to 3 seconds ... which is WAY too short.
>>> I will usually increase the autoCommit time to 60 seconds, just to
>>> reduce the amount of work that Solr is doing.  The autoSoftCommit
>>> time, if it is used, should be set to a reasonably long value ...
>>> values between two and five minutes would be good.  Attempting to use
>>> a very short autoSoftCommit time will usually lead to problems.
>>> This thread says that dovecot is sending explicit commits.  One thing
>>> that might be happening to exceed 60 seconds is an extremely long
>>> commit, which is usually caused by excessive cache autowarming, but
>>> might be related to insufficient memory.  The max heap setting on an
>>> out-of-the-box Solr install (5.0 and later) is 512MB.  That's VERY
>>> small, and it doesn't take much index data before a much larger heap
>>> is required.
>>> Thanks,
>>> Shawn
>> I looked into the code (version 2.3.5.1): the fts-solr plugin is not
>> sending softCommit every 1000 emails. Emails from a single folder are
>> batched in up to a maximum of 1000 emails per request, but the
>> softCommit gets sent once per mailbox folder at the end of all the
>> requests for that folder.
>>
>> I imagine that one of the reasons dovecot sends softCommits is that
>> without autoindex active, and even if mailboxes are periodically indexed
>> from cron, the last emails received will be indexed at the moment of the
>> search.  So while sending softCommit has the advantage of including
>> recent mails in searches, it means that softCommits are being done upon
>> user request. Frequency depends on user activity.
>>
>> Going back to the original problem: it seems the first advice to Peter
>> is to look into solr configuration, as others have said.
>>
>> From dovecot's point of view I can see the following as potentially useful
>> features:
>>
>> 1) a configurable batch size would make it possible to tune the number of
>> emails per request and help stay under the 60-second hard-coded http
>> request timeout. A configurable http timeout would be less useful, since
>> this will potentially run into other timeouts on the solr side.
>>
>> 2) the ability to turn off softCommits so as to have a more predictable
>> softCommit workload. In that case autoSoftCommit should be configured in
>> solr. To minimize the risk of recent emails not appearing in search
>> results, periodic indexing could be set up by cron.
>>
>> I've attached a patch, any comments are welcome (especially about
>> getting settings from the backend context).
>>
>> Example config
>>
>> plugin {
>>   fts = solr
>>   fts_solr = url=https://user:passw...@solr.example.com:443/solr/dovecot/ batch_size=500 no_soft_commit
>> }
>>
>> John
>
> Can you please open a pull request to https://github.com/dovecot/core ?
> ---
> Aki Tuomi

Done, thanks for considering it.

John



Re: Solr connection timeout hardwired to 60s

2019-04-14 Thread Peter Mogensen via dovecot


sorry... I got distracted halfway and forgot to put a meaningful
subject so the archive could figure out the thread. - resending.

On 4/14/19 4:04 PM, dovecot-requ...@dovecot.org wrote:

>> Solr ships with autoCommit set to 15 seconds and openSearcher set to
>> false on the autoCommit.  The autoSoftCommit setting is not enabled by
>> default, but depending on how the index was created, Solr might try to
>> set autoSoftCommit to 3 seconds ... which is WAY too short.

I just run with the default. 15s autoCommit and no autoSoftCommit

>> This thread says that dovecot is sending explicit commits.

I see explicit /update requests with softCommit and waitSearcher=true in a
tcpdump.

>> One thing
>> that might be happening to exceed 60 seconds is an extremely long
>> commit, which is usually caused by excessive cache autowarming, but
>> might be related to insufficient memory.  The max heap setting on an
>> out-of-the-box Solr install (5.0 and later) is 512MB.  That's VERY
>> small, and it doesn't take much index data before a much larger heap
>> is required.

I run with

SOLR_JAVA_MEM="-Xmx8g -Xms2g"

> I looked into the code (version 2.3.5.1):

This is 2.2.35. I must admit I haven't checked the source differences
from 2.3.x.

> I imagine that one of the reasons dovecot sends softCommits is that
> without autoindex active, and even if mailboxes are periodically indexed
> from cron, the last emails received will be indexed at the moment of the
> search.

I expect dovecot has to, because of its default behavior of only
bringing the index up to date just before a search. So it has to wait for
the index result to be available if any new mails have been indexed.

> 1) a configurable batch size would make it possible to tune the number of
> emails per request and help stay under the 60-second hard-coded http
> request timeout. A configurable http timeout would be less useful, since
> this will potentially run into other timeouts on the solr side.

Being able to configure it is great.
But I don't think it solves much. I recompiled with 100 as the batch size
and it still ended in timeouts.
Then I recompiled with a 10 min timeout and now I see all the batches
completing, with processing times mostly between 1 and 2 minutes
(so all would have failed).

To me it looks like Solr just takes too long to index. This is no
small machine. It's a 20 core Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
and for this test it's not doing anything else, so I'm a bit surprised
that even with only a few users this takes so long.

/Peter




Re: [PATCH] Re: Solr connection timeout hardwired to 60s

2019-04-14 Thread Aki Tuomi via dovecot

> On 14 April 2019 16:59 John Fawcett via dovecot <dovecot@dovecot.org> wrote:
>
> On 13/04/2019 17:16, Shawn Heisey via dovecot wrote:
>> On 4/13/2019 4:29 AM, John Fawcett via dovecot wrote:
>>> If this value was made configurable people could set it to what they
>>> want. However the underlying problem is likely on solr configuration.
>>
>> The Jetty that is included in Solr has its idle timeout set to 50
>> seconds.  But in practice, I have not seen this timeout trigger ...
>> and if the OP is seeing a 60 second timeout, then the 50 second idle
>> timeout in Jetty must not be occurring.
>>
>> There may be a socket timeout configured on inter-server requests --
>> distributed queries or the load balancing that SolrCloud does.  I can
>> never remember whether this is the case by default.  I think it is.
>>
>>> If there is an issue on initial indexing, where you are not really
>>> concerned about quick visibility but just getting things into the index
>>> efficiently, a better approach would be for the dovecot plugin not to send
>>> any commit or softCommit (or waitSearcher either) and that should speed
>>> things up. You'd need to configure solr with a long autoSoftCommit
>>> maxTime and a reasonable autoCommit maxTime, which you could then
>>> reconfigure when the load was done.
>>
>> Solr ships with autoCommit set to 15 seconds and openSearcher set to
>> false on the autoCommit.  The autoSoftCommit setting is not enabled by
>> default, but depending on how the index was created, Solr might try to
>> set autoSoftCommit to 3 seconds ... which is WAY too short.
>>
>> I will usually increase the autoCommit time to 60 seconds, just to
>> reduce the amount of work that Solr is doing.  The autoSoftCommit
>> time, if it is used, should be set to a reasonably long value ...
>> values between two and five minutes would be good.  Attempting to use
>> a very short autoSoftCommit time will usually lead to problems.
>>
>> This thread says that dovecot is sending explicit commits.  One thing
>> that might be happening to exceed 60 seconds is an extremely long
>> commit, which is usually caused by excessive cache autowarming, but
>> might be related to insufficient memory.  The max heap setting on an
>> out-of-the-box Solr install (5.0 and later) is 512MB.  That's VERY
>> small, and it doesn't take much index data before a much larger heap
>> is required.
>>
>> Thanks,
>> Shawn
>
> I looked into the code (version 2.3.5.1): the fts-solr plugin is not
> sending softCommit every 1000 emails. Emails from a single folder are
> batched in up to a maximum of 1000 emails per request, but the
> softCommit gets sent once per mailbox folder at the end of all the
> requests for that folder.
>
> I imagine that one of the reasons dovecot sends softCommits is that
> without autoindex active, and even if mailboxes are periodically indexed
> from cron, the last emails received will be indexed at the moment of the
> search.  So while sending softCommit has the advantage of including
> recent mails in searches, it means that softCommits are being done upon
> user request. Frequency depends on user activity.
>
> Going back to the original problem: it seems the first advice to Peter
> is to look into solr configuration, as others have said.
>
> From dovecot's point of view I can see the following as potentially useful
> features:
>
> 1) a configurable batch size would make it possible to tune the number of
> emails per request and help stay under the 60-second hard-coded http
> request timeout. A configurable http timeout would be less useful, since
> this will potentially run into other timeouts on the solr side.
>
> 2) the ability to turn off softCommits so as to have a more predictable
> softCommit workload. In that case autoSoftCommit should be configured in
> solr. To minimize the risk of recent emails not appearing in search
> results, periodic indexing could be set up by cron.
>
> I've attached a patch, any comments are welcome (especially about
> getting settings from the backend context).
>
> Example config
>
> plugin {
>   fts = solr
>   fts_solr =

[PATCH] Re: Solr connection timeout hardwired to 60s

2019-04-14 Thread John Fawcett via dovecot
On 13/04/2019 17:16, Shawn Heisey via dovecot wrote:
> On 4/13/2019 4:29 AM, John Fawcett via dovecot wrote:
>> If this value was made configurable people could set it to what they
>> want. However the underlying problem is likely on solr configuration.
>
> The Jetty that is included in Solr has its idle timeout set to 50
> seconds.  But in practice, I have not seen this timeout trigger ...
> and if the OP is seeing a 60 second timeout, then the 50 second idle
> timeout in Jetty must not be occurring.
>
> There may be a socket timeout configured on inter-server requests --
> distributed queries or the load balancing that SolrCloud does.  I can
> never remember whether this is the case by default.  I think it is.
>
>> If there is an issue on initial indexing, where you are not really
>> concerned about quick visibility but just getting things into the index
>> efficiently, a better approach would be for the dovecot plugin not to send
>> any commit or softCommit (or waitSearcher either) and that should speed
>> things up. You'd need to configure solr with a long autoSoftCommit
>> maxTime and a reasonable autoCommit maxTime, which you could then
>> reconfigure when the load was done.
>
> Solr ships with autoCommit set to 15 seconds and openSearcher set to
> false on the autoCommit.  The autoSoftCommit setting is not enabled by
> default, but depending on how the index was created, Solr might try to
> set autoSoftCommit to 3 seconds ... which is WAY too short.
>
> I will usually increase the autoCommit time to 60 seconds, just to
> reduce the amount of work that Solr is doing.  The autoSoftCommit
> time, if it is used, should be set to a reasonably long value ...
> values between two and five minutes would be good.  Attempting to use
> a very short autoSoftCommit time will usually lead to problems.
>
> This thread says that dovecot is sending explicit commits.  One thing
> that might be happening to exceed 60 seconds is an extremely long
> commit, which is usually caused by excessive cache autowarming, but
> might be related to insufficient memory.  The max heap setting on an
> out-of-the-box Solr install (5.0 and later) is 512MB.  That's VERY
> small, and it doesn't take much index data before a much larger heap
> is required.
>
> Thanks,
> Shawn

I looked into the code (version 2.3.5.1): the fts-solr plugin is not
sending softCommit every 1000 emails. Emails from a single folder are
batched in up to a maximum of 1000 emails per request, but the
softCommit gets sent once per mailbox folder at the end of all the
requests for that folder.

I imagine that one of the reasons dovecot sends softCommits is that
without autoindex active, and even if mailboxes are periodically indexed
from cron, the last emails received will be indexed at the moment of the
search.  So while sending softCommit has the advantage of including
recent mails in searches, it means that softCommits are being done upon
user request. Frequency depends on user activity.

Going back to the original problem: it seems the first advice to Peter
is to look into solr configuration, as others have said.

From dovecot's point of view I can see the following as potentially useful
features:

1) a configurable batch size would make it possible to tune the number of
emails per request and help stay under the 60-second hard-coded http
request timeout. A configurable http timeout would be less useful, since
this will potentially run into other timeouts on the solr side.

2) the ability to turn off softCommits so as to have a more predictable
softCommit workload. In that case autoSoftCommit should be configured in
solr. To minimize the risk of recent emails not appearing in search
results, periodic indexing could be set up by cron.

I've attached a patch, any comments are welcome (especially about
getting settings from the backend context).

Example config

plugin {
  fts = solr
  fts_solr = url=https://user:passw...@solr.example.com:443/solr/dovecot/ batch_size=500 no_soft_commit
}

John

--- src/plugins/fts-solr/fts-solr-plugin.h.orig	2019-04-14 15:12:07.694289402 +0200
+++ src/plugins/fts-solr/fts-solr-plugin.h	2019-04-14 14:04:17.213939414 +0200
@@ -12,8 +12,10 @@
 
 struct fts_solr_settings {
 	const char *url, *default_ns_prefix;
+	unsigned int batch_size;
 	bool use_libfts;
 	bool debug;
+	bool no_soft_commit;
 };
 
 struct fts_solr_user {
--- src/plugins/fts-solr/fts-solr-plugin.c.orig	2019-04-14 11:41:03.591782439 +0200
+++ src/plugins/fts-solr/fts-solr-plugin.c	2019-04-14 14:37:46.059433864 +0200
@@ -10,6 +10,8 @@
 #include "fts-solr-plugin.h"
 
 
+#define DEFAULT_SOLR_BATCH_SIZE 1000
+
 const char *fts_solr_plugin_version = DOVECOT_ABI_VERSION;
 struct http_client *solr_http_client = NULL;
 
@@ -37,6 +39,10 @@
 		} else if (str_begins(*tmp, "default_ns=")) {
 			set->default_ns_prefix =
 				p_strdup(user->pool, *tmp + 11);
+		} else if 

Re: Solr connection timeout hardwired to 60s

2019-04-13 Thread Shawn Heisey via dovecot

On 4/13/2019 4:29 AM, John Fawcett via dovecot wrote:

> If this value was made configurable people could set it to what they
> want. However the underlying problem is likely on solr configuration.


The Jetty that is included in Solr has its idle timeout set to 50 
seconds.  But in practice, I have not seen this timeout trigger ... and 
if the OP is seeing a 60 second timeout, then the 50 second idle timeout 
in Jetty must not be occurring.


There may be a socket timeout configured on inter-server requests -- 
distributed queries or the load balancing that SolrCloud does.  I can 
never remember whether this is the case by default.  I think it is.



> If there is an issue on initial indexing, where you are not really
> concerned about quick visibility but just getting things into the index
> efficiently, a better approach would be for the dovecot plugin not to send
> any commit or softCommit (or waitSearcher either) and that should speed
> things up. You'd need to configure solr with a long autoSoftCommit
> maxTime and a reasonable autoCommit maxTime, which you could then
> reconfigure when the load was done.


Solr ships with autoCommit set to 15 seconds and openSearcher set to 
false on the autoCommit.  The autoSoftCommit setting is not enabled by 
default, but depending on how the index was created, Solr might try to 
set autoSoftCommit to 3 seconds ... which is WAY too short.


I will usually increase the autoCommit time to 60 seconds, just to 
reduce the amount of work that Solr is doing.  The autoSoftCommit time, 
if it is used, should be set to a reasonably long value ... values 
between two and five minutes would be good.  Attempting to use a very 
short autoSoftCommit time will usually lead to problems.


This thread says that dovecot is sending explicit commits.  One thing 
that might be happening to exceed 60 seconds is an extremely long 
commit, which is usually caused by excessive cache autowarming, but 
might be related to insufficient memory.  The max heap setting on an 
out-of-the-box Solr install (5.0 and later) is 512MB.  That's VERY 
small, and it doesn't take much index data before a much larger heap is 
required.


Thanks,
Shawn


Re: Solr connection timeout hardwired to 60s

2019-04-13 Thread John Fawcett via dovecot
On 12/04/2019 12:09, Peter Mogensen via dovecot wrote:
> Looking further at tcpdumps of the Dovecot->Solr traffic and Solr
> metrics it doesn't seem like there's anything suspicious apart from the
> TCP windows running full and Dovecot backing off ... until it times out
> and closes the connection.
>
> From my understanding of how Dovecot operates towards Solr it will flush
> ~1000 documents towards Solr in /update requests until it has traversed
> the mailbox (let's say 20,000 mails), doing softCommits after each.
>
> But is it really reasonable for Dovecot to expect that no request will
> take Solr more than 60s to process?
> It doesn't seem like my Solr can handle that, although it does process
> documents, and it clears pending documents reasonably fast after
> Dovecot closes the connection.
>
> On the surface it looks like Dovecot is too impatient.
>
> /Peter

If this value was made configurable people could set it to what they
want. However the underlying problem is likely on solr configuration.

Is this a problem only on initial indexing or an ongoing problem after
initial indexing?

The parameters that the solr plugin is using are designed to make
documents visible to searches quickly.

If there is an issue on initial indexing, where you are not really
concerned about quick visibility but just getting things into the index
efficiently, a better approach would be for the dovecot plugin not to send
any commit or softCommit (or waitSearcher either) and that should speed
things up. You'd need to configure solr with a long autoSoftCommit
maxTime and a reasonable autoCommit maxTime, which you could then
reconfigure when the load was done.

If you're using dovecot built from source code it should be possible to
test that with a minor modification of the code in fts-backend-solr.c.

John




Re: Solr connection timeout hardwired to 60s

2019-04-12 Thread Peter Mogensen via dovecot


Looking further at tcpdumps of the Dovecot->Solr traffic and Solr
metrics it doesn't seem like there's anything suspicious apart from the
TCP windows running full and Dovecot backing off ... until it times out
and closes the connection.

From my understanding of how Dovecot operates towards Solr it will flush
~1000 documents towards Solr in /update requests until it has traversed
the mailbox (let's say 20,000 mails), doing softCommits after each.

But is it really reasonable for Dovecot to expect that no request will
take Solr more than 60s to process?
It doesn't seem like my Solr can handle that, although it does process
documents, and it clears pending documents reasonably fast after
Dovecot closes the connection.

On the surface it looks like Dovecot is too impatient.

/Peter

On 4/10/19 6:25 PM, Peter Mogensen wrote:
> 
> 
> On 4/4/19 6:57 PM, Peter Mogensen wrote:
>>
>>
>> On 4/4/19 6:47 PM, dovecot-requ...@dovecot.org wrote:
>>> For a typical Solr index, 60 seconds is an eternity.  Most people aim
>>> for query times of 100 milliseconds or less, and they often achieve
>>> that goal.
>>
>> I'm pretty sure I get these while indexing, not querying.
>>
>> Apr 04 16:44:50 host dovecot[114690]: indexer-worker(m...@example.com):
>> Error: fts_solr: Indexing failed: Request timed out (Request queued
>> 66.015 secs ago, 1 attempts in 66.005 secs, 63.146 in http ioloop, 0.000
>> in other ioloops, connected 94.903 secs ago)
> 
> Doing a TCP dump on indexing operations that consistently fail, I see
> that there are a lot of softCommits that never get an HTTP answer:
> 
> ==
> POST /solr/dovebody/update HTTP/1.1
> Host: localhost:8983
> Date: Wed, 10 Apr 2019 14:22:29 GMT
> Expect: 100-continue
> Content-Length: 47
> Connection: Keep-Alive
> Content-Type: text/xml
> 
> HTTP/1.1 100 Continue
> 
> 
> 





Re: Solr connection timeout hardwired to 60s

2019-04-10 Thread Peter Mogensen via dovecot



On 4/4/19 6:57 PM, Peter Mogensen wrote:
> 
> 
> On 4/4/19 6:47 PM, dovecot-requ...@dovecot.org wrote:
>> For a typical Solr index, 60 seconds is an eternity.  Most people aim
>> for query times of 100 milliseconds or less, and they often achieve
>> that goal.
> 
> I'm pretty sure I get these while indexing, not querying.
> 
> Apr 04 16:44:50 host dovecot[114690]: indexer-worker(m...@example.com):
> Error: fts_solr: Indexing failed: Request timed out (Request queued
> 66.015 secs ago, 1 attempts in 66.005 secs, 63.146 in http ioloop, 0.000
> in other ioloops, connected 94.903 secs ago)

Doing a TCP dump on indexing operations that consistently fail, I see
that there are a lot of softCommits that never get an HTTP answer:

==
POST /solr/dovebody/update HTTP/1.1
Host: localhost:8983
Date: Wed, 10 Apr 2019 14:22:29 GMT
Expect: 100-continue
Content-Length: 47
Connection: Keep-Alive
Content-Type: text/xml

HTTP/1.1 100 Continue





... in contrast to the first softCommit on the connection:


POST /solr/dovebody/update HTTP/1.1
Host: localhost:8983
Date: Wed, 10 Apr 2019 14:20:53 GMT
Expect: 100-continue
Content-Length: 47
Connection: Keep-Alive
Content-Type: text/xml

HTTP/1.1 100 Continue

HTTP/1.1 200 OK
Content-Type: application/xml; charset=UTF-8
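Content-Length: 156

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">37</int>
</lst>
</response>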
==

The missing softCommit responses seem to start right after the last
added document:
==

0

HTTP/1.1 200 OK
Content-Type: application/xml; charset=UTF-8
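Content-Length: 156

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">12</int>
</lst>
</response>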
POST /solr/dovebody/update HTTP/1.1
Host: localhost:8983
Date: Wed, 10 Apr 2019 14:22:29 GMT
Expect: 100-continue
Content-Length: 47
Connection: Keep-Alive
Content-Type: text/xml

HTTP/1.1 100 Continue


===

... and then the rest of the TCP dump doesn't get responses to
softCommit POSTs

/Peter


Re: Solr connection timeout hardwired to 60s

2019-04-04 Thread Shawn Heisey via dovecot

On 4/4/2019 6:42 PM, M. Balridge via dovecot wrote:

> What is a general rule of thumb for RAM and SSD disk requirements as a
> fraction of indexed document hive size to keep query performance at 200ms or
> less? How do people deal with the Java GC world-stoppages, other than simply
> doubling or tripling every instance?


There's no hard and fast rule for exactly how much memory you need for a 
search engine.  Some installs work well with half the index cached, 
others require more, some require less.


For ideal performance, you should have enough memory over and above your 
program requirements to cache the entire index.  That can be problematic 
with indexes that are hundreds of gigabytes, or even terabytes. 
Achieving the ideal is rarely necessary, though.


With a large enough heap, it is simply impossible to avoid long 
stop-the-world GC.  With proper tuning, those full garbage collections 
can happen far less frequently.  I've got another page about that.


https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr

To handle extremely large indexes with good performance, I would 
recommend many servers running SolrCloud, and a sharded index.  That way 
each individual server will not be required to handle terabytes of data.
This can get very expensive very quickly.  You will also need a load 
balancer, to eliminate single points of failure.



> I am wondering how well alternatives to Solr work in these situations
> (ElasticSearch, Xapian, and any others I may have missed).


Assuming they are configured as similarly as possible, ElasticSearch and 
Solr will have nearly identical requirements, and perform similarly to 
each other.  They are both Lucene-based, and it is Lucene that primarily 
drives the requirements.  I know nothing about any other solutions.


With the extremely large index you have described, memory will be your 
Achilles' heel no matter what solution you find.


It is not Java that needs the extreme amounts of memory for very large 
indexes.  It is the operating system -- the disk cache.  You might also 
need a fairly large heap, but the on-disk size of the index will have 
less of an impact on heap requirements than the number of documents in 
the index.


Thanks,
Shawn


Re: Solr connection timeout hardwired to 60s

2019-04-04 Thread M. Balridge via dovecot


> I'm a denizen of the solr-u...@lucene.apache.org mailing list.
> [...]
> Here's a wiki page that I wrote about that topic.  This wiki is going
> away next month, but for now you can still access it:
> 
> https://wiki.apache.org/solr/SolrPerformanceProblems

That's a great resource, Shawn.

I am about to put together a test case to provide a comprehensive FTS setup
around Dovecot with a goal towards exposing proximity keyword searching, with
email silos containing tens of terabytes (most of the "bulk" is represented by
attachments, each of which gets processed down to plaintext, if possible).
Figure thousands of users with decades of email (80,000 to 750,000 emails per
user).

My main background is in software engineering (C/C++/Python/Assembler), but I
have been forced into system admin tasks during many stretches of my work. I
do vividly remember the tedium of dealing with Java and GC, tuning it to avoid
stalls, and its ravenous appetite for RAM. 

It looks like those problems are still with us, many versions later.  For
corporations with infinite budgets, throwing lots of crazy money at the
problem is "fine" (>1TB RAM, all PCIe SSDs, etc), but I am worried that I will
be shoved forcefully into a wall of having to spend a fortune just to keep FTS
performing reasonably well before I even get to the 10,000 user mark.

I realise the only way to keep performance reasonable is to heavily shard the
index database, but I am concerned about how well the process works in
practice without needing a great deal of sysadmin hand-holding. I would
ideally prefer the decisions of how/where to shard be based on
volume/heuristics than something that is done manually. I realise that a human
will be necessary to add more hardware to the pools, but what are my options
for scaling the system by orders of magnitude?
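Here is the scale in this paragraph put into numbers. Lucene, which Solr is built on, has a hard limit of 2,147,483,519 documents per index (IndexWriter.MAX_DOCS). A comfortable per-shard size is usually much smaller and depends on the hardware. The sketch below computes the minimum shard count under two assumptions: one Solr document per message, and a per-shard target that you choose yourself. It is not a sizing recommendation.

```c
#include <stdint.h>

/* Hard per-index document cap in Lucene (IndexWriter.MAX_DOCS). */
#define LUCENE_MAX_DOCS 2147483519ULL

/* Minimum number of shards so that no shard holds more than
 * docs_per_shard documents, assuming one Solr document per message.
 * docs_per_shard is clamped to the Lucene hard limit.
 * Returns 0 if docs_per_shard is 0. */
static uint64_t min_shards(uint64_t users, uint64_t msgs_per_user,
                           uint64_t docs_per_shard)
{
    uint64_t total;

    if (docs_per_shard == 0)
        return 0;
    if (docs_per_shard > LUCENE_MAX_DOCS)
        docs_per_shard = LUCENE_MAX_DOCS;
    total = users * msgs_per_user;
    return (total + docs_per_shard - 1) / docs_per_shard;  /* ceiling division */
}
```

The upper figures in this message are 10,000 users at 750,000 messages each, or 7.5 billion documents. At that scale the hard Lucene limit alone forces at least 4 shards. An arbitrary illustrative target of 100 million documents per shard gives 75.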

What is a general rule of thumb for RAM and SSD requirements, as a
fraction of the indexed document hive size, to keep query times at 200ms or
less? How do people deal with Java GC stop-the-world pauses, other than simply
doubling or tripling every instance?

I am wondering how well alternatives to Solr work in these situations
(ElasticSearch, Xapian, and any others I may have missed).

Regards,

=M=





Re: Solr connection timeout hardwired to 60s

2019-04-04 Thread Peter Mogensen via dovecot



On 4/4/19 6:47 PM, dovecot-requ...@dovecot.org wrote:
> For a typical Solr index, 60 seconds is an eternity.  Most people aim
> for query times of 100 milliseconds or less, and they often achieve
> that goal.

I'm pretty sure I get these while indexing, not querying.

Apr 04 16:44:50 host dovecot[114690]: indexer-worker(m...@example.com):
Error: fts_solr: Indexing failed: Request timed out (Request queued
66.015 secs ago, 1 attempts in 66.005 secs, 63.146 in http ioloop, 0.000
in other ioloops, connected 94.903 secs ago)

/Peter
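One way to make the 60-second value configurable would be an extra key=value option in the fts_solr setting. That would replace editing the `http_set.request_timeout_msecs = 60*1000;` line quoted in this thread from solr-connection.c and rebuilding. The sketch below parses such a setting string. `timeout=` (in seconds) is a hypothetical option name, not an existing Dovecot setting, and the 3600-second cap is arbitrary.

```c
#include <stdlib.h>
#include <string.h>

#define SOLR_DEFAULT_TIMEOUT_MSECS (60 * 1000)  /* the current hardwired value */
#define SOLR_MAX_TIMEOUT_SECS 3600              /* arbitrary sanity cap */

/* Return the request timeout in milliseconds from a space-separated
 * "key=value" settings string such as
 *   "url=http://127.0.0.1:8983/solr/dovecot/ timeout=180".
 * "timeout" (seconds) is a HYPOTHETICAL option used for illustration;
 * if it is absent or malformed, the 60-second default is kept. */
static unsigned int solr_timeout_msecs(const char *settings)
{
    const char *p = settings;

    while (p != NULL && *p != '\0') {
        while (*p == ' ')
            p++;
        if (strncmp(p, "timeout=", 8) == 0) {
            const char *num = p + 8;
            char *end;
            unsigned long secs;

            if (*num < '0' || *num > '9')
                break;                  /* empty or signed value */
            secs = strtoul(num, &end, 10);
            if ((*end != ' ' && *end != '\0') ||
                secs == 0 || secs > SOLR_MAX_TIMEOUT_SECS)
                break;                  /* trailing junk or out of range */
            return (unsigned int)secs * 1000;
        }
        p = strchr(p, ' ');             /* advance to the next option */
    }
    return SOLR_DEFAULT_TIMEOUT_MSECS;
}
```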


Re: Solr connection timeout hardwired to 60s

2019-04-04 Thread Shawn Heisey via dovecot

On 4/4/2019 2:21 AM, Peter Mogensen via dovecot wrote:

What's the recommended way to handle timeouts on large mailboxes, given
the hardwired request timeout of 60s in solr-connection.c:

http_set.request_timeout_msecs = 60*1000;


I'm a denizen of the solr-u...@lucene.apache.org mailing list.

For a typical Solr index, 60 seconds is an eternity.  Most people aim 
for query times of 100 milliseconds or less, and they often achieve that 
goal.


If you have an index where queries really are taking longer than 60 
seconds, you're most likely going to need to get better hardware for 
Solr.  Memory is the resource that usually has the greatest impact on 
Solr performance.  Putting the index on SSD can help, but memory will 
help more.


Here's a wiki page that I wrote about that topic.  This wiki is going 
away next month, but for now you can still access it:


https://wiki.apache.org/solr/SolrPerformanceProblems

There's a section in that wiki page about asking for help on performance 
issues.  It describes how to create a particular process listing for a 
screenshot.  If you can get that screenshot and share it using a file 
sharing site (dropbox is usually a good choice), I may be able to offer 
some insight.


Thanks,
Shawn


Re: Solr connection timeout hardwired to 60s

2019-04-04 Thread Daniel Lange via dovecot
Hi Shawn

On 04.04.19 at 16:12, Shawn Heisey via dovecot wrote:
> On 4/4/2019 2:21 AM, Peter Mogensen via dovecot wrote:
> Here's a wiki page that I wrote about that topic.  This wiki is going 
> away next month, but for now you can still access it:
> 
> https://wiki.apache.org/solr/SolrPerformanceProblems

https://web.archive.org/web/20190404143817/https://wiki.apache.org/solr/SolrPerformanceProblems

That one will last longer :).

Best
Daniel


Re: Solr - complete setup (update)

2019-01-29 Thread Joan Moreau via dovecot

On 2019-01-30 07:33, Stephan Bosch wrote:


(forgot to CC mailing list)

On 26/01/2019 at 20:07, Joan Moreau via dovecot wrote: 


*- Bugs so far*

-> Line 620 of the fts_solr Dovecot plugin: the header size is calculated 
incorrectly (a "huge header" warning for a simple email, which kills the indexing of that 
email, so basically MOST emails are affected, as the calculation is wrong). *You can check that 
regularly in the Dovecot log file. My guess is that Unicode is not properly 
handled here.*


Does this happen with specific messages? Do you have a sample message
for me? I don't see how Unicode could cause this. 


My only guess is that it relies on some strlen(), which is of course wrong
in the case of Unicode emails. This is just a guess. 


But grep for "huge" in the Dovecot log of a busy server to find
examples. 


(Sorry, I switched to Xapian, as Solr was causing too much trouble for
my server, so I have no more concrete examples.) 
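A note on the strlen() guess: strlen() counts bytes, and UTF-8 encodes a character in one to four bytes. A byte count can therefore exceed the character count by at most a factor of four. The small C sketch below contrasts the two. The helper name is illustrative, and the sketch assumes valid UTF-8 input.

```c
#include <stddef.h>
#include <string.h>

/* Count Unicode code points in a NUL-terminated UTF-8 string by
 * counting every byte that is NOT a continuation byte (10xxxxxx).
 * Assumes valid UTF-8; strlen() on the same string returns bytes. */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For "h\xc3\xa9llo" ("héllo"), strlen() returns 6 while the code-point count is 5.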


-> The UID returned by Solr should be treated as a STRING (this may be the source of 
the "out of bounds" errors in Dovecot's fts_solr, as "long" is not enough)

*This is highly visible in Solr's schema.xml. Switching it to "long" in 
schema.xml produces plenty of errors.*


I cannot reproduce this so far (see modified schema below). In a simple
test I just get the desired results and no errors logged. 


I got this with large mailboxes (where the UID seems not to be acceptable to
Solr). The fault is not on the Dovecot side but in Solr: the UID(s) returned
for a search are garbage instead of proper values. Storing it as a
string solves this.
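Some background for judging the "long is not enough" theory. IMAP UIDs are unsigned 32-bit values (RFC 3501 allows 1 to 4294967295). A Solr "long" field holds a signed 64-bit integer, so every valid UID fits in it. The sketch below strictly validates a UID string, such as one returned by Solr. The function name is illustrative and not part of Dovecot's API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse an IMAP UID from a decimal string (e.g. a Solr "uid" field value).
 * Valid UIDs are 1..4294967295 (RFC 3501 nz-number). Returns false for
 * empty input, a sign or leading whitespace, trailing junk, zero, or
 * out-of-range values; on success stores the UID in *uid_r. */
static bool parse_imap_uid(const char *s, uint32_t *uid_r)
{
    char *end;
    unsigned long long v;

    if (s == NULL || *s < '0' || *s > '9')
        return false;
    errno = 0;
    v = strtoull(s, &end, 10);
    if (errno == ERANGE || *end != '\0' || v == 0 || v > UINT32_MAX)
        return false;
    *uid_r = (uint32_t)v;
    return true;
}
```

A parser like this rejects "0", "-1", "12abc" and "4294967296", so a garbage value shows up as an error rather than as a UID.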


-> Java errors: a lot of it is nonsense to me, as I am not a Java expert. But with 
increased memory it no longer seems to crash, even though it complains quite a lot in the 
logs

Can you elaborate on the errors you have seen so far? When do these happen? How 
can I reproduce them?

*Honestly, I have no clue what the problems are. I just increased the memory of 
the JVM and the systems stopped crashing. Log files are huge anyway.*


What errors do you see? I see only INFO entries in my
/var/solr/logs/solr.log. Looks like Solr is pretty verbose by default
(lots of INFO output), but there must be a way to reduce that. 

I deleted Solr, so I have no more logs. Maybe someone else can tell. 




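[schema.xml: the XML markup was lost when this message was archived; only the text "id" (presumably the uniqueKey) survives.]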

Re: Solr - complete setup (update)

2019-01-29 Thread Stephan Bosch

(forgot to CC mailing list)

On 26/01/2019 at 20:07, Joan Moreau via dovecot wrote:



*- Bugs so far*

-> Line 620 of the fts_solr Dovecot plugin: the header size is 
calculated incorrectly (a "huge header" warning for a simple email, 
which kills the indexing of that email, so basically MOST 
emails are affected, as the calculation is wrong)
*You can check that regularly in the Dovecot log file. My guess is that 
Unicode is not properly handled here.*


Does this happen with specific messages? Do you have a sample message 
for me? I don't see how Unicode could cause this.




-> The UID returned by Solr should be treated as a STRING (this 
may be the source of the "out of bounds" errors in Dovecot's 
fts_solr, as "long" is not enough)
*This is highly visible in Solr's schema.xml. Switching it to 
"long" in schema.xml produces plenty of errors.*


I cannot reproduce this so far (see modified schema below). In a simple 
test I just get the desired results and no errors logged.




-> Java errors: a lot of it is nonsense to me, as I am not a Java expert. 
But with increased memory it no longer seems to crash, even though it 
complains quite a lot in the logs


Can you elaborate on the errors you have seen so far? When do these 
happen? How can I reproduce them?


*Honestly, I have no clue what the problems are. I just increased the 
memory of the JVM and the systems stopped crashing. Log files are huge 
anyway.*


What errors do you see? I see only INFO entries in my 
/var/solr/logs/solr.log. Looks like Solr is pretty verbose by default 
(lots of INFO output), but there must be a way to reduce that.


Regards,

Stephan.




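[Modified schema.xml: the XML markup was lost when this message was archived. Surviving fragments: the text "id" (presumably the uniqueKey); a field type with positionIncrementGap="0"; a text field type with autoGeneratePhraseQueries="true" positionIncrementGap="100" whose analyzer includes a filter with generateNumberParts="1" splitOnCaseChange="1" generateWordParts="1" splitOnNumerics="1" catenateAll="1" catenateWords="1" preserveOriginal="1"; a second field type with autoGeneratePhraseQueries="true"; and several fields with stored="true".]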


Re: Solr - complete setup (update)

2019-01-26 Thread Joan Moreau via dovecot

*- Installation:*

-> Create a clean install using the defaults (at least in the Arch Linux package), and run 
"sudo -u solr solr create -c dovecot". The config files are then in 
/opt/solr/server/solr/dovecot/conf and the data files in /opt/solr/server/solr/dovecot/data


On my system (Debian) these directories are wildly different (e.g. data
is under /var), but other than that, this information is OK.

Used this as a side-reference for Debian installation:
https://tecadmin.net/install-apache-solr-on-debian/

Accessed http://solr-host.tld:8983/solr/ to check whether all is OK. 


Make sure you have a dedicated dovecot instance (not the default instance),
created with the command below: 

solr create -c dovecot (or whatever name) 


Weirdly, rescan returns immediately here. When I perform `doveadm index INBOX` 
for my test user, I do see a lot of fts and HTTP activity.


The Solr plugin is not fully implemented; the refresh and rescan functions are
missing: 


https://github.com/dovecot/core/blob/master/src/plugins/fts-solr/fts-backend-solr.c


static int fts_backend_solr_refresh(struct fts_backend *backend
				    ATTR_UNUSED)
{
	return 0;
}

static int fts_backend_solr_rescan(struct fts_backend *backend)
{
	/* FIXME: proper rescan needed. for now we'll just reset the
	   last-uids */
	return fts_backend_reset_last_uids(backend);
}


*- Bugs so far*

-> Line 620 of the fts_solr Dovecot plugin: the header size is calculated incorrectly 
(a "huge header" warning for a simple email, which kills the indexing of that 
email, so basically MOST emails are affected, as the calculation is wrong)


You can check that regularly in the Dovecot log file. My guess is that
Unicode is not properly handled here. 


-> The UID returned by Solr should be treated as a STRING (this may be the source of 
the "out of bounds" errors in Dovecot's fts_solr, as "long" is not enough)


This is highly visible in Solr's schema.xml. Switching it to "long"
in schema.xml produces plenty of errors. 

-> Java errors: a lot of it is nonsense to me, as I am not a Java expert. But with increased memory it no longer seems to crash, even though it complains quite a lot in the logs 


Can you elaborate on the errors you have seen so far? When do these happen? How 
can I reproduce them?


Honestly, I have no clue what the problems are. I just increased the
memory of the JVM and the systems stopped crashing. Log files are huge
anyway.

Re: Solr - complete setup (update)

2019-01-26 Thread Stephan Bosch




On 26/01/2019 at 15:24, Hendrik Boom wrote:

On Sat, Jan 26, 2019 at 01:44:16PM +0100, Stephan Bosch wrote:

Hi Joan,

On 14/01/2019 at 07:44, Joan Moreau via dovecot wrote:

Hi Stephan,

What's up with that ?

Thank you so much

On 2019-01-05 02:04, Stephan Bosch wrote:

Debian does something weird here. It doesn't use an explicit systemd unit.
It is generated from the SysV init file. I ended up setting the ulimits in
/etc/security/limits.conf for user solr.

Please make sure the changes you make don't make your Debian package
*require* systemd.  There are Debian-derived distros that avoid systemd.


Don't worry, I am not working on packaging this. I just want to know 
what the problems are and how these can be solved, so that we can update 
the wiki.


Regards,

Stephan.


Re: Solr - complete setup (update)

2019-01-26 Thread Hendrik Boom
On Sat, Jan 26, 2019 at 01:44:16PM +0100, Stephan Bosch wrote:
> Hi Joan,
> 
> On 14/01/2019 at 07:44, Joan Moreau via dovecot wrote:
> > 
> > Hi Stephan,
> > 
> > What's up with that ?
> > 
> > Thank you so much
> > 
> > On 2019-01-05 02:04, Stephan Bosch wrote:
> > 
> > > Hi,
> > > 
> > > On 04/01/2019 at 05:36, Joan Moreau via dovecot wrote:
> > > > 
...
...
> > > > 
> > > > -> The systemd unit shall specify high ulimit for files and proc
> > > > (see below)
> 
> Debian does something weird here. It doesn't use an explicit systemd unit.
> It is generated from the SysV init file. I ended up setting the ulimits in
> /etc/security/limits.conf for user solr.

Please make sure the changes you make don't make your Debian package 
*require* systemd.  There are Debian-derived distros that avoid systemd.

-- hendrik


Re: Solr - complete setup (update)

2019-01-26 Thread Stephan Bosch

Hi Joan,

On 14/01/2019 at 07:44, Joan Moreau via dovecot wrote:


Hi Stephan,

What's up with that ?

Thank you so much

On 2019-01-05 02:04, Stephan Bosch wrote:


Hi,

On 04/01/2019 at 05:36, Joan Moreau via dovecot wrote:


Hi

This is the summary of my work with SOLR-Dovecot, in my *quest to 
reproduce the previously excellent work of fts_squat*



@Aki : Based on the time I have spent on this, I would love to see 
you updating the Wiki with those improvements, and adding my name 
somewhere


@All : Hope it helps








*- Installation:*

-> Create a clean install using the defaults (at least in the 
Arch Linux package), and run "sudo -u solr solr create -c dovecot". 
The config files are then in /opt/solr/server/solr/dovecot/conf and the 
data files in /opt/solr/server/solr/dovecot/data


On my system (Debian) these directories are wildly different (e.g. data 
is under /var), but other than that, this information is OK.


Used this as a side-reference for Debian installation: 
https://tecadmin.net/install-apache-solr-on-debian/


Accessed http://solr-host.tld:8983/solr/ to check whether all is OK.



-> In /opt/solr/server/solr/dovecot/conf/solrconfig.xml:

 * around line 313, change false to 
true


 * around line 147, set 
2000 (or above)


 * around line 696 : uncomment hdr

 * around line 1127, before class="solr.UUIDUpdateProcessorFactory" name="uuid"/>, add 



 * around line 1161, delete the whole class="solr.AddSchemaFieldsUpdateProcessorFactory" 
name="add-schema-fields">


    * around line 1192, remove the whole 
... />


Applied these changes. We should probably provide an example config file 
on the Wiki that incorporates all this, or maybe a diff.


We also need to evaluate what the merit of all of this is. I did 
something similar in my previous effort, but it was all based on getting 
an error from Solr and then removing that section of the config file 
with the assumption it wasn't needed. So far, I have little clue what 
these things are and why these things are enabled by default. As I said 
in an earlier mail, there is an option to leave some of this cruft out 
at backend initialization, but I haven't tried that yet.




-> Remove /opt/solr/server/solr/dovecot/conf/managed-schema

-> Replace "schema.xml" with the one below to reproduce fts_squat 
behavior (equivalent to "fts_squat = partial=3 full=25" in 
dovecot.conf) (note: such a huge amount of trouble to replace a 
single-line setting, anyway...)


Did that too.
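To get a sense of what substring matching costs in index size, assume the goal is that any 3-to-25-character fragment of a word is searchable. In Solr that is usually done with an n-gram filter such as NGramFilterFactory with minGramSize=3 and maxGramSize=25. Whether this matches fts_squat's partial/full semantics exactly is an assumption. Under it, each word expands into many index tokens. A C sketch counting them:

```c
#include <stddef.h>

/* Number of n-grams with min_n <= n <= max_n that a word of word_len
 * characters expands into, i.e. the tokens an n-gram filter would emit.
 * For each n, a word of length L yields L - n + 1 grams when L >= n. */
static size_t ngram_count(size_t word_len, size_t min_n, size_t max_n)
{
    size_t total = 0;

    for (size_t n = min_n; n <= max_n && n <= word_len; n++)
        total += word_len - n + 1;
    return total;
}
```

A 10-character word yields 36 tokens and a 25-character one yields 276. That is one reason substring indexes grow large.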



-> Move /opt/solr/server/solr (or the data subfolder) to a partition 
with plenty of *space*, ideally ext4 or a faster file system (it looks like 
Solr does not consider using a simple MySQL database, which would make 
sense to avoid all the fuss and move to a non-Java setup, 
but that is another story)


Skipped that.


-> Config of dovecot.conf is as below


I also enabled debug for fts_solr.



-> The systemd unit shall specify high ulimit for files and proc 
(see below)


Debian does something weird here. It doesn't use an explicit systemd 
unit. It is generated from the SysV init file. I ended up setting the 
ulimits in /etc/security/limits.conf for user solr.
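Whichever mechanism sets these limits (LimitNOFILE/LimitNPROC in a systemd unit, or /etc/security/limits.conf), it is worth verifying that they took effect. The sketch below uses getrlimit(). The value 65000 is the one used in this thread. RLIMIT_NPROC is not in POSIX, but it is available on Linux and the BSDs. The check reports the limits of the process that calls it, so it has to run in the same environment Solr is started from.

```c
#include <stdbool.h>
#include <sys/resource.h>

/* True if a soft limit is at least `need` (RLIM_INFINITY always passes). */
static bool limit_ok(rlim_t cur, rlim_t need)
{
    return cur == RLIM_INFINITY || cur >= need;
}

/* Check the calling process's soft limits for open files and for
 * processes against `need`. Returns false if either is too low or
 * cannot be read. */
static bool solr_limits_ok(rlim_t need)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || !limit_ok(rl.rlim_cur, need))
        return false;
    if (getrlimit(RLIMIT_NPROC, &rl) != 0 || !limit_ok(rl.rlim_cur, need))
        return false;
    return true;
}
```

On Linux, the limits of an already running Solr process can also be read from /proc/<pid>/limits.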




-> Increase the memory available to the Java VM (I set 12 GB as I 
have plenty of memory on my server, but adapt it to your 
specs): in /opt/solr/bin/solr.in.sh, set SOLR_HEAP="12288m"


Skipped that.



-> As Solr complains a lot, you may want to filter its output in 
syslog-ng or journald, as it greatly pollutes your audit files


What does it complain about and when does it happen? I haven't seen much 
logging from Solr so far.




-> (Re)start Solr (first) and then Dovecot with systemctl

-> Launch the reindex (doveadm fts rescan -u  )

-> Wait a good while for the system to reindex all your mailboxes


Weirdly, rescan returns immediately here. When I perform `doveadm index 
INBOX` for my test user, I do see a lot of fts and HTTP activity.



*- Bugs so far*

-> Line 620 of the fts_solr Dovecot plugin: the header size is 
calculated incorrectly (a "huge header" warning for a simple email, 
which kills the indexing of that email, so basically MOST 
emails are affected, as the calculation is wrong)


-> The UID returned by Solr should be treated as a STRING (this 
may be the source of the "out of bounds" errors in 
Dovecot's fts_solr, as "long" is not enough)


-> Java errors: a lot of it is nonsense to me, as I am not a Java expert. 
But with increased memory it no longer seems to crash, even though it 
complains quite a lot in the logs


Can you elaborate on the errors you have seen so far? When do these 
happen? How can I reproduce them?


Regards,

Stephan.





*---SCHEMA.XML in /opt/solr/server/solr/dovecot/conf*



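[schema.xml: the XML markup was lost when this message was archived. Surviving fragments: the text "id" (presumably the uniqueKey) and a text field type with autoGeneratePhraseQueries="true" positionIncrementGap="100" whose analyzer includes a filter with catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1" generateWordParts="1" splitOnNumerics="1" catenateAll="1" catenateWords="1" preserveOriginal="1".]
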

Re: Solr -> Xapian ?

2019-01-22 Thread Joan Moreau via dovecot
 
greatest value), or the greatest value (which may not be the latest) (the code of 
existing plugins is unclear about this; Solr looks for the greatest, for instance)


All the mails are always supposed to be indexed from the beginning to the last 
indexed mail. If there's a gap, indexer first indexes all the missing mails. So 
the latest UID is supposed to be the greatest UID. (Supporting out-of-order 
indexing would be rather difficult to keep track of.)


Q2: When indexing an email, the data is not passed via "build_key". Why? What is the 
link with "build_more"?


The idea is that it calls something like:

- build_key(type=hdr, hdr_name=From)
- build_more("t...@iki.fi")
- build_key(type=hdr, hdr_name=Subject)
- build_more("Re: Solr -> Xapian ?")
- build_key(type=body_part)
- build_more("message body piece")
- build_more("message body piece2")
...
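The call sequence above can be mimicked with a toy backend that remembers the key from the last build_key() and records every build_more() chunk under it. This is a standalone sketch: the types and function names are hypothetical stand-ins, not Dovecot's fts backend API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the key type passed to build_key(). */
enum toy_key_type { TOY_KEY_HDR, TOY_KEY_BODY_PART };

struct toy_doc {
    char current_field[64];   /* field selected by the last build_key() */
    char text[1024];          /* recorded "field: data" lines */
};

/* Select the field that subsequent build_more() calls belong to. */
static void toy_build_key(struct toy_doc *doc, enum toy_key_type type,
                          const char *hdr_name)
{
    snprintf(doc->current_field, sizeof(doc->current_field), "%s",
             type == TOY_KEY_HDR ? hdr_name : "body");
}

/* Record one chunk of data under the current field. Body text may
 * arrive in several chunks; each one is recorded on its own line. */
static void toy_build_more(struct toy_doc *doc, const char *data)
{
    size_t used = strlen(doc->text);

    snprintf(doc->text + used, sizeof(doc->text) - used, "%s: %s\n",
             doc->current_field, data);
}
```

The point of the split is that the key is announced once and the data then streams in through any number of build_more() calls.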


Q3: Searching/lookup: the header in which to look (which must be at least one of "cc, 
to, from, subject, body") does not appear in the 'struct' data. Where can I find it?


lookup() gets struct mail_search_arg *args, which contains the entire IMAP 
SEARCH query. This could be used for more or less complex query builders.

In case of a single header search, you should have args->args->hdr_field_name contain 
the header name and args->args->value.str contain the content you're searching for.
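For the single-header case, a backend needs to map the header name to one of its index fields and pair it with the search value. The sketch below uses a simplified, hypothetical `toy_search_arg` struct; Dovecot's real `struct mail_search_arg` has many more members and search types. Mapping well-known headers to their own fields and everything else to a generic `hdr` field is an assumption modelled on the field names mentioned in this thread.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Hypothetical, simplified search argument. */
struct toy_search_arg {
    const char *hdr_field_name;   /* NULL means "search the body" */
    const char *value;
};

/* Pick the index field for a header; unknown headers go to "hdr". */
static const char *toy_field_for(const char *hdr_name)
{
    static const char *const known[] = { "from", "to", "cc", "bcc", "subject" };

    if (hdr_name == NULL)
        return "body";
    for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
        if (strcasecmp(hdr_name, known[i]) == 0)
            return known[i];
    }
    return "hdr";
}

/* Format "field:value" into buf. The value is NOT quoted or escaped;
 * a real Solr query builder would have to escape special characters. */
static int toy_format_query(const struct toy_search_arg *arg,
                            char *buf, size_t size)
{
    return snprintf(buf, size, "%s:%s",
                    toy_field_for(arg->hdr_field_name), arg->value);
}
```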


Q4: Refresh: this is very unclear. How could there not be the "latest" 
view of the index? What is the real meaning of this function?


In case of Xapian it might not matter if it automatically refreshes its indexes 
between each query. But with some other indexes this could happen:

- IMAP session is opened
- IMAP SEARCH is run, which opens and searches the index
- a new mail is delivered to the mailbox and indexed
- IMAP SEARCH is run. Without refresh() it doesn't see the newly indexed mail 
and doesn't include it in the search results.
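The scenario above can be modelled as a session that keeps a snapshot of the index until it is explicitly refreshed. In this toy sketch (all names hypothetical), the "index" is just a counter of indexed mails, and the session holds the value it saw when it opened the index.

```c
#include <stdint.h>

/* Shared index: the number of mails indexed so far (a stand-in for
 * whatever state a real FTS index keeps). */
struct toy_index {
    uint32_t indexed_count;
};

/* An IMAP session's view: a snapshot taken when the index was opened. */
struct toy_session {
    const struct toy_index *index;
    uint32_t visible_count;
};

static void toy_session_open(struct toy_session *s, const struct toy_index *idx)
{
    s->index = idx;
    s->visible_count = idx->indexed_count;
}

/* refresh(): pick up changes made after the snapshot was taken. */
static void toy_session_refresh(struct toy_session *s)
{
    s->visible_count = s->index->indexed_count;
}

/* A SEARCH only sees the mails visible in the session's snapshot. */
static uint32_t toy_search_visible(const struct toy_session *s)
{
    return s->visible_count;
}
```

A new mail indexed after the session opened stays invisible to SEARCH until toy_session_refresh() is called. A backend whose library always reads the latest state would make refresh() a no-op.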


Q5: Rescan: is it just about removing all indexes for a specific mailbox?


It's run when "doveadm fts rescan" is run manually. Usually that's only run 
manually to fix up some brokenness. So it's intended to verify that the current mailbox 
contents match the FTS indexes:
- If there are any mails in FTS index that no longer exist in the actual 
mailbox, delete those mails from FTS
- If FTS is missing any mails in the middle of the mailbox, make sure that the 
next mailbox indexing will index those missing mails. I think currently this 
basically means reindexing all the mails since the first missing mail, even the 
mails that are already in the index.

fts-lucene implements this, but other FTS backends are lazy and simply rebuild 
all mails. Actually fts-solr is bad because it doesn't even delete the extra 
mails.
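The two rescan steps can be sketched as a merge over two sorted UID lists. UIDs present in the FTS index but not in the mailbox should be deleted from FTS. The first mailbox UID missing from the index is where reindexing has to restart. The sketch below is standalone: the plain array inputs and all names are hypothetical, not Dovecot's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Compare sorted, duplicate-free UID arrays of a mailbox (box[]) and of
 * its FTS index (fts[]).
 * - UIDs present in fts[] but not in box[] are written to expunge_r[]
 *   (room for fts_count entries); their number goes to *expunge_count_r.
 * - Returns the first UID in box[] that is missing from fts[], or 0 if
 *   the index has every mailbox UID (0 is never a valid IMAP UID). */
static uint32_t fts_rescan_diff(const uint32_t *box, size_t box_count,
                                const uint32_t *fts, size_t fts_count,
                                uint32_t *expunge_r, size_t *expunge_count_r)
{
    size_t i = 0, j = 0, n = 0;
    uint32_t first_missing = 0;

    while (i < box_count || j < fts_count) {
        if (j == fts_count || (i < box_count && box[i] < fts[j])) {
            /* in the mailbox, missing from the index */
            if (first_missing == 0)
                first_missing = box[i];
            i++;
        } else if (i == box_count || fts[j] < box[i]) {
            /* in the index, no longer in the mailbox */
            expunge_r[n++] = fts[j];
            j++;
        } else {
            /* present in both */
            i++;
            j++;
        }
    }
    *expunge_count_r = n;
    return first_missing;
}
```

With box = {1, 2, 3, 5, 6} and fts = {1, 2, 4, 5}, UID 4 is expunged from FTS and reindexing restarts at UID 3. That reindexes 3, 5 and 6 even though 5 is already indexed, which matches the "reindex everything since the first missing mail" behaviour described above.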

Q6: lookup_multi: isn't the function the same for all plugins (see below)? 
And finally, for fts_backend__lookup_multi, why is it backend-dependent?


This function is called only when searching in virtual folders. So for
example the virtual "All mails" folder, which would contain all mails in
all folders. In that case the boxes[] would contain a list of user's all
folders, except Trash and Spam. If lookup_multi() isn't implemented
(left to NULL), the search is run separately via lookup() for each
folder. With lookup_multi() there can be just one lookup, and the
backend can filter only the wanted folders and return them directly. So
it's an optimization for FTS indexes that support user-global searches
rather than only per-folder searches.


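static int
fts_backend_xapian_lookup_multi(struct fts_backend *_backend,
				struct mailbox *const boxes[],
				struct mail_search_arg *args,
				enum fts_lookup_flags flags,
				struct fts_multi_result *result)
{
	unsigned int i, count;

	/* boxes[] is NULL-terminated; allocate one extra zeroed
	   fts_result so that box_results[] is terminated as well */
	for (count = 0; boxes[count] != NULL; count++) ;
	result->box_results = p_new(result->pool, struct fts_result,
				    count + 1);

	for (i = 0; i < count; i++) {
		struct fts_result *box_result = &result->box_results[i];

		box_result->box = boxes[i];
		p_array_init(&box_result->definite_uids, result->pool, 32);
		p_array_init(&box_result->maybe_uids, result->pool, 32);
		p_array_init(&box_result->scores, result->pool, 32);
		/* run the backend's single-mailbox lookup for each box */
		if (fts_backend_xapian_lookup(_backend, boxes[i], args,
					      flags, box_result) < 0)
			return -1;
	}
	return 0;
}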


See fts_backend_lookup_multi() - if you leave lookup_multi=NULL it
basically does this.


For "rescan" and "optimize", wouldn't it be the Dovecot core that indicates 
which UIDs are to be dismissed (expunged), or asks again for a particular (or every) UID to be indexed? Why would 
the backend be aware of the transactions on the mailbox???


rescan() is about fixing up a more or less broken index, or simply to
verify that it's all ok. So core doesn't know what messages exist in the
FTS index and can't request specific reindexing or expunging. I guess an
alternative API could have been to have functions that iterate through
all mails in the index, and use that to implement rescan in core. Now
thinking about it, that sounds like a simpler and better way.

optimize() is currently done only when explicitly running "doveadm fts
optimize", which requests running a slower index optimization. 

Re: Solr - complete setup (update)

2019-01-18 Thread Joan Moreau via dovecot

Yes, the "-property update.autoCreateFields -value false" option seems
interesting.

However, we overwrite the created schema right after that anyway.


On 2019-01-14 23:25, Stephan Bosch wrote:

On 14/01/2019 at 07:44, Joan Moreau via dovecot wrote: 


Hi Stephan,

What's up with that ?

Thank you so much


Working on it, somewhat anyway.

BTW, did you see this ? :

"""
$ sudo -u solr /opt/solr/bin/solr create -c dovecot
WARNING: Using _default configset with data driven schema functionality. NOT 
RECOMMENDED for production use.
To turn off: bin/solr config -c dovecot -p 8983 -action set-user-property 
-property update.autoCreateFields -value false
INFO  - 2019-01-14 23:19:56.831; 
org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL 
Credential Provider chain: env;sysprop

Created new core 'dovecot'
"""

I'll be trying your steps first, but the mentioned command might at least get 
rid of some of the cruft in the default config file.

Regards,

Stephan.

On 2019-01-05 02:04, Stephan Bosch wrote:

Hi,

On 04/01/2019 at 05:36, Joan Moreau via dovecot wrote: 
Hi


This is the summary of my work with SOLR-Dovecot, in my *quest to reproduce the 
previously excellent work of fts_squat*

@Aki : Based on the time I have spent on this, I would love to see you updating 
the Wiki with those improvements, and adding my name somewhere

@All : Hope it helps

I'll be going through the description below soon. I've recently independently 
installed fts-solr from scratch. Although this wasn't a flawless effort, I 
managed to get some basic indexing going. From this mail thread I understand 
that there are quite a few more problems than I've seen myself so far. Then 
again, I didn't perform extensive tests with actual searches.

Maybe we can turn all this into a test suite that we can run internally here at 
Dovecot. At the very least, the described Dovecot bugs need to be addressed and 
the wiki needs to be updated.

I'll get back to you.

Regards,

Stephan.

*- Installation:*

-> Create a clean install using the defaults (at least in the Arch Linux package), and run 
"sudo -u solr solr create -c dovecot". The config files are then in 
/opt/solr/server/solr/dovecot/conf and the data files in /opt/solr/server/solr/dovecot/data

-> In /opt/solr/server/solr/dovecot/conf/solrconfig.xml:

* around line 313, change false to 
true

* around line 147, set 2000 (or above)

* around line 696 : uncomment hdr

* around line 1127, before , 
add 

* around line 1161, delete the whole 

* around line 1192, remove the whole 

-> Remove /opt/solr/server/solr/dovecot/conf/managed-schema

-> Replace "schema.xml" with the one below to reproduce fts_squat behavior (equivalent to 
"fts_squat = partial=3 full=25" in dovecot.conf) (note: such a huge amount of trouble to replace a 
single-line setting, anyway...)

-> Move /opt/solr/server/solr (or the data subfolder) to a partition with plenty of 
*space*, ideally ext4 or a faster file system (it looks like Solr does not consider 
using a simple MySQL database, which would make sense to avoid all the fuss and 
move to a non-Java setup, but that is another story)

-> Config of dovecot.conf is as below

-> The systemd unit shall specify high ulimit for files and proc (see below)

-> Increase the memory available to the Java VM (I set 12 GB as I have plenty of memory on my 
server, but adapt it to your specs): in /opt/solr/bin/solr.in.sh, set 
SOLR_HEAP="12288m"

-> As Solr complains a lot, you may want to filter its output in 
syslog-ng or journald, as it greatly pollutes your audit files

-> (Re)start Solr (first) and then Dovecot with systemctl

-> Launch the reindex (doveadm fts rescan -u  )

-> Wait a good while for the system to reindex all your mailboxes

*- Bugs so far*

-> Line 620 of the fts_solr Dovecot plugin: the header size is calculated incorrectly 
(a "huge header" warning for a simple email, which kills the indexing of that 
email, so basically MOST emails are affected, as the calculation is wrong)

-> The UID returned by Solr should be treated as a STRING (this may be the source of 
the "out of bounds" errors in Dovecot's fts_solr, as "long" is not enough)

-> Java errors: a lot of it is nonsense to me, as I am not a Java expert. But with 
increased memory it no longer seems to crash, even though it complains quite a lot in the 
logs

*---SCHEMA.XML in /opt/solr/server/solr/dovecot/conf*



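[schema.xml: the XML markup was lost when this message was archived; only the text "id" (presumably the uniqueKey) survives.]
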
*-- DOVECOT.CONF*

mail_plugins = fts fts_solr

plugin {
  plugin = fts fts_solr managesieve sieve

  fts = solr
  fts_autoindex = yes
  fts_enforced = yes
  fts_solr = url=http://127.0.0.1:8983/solr/dovecot/
  # replace 127.0.0.1 with your Solr server's address if you want to use an external server

  (...)

}

*-- /etc/systemd/system/multi-user.target.wants/solr.service*

[Unit]
Description=Solr full text search engine
After=network.target

[Service]
Type=simple
User=solr
Group=solr
PrivateTmp=yes
WorkingDirectory=/opt/solr
LimitNOFILE=65000
LimitNPROC=65000

Re: Solr - complete setup (update)

2019-01-14 Thread Stephan Bosch




On 14/01/2019 at 07:44, Joan Moreau via dovecot wrote:


Hi Stephan,

What's up with that ?

Thank you so much



Working on it, somewhat anyway.

BTW, did you see this ? :

"""
$ sudo -u solr /opt/solr/bin/solr create -c dovecot
WARNING: Using _default configset with data driven schema functionality. 
NOT RECOMMENDED for production use.
 To turn off: bin/solr config -c dovecot -p 8983 -action 
set-user-property -property update.autoCreateFields -value false
INFO  - 2019-01-14 23:19:56.831; 
org.apache.solr.util.configuration.SSLCredentialProviderFactory; 
Processing SSL Credential Provider chain: env;sysprop


Created new core 'dovecot'
"""

I'll be trying your steps first, but the mentioned command might at 
least get rid of some of the cruft in the default config file.


Regards,

Stephan.



On 2019-01-05 02:04, Stephan Bosch wrote:


Hi,

On 04/01/2019 at 05:36, Joan Moreau via dovecot wrote:


Hi

This is the summary of my work with SOLR-Dovecot, in my *quest to 
reproduce the previously excellent work of fts_squat*



@Aki : Based on the time I have spent on this, I would love to see 
you updating the Wiki with those improvements, and adding my name 
somewhere


@All : Hope it helps

I'll be going through the description below soon. I've recently 
independently installed fts-solr from scratch. Although this wasn't a 
flawless effort, I managed to get some basic indexing going. From 
this mail thread I understand that there are quite a few more 
problems than I've seen myself so far. Then again, I didn't perform 
extensive tests with actual searches.


Maybe we can turn all this into a test suite that we can run 
internally here at Dovecot. At the very least, the described Dovecot 
bugs need to be addressed and the wiki needs to be updated.


I'll get back to you.


Regards,

Stephan.





*- Installation:*

-> Create a clean install using the defaults (at least in the 
Arch Linux package), and run "sudo -u solr solr create -c dovecot". 
The config files are then in /opt/solr/server/solr/dovecot/conf and the 
data files in /opt/solr/server/solr/dovecot/data


-> In /opt/solr/server/solr/dovecot/conf/solrconfig.xml:

 * around line 313, change false to 
true


 * around line 147, set 
2000 (or above)


 * around line 696 : uncomment hdr

 * around line 1127, before class="solr.UUIDUpdateProcessorFactory" name="uuid"/>, add 



 * around line 1161, delete the whole class="solr.AddSchemaFieldsUpdateProcessorFactory" 
name="add-schema-fields">


    * around line 1192, remove the whole 
... />


-> Remove /opt/solr/server/solr/dovecot/conf/managed-schema

-> Replace "schema.xml" with the one below to reproduce fts_squat 
behavior (equivalent to "fts_squat = partial=3 full=25" in 
dovecot.conf) (note: such a huge amount of trouble to replace a 
single-line setting, anyway...)


-> Move /opt/solr/server/solr (or the data subfolder) to a partition 
with plenty of *space*, ideally ext4 or a faster file system (it looks like 
Solr does not consider using a simple MySQL database, which would make 
sense to avoid all the fuss and move to a non-Java setup, 
but that is another story)


-> Config of dovecot.conf is as below

-> The systemd unit shall specify high ulimit for files and proc 
(see below)


-> Increase the memory available to the Java VM (I set 12 GB as I 
have plenty of memory on my server, but adapt it to your 
specs): in /opt/solr/bin/solr.in.sh, set SOLR_HEAP="12288m"


-> As Solr complains a lot, you may want to filter its output in 
syslog-ng or journald, as it greatly pollutes your audit files


-> (re)Start solr (first) and dovecot by systemctl

-> Launch redindex ( doveadm fts rescan -u  )

-> wait for a big while to let the system re-index all your mail boxes


*- Bugs so far*

-> Line 620 of the fts_solr Dovecot plugin: the header size is 
calculated incorrectly (a "huge header" warning for a simple email, 
which kills indexing of that email; since the calculation is wrong, 
this basically affects MOST emails)


-> The UID returned by Solr should be treated as a STRING (this may 
be the source of the "out of bound" errors in Dovecot's fts_solr, as 
"long" is not enough)


-> Java errors: a lot of it is nonsense to me, as I am no Java expert. 
But with increased memory it does not seem to crash, even though it 
complains quite a lot in the logs
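On the UID point: IMAP defines UIDs as unsigned 32-bit values in the range 1..4294967295 (RFC 3501), so the value itself always fits in a uint32_t; what can go wrong is how the text Solr returns gets converted. A minimal sketch of a strict conversion, assuming the uid field arrives as a string (parse_imap_uid is a hypothetical helper, not part of fts_solr):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper: convert the "uid" field of a Solr result document,
// received as text, into an IMAP UID. Valid UIDs are 1..4294967295, so a
// uint32_t holds any of them; anything else is reported as a failure
// instead of being silently truncated.
bool parse_imap_uid(const std::string &text, uint32_t *uid_r)
{
	assert(uid_r != nullptr);
	// At most 10 decimal digits and nothing else (no sign, no spaces).
	if (text.empty() || text.size() > 10)
		return false;
	for (char c : text) {
		if (c < '0' || c > '9')
			return false;
	}
	// 10 digits always fit in unsigned long long, so no overflow here.
	unsigned long long value = std::strtoull(text.c_str(), nullptr, 10);
	if (value == 0 || value > UINT32_MAX)
		return false;
	*uid_r = static_cast<uint32_t>(value);
	return true;
}
```

Rejecting a value rather than wrapping it makes a bad UID visible in the logs instead of producing a wrong search hit.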





*---SCHEMA.XML in /opt/solr/server/solr/dovecot/conf*



id

autoGeneratePhraseQueries="true" positionIncrementGap="100">

catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1" 
generateWordParts="1" splitOnNumerics="1" catenateAll="1" 
catenateWords="1" preserveOriginal="1"/>

autoGeneratePhraseQueries="true">

stored="true"/>

stored="true"/>

stored="true"/>
stored="true"/>




*-- DOVECOT.CONF*

mail_plugins = fts fts_solr

plugin {
plugin = fts fts_solr managesieve sieve

fts = solr
fts_autoindex = yes
fts_enforced = yes
fts_solr = 

Re: Solr - complete setup (update)

2019-01-13 Thread Joan Moreau via dovecot
Hi Stephan, 

What's up with that? 

Thank you so much 


On 2019-01-05 02:04, Stephan Bosch wrote:


Hi,

On 04/01/2019 at 05:36, Joan Moreau via dovecot wrote: 


Hi

This is the summary of my work with SOLR-Dovecot, in my *quest to reproduce the 
previously excellent work of fts_squat*

@Aki: given the time I have spent on this, I would love to see you update 
the wiki with these improvements and add my name somewhere

@All: hope it helps

I'll be going through the description below soon. I've recently independently 
installed fts-solr from scratch. Although this wasn't a flawless effort, I 
managed to get some basic indexing going. From this mail thread I understand 
that there are quite a few more problems than I've seen myself so far. Then 
again, I didn't perform extensive tests with actual searches.

Maybe we can turn all this into a test suite that we can run internally here at 
Dovecot. At the very least, the described Dovecot bugs need to be addressed and 
the wiki needs to be updated.

I'll get back to you.

Regards,

Stephan.


*- Installation:*

-> Create a clean install using the defaults (at least in the Arch Linux package) and run 
"sudo -u solr solr create -c dovecot". The config files are then in 
/opt/solr/server/solr/dovecot/conf and the data files in /opt/solr/server/solr/dovecot/data

-> In /opt/solr/server/solr/dovecot/conf/solrconfig.xml:

* around line 313, change false to true

* around line 147, set 2000 (or above)

* around line 696: uncomment hdr

* around line 1127, before , add 

* around line 1161, delete the whole 

* around line 1192, remove the whole 

-> Remove /opt/solr/server/solr/dovecot/conf/managed-schema

-> Replace "schema.xml" with the one below to reproduce the fts_squat behavior (equivalent to 
"fts_squat = partial=3 full=25" in dovecot.conf). (Note: that is a lot of trouble to replace a 
single line of setup, but anyway...)

-> Move /opt/solr/server/solr (or its data subfolder) to a partition with plenty of 
*space*, ideally on ext4 or a faster file system. (Solr does not seem to consider 
using a simple MySQL database, which would avoid all the fuss and allow a move 
away from Java, but that is another story.)

-> Configure dovecot.conf as below

-> The systemd unit should specify high ulimits for open files and processes (see below)

-> Increase the memory available to the Java VM (I set 12 GB as I have plenty of room on my 
server, but adapt it to your own specs): in /opt/solr/bin/solr.in.sh, set 
SOLR_HEAP="12288m"

-> As Solr complains a lot, you may want to filter its messages in 
syslog-ng or journald, as they greatly pollute your audit files

-> (Re)start Solr first, then Dovecot, with systemctl

-> Launch a reindex ( doveadm fts rescan -u  )

-> Wait a long while to let the system re-index all your mailboxes

*- Bugs so far*

-> Line 620 of the fts_solr Dovecot plugin: the header size is calculated incorrectly 
(a "huge header" warning for a simple email, which kills indexing of that 
email; since the calculation is wrong, this basically affects MOST emails)

-> The UID returned by Solr should be treated as a STRING (this may be the source of the 
"out of bound" errors in Dovecot's fts_solr, as "long" is not enough)

-> Java errors: a lot of it is nonsense to me, as I am no Java expert. But with 
increased memory it does not seem to crash, even though it complains quite a lot in the 
logs

*---SCHEMA.XML in /opt/solr/server/solr/dovecot/conf*



id

*-- DOVECOT.CONF*

mail_plugins = fts fts_solr

plugin {
plugin = fts fts_solr managesieve sieve

fts = solr
fts_autoindex = yes
fts_enforced = yes
fts_solr = url=http://127.0.0.1:8983/solr/dovecot/

(replace 127.0.0.1 with your Solr server's address if you want to use an external server)
(...)

}

*-- /etc/systemd/system/multi-user.target.wants/solr.service*

[Unit]
Description=Solr full text search engine
After=network.target

[Service]
Type=simple
User=solr
Group=solr
PrivateTmp=yes
WorkingDirectory=/opt/solr
LimitNOFILE=65000
LimitNPROC=65000
ExecStart=/opt/solr/bin/solr start -f

[Install]
WantedBy=multi-user.target

Re: Solr -> Xapian ?

2019-01-13 Thread Timo Sirainen
On 13 Jan 2019, at 10.45, Joan Moreau via dovecot  wrote:
> 
> Now, I can see in the logs that Dovecot several times calls 
> fts_backend_xapian_update_set_mailbox with box == NULL. Why so?
> 
fts-api.h says:

/* Switch to updating the specified mailbox. box may also be set to NULL to
   make sure the previous mailbox won't tried to be accessed anymore. */
void fts_backend_update_set_mailbox(struct fts_backend_update_context *ctx,
				    struct mailbox *box);

So it's just telling you that you can close/free any stuff related to that 
mailbox.
>> Additionally, my logic is that the backend stores one database per mailbox 
>> in /xapian-indexes (in the "root" dir of the user); the name of the database 
>> is the GUID of the mailbox.
>> 
>> For INBOX, that works perfectly: the database is properly created and the 
>> backend starts indexing all emails.
>> 
>> For other folders, somehow, the process cannot access that (root) folder.
>> 
>> Am I missing something?
>> 

This is a bit ambiguous, because some people mean mailbox=folder and others 
mean mailbox=user account, and GUID can also be the internal Dovecot folder 
GUID, or a GUID of the user.

I'd recommend using a single database per user anyway.
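To illustrate both points in C++ (the language the Xapian backend is being written in), here is a hypothetical, heavily simplified model: one index per user, every document tagged with its mailbox GUID, and a NULL mailbox simply dropping the per-mailbox state. None of these types are Dovecot's; update_set_mailbox() only mirrors the contract quoted from fts-api.h.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of an FTS update context.
// One index per user; every document remembers which mailbox it came from.
struct Document {
	std::string box_guid;
	uint32_t uid;
	std::string text;
};

struct UserIndex {
	std::vector<Document> docs; // stand-in for a single per-user database
};

struct UpdateContext {
	UserIndex *index = nullptr;
	std::string cur_box_guid;
	bool box_selected = false;
};

// Mirrors the fts-api.h contract: a NULL mailbox means "stop using the
// previous mailbox", so only per-mailbox state is dropped here. The
// per-user index stays open for the next mailbox.
void update_set_mailbox(UpdateContext &ctx, const char *box_guid)
{
	if (box_guid == nullptr) {
		ctx.cur_box_guid.clear();
		ctx.box_selected = false;
		return;
	}
	ctx.cur_box_guid = box_guid;
	ctx.box_selected = true;
}

// Adds a document to the currently selected mailbox; fails if none is set.
bool update_add(UpdateContext &ctx, uint32_t uid, const std::string &text)
{
	assert(ctx.index != nullptr);
	if (!ctx.box_selected)
		return false;
	ctx.index->docs.push_back(Document{ctx.cur_box_guid, uid, text});
	return true;
}
```

Because the mailbox GUID lives inside each document rather than in the database name, switching folders never requires opening a different database (or a different directory).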



Re: Solr -> Xapian ?

2019-01-13 Thread Joan Moreau via dovecot
Because fts_squat is slated for removal. 

Xapian and similar libraries offer a very easy interface for FTS 

(and basically, I have done it already) 


On 2019-01-07 18:31, Michael Slusarz wrote:

Maybe a dumb question (I admit I haven't followed this thread very closely)... 

But why are you writing a new FTS driver?  If squat allegedly does everything you need it to do, why don't you just take that plugin and fix it up to do what you need?  That seems way easier than trying to create an FTS driver from scratch. 

michael 

On January 7, 2019 at 7:05 AM Joan Moreau via dovecot  wrote: 

Hi 

Can anyone answer these specifically? 

Q1: get_last_uid -> Is this the last UID indexed (which may not be the greatest value), or the greatest value (which may not be the latest)? (The code of the existing plugins is unclear about this; Solr looks for the greatest, for instance.) 

Q2: When indexing an email, the data is not passed by "build_key". Why so? What is the link with "build_more"? 

Q3: Searching/lookup: the header in which to look (must be at least among "cc, to, from, subject, body") does not appear in the 'struct' data. Where do I find it? 

Q4: Refresh: this is very unclear. How come there would not be the "latest" view of the index? What is the real meaning of this function? 

Q5: Rescan: is it just about removing all indexes for a specific mailbox? 


Q6: lookup_multi: isn't the function the same for all plugins (see below)?

Re: Solr -> Xapian ?

2019-01-13 Thread Joan Moreau via dovecot

I found the solution to this using
seq_range_array_add(&result->definite_uids, uid); 


Now, I can see in the logs that Dovecot several times calls
fts_backend_xapian_update_set_mailbox with box == NULL. Why so? 

Thank you 


On 2019-01-12 21:40, Joan Moreau via dovecot wrote:

I somehow fixed the folder issue. (It seems to have been Unix permissions, after too many tests.) 

Getting back to the "fts_results" structure: 

I am trying: 


i_array_init(&result->definite_uids, r->size);
i_array_init(&result->maybe_uids, 0); 


uint32_t uid;
for (i = 0; i < r->size; i++)
{
	try
	{
		uid = atol(backend->dbr->get_document(r->data[i]).get_value(1).c_str());
		i_warning("Result UID=%u", uid);
		/* definite_uids holds seq_range entries, not plain uint32_t values */
		array_idx_set(&result->definite_uids, i, &uid);
	}
	catch (const Xapian::Error &e)
	{
		i_warning("%s", e.get_msg().c_str());
	}
} 

I can see in the log that the UIDs are properly found in the Xapian database, but no results are transmitted to Dovecot or to the IMAP client (Roundcube in my case) 

Help please :) 


Re: Solr -> Xapian ?

2019-01-12 Thread Joan Moreau via dovecot

I somehow fixed the folder issue. (It seems to have been Unix permissions,
after too many tests.) 

Getting back to the "fts_results" structure: 

I am trying: 


i_array_init(&result->definite_uids, r->size);
i_array_init(&result->maybe_uids, 0); 


uint32_t uid;
for (i = 0; i < r->size; i++)
{
	try
	{
		uid = atol(backend->dbr->get_document(r->data[i]).get_value(1).c_str());
		i_warning("Result UID=%u", uid);
		/* definite_uids holds seq_range entries, not plain uint32_t values */
		array_idx_set(&result->definite_uids, i, &uid);
	}
	catch (const Xapian::Error &e)
	{
		i_warning("%s", e.get_msg().c_str());
	}
} 


I can see in the log that the UIDs are properly found in the Xapian database, but
no results are transmitted to Dovecot or to the IMAP client (Roundcube
in my case) 

Help please :) 


On 2019-01-12 18:15, Joan Moreau wrote:

Additionally, my logic is that the backend stores one database per mailbox in /xapian-indexes (in the "root" dir of the user); the name of the database is the GUID of the mailbox. 

For INBOX, that works perfectly: the database is properly created and the backend starts indexing all emails. 

For other folders, somehow, the process cannot access that (root) folder. 

Am I missing something? 


Re: Solr -> Xapian ?

2019-01-12 Thread Joan Moreau via dovecot

Additionally, my logic is that the backend stores one database per
mailbox in /xapian-indexes (in the "root" dir of the user); the name of
the database is the GUID of the mailbox. 


For INBOX, that works perfectly: the database is properly created and
the backend starts indexing all emails. 


For other folders, somehow, the process cannot access that (root)
folder. 

Am I missing something? 


On 2019-01-12 17:37, Joan Moreau wrote:

Thank you 

Now, for the results 

I see the member of fts_result is: 


ARRAY_TYPE(seq_range) definite_uids;

I have the UIDs as an array of uint32_t * 

How do I put my UIDs into this "definite_uids"? Obviously this is not a simple array/pointer. How do I say something similar to result->definite_uids[1]=my_uid? 


Re: Solr -> Xapian ?

2019-01-12 Thread Joan Moreau via dovecot
Thank you 

Now, for the results 

I see the member of fts_result is: 


ARRAY_TYPE(seq_range) definite_uids;

I have the UIDs as an array of uint32_t * 


How do I put my UIDs into this "definite_uids"? Obviously this is not a
simple array/pointer. How do I say something similar to
result->definite_uids[1]=my_uid? 


On 2019-01-12 10:25, Timo Sirainen wrote:

On 11 Jan 2019, at 21.23, Joan Moreau via dovecot  wrote: 


The below patch resolves the compilation error

$ diff -p compat.h compat.h.joan 
*** compat.h 2019-01-11 20:21:00.726625427 +0100

--- compat.h.joan 2019-01-11 20:14:41.729109919 +0100
*** struct iovec;
*** 202,207 
--- 202,211 
ssize_t i_my_writev(int fd, const struct iovec *iov, int iov_len);
#endif

+ #ifdef __cplusplus
+ extern "C" {
+ #endif


You should put this extern "C" into the C++ file you're creating. See for 
example how fts-lucene/lucene-wrapper.cc does this.


1 - What does "subargs" represent in mail_search_args?


It's set only for SEARCH_OR and SEARCH_SUB. So for example:

SEARCH TEXT foo TEXT bar TEXT baz

results in:

type=SEARCH_SUB
value.subargs = (
{ type=SEARCH, value.str="foo" },
{ type=SEARCH, value.str="bar" },
{ type=SEARCH, value.str="baz" },
)

Or similarly if there's SEARCH OR foo OR TEXT bar TEXT baz or some other 
combination of OR/ANDs.


2 - For rescan: who is responsible for passing the emails in again? Does
the Dovecot core send all the emails to index again, or should the FTS
backend somehow access the mailbox and read all emails? Wouldn't just
saying "delete all indexes, get_last_uid is now 0" be the easy way? Or
must the FTS backend process all emails (and block the current thread, as
a mailbox may be quite large)?


The next indexing run is responsible for it. If you return get_last_uid=0, then 
indexer starts feeding you all mails. So fts backend doesn't have to know about 
it.


3 - For get_last_uid: this uncertainty is very unclear. "If there is a
gap, then indexer first indexes all the missing" -> this means that at
some point the indexer may be rebuilding a previous email, so the *last*
UID is something different from the max. And how does the indexer know
whether there is a gap without calling the FTS backend (which it does
not, as there is no function for that)?


I mean if get_last_uid() returns for example 100, it means that UIDs 1..100 
have been indexed by the FTS backend. It's possible that at this point there 
are already mails with UIDs 101..200 in the folder. So when UID=201 is 
delivered, indexer notices that FTS backend has only UIDs 1..100 indexed so 
far, and starts feeding it UIDs 101..201 in that order.

You can implement get_last_uid() simply by keeping track of it in 
dovecot.index* files, similar to how Lucene and Solr already do it with 
fts_index_get_header() / fts_index_set_header(). They also have a fallback that 
if the index doesn't have the last_uid value, they do a slower search from the 
Lucene/Solr index to find the last UID.

Re: Solr -> Xapian ?

2019-01-12 Thread Timo Sirainen
On 11 Jan 2019, at 21.23, Joan Moreau via dovecot  wrote:
> 
> The below patch resolves the compilation error
> 
> $ diff -p compat.h compat.h.joan 
> *** compat.h 2019-01-11 20:21:00.726625427 +0100
> --- compat.h.joan 2019-01-11 20:14:41.729109919 +0100
> *** struct iovec;
> *** 202,207 
> --- 202,211 
> ssize_t i_my_writev(int fd, const struct iovec *iov, int iov_len);
> #endif
> 
> + #ifdef __cplusplus
> + extern "C" {
> + #endif
> 

You should put this extern "C" into the C++ file you're creating. See for 
example how fts-lucene/lucene-wrapper.cc does this.
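Following that suggestion, the extern "C" goes into the C++ source instead of compat.h. A minimal sketch of the pattern in a hypothetical xapian-wrapper.cc: in the real file the block would wrap Dovecot's own headers (shown in the comment), while here a tiny C-style API (demo_backend, hypothetical) stands in for them so the file compiles on its own.

```cpp
#include <cassert>
#include <string>

// In a real plugin, Dovecot's C headers are wrapped as a group, e.g.
//   extern "C" {
//   #include "lib.h"
//   #include "fts-api-private.h"
//   }
// so that all their declarations get C linkage. The small C-style API
// below stands in for those headers in this self-contained sketch.
extern "C" {
struct demo_backend {
	unsigned int doc_count;
};
int demo_backend_add(struct demo_backend *backend, const char *text);
}

// C++ implementation behind the C-linkage entry point.
static bool add_impl(demo_backend &backend, const std::string &text)
{
	if (text.empty())
		return false;
	backend.doc_count++;
	return true;
}

extern "C" int demo_backend_add(struct demo_backend *backend, const char *text)
{
	assert(backend != nullptr);
	// C callers cannot handle C++ exceptions, so translate them to -1.
	try {
		return add_impl(*backend, text != nullptr ? text : "") ? 0 : -1;
	} catch (...) {
		return -1;
	}
}
```

Keeping every exported function behind a C-linkage shim like this also gives one obvious place to catch Xapian::Error before it reaches Dovecot's C code.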

> 1 - What does "subargs" represent in mail_search_args?

It's set only for SEARCH_OR and SEARCH_SUB. So for example:

SEARCH TEXT foo TEXT bar TEXT baz

results in:

type=SEARCH_SUB
value.subargs = (
  { type=SEARCH, value.str="foo" },
  { type=SEARCH, value.str="bar" },
  { type=SEARCH, value.str="baz" },
)

Or similarly if there's SEARCH OR foo OR TEXT bar TEXT baz or some other 
combination of OR/ANDs.
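A backend typically walks this tree recursively: SEARCH_SUB children are ANDed, SEARCH_OR children are ORed, and leaf nodes carry the search string (plus hdr_field_name for header searches, as described further down the thread). The sketch below uses a hypothetical SearchArg struct that only loosely mirrors struct mail_search_arg, and an illustrative "field:value" output syntax; it is not Xapian's QueryParser grammar.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for struct mail_search_arg: siblings are linked via
// next, and Sub/Or nodes own a child list in subargs.
enum class ArgType { Sub, Or, Text, Header };

struct SearchArg {
	ArgType type;
	std::string hdr_field_name; // only for Header
	std::string value;          // search string for Text / Header
	const SearchArg *subargs;   // first child, only for Sub / Or
	const SearchArg *next;      // next sibling at the same level

	SearchArg(ArgType t, std::string hdr, std::string val,
		  const SearchArg *sub = nullptr, const SearchArg *nxt = nullptr)
		: type(t), hdr_field_name(std::move(hdr)), value(std::move(val)),
		  subargs(sub), next(nxt) {}
};

static std::string build_list(const SearchArg *arg, const char *op);

// Translates one node (and, recursively, its children) into a query string.
static std::string build_one(const SearchArg &arg)
{
	switch (arg.type) {
	case ArgType::Text:
		return "text:" + arg.value;
	case ArgType::Header:
		return arg.hdr_field_name + ":" + arg.value;
	case ArgType::Sub:
		assert(arg.subargs != nullptr);
		return "(" + build_list(arg.subargs, " AND ") + ")";
	case ArgType::Or:
		assert(arg.subargs != nullptr);
		return "(" + build_list(arg.subargs, " OR ") + ")";
	}
	return "";
}

// Joins a sibling list with the given operator.
static std::string build_list(const SearchArg *arg, const char *op)
{
	std::string out;
	for (; arg != nullptr; arg = arg->next) {
		if (!out.empty())
			out += op;
		out += build_one(*arg);
	}
	return out;
}
```

For "SEARCH TEXT foo TEXT bar TEXT baz" this yields "(text:foo AND text:bar AND text:baz)".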
 
> 2 - For rescan: who is responsible for passing the emails in again? Does
> the Dovecot core send all the emails to index again, or should the FTS
> backend somehow access the mailbox and read all emails? Wouldn't just
> saying "delete all indexes, get_last_uid is now 0" be the easy way? Or
> must the FTS backend process all emails (and block the current thread, as
> a mailbox may be quite large)?

The next indexing run is responsible for it. If you return get_last_uid=0, then 
indexer starts feeding you all mails. So fts backend doesn't have to know about 
it.

> 3 - For get_last_uid: this uncertainty is very unclear. "If there is a
> gap, then indexer first indexes all the missing" -> this means that at
> some point the indexer may be rebuilding a previous email, so the *last*
> UID is something different from the max. And how does the indexer know
> whether there is a gap without calling the FTS backend (which it does
> not, as there is no function for that)?

I mean if get_last_uid() returns for example 100, it means that UIDs 1..100 
have been indexed by the FTS backend. It's possible that at this point there 
are already mails with UIDs 101..200 in the folder. So when UID=201 is 
delivered, indexer notices that FTS backend has only UIDs 1..100 indexed so 
far, and starts feeding it UIDs 101..201 in that order.

You can implement get_last_uid() simply by keeping track of it in 
dovecot.index* files, similar to how Lucene and Solr already do it with 
fts_index_get_header() / fts_index_set_header(). They also have a fallback that 
if the index doesn't have the last_uid value, they do a slower search from the 
Lucene/Solr index to find the last UID.
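The get_last_uid() contract above, plus the "return 0 after a rescan" answer, can be modelled in a few lines. This is a sketch under the assumptions stated in the thread (mails are indexed in ascending UID order); uids_to_feed, BackendState and this get_last_uid are hypothetical names, not Dovecot functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// get_last_uid() == N promises that every mail with UID <= N is indexed.
// The indexer then feeds the backend the remaining UIDs in ascending order,
// so returning 0 (e.g. after a rescan that threw the index away) makes it
// feed every mail again.
std::vector<uint32_t> uids_to_feed(const std::vector<uint32_t> &mailbox_uids,
				   uint32_t last_indexed_uid)
{
	assert(std::is_sorted(mailbox_uids.begin(), mailbox_uids.end()));
	std::vector<uint32_t> out;
	for (uint32_t uid : mailbox_uids) {
		if (uid > last_indexed_uid)
			out.push_back(uid);
	}
	return out;
}

// Hypothetical backend bookkeeping: use a cached value when available
// (fts-lucene and fts-solr keep theirs in dovecot.index* through
// fts_index_get_header()/fts_index_set_header()), otherwise fall back to
// the slower scan of the index for its highest UID. Because mails are
// indexed in UID order, the highest indexed UID is also the last one.
struct BackendState {
	bool have_cached_last_uid;
	uint32_t cached_last_uid;
	std::vector<uint32_t> indexed_uids; // stand-in for the real index
};

uint32_t get_last_uid(const BackendState &st)
{
	if (st.have_cached_last_uid)
		return st.cached_last_uid;
	uint32_t max_uid = 0;
	for (uint32_t uid : st.indexed_uids)
		max_uid = std::max(max_uid, uid);
	return max_uid;
}
```

With UIDs 1..201 in the folder and get_last_uid() returning 100, uids_to_feed() yields 101..201, matching the example above.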



Re: Solr -> Xapian ?

2019-01-11 Thread fauno
On 04/01/19 at 03:20, Joan Moreau via dovecot wrote:
> What about considering linking Dovecot with the Xapian libraries instead of
> going through the Solr nightmare?
> https://xapian.org/features

Given that notmuch already does a good job of indexing email (although it
only supports maildirs, AFAIK), wouldn't it be simpler to write a plugin
for running notmuch searches from Dovecot?

https://notmuchmail.org/



Re: Solr -> Xapian ?

2019-01-11 Thread Joan Moreau via dovecot
../../../src/lib/compat.h:208:20: error: conflicting declaration of 
'ssize_t i_my_pwrite(int, const void*, size_t, __off_t)' with 'C' 
linkage 
# define pwrite i_my_pwrite 

Any help welcome 

Hi, 

I figured out the "namespace" issue 

The remaining questions are: 

1 - What does "subargs" represent in mail_search_args? 

2 - For rescan: who is responsible for passing the emails in again? Does 
the Dovecot core send all the emails to index again, or should the FTS 
backend somehow access the mailbox and read all emails? Wouldn't just 
saying "delete all indexes, get_last_uid is now 0" be the easy way? Or 
must the FTS backend process all emails (and block the current thread, as 
a mailbox may be quite large)? 

3 - For get_last_uid: this uncertainty is very unclear. "If there is a 
gap, then indexer first indexes all the missing" -> this means that at 
some point the indexer may be rebuilding a previous email, so the *last* 
UID is something different from the max. And how does the indexer know 
whether there is a gap without calling the FTS backend (which it does 
not, as there is no function for that)? 

4 - How do I update configure.ac & the additional files to add a 
"--with-xapian" option, which will test for the presence of libxapian and 
add it to the build? 

Thank you 

On 2019-01-08 04:24, Timo Sirainen wrote: 

On 7 Jan 2019, at 16.05, Joan Moreau via dovecot < dovecot@dovecot.org> 
wrote: 
Hi 

Can anyone answer these specifically? 

Q1: get_last_uid -> Is this the last UID indexed (which may not be the 
greatest value), or the greatest value (which may not be the latest)? (The 
code of the existing plugins is unclear about this; Solr looks for the 
greatest, for instance.) 
All the mails are always supposed to be indexed from the beginning to 
the last indexed mail. If there's a gap, indexer first indexes all the 
missing mails. So the latest UID is supposed to be the greatest UID. 
(Supporting out-of-order indexing would be rather difficult to keep 
track of.) 

Q2: When indexing an email, the data is not passed by "build_key". Why 
so? What is the link with "build_more"? 
The idea is that it calls something like: 

- build_key(type=hdr, hdr_name=From) 
- build_more(" t...@iki.fi") 
- build_key(type=hdr, hdr_name=Subject) 
- build_more("Re: Solr -> Xapian ?") 
- build_key(type=body_part) 
- build_more("message body piece") 
- build_more("message body piece2") 
... 
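In other words, build_key() only announces which part of which mail comes next, and build_more() then streams that part's bytes, possibly in several chunks. A hypothetical C++ model of a backend accumulating those calls per field (BuildKey, KeyType and MailAccumulator are illustrative names, not Dovecot's structs):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Simplified model: build_key() says what the following data belongs to,
// build_more() then delivers that data in one or more chunks.
enum class KeyType { Header, BodyPart };

struct BuildKey {
	uint32_t uid;
	KeyType type;
	std::string hdr_name; // only for KeyType::Header
};

class MailAccumulator {
public:
	void build_key(const BuildKey &key)
	{
		cur_uid_ = key.uid;
		cur_field_ = key.type == KeyType::Header ? lowercase(key.hdr_name) : "body";
		have_key_ = true;
	}
	void build_more(const char *data, size_t size)
	{
		assert(have_key_); // data always follows a key
		// Chunks belong to one stream, so they are simply concatenated.
		fields_[cur_uid_][cur_field_].append(data, size);
	}
	const std::string &field(uint32_t uid, const std::string &name)
	{
		return fields_[uid][name];
	}

private:
	static std::string lowercase(std::string s)
	{
		for (char &c : s) {
			if (c >= 'A' && c <= 'Z')
				c = char(c - 'A' + 'a');
		}
		return s;
	}
	uint32_t cur_uid_ = 0;
	std::string cur_field_;
	bool have_key_ = false;
	std::map<uint32_t, std::map<std::string, std::string>> fields_;
};
```

Once the last chunk of a mail has arrived, a real backend would turn these fields into one document (for Xapian, one Xapian::Document with per-field prefixes).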

Q3: Searching/lookup: the header in which to look (must be at 
least among "cc, to, from, subject, body") does not appear in the 
'struct' data. Where do I find it? 
lookup() gets struct mail_search_arg *args, which contains the entire 
IMAP SEARCH query. This could be used for more or less complex query 
builders. 

In case of a single header search, you should have 
args->args->hdr_field_name contain the header name and 
args->args->value.str contain the content you're searching for. 

Q4: Refresh: this is very unclear. How come there would not be the 
"latest" view of the index? What is the real meaning of this function? 
In case of Xapian it might not matter if it automatically refreshes its 
indexes between each query. But with some other indexes this could 
happen: 

- IMAP session is opened 
- IMAP SEARCH is run, which opens and searches the index 
- a new mail is delivered to the mailbox and indexed 
- IMAP SEARCH is run. Without refresh() it doesn't see the newly 
indexed mail and doesn't include it in the search results. 
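The scenario above can be modelled with a reader that searches a snapshot taken when it opened, or when it last refreshed. This is a hypothetical sketch of the general behavior, not of any specific library; with Xapian, Xapian::Database::reopen() is the call that makes a reader pick up a newer revision.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Shared state written by the indexer.
struct SharedIndex {
	std::vector<uint32_t> committed; // UIDs that have been indexed
};

// A search session sees the index as it was when opened (or last
// refreshed); mails indexed afterwards stay invisible until refresh().
class SearchSession {
public:
	explicit SearchSession(const SharedIndex &idx)
		: idx_(idx), snapshot_(idx.committed) {}

	void refresh() { snapshot_ = idx_.committed; }

	bool contains(uint32_t uid) const
	{
		for (uint32_t u : snapshot_) {
			if (u == uid)
				return true;
		}
		return false;
	}

private:
	const SharedIndex &idx_;
	std::vector<uint32_t> snapshot_;
};
```

The second SEARCH in Timo's scenario corresponds to calling contains() without refresh() first: the newly delivered mail is missing from the results.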

Q5: Rescan: is it just about removing all indexes for a specific 
mailbox? 
It's run when "doveadm fts rescan" is run manually. Usually that's only 
run manually to fix up some brokenness. So it's intended to verify that 
the current mailbox contents match the FTS indexes: 
- If there are any mails in FTS index that no longer exist in the 
actual mailbox, delete those mails from FTS 
- If FTS is missing any mails in the middle of the mailbox, make sure 
that the next mailbox indexing will index those missing mails. I think 
currently this basically means reindexing all the mails since the first 
missing mail, even the mails that are already in the index. 

fts-lucene implements this, but other FTS backends are lazy and simply 
rebuild all mails. Actually fts-solr is bad because it doesn't even 
delete the extra mails. 
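The two rescan steps described above (drop index entries for expunged mails, then restart indexing at the first missing mail) can be sketched as a pure planning function. plan_rescan and RescanResult are hypothetical; as noted, the existing backends other than fts-lucene simply rebuild everything instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

struct RescanResult {
	std::vector<uint32_t> to_delete; // in the FTS index but not in the mailbox
	uint32_t new_last_uid;           // what get_last_uid() should report afterwards
};

// Both inputs are UID lists sorted in ascending order.
RescanResult plan_rescan(const std::vector<uint32_t> &mailbox_uids,
			 const std::vector<uint32_t> &indexed_uids)
{
	assert(std::is_sorted(mailbox_uids.begin(), mailbox_uids.end()));
	assert(std::is_sorted(indexed_uids.begin(), indexed_uids.end()));

	RescanResult res;
	// Step 1: mails the index still has but the mailbox no longer does.
	std::set_difference(indexed_uids.begin(), indexed_uids.end(),
			    mailbox_uids.begin(), mailbox_uids.end(),
			    std::back_inserter(res.to_delete));

	// Step 2: the watermark stops just before the first mail the index
	// lacks, so everything after it gets (re)indexed, even mails that are
	// already in the index.
	res.new_last_uid = 0;
	for (uint32_t uid : mailbox_uids) {
		if (!std::binary_search(indexed_uids.begin(), indexed_uids.end(), uid))
			break;
		res.new_last_uid = uid;
	}
	return res;
}
```

For a mailbox {1, 2, 3, 5, 6} and an index {1, 2, 4, 6}, UID 4 is deleted and get_last_uid() should then report 2, so UIDs 3, 5 and 6 are fed again.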

Q6: lookup_multi: isn't the function the same for all plugins (see 
below)? And finally, for fts_backend__lookup_multi, why is that 
backend dependent? 
This function is called only when searching in virtual folders. So for 
example the virtual "All mails" folder, which would contain all mails in 
all folders. In that case the boxes[] would contain a list of user's all 
folders, except Trash and Spam. If lookup_multi() isn't implemented 
(left to NULL), the search is run separately via lookup() for each 
folder. With lookup_multi() there can be jus

Re: Solr -> Xapian ?

2019-01-11 Thread Joan Moreau via dovecot

There is no point in a separate plugin; the purpose is to replace
squat as the default FTS (Solr being a nightmare) 


On 2019-01-11 18:23, Aki Tuomi wrote:

I would recommend making this a standalone plugin for now instead of trying to keep it in core fts.  

Aki 

On 11 January 2019 at 18:40 Joan Moreau via dovecot < dovecot@dovecot.org> wrote: 

I managed to deal with the namespace issue (updated makefile.am) 

However, I now hit: 

../../../src/lib/compat.h:207:19: error: conflicting declaration of 
'ssize_t i_my_pread(int, void*, size_t, __off_t)' with 'C' linkage 
# define pread i_my_pread 
^~ 
../../../src/lib/compat.h:210:9: note: previous declaration with 'C++' 
linkage 
ssize_t i_my_pread(int fd, void *buf, size_t count, off_t offset); 
^~ 
../../../src/lib/compat.h:208:20: error: conflicting declaration of 
'ssize_t i_my_pwrite(int, const void*, size_t, __off_t)' with 'C' 
linkage 
# define pwrite i_my_pwrite 

Any help welcome 

Hi, 

I figured out the "namespace" issue 

Remaining questions are:

1 - What does "subargs" represent in mail_search_args?

2 - For rescan: who is responsible for passing the new emails again? Is
the Dovecot core sending all the emails to index again, or shall the FTS
backend somehow access the mailbox and read all emails? Wouldn't just
saying "delete all index and get_last_uid is now 0" be the easy way? Or
must the FTS backend process all emails (and block the current thread, as
a mailbox may be quite large)?

3 - For get_last_uid: this uncertainty is very unclear. "If there is a
gap, then indexer first indexes all the missing" -> this means that at a
certain point the indexer may be rebuilding a previous email, so the *last*
UID is something different from the max. And how does the indexer know
whether there is a gap without calling the FTS backend (which it does not,
as there is no function for that)?

4 - How do I update configure.ac and additional files to add a
"--with-xapian" option which will test for libxapian presence and add it
to the build?

Thank you 

On 2019-01-08 04:24, Timo Sirainen wrote: 

On 7 Jan 2019, at 16.05, Joan Moreau via dovecot <dovecot@dovecot.org>
wrote:
Hi

Can anyone answer specifically?

Q1: get_last_uid -> Is this the last UID indexed (which may not be the
greatest value), or the greatest value (which may not be the latest)? (The
code of existing plugins is unclear about this; Solr looks for the
greatest, for instance.)
All the mails are always supposed to be indexed from the beginning to 
the last indexed mail. If there's a gap, indexer first indexes all the 
missing mails. So the latest UID is supposed to be the greatest UID. 
(Supporting out-of-order indexing would be rather difficult to keep 
track of.) 

Q2: When indexing an email, the data is not passed by "build_key". Why
so? What is the link with "build_more"?
The idea is that it calls something like: 

- build_key(type=hdr, hdr_name=From) 
- build_more(" t...@iki.fi") 
- build_key(type=hdr, hdr_name=Subject) 
- build_more("Re: Solr -> Xapian ?") 
- build_key(type=body_part) 
- build_more("message body piece") 
- build_more("message body piece2") 
... 

Q3: Searching/Lookup: the header in which to look (must be at
least among "cc, to, from, subject, body") does not appear in the
'struct' data. Where can I find it?
lookup() gets struct mail_search_arg *args, which contains the entire 
IMAP SEARCH query. This could be used for more or less complex query 
builders. 

In case of a single header search, you should have 
args->args->hdr_field_name contain the header name and 
args->args->value.str contain the content you're searching for. 

Q4: Refresh: this is very unclear. How come there would not be the
"latest" view on the index? What is the real meaning of this function?
In case of Xapian it might not matter if it automatically refreshes its 
indexes between each query. But with some other indexes this could 
happen: 

- IMAP session is opened 
- IMAP SEARCH is run, which opens and searches the index 
- a new mail is delivered to the mailbox and indexed 
- IMAP SEARCH is run. Without refresh() it doesn't see the newly 
indexed mail and doesn't include it in the search results. 

Q5: Rescan: is it just about removing all indexes for a specific
mailbox?
It's run when "doveadm fts rescan" is run manually. Usually that's only 
run manually to fix up some brokenness. So it's intended to verify that 
the current mailbox contents match the FTS indexes: 
- If there are any mails in FTS index that no longer exist in the 
actual mailbox, delete those mails from FTS 
- If FTS is missing any mails in the middle of the mailbox, make sure 
that the next mailbox indexing will index those missing mails. I think 
currently this basically means reindexing all the mails since the first 
missing 

Re: Solr -> Xapian ?

2019-01-11 Thread Aki Tuomi


 
 
  
I would recommend making this a standalone plugin for now instead of trying to keep it in core fts.

Aki

Re: Solr -> Xapian ?

2019-01-11 Thread Joan Moreau via dovecot

I managed to deal with the namespace issue (updated makefile.am)

However, I get the following errors:


../../../src/lib/compat.h:207:19: error: conflicting declaration of
'ssize_t i_my_pread(int, void*, size_t, __off_t)' with 'C' linkage
# define pread i_my_pread
^~
../../../src/lib/compat.h:210:9: note: previous declaration with 'C++'
linkage
ssize_t i_my_pread(int fd, void *buf, size_t count, off_t offset);
^~
../../../src/lib/compat.h:208:20: error: conflicting declaration of
'ssize_t i_my_pwrite(int, const void*, size_t, __off_t)' with 'C'
linkage
# define pwrite i_my_pwrite 

Any help welcome 

Hi, 

I figured out the "namespace" issue 

Remaining questions are:

1 - What does "subargs" represent in mail_search_args?

2 - For rescan: who is responsible for passing the new emails again? Is
the Dovecot core sending all the emails to index again, or shall the FTS
backend somehow access the mailbox and read all emails? Wouldn't just
saying "delete all index and get_last_uid is now 0" be the easy way? Or
must the FTS backend process all emails (and block the current thread, as
a mailbox may be quite large)?

3 - For get_last_uid: this uncertainty is very unclear. "If there is a
gap, then indexer first indexes all the missing" -> this means that at a
certain point the indexer may be rebuilding a previous email, so the *last*
UID is something different from the max. And how does the indexer know
whether there is a gap without calling the FTS backend (which it does not,
as there is no function for that)?

4 - How do I update configure.ac and additional files to add a
"--with-xapian" option which will test for libxapian presence and add it
to the build?

Thank you 


On 2019-01-08 04:24, Timo Sirainen wrote:

On 7 Jan 2019, at 16.05, Joan Moreau via dovecot 
wrote:
Hi

Can anyone answer specifically?

Q1: get_last_uid -> Is this the last UID indexed (which may not be the
greatest value), or the greatest value (which may not be the latest)? (The
code of existing plugins is unclear about this; Solr looks for the
greatest, for instance.)
All the mails are always supposed to be indexed from the beginning to
the last indexed mail. If there's a gap, indexer first indexes all the
missing mails. So the latest UID is supposed to be the greatest UID.
(Supporting out-of-order indexing would be rather difficult to keep
track of.)

Q2: When indexing an email, the data is not passed by "build_key". Why
so? What is the link with "build_more"?
The idea is that it calls something like:

- build_key(type=hdr, hdr_name=From)
- build_more("t...@iki.fi")
- build_key(type=hdr, hdr_name=Subject)
- build_more("Re: Solr -> Xapian ?")
- build_key(type=body_part)
- build_more("message body piece")
- build_more("message body piece2")
...

Q3: Searching/Lookup: the header in which to look (must be at
least among "cc, to, from, subject, body") does not appear in the
'struct' data. Where can I find it?
lookup() gets struct mail_search_arg *args, which contains the entire
IMAP SEARCH query. This could be used for more or less complex query
builders.

In case of a single header search, you should have
args->args->hdr_field_name contain the header name and
args->args->value.str contain the content you're searching for.

Q4: Refresh: this is very unclear. How come there would not be the
"latest" view on the index? What is the real meaning of this function?
In case of Xapian it might not matter if it automatically refreshes its
indexes between each query. But with some other indexes this could
happen:

- IMAP session is opened
- IMAP SEARCH is run, which opens and searches the index
- a new mail is delivered to the mailbox and indexed
- IMAP SEARCH is run. Without refresh() it doesn't see the newly
indexed mail and doesn't include it in the search results.

Q5: Rescan: is it just about removing all indexes for a specific
mailbox?
It's run when "doveadm fts rescan" is run manually. Usually that's only
run manually to fix up some brokenness. So it's intended to verify that
the current mailbox contents match the FTS indexes:
- If there are any mails in FTS index that no longer exist in the
actual mailbox, delete those mails from FTS
- If FTS is missing any mails in the middle of the mailbox, make sure
that the next mailbox indexing will index those missing mails. I think
currently this basically means reindexing all the mails since the first
missing mail, even the mails that are already in the index.

fts-lucene implements this, but other FTS backends are lazy and simply
rebuild all mails. Actually fts-solr is bad because it doesn't even
delete the extra mails.

Q6: lookup_multi: isn't the function the same for all plugins (see
below)? And finally, for fts_backend__lookup_multi, why is that
backend dependent?
This function is called only when searching in virtual folders. So for
example the

Re: Solr -> Xapian ?

2019-01-11 Thread Joan Moreau via dovecot
Also, 

1 - What does "subargs" represent in mail_search_args?


2 - I wrote my first code, and the error I get when compiling it within
the Dovecot source tree is:


"In file included from fts-xapian-plugin.c:4:
fts-xapian-plugin.h:6:1: error: unknown type name 'using'; did you mean
'uint'?
using namespace std;" 


If I remove this, the Xapian library also complains about the
"namespace" keyword:


In file included from /usr/include/xapian.h:47,
from fts-backend-xapian.c:11:
/usr/include/xapian/types.h:31:1: error: unknown type name 'namespace';
did you mean 'i_isspace'?
namespace Xapian { 

Can someone shed some light on this?

Thanks 


On 2019-01-09 09:58, Joan Moreau via dovecot wrote:

Ok. 

Additional questions:

- For rescan: who is responsible for passing the new emails again? Is the Dovecot core sending all the emails to index again, or shall the FTS backend somehow access the mailbox and read all emails? Wouldn't just saying "delete all index and get_last_uid is now 0" be the easy way? Or must the FTS backend process all emails (and block the current thread, as a mailbox may be quite large)?


- For get_last_uid: this uncertainty is very unclear. "If there is a gap, then 
indexer first indexes all the missing" -> this means that at a certain point the indexer 
may be rebuilding a previous email, so the *last* UID is something different from the max. And how 
does the indexer know whether there is a gap without calling the FTS backend (which it does not, as 
there is no function for that)?

On 2019-01-08 04:24, Timo Sirainen wrote: 
On 7 Jan 2019, at 16.05, Joan Moreau via dovecot  wrote: 
Hi


Can anyone answer specifically?

Q1: get_last_uid -> Is this the last UID indexed (which may not be the greatest value), or the greatest value (which may not be the latest)? (The code of existing plugins is unclear about this; Solr looks for the greatest, for instance.)
All the mails are always supposed to be indexed from the beginning to the last indexed mail. If there's a gap, indexer first indexes all the missing mails. So the latest UID is supposed to be the greatest UID. (Supporting out-of-order indexing would be rather difficult to keep track of.)


Q2: When indexing an email, the data is not passed by "build_key". Why so? What is the link with "build_more"?
The idea is that it calls something like:


- build_key(type=hdr, hdr_name=From)
- build_more("t...@iki.fi")
- build_key(type=hdr, hdr_name=Subject)
- build_more("Re: Solr -> Xapian ?")
- build_key(type=body_part)
- build_more("message body piece")
- build_more("message body piece2")
...

Q3: Searching/Lookup: the header in which to look (must be at least among "cc, to, from, subject, body") does not appear in the 'struct' data. Where can I find it?
lookup() gets struct mail_search_arg *args, which contains the entire IMAP SEARCH query. This could be used for more or less complex query builders.


In case of a single header search, you should have args->args->hdr_field_name contain 
the header name and args->args->value.str contain the content you're searching for.

Q4: Refresh: this is very unclear. How come there would not be the "latest" view on the index? What is the real meaning of this function?
In case of Xapian it might not matter if it automatically refreshes its indexes between each query. But with some other indexes this could happen:


- IMAP session is opened
- IMAP SEARCH is run, which opens and searches the index
- a new mail is delivered to the mailbox and indexed
- IMAP SEARCH is run. Without refresh() it doesn't see the newly indexed mail 
and doesn't include it in the search results.

Q5: Rescan: is it just about removing all indexes for a specific mailbox?
It's run when "doveadm fts rescan" is run manually. Usually that's only run manually to fix up some brokenness. So it's intended to verify that the current mailbox contents match the FTS indexes:

- If there are any mails in FTS index that no longer exist in the actual 
mailbox, delete those mails from FTS
- If FTS is missing any mails in the middle of the mailbox, make sure that the 
next mailbox indexing will index those missing mails. I think currently this 
basically means reindexing all the mails since the first missing mail, even the 
mails that are already in the index.

fts-lucene implements this, but other FTS backends are lazy and simply rebuild 
all mails. Actually fts-solr is bad because it doesn't even delete the extra 
mails.

Q6: lookup_multi: isn't the function the same for all plugins (see below)? 
And finally, for fts_backend__lookup_multi, why is that backend dependent?


This function is called only when searching in virtual folders. So for
example the virtual "All mails" folder, which would contain all mails in
all folders. In that case the boxes[] would contain a list of user's all
folders, except Tras

Re: Solr -> Xapian ?

2019-01-09 Thread Joan Moreau via dovecot
Ok. 

Additional questions:


- For rescan: who is responsible for passing the new emails again? Is
the Dovecot core sending all the emails to index again, or shall the FTS
backend somehow access the mailbox and read all emails? Wouldn't just
saying "delete all index and get_last_uid is now 0" be the easy way? Or
must the FTS backend process all emails (and block the current thread, as
a mailbox may be quite large)?


- For get_last_uid: this uncertainty is very unclear. "If there is a
gap, then indexer first indexes all the missing" -> this means that at a
certain point the indexer may be rebuilding a previous email, so the *last*
UID is something different from the max. And how does the indexer know
whether there is a gap without calling the FTS backend (which it does not,
as there is no function for that)?

On 2019-01-08 04:24, Timo Sirainen wrote:

On 7 Jan 2019, at 16.05, Joan Moreau via dovecot  wrote: 


Hi

Can anyone answer specifically?

Q1: get_last_uid -> Is this the last UID indexed (which may not be the 
greatest value), or the greatest value (which may not be the latest)? (The code of 
existing plugins is unclear about this; Solr looks for the greatest, for instance.)


All the mails are always supposed to be indexed from the beginning to the last 
indexed mail. If there's a gap, indexer first indexes all the missing mails. So 
the latest UID is supposed to be the greatest UID. (Supporting out-of-order 
indexing would be rather difficult to keep track of.)


Q2: When indexing an email, the data is not passed by "build_key". Why so? What is the 
link with "build_more"?


The idea is that it calls something like:

- build_key(type=hdr, hdr_name=From)
- build_more("t...@iki.fi")
- build_key(type=hdr, hdr_name=Subject)
- build_more("Re: Solr -> Xapian ?")
- build_key(type=body_part)
- build_more("message body piece")
- build_more("message body piece2")
...


Q3: Searching/Lookup: the header in which to look (must be at least among "cc, 
to, from, subject, body") does not appear in the 'struct' data. Where can I find it?


lookup() gets struct mail_search_arg *args, which contains the entire IMAP 
SEARCH query. This could be used for more or less complex query builders.

In case of a single header search, you should have args->args->hdr_field_name contain 
the header name and args->args->value.str contain the content you're searching for.


Q4: Refresh: this is very unclear. How come there would not be the "latest" 
view on the index? What is the real meaning of this function?


In case of Xapian it might not matter if it automatically refreshes its indexes 
between each query. But with some other indexes this could happen:

- IMAP session is opened
- IMAP SEARCH is run, which opens and searches the index
- a new mail is delivered to the mailbox and indexed
- IMAP SEARCH is run. Without refresh() it doesn't see the newly indexed mail 
and doesn't include it in the search results.


Q5: Rescan: is it just about removing all indexes for a specific mailbox?


It's run when "doveadm fts rescan" is run manually. Usually that's only run 
manually to fix up some brokenness. So it's intended to verify that the current mailbox 
contents match the FTS indexes:
- If there are any mails in FTS index that no longer exist in the actual 
mailbox, delete those mails from FTS
- If FTS is missing any mails in the middle of the mailbox, make sure that the 
next mailbox indexing will index those missing mails. I think currently this 
basically means reindexing all the mails since the first missing mail, even the 
mails that are already in the index.

fts-lucene implements this, but other FTS backends are lazy and simply rebuild 
all mails. Actually fts-solr is bad because it doesn't even delete the extra 
mails.

Q6: lookup_multi: isn't the function the same for all plugins (see below)? 
And finally, for fts_backend__lookup_multi, why is that backend dependent?


This function is called only when searching in virtual folders. So for
example the virtual "All mails" folder, which would contain all mails in
all folders. In that case the boxes[] would contain a list of user's all
folders, except Trash and Spam. If lookup_multi() isn't implemented
(left to NULL), the search is run separately via lookup() for each
folder. With lookup_multi() there can be just one lookup, and the
backend can filter only the wanted folders and return them directly. So
it's an optimization for FTS indexes that support user-global searches
rather than only per-folder searches.


static int fts_backend_xapian_lookup_multi(struct fts_backend *_backend, struct 
mailbox *const boxes[], struct mail_search_arg *args, enum fts_lookup_flags 
flags, struct fts_multi_result *result)
{
struct xapian_fts_backend_update_context *ctx =
(struct xapian_fts_backend_update_context *)_ctx;

int i=0;

while(boxes[i]!=NULL)
{
if(fts_backen

Re: Solr -> Xapian ?

2019-01-07 Thread Timo Sirainen
On 7 Jan 2019, at 16.05, Joan Moreau via dovecot  wrote:
> 
> Hi
> 
> Can anyone answer specifically?
> 
> Q1: get_last_uid -> Is this the last UID indexed (which may not be the 
> greatest value), or the greatest value (which may not be the latest)? (The code 
> of existing plugins is unclear about this; Solr looks for the greatest, for 
> instance.)

All the mails are always supposed to be indexed from the beginning to the last 
indexed mail. If there's a gap, indexer first indexes all the missing mails. So 
the latest UID is supposed to be the greatest UID. (Supporting out-of-order 
indexing would be rather difficult to keep track of.)
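A minimal sketch of the invariant described above, using a toy in-memory backend in C (every name here is hypothetical and not part of Dovecot's fts-api.h): because mails are indexed in ascending UID order and gaps are filled first, the UID recorded as "last" is also the greatest indexed UID, and finding the mails that still need indexing only requires that one number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy backend state, not a Dovecot structure. */
struct toy_backend {
	uint32_t last_uid; /* greatest and also most recently indexed UID */
};

/* Index one mail. Mails arrive in ascending UID order, so the last
   indexed UID is always the greatest one. */
static void toy_index_mail(struct toy_backend *b, uint32_t uid)
{
	assert(uid > b->last_uid);
	b->last_uid = uid;
}

/* What get_last_uid() would report for this toy backend. */
static uint32_t toy_get_last_uid(const struct toy_backend *b)
{
	return b->last_uid;
}

/* Write to out every mailbox UID above last_uid, i.e. the mails an
   indexer still has to index. mailbox_uids must be sorted ascending.
   Returns how many UIDs were written. */
static size_t toy_uids_to_index(const uint32_t *mailbox_uids, size_t count,
				uint32_t last_uid, uint32_t *out)
{
	size_t n = 0;

	for (size_t i = 0; i < count; i++) {
		if (mailbox_uids[i] > last_uid)
			out[n++] = mailbox_uids[i];
	}
	return n;
}
```

Note that a hole in the middle (say UID 5 missing while 3 and 7 are indexed) is invisible to this check; finding such holes is what rescan() is for.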

> Q2: When indexing an email, the data is not passed by "build_key". Why so? 
> What is the link with "build_more"?

The idea is that it calls something like:

 - build_key(type=hdr, hdr_name=From)
 - build_more("t...@iki.fi")
 - build_key(type=hdr, hdr_name=Subject)
 - build_more("Re: Solr -> Xapian ?")
 - build_key(type=body_part)
 - build_more("message body piece")
 - build_more("message body piece2")
 ...
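The call sequence above can be modelled with a small self-contained toy in C. The key type and both functions are hypothetical stand-ins for struct fts_backend_build_key, set_build_key() and build_more(): the key only says which field the following data belongs to, the content then arrives in one or more build_more() chunks, and returning false from the key callback mirrors the documented way of skipping a key the backend doesn't want to index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified key types; Dovecot's real
   struct fts_backend_build_key lives in fts-api.h. */
enum toy_key_type { TOY_KEY_HDR, TOY_KEY_BODY_PART };

struct toy_build_key {
	enum toy_key_type type;
	const char *hdr_name; /* only used for TOY_KEY_HDR */
};

#define TOY_FIELD_MAX 256

/* Toy per-mail document: one text buffer per indexed field. */
struct toy_doc {
	char from[TOY_FIELD_MAX];
	char subject[TOY_FIELD_MAX];
	char body[TOY_FIELD_MAX];
	char *cur; /* field that build_more() appends to, or NULL */
};

/* Like set_build_key(): pick the target field for the data that
   follows, or return false so the caller skips this key. */
static bool toy_set_build_key(struct toy_doc *doc,
			      const struct toy_build_key *key)
{
	doc->cur = NULL;
	if (key->type == TOY_KEY_BODY_PART)
		doc->cur = doc->body;
	else if (strcmp(key->hdr_name, "From") == 0)
		doc->cur = doc->from;
	else if (strcmp(key->hdr_name, "Subject") == 0)
		doc->cur = doc->subject;
	return doc->cur != NULL;
}

/* Like build_more(): append size bytes (not NUL-terminated) to the
   current field. Returns 0 if ok, -1 if the field would overflow. */
static int toy_build_more(struct toy_doc *doc, const unsigned char *data,
			  size_t size)
{
	size_t len;

	assert(doc->cur != NULL);
	len = strlen(doc->cur);
	if (len + size >= TOY_FIELD_MAX)
		return -1;
	memcpy(doc->cur + len, data, size);
	doc->cur[len + size] = '\0';
	return 0;
}
```

Two build_more() calls after one body-part key simply concatenate, which is how "message body piece" and "message body piece2" in the list above would end up in the same field.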

> Q3: Searching/Lookup: the header in which to look (must be at least 
> among "cc, to, from, subject, body") does not appear in the 'struct' data. 
> Where can I find it?

lookup() gets struct mail_search_arg *args, which contains the entire IMAP 
SEARCH query. This could be used for more or less complex query builders.

In case of a single header search, you should have args->args->hdr_field_name 
contain the header name and args->args->value.str contain the content you're 
searching for.
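A hedged sketch of turning the search-argument list into a backend query. It uses a hypothetical, heavily simplified mirror of struct mail_search_arg (only next, type, hdr_field_name, value.str and value.subargs). As far as I know, in Dovecot's mail-search.h sibling args linked by next are ANDed, and OR and parenthesized groups (SEARCH_OR, SEARCH_SUB) keep their child conditions in value.subargs, which is what the "subargs" question in this thread is about; treat that as an assumption to verify against the real header, which has more types and flags (for example match_not for NOT) than shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified mirror of struct mail_search_arg. */
enum toy_search_type {
	TOY_SEARCH_HEADER, /* header field contains value.str */
	TOY_SEARCH_BODY,   /* body contains value.str */
	TOY_SEARCH_OR,     /* any of value.subargs matches */
	TOY_SEARCH_SUB     /* all of value.subargs match (parenthesized) */
};

struct toy_search_arg {
	struct toy_search_arg *next; /* next sibling; siblings are ANDed */
	enum toy_search_type type;
	const char *hdr_field_name;  /* only for TOY_SEARCH_HEADER */
	struct {
		const char *str;                /* text to search for */
		struct toy_search_arg *subargs; /* children of OR / SUB */
	} value;
};

/* Append a query for a list of sibling args to buf, joined by sep.
   buf must already be NUL-terminated; output is truncated if full. */
static void toy_build_query(const struct toy_search_arg *arg, const char *sep,
			    char *buf, size_t bufsize)
{
	for (; arg != NULL; arg = arg->next) {
		size_t len = strlen(buf);

		switch (arg->type) {
		case TOY_SEARCH_HEADER:
			snprintf(buf + len, bufsize - len, "%s:%s",
				 arg->hdr_field_name, arg->value.str);
			break;
		case TOY_SEARCH_BODY:
			snprintf(buf + len, bufsize - len, "body:%s",
				 arg->value.str);
			break;
		case TOY_SEARCH_OR:
		case TOY_SEARCH_SUB:
			snprintf(buf + len, bufsize - len, "(");
			/* recurse into the nested conditions */
			toy_build_query(arg->value.subargs,
					arg->type == TOY_SEARCH_OR ? " OR " : " AND ",
					buf, bufsize);
			len = strlen(buf);
			snprintf(buf + len, bufsize - len, ")");
			break;
		}
		if (arg->next != NULL) {
			len = strlen(buf);
			snprintf(buf + len, bufsize - len, "%s", sep);
		}
	}
}
```

For a tree with a BODY arg followed by an OR whose subargs are a From and a Subject header arg, calling toy_build_query() with " AND " as the top-level separator yields "body:squat AND (from:timo OR subject:xapian)".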

> Q4: Refresh: this is very unclear. How come there would not be the "latest" 
> view on the index? What is the real meaning of this function?

In case of Xapian it might not matter if it automatically refreshes its indexes 
between each query. But with some other indexes this could happen:

 - IMAP session is opened
 - IMAP SEARCH is run, which opens and searches the index
 - a new mail is delivered to the mailbox and indexed
 - IMAP SEARCH is run. Without refresh() it doesn't see the newly indexed mail 
and doesn't include it in the search results.
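The scenario above can be modelled with a toy in C (all names are hypothetical): the IMAP session keeps a snapshot of the index opened at its first search, delivery plus indexing bumps a generation counter on the shared index, and refresh() is just "reopen the snapshot if the index changed since it was opened". A backend whose library already reopens on every query could presumably make refresh() a no-op, as suggested above for Xapian.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared index: what is committed on disk. */
struct toy_index {
	uint32_t generation; /* bumped on every committed change */
	uint32_t mail_count; /* number of indexed mails */
};

/* Hypothetical per-session reader holding an opened snapshot. */
struct toy_reader {
	const struct toy_index *index;
	uint32_t seen_generation;
	uint32_t seen_mail_count; /* what searches in this session see */
};

static void toy_reader_open(struct toy_reader *r, const struct toy_index *idx)
{
	r->index = idx;
	r->seen_generation = idx->generation;
	r->seen_mail_count = idx->mail_count;
}

/* Another process delivers a new mail and indexes it. */
static void toy_index_add_mail(struct toy_index *idx)
{
	idx->mail_count++;
	idx->generation++;
}

/* refresh(): reopen only if the index changed. Returns 0 if ok. */
static int toy_refresh(struct toy_reader *r)
{
	if (r->seen_generation != r->index->generation)
		toy_reader_open(r, r->index);
	return 0;
}
```

Without the toy_refresh() call, the second search in the session would still see only the mails that existed when the snapshot was opened.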

> Q5: Rescan: is it just about removing all indexes for a specific mailbox?

It's run when "doveadm fts rescan" is run manually. Usually that's only run 
manually to fix up some brokenness. So it's intended to verify that the current 
mailbox contents match the FTS indexes:
 - If there are any mails in FTS index that no longer exist in the actual 
mailbox, delete those mails from FTS
 - If FTS is missing any mails in the middle of the mailbox, make sure that the 
next mailbox indexing will index those missing mails. I think currently this 
basically means reindexing all the mails since the first missing mail, even the 
mails that are already in the index.

fts-lucene implements this, but other FTS backends are lazy and simply rebuild 
all mails. Actually fts-solr is bad because it doesn't even delete the extra 
mails.
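A sketch of the two checks listed above, assuming the backend can list both the UIDs it has indexed and the mailbox's current UIDs as ascending arrays (toy code; the function name and signature are hypothetical). UIDs present only in the index are reported for deletion, and the returned last UID sits just below the first mailbox UID missing from the index, so the next indexing run reindexes everything from there on, including mails already in the index, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compare the sorted UID lists of the FTS index and the mailbox.
   Writes UIDs that exist only in the index to expunged_r and returns
   the last_uid the backend should report afterwards. */
static uint32_t toy_rescan(const uint32_t *index_uids, size_t index_count,
			   const uint32_t *box_uids, size_t box_count,
			   uint32_t *expunged_r, size_t *expunged_count_r)
{
	size_t i = 0, j = 0, n = 0;
	uint32_t first_missing = 0; /* 0 = no gap found (UIDs start at 1) */

	while (i < index_count || j < box_count) {
		if (j == box_count ||
		    (i < index_count && index_uids[i] < box_uids[j])) {
			/* indexed, but no longer in the mailbox */
			expunged_r[n++] = index_uids[i++];
		} else if (i == index_count || box_uids[j] < index_uids[i]) {
			/* in the mailbox, but missing from the index */
			if (first_missing == 0)
				first_missing = box_uids[j];
			j++;
		} else {
			/* present in both */
			i++;
			j++;
		}
	}
	*expunged_count_r = n;
	if (first_missing != 0)
		return first_missing - 1;
	return index_count > 0 ? index_uids[index_count - 1] : 0;
}
```

With index UIDs 1, 3, 5, 8 and mailbox UIDs 1, 2, 3, 8, 9, UID 5 is reported as expunged and the function returns 1, so UIDs 2, 3, 8 and 9 are indexed on the next run.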

> Q6: lookup_multi: isn't the function the same for all plugins (see below)?
>> And finally, for fts_backend__lookup_multi, why is that backend 
>> dependent?

This function is called only when searching in virtual folders. So for example 
the virtual "All mails" folder, which would contain all mails in all folders. 
In that case the boxes[] would contain a list of user's all folders, except 
Trash and Spam. If lookup_multi() isn't implemented (left to NULL), the search 
is run separately via lookup() for each folder. With lookup_multi() there can 
be just one lookup, and the backend can filter only the wanted folders and 
return them directly. So it's an optimization for FTS indexes that support 
user-global searches rather than only per-folder searches.

>> static int fts_backend_xapian_lookup_multi(struct fts_backend *_backend, 
>> struct mailbox *const boxes[], struct mail_search_arg *args, enum 
>> fts_lookup_flags flags, struct fts_multi_result *result)
>> {
>> struct xapian_fts_backend_update_context *ctx =
>> (struct xapian_fts_backend_update_context *)_ctx;
>> 
>> int i=0;
>> 
>> while(boxes[i]!=NULL)
>> {
>> if(fts_backend_xapian_lookup(backend,box[i],args,flags,result->box_results[i])<0)
>>  return -1;
>> i++;
>> }
>> return 0;
>> }

See fts_backend_lookup_multi() - if you leave lookup_multi=NULL it basically 
does this.
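To illustrate the optimization described above (toy C code, all names hypothetical): with a user-global index, one pass over the index can answer the query for every wanted folder at once and bucket the hits per folder, skipping folders that were not requested, such as Trash and Spam in the "All mails" example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-global index entry: one row per indexed mail. */
struct toy_entry {
	unsigned int box_id; /* which folder the mail is in */
	uint32_t uid;
	const char *text;
};

#define TOY_MAX_HITS 16

struct toy_box_result {
	unsigned int box_id;
	uint32_t uids[TOY_MAX_HITS];
	size_t count;
};

/* One pass over the global index answers the query for all wanted
   folders at once, instead of one lookup() per folder. Entries in
   folders not listed in box_ids are skipped. */
static void toy_lookup_multi(const struct toy_entry *entries, size_t n_entries,
			     const unsigned int *box_ids, size_t n_boxes,
			     const char *term, struct toy_box_result *results)
{
	for (size_t b = 0; b < n_boxes; b++) {
		results[b].box_id = box_ids[b];
		results[b].count = 0;
	}
	for (size_t e = 0; e < n_entries; e++) {
		if (strstr(entries[e].text, term) == NULL)
			continue;
		for (size_t b = 0; b < n_boxes; b++) {
			if (entries[e].box_id == box_ids[b] &&
			    results[b].count < TOY_MAX_HITS) {
				results[b].uids[results[b].count++] = entries[e].uid;
				break;
			}
		}
	}
}
```

A per-folder index gains nothing from this, which is why leaving lookup_multi NULL and letting the generic loop call lookup() per folder is the reasonable default.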

>> For "rescan" and "optimize", wouldn't it be the Dovecot core that indicates 
>> which are to be dismissed (expunged), or re-asks for indexing a particular 
>> (or all) UID? Why would the backend be aware of the transactions on the 
>> mailbox?

rescan() is about fixing up a more or less broken index, or simply to verify 
that it'

Re: Solr -> Xapian ?

2019-01-07 Thread Michael Slusarz
Maybe a dumb question (I admit I haven't followed this thread very closely)...

But why are you writing a new FTS driver?  If squat allegedly does everything 
you need it to do, why don't you just take that plugin and fix it up to do what 
you need?  That seems way easier than trying to create a FTS driver from 
scratch.

michael


> On January 7, 2019 at 7:05 AM Joan Moreau via dovecot  
> wrote:
> 
> 
> Hi
> 
> Can anyone answer specifically?
> 
> Q1: get_last_uid -> Is this the last UID indexed (which may not be the 
> greatest value), or the greatest value (which may not be the latest)? (The code 
> of existing plugins is unclear about this; Solr looks for the greatest, for 
> instance.)
> 
> Q2: When indexing an email, the data is not passed by "build_key". Why 
> so? What is the link with "build_more"?
> 
> Q3: Searching/Lookup: the header in which to look (must be at least 
> among "cc, to, from, subject, body") does not appear in the 'struct' data. 
> Where can I find it?
> 
> Q4: Refresh: this is very unclear. How come there would not be the 
> "latest" view on the index? What is the real meaning of this function?
> 
> Q5: Rescan: is it just about removing all indexes for a specific 
> mailbox?
> 
> Q6: lookup_multi: isn't the function the same for all plugins (see 
> below)?
> 


Re: Solr -> Xapian ?

2019-01-07 Thread Joan Moreau via dovecot
Hi 

Can anyone answer specifically?


Q1: get_last_uid -> Is this the last UID indexed (which may not be the
greatest value), or the greatest value (which may not be the latest)? (The
code of existing plugins is unclear about this; Solr looks for the
greatest, for instance.)


Q2: When indexing an email, the data is not passed by "build_key". Why
so? What is the link with "build_more"?


Q3: Searching/Lookup: the header in which to look (must be at
least among "cc, to, from, subject, body") does not appear in the
'struct' data. Where can I find it?


Q4: Refresh: this is very unclear. How come there would not be the
"latest" view on the index? What is the real meaning of this function?


Q5: Rescan: is it just about removing all indexes for a specific
mailbox?


Q6: lookup_multi: isn't the function the same for all plugins (see
below)?

Thank you


On 2019-01-06 16:50, Joan Moreau via dovecot wrote:

And finally, for fts_backend__lookup_multi, why is that backend dependent?

Wouldn't the function below be the same for any backend?

Waiting for your feedback on all those questions.

Thank you 

JM 

- 


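static int
fts_backend_xapian_lookup_multi(struct fts_backend *_backend,
				struct mailbox *const boxes[],
				struct mail_search_arg *args,
				enum fts_lookup_flags flags,
				struct fts_multi_result *result)
{
	unsigned int i, count;

	/* boxes[] is NULL-terminated */
	for (count = 0; boxes[count] != NULL; count++) ;

	/* box_results is an array of struct fts_result (not pointers),
	   allocated from result->pool with a zeroed terminator entry */
	result->box_results = p_new(result->pool, struct fts_result,
				    count + 1);
	for (i = 0; i < count; i++) {
		struct fts_result *box_result = &result->box_results[i];

		box_result->box = boxes[i];
		p_array_init(&box_result->definite_uids, result->pool, 32);
		p_array_init(&box_result->maybe_uids, result->pool, 32);
		p_array_init(&box_result->scores, result->pool, 32);
		/* plain per-folder lookup, one call per mailbox */
		if (fts_backend_xapian_lookup(_backend, boxes[i], args,
					      flags, box_result) < 0)
			return -1;
	}
	return 0;
}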

On 2019-01-06 16:31, Joan Moreau via dovecot wrote: 


For fts_backend_xxx_lookup, where is it specified in which field (to, cc, 
subject, body, from, all) to look up?

On 2019-01-06 16:03, Joan Moreau wrote: 

For "rescan" and "optimize", wouldn't it be the Dovecot core that indicates which are to be dismissed (expunged), or re-asks for indexing a particular (or all) UID? Why would the backend be aware of the transactions on the mailbox?


There is already "fts_backend_xxx_update_expunge", so I believe the management 
of the expunged messages is *NOT* in the backend, right?

On 2019-01-06 15:41, Joan Moreau wrote: 

Also, for fts_backend_solr_update_set_build_key -> where is the data (of the hdr_name or the body)?

On 2019-01-06 14:10, Joan Moreau wrote: 

For the "last uid" -> this is not the last added, but the maximum of the UIDs in the indexed emails, right?

On 2019-01-06 11:53, Joan Moreau via dovecot wrote: 

Thank you 

I still don't get the "build_key" function. The email (body, headers, ... and the UID) is the one (and only) thing to index. What "key" is that function referring to? Or is the "key" here the actual email?

On 2019-01-06 08:43, Stephan Bosch wrote: 


On 06/01/2019 at 01:00, Joan Moreau wrote:
Is anyone willing to explain those functions?

Most notably "get_last_uid"
From src/plugins/fts/fts-api.h:


/* Get the last_uid for the mailbox. */
int fts_backend_get_last_uid(struct fts_backend *backend, struct mailbox *box,
uint32_t *last_uid_r);

The Solr sources (src/plugins/fts-solr/fts-backend-solr.c:213) tell me this
returns the last UID added to the index for the given mailbox and FTS index.
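On the "last added" versus "maximum" question: IMAP UIDs within a mailbox only ever increase, and new mail is normally indexed in UID order, so the two usually coincide. The core appears to use this value to decide which messages still need indexing, so a backend that can also write older UIDs (for example while catching up after a rescan) may prefer to store the maximum explicitly so the value never moves backwards. A minimal sketch with a hypothetical per-mailbox state struct; mock_index_uid() and mock_get_last_uid() are illustrative names, not Dovecot functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-mailbox state kept by a backend. */
struct mock_box_state {
	uint32_t last_uid; /* highest UID written to the index, 0 = none */
};

/* Record that a message was indexed. Keeping the maximum means that
   writing an older UID (e.g. while catching up) never moves it back. */
static void mock_index_uid(struct mock_box_state *state, uint32_t uid)
{
	assert(uid != 0); /* IMAP UIDs start at 1 */
	if (uid > state->last_uid)
		state->last_uid = uid;
}

/* Counterpart of get_last_uid(): 0 means nothing is indexed yet, so all
   messages with a UID above 0 still need indexing. Returns 0 on success. */
static int mock_get_last_uid(const struct mock_box_state *state,
			     uint32_t *last_uid_r)
{
	*last_uid_r = state->last_uid;
	return 0;
}
```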

"set_build_key" 
From src/plugins/fts/fts-api.h:


/* Switch to building index for specified key. If backend doesn't want to
index this key, it can return FALSE and caller will skip to next key. */
bool fts_backend_update_set_build_key(struct fts_backend_update_context *ctx,
const struct fts_backend_build_key *key);

The same file provides an outline of what a build_key is.
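Reading the declaration, a build key only says *what* the next chunks of data belong to: the message UID, the kind of part (a message header, a MIME part header, or a body part) and, depending on the kind, the header name or the body's Content-Type. The text itself is not in the key; it is delivered afterwards through build_more(). The sketch below uses a hypothetical, simplified copy of the key struct (mock_build_key) to show how a backend might pick a target field, or return false so the caller skips the key. The field names and the choice to skip MIME part headers and binary parts are assumptions for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified copy of struct fts_backend_build_key. */
enum mock_key_type {
	MOCK_KEY_HDR,             /* message header; hdr_name is set */
	MOCK_KEY_MIME_HDR,        /* header of a MIME part */
	MOCK_KEY_BODY_PART,       /* text body part */
	MOCK_KEY_BODY_PART_BINARY /* binary body part */
};

struct mock_build_key {
	uint32_t uid;
	enum mock_key_type type;
	const char *hdr_name;          /* MOCK_KEY_HDR only */
	const char *body_content_type; /* body parts, e.g. "text/plain" */
};

/* Backend state: which field the next build_more() data goes to. */
struct mock_update_ctx {
	uint32_t cur_uid;
	const char *cur_field; /* NULL when no key is active */
};

static bool streq_nocase(const char *a, const char *b)
{
	while (*a != '\0' && *b != '\0' &&
	       tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
		a++;
		b++;
	}
	return *a == '\0' && *b == '\0';
}

/* Mirrors set_build_key(): pick a target field, or return false so the
   caller skips this key and sends no data for it. */
static bool mock_set_build_key(struct mock_update_ctx *ctx,
			       const struct mock_build_key *key)
{
	ctx->cur_field = NULL;
	switch (key->type) {
	case MOCK_KEY_HDR:
		assert(key->hdr_name != NULL);
		if (streq_nocase(key->hdr_name, "subject"))
			ctx->cur_field = "subject";
		else if (streq_nocase(key->hdr_name, "from"))
			ctx->cur_field = "from";
		else
			ctx->cur_field = "hdr"; /* all other headers share one field */
		break;
	case MOCK_KEY_BODY_PART:
		ctx->cur_field = "body";
		break;
	case MOCK_KEY_MIME_HDR:
	case MOCK_KEY_BODY_PART_BINARY:
		return false; /* this sketch indexes neither */
	}
	ctx->cur_uid = key->uid;
	return true;
}
```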

"build_more" , 
/* Add more content to the index for the currently specified build key.

Non-BODY_PART_BINARY data must contain only full valid UTF-8 characters,
but it doesn't need to be NUL-terminated. size contains the data size in
bytes, not characters. This function may be called many times and the data
block sizes may be small. Backend returns 0 if ok, -1 if build should be
aborted. */
int fts_backend_update_build_more(struct fts_backend_update_context *ctx,
const unsigned char *data, size_t size);

You should look at the sources of a few backends like squat and solr to get a
feel for what exactly this is doing.
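Since build_more() may be called many times with small blocks for the same key, and the blocks are not NUL-terminated, a backend that needs the complete field text at once typically appends each block to a per-key buffer and consumes it when the key is unset. Because non-binary blocks always contain whole UTF-8 characters, simply concatenating them keeps the text valid. A minimal sketch using plain malloc/realloc rather than Dovecot's own buffer helpers; mock_field_buf, mock_build_more() and mock_unset_build_key() are hypothetical names, and the sketch handles text parts only (binary data could contain NUL bytes).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator for the data of the current build key. */
struct mock_field_buf {
	char *data;
	size_t used, alloc;
};

/* Mirrors build_more(): append size bytes (not NUL-terminated).
   Returns 0 on success, -1 to abort the build (here: out of memory). */
static int mock_build_more(struct mock_field_buf *buf,
			   const unsigned char *data, size_t size)
{
	assert(buf != NULL);
	if (buf->used + size + 1 > buf->alloc) {
		size_t new_alloc = buf->alloc == 0 ? 64 : buf->alloc;
		char *p;

		while (buf->used + size + 1 > new_alloc)
			new_alloc *= 2;
		p = realloc(buf->data, new_alloc);
		if (p == NULL)
			return -1;
		buf->data = p;
		buf->alloc = new_alloc;
	}
	memcpy(buf->data + buf->used, data, size);
	buf->used += size;
	buf->data[buf->used] = '\0'; /* keep it usable as a C string */
	return 0;
}

/* Mirrors unset_build_key(): the field is complete, so a real backend
   would hand buf->data to its index here. Returns the finished length
   and resets the buffer (keeping its memory) for the next key. */
static size_t mock_unset_build_key(struct mock_field_buf *buf)
{
	size_t len = buf->used;

	buf->used = 0;
	if (buf->data != NULL)
		buf->data[0] = '\0';
	return len;
}
```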

What is refresh versus rescan?
From fts-api.h:


/* Refresh index to make sure we see latest changes from lookups.
Returns 0 if ok, -1 if error. */
int fts_backend_refresh(struct fts_backend *backend);
/* Go through the entire index and make sure all mails are indexed,
and delete any extra mails in the index. */
int fts_backend_rescan(struct fts_backend *backend);
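One way to read the two comments: refresh only makes the lookup side see changes that were already committed (for example by reopening a database handle), while rescan compares the UIDs that exist in the mailbox with the UIDs in the index, queues the missing ones for indexing and deletes the extra ones. That would also explain why the backend needs rescan even though update_expunge exists: mails expunged while the FTS plugin was not loaded, or a crash between storing and indexing, can leave the two out of sync. A minimal sketch of the comparison as a single merge pass over two ascending UID arrays; mock_rescan_diff() is a hypothetical helper, not part of Dovecot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk two ascending UID lists in one merge pass.
   mailbox_uids: UIDs that exist in the mailbox now.
   index_uids:   UIDs currently present in the FTS index.
   missing_r receives UIDs to (re)index, extra_r UIDs to delete from the
   index; each must have room for the worst case. */
static void mock_rescan_diff(const uint32_t *mailbox_uids, size_t n_mailbox,
			     const uint32_t *index_uids, size_t n_index,
			     uint32_t *missing_r, size_t *n_missing_r,
			     uint32_t *extra_r, size_t *n_extra_r)
{
	size_t i = 0, j = 0;

	*n_missing_r = *n_extra_r = 0;
	while (i < n_mailbox || j < n_index) {
		if (j == n_index ||
		    (i < n_mailbox && mailbox_uids[i] < index_uids[j])) {
			/* in the mailbox but never indexed */
			missing_r[(*n_missing_r)++] = mailbox_uids[i++];
		} else if (i == n_mailbox || index_uids[j] < mailbox_uids[i]) {
			/* indexed, but the mail has been expunged */
			extra_r[(*n_extra_r)++] = index_uids[j++];
		} else {
			/* present in both: nothing to do */
			i++;
			j++;
		}
	}
}
```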

Regards,

Stephan

On January 5, 2019 14:23:10 Joan Moreau via dovecot  wrote:

Thanks Stephan

I basically need to know the role/description of each of the functions of the 
fts_backend:

struct fts_backend fts_backend_xapian = {
.name = "xapian",
.flags = FTS_BACKEND_FLAG_NORMALIZE_INPUT, /* what other flags? */

{
fts_backend_xapian_alloc,

Re: Solr -> Xapian ?

2019-01-06 Thread Joan Moreau via dovecot

And finally, for fts_backend__lookup_multi, why is that
backend-dependent?

Wouldn't the function below be the same for any backend?

Waiting for your feedback on all those questions.

Thank you 

JM 

- 


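static int
fts_backend_xapian_lookup_multi(struct fts_backend *_backend,
				struct mailbox *const boxes[],
				struct mail_search_arg *args,
				enum fts_lookup_flags flags,
				struct fts_multi_result *result)
{
	unsigned int i, count;

	/* boxes[] is NULL-terminated */
	for (count = 0; boxes[count] != NULL; count++) ;

	/* box_results is a box=NULL-terminated array allocated from the
	   result pool, so allocate one extra zeroed entry */
	result->box_results = p_new(result->pool, struct fts_result, count + 1);

	for (i = 0; i < count; i++) {
		struct fts_result *box_result = &result->box_results[i];

		box_result->box = boxes[i];
		p_array_init(&box_result->definite_uids, result->pool, 32);
		p_array_init(&box_result->maybe_uids, result->pool, 32);
		p_array_init(&box_result->scores, result->pool, 32);
		if (fts_backend_xapian_lookup(_backend, boxes[i], args, flags,
					      box_result) < 0)
			return -1;
	}
	return 0;
}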


Re: Solr -> Xapian ?

2019-01-06 Thread Joan Moreau via dovecot

For fts_backend_xxx_lookup, where is it specified which field (to, cc,
subject, body, from, all) to look up?

On January 5, 2019 14:23:10 Joan Moreau via dovecot  wrote:

Thanks Stephan

I basically need to know the role/description of each of the functions of the 
fts_backend:

struct fts_backend fts_backend_xapian = {
.name = "xapian",
.flags = FTS_BACKEND_FLAG_NORMALIZE_INPUT, /* what other flags? */

{
fts_backend_xapian_alloc,
fts_backend_xapian_init,
fts_backend_xapian_deinit,
fts_backend_xapian_get_last_uid,
fts_backend_xapian_update_init,
fts_backend_xapian_update_deinit,
fts_backend_xapian_update_set_mailbox,
fts_backend_xapian_update_expunge,
fts_backend_xapian_update_set_build_key,
fts_backend_xapian_update_unset_build_key,
fts_backend_xapian_update_build_more,
fts_backend_xapian_refresh,
fts_backend_xapian_rescan,
fts_backend_xapian_optimize,
fts_backend_default_can_lookup,
fts_backend_xapian_lookup,
fts_backend_xapian_lookup_multi,
fts_backend_xapian_lookup_done
}
};

Thank you

On 2019-01-05 08:49, Stephan Bosch wrote:

On 04/01/2019 at 11:17, Joan Moreau via dovecot wrote:
Why not, but please guide me on the core structure (mandatory functions, etc.) of a typical Dovecot FTS plugin.


The Dovecot API documentation is not exhaustive everywhere, but the basics are 
documented. The remaining questions can be answered by looking at examples 
found in similar plugins or the relevant API sources.

I know of one FTS plugin not written by Dovecot developers:

https://github.com/atkinsj/fts-elasticsearch

If you really wish to do something like this, just go ahead. It will not be a 
small effort though. As soon as you have concrete questions, we can help you 
(don't expect rapid responses though).

Regards,

Stephan.

Re: Solr -> Xapian ?

2019-01-06 Thread Joan Moreau via dovecot

For "rescan" and "optimize", wouldn't it be the Dovecot core that
indicates which messages are to be dismissed (expunged), or asks again for
a particular UID (or all UIDs) to be indexed? Why would the backend be aware
of the transactions on the mailbox?

There is already "fts_backend_xxx_update_expunge", so I believe the
management of expunged messages is *NOT* in the backend, right?
