Re: [Dovecot] Solr 4.0 - lucene - FTS

2013-01-19 Thread Paul Freeman

On 2012-11-07 15:14, Timo Sirainen wrote:
> On 7.11.2012, at 15.01, Charles Marcus wrote:
>
>> As one who is interested in implementing FTS sometime in the future,
>> I'm curious about what is in store as far as improvements go...
>>
>> Specifically, any plans for implementing immediate/automatic index
>> updates at delivery time? The lack of automatically updated indexes is
>> one downside for its implementation...
>
> Nothing really prevents from adding that very easily ..

Trivial? :)

> I guess it
> would need a new setting, which is always the most annoying part of
> small changes. :) I think it would have to have a setting equivalent
> to doveadm index -n parameter, which allows indexing most users,
> except those who pretty much never read their emails. So with doveadm
> index -n 1000 you could set that if the mailbox's \Recent count is
> over 1000, don't index the mailbox. So .. hmm. I guess two settings
> would be cleaner:
>
> plugin {
>   fts_autoindex = yes
>   fts_autoindex_max_recent = 1000
> }

Sounds nice. Are there any plans on the roadmap to actually implement
this yet? :)

> Or maybe there's a better name than "autoindex" for this feature.
> SEARCH always autoindexes anyway.
>
>> Also, does the release of Solr 4.0 mean anything for the lucene
>> library used by dovecot?
>
> No, fts-lucene and fts-solr are separate backends. But I do have some
> small plans to add a few more features to fts-solr.





Re: [Dovecot] Solr 4.0 - lucene - FTS

2012-11-11 Thread Daniel L. Miller
 

On 2012-11-08 03:45, Charles Marcus wrote:

> On 2012-11-07 10:14 AM, Timo Sirainen wrote:
>
>> No, fts-lucene and fts-solr are separate backends. But I do have some
>> small plans to add a few more features to fts-solr.
>
> Thanks again Timo, but one last follow-up...
>
> According to the wiki, Solr is the preferred method, but that seems
> weird to me - it requires a full blown Solr server that dovecot
> communicates with using HTTP/XML queries? Maybe not that big a deal, but
> just sounds like overkill to me, unless you are maybe already using Solr
> for website searches (which I'm not and have no need for). I would much
> prefer something simpler that doesn't require any external dependencies
> like that, so, next choice is Lucene...
>
> Looks much simpler, only requires Lucene's C++ library...
>
> But it builds only a single Lucene index for all mailboxes - not sure if
> this is good or bad? Seems like it would be better/more efficient (and
> less chance of index corruption, but most importantly, less overhead in
> the event that one gets hosed and dovecot needs to rebuild it) to build
> individual indexes for each mailbox, then, maybe, to provide support for
> searching ALL mailboxes, have a master index that basically just
> maintains a list of all of the individual indexes to be used for the
> search (so it doesn't have to scan all available mailboxes, but which it
> can do in the event that *it* ever got hosed).
>
> Obviously I don't know much about all this, so may be totally off base...
>
> Thanks again, and for listening to my ramblings,

My, probably wrong, impression is this:

The concept of running a "full blown Solr server" seems intimidating -
until you actually do it. It's just another Java process. If you're
already using Java for something else then I don't think there's much
concern - my (again, probably wrong) understanding is that once you've
got one Java process running, other than process-specific
variables/caching, the overall overhead of the Java VM is shared - so in
for a penny, in for a pound.

Lucene development is actively done in Java, with Solr being the primary
reference implementation. The C libraries (I know of two) are then
derived from the Java library - so the C implementations always lag
behind the Java one, and it looks like there's much more active work
going into the Java library.

There's no question the Lucene implementation in Dovecot is the simplest
for an administrator to work with - but the Solr version sure looks a
lot more powerful. The tradeoff is sometimes needing to fiddle with
configuration settings (not like we ever need to do that for anything
else, right?), especially with new versions of either Dovecot or Solr.

Having a single index store - I suppose it theoretically adds a point of
failure, but given that the FTS indexes are a partial duplicate of, and
generated from, the mail storage, I'm not losing sleep over it. I put my
Solr installation on the same RAID array as my mail store - I'm not
seeing any issues with it, but I don't claim to be a senior admin.

I'm currently running Solr 4.0. A few tweaks are needed to get it
running, but once it's up it goes quite smoothly.

-- 

Daniel
 


Re: [Dovecot] Solr 4.0 - lucene - FTS

2012-11-08 Thread Charles Marcus

On 2012-11-07 10:14 AM, Timo Sirainen wrote:

> No, fts-lucene and fts-solr are separate backends. But I do have some small
> plans to add a few more features to fts-solr.


Thanks again Timo, but one last follow-up...

According to the wiki, Solr is the preferred method, but that seems 
weird to me - it requires a full blown Solr server that dovecot 
communicates with using HTTP/XML queries? Maybe not that big a deal, but 
just sounds like overkill to me, unless you are maybe already using Solr 
for website searches (which I'm not and have no need for). I would much 
prefer something simpler that doesn't require any external dependencies 
like that, so, next choice is Lucene...


Looks much simpler, only requires Lucene's C++ library...

But it builds only a single Lucene index for all mailboxes - not sure if 
this is good or bad? Seems like it would be better/more efficient (and 
less chance of index corruption, but most importantly, less overhead in 
the event that one gets hosed and dovecot needs to rebuild it) to build 
individual indexes for each mailbox, then, maybe, to provide support for 
searching ALL mailboxes, have a master index that basically just 
maintains a list of all of the individual indexes to be used for the 
search (so it doesn't have to scan all available mailboxes, but which it 
can do in the event that *it* ever got hosed).
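
As a rough illustration of the layout described above, here is a minimal
Python sketch. It is a toy model only and says nothing about how fts-lucene
actually stores its data; the class names MailboxIndex and MasterIndex and
all of their methods are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class MailboxIndex:
    """Toy inverted index for one mailbox: word -> set of message UIDs."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, uid: int, text: str) -> None:
        # Naive tokenisation: lowercase and split on whitespace.
        for word in text.lower().split():
            self.postings[word].add(uid)

    def search(self, word: str) -> Set[int]:
        return set(self.postings.get(word.lower(), set()))


class MasterIndex:
    """Holds only the list of per-mailbox indexes, no message data itself."""

    def __init__(self) -> None:
        self.mailboxes: Dict[str, MailboxIndex] = {}

    def index_message(self, mailbox: str, uid: int, text: str) -> None:
        self.mailboxes.setdefault(mailbox, MailboxIndex()).add(uid, text)

    def drop(self, mailbox: str) -> None:
        # A damaged index is discarded on its own; the others stay usable.
        self.mailboxes.pop(mailbox, None)

    def search_all(self, word: str) -> List[Tuple[str, int]]:
        # Search every registered mailbox and return (mailbox, uid) pairs.
        hits = []
        for name in sorted(self.mailboxes):
            for uid in sorted(self.mailboxes[name].search(word)):
                hits.append((name, uid))
        return hits
```

In this model, losing one mailbox's index means rebuilding only that
mailbox, which is the reduced rebuild overhead argued for above.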


Obviously I don't know much about all this, so may be totally off base...

Thanks again, and for listening to my ramblings,

--

Best regards,

Charles



Re: [Dovecot] Solr 4.0 - lucene - FTS

2012-11-07 Thread Charles Marcus

On 2012-11-07 11:29 AM, Timo Sirainen wrote:

> On 7.11.2012, at 18.21, Charles Marcus wrote:
>
>> On 2012-11-07 10:14 AM, Timo Sirainen wrote:
>>>> Specifically, any plans for implementing immediate/automatic index
>>>> updates at delivery time? The lack of automatically updated indexes is
>>>> one downside for its implementation...
>>> Nothing really prevents from adding that very easily .. I guess it would
>>> need a new setting, which is always the most annoying part of small
>>> changes. :) I think it would have to have a setting equivalent to doveadm
>>> index -n parameter, which allows indexing most users, except those who
>>> pretty much never read their emails. So with doveadm index -n 1000 you
>>> could set that if the mailbox's \Recent count is over 1000, don't index
>>> the mailbox. So .. hmm. I guess two settings would be cleaner:
>>>
>>> plugin {
>>>   fts_autoindex = yes
>>>   fts_autoindex_max_recent = 1000
>>> }
>>
>> And this would work in conjunction with (and require) the dovecot LDA /
>> LMTP?
>
> Yes. For non-Dovecot LDA/LMTP you can already run "doveadm index" after the
> delivery. Or you could do that already with dovecot-lda as well.


Gotcha... just confirming that as long as you were using dovecot 
LDA/LMTP, index updates would be immediate and not impact system 
performance.


Thanks... looking forward to its implementation someday. ;)

--

Best regards,

Charles



Re: [Dovecot] Solr 4.0 - lucene - FTS

2012-11-07 Thread Timo Sirainen
On 7.11.2012, at 18.21, Charles Marcus wrote:

> On 2012-11-07 10:14 AM, Timo Sirainen  wrote:
>>> Specifically, any plans for implementing immediate/automatic index updates 
>>> at delivery time? The lack of automatically updated indexes is one downside 
>>> for its implementation...
>> Nothing really prevents from adding that very easily .. I guess it would 
>> need a new setting, which is always the most annoying part of small 
>> changes.:)  I think it would have to have a setting equivalent to doveadm 
>> index -n parameter, which allows indexing most users, except those who 
>> pretty much never read their emails. So with doveadm index -n 1000 you could 
>> set that if the mailbox's \Recent count is over 1000, don't index the 
>> mailbox. So .. hmm. I guess two settings would be cleaner:
>> 
>> plugin {
>>   fts_autoindex = yes
>>   fts_autoindex_max_recent = 1000
>> }
> 
> And this would work in conjunction with (and require) the dovecot LDA / LMTP?

Yes. For non-Dovecot LDA/LMTP you can already run "doveadm index" after the 
delivery. Or you could do that already with dovecot-lda as well.
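
A minimal sketch of that workaround, written as a small Python wrapper that
a delivery script could call once a message has been stored. It assumes
doveadm is on the PATH and that the caller may run it for the given user.
The function names and the example user are hypothetical; the only command
and option taken from this thread are "doveadm index" and its -n parameter.

```python
import subprocess
from typing import List, Optional


def build_index_command(user: str, mailbox: str = "INBOX",
                        max_recent: Optional[int] = None) -> List[str]:
    """Build a "doveadm index" command line for one user's mailbox."""
    cmd = ["doveadm", "index", "-u", user]
    if max_recent is not None:
        # -n skips the mailbox when its \Recent count is over this value.
        cmd += ["-n", str(max_recent)]
    cmd.append(mailbox)
    return cmd


def index_after_delivery(user: str, mailbox: str = "INBOX",
                         max_recent: Optional[int] = None) -> None:
    """Run the command; raises CalledProcessError if doveadm fails."""
    subprocess.run(build_index_command(user, mailbox, max_recent), check=True)
```

For example, index_after_delivery("alice@example.com", max_recent=1000)
would update the INBOX index unless that mailbox has more than 1000
\Recent messages.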



Re: [Dovecot] Solr 4.0 - lucene - FTS

2012-11-07 Thread Charles Marcus

On 2012-11-07 10:14 AM, Timo Sirainen wrote:

>> Specifically, any plans for implementing immediate/automatic index updates
>> at delivery time? The lack of automatically updated indexes is one downside
>> for its implementation...
> Nothing really prevents from adding that very easily .. I guess it would
> need a new setting, which is always the most annoying part of small
> changes. :) I think it would have to have a setting equivalent to doveadm
> index -n parameter, which allows indexing most users, except those who
> pretty much never read their emails. So with doveadm index -n 1000 you could
> set that if the mailbox's \Recent count is over 1000, don't index the
> mailbox. So .. hmm. I guess two settings would be cleaner:
>
> plugin {
>   fts_autoindex = yes
>   fts_autoindex_max_recent = 1000
> }


And this would work in conjunction with (and require) the dovecot LDA / 
LMTP?


--

Best regards,

Charles



Re: [Dovecot] Solr 4.0 - lucene - FTS

2012-11-07 Thread Timo Sirainen
On 7.11.2012, at 15.01, Charles Marcus wrote:

> As one who is interested in implementing FTS sometime in the future, I'm 
> curious about what is in store as far as improvements go...
> 
> Specifically, any plans for implementing immediate/automatic index updates at 
> delivery time? The lack of automatically updated indexes is one downside for 
> its implementation...

Nothing really prevents from adding that very easily .. I guess it would need a 
new setting, which is always the most annoying part of small changes. :) I 
think it would have to have a setting equivalent to doveadm index -n parameter, 
which allows indexing most users, except those who pretty much never read their 
emails. So with doveadm index -n 1000 you could set that if the mailbox's 
\Recent count is over 1000, don't index the mailbox. So .. hmm. I guess two 
settings would be cleaner:

plugin {
  fts_autoindex = yes
  fts_autoindex_max_recent = 1000
}

Or maybe there's a better name than "autoindex" for this feature. SEARCH always 
autoindexes anyway.
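
The behaviour these two proposed settings describe can be sketched roughly
as follows. This is an illustration in Python, not Dovecot code: the
AutoindexSettings class and the should_autoindex function are hypothetical
names, and only the two setting names and the "over 1000" rule come from
the text above.

```python
from dataclasses import dataclass


@dataclass
class AutoindexSettings:
    # Mirrors the two proposed plugin settings; the class is hypothetical.
    fts_autoindex: bool = True
    fts_autoindex_max_recent: int = 1000


def should_autoindex(recent_count: int, settings: AutoindexSettings) -> bool:
    """Return True if a newly delivered message should be indexed right away."""
    if not settings.fts_autoindex:
        return False
    # Skip mailboxes whose \Recent count is over the limit: their owners
    # rarely read mail, so indexing at delivery time would be wasted work.
    return recent_count <= settings.fts_autoindex_max_recent
```

With the values shown above, a mailbox with exactly 1000 \Recent messages
is still indexed at delivery and one with 1001 is skipped.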

> Also, does the release of Solr 4.0 mean anything for the lucene library used 
> by dovecot?

No, fts-lucene and fts-solr are separate backends. But I do have some small 
plans to add a few more features to fts-solr.