Re: Solr -> Xapian ?
> Q1: get_last_uid -> Is this the last UID indexed (which may not be the
> greatest value), or the greatest value (which may not be the latest)?
> (The code of existing plugins is unclear about this; Solr looks for the
> greatest, for instance.)

All the mails are always supposed to be indexed from the beginning to the last indexed mail. If there's a gap, indexer first indexes all the missing mails. So the latest UID is supposed to be the greatest UID. (Supporting out-of-order indexing would be rather difficult to keep track of.)

> Q2: When indexing an email, the data is not passed by "build_key". Why
> so? What is the link with "build_more"?

The idea is that it calls something like:

 - build_key(type=hdr, hdr_name=From)
 - build_more("t...@iki.fi")
 - build_key(type=hdr, hdr_name=Subject)
 - build_more("Re: Solr -> Xapian ?")
 - build_key(type=body_part)
 - build_more("message body piece")
 - build_more("message body piece2")
 ...

> Q3: Searching/Lookup: the header in which to look (must be at least one
> of "cc, to, from, subject, body") does not appear in the 'struct' data.
> Where to find it?

lookup() gets struct mail_search_arg *args, which contains the entire IMAP SEARCH query. This could be used for more or less complex query builders. In case of a single header search, you should have args->args->hdr_field_name contain the header name and args->args->value.str contain the content you're searching for.

> Q4: Refresh: this is very unclear. How come there would not be the
> "latest" view on the index? What is the real meaning of this function?

In case of Xapian it might not matter if it automatically refreshes its indexes between each query. But with some other indexes this could happen:

 - IMAP session is opened
 - IMAP SEARCH is run, which opens and searches the index
 - a new mail is delivered to the mailbox and indexed
 - IMAP SEARCH is run. Without refresh() it doesn't see the newly indexed mail and doesn't include it in the search results.

> Q5: Rescan: is it just about removing all indexes for a specific
> mailbox?
It's run when "doveadm fts rescan" is run manually. Usually that's only run manually to fix up some brokenness. So it's intended to verify that the current mailbox contents match the FTS indexes:

 - If there are any mails in FTS index that no longer exist in the actual mailbox, delete those mails from FTS
 - If FTS is missing any mails in the middle of the mailbox, make sure that the next mailbox indexing will index those missing mails. I think currently this basically means reindexing all the mails since the first missing mail, even the mails that are already in the index.

fts-lucene implements this, but other FTS backends are lazy and simply rebuild all mails. Actually fts-solr is bad because it doesn't even delete the extra mails.

> Q6: lookup_multi: isn't the function the same for all plugins (see
> below)? And finally, for fts_backend_lookup_multi, why is that backend
> dependent?

This function is called only when searching in virtual folders. So for example the virtual "All mails" folder, which would contain all mails in all folders. In that case the boxes[] would contain a list of all the user's folders, except Trash and Spam. If lookup_multi() isn't implemented (left to NULL), the search is run separately via lookup() for each folder. With lookup_multi() there can be just one lookup, and the backend can filter only the wanted folders and return them directly. So it's an optimization for FTS indexes that support user-global searches rather than only per-folder searches.
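To make the lookup_multi() idea above concrete, here is a minimal self-contained sketch in plain C. It does not use any Dovecot types: `toy_hit`, `toy_box_result` and `toy_lookup_multi` are hypothetical names invented for illustration. It assumes the backend has one user-wide index that can report the folder of every hit, so a single query can be split over the wanted folders and hits in other folders (Trash, Spam) are simply dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One hit from a hypothetical user-wide index: which folder, which UID.
   These are illustration-only types, not Dovecot's. */
struct toy_hit {
	const char *folder;
	uint32_t uid;
};

#define TOY_MAX_UIDS 16

/* Per-folder result bucket, one per entry of the wanted folder list. */
struct toy_box_result {
	const char *folder;
	uint32_t uids[TOY_MAX_UIDS];
	size_t count;
};

/* Distribute the hits of ONE user-wide query over the wanted folders.
   Hits whose folder is not in results[] are dropped.
   Returns the number of hits that were kept. */
static size_t toy_lookup_multi(const struct toy_hit *hits, size_t hit_count,
			       struct toy_box_result *results, size_t box_count)
{
	size_t kept = 0;

	for (size_t h = 0; h < hit_count; h++) {
		for (size_t b = 0; b < box_count; b++) {
			if (strcmp(hits[h].folder, results[b].folder) != 0)
				continue;
			assert(results[b].count < TOY_MAX_UIDS);
			results[b].uids[results[b].count++] = hits[h].uid;
			kept++;
			break;
		}
	}
	return kept;
}
```

With a backend that keeps one database per folder, this buys nothing over calling lookup() in a loop, which matches the point that lookup_multi() is only an optimization for user-global indexes.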
> static int fts_backend_xapian_lookup_multi(struct fts_backend *_backend,
>                                            struct mailbox *const boxes[],
>                                            struct mail_search_arg *args,
>                                            enum fts_lookup_flags flags,
>                                            struct fts_multi_result *result)
> {
>         unsigned int i;
>
>         /* Run the single-folder lookup once per folder. This assumes
>            result->box_results[] already has one entry per folder. */
>         for (i = 0; boxes[i] != NULL; i++) {
>                 if (fts_backend_xapian_lookup(_backend, boxes[i], args, flags,
>                                               &result->box_results[i]) < 0)
>                         return -1;
>         }
>         return 0;
> }

See fts_backend_lookup_multi() - if you leave lookup_multi=NULL it basically does this.

> For "rescan" and "optimize", wouldn't it be the Dovecot core that
> indicates which mails are to be dismissed (expunged), or asks again for
> indexing of a particular (or all) UIDs? Why would the backend be aware
> of the transactions on the mailbox?

rescan() is about fixing up a more or less broken index, or simply to verify that it's all ok. So core doesn't know what messages exist in the FTS index and can't request specific reindexing or expunging. I guess an alternative API could have been to have functions that iterate through all mails in the index, and use that to implement rescan in core. Now thinking about it, that sounds like a simpler and better way.

optimize() is currently done only
Re: Solr -> Xapian ?
On 13 Jan 2019, at 10.45, Joan Moreau via dovecot wrote:
>
> Now, I can see in the logs that several times, Dovecot calls
> fts_backend_xapian_update_set_mailbox with box == NULL. Why so?
>

fts-api.h says:

/* Switch to updating the specified mailbox. box may also be set to NULL to
   make sure the previous mailbox won't tried to be accessed anymore. */
void fts_backend_update_set_mailbox(struct fts_backend_update_context *ctx,
                                    struct mailbox *box);

So it's just telling you that you can close/free any stuff related to that mailbox.

>> Additionally, my logic is that the backend stores one database per
>> mailbox in /xapian-indexes (in the "root" dir of the user); the name of
>> the database is the GUID of the mailbox.
>>
>> For INBOX, that works perfectly: the database is properly created and
>> the backend starts indexing all emails.
>>
>> For other folders, somehow, the process cannot access that (root)
>> folder.
>>
>> Am I missing something?
>>

This is a bit ambiguous, because some people mean mailbox=folder and others mean mailbox=user account, and GUID can also be the internal Dovecot folder GUID, or a GUID of the user. I'd recommend using a single database per user anyway.
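A rough sketch of what "close/free any stuff related to that mailbox" can look like in a backend, written as self-contained C rather than against the real Dovecot headers. Everything prefixed `toy_` is a hypothetical stand-in (the real types are struct mailbox and struct fts_backend_update_context), and the "flush" is just a counter. The one design point it tries to show is that the context copies what it needs from the mailbox instead of keeping the pointer, so nothing dangles after a NULL call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration only. */
struct toy_mailbox {
	const char *guid;
};

struct toy_update_context {
	char cur_guid[64];   /* empty string = no mailbox selected */
	int pending_docs;    /* documents buffered for cur_guid */
	int flushed_docs;    /* documents already written out */
};

/* Pretend to commit everything buffered for the current mailbox. */
static void toy_flush(struct toy_update_context *ctx)
{
	ctx->flushed_docs += ctx->pending_docs;
	ctx->pending_docs = 0;
}

/* Switch to another mailbox, or to none at all when box == NULL. */
static void toy_update_set_mailbox(struct toy_update_context *ctx,
				   const struct toy_mailbox *box)
{
	assert(ctx != NULL);

	/* Finish the work for the previous mailbox first, whatever comes next. */
	if (ctx->cur_guid[0] != '\0')
		toy_flush(ctx);

	if (box == NULL) {
		/* Core says the previous mailbox must not be touched anymore. */
		ctx->cur_guid[0] = '\0';
		return;
	}
	/* Copy the identifier instead of keeping a pointer into *box. */
	snprintf(ctx->cur_guid, sizeof(ctx->cur_guid), "%s", box->guid);
}
```

A real backend would close its database handle (or commit a transaction) where this sketch calls toy_flush().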
Re: Solr -> Xapian ?
Because fts_squat is set to be deleted.

Xapian and similar libraries offer a very easy interface for FTS (and basically, I have done it already).

On 2019-01-07 18:31, Michael Slusarz wrote:
> Maybe a dumb question (I admit I haven't followed this thread very
> closely)... But why are you writing a new FTS driver? If squat allegedly
> does everything you need it to do, why don't you just take that plugin
> and fix it up to do what you need? That seems way easier than trying to
> create a FTS driver from scratch.
>
> michael
>
> On January 7, 2019 at 7:05 AM Joan Moreau via dovecot wrote:
>> Hi
>>
>> Anyone to answer specifically?
>>
>> Q1: get_last_uid -> Is this the last UID indexed (which may not be the
>> greatest value), or the greatest value (which may not be the latest)?
>> (The code of existing plugins is unclear about this; Solr looks for the
>> greatest, for instance.)
>>
>> Q2: When indexing an email, the data is not passed by "build_key". Why
>> so? What is the link with "build_more"?
>>
>> Q3: Searching/Lookup: the header in which to look (must be at least one
>> of "cc, to, from, subject, body") does not appear in the 'struct' data.
>> Where to find it?
>>
>> Q4: Refresh: this is very unclear. How come there would not be the
>> "latest" view on the index? What is the real meaning of this function?
>>
>> Q5: Rescan: is it just about removing all indexes for a specific
>> mailbox?
>>
>> Q6: lookup_multi: isn't the function the same for all plugins (see
>> below)?
Re: Solr -> Xapian ?
I found the solution to this using

seq_range_array_add(&result->definite_uids, uid);

Now, I can see in the logs that several times, Dovecot calls fts_backend_xapian_update_set_mailbox with box == NULL. Why so?

Thank you

On 2019-01-12 21:40, Joan Moreau via dovecot wrote:
> I somehow fixed the folder issue (it seems to have been some Unix
> permissions left over after too many tests).
>
> Getting back to the "fts_result" structure, I am trying:
>
> i_array_init(&(result->definite_uids), r->size);
> i_array_init(&(result->maybe_uids), 0);
> uint32_t uid;
> for(i=0;i<r->size;i++)
> {
>   try
>   {
>     uid=atol(backend->dbr->get_document(r->data[i]).get_value(1).c_str());
>     i_warning("Rresult UID=%d",uid);
>     array_idx_set(&(result->definite_uids),i,&uid);
>   }
>   catch(Xapian::Error e)
>   {
>     i_warning(e.get_msg().c_str());
>   }
> }
>
> I can see in the log that the UIDs are properly found in the Xapian
> database, but no results are transmitted to Dovecot and to the IMAP
> client (Roundcube in my case).
>
> Help please :)
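Some background on why seq_range_array_add(&result->definite_uids, uid), the call reported as working at the top of this message, behaves differently from writing a uint32_t into slot i: definite_uids is declared as ARRAY_TYPE(seq_range), so its elements are ranges of UIDs rather than single UIDs. The sketch below is a toy re-implementation of that idea in self-contained C. It is not Dovecot's code; `toy_range`, `toy_range_array` and `toy_range_add` are made-up names, and the fixed-size storage is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Inclusive range of UIDs; a toy model of a "seq_range" element. */
struct toy_range {
	uint32_t seq1, seq2;
};

#define TOY_MAX_RANGES 128

struct toy_range_array {
	struct toy_range r[TOY_MAX_RANGES];
	size_t count;
};

/* Add one UID, keeping the ranges sorted, non-overlapping and merged.
   Returns 1 if the UID was already present, 0 if it was added. */
static int toy_range_add(struct toy_range_array *a, uint32_t uid)
{
	size_t i = 0;
	int touches_prev, touches_next;

	/* Skip every range that ends before uid. */
	while (i < a->count && a->r[i].seq2 < uid)
		i++;
	if (i < a->count && a->r[i].seq1 <= uid)
		return 1;

	/* uid now lies strictly between r[i-1] and r[i]. */
	touches_prev = i > 0 && a->r[i - 1].seq2 + 1 == uid;
	touches_next = i < a->count && uid + 1 == a->r[i].seq1;

	if (touches_prev && touches_next) {
		/* uid closes the gap: merge the two neighbours. */
		a->r[i - 1].seq2 = a->r[i].seq2;
		memmove(&a->r[i], &a->r[i + 1],
			(a->count - i - 1) * sizeof(a->r[0]));
		a->count--;
	} else if (touches_prev) {
		a->r[i - 1].seq2 = uid;
	} else if (touches_next) {
		a->r[i].seq1 = uid;
	} else {
		/* Isolated uid: insert a new one-element range at position i. */
		assert(a->count < TOY_MAX_RANGES);
		memmove(&a->r[i + 1], &a->r[i],
			(a->count - i) * sizeof(a->r[0]));
		a->r[i].seq1 = a->r[i].seq2 = uid;
		a->count++;
	}
	return 0;
}
```

Seen this way, the earlier attempt with array_idx_set() and a pointer to a bare uint32_t probably failed because it stored something that is not a range element, although the thread itself does not confirm the exact cause.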
Re: Solr -> Xapian ?
I somehow fixed the folder issue (it seems to have been some Unix permissions left over after too many tests).

Getting back to the "fts_result" structure, I am trying:

i_array_init(&(result->definite_uids), r->size);
i_array_init(&(result->maybe_uids), 0);
uint32_t uid;
for(i=0;i<r->size;i++)
{
  try
  {
    uid=atol(backend->dbr->get_document(r->data[i]).get_value(1).c_str());
    i_warning("Rresult UID=%d",uid);
    array_idx_set(&(result->definite_uids),i,&uid);
  }
  catch(Xapian::Error e)
  {
    i_warning(e.get_msg().c_str());
  }
}

I can see in the log that the UIDs are properly found in the Xapian database, but no results are transmitted to Dovecot and to the IMAP client (Roundcube in my case).

Help please :)

On 2019-01-12 18:15, Joan Moreau wrote:
> Additionally, my logic is that the backend stores one database per
> mailbox in /xapian-indexes (in the "root" dir of the user); the name of
> the database is the GUID of the mailbox.
>
> For INBOX, that works perfectly: the database is properly created and
> the backend starts indexing all emails.
>
> For other folders, somehow, the process cannot access that (root)
> folder.
>
> Am I missing something?
Re: Solr -> Xapian ?
Additionally, my logic is that the backend stores one database per mailbox in /xapian-indexes (in the "root" dir of the user); the name of the database is the GUID of the mailbox.

For INBOX, that works perfectly: the database is properly created and the backend starts indexing all emails.

For other folders, somehow, the process cannot access that (root) folder.

Am I missing something?

On 2019-01-12 17:37, Joan Moreau wrote:
> Thank you
>
> Now, for the results, I see the member of fts_result is:
>
> ARRAY_TYPE(seq_range) definite_uids;
>
> I have the UIDs as an array (uint32_t *).
>
> How to put my UIDs into this "definite_uids"? Obviously this is not a
> simple array/pointer. How to say something similar to
> result->definite_uids[1]=my_uid ?
Re: Solr -> Xapian ?
Thank you

Now, for the results, I see the member of fts_result is:

ARRAY_TYPE(seq_range) definite_uids;

I have the UIDs as an array (uint32_t *).

How to put my UIDs into this "definite_uids"? Obviously this is not a simple array/pointer. How to say something similar to result->definite_uids[1]=my_uid ?
Re: Solr -> Xapian ?
On 11 Jan 2019, at 21.23, Joan Moreau via dovecot wrote:
>
> The below patch resolves the compilation error
>
> $ diff -p compat.h compat.h.joan
> *** compat.h 2019-01-11 20:21:00.726625427 +0100
> --- compat.h.joan 2019-01-11 20:14:41.729109919 +0100
> *** struct iovec;
> *** 202,207
> --- 202,211
> ssize_t i_my_writev(int fd, const struct iovec *iov, int iov_len);
> #endif
>
> + #ifdef __cplusplus
> + extern "C" {
> + #endif
>

You should put this extern "C" into the C++ file you're creating. See for example how fts-lucene/lucene-wrapper.cc does this.

> 1 - What does "subargs" represent in mail_search_args?

It's set only for SEARCH_OR and SEARCH_SUB. So for example:

SEARCH TEXT foo TEXT bar TEXT baz

results in:

type=SEARCH_SUB
value.subargs = (
  { type=SEARCH, value.str="foo" },
  { type=SEARCH, value.str="bar" },
  { type=SEARCH, value.str="baz" },
)

Or similarly if there's SEARCH OR foo OR TEXT bar TEXT baz or some other combination of OR/ANDs.

> 2 - For rescan: who is responsible for passing the emails again? Is
> the Dovecot core sending all the emails to index again? Or shall the
> fts somehow access the mailbox and read all emails? Wouldn't just
> saying "delete all index and get_last_uid is now 0" be the easy way? Or
> must the fts process all emails (and block the current thread, as a
> mailbox may be quite large)?

The next indexing run is responsible for it. If you return get_last_uid=0, then indexer starts feeding you all mails. So fts backend doesn't have to know about it.

> 3 - For get_last_uid: this is very unclear. "If there is a gap, then
> indexer first indexes all the missing" -> this means that at a certain
> point the indexer may be rebuilding a previous email, so the *last* UID
> is something different from the max. And how does the indexer know
> whether there is a gap without calling the fts backend (which it does
> not, as there is no function for that)?
I mean if get_last_uid() returns for example 100, it means that UIDs 1..100 have been indexed by the FTS backend. It's possible that at this point there are already mails with UIDs 101..200 in the folder. So when UID=201 is delivered, indexer notices that FTS backend has only UIDs 1..100 indexed so far, and starts feeding it UIDs 101..201 in that order.

You can implement get_last_uid() simply by keeping track of it in dovecot.index* files, similar to how Lucene and Solr already do it with fts_index_get_header() / fts_index_set_header(). They also have a fallback that if the index doesn't have the last_uid value, they do a slower search from the Lucene/Solr index to find the last UID.
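The get_last_uid() contract described above can be modelled in a few lines of self-contained C. This is only a sketch of the idea, not the real indexer: `toy_index`, `toy_get_last_uid`, `toy_index_mail` and `toy_catch_up` are hypothetical names, and the mailbox is reduced to a sorted array of UIDs. It shows that the backend only has to remember one number, and that returning 0 makes the next run feed every mail again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a backend's per-folder state. */
struct toy_index {
	uint32_t last_uid;   /* everything up to and including this UID is indexed */
};

static uint32_t toy_get_last_uid(const struct toy_index *idx)
{
	return idx->last_uid;
}

/* Index one mail. UIDs are expected in ascending order, never out of order. */
static void toy_index_mail(struct toy_index *idx, uint32_t uid)
{
	assert(uid > idx->last_uid);
	/* ... a real backend would add the document to its database here ... */
	idx->last_uid = uid;
}

/* What the indexer conceptually does on each run: feed every mailbox UID
   above last_uid, in order. mailbox_uids[] must be sorted ascending.
   Returns how many mails were fed to the backend. */
static size_t toy_catch_up(struct toy_index *idx,
			   const uint32_t *mailbox_uids, size_t count)
{
	uint32_t start = toy_get_last_uid(idx);
	size_t fed = 0;

	for (size_t i = 0; i < count; i++) {
		if (mailbox_uids[i] <= start)
			continue;
		toy_index_mail(idx, mailbox_uids[i]);
		fed++;
	}
	return fed;
}
```

Under this model, a minimal rescan amounts to resetting last_uid to 0 (after dropping the stored documents) and letting the next catch-up run refill the index.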
Re: Solr -> Xapian ?
On 04/01/19 at 03:20, Joan Moreau via dovecot wrote:
> What about considering linking Dovecot with the Xapian libraries instead
> of going to nightmare Solr?
> https://xapian.org/features

Given that notmuch already does a good job at indexing email (although it only supports maildirs, afaik), wouldn't it be simpler to write a plugin for running notmuch searches from Dovecot?

https://notmuchmail.org/
Re: Solr -> Xapian ?
count, off_t offset); ^~
../../../src/lib/compat.h:208:20: error: conflicting declaration of 'ssize_t i_my_pwrite(int, const void*, size_t, __off_t)' with 'C' linkage
# define pwrite i_my_pwrite

Any help welcome

> Hi,
>
> I figured out the "namespace" issue.
>
> Remaining questions are:
>
> 1 - What does "subargs" represent in mail_search_args?
>
> 2 - For rescan: who is responsible for passing the emails again? Is
> the Dovecot core sending all the emails to index again? Or shall the
> fts somehow access the mailbox and read all emails? Wouldn't just
> saying "delete all index and get_last_uid is now 0" be the easy way? Or
> must the fts process all emails (and block the current thread, as a
> mailbox may be quite large)?
>
> 3 - For get_last_uid: this is very unclear. "If there is a gap, then
> indexer first indexes all the missing" -> this means that at a certain
> point the indexer may be rebuilding a previous email, so the *last* UID
> is something different from the max. And how does the indexer know
> whether there is a gap without calling the fts backend (which it does
> not, as there is no function for that)?
>
> 4 - How to update configure.ac & additional files to add
> "--with-xapian", which will test for libxapian presence and add it to
> the build?
>
> Thank you
Re: Solr -> Xapian ?
There is no point in a separate plugin; the purpose is to replace squat as the default FTS (Solr being a nightmare).

On 2019-01-11 18:23, Aki Tuomi wrote:
> I would recommend making this a standalone plugin for now instead of
> trying to keep it in core fts.
>
> Aki
Re: Solr -> Xapian ?
I would recommend making this a standalone plugin for now instead of trying to keep it in core fts.

Aki

On 11 January 2019 at 18:40 Joan Moreau via dovecot <dovecot@dovecot.org> wrote:

I managed to deal with the namespace issue (updated Makefile.am). However, I now get:

../../../src/lib/compat.h:207:19: error: conflicting declaration of 'ssize_t i_my_pread(int, void*, size_t, __off_t)' with 'C' linkage
 # define pread i_my_pread
 ^~
../../../src/lib/compat.h:210:9: note: previous declaration with 'C++' linkage
 ssize_t i_my_pread(int fd, void *buf, size_t count, off_t offset);
 ^~
../../../src/lib/compat.h:208:20: error: conflicting declaration of 'ssize_t i_my_pwrite(int, const void*, size_t, __off_t)' with 'C' linkage
 # define pwrite i_my_pwrite

Any help welcome
Re: Solr -> Xapian ?
I managed to deal with the namespace issue (updated Makefile.am). However, I now get:

../../../src/lib/compat.h:207:19: error: conflicting declaration of 'ssize_t i_my_pread(int, void*, size_t, __off_t)' with 'C' linkage
 # define pread i_my_pread
 ^~
../../../src/lib/compat.h:210:9: note: previous declaration with 'C++' linkage
 ssize_t i_my_pread(int fd, void *buf, size_t count, off_t offset);
 ^~
../../../src/lib/compat.h:208:20: error: conflicting declaration of 'ssize_t i_my_pwrite(int, const void*, size_t, __off_t)' with 'C' linkage
 # define pwrite i_my_pwrite

Any help welcome

Hi,

I figured out the "namespace" issue. Remaining questions are:

1 - What does "subargs" represent in mail_search_args?

2 - For rescan: who is responsible for passing the email again? Does the Dovecot core send all the emails to be indexed again, or should the FTS backend somehow access the mailbox and read all emails itself? Wouldn't just saying "delete the whole index and get_last_uid is now 0" be the easy way? Or must the FTS backend process all emails (and block the current thread, as a mailbox may be quite large)?

3 - For get_last_uid: this is still very unclear. "If there is a gap, then indexer first indexes all the missing" -> this means that at a certain point the indexer may be rebuilding an earlier email, so the *last* UID is something different from the max. And how does the indexer know whether there is a gap without calling the FTS backend (which it does not, as there is no function for that)?

4 - How do I update configure.ac and the additional files to add a "--with-xapian" option which will test for the presence of libxapian and add it to the build?

Thank you
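The "which field" and "subargs" questions in this thread can be illustrated with a small query builder. The sketch below does not use the real Dovecot headers: `struct toy_search_arg` is a hypothetical, cut-down stand-in that only borrows the field names quoted in the thread (`hdr_field_name`, `value.str`), and it assumes that `subargs` holds the nested arguments of a grouped query such as OR. The "field:value" output syntax is also made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct mail_search_arg (not the real layout). */
enum toy_arg_type { TOY_SEARCH_HEADER, TOY_SEARCH_BODY, TOY_SEARCH_OR };

struct toy_search_arg {
	struct toy_search_arg *next;     /* next argument at the same level */
	enum toy_arg_type type;
	const char *hdr_field_name;      /* e.g. "From", only for HEADER */
	union {
		const char *str;                 /* text to search for */
		struct toy_search_arg *subargs;  /* assumed: nested args of an OR */
	} value;
};

/* Append s to the NUL-terminated buffer out; -1 if it would overflow. */
static int toy_append(char *out, size_t cap, const char *s)
{
	size_t used = strlen(out), add = strlen(s);

	if (used + add >= cap)
		return -1;
	memcpy(out + used, s, add + 1);
	return 0;
}

/* Walk the argument list and render it as text, recursing into subargs. */
int toy_build_query(const struct toy_search_arg *arg, const char *joiner,
		    char *out, size_t cap)
{
	int first = 1;

	for (; arg != NULL; arg = arg->next) {
		if (!first && toy_append(out, cap, joiner) < 0)
			return -1;
		first = 0;
		switch (arg->type) {
		case TOY_SEARCH_HEADER:
			if (toy_append(out, cap, arg->hdr_field_name) < 0 ||
			    toy_append(out, cap, ":") < 0 ||
			    toy_append(out, cap, arg->value.str) < 0)
				return -1;
			break;
		case TOY_SEARCH_BODY:
			if (toy_append(out, cap, "body:") < 0 ||
			    toy_append(out, cap, arg->value.str) < 0)
				return -1;
			break;
		case TOY_SEARCH_OR:
			if (toy_append(out, cap, "(") < 0 ||
			    toy_build_query(arg->value.subargs, " OR ", out, cap) < 0 ||
			    toy_append(out, cap, ")") < 0)
				return -1;
			break;
		}
	}
	return 0;
}
```

The caller passes an empty string as `out` and the joiner for the top level (for example " AND "). A real backend would translate each argument into its own query objects rather than text, and would have to handle many more argument types.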
Re: Solr -> Xapian ?
Also,

1 - What does "subargs" represent in mail_search_args?

2 - I wrote my first code, and the error I get when compiling within the Dovecot architecture is:

In file included from fts-xapian-plugin.c:4:
fts-xapian-plugin.h:6:1: error: unknown type name 'using'; did you mean 'uint'?
 using namespace std;

If I remove this, the Xapian library also complains about the "namespace" keyword:

In file included from /usr/include/xapian.h:47,
                 from fts-backend-xapian.c:11:
/usr/include/xapian/types.h:31:1: error: unknown type name 'namespace'; did you mean 'i_isspace'?
 namespace Xapian {

Can someone shed some light on this?

Thanks

On 2019-01-09 09:58, Joan Moreau via dovecot wrote:

Ok. Additional questions:

- For rescan: who is responsible for passing the email again? Does the Dovecot core send all the emails to be indexed again, or should the FTS backend somehow access the mailbox and read all emails itself? Wouldn't just saying "delete the whole index and get_last_uid is now 0" be the easy way? Or must the FTS backend process all emails (and block the current thread, as a mailbox may be quite large)?

- For get_last_uid: this is still very unclear. "If there is a gap, then indexer first indexes all the missing" -> this means that at a certain point the indexer may be rebuilding an earlier email, so the *last* UID is something different from the max. And how does the indexer know whether there is a gap without calling the FTS backend (which it does not, as there is no function for that)?
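The "unknown type name 'namespace'" error quoted in this thread is what a C compiler reports when it meets C++ syntax, and xapian.h is a C++ header. One common pattern, sketched here, is to keep every Xapian include in a C++ source file and give the C side only a small interface declared with C linkage. All names (`toy_index_*`) are hypothetical, and the stand-in implementation at the bottom exists only so that the sketch compiles and runs as plain C; in a real plugin that part would live in a C++ file and call Xapian. This is a general pattern, not a confirmed fix for the specific build errors reported in the thread.

```c
#include <stdint.h>
#include <stdlib.h>

/* ---- Interface: can be included from both C and C++ sources ---- */
#ifdef __cplusplus
extern "C" {
#endif

/* Opaque handle: the C side never sees any C++ type. */
struct toy_index;

struct toy_index *toy_index_open(const char *path);
int toy_index_add(struct toy_index *idx, uint32_t uid, const char *text);
uint32_t toy_index_last_uid(const struct toy_index *idx);
void toy_index_close(struct toy_index *idx);

#ifdef __cplusplus
}
#endif

/* ---- Stand-in implementation (hypothetical, in-memory only) ---- */
struct toy_index {
	uint32_t last_uid; /* greatest UID added so far */
};

struct toy_index *toy_index_open(const char *path)
{
	(void)path; /* a real implementation would open the database here */
	return calloc(1, sizeof(struct toy_index));
}

int toy_index_add(struct toy_index *idx, uint32_t uid, const char *text)
{
	if (idx == NULL || text == NULL)
		return -1;
	if (uid > idx->last_uid)
		idx->last_uid = uid;
	return 0;
}

uint32_t toy_index_last_uid(const struct toy_index *idx)
{
	return idx->last_uid;
}

void toy_index_close(struct toy_index *idx)
{
	free(idx);
}
```

With this split, the files that include Dovecot's headers stay pure C, and only the implementation file needs a C++ compiler.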
Re: Solr -> Xapian ?
Ok. Additional questions:

- For rescan: who is responsible for passing the email again? Does the Dovecot core send all the emails to be indexed again, or should the FTS backend somehow access the mailbox and read all emails itself? Wouldn't just saying "delete the whole index and get_last_uid is now 0" be the easy way? Or must the FTS backend process all emails (and block the current thread, as a mailbox may be quite large)?

- For get_last_uid: this is still very unclear. "If there is a gap, then indexer first indexes all the missing" -> this means that at a certain point the indexer may be rebuilding an earlier email, so the *last* UID is something different from the max. And how does the indexer know whether there is a gap without calling the FTS backend (which it does not, as there is no function for that)?
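The rescan behaviour described in the thread (delete index entries for mails that no longer exist, then reindex everything from the first missing mail) implies a simple rule for what the backend's last UID should be afterwards. The sketch below is an illustration of that rule only, not Dovecot code: the function name and the use of plain sorted arrays are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Given the UIDs in the mailbox and the UIDs in the FTS index (both sorted
 * ascending), return the greatest UID such that every mailbox UID up to and
 * including it is indexed. Indexing would then resume with the next mailbox
 * UID. Index entries with no matching mailbox UID are skipped here; a rescan
 * would delete them. Returns 0 if the very first mail is missing.
 */
uint32_t toy_last_uid_after_rescan(const uint32_t *mailbox_uids, size_t n_mailbox,
				   const uint32_t *indexed_uids, size_t n_indexed)
{
	uint32_t last = 0;
	size_t i, j = 0;

	for (i = 0; i < n_mailbox; i++) {
		/* Skip index entries that no longer exist in the mailbox. */
		while (j < n_indexed && indexed_uids[j] < mailbox_uids[i])
			j++;
		if (j == n_indexed || indexed_uids[j] != mailbox_uids[i])
			return last; /* first gap found */
		last = mailbox_uids[i];
		j++;
	}
	return last;
}
```

For a mailbox holding UIDs 1, 2, 3, 5, 8 and an index holding 1, 2, 5, 8, the first gap is UID 3, so the function returns 2 and mails 3, 5 and 8 would be indexed again. This matches the remark in the thread that mails already in the index get reindexed after a gap.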
Re: Solr -> Xapian ?
On 7 Jan 2019, at 16.05, Joan Moreau via dovecot wrote:
>
> Hi
>
> Anyone to answer specifically?
>
> Q1: get_last_uid -> Is this the last UID indexed (which may not be the greatest value), or the greatest value (which may not be the latest)? (The code of existing plugins is unclear about this; Solr looks for the greatest, for instance.)

All the mails are always supposed to be indexed from the beginning to the last indexed mail. If there's a gap, the indexer first indexes all the missing mails. So the latest UID is supposed to be the greatest UID. (Supporting out-of-order indexing would be rather difficult to keep track of.)

> Q2: When indexing an email, the data is not passed by "build_key". Why so? What is the link with "build_more"?

The idea is that it calls something like:

- build_key(type=hdr, hdr_name=From)
- build_more("t...@iki.fi")
- build_key(type=hdr, hdr_name=Subject)
- build_more("Re: Solr -> Xapian ?")
- build_key(type=body_part)
- build_more("message body piece")
- build_more("message body piece2")
...

> Q3: Searching/Lookup: the header in which to look (it must be at least one of "cc, to, from, subject, body") does not appear in the 'struct' data. Where to find it?

lookup() gets struct mail_search_arg *args, which contains the entire IMAP SEARCH query. This could be used for more or less complex query builders. In case of a single header search, you should have args->args->hdr_field_name contain the header name and args->args->value.str contain the content you're searching for.

> Q4: Refresh: this is very unclear. How could there not be the "latest" view of the index? What is the real meaning of this function?

In case of Xapian it might not matter if it automatically refreshes its indexes between each query. But with some other indexes this could happen:

- IMAP session is opened
- IMAP SEARCH is run, which opens and searches the index
- a new mail is delivered to the mailbox and indexed
- IMAP SEARCH is run. Without refresh() it doesn't see the newly indexed mail and doesn't include it in the search results.

> Q5: Rescan: is it just about removing all indexes for a specific mailbox?

It's run when "doveadm fts rescan" is run manually. Usually that's only run manually to fix up some brokenness. So it's intended to verify that the current mailbox contents match the FTS indexes:

- If there are any mails in FTS index that no longer exist in the actual mailbox, delete those mails from FTS
- If FTS is missing any mails in the middle of the mailbox, make sure that the next mailbox indexing will index those missing mails. I think currently this basically means reindexing all the mails since the first missing mail, even the mails that are already in the index.

fts-lucene implements this, but other FTS backends are lazy and simply rebuild all mails. Actually fts-solr is bad because it doesn't even delete the extra mails.

> Q6: lookup_multi: isn't the function the same for all plugins (see below)?
>> and finally, for fts_backend__lookup_multi, why is that backend dependent?

This function is called only when searching in virtual folders. So for example the virtual "All mails" folder, which would contain all mails in all folders. In that case the boxes[] would contain a list of all the user's folders, except Trash and Spam. If lookup_multi() isn't implemented (left to NULL), the search is run separately via lookup() for each folder. With lookup_multi() there can be just one lookup, and the backend can filter only the wanted folders and return them directly. So it's an optimization for FTS indexes that support user-global searches rather than only per-folder searches.

>> static int fts_backend_xapian_lookup_multi(struct fts_backend *_backend,
>>         struct mailbox *const boxes[], struct mail_search_arg *args,
>>         enum fts_lookup_flags flags, struct fts_multi_result *result)
>> {
>>     int i = 0;
>>
>>     while (boxes[i] != NULL) {
>>         if (fts_backend_xapian_lookup(_backend, boxes[i], args, flags,
>>                                       &result->box_results[i]) < 0)
>>             return -1;
>>         i++;
>>     }
>>     return 0;
>> }

See fts_backend_lookup_multi() - if you leave lookup_multi=NULL it basically does this.

>> For "rescan" and "optimize", wouldn't it be the Dovecot core that indicates which mails are to be dismissed (expunged), or asks again for indexing of a particular (or all) UIDs? Why would the backend be aware of the transactions on the mailbox?

rescan() is
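The "leave lookup_multi=NULL" behaviour described above is an optional-callback pattern: the caller uses the backend's multi-folder lookup when one is provided and otherwise loops over the per-folder lookup. The sketch below shows that dispatch with hypothetical, simplified types (`toy_backend`, `toy_box`, `toy_result`); it is not the actual fts_backend_lookup_multi() implementation.

```c
#include <stddef.h>
#include <string.h>

struct toy_box { const char *name; };
struct toy_result { int hits; };

struct toy_backend {
	/* Mandatory: search one folder. */
	int (*lookup)(struct toy_backend *backend, const struct toy_box *box,
		      const char *term, struct toy_result *result);
	/* Optional: search many folders at once; NULL if unsupported. */
	int (*lookup_multi)(struct toy_backend *backend,
			    const struct toy_box *const boxes[],
			    const char *term, struct toy_result results[]);
};

/* boxes[] is NULL-terminated; results[] must have one slot per folder. */
int toy_lookup_multi(struct toy_backend *backend,
		     const struct toy_box *const boxes[],
		     const char *term, struct toy_result results[])
{
	size_t i;

	if (backend->lookup_multi != NULL)
		return backend->lookup_multi(backend, boxes, term, results);

	/* Fallback: one lookup per folder. */
	for (i = 0; boxes[i] != NULL; i++) {
		if (backend->lookup(backend, boxes[i], term, &results[i]) < 0)
			return -1;
	}
	return 0;
}

/* Example per-folder lookup: pretends each folder has strlen(name) hits. */
int toy_count_lookup(struct toy_backend *backend, const struct toy_box *box,
		     const char *term, struct toy_result *result)
{
	(void)backend;
	(void)term;
	result->hits = (int)strlen(box->name);
	return 0;
}
```

A backend whose index spans all of a user's folders would set `lookup_multi` and answer with a single query; a per-folder backend can leave it NULL and rely on the loop.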
Re: Solr -> Xapian ?
Maybe a dumb question (I admit I haven't followed this thread very closely)...

But why are you writing a new FTS driver? If squat allegedly does everything you need it to do, why don't you just take that plugin and fix it up to do what you need? That seems way easier than trying to create an FTS driver from scratch.

michael

> On January 7, 2019 at 7:05 AM Joan Moreau via dovecot wrote:
>
> Hi
>
> Anyone to answer specifically?
>
> Q1: get_last_uid -> Is this the last UID indexed (which may not be the greatest value), or the greatest value (which may not be the latest)? (The code of existing plugins is unclear about this; Solr looks for the greatest, for instance.)
>
> Q2: When indexing an email, the data is not passed by "build_key". Why so? What is the link with "build_more"?
>
> Q3: Searching/Lookup: the header in which to look (it must be at least one of "cc, to, from, subject, body") does not appear in the 'struct' data. Where to find it?
>
> Q4: Refresh: this is very unclear. How could there not be the "latest" view of the index? What is the real meaning of this function?
>
> Q5: Rescan: is it just about removing all indexes for a specific mailbox?
>
> Q6: lookup_multi: isn't the function the same for all plugins (see below)?
>
Re: Solr -> Xapian ?
Hi

Anyone to answer specifically?

Q1: get_last_uid -> Is this the last UID indexed (which may not be the greatest value), or the greatest value (which may not be the latest)? (The code of existing plugins is unclear about this; Solr looks for the greatest, for instance.)

Q2: When indexing an email, the data is not passed by "build_key". Why so? What is the link with "build_more"?

Q3: Searching/Lookup: the header in which to look (it must be at least one of "cc, to, from, subject, body") does not appear in the 'struct' data. Where to find it?

Q4: Refresh: this is very unclear. How could there not be the "latest" view of the index? What is the real meaning of this function?

Q5: Rescan: is it just about removing all indexes for a specific mailbox?

Q6: lookup_multi: isn't the function the same for all plugins (see below)?

Thank you

On 2019-01-06 16:50, Joan Moreau via dovecot wrote:

and finally, for fts_backend__lookup_multi, why is that backend dependent? Wouldn't the function below be the same for any backend?

Waiting for your feedback on all those questions.

Thank you

JM

-

static int fts_backend_xapian_lookup_multi(struct fts_backend *_backend,
        struct mailbox *const boxes[], struct mail_search_arg *args,
        enum fts_lookup_flags flags, struct fts_multi_result *result)
{
    int i = 0;

    while (boxes[i] != NULL) {
        if (fts_backend_xapian_lookup(_backend, boxes[i], args, flags,
                                      &result->box_results[i]) < 0)
            return -1;
        i++;
    }
    return 0;
}
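Q2 in this message asks how build_key and build_more relate. Going by the calling sequence and the fts-api.h comments quoted in the thread, set_build_key selects what is being indexed (a header or a body part) and build_more then delivers the data for it in one or more chunks. The sketch below imitates that with hypothetical toy types; it is not the real fts_backend_update_* API, and the fixed-size buffers stand in for whatever a real backend would feed to its index.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum toy_key_type { TOY_KEY_HDR, TOY_KEY_BODY_PART };

struct toy_build_key {
	enum toy_key_type type;
	const char *hdr_name; /* only used for TOY_KEY_HDR */
};

/* Must be zero-initialized before use. */
struct toy_update_ctx {
	char from[128];
	char subject[256];
	char body[1024];
	char *cur;      /* buffer selected by the last set_build_key, or NULL */
	size_t cur_cap;
};

/* Select the target for following build_more calls.
   Returns false for keys this toy backend does not index. */
bool toy_set_build_key(struct toy_update_ctx *ctx,
		       const struct toy_build_key *key)
{
	ctx->cur = NULL;
	ctx->cur_cap = 0;
	if (key->type == TOY_KEY_BODY_PART) {
		ctx->cur = ctx->body;
		ctx->cur_cap = sizeof(ctx->body);
	} else if (strcmp(key->hdr_name, "From") == 0) {
		ctx->cur = ctx->from;
		ctx->cur_cap = sizeof(ctx->from);
	} else if (strcmp(key->hdr_name, "Subject") == 0) {
		ctx->cur = ctx->subject;
		ctx->cur_cap = sizeof(ctx->subject);
	}
	return ctx->cur != NULL;
}

void toy_unset_build_key(struct toy_update_ctx *ctx)
{
	ctx->cur = NULL;
	ctx->cur_cap = 0;
}

/* Append one chunk (size bytes, not NUL-terminated) to the current key.
   Returns 0 if ok, -1 if no key is selected or the buffer is full. */
int toy_build_more(struct toy_update_ctx *ctx,
		   const unsigned char *data, size_t size)
{
	size_t used;

	if (ctx->cur == NULL)
		return -1;
	used = strlen(ctx->cur);
	if (size >= ctx->cur_cap - used)
		return -1;
	memcpy(ctx->cur + used, data, size);
	ctx->cur[used + size] = '\0';
	return 0;
}
```

Splitting the key from the data lets the caller stream a large body in small blocks without the backend having to hold the whole mail, and lets the backend decline a key up front so the caller can skip it.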
Re: Solr -> Xapian ?
And finally, for fts_backend__lookup_multi: why is that backend-dependent? Wouldn't the function below be the same for any backend?

Waiting for your feedback on all those questions.

Thank you
JM

    static int fts_backend_xapian_lookup_multi(struct fts_backend *_backend,
                                               struct mailbox *const boxes[],
                                               struct mail_search_arg *args,
                                               enum fts_lookup_flags flags,
                                               struct fts_multi_result *result)
    {
        int i = 0;

        /* boxes[] is NULL-terminated; look up each mailbox in turn.
           Assumes result->box_results already holds one entry per mailbox. */
        while (boxes[i] != NULL) {
            if (fts_backend_xapian_lookup(_backend, boxes[i], args, flags,
                                          &result->box_results[i]) < 0)
                return -1;
            i++;
        }
        return 0;
    }

On 2019-01-06 16:31, Joan Moreau via dovecot wrote:
For fts_backend_xxx_lookup, where is it specified which field (to, cc, subject, body, from, all) to look up?
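On the lookup_multi question, the generic "loop over the mailboxes" approach can be tried out in isolation. The sketch below is not Dovecot code: `struct mailbox`, `struct fts_result` and `struct fts_multi_result` are reduced, hypothetical stand-ins, and `demo_lookup` / `demo_lookup_multi` are made-up names. It only shows the shape of the loop: a NULL-terminated array of mailboxes, one result slot per mailbox, and an early return on the first failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for Dovecot's types, reduced to what the loop needs. */
struct mailbox { const char *name; };
struct fts_result { const struct mailbox *box; int hits; };
struct fts_multi_result { struct fts_result *box_results; };

/* Stand-in for a backend's single-mailbox lookup: 0 on success, -1 on error. */
static int demo_lookup(const struct mailbox *box, const char *term,
		       struct fts_result *result)
{
	if (box == NULL || term == NULL)
		return -1;
	result->box = box;
	result->hits = 0; /* a real backend would record the matching UIDs here */
	return 0;
}

/* Generic multi-mailbox lookup: run the single lookup once per mailbox.
   The caller must provide one box_results entry per mailbox. */
static int demo_lookup_multi(struct mailbox *const boxes[], const char *term,
			     struct fts_multi_result *result)
{
	assert(result != NULL && result->box_results != NULL);
	for (size_t i = 0; boxes[i] != NULL; i++) {
		if (demo_lookup(boxes[i], term, &result->box_results[i]) < 0)
			return -1;
	}
	return 0;
}
```

A loop like this is correct for any backend. A plausible reason for the hook being per-backend is that an engine able to search several mailboxes in one query can avoid one round trip per mailbox; that is an assumption, not something stated in this thread.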
Re: Solr -> Xapian ?
For fts_backend_xxx_lookup, where is it specified which field (to, cc, subject, body, from, all) to look up?

On 2019-01-06 16:03, Joan Moreau wrote:
For "rescan" and "optimize", wouldn't it be the Dovecot core that indicates which messages are to be dismissed (expunged), or re-asks for indexing of a particular (or every) UID? Why would the backend be aware of the transactions on the mailbox? There is already "fts_backend_xxx_update_expunge", so I believe the management of the expunged messages is *NOT* in the backend, right?
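On where the searched field comes from: according to the reply elsewhere in this thread, lookup() receives the whole IMAP SEARCH query as a struct mail_search_arg, and for a single header search the header name is in hdr_field_name and the text in value.str. The sketch below mimics that idea with a hypothetical, much reduced structure (`struct demo_search_arg`, `enum demo_arg_type` and both functions are illustrative names, not the Dovecot definitions). It shows how a backend might map each argument to an index field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-ins for Dovecot's search-argument types. */
enum demo_arg_type { DEMO_ARG_HEADER, DEMO_ARG_BODY, DEMO_ARG_TEXT };

struct demo_search_arg {
	struct demo_search_arg *next; /* next argument in the query */
	enum demo_arg_type type;
	const char *hdr_field_name;   /* set only for header searches */
	const char *value_str;        /* the text being searched for */
};

/* Return the index field an argument targets, or NULL if unsupported. */
static const char *demo_arg_field(const struct demo_search_arg *arg)
{
	assert(arg != NULL);
	switch (arg->type) {
	case DEMO_ARG_HEADER:
		return arg->hdr_field_name;
	case DEMO_ARG_BODY:
		return "body";
	case DEMO_ARG_TEXT:
		return "all";
	}
	return NULL;
}

/* Count the arguments in a query that target the given field. */
static int demo_count_field(const struct demo_search_arg *args,
			    const char *field)
{
	int count = 0;

	for (; args != NULL; args = args->next) {
		const char *name = demo_arg_field(args);

		if (name != NULL && strcmp(name, field) == 0)
			count++;
	}
	return count;
}
```

The point of the design is that the field is not a separate parameter of lookup(): it travels with each search argument, so one query can mix header and body conditions.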
Re: Solr -> Xapian ?
For "rescan" and "optimize", wouldn't it be the Dovecot core that indicates which messages are to be dismissed (expunged), or re-asks for indexing of a particular (or every) UID? Why would the backend be aware of the transactions on the mailbox? There is already "fts_backend_xxx_update_expunge", so I believe the management of the expunged messages is *NOT* in the backend, right?

On 2019-01-06 15:41, Joan Moreau wrote:
Also, for fts_backend_solr_update_set_build_key: where is the data (of the hdr_name or the body)?
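On rescan: the fts-api.h comment quoted in this thread describes it as going through the entire index, making sure all mails are indexed, and deleting any extra mails. A minimal sketch of that reconciliation follows, assuming both UID lists are available as sorted arrays (the function names and the array-based interface are hypothetical, not part of Dovecot).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if uid occurs in the sorted array uids[0..count-1] (binary search). */
static bool demo_uid_present(const uint32_t *uids, size_t count, uint32_t uid)
{
	size_t lo = 0, hi = count;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (uids[mid] == uid)
			return true;
		if (uids[mid] < uid)
			lo = mid + 1;
		else
			hi = mid;
	}
	return false;
}

/* Reconcile the index with the mailbox.
   extra[]   receives indexed UIDs that no longer exist in the mailbox
             (to be deleted from the index); needs room for indexed_count.
   missing[] receives mailbox UIDs that are not indexed yet
             (to be indexed); needs room for mailbox_count. */
static void demo_rescan(const uint32_t *mailbox_uids, size_t mailbox_count,
			const uint32_t *indexed_uids, size_t indexed_count,
			uint32_t *extra, size_t *extra_count,
			uint32_t *missing, size_t *missing_count)
{
	assert(extra_count != NULL && missing_count != NULL);
	*extra_count = 0;
	*missing_count = 0;
	for (size_t i = 0; i < indexed_count; i++) {
		if (!demo_uid_present(mailbox_uids, mailbox_count,
				      indexed_uids[i]))
			extra[(*extra_count)++] = indexed_uids[i];
	}
	for (size_t i = 0; i < mailbox_count; i++) {
		if (!demo_uid_present(indexed_uids, indexed_count,
				      mailbox_uids[i]))
			missing[(*missing_count)++] = mailbox_uids[i];
	}
}
```

Only the backend can enumerate what its index actually contains, which may be why this check lives on the backend side even though expunges are normally pushed to it one at a time; that reading is an inference, not something confirmed in the thread.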
Re: Solr -> Xapian ?
Also, for fts_backend_solr_update_set_build_key: where is the data (of the hdr_name or the body)?

On 2019-01-06 14:10, Joan Moreau wrote:
For the "last uid": this is not the last added, but the maximum of the UIDs in the indexed emails, right?
Re: Solr -> Xapian ?
For the "last uid": this is not the last added, but the maximum of the UIDs in the indexed emails, right?

On 2019-01-06 11:53, Joan Moreau via dovecot wrote:
Thank you. I still don't get the "build_key" function. The email (body, headers, ... and the UID) is the one (and only) thing to index. What "key" is that function referring to? Or is the "key" here the actual email?
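On "last added" versus "maximum": the reply elsewhere in this thread says mails are always indexed in order, with any gap filled first, so the latest indexed UID and the greatest indexed UID are expected to coincide. Under that assumption a backend can simply report the maximum. The helper names below are hypothetical, and returning 0 for an empty index is an assumption of this sketch (IMAP UIDs are nonzero, so 0 is free to mean "nothing indexed").

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Greatest UID present in the index, or 0 if nothing is indexed yet. */
static uint32_t demo_last_uid(const uint32_t *indexed_uids, size_t count)
{
	uint32_t last = 0;

	for (size_t i = 0; i < count; i++) {
		if (indexed_uids[i] > last)
			last = indexed_uids[i];
	}
	return last;
}

/* First UID the indexer still has to feed to the backend. */
static uint32_t demo_next_uid_to_index(const uint32_t *indexed_uids,
				       size_t count)
{
	uint32_t last = demo_last_uid(indexed_uids, count);

	assert(last < UINT32_MAX);
	return last + 1;
}
```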
Re: Solr -> Xapian ?
Thank you.

I still don't get the "build_key" function. The email (body, headers, ... and the UID) is the one (and only) thing to index. What "key" is that function referring to? Or is the "key" here the actual email?

On 2019-01-06 08:43, Stephan Bosch wrote:
"set_build_key"

From src/plugins/fts/fts-api.h:

    /* Switch to building index for specified key. If backend doesn't want to
       index this key, it can return FALSE and caller will skip to next key. */
    bool fts_backend_update_set_build_key(struct fts_backend_update_context *ctx,
                                          const struct fts_backend_build_key *key);

Same file provides outline of what a build_key is.
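On what the "key" is: going by the call sequence described elsewhere in this thread (build_key for a header, then build_more with its text, then build_key for a body part, then several build_more calls), the key names the part of the mail being indexed and build_more delivers that part's data in chunks. The sketch below imitates that protocol with hypothetical types (`struct demo_build_key`, `struct demo_update_ctx`) and a toy policy that indexes only the Subject header and body parts; none of these names come from Dovecot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum demo_key_type { DEMO_KEY_HDR, DEMO_KEY_BODY_PART };

/* Hypothetical build key: which part of the mail comes next. */
struct demo_build_key {
	enum demo_key_type type;
	const char *hdr_name; /* only meaningful for DEMO_KEY_HDR */
};

struct demo_field { char data[512]; size_t len; };

struct demo_update_ctx {
	struct demo_field subject, body;
	struct demo_field *cur; /* field selected by set_build_key, or NULL */
};

/* Select the destination field; false means "skip this key". */
static bool demo_set_build_key(struct demo_update_ctx *ctx,
			       const struct demo_build_key *key)
{
	assert(ctx != NULL && key != NULL);
	if (key->type == DEMO_KEY_BODY_PART) {
		ctx->cur = &ctx->body;
	} else if (key->hdr_name != NULL &&
		   strcmp(key->hdr_name, "Subject") == 0) {
		ctx->cur = &ctx->subject;
	} else {
		ctx->cur = NULL;
		return false;
	}
	return true;
}

/* Append one chunk to the current field; -1 aborts the build. */
static int demo_build_more(struct demo_update_ctx *ctx,
			   const unsigned char *data, size_t size)
{
	struct demo_field *f = ctx->cur;

	if (f == NULL || size > sizeof(f->data) - f->len)
		return -1;
	memcpy(f->data + f->len, data, size);
	f->len += size;
	return 0;
}

/* End of the current key; a real backend could flush the field here. */
static void demo_unset_build_key(struct demo_update_ctx *ctx)
{
	ctx->cur = NULL;
}
```

Because the data may arrive in many small blocks and is not NUL-terminated, the sketch tracks lengths explicitly instead of using string functions on the chunks.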
Re: Solr -> Xapian ?
On 06/01/2019 at 01:00, Joan Moreau wrote:
Anyone willing to explain those functions? Most notably "get_last_uid"

From src/plugins/fts/fts-api.h:

    /* Get the last_uid for the mailbox. */
    int fts_backend_get_last_uid(struct fts_backend *backend,
                                 struct mailbox *box, uint32_t *last_uid_r);

The solr sources (src/plugins/fts-solr/fts-backend-solr.c:213) tell me this returns the last UID added to the index for the given mailbox and FTS index.

"set_build_key"

From src/plugins/fts/fts-api.h:

    /* Switch to building index for specified key. If backend doesn't want to
       index this key, it can return FALSE and caller will skip to next key. */
    bool fts_backend_update_set_build_key(struct fts_backend_update_context *ctx,
                                          const struct fts_backend_build_key *key);

Same file provides outline of what a build_key is.

"build_more",

    /* Add more content to the index for the currently specified build key.
       Non-BODY_PART_BINARY data must contain only full valid UTF-8 characters,
       but it doesn't need to be NUL-terminated. size contains the data size in
       bytes, not characters. This function may be called many times and the
       data block sizes may be small. Backend returns 0 if ok, -1 if build
       should be aborted. */
    int fts_backend_update_build_more(struct fts_backend_update_context *ctx,
                                      const unsigned char *data, size_t size);

You should look at the sources of a few backends like squat and solr to get a feel of what exactly this is doing.

what is refresh versus rescan?

From fts-api.h:

    /* Refresh index to make sure we see latest changes from lookups.
       Returns 0 if ok, -1 if error. */
    int fts_backend_refresh(struct fts_backend *backend);

    /* Go through the entire index and make sure all mails are indexed,
       and delete any extra mails in the index. */
    int fts_backend_rescan(struct fts_backend *backend);

Regards,
Stephan
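On refresh: the scenario described elsewhere in this thread is a long-lived IMAP session that opened the index, after which a new mail is delivered and indexed; without a refresh the next search may not see it. The toy model below illustrates that with a session that only looks at the mails present at its last refresh. All names (`struct demo_index`, `struct demo_session`, the `demo_*` functions) are hypothetical, and a real engine may refresh itself automatically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_MAILS 16

/* Hypothetical index shared by the indexer and by search sessions. */
struct demo_index {
	uint32_t uids[DEMO_MAX_MAILS];
	size_t count;
};

/* A search session sees only the mails that existed at its last refresh. */
struct demo_session {
	const struct demo_index *index;
	size_t visible;
};

/* Indexer side: record a newly indexed mail. */
static void demo_index_add(struct demo_index *index, uint32_t uid)
{
	assert(index->count < DEMO_MAX_MAILS);
	index->uids[index->count++] = uid;
}

static void demo_session_open(struct demo_session *s,
			      const struct demo_index *index)
{
	s->index = index;
	s->visible = index->count;
}

/* Analogue of refresh(): pick up everything indexed since the last call. */
static int demo_session_refresh(struct demo_session *s)
{
	s->visible = s->index->count;
	return 0;
}

/* Search side: only the first `visible` entries are considered. */
static bool demo_session_has_uid(const struct demo_session *s, uint32_t uid)
{
	for (size_t i = 0; i < s->visible; i++) {
		if (s->index->uids[i] == uid)
			return true;
	}
	return false;
}
```

Rescan, by contrast, is about repairing the index contents, not about which version of the index a session is looking at.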
Re: Solr -> Xapian ?
Anyone willing to explain those functions? Most notably "get_last_uid", "set_build_key" and "build_more"; and what is refresh versus rescan?

On January 5, 2019 14:23:10 Joan Moreau via dovecot wrote:
Thanks Stephan. I basically need to know the role/description of each of the functions of the fts_backend.
Re: Solr -> Xapian ?
Anyone willing to explain those functions?

On January 5, 2019 14:23:10 Joan Moreau via dovecot wrote:
Thanks Stephan. I basically need to know the role/description of each of the functions of the fts_backend.
Re: Solr -> Xapian ?
Thanks Stephan.

I basically need to know the role/description of each of the functions of the fts_backend:

    struct fts_backend fts_backend_xapian = {
        .name = "xapian",
        .flags = FTS_BACKEND_FLAG_NORMALIZE_INPUT, /* -> what other flags? */
        {
            fts_backend_xapian_alloc,
            fts_backend_xapian_init,
            fts_backend_xapian_deinit,
            fts_backend_xapian_get_last_uid,
            fts_backend_xapian_update_init,
            fts_backend_xapian_update_deinit,
            fts_backend_xapian_update_set_mailbox,
            fts_backend_xapian_update_expunge,
            fts_backend_xapian_update_set_build_key,
            fts_backend_xapian_update_unset_build_key,
            fts_backend_xapian_update_build_more,
            fts_backend_xapian_refresh,
            fts_backend_xapian_rescan,
            fts_backend_xapian_optimize,
            fts_backend_default_can_lookup,
            fts_backend_xapian_lookup,
            fts_backend_xapian_lookup_multi,
            fts_backend_xapian_lookup_done
        }
    };

Thank you

On 2019-01-05 08:49, Stephan Bosch wrote:
The Dovecot API documentation is not exhaustive everywhere, but the basics are documented. The remaining questions can be answered by looking at examples found in similar plugins or the relevant API sources.
Re: Solr -> Xapian ?
On 04/01/2019 at 11:17, Joan Moreau via dovecot wrote:
Why not, but please guide me about the core structure (mandatory functions, etc.) of a typical Dovecot FTS plugin

The Dovecot API documentation is not exhaustive everywhere, but the basics are documented. The remaining questions can be answered by looking at examples found in similar plugins or the relevant API sources.

I know of one FTS plugin not written by Dovecot developers: https://github.com/atkinsj/fts-elasticsearch

If you really wish to do something like this, just go ahead. It will not be a small effort though. As soon as you have concrete questions, we can help you (don't expect rapid responses though).

Regards,
Stephan.
Re: Solr -> Xapian ?
Also, a description of the "to be" functions of the backend:

    struct fts_backend fts_backend_xapian = {
        .name = "xapian",
        .flags = FTS_BACKEND_FLAG_NORMALIZE_INPUT, /* -> what other flags? */
        {
            fts_backend_xapian_alloc,
            fts_backend_xapian_init,
            fts_backend_xapian_deinit,
            fts_backend_xapian_get_last_uid,
            fts_backend_xapian_update_init,
            fts_backend_xapian_update_deinit,
            fts_backend_xapian_update_set_mailbox,
            fts_backend_xapian_update_expunge,
            fts_backend_xapian_update_set_build_key,
            fts_backend_xapian_update_unset_build_key,
            fts_backend_xapian_update_build_more,
            fts_backend_xapian_refresh,
            fts_backend_xapian_rescan,
            fts_backend_xapian_optimize,
            fts_backend_default_can_lookup,
            fts_backend_xapian_lookup,
            fts_backend_xapian_lookup_multi,
            fts_backend_xapian_lookup_done
        }
    };

On 2019-01-04 20:33, Joan Moreau via dovecot wrote:
Yes, but:
1 - Is there documentation of the main objects (fts_backend, mail_user, mailbox, etc.)?
2 - What are the mandatory functions?
3 - Search: supposedly, the FTS shall have several parameters: the keyword(s), the user and mailbox, and the fields (to, from, body, etc.) to be included in the search. What is the function called in the plugin?
4 - Indexing: what is the logic? Does the FTS core just ask "index me this email of this mailbox", or is it delegated to the plugin to sort out which emails it has indexed yet or not?
Thank you

On 2019-01-04 18:49, admin wrote:
A starting point would be to have a look at the current FTS plugins:
https://github.com/dovecot/core/tree/master/src/plugins/fts-solr
and
https://github.com/dovecot/core/tree/master/src/plugins/fts-squat
-M

On Friday, 04.01.2019, 18:17 +0800, Joan Moreau via dovecot wrote:
Why not, but please guide me about the core structure (mandatory functions, etc.) of a typical Dovecot FTS plugin

On 2019-01-04 17:20, Aki Tuomi wrote:
I hope you are aware that "linking with Xapian" requires somewhat more work than just -lxapian in linker?
If you or someone feels like writing fts_xapian, go for it. Aki On 04 January 2019 at 08:20 Joan Moreau via dovecot wrote: What about consedering linking Dovecot with Xapian librairies instead of going to nightmare Solr ? https://xapian.org/features On 2019-01-02 17:10, John Tulp wrote: On Wed, 2019-01-02 at 00:59 -0800, M. Balridge wrote: The main problem is : After some time of indexing from Dovecot, Dovecot returns errors (invalid SID, etc...) and Solr return "out of range indexes" errors I've been watching the progress of this thread with no small concern, mainly because I've been tasked with providing a server-side email search facility with a budget and manpower level that comes down to mainly *1*, i.e., me. I was expecting, given the strongly worded language about "just use lucene/SOLR" and "ignore squat", that I should invest time + effort into this JAVA nightmare that is SOLR. I started with squat and another word-indexor system that used out-of-band (not a dovecot plugin) software to provide rapid (sub-second) searches through tens-of-GB-scale mailboxes. Unlike what I was led to believe, the squat indexes worked surprisingly well, once you sorted out the odd resource size (ulimit-related) issues (vsz & friends) limitations. I did notice the "worst-case" search performance have worryingly high O(x) increases in time, but I'd not seen anything that was a dealbreaker. It goes without saying that various substring searches worked as expected, for the most part. My experiences with SOLR were similar to Messr. Moreau's: lots of startup errors with provided schemata files. Lots of JAVA nonsense issues. Lots of sensitivity to WHICH Java runtime, etc, etc. I finally fixated a specific JVM, version of SOLR, and dovecot to find the "best" working combination, only to find that the searches didn't work out as expected. I expected to be able to do date-ranging based searches. Didn't work. 
I expected to search CONTENTS of emails, and despite many days of tweaks, I couldn't get it to index even the basics like filenames/types of attachments, so I could exposed attachment-based searching to my users. So, without rancour or antipathy, I ask the entire list: has ANYONE gotten a Dovecot/solr-fts-plugin setup to work that provides as a BASELINE, all of the following functionality: 1) The ability to search for a string within any of the structured fields (from/subject) that returns correct results? 2) The ability to search for any string within the BODY of emails, including the MIME attachment boundaries? 3) The ability to do "ranging" searches for structures within emails that decompose to "dates" or other simple-numeric data? OPTIONALLY, and this is probably way outside of the scope of the above, despite the fact that it's listed as a "selling point" of SOLR versus other full text search engines: 4) The ability to do searches against any attachments that are able to be post-processed and hyper-indexed by SOLR+Tika? - SOLR seems
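Note on the backend definition posted in this message: it follows a common C pattern, a descriptor struct that carries a name, flags and a table of function pointers which the core calls. The sketch below shows that pattern in isolation. It does not use Dovecot's headers: every demo_/toy_ name, the flag value and the four-entry table are assumptions made for illustration, and the real struct fts_backend has more entries with different signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag, for illustration only (not a Dovecot constant). */
enum demo_backend_flags {
	DEMO_FLAG_NORMALIZE_INPUT = 0x01
};

struct demo_backend;

/* Table of operations a backend provides; the core only calls these. */
struct demo_backend_vfuncs {
	int (*init)(struct demo_backend *backend);
	void (*deinit)(struct demo_backend *backend);
	int (*get_last_uid)(struct demo_backend *backend, uint32_t *last_uid_r);
	int (*build_more)(struct demo_backend *backend,
			  const char *data, size_t size);
};

struct demo_backend {
	const char *name;
	enum demo_backend_flags flags;
	struct demo_backend_vfuncs v;

	/* Toy state standing in for a real index. */
	uint32_t last_uid;
	size_t bytes_indexed;
};

static int toy_init(struct demo_backend *backend)
{
	backend->last_uid = 0;
	backend->bytes_indexed = 0;
	return 0;
}

static void toy_deinit(struct demo_backend *backend)
{
	(void)backend; /* nothing to release in this toy backend */
}

static int toy_get_last_uid(struct demo_backend *backend, uint32_t *last_uid_r)
{
	*last_uid_r = backend->last_uid;
	return 0;
}

static int toy_build_more(struct demo_backend *backend,
			  const char *data, size_t size)
{
	(void)data; /* a real backend would tokenize and store the text */
	backend->bytes_indexed += size;
	return 0;
}

/* Same layout as the posted definition: designated initializers for the
   descriptive fields, then a positional initializer for the table. */
struct demo_backend demo_backend_toy = {
	.name = "toy",
	.flags = DEMO_FLAG_NORMALIZE_INPUT,

	{
		toy_init,
		toy_deinit,
		toy_get_last_uid,
		toy_build_more
	}
};

/* What a "core" caller might do: it never names the toy_* functions
   directly, it only goes through the table. */
size_t demo_index_text(struct demo_backend *backend, const char *text)
{
	size_t indexed;

	assert(backend != NULL && text != NULL);
	if (backend->v.init(backend) < 0)
		return 0;
	(void)backend->v.build_more(backend, text, strlen(text));
	indexed = backend->bytes_indexed;
	backend->v.deinit(backend);
	return indexed;
}
```

Calling demo_index_text(&demo_backend_toy, "message body piece") returns 18, the number of bytes handed to build_more. Because the table is filled positionally, the order of the entries has to match the order of the members in the table struct exactly; a swapped pair with compatible signatures would compile and then call the wrong function.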
Re: Solr -> Xapian ?
Yes but:

1 - Is there documentation of the main objects (fts_backend, mail_user, mailbox, etc.)?

2 - What are the mandatory functions?

3 - Search: supposedly, the FTS backend takes several parameters: the keyword(s), the user and mailbox, and the fields (to, from, body, etc.) to be included in the search. Which function is called in the plugin?

4 - Indexing: what is the logic? Does the FTS core just ask the plugin to "index this email of this mailbox", or is it delegated to the plugin to sort out which emails it has already indexed?

Thank you

On 2019-01-04 18:49, admin wrote:
> A starting point would be to have a look at the current FTS plugins:
>
> https://github.com/dovecot/core/tree/master/src/plugins/fts-solr
>
> and
>
> https://github.com/dovecot/core/tree/master/src/plugins/fts-squat
>
> -M
Re: Solr -> Xapian ?
A starting point would be to have a look at the current FTS plugins:

https://github.com/dovecot/core/tree/master/src/plugins/fts-solr

and

https://github.com/dovecot/core/tree/master/src/plugins/fts-squat

-M

On Friday, 04.01.2019 at 18:17 +0800, Joan Moreau via dovecot wrote:
> Why not, but please guide me about the core structure (mandatory
> functions, etc.) of a typical Dovecot FTS plugin
Re: Solr -> Xapian ?
Why not, but please guide me about the core structure (mandatory functions, etc.) of a typical Dovecot FTS plugin

On 2019-01-04 17:20, Aki Tuomi wrote:
> I hope you are aware that "linking with Xapian" requires somewhat more
> work than just -lxapian in linker? If you or someone feels like writing
> fts_xapian, go for it.
>
> Aki
Re: Solr -> Xapian ?
I hope you are aware that "linking with Xapian" requires somewhat more work than just -lxapian in linker? If you or someone feels like writing fts_xapian, go for it.

Aki

> On 04 January 2019 at 08:20 Joan Moreau via dovecot wrote:
>
> What about considering linking Dovecot with Xapian libraries instead of
> going to nightmare Solr ?
>
> https://xapian.org/features
Solr -> Xapian ?
What about considering linking Dovecot with Xapian libraries instead of going to nightmare Solr ?

https://xapian.org/features

On 2019-01-02 17:10, John Tulp wrote:
> On Wed, 2019-01-02 at 00:59 -0800, M. Balridge wrote:
> > The main problem is : After some time of indexing from Dovecot, Dovecot returns errors (invalid SID, etc...) and Solr returns "out of range indexes" errors
> >
> > I've been watching the progress of this thread with no small concern, mainly because I've been tasked with providing a server-side email search facility with a budget and manpower level that comes down to mainly *1*, i.e., me.
> >
> > I was expecting, given the strongly worded language about "just use lucene/SOLR" and "ignore squat", that I should invest time + effort into this JAVA nightmare that is SOLR.
> >
> > I started with squat and another word-indexer system that used out-of-band (not a dovecot plugin) software to provide rapid (sub-second) searches through tens-of-GB-scale mailboxes.
> >
> > Unlike what I was led to believe, the squat indexes worked surprisingly well, once you sorted out the odd resource-size (ulimit-related) limitations (vsz & friends). I did notice that the "worst-case" search performance has worryingly high O(x) increases in time, but I'd not seen anything that was a dealbreaker. It goes without saying that various substring searches worked as expected, for the most part.
> >
> > My experiences with SOLR were similar to Messr. Moreau's: lots of startup errors with provided schemata files. Lots of JAVA nonsense issues. Lots of sensitivity to WHICH Java runtime, etc, etc. I finally fixed on a specific JVM, version of SOLR, and dovecot to find the "best" working combination, only to find that the searches didn't work out as expected. I expected to be able to do date-ranging based searches. Didn't work. I expected to search CONTENTS of emails, and despite many days of tweaks, I couldn't get it to index even the basics like filenames/types of attachments, so I could expose attachment-based searching to my users.
> >
> > So, without rancour or antipathy, I ask the entire list: has ANYONE gotten a Dovecot/solr-fts-plugin setup to work that provides as a BASELINE, all of the following functionality:
> >
> > 1) The ability to search for a string within any of the structured fields (from/subject) that returns correct results?
> >
> > 2) The ability to search for any string within the BODY of emails, including the MIME attachment boundaries?
> >
> > 3) The ability to do "ranging" searches for structures within emails that decompose to "dates" or other simple-numeric data?
> >
> > OPTIONALLY, and this is probably way outside of the scope of the above, despite the fact that it's listed as a "selling point" of SOLR versus other full text search engines:
> >
> > 4) The ability to do searches against any attachments that are able to be post-processed and hyper-indexed by SOLR+Tika?
> >
> > -
> >
> > SOLR seems to have "brand cachet", so presumably it actually works (for somebody).
> >
> > Dovecot has not a little "brand cachet", and for me, I have innate faith and trust in Timo and his software. I am no stranger to the "costs" of "free" software, in that you sacrifice your own blood, sweat, and tears just to get these disparate pieces to work together.
> >
> > I *DO* respect that Timo has to keep the lights (and sauna) on in Finland. Maybe there's a super-secret (no advertised prices, "carrier-only" price list) with _Dovecot, Oy_ wherein the above ARE actually available for something less than 6.022 x 10^23 Euros per centi-second of licencing fees.
> >
> > But please, level with us faithful users. Does this morass of Java B.S. actually work, and if not, please just deprecate and remove this moribund software, and stop trying to bury the only FTS plugin many of us HAVE actually gotten to work. (Pretty please?)
> >
> > I respect that Messr. Moreau has made an earnest effort to get this JAVA B.S. to actually work, as I have.
> >
> > He persevered where I'd given up. He's vocal about it, and now I'm chiming in that this ornate collection of switchblades only cuts those who try to use them.
> >
> > Respectfully,
> > =M=
>
> Fascinating...
>
> SOLR says the following are powered by SOLR...
>
> https://wiki.apache.org/solr/PublicServers
>
> Perhaps if you could find out from that list which of them are using SOLR in conjunction with Dovecot...
>
> food for thought...