Re: [Dovecot] Solr Plugin make only INBOX able to search
On Mon, Sep 14, 2009 at 2:54 AM, Timo Sirainen wrote:
> BTW. Note that Solr schema file was changed a bit to use less disk
> space. But that shouldn't have caused this kind of a bug.

The problem is caused by my custom schema. My changes are breaking the search somehow. Thank you, Timo.
Re: [Dovecot] Solr Plugin make only INBOX able to search
On Mon, Sep 14, 2009 at 2:54 AM, Timo Sirainen wrote:
> Hmm. Seems to work here:
>
> x create "Test Folder"
> x OK Create completed.
> x copy 1:* "Test Folder"
> x OK [COPYUID 1251759136 20163:20174 1:12] Copy completed.
> x search text multiple
> * SEARCH 1
> x OK Search completed (0.000 secs).
>
> Are the messages added to the database? Can you find them with Solr's
> own query tool?

No.

> What Solr version are you using? I'm testing with
> v1.3.0.

I am using 1.4-dev. Maybe it was not a smart choice.

> BTW. Note that Solr schema file was changed a bit to use less disk
> space. But that shouldn't have caused this kind of a bug.

OK, I will test Solr v1.3.0 and the new schema. I will report back soon. Thank you!
Re: [Dovecot] FTS Plugin design
Hi again!

After using my changes to this plugin for some time, I found one major problem. When a message has two attachments with the same name, or one with a content type of "message/*", my Solr schema design does not work, because I used the attachment's name as its unique identifier, which is not correct.

Now I am trying to find a way to get the MIME part id of the parts used in fts_build_mail. Is that already possible, or do I need to do it on my own?

Thank you in advance,
Rui Carneiro
Re: [Dovecot] Solr Plugin make only INBOX able to search
Hi again,

Could anyone help me?
[Dovecot] Solr Plugin make only INBOX able to search
Hi all,

I have downloaded and installed Dovecot 1.2.4 with the Solr plugin. After that, when I try to do some searches, the results are weird: all messages that aren't in the INBOX folder are ignored. For example, if I have a folder named "Test Folder", when I try ". search text " the result is always empty. I tried the TEXT, SUBJECT and BODY arguments and none of them returned any email.

Any idea what is causing this?

Regards,
Rui Carneiro
Re: [Dovecot] FTS Plugin design
Quoting Timo Sirainen:
> At least for now. Memory leaks don't cause crashes.

OK.

> gdb -p `pidof imap`
> cont
>
> bt full

I don't think that will be necessary: it is not crashing anymore. Maybe it was a bug in my code. I will send you the code tomorrow (or the day after).

Thank you for all the support!

Regards,
Rui Carneiro

-- 
Portugalmail, Comunicações S.A.
www.portugalmail.net
Re: [Dovecot] FTS Plugin design
Quoting Timo Sirainen:
> So valgrind didn't find anything wrong.

So should we ignore the LEAK SUMMARY?

> What does gdb show as the backtrace?

My gdb is not writing its output where it should (or is not writing at all). Shouldn't this be enough?

mail_executable = /usr/local/libexec/dovecot/gdbhelper /usr/local/libexec/dovecot/imap

The crash occurs after everything has been indexed, while imap is returning the result.

Thank you,
Rui Carneiro
Re: [Dovecot] FTS Plugin design
Quoting Timo Sirainen:
> I guess it works around some other bug then. If it's a memory-related
> bug you could also see if valgrind complains something:
>
> protocol imap {
>   ..
>   mail_executable = /usr/bin/valgrind /usr/local/libexec/dovecot/imap
> }

Here is the output (I cloned http://hg.dovecot.org/dovecot-1.2 and made no changes for this test):

ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 123 from 2)
malloc/free: in use at exit: 94,040 bytes in 1,032 blocks.
malloc/free: 1,704 allocs, 672 frees, 1,042,476 bytes allocated.
For counts of detected errors, rerun with: -v
searching for pointers to 1,032 not-freed blocks.
checked 111,072 bytes.

88,161 (328 direct, 87,833 indirect) bytes in 1 blocks are definitely lost in loss record 30 of 45
   at 0x4C24384: calloc (vg_replace_malloc.c:397)
   by 0x4AF165: pool_system_malloc (mempool-system.c:77)
   by 0x63E0DA2: ???
   by 0x63DF91D: ???
   by 0x5DBAF27: ???
   by 0x5DBBE50: ???
   by 0x46BBFF: mailbox_transaction_begin (mail-storage.c:794)
   by 0x42976F: imap_search_start (imap-search.c:540)
   by 0x4206D7: cmd_search (cmd-search.c:50)
   by 0x4232CB: client_command_input (client.c:608)
   by 0x423389: client_command_input (client.c:657)
   by 0x4239F4: client_handle_input (client.c:698)

LEAK SUMMARY:
   definitely lost: 328 bytes in 1 blocks.
   indirectly lost: 87,833 bytes in 1,016 blocks.
     possibly lost: 0 bytes in 0 blocks.
   still reachable: 5,879 bytes in 15 blocks.
        suppressed: 0 bytes in 0 blocks.
Reachable blocks (those to which a pointer was found) are not shown.
To see them, rerun with: --leak-check=full --show-reachable=yes
Re: [Dovecot] FTS Plugin design
Quoting Timo Sirainen:
> The problem is something else. The Solr code simply tries to keep the
> send buffer smaller than that, nothing would break if you sent a larger
> buffer. Show gdb backtrace of the crash?

I blamed the buffer size because Dovecot stopped crashing when I increased it. It's Friday and I won't be able to get the gdb backtrace over the weekend, but it will be the first thing I do on Monday morning.

Regards,
Rui Carneiro
Re: [Dovecot] FTS Plugin design
Hi Timo,

I have almost finished the changes to the fts plugin. So far, it seems to work fine with attachments (extracting them and sending them to Solr). My only problem is the maximum size of the command (cmd) that we can send to Solr:

#define SOLR_CMDBUF_SIZE (1024*64)

Currently, if we send a message bigger than this value, the fts plugin crashes. Is there anything on your TODO list that solves this problem?

Regards,
Rui Carneiro

PS: I will send you my code for your approval as soon as possible :)
Re: [Dovecot] FTS Plugin design
Now, with the attachment:

/* Copyright (c) 2006-2009 Dovecot authors, see the included COPYING file */

#include "lib.h"
#include "buffer.h"
#include "base64.h"
#include "str.h"
#include "unichar.h"
#include "charset-utf8.h"
#include "quoted-printable.h"
#include "rfc822-parser.h"
#include "rfc2231-parser.h"
#include "message-parser.h"
#include "message-header-decode.h"
#include "message-decoder.h"

enum content_type {
	CONTENT_TYPE_UNKNOWN = 0,
	CONTENT_TYPE_BINARY,
	CONTENT_TYPE_QP,
	CONTENT_TYPE_BASE64
};

/* base64 takes max 4 bytes per character, q-p takes max 3. */
#define MAX_ENCODING_BUF_SIZE 3

/* UTF-8 takes max 5 bytes per character. Not sure about others, but I'd
   think 10 is more than enough for everyone.. */
#define MAX_TRANSLATION_BUF_SIZE 10

struct message_decoder_context {
	enum message_decoder_flags flags;
	struct message_part *prev_part;

	struct message_header_line hdr;
	buffer_t *buf, *buf2;

	char *charset_trans_charset;
	struct charset_translation *charset_trans;
	char translation_buf[MAX_TRANSLATION_BUF_SIZE];
	unsigned int translation_size;

	char encoding_buf[MAX_ENCODING_BUF_SIZE];
	unsigned int encoding_size;

	char *content_charset;
	enum content_type content_type;

	unsigned int charset_utf8:1;
	unsigned int binary_input:1;
};

struct message_decoder_context *
message_decoder_init(enum message_decoder_flags flags)
{
	struct message_decoder_context *ctx;

	ctx = i_new(struct message_decoder_context, 1);
	ctx->flags = flags;
	ctx->buf = buffer_create_dynamic(default_pool, 8192);
	ctx->buf2 = buffer_create_dynamic(default_pool, 8192);
	return ctx;
}

void message_decoder_deinit(struct message_decoder_context **_ctx)
{
	struct message_decoder_context *ctx = *_ctx;

	*_ctx = NULL;

	if (ctx->charset_trans != NULL)
		charset_to_utf8_end(&ctx->charset_trans);

	buffer_free(&ctx->buf);
	buffer_free(&ctx->buf2);
	i_free(ctx->charset_trans_charset);
	i_free(ctx->content_charset);
	i_free(ctx);
}

static void
parse_content_transfer_encoding(struct message_decoder_context *ctx,
				struct message_header_line *hdr)
{
	struct rfc822_parser_context parser;
	string_t *value;

	value = t_str_new(64);
	rfc822_parser_init(&parser, hdr->full_value, hdr->full_value_len, NULL);

	(void)rfc822_skip_lwsp(&parser);
	(void)rfc822_parse_mime_token(&parser, value);

	ctx->content_type = CONTENT_TYPE_UNKNOWN;
	switch (str_len(value)) {
	case 4:
		if (i_memcasecmp(str_data(value), "7bit", 4) == 0 ||
		    i_memcasecmp(str_data(value), "8bit", 4) == 0)
			ctx->content_type = CONTENT_TYPE_BINARY;
		break;
	case 6:
		if (i_memcasecmp(str_data(value), "base64", 6) == 0)
			ctx->content_type = CONTENT_TYPE_BASE64;
		else if (i_memcasecmp(str_data(value), "binary", 6) == 0)
			ctx->content_type = CONTENT_TYPE_BINARY;
		break;
	case 16:
		if (i_memcasecmp(str_data(value), "quoted-printable", 16) == 0)
			ctx->content_type = CONTENT_TYPE_QP;
		break;
	}
}

static void
parse_content_type(struct message_decoder_context *ctx,
		   struct message_header_line *hdr)
{
	struct rfc822_parser_context parser;
	const char *const *results;
	string_t *str;

	if (ctx->content_charset != NULL)
		return;

	rfc822_parser_init(&parser, hdr->full_value, hdr->full_value_len, NULL);
	(void)rfc822_skip_lwsp(&parser);
	str = t_str_new(64);
	if (rfc822_parse_content_type(&parser, str) <= 0)
		return;

	(void)rfc2231_parse(&parser, &results);
	for (; *results != NULL; results += 2) {
		if (strcasecmp(results[0], "charset") == 0) {
			ctx->content_charset = i_strdup(results[1]);
			ctx->charset_utf8 = charset_is_utf8(results[1]);
			break;
		}
	}
}

static bool message_decode_header(struct message_decoder_context *ctx,
				  struct message_header_line *hdr,
				  struct message_block *output)
{
	bool dtcase = (ctx->flags & MESSAGE_DECODER_FLAG_DTCASE) != 0;
	size_t value_len;

	if (hdr->continues) {
		hdr->use_full_value = TRUE;
		return FALSE;
	}

	T_BEGIN {
		if (hdr->name_len == 12 &&
		    strcasecmp(hdr->name, "Content-Type") == 0)
			parse_content_type(ctx, hdr);
		if (hdr->name_len == 25 &&
		    strcasecmp(hdr->name, "Content-Transfer-Encoding") == 0)
			parse_content_transfer_encoding(ctx, hdr);
	} T_END;

	buffer_set_used_size(ctx->buf, 0);
	message_header_decode_utf8(hdr->full_value, hdr->full_value_len,
				   ctx->buf, dtcase);
	value_len = ctx->buf->used;

	if (dtcase) {
		(void)uni_utf8_to_decomposed_titlecase(hdr->name, hdr->name_len,
						       ctx->buf);
		buffer_append_c(ctx->buf, '\0');
	}

	ctx->hdr = *hdr;
	ctx->hdr.full_value = ctx->buf->data;
	ctx->hdr.full_value_len = value_len;
	ctx->hdr.value_len = 0;
	if (dtcase) {
		ctx->hdr.name = CONST_PTR_OFFSET(ctx->buf->data,
						 ctx->hdr.full_value_len);
		ctx->hdr.name_len = ctx->buf->used - 1 - value_len;
	}

	output->hdr = &ctx->hdr;
	return TRUE;
}

static void translation_buf_decode(struct message_decoder_context *ctx,
				   const unsigned char **data, size_t *size)
{
	unsigned char trans_buf[MAX_TRANSLATION_BUF_SIZE+1];
	unsigned int data_wanted, skip;
	size_t
Re: [Dovecot] FTS Plugin design
On Tue, May 19, 2009 at 8:51 PM, Timo Sirainen wrote:
> You forgot the attachment.

Oh, sorry! I am not at the office now (it is almost 10pm here). I will send it tomorrow morning.

Rui Carneiro
Re: [Dovecot] FTS Plugin design
Quoting Timo Sirainen:
> All the data comes from lib-mail/message-decoder.c. Hmm. Looks like it
> tries to force giving only valid UTF-8 output. I guess it should have
> some flag or something that makes it do that only for text/* parts, not
> for binary parts. OK, implemented, see if it works with this and using
> the flag:
>
> http://hg.dovecot.org/dovecot-1.2/rev/44548a7fb10d

It is working now, but I needed to make some changes to your code. The checks on charset_utf8 and charset_trans cause a problem for attachments: attachment parts do not have any charset defined in their headers, so by default charset_utf8=1 and charset_trans is garbage (I have no idea where that garbage came from). To avoid this problem, I moved the lines of code that set ctx->binary_input to the beginning of the function. Please see the attachment and check it for any problems.

Thank you,
Rui Carneiro
Re: [Dovecot] FTS Plugin design
Quoting Timo Sirainen:
> Nope. If you still see corruption, try with some simple test mails and
> see if it's adding garbage, losing contents or adding more content.

I tried something more thorough than that. I hexdumped my PDF test file, and the first line is:

25 50 44 46 2d 31 2e 33 0a 25 e2 e3 cf d3 0a 31

where "e2 e3 cf d3" is binary data. When I do the same for my copied file, I get:

25 50 44 46 2d 31 2e 33 0a 25 ef bf bd 0a 31 20

Strangely, the binary data changed. Furthermore, I logged the 11th byte of the first block.data just before fts_backend_build_more(), and its value is EF (the correct one would be E2). I think the binary data is being corrupted somewhere before fts_backend_build_more(), and I don't have any idea where. Any help would be appreciated.

Thank you,
Rui Carneiro
Re: [Dovecot] FTS Plugin design
Hi again,

I am having some trouble writing all the data to a file. When I finish writing the data and try to open the file, it is corrupted. The first thing I noticed is that all the characters are capitalized, which destroys the file format. Where are the characters capitalized? Any other idea why the files are getting corrupted?

Thank you,
Rui Carneiro
Re: [Dovecot] FTS Plugin design
Quoting Timo Sirainen:
> 1. You notice a non-text/* content-type and initialize text extraction
> for the MIME part. Like:
>
> struct attachment_extract_context *
> attachment_extract_init(const char *content_type);
>
> 2. After this you feed all the input belonging to that MIME part to:
>
> int attachment_extract_add(struct attachment_extract_context *ctx,
>                            const struct message_block *input);
>
> Don't output anything to FTS backend at this point. The
> attachment_extract_add() would probably just basically write to a
> temporary file.
>
> 3. Finally you'll notice that the MIME part ends (either you get headers
> for the next MIME part or the entire message ends). Then finish the
> extraction, which actually executes the whatever conversion binaries:
>
> int attachment_extract_finish(struct attachment_extract_context *ctx);
>
> 4. Get the resulting text to fts_backend_build_more() somehow. Either
> some attachment_extract_add_to_fts() which internally adds it or some
> kind of an iterator that returns the text in smaller blocks. Either
> would work..
>
> That kind of an API would also make it possible to pretty easily modify
> in future to not write temporary files for specific content types if
> it's not required.

I tried your approach and I think it is working pretty well. Now I only need to look carefully at the output of the external programs and build the XML correctly to send to Solr.

Thanks, Timo.

Regards,
Rui Carneiro
Re: [Dovecot] FTS Plugin design
Hi again,

On Wed, Apr 15, 2009 at 11:23 PM, Timo Sirainen wrote:
> - fts_build_mail() indexes a single mail. It parses the messages and
> returns the data in small blocks. For text/* and message/rfc822 parts
> those blocks are currently sent to FTS backend. This is where I think
> you should look into hooking your attachment parsing. Change
> fts_build_want_index_part() to look for more content-types that you're
> interested in and then before feeding the blocks to FTS backend put them
> through your own converter function, something like:
>
> int attachment_extract_text(struct attachment_extract_context *ctx,
> const struct message_block *input, struct message_block *output);

Let's take the example of an application/pdf content type. Before I can convert the PDF data to text, I need to gather all of it. The current process feeds the FTS backend small parts of the data, which are appended in the "build_more" functions (e.g. fts_backend_solr_build_more()). So where should I call attachment_extract_text()? In fts_backend_solr_build_more(), without appending to cmd until the data is extracted? Or should I gather all the data earlier (e.g. in fts_build_mail()) and send it to the FTS backend all at once?

I hope I've made myself clear.

Regards,
Rui Carneiro
Re: [Dovecot] fts-solr plugin issue (Marked invalid)
Quoting Nikolai Derzhak:
> But in sum: when dovecot try to index some mail's, that solr tokenizer not
> eat (error 500, Marked invalid),
> dovecot stop indexing of box and retry attempts in each next search with same
> result.

I think you could look at solr-connection.c, at the functions that communicate with Solr, and change the error handling there. I think a fix for this "misfeature" will be useful to me soon.

Rui Carneiro

PS: Sorry, but I don't know much about this code at the moment.
Re: [Dovecot] fts-solr plugin issue (Marked invalid)
I am not sure I understood your problem correctly. Are you trying to index attachments from messages? Or is Dovecot indexing some "bad" parts and you just don't know why?

Regards,
Rui Carneiro

Quoting Nikolai Derzhak:
> OK. Concentrating problem in one question.
> How to ignore "bad" message and index next one in indexing procedure (fts
> plugin) ?.
> Now, one "error 500" from solr and dovecot (# 1.1.11:
> /etc/dovecot/dovecot.conf
> # OS: Linux 2.6.21.7-2.fc8xen i686 Ubuntu 8.04.2 ext3
> ) stop and each next search query repeat the story.
> I've explored fts, and ftp-solr directories in src, without success for now.
> Timo, you understand code much bettter, can you help me and point to place in
> code,
> or probably create some patch, if possible ?.
Re: [Dovecot] FTS Plugin design
On Thu, Apr 23, 2009 at 5:47 AM, wrote:
> Note that some formats might require to seek to some point in the file [1]
> (typically the end), so reading from stdin is awkward (it would require
> stdin to be seekable, so either the app or the caller would have to put
> the whole file somewhere anyway).
>
> [1] Notably PDF has some index tables at EOF - 1k if I remember correctly.

I hadn't thought of that before, but I think you are right. The only remaining question is whether to write the data to memory or to disk.

Thank you all,
Rui Carneiro
Re: [Dovecot] FTS Plugin design
On Wed, Apr 22, 2009 at 5:38 PM, Timo Sirainen wrote:
> Maybe those programs could be changed and just require the newer
> versions?..

I will talk with the developers of those applications about supporting input from stdin (if it is not supported yet).

> I think the API that fts plugin uses to do the conversion should be
> generic enough that both approaches would work. Then it would be easier
> to implement one or another or both eventually.

I think I will try the external applications approach, since I don't have much development time available. I will make the API as generic as I can to allow for future improvements.

Regards,
Rui Carneiro
Re: [Dovecot] FTS Plugin design
Hi,

Almost all of the full-text search engines (C/C++) I looked at (Swish-E, Wumpus, Lemur and Xapian) do not use any kind of library or parser. Instead, they use other applications like pdftotext, catdoc, catppt (etc.) and call them with execvp (or equivalent).

Using this approach in my project has some pros and cons:

Pros:
- The existing libraries for extracting the content of PDF, DOC (etc.) files are not very stable.
- Errors are easier to handle (even if those applications crash, Dovecot will still be running).
- Less development time.

Cons:
- Some programs for parsing special formats (e.g. catppt and pdftotext) do not accept input from stdin (we need to create temporary files).

Which approach would be better: using applications like pdftotext and catdoc, or using their libraries and doing it almost from scratch?

Regards,
Rui Carneiro

-- 
mobile: +351 963446125
mail: rui@gmail.com
mail: ei04...@fe.up.pt
website: http://paginas.fe.up.pt/~ei04073
Re: [Dovecot] FTS Plugin design
Great idea! I will report back soon.

On Tue, Apr 21, 2009 at 5:32 PM, Timo Sirainen wrote:
> I've no idea, but you could at least look at some of the other full text
> search engines. I remember them advertising indexing support for all kinds
> of formats. Maybe they're using some specific library or maybe it would be
> easy to extract their parsing code.
Re: [Dovecot] FTS Plugin design
Hi again,

Does anyone know of good libraries for handling the content of files like PDF, PPT, DOC, etc.? I am already indexing attachments; all I need now is to extract their text.

Regards,
Rui Carneiro
Re: [Dovecot] FTS Plugin design
Hi,

The problem was the flag value: my hex-to-binary conversion was wrong.

Regards,
Rui Carneiro
Re: [Dovecot] FTS Plugin design
Thank you for all the tips. The design looks clearer to me now.

I have one more question. I looked into fts_build_want_index_part() and saw that I need to add some flags to message_part_flags. What values should I choose? My first approach was to follow your scheme and set MESSAGE_PART_FLAG_ATTACHMENT = 0x16. Is there any problem with this?

I have already changed parse_content_type() to set ctx->part->flags correctly, but if I choose my custom flag, Dovecot assumes that all attachment lines are headers. I also tried setting ctx->part->flags to TEXT, and the fts_backend was fed all the attachment lines correctly.

I don't know if this is related to the value of MESSAGE_PART_FLAG_ATTACHMENT or if I am missing something (like setting block.hdr = NULL, or some more code to handle the new flags).

Thank you,
Rui Carneiro

On Wed, Apr 15, 2009 at 11:23 PM, Timo Sirainen wrote:
> On Mon, 2009-04-13 at 11:18 +0100, Rui Carneiro wrote:
> > I didn't understood yet what is the plugin's design and how the plugins are
> > called from the core system and I was wondering if anyone could help me with
> > that.
>
> fts-storage.c hooks into all the functions in mail-storage API that it
> needs to. Currently indexing isn't done while messages are being saved,
> but instead just before searching. The searching functions are:
>
> - fts_mailbox_search_init() tries to figure out if FTS can optimize the
> search. If it does, it tries to figure out if FTS index is up-to-date
> and if not, starts the search.
>
> - fts_mailbox_search_next_nonblock() continues the indexing (or
> searching after indexing) for a while. The idea is that IMAP connection
> is able to process other commands while doing a long-running search. So
> fts plugin indexes FTS_SEARCH_NONBLOCK_COUNT (50) messages at a time. It
> would be nice if that value was dynamically calculated and also based on
> bytes instead of messages, but that's maybe too much trouble.
>
> - fts_mailbox_search_next_update_seq() uses the fts search results and
> updates mail-storage's search stuff so that it doesn't go through
> messages that don't match.
>
> - fts_build_mail() indexes a single mail. It parses the messages and
> returns the data in small blocks. For text/* and message/rfc822 parts
> those blocks are currently sent to FTS backend. This is where I think
> you should look into hooking your attachment parsing. Change
> fts_build_want_index_part() to look for more content-types that you're
> interested in and then before feeding the blocks to FTS backend put them
> through your own converter function, something like:
>
> int attachment_extract_text(struct attachment_extract_context *ctx,
> const struct message_block *input, struct message_block *output);
[Dovecot] FTS Plugin design
Hi all,

I am currently developing some changes to the Solr plugin: I want it to also index the content of attachments. I have already started looking at the plugin's source, but I am having some trouble understanding how it works. I don't yet understand the plugin's design or how plugins are called from the core system, and I was wondering if anyone could help me with that.

Sorry if these questions sound stupid, but I am new to Dovecot.

Regards,
Rui Carneiro
Re: [Dovecot] auth-master: Permission denied [sigh]
Hi,

I was having problems with permissions on auth-master too. I solved them by manually creating the folder /var/run/dovecot with the correct permissions, but I see you already did that :\

On Sun, Apr 12, 2009 at 5:27 PM, James Butler wrote:
> I've been messing with this for too long, now, and I'm blind to whatever's
> wrong. Or I'm simply being dense. Either way, I need help with a common
> issue.
>
> I'm trying to get Postfix+Spamassassin+Dovecot going on Fedora 10. (I'll
> get back to the global Sieve thingy soon, but I need to get this going,
> first.)
>
> When using the simple:
> mailbox_command = /usr/local/libexec/dovecot/deliver
> everything is cool, except there's no Spamassassin involvement, obviously.
>
> The problem shows itself when the Spamassassin user hands off to the
> recipient user and Deliver + the recipient user tries to access
> /var/run/dovecot/auth-master.
>
> Thank you for any insight you can provide.
>
> /var/run/dovecot: 755 root:dovecot
> /var/run/dovecot/login: 750 root:dovecot
> /var/run/dovecot/auth-master: 750 root:dovecot
> (I think. auth-master is a temporary file? Comes and goes.)
>
> From /etc/postfix/main.cf
>
> mailbox_transport = spamassassin
>
> From /etc/postfix/master.cf:
>
> spamassassin unix - n n - - pipe
>   user=spam argv=/usr/bin/spamc -f -e /usr/libexec/dovecot/deliver
>   -f ${sender} -d ${user} -m ${extension}
>
> Here's my 'socket listen' section from /usr/local/etc/dovecot.conf:
>
> socket listen {
>   master {
>     path = /var/run/dovecot/auth-master
>     mode = 0666
>     #user =
>     group = dovecot
>   }
>   client {
>     path = /var/run/dovecot/auth-client
>     mode = 0666
>     #user =
>     group = dovecot
>   }
> }
>
> From /var/log/maillog:
>
> Postfix receives the message:
>
> postfix/smtpd[29447]: connect from \
>   IP-ADD-RE-SS.ptr.example-send.com[IP.ADD.RE.SS]
> postfix/smtpd[29447]: 60990FA01BA: \
>   client=IP-ADD-RE-SS.ptr.example-send.com[IP.ADD.RE.SS]
> postfix/cleanup[29451]: 60990FA01BA: \
>   message-id=<49e20bf2.4090...@example-send.com>
> postfix/qmgr[29441]: 60990FA01BA: from=, \
>   size=812, nrcpt=1 (queue active)
> postfix/smtpd[29447]: disconnect from \
>   IP-ADD-RE-SS.ptr.example-send.com[IP.ADD.RE.SS]
>
> Spamassassin processes the message as user 'spam':
>
> spamd[4121]: spamd: processing message\
>   <49e20bf2.4090...@example-send.com> for spam:653
> spamd[4121]: spamd: clean message (3.0/5.0) for spam:653 in 5.2 seconds,\
>   793 bytes.
> spamd[4121]: spamd: result: . 2 - RDNS_DYNAMIC,TVD_SPACE_RATIO \
>   scantime=5.2,size=793,user=spam,uid=653,required_score=5.0, \
>   rhost=localhost.localdomain,raddr=127.0.0.1,rport=42493, \
>   mid=<49e20bf2.4090...@example-send.com>,autolearn=no
>
> Spamassassin pipes result to Deliver which runs as recipient user.
>
> Deliver as recipient user doesn't have permission to auth:
>
> deliver(recipient): Can't connect to auth server at \
>   /var/run/dovecot/auth-master: Permission denied
> postfix/pipe[29452]: 60990FA01BA: to=, \
>   relay=spamassassin, delay=6, delays=0.33/0.01/0/5.7, dsn=4.3.0, \
>   status=deferred (temporary failure)
>
> 1) I must use the 'user=' arg for spamc
> 2) Can't use 'user=${user}' or $user:
>    fatal: get_service_attr: unknown username: ${user}
> 3) Must use '-d ${user}' Deliver arg, otherwise
>    message gets delivered to user 'spam'
>
> AArrgh! TIA.
Re: [Dovecot] Compile and configure Solr plugin
Hi all, I found the problem. I was trying to compile and install Dovecot while an older version from the Ubuntu repositories was still installed (I didn't know it was there). I removed the packaged version and now it works just fine. Sorry about the spam :P Regards, Rui Carneiro On Tue, Apr 7, 2009 at 3:10 PM, Rui Carneiro wrote: > Hi all, > > I'm having some kind of troubles on Solr's integration. I configured > Dovecot with the Solr argument (--with-solr) and everything proceeded just > fine. But, when I started Dovecot I got this error: > > Plugin fts_solr not found from directory /usr/lib64/dovecot/modules/imap > Error: imap dump-capability process returned 89 > Fatal: Invalid configuration in /etc/dovecot/dovecot.conf > > In fact, there is no Solr plugin in that directory. So I tried to make a > symbolic link from where the compiled Solr plugin was to this directory: > ln -s /usr/local/lib/dovecot/lib21_fts_solr_plugin.so > > And now I have the following error: > > dlopen(/usr/lib64/dovecot/modules/imap/lib21_fts_solr_plugin.so) failed: > /usr/lib64/dovecot/modules/imap/lib21_fts_solr_plugin.so: undefined symbol: > mailbox_get_virtual_box_patterns > Couldn't load required plugins > Error: imap dump-capability process returned 89 > Fatal: Invalid configuration in /etc/dovecot/dovecot.conf > > I'm missing some steps in this installation for sure, do anyone have any > clue? > > Thanks, > Rui Carneiro
[Dovecot] Compile and configure Solr plugin
Hi all,

I'm having some trouble with the Solr integration. I configured Dovecot with the Solr option (--with-solr) and everything went just fine. But when I started Dovecot I got this error:

Plugin fts_solr not found from directory /usr/lib64/dovecot/modules/imap
Error: imap dump-capability process returned 89
Fatal: Invalid configuration in /etc/dovecot/dovecot.conf

In fact, there is no Solr plugin in that directory. So I tried to make a symbolic link in this directory pointing to where the compiled Solr plugin was:

ln -s /usr/local/lib/dovecot/lib21_fts_solr_plugin.so

And now I get the following error:

dlopen(/usr/lib64/dovecot/modules/imap/lib21_fts_solr_plugin.so) failed: /usr/lib64/dovecot/modules/imap/lib21_fts_solr_plugin.so: undefined symbol: mailbox_get_virtual_box_patterns
Couldn't load required plugins
Error: imap dump-capability process returned 89
Fatal: Invalid configuration in /etc/dovecot/dovecot.conf

I'm surely missing some steps in this installation. Does anyone have any clue?

Thanks,
Rui Carneiro
Re: [Dovecot] Solr's index update
On Tue, Mar 31, 2009 at 7:28 PM, Timo Sirainen wrote:
> On Tue, 2009-03-31 at 19:21 +0100, Rui Carneiro wrote:
> > Another question. I read this on the TODO list:
> >
> > fts-solr: handle DELETE, RENAME
> >
> > I am interested to look deeper on this. Any start advice?
>
> Since Solr data can't be modified, both of these have to be handled the
> same way: Just deleting the data from Solr indexes. You'll probably have
> to do this like:
>
> 1. Hook into mailbox_list.delete_mailbox() in fts plugin (similar to
> like how e.g. quota plugin does in quota_mailbox_list_delete()).
>
> 2. Add a new delete_mailbox() function to struct fts_backend_vfuncs and
> have your delete_mailbox() call that before calling
> super.delete_mailbox().
>
> 3. Hook into the delete_mailbox() in fts-solr and have it execute a
> query that deletes everything from the given mailbox.

OK, I will take a look soon. Thank you for your help ;)
Re: [Dovecot] Solr's index update
On Tue, Mar 31, 2009 at 5:10 PM, Timo Sirainen wrote:
> On Mar 31, 2009, at 11:25 AM, Rui Carneiro wrote:
>
>> Hi all,
>>
>> In the wiki says this: "Currently the indexes are updated only while
>> searching" @ http://wiki.dovecot.org/Plugins/FTS
>>
>> This also is applied to Solr Indexes?
>
> Yes.
>
>> If not, when Solr Indexes are updated?
>
> If you want them to be updated more often, you can issue SEARCH commands in
> a cronjob or something.

I will take your advice :)

Another question. I read this on the TODO list:

fts-solr: handle DELETE, RENAME

I am interested in looking deeper into this. Any advice on where to start?
[Dovecot] Solr's index update
Hi all, The wiki says this: "Currently the indexes are updated only while searching" @ http://wiki.dovecot.org/Plugins/FTS Does this also apply to Solr indexes? If not, when are Solr indexes updated? Thank you, Rui Carneiro