Re: [Dovecot] Solr Plugin make only INBOX able to search

2009-09-23 Thread Rui Carneiro
On Mon, Sep 14, 2009 at 2:54 AM, Timo Sirainen  wrote:
>
> BTW. Note that Solr schema file was changed a bit to use less disk
> space. But that shouldn't have caused this kind of a bug.


The problem is caused by my custom schema. My changes are somehow breaking
the search.

Thank you Timo.

>


Re: [Dovecot] Solr Plugin make only INBOX able to search

2009-09-14 Thread Rui Carneiro
On Mon, Sep 14, 2009 at 2:54 AM, Timo Sirainen  wrote:

> Hmm. Seems to work here:
>
> x create "Test Folder"
> x OK Create completed.
> x copy 1:* "Test Folder"
> x OK [COPYUID 1251759136 20163:20174 1:12] Copy completed.
> x search text multiple
> * SEARCH 1
> x OK Search completed (0.000 secs).
>
> Are the messages added to the database? Can you find them with Solr's
> own query tool?


No.


> What Solr version are you using? I'm testing with
> v1.3.0.
>

I am using 1.4-dev. Maybe it was not a smart choice.


> BTW. Note that Solr schema file was changed a bit to use less disk
> space. But that shouldn't have caused this kind of a bug.
>

OK, I will test Solr v1.3.0 and the new schema and report back soon. Thank
you!


Re: [Dovecot] FTS Plugin design

2009-09-08 Thread rui . carneiro

Hi again!

After using my changes to this plugin for some time, I found one major
problem. When a message has two attachments with the same name, or one
with a content type of "message/*", my Solr schema design does not
work, because I used the attachment's name as the attachment's unique
identifier, which is not correct.


Now I am trying to find a way to get the MIME part ID of the parts
used in fts_build_mail(). Is that already possible, or do I need to do that
on my own?
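One way to get a unique identifier per attachment is to combine the message UID with the MIME part's section number, numbered the way IMAP BODY[section] specifiers are (1, 2, 4.1, ...; see RFC 3501), instead of using the attachment name. The sketch below is only an illustration under assumptions: struct part_node, walk_parts() and collect_key() are hypothetical, it does not use Dovecot's struct message_part, and it ignores the extra numbering rules for message/rfc822 parts.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified MIME tree node; not Dovecot's struct message_part. */
struct part_node {
	const char *filename;        /* may repeat across parts; not used as a key */
	struct part_node *children;  /* first child, NULL for a leaf part */
	struct part_node *next;      /* next sibling */
};

typedef void part_cb(const char *section, const struct part_node *part,
		     void *context);

/* Calls `cb` for every leaf part below `parent` with its IMAP-style section
   number ("1", "2", "4.1", ...). `prefix` holds the parent's section
   number ("" for the top level) and is restored after each child. */
static void walk_parts(const struct part_node *parent, char *prefix,
		       size_t prefix_size, part_cb *cb, void *context)
{
	const struct part_node *child;
	size_t base_len = strlen(prefix);
	unsigned int idx = 1;

	for (child = parent->children; child != NULL; child = child->next, idx++) {
		snprintf(prefix + base_len, prefix_size - base_len,
			 base_len == 0 ? "%u" : ".%u", idx);
		if (child->children != NULL)
			walk_parts(child, prefix, prefix_size, cb, context);
		else
			cb(prefix, child, context);
		prefix[base_len] = '\0';
	}
}

/* Example callback: builds "<uid>/<section>" keys for a Solr unique field. */
struct key_list {
	char keys[8][32];
	unsigned int count;
	unsigned int uid;
};

static void collect_key(const char *section, const struct part_node *part,
			void *context)
{
	struct key_list *list = context;

	(void)part;
	if (list->count < 8)
		snprintf(list->keys[list->count++], sizeof(list->keys[0]),
			 "%u/%s", list->uid, section);
}
```

In a tree where every attachment is named report.pdf, each leaf still gets a distinct key such as 42/2 or 42/4.1, because the key never depends on the file name.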


Thank you in advance,
Rui Carneiro


Re: [Dovecot] Solr Plugin make only INBOX able to search

2009-09-07 Thread rui . carneiro

Hi again,

Could anyone help me?


[Dovecot] Solr Plugin make only INBOX able to search

2009-08-19 Thread rui . carneiro

Hi all,

I have downloaded and installed Dovecot 1.2.4 with the Solr plugin. Since
then, when I try to do some searches, the results are weird.


All messages that aren't in the INBOX folder are ignored. For example, if
I have a folder named "Test Folder", when I try to do ". search text "
the result is always empty.


I tried TEXT, SUBJECT and BODY arguments and none of them returned any email.

Any idea what could be causing this?

Regards,
Rui Carneiro


Re: [Dovecot] FTS Plugin design

2009-05-26 Thread Rui Carneiro
Quoting Timo Sirainen:
> 
> At least for now. Memory leaks don't cause crashes.

Ok.

> gdb -p `pidof imap`
> cont
> 
> bt full

I think it won't be necessary. It is not crashing anymore. Maybe it was a bug 
in my code.

Tomorrow (or the day after) I will send you the code.

Thank you for all the support!

Regards,
Rui Carneiro
-- 
Portugalmail, Comunicações S.A.
www.portugalmail.net


Re: [Dovecot] FTS Plugin design

2009-05-26 Thread Rui Carneiro
Quoting Timo Sirainen:
> So valgrind didn't find anything wrong. 

Should we ignore the LEAK SUMMARY?

> What does gdb show as the backtrace?

My gdb is not writing its output where it should (or is not writing at all).
Shouldn't this be enough?

mail_executable = /usr/local/libexec/dovecot/gdbhelper /usr/local/libexec/dovecot/imap

The crash occurs after everything has been indexed, while imap is returning
the result.

Thank you,
Rui Carneiro


Re: [Dovecot] FTS Plugin design

2009-05-25 Thread Rui Carneiro
Quoting Timo Sirainen:
> I guess it works around some other bug then. If it's a memory-related
> bug you could also see if valgrind complains something:
> 
> protocol imap {
>   ..
>   mail_executable = /usr/bin/valgrind /usr/local/libexec/dovecot/imap
> }

Here is the output (I cloned http://hg.dovecot.org/dovecot-1.2 and made no
changes for this test):

ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 123 from 2)
malloc/free: in use at exit: 94,040 bytes in 1,032 blocks.
malloc/free: 1,704 allocs, 672 frees, 1,042,476 bytes allocated.
For counts of detected errors, rerun with: -v
searching for pointers to 1,032 not-freed blocks.
checked 111,072 bytes.

88,161 (328 direct, 87,833 indirect) bytes in 1 blocks are definitely lost in loss record 30 of 45
   at 0x4C24384: calloc (vg_replace_malloc.c:397)
   by 0x4AF165: pool_system_malloc (mempool-system.c:77)
   by 0x63E0DA2: ???
   by 0x63DF91D: ???
   by 0x5DBAF27: ???
   by 0x5DBBE50: ???
   by 0x46BBFF: mailbox_transaction_begin (mail-storage.c:794)
   by 0x42976F: imap_search_start (imap-search.c:540)
   by 0x4206D7: cmd_search (cmd-search.c:50)
   by 0x4232CB: client_command_input (client.c:608)
   by 0x423389: client_command_input (client.c:657)
   by 0x4239F4: client_handle_input (client.c:698)

LEAK SUMMARY:
   definitely lost: 328 bytes in 1 blocks.
   indirectly lost: 87,833 bytes in 1,016 blocks.
     possibly lost: 0 bytes in 0 blocks.
   still reachable: 5,879 bytes in 15 blocks.
        suppressed: 0 bytes in 0 blocks.
Reachable blocks (those to which a pointer was found) are not shown.
To see them, rerun with: --leak-check=full --show-reachable=yes


Re: [Dovecot] FTS Plugin design

2009-05-22 Thread Rui Carneiro
Quoting Timo Sirainen:
> The problem is something else. The Solr code simply tries to keep the
> send buffer smaller than that, nothing would break if you sent a larger
> buffer. Show gdb backtrace of the crash?
> 

I said it was the buffer size because when I increased it, Dovecot didn't
crash.

It's Friday and I won't be able to get the gdb backtrace over the weekend, but
it will be the first thing I do on Monday morning.

Regards,
Rui Carneiro


Re: [Dovecot] FTS Plugin design

2009-05-22 Thread Rui Carneiro
Hi Timo,

I have almost finished the changes to the fts plugin. So far, it seems to work
fine with attachments (extracting them and sending them to Solr). My only
problem is the maximum size of the command (cmd) that we can send to Solr:

#define SOLR_CMDBUF_SIZE (1024*64)

Right now, if we send a message bigger than this value, the fts plugin crashes.

Is there anything on your TODO list that solves this problem?
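Keeping the send buffer under SOLR_CMDBUF_SIZE does not have to limit the message size: data can be copied in and flushed every time the buffer fills. A minimal sketch, assuming each flush writes more of the same HTTP request body (not a separate request, which would split the XML); struct cmd_buffer, send_fn and count_send() are hypothetical names, and only SOLR_CMDBUF_SIZE comes from the plugin.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SOLR_CMDBUF_SIZE (1024*64)

/* Hypothetical sender: writes `size` more bytes of the same HTTP request
   body to Solr. */
typedef void send_fn(const char *data, size_t size, void *context);

struct cmd_buffer {
	char data[SOLR_CMDBUF_SIZE];
	size_t used;
	send_fn *send;
	void *context;
};

static void cmd_buffer_flush(struct cmd_buffer *buf)
{
	if (buf->used > 0) {
		buf->send(buf->data, buf->used, buf->context);
		buf->used = 0;
	}
}

/* Copies data into the buffer, sending it every time the buffer becomes
   full, so input larger than SOLR_CMDBUF_SIZE is streamed in pieces
   instead of overrunning the buffer. */
static void cmd_buffer_append(struct cmd_buffer *buf, const char *data,
			      size_t size)
{
	while (size > 0) {
		size_t room = sizeof(buf->data) - buf->used;
		size_t n = size < room ? size : room;

		memcpy(buf->data + buf->used, data, n);
		buf->used += n;
		data += n;
		size -= n;
		if (buf->used == sizeof(buf->data))
			cmd_buffer_flush(buf);
	}
}

/* Example sender for testing: only counts calls and bytes. */
struct send_stats {
	unsigned int calls;
	size_t bytes;
};

static void count_send(const char *data, size_t size, void *context)
{
	struct send_stats *stats = context;

	(void)data;
	stats->calls++;
	stats->bytes += size;
}
```

The caller still has to call cmd_buffer_flush() once at the end of the message to send the final partial buffer.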

Regards,
Rui Carneiro

PS: asap I will send you my code for your approval :)



Re: [Dovecot] FTS Plugin design

2009-05-20 Thread Rui Carneiro
Now with the attachment.
/* Copyright (c) 2006-2009 Dovecot authors, see the included COPYING file */

#include "lib.h"
#include "buffer.h"
#include "base64.h"
#include "str.h"
#include "unichar.h"
#include "charset-utf8.h"
#include "quoted-printable.h"
#include "rfc822-parser.h"
#include "rfc2231-parser.h"
#include "message-parser.h"
#include "message-header-decode.h"
#include "message-decoder.h"

enum content_type {
	CONTENT_TYPE_UNKNOWN = 0,
	CONTENT_TYPE_BINARY,
	CONTENT_TYPE_QP,
	CONTENT_TYPE_BASE64
};

/* base64 takes max 4 bytes per character, q-p takes max 3. */
#define MAX_ENCODING_BUF_SIZE 3

/* UTF-8 takes max 5 bytes per character. Not sure about others, but I'd think
   10 is more than enough for everyone.. */
#define MAX_TRANSLATION_BUF_SIZE 10

struct message_decoder_context {
	enum message_decoder_flags flags;
	struct message_part *prev_part;

	struct message_header_line hdr;
	buffer_t *buf, *buf2;

	char *charset_trans_charset;
	struct charset_translation *charset_trans;
	char translation_buf[MAX_TRANSLATION_BUF_SIZE];
	unsigned int translation_size;

	char encoding_buf[MAX_ENCODING_BUF_SIZE];
	unsigned int encoding_size;

	char *content_charset;
	enum content_type content_type;

	unsigned int charset_utf8:1;
	unsigned int binary_input:1;
};

struct message_decoder_context *
message_decoder_init(enum message_decoder_flags flags)
{
	struct message_decoder_context *ctx;

	ctx = i_new(struct message_decoder_context, 1);
	ctx->flags = flags;
	ctx->buf = buffer_create_dynamic(default_pool, 8192);
	ctx->buf2 = buffer_create_dynamic(default_pool, 8192);
	return ctx;
}

void message_decoder_deinit(struct message_decoder_context **_ctx)
{
	struct message_decoder_context *ctx = *_ctx;

	*_ctx = NULL;

	if (ctx->charset_trans != NULL)
		charset_to_utf8_end(&ctx->charset_trans);

	buffer_free(&ctx->buf);
	buffer_free(&ctx->buf2);
	i_free(ctx->charset_trans_charset);
	i_free(ctx->content_charset);
	i_free(ctx);
}

static void
parse_content_transfer_encoding(struct message_decoder_context *ctx,
				struct message_header_line *hdr)
{
	struct rfc822_parser_context parser;
	string_t *value;

	value = t_str_new(64);
	rfc822_parser_init(&parser, hdr->full_value, hdr->full_value_len, NULL);

	(void)rfc822_skip_lwsp(&parser);
	(void)rfc822_parse_mime_token(&parser, value);

	ctx->content_type = CONTENT_TYPE_UNKNOWN;
	switch (str_len(value)) {
	case 4:
		if (i_memcasecmp(str_data(value), "7bit", 4) == 0 ||
		    i_memcasecmp(str_data(value), "8bit", 4) == 0)
			ctx->content_type = CONTENT_TYPE_BINARY;
		break;
	case 6:
		if (i_memcasecmp(str_data(value), "base64", 6) == 0)
			ctx->content_type = CONTENT_TYPE_BASE64;
		else if (i_memcasecmp(str_data(value), "binary", 6) == 0)
			ctx->content_type = CONTENT_TYPE_BINARY;
		break;
	case 16:
		if (i_memcasecmp(str_data(value), "quoted-printable", 16) == 0)
			ctx->content_type = CONTENT_TYPE_QP;
		break;
	}
}

static void
parse_content_type(struct message_decoder_context *ctx,
		   struct message_header_line *hdr)
{
	struct rfc822_parser_context parser;
	const char *const *results;
	string_t *str;

	if (ctx->content_charset != NULL)
		return;

	rfc822_parser_init(&parser, hdr->full_value, hdr->full_value_len, NULL);
	(void)rfc822_skip_lwsp(&parser);
	str = t_str_new(64);
	if (rfc822_parse_content_type(&parser, str) <= 0)
		return;

	(void)rfc2231_parse(&parser, &results);
	for (; *results != NULL; results += 2) {
		if (strcasecmp(results[0], "charset") == 0) {
			ctx->content_charset = i_strdup(results[1]);
			ctx->charset_utf8 = charset_is_utf8(results[1]);
			break;
		}
	}
}

static bool message_decode_header(struct message_decoder_context *ctx,
				  struct message_header_line *hdr,
				  struct message_block *output)
{
	bool dtcase = (ctx->flags & MESSAGE_DECODER_FLAG_DTCASE) != 0;
	size_t value_len;

	if (hdr->continues) {
		hdr->use_full_value = TRUE;
		return FALSE;
	}

	T_BEGIN {
		if (hdr->name_len == 12 &&
		    strcasecmp(hdr->name, "Content-Type") == 0)
			parse_content_type(ctx, hdr);
		if (hdr->name_len == 25 &&
		    strcasecmp(hdr->name, "Content-Transfer-Encoding") == 0)
			parse_content_transfer_encoding(ctx, hdr);
	} T_END;

	buffer_set_used_size(ctx->buf, 0);
	message_header_decode_utf8(hdr->full_value, hdr->full_value_len,
				   ctx->buf, dtcase);
	value_len = ctx->buf->used;

	if (dtcase) {
		(void)uni_utf8_to_decomposed_titlecase(hdr->name, hdr->name_len,
						       ctx->buf);
		buffer_append_c(ctx->buf, '\0');
	}

	ctx->hdr = *hdr;
	ctx->hdr.full_value = ctx->buf->data;
	ctx->hdr.full_value_len = value_len;
	ctx->hdr.value_len = 0;
	if (dtcase) {
		ctx->hdr.name = CONST_PTR_OFFSET(ctx->buf->data,
						 ctx->hdr.full_value_len);
		ctx->hdr.name_len = ctx->buf->used - 1 - value_len;
	}

	output->hdr = &ctx->hdr;
	return TRUE;
}

static void translation_buf_decode(struct message_decoder_context *ctx,
				   const unsigned char **data, size_t *size)
{
	unsigned char trans_buf[MAX_TRANSLATION_BUF_SIZE+1];
	unsigned int data_wanted, skip;
	size_t 

Re: [Dovecot] FTS Plugin design

2009-05-19 Thread Rui Carneiro
On Tue, May 19, 2009 at 8:51 PM, Timo Sirainen  wrote:

> You forgot the attachment.
>

Oh, sorry, I am not at the office now (it's almost 10pm here). I will send it
tomorrow morning.

Rui Carneiro


Re: [Dovecot] FTS Plugin design

2009-05-19 Thread Rui Carneiro
Quoting Timo Sirainen:
> All the data comes from lib-mail/message-decoder.c. Hmm. Looks like it
> tries to force giving only valid UTF-8 output. I guess it should have
> some flag or something that makes it do that only for text/* parts, not
> for binary parts. OK, implemented, see if it works with this and using
> the flag:
> 
> http://hg.dovecot.org/dovecot-1.2/rev/44548a7fb10d
> 

It is working now, but I needed to make some changes to your code.

When you check charset_utf8 and charset_trans, there is a problem in the
attachment case. Attachment parts do not have any charset defined in their
headers, so by default charset_utf8=1 and charset_trans is garbage (I have no
idea where that garbage comes from).

To avoid this problem, I moved the lines of code that set ctx->binary_input to
the beginning of the function.
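The ordering fix described above can be sketched as deciding binary versus text first, from the Content-Type, so that the charset fields are never consulted for parts that have no charset parameter. This is a hypothetical simplification (struct part_state and choose_body_action() are not Dovecot's names), and it treats every non-text part as binary, including message/rfc822, which a real decoder would descend into instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical, simplified per-part state; not Dovecot's decoder context. */
struct part_state {
	const char *content_type;  /* e.g. "text/plain", "application/pdf" */
	const char *charset;       /* NULL when there is no charset parameter */
	bool binary_input;
};

enum body_action {
	BODY_PASS_THROUGH,  /* binary part: hand the bytes on unmodified */
	BODY_ALREADY_UTF8,  /* text part that needs no charset conversion */
	BODY_CONVERT        /* text part that needs charset translation */
};

static enum body_action choose_body_action(struct part_state *part)
{
	/* Decide binary vs. text first, so the charset fields are never
	   consulted for parts (like most attachments) that have no charset. */
	part->binary_input = strncasecmp(part->content_type, "text/", 5) != 0;
	if (part->binary_input)
		return BODY_PASS_THROUGH;

	/* RFC 2046: a text part without a charset parameter is US-ASCII,
	   which is a subset of UTF-8. */
	if (part->charset == NULL ||
	    strcasecmp(part->charset, "utf-8") == 0 ||
	    strcasecmp(part->charset, "us-ascii") == 0)
		return BODY_ALREADY_UTF8;
	return BODY_CONVERT;
}
```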

Please check the attachment for any problems that may exist.

Thank you,
Rui Carneiro


Re: [Dovecot] FTS Plugin design

2009-05-18 Thread Rui Carneiro
Quoting Timo Sirainen:
> Nope. If you still see corruption, try with some simple test mails and
> see if it's adding garbage, losing contents or adding more content.

I tried something more thorough than that. I hexdumped my PDF test file, and
on the first line I got:

  25 50 44 46 2d 31 2e 33  0a 25 e2 e3 cf d3 0a 31

Here "e2 e3 cf d3" is binary data. When I do the same for my copied file, I get:

  25 50 44 46 2d 31 2e 33  0a 25 ef bf bd 0a 31 20

Strangely, the binary data has changed.

Furthermore, I logged the 11th byte of the first block.data just before
fts_backend_build_more(), and its value is EF (the correct one would be E2).

I think the binary data is being corrupted somewhere before
fts_backend_build_more(), but I have no idea where.
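The two hexdumps are consistent with the data being forced to valid UTF-8: e2 e3 cf d3 is not a valid UTF-8 sequence, and ef bf bd is the UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER, which also explains why the 11th byte reads EF instead of E2. In the copied file the four bad bytes became a single U+FFFD, so the sketch below collapses each run of invalid bytes into one replacement character. That rule is inferred from the hexdump, and utf8_sanitize() is an illustration, not Dovecot's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const unsigned char replacement_char[3] = { 0xef, 0xbf, 0xbd }; /* U+FFFD */

/* Returns the length of the valid UTF-8 sequence starting at p, or 0 if the
   bytes there are not valid UTF-8. Rejects overlong forms, surrogates and
   code points above U+10FFFF. */
static size_t utf8_valid_len(const unsigned char *p, size_t left)
{
	size_t len, i;
	unsigned int cp;

	if (p[0] < 0x80)
		return 1;
	else if (p[0] >= 0xc2 && p[0] <= 0xdf) { len = 2; cp = p[0] & 0x1f; }
	else if (p[0] >= 0xe0 && p[0] <= 0xef) { len = 3; cp = p[0] & 0x0f; }
	else if (p[0] >= 0xf0 && p[0] <= 0xf4) { len = 4; cp = p[0] & 0x07; }
	else
		return 0;
	if (left < len)
		return 0;
	for (i = 1; i < len; i++) {
		if ((p[i] & 0xc0) != 0x80)
			return 0;  /* expected a continuation byte */
		cp = (cp << 6) | (p[i] & 0x3f);
	}
	if ((len == 3 && (cp < 0x800 || (cp >= 0xd800 && cp <= 0xdfff))) ||
	    (len == 4 && (cp < 0x10000 || cp > 0x10ffff)))
		return 0;
	return len;
}

/* Copies src to dst, replacing every run of invalid bytes with a single
   U+FFFD. dst must hold at least 3 * src_size bytes. Returns bytes written. */
static size_t utf8_sanitize(const unsigned char *src, size_t src_size,
			    unsigned char *dst)
{
	size_t in = 0, out = 0, n;
	int in_bad_run = 0;

	while (in < src_size) {
		n = utf8_valid_len(src + in, src_size - in);
		if (n == 0) {
			if (!in_bad_run) {
				memcpy(dst + out, replacement_char, sizeof(replacement_char));
				out += sizeof(replacement_char);
				in_bad_run = 1;
			}
			in++;
		} else {
			memcpy(dst + out, src + in, n);
			out += n;
			in += n;
			in_bad_run = 0;
		}
	}
	return out;
}
```

Fed the first 16 bytes of the original PDF, this produces exactly the bytes seen in the copied file, so binary parts need to bypass this kind of filtering entirely.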

Any help would be appreciated.

Thank you,
Rui Carneiro



Re: [Dovecot] FTS Plugin design

2009-05-18 Thread Rui Carneiro
Hi again,

I am having some trouble writing all the data to a file. When I finish writing
all the data and try to open the file, it is corrupted.

The first thing I noticed is that all characters are capitalized, which
destroys the file format.

Where are the characters capitalized?
Any other idea why the files are getting corrupted?

Thank you,
Rui Carneiro


Re: [Dovecot] FTS Plugin design

2009-05-15 Thread Rui Carneiro
Quoting Timo Sirainen:
> 1. You notice a non-text/* content-type and initialize text extraction
> for the MIME part. Like:
> 
> struct attachment_extract_context *
> attachment_extract_init(const char *content_type);
> 
> 2. After this you feed all the input belonging to that MIME part to:
> 
> int attachment_extract_add(struct attachment_extract_context *ctx,
> const struct message_block *input);
> 
> Don't output anything to FTS backend at this point. The
> attachment_extract_add() would probably just basically write to a
> temporary file.
> 
> 3. Finally you'll notice that the MIME part ends (either you get headers
> for the next MIME part or the entire message ends). Then finish the
> extraction, which actually executes the whatever conversion binaries:
> 
> int attachment_extract_finish(struct attachment_extract_context *ctx);
> 
> 4. Get the resulting text to fts_backend_build_more() somehow. Either
> some attachment_extract_add_to_fts() which internally adds it or some
> kind of an iterator that returns the text in smaller blocks. Either
> would work..
> 
> That kind of an API would also make it possible to pretty easily modify
> in future to not write temporary files for specific content types if
> it's not required.
> 
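The four steps above can be sketched with a temporary file as the spool. The signatures are simplified and hypothetical: attachment_extract_add() takes a plain data/size pair instead of a struct message_block, and attachment_extract_finish() takes a converter callback and an output FILE so that the example does not depend on any external program. A real version would pick the converter from content_type.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical converter: reads the spooled part from `in` and writes the
   extracted text to `out`. Returns 0 on success, -1 on failure. */
typedef int extract_fn(FILE *in, FILE *out);

struct attachment_extract_context {
	char *content_type;  /* a real version would choose the converter by this */
	FILE *spool;         /* temporary file holding the decoded MIME part */
};

/* Step 1: start extraction for a non-text MIME part. */
static struct attachment_extract_context *
attachment_extract_init(const char *content_type)
{
	struct attachment_extract_context *ctx = calloc(1, sizeof(*ctx));

	if (ctx == NULL)
		return NULL;
	ctx->content_type = strdup(content_type);
	ctx->spool = tmpfile();
	if (ctx->content_type == NULL || ctx->spool == NULL) {
		free(ctx->content_type);
		if (ctx->spool != NULL)
			fclose(ctx->spool);
		free(ctx);
		return NULL;
	}
	return ctx;
}

/* Step 2: append a block of the part to the spool; nothing is sent to the
   FTS backend yet. */
static int attachment_extract_add(struct attachment_extract_context *ctx,
				  const void *data, size_t size)
{
	return fwrite(data, 1, size, ctx->spool) == size ? 0 : -1;
}

/* Step 3: the part has ended, so run the converter over the whole spool.
   Frees the context. Step 4 (feeding the text to the FTS backend in small
   blocks) is left to the caller, which reads it back from `out`. */
static int attachment_extract_finish(struct attachment_extract_context *ctx,
				     extract_fn *convert, FILE *out)
{
	int ret = -1;

	if (fflush(ctx->spool) == 0 && fseek(ctx->spool, 0, SEEK_SET) == 0)
		ret = convert(ctx->spool, out);
	fclose(ctx->spool);
	free(ctx->content_type);
	free(ctx);
	return ret;
}

/* Example converter that copies its input unchanged; it only stands in
   for a real PDF or DOC text extractor. */
static int copy_converter(FILE *in, FILE *out)
{
	char buf[4096];
	size_t n;

	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n)
			return -1;
	}
	return ferror(in) ? -1 : 0;
}
```

Because the whole part is on disk before the converter runs, formats that need random access (such as PDF) can be handled, and a later version could skip the temporary file for content types that can be converted as a stream.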

I tried your approach and I think it is working pretty well. Now I only need to
look carefully at the output of the external programs and build the XML
correctly to send to Solr.
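Building the XML mostly comes down to escaping the extracted text before it goes into a Solr XML update command (`<add><doc><field name="...">`). A minimal sketch; the field names "id" and "body" are placeholders that must match the actual schema. Besides escaping &, <, > and ", it replaces control characters that XML 1.0 does not allow, since converter output can contain them (pdftotext, for instance, emits a form feed at page breaks by default).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes `len` bytes of `text` to `out` as XML character data: escapes the
   XML special characters and replaces control characters that XML 1.0
   forbids (everything below 0x20 except tab, LF and CR) with a space.
   Bytes >= 0x80 are copied as-is, so UTF-8 text passes through intact. */
static void xml_append_escaped(FILE *out, const char *text, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)text[i];

		switch (c) {
		case '&': fputs("&amp;", out); break;
		case '<': fputs("&lt;", out); break;
		case '>': fputs("&gt;", out); break;
		case '"': fputs("&quot;", out); break;
		case '\t': case '\n': case '\r':
			fputc(c, out);
			break;
		default:
			fputc(c < 0x20 ? ' ' : c, out);
			break;
		}
	}
}

/* Emits a Solr XML update command containing one document. The field
   names "id" and "body" are placeholders that must match the schema. */
static void solr_write_add(FILE *out, const char *id, const char *body)
{
	fputs("<add><doc><field name=\"id\">", out);
	xml_append_escaped(out, id, strlen(id));
	fputs("</field><field name=\"body\">", out);
	xml_append_escaped(out, body, strlen(body));
	fputs("</field></doc></add>", out);
}
```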

Thanks, Timo

Regards,
Rui Carneiro



Re: [Dovecot] FTS Plugin design

2009-05-05 Thread Rui Carneiro
Hi again,

On Wed, Apr 15, 2009 at 11:23 PM, Timo Sirainen  wrote:

>  - fts_build_mail() indexes a single mail. It parses the messages and
> returns the data in small blocks. For text/* and message/rfc822 parts
> those blocks are currently sent to FTS backend. This is where I think
> you should look into hooking your attachment parsing. Change
> fts_build_want_index_part() to look for more content-types that you're
> interested in and then before feeding the blocks to FTS backend put them
> through your own converter function, something like:
>
> int attachment_extract_text(struct attachment_extract_context *ctx,
> const struct message_block *input, struct message_block *output);


Take the example of an application/pdf content type. Before I can convert
the PDF data to text, I need to gather all of it. The current process feeds
the FTS backend with small pieces of data and appends them in the
"build_more" functions (e.g. fts_backend_solr_build_more()).

So where should I call attachment_extract_text()? In
fts_backend_solr_build_more(), without appending to cmd until the data has
been extracted? Or should I gather all the data beforehand (e.g. in
fts_build_mail()) and send it all at once to the FTS backend?

I hope I've made myself clear.

Regards,
Rui Carneiro


Re: [Dovecot] fts-solr plugin issue (Marked invalid)

2009-05-04 Thread Rui Carneiro
Quoting Nikolai Derzhak:

> But in sum: when dovecot try to index some mail's, that solr tokenizer not
> eat (error 500, Marked invalid),
> dovecot stop indexing of box and retry attempts in each next search with same
> result.

I think you could have a look at solr-connection.c, at the functions that
communicate with Solr, and change the error handling there.

I think this missing feature will be useful to me soon.
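The behaviour being asked for can be sketched as an indexing loop that treats a per-message rejection (such as Solr's "500" for a message its tokenizer cannot handle) differently from a backend outage: the rejected message is logged and skipped, and the "last indexed" UID still moves past it, so the next search does not retry it forever. Everything here (enum index_result, index_messages(), fake_backend()) is hypothetical and is not the code in solr-connection.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical outcome of indexing one message. */
enum index_result {
	INDEX_OK,
	INDEX_REJECTED,  /* e.g. the backend answered HTTP 500 for this mail */
	INDEX_FATAL      /* e.g. the backend is unreachable: retry later */
};

typedef enum index_result index_fn(unsigned int uid, void *context);

/* Indexes uids[0..count-1] and returns the last UID that no longer needs
   indexing. A rejected message is logged and skipped, so the returned UID
   still moves past it; only a fatal error stops early. */
static unsigned int index_messages(const unsigned int *uids, unsigned int count,
				   index_fn *index_one, void *context,
				   unsigned int last_uid)
{
	unsigned int i;

	for (i = 0; i < count; i++) {
		switch (index_one(uids[i], context)) {
		case INDEX_OK:
			break;
		case INDEX_REJECTED:
			fprintf(stderr, "fts: skipping UID %u, rejected by backend\n",
				uids[i]);
			break;
		case INDEX_FATAL:
			return last_uid;
		}
		last_uid = uids[i];
	}
	return last_uid;
}

/* Example backend for testing: rejects UID 3 and fails hard on UID 5. */
static enum index_result fake_backend(unsigned int uid, void *context)
{
	unsigned int *calls = context;

	(*calls)++;
	if (uid == 3)
		return INDEX_REJECTED;
	if (uid == 5)
		return INDEX_FATAL;
	return INDEX_OK;
}
```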

Rui Carneiro

PS: Sorry, but my knowledge of this is limited at the moment.


Re: [Dovecot] fts-solr plugin issue (Marked invalid)

2009-05-04 Thread Rui Carneiro
I am not sure I understood your problem correctly.

Are you trying to index attachments from messages? Or is Dovecot indexing some
"bad" parts and you just do not know why?

Regards,
Rui Carneiro

Quoting Nikolai Derzhak:

> OK. Concentrating problem in one question.
> How to ignore "bad" message and index next one in indexing procedure (fts
> plugin) ?.
> Now, one "error 500" from solr and dovecot (# 1.1.11:
> /etc/dovecot/dovecot.conf
> # OS: Linux 2.6.21.7-2.fc8xen i686 Ubuntu 8.04.2 ext3
> ) stop and each next search query repeat the story.
> I've explored fts, and ftp-solr directories in src, without success for now.
> Timo, you understand code much bettter, can you help me and point to place in
> code,
> or probably create some patch, if possible ?.


Re: [Dovecot] FTS Plugin design

2009-04-23 Thread rui . carneiro
On Thu, Apr 23, 2009 at 5:47 AM,  wrote:

> Note that some formats might require to seek to some point in the file [1]
> (typically the end), so reading from stdin is awkward (it would require
> stdin to be seekable, so either the app or the caller would have to put
> the whole file somewhere anyway).
>
> [1] Notably PDF has some index tables at EOF - 1k if I remember
> correctly.

I hadn't thought about that before, but I think you are right. The only question
here is whether to write the data to memory or to disk.
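The point above about PDF can be made concrete: a reader has to look at the end of the file first, which works on a temporary file but not on a pipe, where fseek() fails. The sketch below, using a hypothetical find_startxref() helper, seeks to the last 1024 bytes of a seekable stream and looks for the startxref keyword that precedes the cross-reference offset in a PDF trailer. A growing memory buffer would be just as seekable; the memory-or-disk choice mostly depends on how large attachments are allowed to be.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Looks for the "startxref" keyword in the last `window` bytes (at most
   1024) of a seekable stream, the part of a PDF where the trailer that
   points to the cross-reference table lives. Returns the keyword's offset
   from the start of the file, or -1 if it is not found or the stream
   cannot seek (for example a pipe on stdin). */
static long find_startxref(FILE *f, long window)
{
	static const char needle[] = "startxref";
	const size_t needle_len = sizeof(needle) - 1;
	char buf[1024];
	long size, start;
	size_t n, i;

	if (window > (long)sizeof(buf))
		window = (long)sizeof(buf);
	if (fseek(f, 0, SEEK_END) != 0 || (size = ftell(f)) < 0)
		return -1;
	start = size > window ? size - window : 0;
	if (fseek(f, start, SEEK_SET) != 0)
		return -1;
	n = fread(buf, 1, sizeof(buf), f);

	/* Scan backwards so that the last occurrence, i.e. the newest trailer
	   in an incrementally updated PDF, is the one returned. */
	for (i = n >= needle_len ? n - needle_len + 1 : 0; i > 0; i--) {
		if (memcmp(buf + i - 1, needle, needle_len) == 0)
			return start + (long)(i - 1);
	}
	return -1;
}
```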

Thank you all,
Rui Carneiro



Re: [Dovecot] FTS Plugin design

2009-04-22 Thread Rui Carneiro
On Wed, Apr 22, 2009 at 5:38 PM, Timo Sirainen  wrote:

> Maybe those programs could be changed and just require the newer
> versions?..


I will talk with the developers of those applications about the possibility
of supporting stdin input (if not supported yet).

> I think the API that fts plugin uses to do the conversion should be
> generic enough that both approaches would work. Then it would be easier
> to implement one or another or both eventually.


I think I will try the external applications approach, since I don't have much
development time available.
I will make the API as generic as I can to allow for improvements
in the future.

Regards,
Rui Carneiro


Re: [Dovecot] FTS Plugin design

2009-04-22 Thread Rui Carneiro
Hi,

Almost all the full-text search engines (C/C++) I looked at (Swish-E, Wumpus,
Lemur and Xapian) do not use any kind of library or parser. Instead, they use
other applications like pdftotext, catdoc, catppt (etc.) and call them with
execvp (or equivalent). Using this approach in my project has some pros and cons:

Pros:
- The existing libraries to extract the content of pdf, doc (etc) are not
very stable.
- Easier to handle errors (even if those applications crash, Dovecot keeps
running)
- Less developing time

Cons:
- Some programs that parse special formats (e.g. catppt and pdftotext) do not
accept input from stdin (we need to create temporary files).
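The external-program approach, including the point that a crashing converter cannot take Dovecot down with it, can be sketched with pipe(), fork() and execvp(). This is a generic POSIX sketch, not code taken from Swish-E, Xapian or Dovecot, and run_converter() is a hypothetical helper. The converter's stdout is drained completely so that it never blocks on a full pipe, but only the first out_size-1 bytes are kept.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (looked up in PATH by execvp) and captures up to
   out_size - 1 bytes of its stdout into `out`, NUL-terminated. Returns the
   number of bytes kept, or -1 if the converter could not be started, exited
   with a non-zero status, or was killed by a signal (e.g. it crashed). In
   every case the calling process itself keeps running. */
static ssize_t run_converter(char *const argv[], char *out, size_t out_size)
{
	char chunk[4096];
	int fds[2], status;
	pid_t pid;
	size_t used = 0;
	ssize_t n;

	if (out_size == 0 || pipe(fds) < 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: route stdout into the pipe and become the converter. */
		dup2(fds[1], STDOUT_FILENO);
		close(fds[0]);
		close(fds[1]);
		execvp(argv[0], argv);
		_exit(127);  /* only reached if exec failed */
	}

	/* Parent: drain the whole pipe so the child never blocks on a full
	   pipe, but keep only what fits in `out`. */
	close(fds[1]);
	while ((n = read(fds[0], chunk, sizeof(chunk))) > 0) {
		size_t room = out_size - 1 - used;
		size_t take = (size_t)n < room ? (size_t)n : room;

		memcpy(out + used, chunk, take);
		used += take;
	}
	close(fds[0]);
	out[used] = '\0';

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;  /* failed, or killed by a signal */
	return (ssize_t)used;
}
```

With a temporary file, the argument vector could be something like { "pdftotext", path, "-", NULL }, since pdftotext writes to stdout when its output file name is "-".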

Which approach would be better: using applications like pdftotext and catdoc,
or using their libraries and doing it almost from scratch?

Regards
Rui Carneiro



-- 
mobile: +351 963446125
mail: rui@gmail.com
mail: ei04...@fe.up.pt
website: http://paginas.fe.up.pt/~ei04073


Re: [Dovecot] FTS Plugin design

2009-04-21 Thread Rui Carneiro
Great idea!

I will report back soon.

On Tue, Apr 21, 2009 at 5:32 PM, Timo Sirainen  wrote:

> I've no idea, but you could at least look at some of the other full text
> search engines. I remember them advertising indexing support for all kinds
> of formats. Maybe they're using some specific library or maybe it would be
> easy to extract their parsing code.
>


Re: [Dovecot] FTS Plugin design

2009-04-21 Thread Rui Carneiro
Hi again,

Does anyone know of some good libraries to handle the content of files like
PDF, PPT, DOC, etc.? I am already indexing attachments; all I need now is to
extract their text.

Regards,
Rui Carneiro






Re: [Dovecot] FTS Plugin design

2009-04-20 Thread Rui Carneiro
Hi,

The problem was the flag. My hex-to-binary conversion was wrong.

Regards,
Rui Carneiro







Re: [Dovecot] FTS Plugin design

2009-04-17 Thread Rui Carneiro
Thank you for all the tips. The design looks clearer to me now.

I have one more question. I looked into fts_build_want_index_part() and I
saw that I need to add some flags to message_part_flags. What values should
I choose? My first approach was to follow your schema and set
MESSAGE_PART_FLAG_ATTACHMENT = 0x16. Is there any problem with this?

I had already changed parse_content_type() to set ctx->part->flags correctly,
but if I use my custom flag, Dovecot assumes that all attachment lines are
headers. I also tried setting those ctx->part->flags to TEXT, and the
fts_backend was fed correctly with all attachment lines.

I don't know if this is related to the value of
MESSAGE_PART_FLAG_ATTACHMENT or if I am missing something (like setting
block.hdr = NULL or some more code to handle new flags).
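0x16 is 10110 in binary, so used as a flag it sets three bits at once (0x02, 0x04 and 0x10) and overlaps every existing flag that uses one of those bits. A new flag needs a value with exactly one bit set that no other flag already uses. The enum values below are hypothetical stand-ins, not Dovecot's real message_part_flags; the next free bit has to be taken from the actual enum.

```c
#include <assert.h>

/* Hypothetical flag layout for illustration only; these are NOT the real
   values of Dovecot's enum message_part_flags. */
enum part_flags {
	PART_FLAG_MULTIPART  = 0x01,
	PART_FLAG_DIGEST     = 0x02,
	PART_FLAG_RFC822     = 0x04,
	PART_FLAG_TEXT       = 0x08,
	PART_FLAG_HAS_NULS   = 0x10,
	PART_FLAG_ATTACHMENT = 0x20  /* next unused single bit, unlike 0x16 */
};

/* True if exactly one bit is set, i.e. the value works as an
   independent flag. */
static int is_single_bit(unsigned int value)
{
	return value != 0 && (value & (value - 1)) == 0;
}

/* Counts the set bits (Kernighan's method: clear the lowest one each time). */
static int count_bits(unsigned int value)
{
	int count = 0;

	for (; value != 0; value &= value - 1)
		count++;
	return count;
}
```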

Thank you,
Rui Carneiro

On Wed, Apr 15, 2009 at 11:23 PM, Timo Sirainen  wrote:

> On Mon, 2009-04-13 at 11:18 +0100, Rui Carneiro wrote:
> > I don't understand yet what the plugin's design is and how the plugins are
> > called from the core system, and I was wondering if anyone could help me
> > with that.
>
> fts-storage.c hooks into all the functions in mail-storage API that it
> needs to. Currently indexing isn't done while messages are being saved,
> but instead just before searching. The searching functions are:
>
>  - fts_mailbox_search_init() tries to figure out if FTS can optimize the
> search. If it does, it tries to figure out if FTS index is up-to-date
> and if not, starts the search.
>
>  - fts_mailbox_search_next_nonblock() continues the indexing (or
> searching after indexing) for a while. The idea is that IMAP connection
> is able to process other commands while doing a long-running search. So
> fts plugin indexes FTS_SEARCH_NONBLOCK_COUNT (50) messages at a time. It
> would be nice if that value was dynamically calculated and also based on
> bytes instead of messages, but that's maybe too much trouble.
>
>  - fts_mailbox_search_next_update_seq() uses the fts search results and
> updates mail-storage's search stuff so that it doesn't go through
> messages that don't match.
>
>  - fts_build_mail() indexes a single mail. It parses the messages and
> returns the data in small blocks. For text/* and message/rfc822 parts
> those blocks are currently sent to FTS backend. This is where I think
> you should look into hooking your attachment parsing. Change
> fts_build_want_index_part() to look for more content-types that you're
> interested in and then before feeding the blocks to FTS backend put them
> through your own converter function, something like:
>
> int attachment_extract_text(struct attachment_extract_context *ctx,
> const struct message_block *input, struct message_block *output);
>
>
>
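The batching described for fts_mailbox_search_next_nonblock() can be sketched as a resumable loop that does a fixed amount of work per call and reports whether more remains. Only FTS_SEARCH_NONBLOCK_COUNT and its value of 50 come from the description above; struct index_job, index_one_mail() and index_job_continue() are hypothetical. A byte-based budget, as suggested, would replace the message counter with a size counter.

```c
#include <assert.h>
#include <stdbool.h>

#define FTS_SEARCH_NONBLOCK_COUNT 50

/* Hypothetical resumable job: messages [next, total) still need indexing. */
struct index_job {
	unsigned int next;
	unsigned int total;
	unsigned int indexed;
};

/* Stand-in for the real per-mail work (fts_build_mail() in the plugin). */
static void index_one_mail(struct index_job *job, unsigned int seq)
{
	(void)seq;
	job->indexed++;
}

/* Indexes at most FTS_SEARCH_NONBLOCK_COUNT messages per call and returns
   true while work remains, so the caller can serve other IMAP commands
   between calls. */
static bool index_job_continue(struct index_job *job)
{
	unsigned int done;

	for (done = 0; done < FTS_SEARCH_NONBLOCK_COUNT &&
		       job->next < job->total; done++)
		index_one_mail(job, job->next++);
	return job->next < job->total;
}
```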


-- 
mobile: +351 963446125
mail: rui@gmail.com
mail: ei04...@fe.up.pt
website: http://paginas.fe.up.pt/~ei04073


[Dovecot] FTS Plugin design

2009-04-13 Thread Rui Carneiro
Hi all,

I am currently developing some changes to the Solr plugin. I want it to also
index the content of attachments. I have already started looking at the
plugin's source, but I am having some trouble understanding how it works.

I don't yet understand the plugin's design or how plugins are called from
the core system, and I was wondering if anyone could help me with that.

Sorry if these questions sound stupid, but I am new to Dovecot.

Regards,
Rui Carneiro


Re: [Dovecot] auth-master: Permission denied [sigh]

2009-04-12 Thread Rui Carneiro
Hi,

I was also having permission problems with auth-master. I solved them by
manually creating the directory /var/run/dovecot with the correct permissions,
but I see you already did that :\

On Sun, Apr 12, 2009 at 5:27 PM, James Butler wrote:

> I've been messing with this for too long, now, and I'm blind to whatever's
> wrong. Or I'm simply being dense. Either way, I need help with a common
> issue.
>
> I'm trying to get Postfix+Spamassassin+Dovecot going on Fedora 10. (I'll
> get back to the global Sieve thingy soon, but I need to get this going,
> first.)
>
> When using the simple:
>  mailbox_command = /usr/local/libexec/dovecot/deliver
> everything is cool, except there's no Spamassassin involvement, obviously.
>
> The problem shows itself when the Spamassassin user hands off to the
> recipient user and Deliver + the recipient user tries to access
> /var/run/dovecot/auth-master.
>
> Thank you for any insight you can provide.
>
> /var/run/dovecot: 755 root:dovecot
> /var/run/dovecot/login: 750 root:dovecot
> /var/run/dovecot/auth-master: 750 root:dovecot
> (I think. auth-master is a temporary file? Comes and goes.)
>
> From /etc/postfix/main.cf:
>
> mailbox_transport = spamassassin
>
> From /etc/postfix/master.cf:
>
> spamassassin unix - n n - - pipe
>  user=spam argv=/usr/bin/spamc -f -e /usr/libexec/dovecot/deliver
>  -f ${sender} -d ${user} -m ${extension}
>
> Here's my 'socket listen' section from /usr/local/etc/dovecot.conf:
>
> socket listen {
>   master {
>     path = /var/run/dovecot/auth-master
>     mode = 0666
>     #user =
>     group = dovecot
>   }
>   client {
>     path = /var/run/dovecot/auth-client
>     mode = 0666
>     #user =
>     group = dovecot
>   }
> }
>
> From /var/log/maillog:
>
> Postfix receives the message:
>
> postfix/smtpd[29447]: connect from \
>  IP-ADD-RE-SS.ptr.example-send.com[IP.ADD.RE.SS]
> postfix/smtpd[29447]: 60990FA01BA: \
>  client=IP-ADD-RE-SS.ptr.example-send.com[IP.ADD.RE.SS]
> postfix/cleanup[29451]: 60990FA01BA: \
>  message-id=<49e20bf2.4090...@example-send.com>
> postfix/qmgr[29441]: 60990FA01BA: from=, \
>  size=812, nrcpt=1 (queue active)
> postfix/smtpd[29447]: disconnect from \
>  IP-ADD-RE-SS.ptr.example-send.com[IP.ADD.RE.SS]
>
> Spamassassin processes the message as user 'spam':
>
> spamd[4121]: spamd: processing message\
>  <49e20bf2.4090...@example-send.com> for spam:653
> spamd[4121]: spamd: clean message (3.0/5.0) for spam:653 in 5.2 seconds,\
>  793 bytes.
> spamd[4121]: spamd: result: . 2 - RDNS_DYNAMIC,TVD_SPACE_RATIO \
>  scantime=5.2,size=793,user=spam,uid=653,required_score=5.0, \
>  rhost=localhost.localdomain,raddr=127.0.0.1,rport=42493, \
>  mid=<49e20bf2.4090...@example-send.com>,autolearn=no
>
> Spamassassin pipes result to Deliver which runs as recipient user.
>
> Deliver as recipient user doesn't have permission to auth:
>
> deliver(recipient): Can't connect to auth server at \
>  /var/run/dovecot/auth-master: Permission denied
> postfix/pipe[29452]: 60990FA01BA: to=, \
>  relay=spamassassin, delay=6, delays=0.33/0.01/0/5.7, dsn=4.3.0, \
>  status=deferred (temporary failure)
>
> 1) I must use the 'user=' arg for spamc
> 2) Can't use 'user=${user}' or $user:
>   fatal: get_service_attr: unknown username: ${user}
> 3) Must use '-d ${user}' Deliver arg, otherwise
>   message gets delivered to user 'spam'
>
> AArrgh! TIA.
>
>
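One way to narrow down a "Permission denied" like this is to try connecting to the socket as the recipient user outside of deliver. The helper below is a hypothetical diagnostic, not part of Dovecot. On Linux, connecting to a UNIX stream socket requires write permission on the socket file itself and search (x) permission on every directory leading to it, so EACCES points at one of those.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Try to connect to a UNIX socket as the current user. Returns 0 on success
   or the errno value on failure (e.g. EACCES for "Permission denied",
   ENOENT if the socket does not exist). */
static int probe_unix_socket(const char *path)
{
    struct sockaddr_un addr;
    int fd, err = 0;

    if (strlen(path) >= sizeof(addr.sun_path))
        return ENAMETOOLONG;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path); /* length checked above */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        err = errno; /* saved before close() can overwrite errno */
    close(fd);
    return err;
}
```

Compiled into a tiny program and run as the recipient user (for example via su), EACCES versus ENOENT tells you whether the socket exists but is unreachable with that user's permissions, or is missing altogether at the moment deliver runs.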



Re: [Dovecot] Compile and configure Solr plugin

2009-04-07 Thread Rui Carneiro
Hi all,

I have found the problem. I was compiling and installing Dovecot while an older
version from the Ubuntu repositories was still installed (I didn't know about it).
I removed the packaged version and now it works just fine.

Sorry about the spam :P

Regards,
Rui Carneiro

On Tue, Apr 7, 2009 at 3:10 PM, Rui Carneiro  wrote:

> Hi all,
>
> I'm having some trouble with the Solr integration. I configured
> Dovecot with the Solr option (--with-solr) and everything proceeded just
> fine. But when I started Dovecot I got this error:
>
> Plugin fts_solr not found from directory /usr/lib64/dovecot/modules/imap
> Error: imap dump-capability process returned 89
> Fatal: Invalid configuration in /etc/dovecot/dovecot.conf
>
> In fact, there is no Solr plugin in that directory. So I tried to make a
> symbolic link from where the compiled Solr plugin was to this directory:
> ln -s /usr/local/lib/dovecot/lib21_fts_solr_plugin.so
>
> And now I have the following error:
>
> dlopen(/usr/lib64/dovecot/modules/imap/lib21_fts_solr_plugin.so) failed:
> /usr/lib64/dovecot/modules/imap/lib21_fts_solr_plugin.so: undefined symbol:
> mailbox_get_virtual_box_patterns
> Couldn't load required plugins
> Error: imap dump-capability process returned 89
> Fatal: Invalid configuration in /etc/dovecot/dovecot.conf
>
> I'm surely missing some step in this installation; does anyone have any
> clue?
>
> Thanks,
> Rui Carneiro
>



[Dovecot] Compile and configure Solr plugin

2009-04-07 Thread Rui Carneiro
Hi all,

I'm having some trouble with the Solr integration. I configured Dovecot
with the Solr option (--with-solr) and everything proceeded just fine.
But when I started Dovecot I got this error:

Plugin fts_solr not found from directory /usr/lib64/dovecot/modules/imap
Error: imap dump-capability process returned 89
Fatal: Invalid configuration in /etc/dovecot/dovecot.conf

In fact, there is no Solr plugin in that directory. So I tried to make a
symbolic link from where the compiled Solr plugin was to this directory:
ln -s /usr/local/lib/dovecot/lib21_fts_solr_plugin.so

And now I have the following error:

dlopen(/usr/lib64/dovecot/modules/imap/lib21_fts_solr_plugin.so) failed:
/usr/lib64/dovecot/modules/imap/lib21_fts_solr_plugin.so: undefined symbol:
mailbox_get_virtual_box_patterns
Couldn't load required plugins
Error: imap dump-capability process returned 89
Fatal: Invalid configuration in /etc/dovecot/dovecot.conf

I'm surely missing some step in this installation; does anyone have any
clue?

Thanks,
Rui Carneiro


Re: [Dovecot] Solr's index update

2009-03-31 Thread Rui Carneiro
On Tue, Mar 31, 2009 at 7:28 PM, Timo Sirainen  wrote:

> On Tue, 2009-03-31 at 19:21 +0100, Rui Carneiro wrote:
> > Another question. I read this on the TODO list:
> >
> > fts-solr: handle DELETE, RENAME
> >
> > I am interested in looking deeper into this. Any advice on where to start?
>
> Since Solr data can't be modified, both of these have to be handled the
> same way: Just deleting the data from Solr indexes. You'll probably have
> to do this like:
>
> 1. Hook into mailbox_list.delete_mailbox() in the fts plugin (similar to
> how e.g. the quota plugin does it in quota_mailbox_list_delete()).
>
> 2. Add a new delete_mailbox() function to struct fts_backend_vfuncs and
> have your delete_mailbox() call that before calling
> super.delete_mailbox().
>
> 3. Hook into the delete_mailbox() in fts-solr and have it execute a
> query that deletes everything from the given mailbox.
>
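The three steps can be illustrated with a small self-contained sketch. Everything here is hypothetical: the simplified struct mailbox_list, core_delete_mailbox(), and the fts_sketch_* functions only mimic the "save super, override, call super" pattern and are not Dovecot's real API. The Solr field names "user" and "box" are assumptions about the schema; the `<delete><query>...</query></delete>` body is Solr's XML delete-by-query syntax, which would be POSTed to the /update handler and followed by a commit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a mailbox list with a vfunc. */
struct mailbox_list;
typedef int delete_mailbox_fn(struct mailbox_list *list, const char *name);

struct mailbox_list {
    const char *user;
    delete_mailbox_fn *delete_mailbox;       /* vfunc called by the core */
    delete_mailbox_fn *super_delete_mailbox; /* original, saved by the plugin */
    char last_solr_request[512];             /* stand-in for an HTTP POST */
};

/* Stand-in for the core implementation that really deletes the mailbox. */
static int core_delete_mailbox(struct mailbox_list *list, const char *name)
{
    (void)list;
    (void)name;
    return 0;
}

static int append_raw(char *buf, size_t size, size_t *len, const char *s)
{
    size_t n = strlen(s);
    if (*len + n >= size)
        return -1;
    memcpy(buf + *len, s, n + 1); /* includes the NUL terminator */
    *len += n;
    return 0;
}

/* Escape for a quoted Lucene phrase (\ and ") and for XML text (&, <, >). */
static int append_escaped(char *buf, size_t size, size_t *len, const char *src)
{
    for (; *src != '\0'; src++) {
        char tmp[3] = { 0 };
        const char *out = tmp;

        switch (*src) {
        case '&': out = "&amp;"; break;
        case '<': out = "&lt;"; break;
        case '>': out = "&gt;"; break;
        case '"':
        case '\\': tmp[0] = '\\'; tmp[1] = *src; break;
        default: tmp[0] = *src; break;
        }
        if (append_raw(buf, size, len, out) < 0)
            return -1;
    }
    return 0;
}

/* Step 3: build a delete-by-query body matching all of one mailbox's docs. */
static int solr_build_delete_query(char *buf, size_t size,
                                   const char *user, const char *box)
{
    size_t len = 0;

    if (size == 0)
        return -1;
    buf[0] = '\0';
    return append_raw(buf, size, &len, "<delete><query>user:\"") < 0 ||
           append_escaped(buf, size, &len, user) < 0 ||
           append_raw(buf, size, &len, "\" AND box:\"") < 0 ||
           append_escaped(buf, size, &len, box) < 0 ||
           append_raw(buf, size, &len, "\"</query></delete>") < 0 ? -1 : 0;
}

/* Step 2: remove the mailbox's documents from Solr, then call super. */
static int fts_sketch_delete_mailbox(struct mailbox_list *list, const char *name)
{
    if (solr_build_delete_query(list->last_solr_request,
                                sizeof(list->last_solr_request),
                                list->user, name) < 0)
        return -1;
    return list->super_delete_mailbox(list, name);
}

/* Step 1: save the original vfunc and install the override. */
static void fts_sketch_hook_list(struct mailbox_list *list)
{
    list->super_delete_mailbox = list->delete_mailbox;
    list->delete_mailbox = fts_sketch_delete_mailbox;
}
```

Because RENAME is handled the same way, a rename hook would build the same query for the old mailbox name before calling its own super function.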

OK, I will take a look soon.

Thank you for your help ;)


Re: [Dovecot] Solr's index update

2009-03-31 Thread Rui Carneiro
On Tue, Mar 31, 2009 at 5:10 PM, Timo Sirainen  wrote:

> On Mar 31, 2009, at 11:25 AM, Rui Carneiro wrote:
>
>  Hi all,
>>
>> The wiki says this: "Currently the indexes are updated only while
>> searching" @ http://wiki.dovecot.org/Plugins/FTS
>>
>> Does this also apply to Solr indexes?
>>
>
> Yes.
>
>  If not, when are Solr indexes updated?
>>
>
> If you want them to be updated more often, you can issue SEARCH commands in
> a cronjob or something.
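Issuing SEARCH commands from a cron job could be scripted in many ways; the C helper below is only a hypothetical sketch that formats the command sequence for one mailbox. It assumes the connection is already authenticated, that mailbox names are plain ASCII (non-ASCII names would need IMAP's modified UTF-7), and that a TEXT search is one of the search keys the FTS plugin handles, so that it brings the index up to date before searching.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the IMAP commands a cron job could send,
   after logging in, so that the FTS plugin updates one mailbox's index.
   Returns the number of bytes written, or -1 on error/truncation. */
static int build_fts_refresh_cmds(char *buf, size_t size, const char *mailbox)
{
    int n;

    /* Keep the sketch simple: refuse names that would need escaping inside
       an IMAP quoted string, and line breaks. */
    if (strpbrk(mailbox, "\"\\\r\n") != NULL)
        return -1;
    n = snprintf(buf, size,
                 "a1 SELECT \"%s\"\r\n"
                 "a2 SEARCH TEXT \"dummy\"\r\n"
                 "a3 LOGOUT\r\n", mailbox);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

The search term itself does not matter; the point is only to make the plugin run its indexing step for that mailbox, so the cron job would repeat this for each mailbox it wants kept current.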


I will take your advice :)

Another question. I read this on the TODO list:

fts-solr: handle DELETE, RENAME

I am interested in looking deeper into this. Any advice on where to start?


[Dovecot] Solr's index update

2009-03-31 Thread Rui Carneiro
Hi all,

The wiki says this: "Currently the indexes are updated only while
searching" @ http://wiki.dovecot.org/Plugins/FTS

Does this also apply to Solr indexes? If not, when are Solr indexes updated?

Thank you,
Rui Carneiro
-- 
mail: rui@gmail.com, rui.carne...@portugalmail.net
website: http://paginas.fe.up.pt/~ei04073