[notmuch] automatically assigning tags to new messages?

2009-12-18 Thread Alex Ghitza

Dear notmuch crowd,

I heard about notmuch mail a few days ago and I started playing with
it.  So far, it makes me very happy, but there are some things that I
need to learn how to do.  I'll start with the most important one:
tagging incoming messages automatically.

What is the recommended way of achieving this?  There is a variety of
things that I would like to automatically do to incoming mail:
- if it comes from a mailing list (e.g. notmuch), tag it +notmuch and
  +unread, but not +inbox
- if it's one of the numerous pointless weekly newsletters that I'm
  getting from my university, tag it +unimelb, but not +unread or +inbox 
- if it is coming from me, tag it +sent, but not +unread or +inbox

There's probably more, but this is a good start.  Any advice?

[My current email setup is as follows: get messages from the Gmail
account using fetchmail; run procmail with a bunch of recipes that put
the messages into various maildirs; read the results with mutt.  (Oh,
and send mail with msmtp from mutt.)]


Best,
Alex


-- 
Alex Ghitza -- Lecturer in Mathematics -- The University of Melbourne
-- Australia -- http://www.ms.unimelb.edu.au/~aghitza/


[notmuch] Missing messages breaking threads

2009-12-18 Thread James Westby
On Fri, 18 Dec 2009 12:52:58 -0800, Carl Worth  wrote:
> On Fri, 18 Dec 2009 19:53:13 +, James Westby <jw+debian at jameswestby.net> wrote:
> Oh, I was assuming you wouldn't index any text. The UI can add "missing
> message" for a document with no filename, for example.

Works for me.

> > So, to summarise, I should first look at storing filesizes, then
> > the collision code to make it index further when the filesize grows,
> > and then finally the code to add documents for missing messages?
> 
> Some of the code areas to be touched will be changing soon, (at least as
> far as when filenames appear and disappear). Hopefully I'll have
> something posted for that sooner rather than later to avoid having to
> redo too much work.

That would be great. I'm learning all the code anyway, so there's not
a whole lot of knowledge being thrown away.

I've just sent an initial cut at the first step.

> > The only thing I am unclear on is how to handle existing databases?
> > Do we have any concept of versioning? Or should I just assume that
> > filesize: may not be in the document and act appropriately?
> 
> My current, outstanding patch is going to be the first trigger for a
> "flag day" where we'll all need to rewrite our databases.
> 
> We don't have any concept of versioning yet, but it would obviously be
> easy to have a new version document with an increasing integer.
> 
> But even with my current patch I'm considering doing a graceful upgrade
> of the database in-place rather than making the user do something like a
> dump, delete, rebuild, restore. That would give a much better experience
> than "Your database is out-of-date, please rebuild it", so we'll see if
> I pursue that in the end.

That sounds nice, I'd certainly prefer this sort of thing as it evolves.

Thanks,

James


[notmuch] [PATCH] Store the size of the file for each message

2009-12-18 Thread Carl Worth
On Sat, 19 Dec 2009 01:35:46 +, James Westby <jw+debian at jameswestby.net> 
wrote:
> On Fri, 18 Dec 2009 16:57:16 -0800, Carl Worth  wrote:
> > You can, actually. Just set the NOTMUCH_CONFIG environment variable to
> > your alternate configuration file. (And yes, we're missing any mention
> > of this in our documentation.)
> 
> Sweet. Where would be the best place to document it? Just in the
> man page?

Currently we're replicating all of our documentation both in the man
page and in the output from "notmuch help". It's annoying to have to
add everything in two places, but I don't have a good idea for making
that sharable yet.

Anyone have a solution here?

-Carl


[notmuch] [PATCH] Store the size of the file for each message

2009-12-18 Thread James Westby
When indexing a message store the filesize along with it so that
when we store all the filenames for a message-id we can know if
any of them have different content cheaply.

The value stored is defined to be the largest filesize of any
of the files for that message.

This changes the API for efficiency reasons. The size is often
known to the caller, and so we save a second stat by asking them
to provide it. If they don't know it they can pass -1 and the
stat will be done for them.

We store the filesize such that we can query a range. Thus it
would be possible to query "filesize:0..100" if you somehow
knew the raw message was less than 100 bytes.
---

  Here's the first part, storing the filesize. I'm using
  add_value so that we can make it sortable, is that valid
  for retrieving it as well?

  The only thing I'm not sure about is if it works. Is there
  a way to inspect a document to see the values that are
  stored? Doing a search isn't working, so I imagine I made
  a mistake.

  Thanks,

  James

 lib/database.cc   |   17 +
 lib/message.cc|   25 +
 lib/notmuch-private.h |8 +++-
 lib/notmuch.h |5 +
 notmuch-new.c |2 +-
 5 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index b6c4d07..0ec77cd 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -454,6 +454,17 @@ notmuch_database_create (const char *path)
 return notmuch;
 }

+struct FilesizeValueRangeProcessor : public Xapian::ValueRangeProcessor {
+FilesizeValueRangeProcessor() {}
+
+Xapian::valueno operator()(std::string &begin, std::string &) {
+if (begin.substr(0, 9) != "filesize:")
+return Xapian::BAD_VALUENO;
+begin.erase(0, 9);
+return NOTMUCH_VALUE_FILESIZE;
+}
+};
+
 notmuch_database_t *
 notmuch_database_open (const char *path,
   notmuch_database_mode_t mode)
@@ -463,6 +474,7 @@ notmuch_database_open (const char *path,
 struct stat st;
 int err;
 unsigned int i;
+FilesizeValueRangeProcessor filesize_proc;

 if (asprintf (&notmuch_path, "%s/%s", path, ".notmuch") == -1) {
notmuch_path = NULL;
@@ -508,6 +520,7 @@ notmuch_database_open (const char *path,
notmuch->query_parser->set_stemmer (Xapian::Stem ("english"));
notmuch->query_parser->set_stemming_strategy (Xapian::QueryParser::STEM_SOME);
notmuch->query_parser->add_valuerangeprocessor (notmuch->value_range_processor);
+   notmuch->query_parser->add_valuerangeprocessor (&filesize_proc);

for (i = 0; i < ARRAY_SIZE (BOOLEAN_PREFIX_EXTERNAL); i++) {
prefix_t *prefix = &BOOLEAN_PREFIX_EXTERNAL[i];
@@ -889,6 +902,7 @@ _notmuch_database_link_message (notmuch_database_t *notmuch,
 notmuch_status_t
 notmuch_database_add_message (notmuch_database_t *notmuch,
  const char *filename,
+ const off_t size,
  notmuch_message_t **message_ret)
 {
 notmuch_message_file_t *message_file;
@@ -992,6 +1006,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
_notmuch_message_set_filename (message, filename);
_notmuch_message_add_term (message, "type", "mail");
+   ret = _notmuch_message_set_filesize (message, filename, size);
+   if (ret)
+   goto DONE;
} else {
ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
goto DONE;
diff --git a/lib/message.cc b/lib/message.cc
index 49519f1..2bfc5ed 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -426,6 +426,31 @@ _notmuch_message_set_filename (notmuch_message_t *message,
 message->doc.set_data (s);
 }

+notmuch_status_t
+_notmuch_message_set_filesize (notmuch_message_t *message,
+  const char *filename,
+  const off_t size)
+{
+struct stat st;
+off_t realsize = size;
+notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
+
+if (realsize < 0) {
+   if (stat (filename, &st)) {
+   ret = NOTMUCH_STATUS_FILE_ERROR;
+   goto DONE;
+   } else {
+   realsize = st.st_size;
+   }
+}
+
+message->doc.add_value (NOTMUCH_VALUE_FILESIZE,
+Xapian::sortable_serialise (realsize));
+
+  DONE:
+return ret;
+}
+
 const char *
 notmuch_message_get_filename (notmuch_message_t *message)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 116f63d..1ba3055 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -100,7 +100,8 @@ _internal_error (const char *format, ...) PRINTF_ATTRIBUTE (1, 2);

 typedef enum {
 NOTMUCH_VALUE_TIMESTAMP = 0,
-NOTMUCH_VALUE_MESSAGE_ID
+NOTMUCH_VALUE_MESSAGE_ID,
+NOTMUCH_VALUE_FILESIZE
 } notmuch_value_t;

 /* Xapian (with flint backend) complains if we provide a term longer
@@ -193,6 

[notmuch] wish: more informative citations

2009-12-18 Thread David Bremner

Wouldn't it be nice if citations showed the first line or so of the text
being cited?

Stealing text from another thread, 

 > In case of a citation following immediately new contents. When the citation
 > was collapsed:
 > 
 > [1-line citation. Click/Enter to show.]
 > Lorem ipsum dolor sit amet, consectetur adipisicin

Would be displayed as something like:

> In case of a citation following [ Click/Enter to show 3 more lines ]

Actually I'm not too sure about the format, but I thought I'd throw
that out there.

Happy hacking,

David


[notmuch] [PATCH] Add an "--output=(json|text|)" command-line option to both notmuch-search and notmuch-show.

2009-12-18 Thread David Bremner
On Fri, 18 Dec 2009 09:33:43 -0800, Carl Worth  wrote:

> I think that selecting *what* to emit is orthogonal from selecting *how*
> to format that output. 

I can see that point of view.

> See some ideas in the TODO file, (where I proposed --for and --format
> options for these).

It's a detail, but could you choose two names that are not substrings of
each other?  Eventually we do want tab completion on the command line to
work :).  Also, "search --for tags foo" suggests to me
searching for tags matching foo.  What about using --output for that?
One thing that is not completely clear to me at this point is what the
difference is between 

notmuch search --for messages  search-terms

and 

notmuch show search-terms

David


[notmuch] Missing messages breaking threads

2009-12-18 Thread James Westby
On Fri, 18 Dec 2009 11:41:18 -0800, Carl Worth  wrote:
> On Fri, 18 Dec 2009 19:02:21 +, James Westby <jw+debian at jameswestby.net> wrote:
> > Therefore I'd like to fix this. The obvious way is to
> > introduce documents in to the db for each id we see, and
> > threading should then naturally work better.
> 
> That sounds like a fine idea.

Good, at least I'm not totally off the map.

> > The only issue I see with doing this is with mail delays.
> > Once we do this we will sometimes receive a message that
> > already has a dummy document. What happens currently with
> > message-id collisions?
> 
> The current message-ID collision logic is pretty brain-dead. It just
> says "Oh, I've seen a file with this message before, so I'll skip this
> additional file".
> 
> But I'm just putting the finishing touches on a patch that instead does:
> 
>   Oh, and here's an additional filename for that message ID. Add
>   that too, please.
> 
> Beyond that, all we would need to do as well is to also index the new
> content. I don't want to do useless re-indexing when files just get
> renamed. So maybe all we need to do is to save the filesize of the
> last-indexed file for a document and then when we encounter a file with
> the same message ID and a larger file size, then index it as well?

I would say different file size, but I imagine larger is the majority
of interesting cases.
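
To make the difference between the two rules concrete, here is a minimal,
self-contained sketch. The names ReindexPolicy and should_reindex are
hypothetical and do not exist in notmuch; the code only models the
decision, not the indexing itself.

```cpp
#include <cstdint>

// Hypothetical policy names; neither exists in notmuch.
enum class ReindexPolicy { LargerOnly, AnyDifference };

// Decide whether a newly seen file for an already-known message ID
// should be indexed again, given the size recorded at the last indexing.
// A stored size of 0 stands for a placeholder document with no content.
bool should_reindex(std::int64_t stored_size, std::int64_t new_size,
                    ReindexPolicy policy)
{
    if (policy == ReindexPolicy::LargerOnly)
        return new_size > stored_size;
    return new_size != stored_size;
}
```

With LargerOnly, a renamed file of the same size is skipped. With
AnyDifference, a smaller copy of the message would be indexed again too.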

> That would even take care of providing the opportunity to index
> additional mailing-list-added content for messages also sent directly
> via CC.
> 
> The file-size heuristic wouldn't be perfect for these other cases. I
> guess we save a list of sha-1 sums for indexed files or so, (assuming
> that's cheaper than just re-indexing---before the Xapian Defect 250 fix
> I'm sure it is, but after I'm not sure---we maybe should just always
> re-index---but I think I have seen the TermGenerator appear in profiles
> of indexing runs.)

I'm not sure this is needed too much, but would obviously be
correct.

On Xapian 250, I have a very slow spinning disk, and it was hitting
me hard, making processing my inbox far too slow. I built Xapian SVN
with the patch from the bug and it is now lightning fast, so
consider this another endorsement. I also tried the supplemental
patch and it showed no further improvement for notmuch tag.

> >   * When we get a message-id conflict check for dummy:True
> > and replace the document if it is there.
> > 
> > How does this sound?
> 
> That sounds fine. It's the same as what I propose above with
> "filesize:0" instead of "dummy:true".

That works. However, we would want the old content to go away
in these cases, wouldn't we?

Or do we not index whatever dummy text we add? Or do we not
even put it in? Or not even show it at all? I was just thinking
of having "Missing messages..." showing up as the start of
the thread, but maybe it's not needed.

> > There could be an issue with synthesising too many threads
> > and then ending up having to try and put a message in two
> > threads? I see there is code for merging threads, would that
> > handle this?
> 
> It should, yes.
> 
> The current logic is that a message can only appear in a single
> thread. So if a message has children or parents with distinct thread IDs
> then those threads are merged.
> 
> I can imagine some strange cross-posting scenario where one could argue
> that the merging shouldn't happen, but I'm not sure we want to try to
> respect that.

Fair enough.

So, to summarise, I should first look at storing filesizes, then
the collision code to make it index further when the filesize grows,
and then finally the code to add documents for missing messages?

The only thing I am unclear on is how to handle existing databases?
Do we have any concept of versioning? Or should I just assume that
filesize: may not be in the document and act appropriately?

Thanks,

James



[notmuch] wish: more informative citations

2009-12-18 Thread Carl Worth
On Fri, 18 Dec 2009 20:47:20 -0400, David Bremner  wrote:
> Would be displayed as something like:
> 
> > In case of a citation following [ Click/Enter to show 3 more lines ]
> 
> Actually I'm not too sure about the format, but I thought I'd throw
> that out there.

That's a fine idea. Along with this would be getting rid of the
stupidity of displaying [1 line citation] rather than just displaying
the citation itself!

And I really want my keybinding for displaying all the citations in the
current message. And code to recognize top-posted copies as citations
and hiding that. And, and...

-Carl

...and more time to do all this stuff. I've got an ever-growing TODO
list and a backlog of patches to be reviewed that goes back several
weeks now. I'm hoping that I'll be able to sneak some time over the
upcoming holidays...


[notmuch] [PATCH] Store the size of the file for each message

2009-12-18 Thread Carl Worth
On Sat, 19 Dec 2009 00:08:24 +, James Westby <jw+debian at jameswestby.net> 
wrote:
> Thanks, I found delve, which at least showed that something was
> being stored. It's in the xapian-tools package, and
> 
>delve -V2 
> 
> prints out the filesize value for each document.

Ah, right. I had forgotten about that.

> It would be great if we could specify an alternative configuration
> file for testing so that I can set up a small maildir and test
> against that.

You can, actually. Just set the NOTMUCH_CONFIG environment variable to
your alternate configuration file. (And yes, we're missing any mention
of this in our documentation.)
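
As a rough illustration of that lookup (not the actual notmuch code), a
program could resolve the configuration path as follows. The function
name resolve_config_path, the fallback to $HOME/.notmuch-config, and the
treatment of an empty variable as unset are assumptions made for this
sketch.

```cpp
#include <cstdlib>
#include <string>

// Returns the configuration file to use: $NOTMUCH_CONFIG if it is set
// and non-empty, otherwise the assumed default of $HOME/.notmuch-config.
std::string resolve_config_path()
{
    const char *override_path = std::getenv("NOTMUCH_CONFIG");
    if (override_path && *override_path)
        return override_path;

    const char *home = std::getenv("HOME");
    return std::string(home ? home : ".") + "/.notmuch-config";
}
```

For testing, this means a throwaway configuration file pointing at a
small maildir can be selected per invocation, without touching the
regular setup.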

> Correct, I hadn't read the documentation closely enough. After fixing
> that and doing some testing I have this working now. Patch incoming.

Cool!

-Carl


[notmuch] [PATCH] Store the size of the file for each message

2009-12-18 Thread Carl Worth
On Fri, 18 Dec 2009 21:21:03 +, James Westby <jw+debian at jameswestby.net> 
wrote:
>   Here's the first part, storing the filesize. I'm using
>   add_value so that we can make it sortable, is that valid
>   for retrieving it as well?

Yes, a value makes sense here and should make the value easy to
retrieve.

>   The only thing I'm not sure about is if it works. Is there
>   a way to inspect a document to see the values that are
>   stored?

I usually use a little tool I wrote called xapian-dump. It currently
exists only in the git history of notmuch. Look at commit:

22691064666c03c5e76bc787395bfe586929f4cc

or so.

> Doing a search isn't working, so I imagine I made a mistake.

Let's see... (just reviewing here, not testing)..

> +struct FilesizeValueRangeProcessor : public Xapian::ValueRangeProcessor {
> +FilesizeValueRangeProcessor() {}
> +
> +Xapian::valueno operator()(std::string &begin, std::string &) {
> +if (begin.substr(0, 9) != "filesize:")
> +return Xapian::BAD_VALUENO;
> +begin.erase(0, 9);
> +return NOTMUCH_VALUE_FILESIZE;
> +}
> +};

If the file size is just an integer, then you shouldn't need a custom
ValueRangeProcessor. One of the existing processors in Xapian should
work fine.

Having not ever written a custom processor, I can't say whether the one
above is correct or not.

-Carl


[notmuch] automatically assigning tags to new messages?

2009-12-18 Thread Marten Veldthuis
On Fri, 18 Dec 2009 22:21:54 +1100, Alex Ghitza  wrote:
> I heard about notmuch mail a few days ago and I started playing with
> it.  So far, it makes me very happy, but there are some things that I
> need to learn how to do.  I'll start with the most important one:
> tagging incoming messages automatically.

I've got a script somewhere which I invoke after each mail sync. I'll
give some examples from that for your list below.

> What is the recommended way of achieving this?  There is a variety of
> things that I would like to automatically do to incoming mail:
> - if it comes from a mailing list (e.g. notmuch), tag it +notmuch and
>   +unread, but not +inbox

notmuch tag +list +notmuch -inbox to:notmuch@notmuchmail.org and not tag:notmuch and tag:inbox

> - if it's one of the numerous pointless weekly newsletters that I'm
>   getting from my university, tag it +unimelb, but not +unread or
>   +inbox 

notmuch tag +unimelb -unread -inbox to:foo and not tag:unimelb and tag:unread and tag:inbox

> - if it is coming from me, tag it +sent, but not +unread or +inbox

Not quite sure. Currently I'm not doing this, don't know if this is
possible within a single incantation of notmuch-tag. I think you
probably need a first search to get message ids, and then tag only those
message ids (doing it like the others above would tag all messages in
the thread with sent, which is probably not what you want).

-- 
- Marten


[notmuch] Missing messages breaking threads

2009-12-18 Thread Carl Worth
On Fri, 18 Dec 2009 19:53:13 +, James Westby <jw+debian at jameswestby.net> 
wrote:
> Or do we not index whatever dummy text we add? Or do we not
> even put it in? Or not even show it at all? I was just thinking
> of having "Missing messages..." showing up as the start of
> the thread, but maybe it's not needed.

Oh, I was assuming you wouldn't index any text. The UI can add "missing
message" for a document with no filename, for example.
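
A small in-memory model may help picture this. Everything in it
(Document, Store, note_reference, add_message, display_line) is
hypothetical and only mimics the idea of a document that exists for
threading purposes but has no filename and no indexed text.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a database document.
struct Document {
    std::optional<std::string> filename;  // empty for a placeholder
};

using Store = std::map<std::string, Document>;  // keyed by message ID

// Record that a message ID was referenced (e.g. in In-Reply-To) without
// having the message itself. Does nothing if the ID is already known.
void note_reference(Store &store, const std::string &message_id)
{
    store.emplace(message_id, Document{});
}

// Record an actual mail file. Fills in a placeholder if one exists.
// Returns false if a real document was already present (a collision).
bool add_message(Store &store, const std::string &message_id,
                 const std::string &filename)
{
    Document &doc = store[message_id];
    if (doc.filename)
        return false;
    doc.filename = filename;
    return true;
}

// What a UI might show for a document.
std::string display_line(const Document &doc)
{
    return doc.filename ? *doc.filename : "[missing message]";
}
```

A late-arriving message then simply fills in the placeholder, and the
"missing message" text never has to be stored or indexed.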

> So, to summarise, I should first look at storing filesizes, then
> the collision code to make it index further when the filesize grows,
> and then finally the code to add documents for missing messages?

Some of the code areas to be touched will be changing soon, (at least as
far as when filenames appear and disappear). Hopefully I'll have
something posted for that sooner rather than later to avoid having to
redo too much work.

> The only thing I am unclear on is how to handle existing databases?
> Do we have any concept of versioning? Or should I just assume that
> filesize: may not be in the document and act appropriately?

My current, outstanding patch is going to be the first trigger for a
"flag day" where we'll all need to rewrite our databases.

We don't have any concept of versioning yet, but it would obviously be
easy to have a new version document with an increasing integer.

But even with my current patch I'm considering doing a graceful upgrade
of the database in-place rather than making the user do something like a
dump, delete, rebuild, restore. That would give a much better experience
than "Your database is out-of-date, please rebuild it", so we'll see if
I pursue that in the end.
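
One way to picture the "increasing integer" combined with an in-place
upgrade is the sketch below. It is only a model under stated
assumptions: Database, Migration and upgrade_in_place are hypothetical
names, and the has_filesize flag merely stands in for whatever a real
upgrade step would change.

```cpp
#include <functional>
#include <vector>

// Hypothetical model: the database stores one integer version, and
// migrations[i] upgrades a database from version i to version i + 1.
struct Database {
    unsigned version = 0;
    bool has_filesize = false;  // example of data added by a migration
};

using Migration = std::function<void(Database &)>;

// Apply every migration the database has not seen yet, in order.
// Returns the number of steps that were run.
unsigned upgrade_in_place(Database &db,
                          const std::vector<Migration> &migrations)
{
    unsigned steps = 0;
    while (db.version < migrations.size()) {
        migrations[db.version](db);
        ++db.version;
        ++steps;
    }
    return steps;
}
```

An up-to-date database runs zero steps, so the check is cheap enough to
perform every time the database is opened for writing.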

-Carl




[notmuch] Missing messages breaking threads

2009-12-18 Thread Carl Worth
On Fri, 18 Dec 2009 19:02:21 +, James Westby <jw+debian at jameswestby.net> 
wrote:
> I like the architecture of notmuch, and have just switched
> to using it as my primary client, so thanks.

You're quite welcome, James. Welcome to notmuch!

> Therefore I'd like to fix this. The obvious way is to
> introduce documents in to the db for each id we see, and
> threading should then naturally work better.

That sounds like a fine idea.

> The only issue I see with doing this is with mail delays.
> Once we do this we will sometimes receive a message that
> already has a dummy document. What happens currently with
> message-id collisions?

The current message-ID collision logic is pretty brain-dead. It just
says "Oh, I've seen a file with this message before, so I'll skip this
additional file".

But I'm just putting the finishing touches on a patch that instead does:

Oh, and here's an additional filename for that message ID. Add
that too, please.

Beyond that, all we would need to do as well is to also index the new
content. I don't want to do useless re-indexing when files just get
renamed. So maybe all we need to do is to save the filesize of the
last-indexed file for a document and then when we encounter a file with
the same message ID and a larger file size, then index it as well?

That would even take care of providing the opportunity to index
additional mailing-list-added content for messages also sent directly
via CC.

The file-size heuristic wouldn't be perfect for these other cases. I
guess we save a list of sha-1 sums for indexed files or so, (assuming
that's cheaper than just re-indexing---before the Xapian Defect 250 fix
I'm sure it is, but after I'm not sure---we maybe should just always
re-index---but I think I have seen the TermGenerator appear in profiles
of indexing runs.)

>   * When we get a message-id conflict check for dummy:True
> and replace the document if it is there.
> 
> How does this sound?

That sounds fine. It's the same as what I propose above with
"filesize:0" instead of "dummy:true".

> There could be an issue with synthesising too many threads
> and then ending up having to try and put a message in two
> threads? I see there is code for merging threads, would that
> handle this?

It should, yes.

The current logic is that a message can only appear in a single
thread. So if a message has children or parents with distinct thread IDs
then those threads are merged.

I can imagine some strange cross-posting scenario where one could argue
that the merging shouldn't happen, but I'm not sure we want to try to
respect that.
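
The merging rule can be modelled with a small union-find structure. This
is a sketch of the idea only, not how notmuch implements it; the class
name ThreadSet and its methods are hypothetical.

```cpp
#include <map>
#include <string>

// Hypothetical union-find over thread IDs (not notmuch's implementation).
class ThreadSet {
public:
    // Return the canonical thread ID for `id`, registering it if new.
    std::string find(std::string id)
    {
        auto it = parent_.find(id);
        if (it == parent_.end()) {
            parent_[id] = id;
            return id;
        }
        if (it->second == id)
            return id;
        std::string root = find(it->second);
        parent_[id] = root;  // path compression
        return root;
    }

    // Merge the threads containing `a` and `b` into one.
    void merge(const std::string &a, const std::string &b)
    {
        std::string ra = find(a), rb = find(b);
        if (ra != rb)
            parent_[rb] = ra;
    }

private:
    std::map<std::string, std::string> parent_;
};
```

A message whose parent is in thread T1 and whose child is in thread T2
would trigger merge("T1", "T2"), after which both IDs resolve to the
same thread.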

-Carl


[notmuch] [PATCH] JSON output for notmuch-search and notmuch-show.

2009-12-18 Thread Scott Robinson
Excerpts from Carl Worth's message of Fri Dec 18 09:31:39 -0800 2009:
> [...]
> I don't know why, but I think I'd prefer --format for the name here.

ACK

> [...]
> It looks like the new documentation is missing that point, (and the man
> page in notmuch.1 is missing an update as well).

ACK

> [...]
> That's just fine. The old numbering semantics were quite bizarre and
> nothing I wanted to set in stone.

Cool. :-)

Resubmit a full patch, or submit another one on top of it?
-- 
Scott Robinson | http://quadhome.com/

Q: Why are my replies five sentences or less?
A: http://five.sentenc.es/


[notmuch] [PATCH] Add an "--output=(json|text|)" command-line option to both notmuch-search and notmuch-show.

2009-12-18 Thread Scott Robinson
Excerpts from Carl Worth's message of Fri Dec 18 09:33:43 -0800 2009:
> On Fri, 18 Dec 2009 08:59:55 -0400, david at tethera.net wrote:
> > It took me a little work to apply Scott's patch, so rather than asking
> > him to resend it from git-send-email, I am just sending. I hope no-one
> > is offended (much).
> 
> I think that's great! Collaboration is what this is all about.

Me too!

I've never used git-send-email. I'll give it a whirl on my next patch.

> > I'm thinking that the patch I sent out last night to only dump message
> > ids could be reworked to use the framework of this patch.  I also
> > think it would be reasonably simple to add an --output=mbox option,
> > for archiving and so on.
> 
> I think that selecting *what* to emit is orthogonal from selecting *how*
> to format that output. See some ideas in the TODO file, (where I
> proposed --for and --format options for these). Having a way to do mbox
> output for export would indeed be very nice.

Haha! I originally used "--format" and changed for some reason that escapes me
now.

Implementing an "mbox" formatted output in the current logic wouldn't be
archive perfect. The message body is emitted on a per-part basis.

What I would do is change the semantics of format->body to be called from
show_message. Then the text and json parts would point at the original
implementation passing off their per-part function pointers. And, a new mbox
implementation would just dump the full message body.
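
A rough, self-contained sketch of that restructuring, under the
assumption that a format is a small table of callbacks. Message, Format,
per_part_format and whole_message_format are hypothetical names, and
show_message here is a stand-in, not the function from the patch.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical message model: the raw file contents plus decoded parts.
struct Message {
    std::string raw;
    std::vector<std::string> parts;
};

// One output format; `body` is called once per message.
struct Format {
    std::function<std::string(const Message &)> body;
};

// Build a format that renders the body part by part, as text and JSON
// would, by wrapping a per-part callback.
Format per_part_format(std::function<std::string(const std::string &)> part)
{
    return Format{[part](const Message &m) {
        std::string out;
        for (const std::string &p : m.parts)
            out += part(p);
        return out;
    }};
}

// A format that ignores the parts and emits the message unmodified.
// Real mbox output would also need a "From " separator line and escaping.
Format whole_message_format()
{
    return Format{[](const Message &m) { return m.raw; }};
}

// Stand-in for show_message: it no longer iterates over parts itself.
std::string show_message(const Format &format, const Message &m)
{
    return format.body(m);
}
```

The point of the design is that the caller stays format-agnostic: text
and JSON keep their per-part behaviour, while an mbox-style format can
bypass the parts entirely.
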
-- 
Scott Robinson | http://quadhome.com/

Q: Why are my replies five sentences or less?
A: http://five.sentenc.es/


[notmuch] First attempt to add smart completion in notmuch-search

2009-12-18 Thread Carl Worth
On Fri, 18 Dec 2009 16:00:49 +0100, racin at free.fr wrote:
> Here is a first attempt to add "smart completion" to notmuch-search.

Hi Matthieu,

This all sounds quite interesting!

I look forward to actually seeing the patch. ;-)

-Carl


[notmuch] automatically assigning tags to new messages?

2009-12-18 Thread Carl Worth
On Fri, 18 Dec 2009 12:54:39 +0100, Marten Veldthuis  
wrote:
> > - if it is coming from me, tag it +sent, but not +unread or +inbox
> 
> Not quite sure. Currently I'm not doing this, don't know if this is
> possible within a single incantation of notmuch-tag. I think you
> probably need a first search to get message ids, and then tag only those
> message ids (doing it like the others above would tag all messages in
> the thread with sent, which is probably not what you want).

Hi Marten,

I'm not sure what's different about this case. A command like those you
provided earlier should work fine.

The "notmuch tag" command only tags individual messages explicitly
matched by the search terms. It never expands the tagging to unmatched
messages in the same thread.
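
The following model illustrates that behaviour for the "+sent" case. It
is a sketch with hypothetical names (Message, tag_matching) operating on
an in-memory list, not a use of the notmuch library.

```cpp
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory model of a message.
struct Message {
    std::string thread_id;
    std::string from;
    std::set<std::string> tags;
};

// Add and remove tags on exactly the messages for which `matches` is
// true; other messages in the same thread are left alone.
// Returns the number of messages that matched.
int tag_matching(std::vector<Message> &messages,
                 const std::function<bool(const Message &)> &matches,
                 const std::vector<std::string> &to_add,
                 const std::vector<std::string> &to_remove)
{
    int matched = 0;
    for (Message &m : messages) {
        if (!matches(m))
            continue;
        for (const std::string &t : to_add)
            m.tags.insert(t);
        for (const std::string &t : to_remove)
            m.tags.erase(t);
        ++matched;
    }
    return matched;
}
```

Given two messages in one thread, only the one whose sender matches
gains "sent" and loses "inbox" and "unread"; the reply from someone else
keeps its tags.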

-Carl

PS. I've talked before about allowing for the configuration file to do
automatic tagging of messages. I've also talked about making something
like "virtual tags" where any automatically-applied tags would act
somehow differently than standard flags.

More recently, my thinking is taking me away from both of those ideas. I
think now that what I want in the configuration file is simply a set of
saved search strings. Something like:

[search]
interesting = to:notmuchmail.org and not from:cworth

and then this could be used within a search string such as:

notmuch show search:interesting

This would make it very clear that "saved searches" are separate from
tags, and you might very well want to combine them in a single search:

notmuch show search:interesting or tag:interesting
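
Since none of this exists yet, here is only a sketch of how such an
expansion might work. The function expand_saved_searches is
hypothetical, and splitting the query on whitespace is a simplification
that a real query parser would not make.

```cpp
#include <map>
#include <sstream>
#include <string>

// Replace every whitespace-separated token of the form "search:NAME"
// with the parenthesised saved query for NAME. Unknown names are left
// untouched. Tokenising on whitespace is a simplification.
std::string expand_saved_searches(
    const std::string &query,
    const std::map<std::string, std::string> &saved)
{
    const std::string prefix = "search:";
    std::istringstream in(query);
    std::string token, out;
    while (in >> token) {
        if (token.compare(0, prefix.size(), prefix) == 0) {
            auto it = saved.find(token.substr(prefix.size()));
            if (it != saved.end())
                token = "(" + it->second + ")";
        }
        if (!out.empty())
            out += ' ';
        out += token;
    }
    return out;
}
```

The parentheses keep the saved query's own "and"/"or" operators from
mixing with those of the surrounding search string.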

As I think about this, I think these saved searches could displace much
of my use of tags, (at least all of the tags which I'm automatically
applying in the script I run after "notmuch new"). The big difference
would be that the UI wouldn't provide an indication of a message
matching particular saved searches the way it does for tags. But I might
actually prefer that, (since currently, I have so many
automatically-applied tags on every message that the display is often
just a lot of noise).

Anyway, that's something I plan to experiment with.


[notmuch] Rather simple optimization for notmuch tag

2009-12-18 Thread Carl Worth
On Fri, 18 Dec 2009 00:49:00 -0700, Mark Anderson  
wrote:
> I was updating my poll script that tags messages, and a common idiom is
> to put
>  tag +mytag  and not tag:mytag
> 
> I don't know anything about efficiency, but for the simple single-tag
> case, couldn't we imply the "and not tag:mytag" from the +mytag action
> list for the tag command?

On one level, it really shouldn't be a performance issue to tag messages
that already have a particular tag. (And in fact, the recently proposed
patches to fix Xapian defect 250 even address this I think.)

In the meantime, it is fairly annoying to have to type this, and yes,
the tag command could infer that and append it to the search string
automatically. That's a good idea, really.

> The similar (dual?, rusty math terminology, beware of Math-tetanus) case
> of "tag -mytag  and tag:mytag" could be similarly optimized,
> since the tag removal action ought to be a null action in the case that
> the search terms matched on a thread or message, but the tag to be
> removed isn't attached to the message/thread returned.

Yes, that would work too.

One potential snag with both ideas is that the "notmuch tag"
command-line as currently implemented allows for multiple tag additions
and removals with a single search. So the optimization here couldn't be
used unless there was just a single tag action.

So that's another reason to really just want the lower-level
optimization to be in place.
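
For the single-action case, the inference could look roughly like this.
The helper narrow_tag_query is hypothetical and works on plain strings;
it is a sketch of the idea, not code from notmuch.

```cpp
#include <string>
#include <vector>

// Narrow the search so that it skips messages the tag command would not
// change. `actions` holds entries such as "+mytag" or "-mytag".
// With more than one action the query is returned unchanged, since a
// message may still need one of the other changes.
std::string narrow_tag_query(const std::vector<std::string> &actions,
                             const std::string &query)
{
    if (actions.size() != 1 || actions[0].size() < 2)
        return query;

    const std::string &action = actions[0];
    const std::string tag = action.substr(1);
    if (action[0] == '+')
        return "(" + query + ") and not tag:" + tag;
    if (action[0] == '-')
        return "(" + query + ") and tag:" + tag;
    return query;
}
```

The original query is parenthesised so that an "or" inside it cannot
change the meaning of the appended clause.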

-Carl



[notmuch] [PATCH] Add an "--output=(json|text|)" command-line option to both notmuch-search and notmuch-show.

2009-12-18 Thread Carl Worth
On Fri, 18 Dec 2009 08:59:55 -0400, david at tethera.net wrote:
> It took me a little work to apply Scott's patch, so rather than asking
> him to resend it from git-send-email, I am just sending. I hope no-one
> is offended (much).

I think that's great! Collaboration is what this is all about.

> I'm thinking that the patch I sent out last night to only dump message
> ids could be reworked to use the framework of this patch.  I also
> think it would be reasonably simple to add an --output=mbox option,
> for archiving and so on.

I think that selecting *what* to emit is orthogonal from selecting *how*
to format that output. See some ideas in the TODO file, (where I
proposed --for and --format options for these). Having a way to do mbox
output for export would indeed be very nice.

-Carl


[notmuch] [PATCH] Add an "--output=(json|text|)" command-line option to both notmuch-search and notmuch-show.

2009-12-18 Thread da...@tethera.net
From: Scott Robinson 

In the case of notmuch-show, "--output=json" also implies
"--entire-thread" as the thread structure is implicit in the emitted
document tree.

As a coincidence to the implementation, multipart message ID numbers are
now incremented with each part printed. This changes the previous
semantics, which were unclear and not necessarily related to the actual
ordering of the message parts.

Edited-By: David Bremner 
Reviewed-By: David Bremner 
---

It took me a little work to apply Scott's patch, so rather than asking
him to resend it from git-send-email, I am just sending. I hope no-one
is offended (much).

Other than manually extracting the patch from the output of notmuch
show (for me the message arrived base64 encoded), I deleted trailing
whitespace on line 465. 

It compiles, it doesn't seem to screw up the original output, and at
least in a few tests, it generates parseable json. Yay!.

I'm thinking that the patch I sent out last night to only dump message
ids could be reworked to use the framework of this patch.  I also
think it would be reasonably simple to add an --output=mbox option,
for archiving and so on.

 Makefile.local   |3 +-
 json.c   |   73 ++
 notmuch-client.h |3 +
 notmuch-search.c |  163 +---
 notmuch-show.c   |  275 ++
 notmuch.c|   24 --
 show-message.c   |4 +-
 7 files changed, 481 insertions(+), 64 deletions(-)
 create mode 100644 json.c

diff --git a/Makefile.local b/Makefile.local
index 933ff4c..53b474b 100644
--- a/Makefile.local
+++ b/Makefile.local
@@ -18,7 +18,8 @@ notmuch_client_srcs = \
notmuch-tag.c   \
notmuch-time.c  \
query-string.c  \
-   show-message.c
+   show-message.c  \
+   json.c

 notmuch_client_modules = $(notmuch_client_srcs:.c=.o)
 notmuch: $(notmuch_client_modules) lib/notmuch.a
diff --git a/json.c b/json.c
new file mode 100644
index 000..ee563d6
--- /dev/null
+++ b/json.c
@@ -0,0 +1,73 @@
+/* notmuch - Not much of an email program, (just index and search)
+ *
+ * Copyright © 2009 Carl Worth
+ * Copyright © 2009 Keith Packard
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see http://www.gnu.org/licenses/ .
+ *
+ * Authors: Carl Worth 
+ * Keith Packard 
+ */
+
+#include "notmuch-client.h"
+
+/*
+ * json_quote_str derived from cJSON's print_string_ptr,
+ * Copyright (c) 2009 Dave Gamble
+ */
+
+char *
+json_quote_str(const void *ctx, const char *str)
+{
+    const char *ptr;
+    char *ptr2;
+    char *out;
+    int len = 0;
+
+    if (!str)
+        return NULL;
+
+    for (ptr = str; *ptr; len++, ptr++) {
+        if (*ptr < 32 || *ptr == '\"' || *ptr == '\\')
+            len++;
+    }
+
+    out = talloc_array (ctx, char, len + 3);
+
+    ptr = str;
+    ptr2 = out;
+
+    *ptr2++ = '\"';
+    while (*ptr) {
+        if (*ptr > 31 && *ptr != '\"' && *ptr != '\\') {
+            *ptr2++ = *ptr++;
+        } else {
+            *ptr2++ = '\\';
+            switch (*ptr++) {
+            case '\"':  *ptr2++ = '\"'; break;
+            case '\\':  *ptr2++ = '\\'; break;
+            case '\b':  *ptr2++ = 'b';  break;
+            case '\f':  *ptr2++ = 'f';  break;
+            case '\n':  *ptr2++ = 'n';  break;
+            case '\r':  *ptr2++ = 'r';  break;
+            case '\t':  *ptr2++ = 't';  break;
+            default:    ptr2--;         break;
+            }
+        }
+    }
+    *ptr2++ = '\"';
+    *ptr2++ = '\0';
+
+    return out;
+}
diff --git a/notmuch-client.h b/notmuch-client.h
index 50a30fe..7b844b9 100644
--- a/notmuch-client.h
+++ b/notmuch-client.h
@@ -143,6 +143,9 @@ notmuch_status_t
 show_message_body (const char *filename,
   void (*show_part) (GMimeObject *part, int *part_count));

+char *
+json_quote_str (const void *ctx, const char *str);
+
 /* notmuch-config.c */

 typedef struct _notmuch_config notmuch_config_t;
diff --git a/notmuch-search.c b/notmuch-search.c
index dc44eb6..e243747 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -20,8 +20,120 @@

 #include "notmuch-client.h"

+typedef struct search_format {
+const char *results_start;
+const char *thread_start;
+void (*thread) (const void *ctx,
+  

[notmuch] automatically assigning tags to new messages?

2009-12-18 Thread Alex Ghitza

Dear notmuch crowd,

I heard about notmuch mail a few days ago and I started playing with
it.  So far, it makes me very happy, but there are some things that I
need to learn how to do.  I'll start with the most important one:
tagging incoming messages automatically.

What is the recommended way of achieving this?  There is a variety of
things that I would like to automatically do to incoming mail:
- if it comes from a mailing list (e.g. notmuch), tag it +notmuch and
  +unread, but not +inbox
- if it's one of the numerous pointless weekly newsletters that I'm
  getting from my university, tag it +unimelb, but not +unread or +inbox 
- if it is coming from me, tag it +sent, but not +unread or +inbox

There's probably more, but this is a good start.  Any advice?

[My current email setup is as follows: get messages from the Gmail
account using fetchmail; run procmail with a bunch of recipes that put
the messages into various maildirs; read the results with mutt.  (Oh,
and send mail with msmtp from mutt.)]


Best,
Alex


-- 
Alex Ghitza -- Lecturer in Mathematics -- The University of Melbourne
-- Australia -- http://www.ms.unimelb.edu.au/~aghitza/
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [notmuch] automatically assigning tags to new messages?

2009-12-18 Thread Marten Veldthuis
On Fri, 18 Dec 2009 22:21:54 +1100, Alex Ghitza aghi...@gmail.com wrote:
> I heard about notmuch mail a few days ago and I started playing with
> it.  So far, it makes me very happy, but there are some things that I
> need to learn how to do.  I'll start with the most important one:
> tagging incoming messages automatically.

I've got a script somewhere which I invoke after each mail sync. I'll
give some examples from that for your list below.

> What is the recommended way of achieving this?  There is a variety of
> things that I would like to automatically do to incoming mail:
> - if it comes from a mailing list (e.g. notmuch), tag it +notmuch and
>   +unread, but not +inbox

notmuch tag +list +notmuch -inbox to:notmuch@notmuchmail.org and not tag:notmuch and tag:inbox

> - if it's one of the numerous pointless weekly newsletters that I'm
>   getting from my university, tag it +unimelb, but not +unread or
>   +inbox

notmuch tag +unimelb -unread -inbox to:foo and not tag:unimelb and tag:unread and tag:inbox

> - if it is coming from me, tag it +sent, but not +unread or +inbox

Not quite sure. Currently I'm not doing this, don't know if this is
possible within a single incantation of notmuch-tag. I think you
probably need a first search to get message ids, and then tag only those
message ids (doing it like the others above would tag all messages in
the thread with sent, which is probably not what you want).

-- 
- Marten


Re: [notmuch] [PATCH] JSON output for notmuch-search and notmuch-show.

2009-12-18 Thread Carl Worth
On Thu, 17 Dec 2009 21:33:54 -0800, Scott Robinson sc...@quadhome.com wrote:
> I took an earlier suggestion and didn't use cJSON, instead writing custom code
> for emitting the new format.

Nice! I have a few comments below.

> Added an "--output=(json|text|)" command-line option to both
> notmuch-search and notmuch-show.

I don't know why, but I think I'd prefer --format for the name here.

> In the case of notmuch-show, "--output=json" also implies
> "--entire-thread" as the thread structure is implicit in the emitted
> document tree.

It looks like the new documentation is missing that point, (and the man
page in notmuch.1 is missing an update as well).

> As a coincidence to the implementation, multipart message ID numbers are
> now incremented with each part printed. This changes the previous
> semantics, which were unclear and not necessarily related to the actual
> ordering of the message parts.

That's just fine. The old numbering semantics were quite bizarre and
nothing I wanted to set in stone.

-Carl
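
Since the patch emits JSON with hand-written code, here is a stand-alone C sketch of the string-quoting step. It is not the patch's json_quote_str: the name json_quote_sketch is made up, it uses malloc instead of talloc, and it writes other control characters as \u00XX escapes. As far as I can tell, the version in the patch drops those characters and, where char is signed, bytes above 127 as well, so treat the difference as an assumption to check.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: return a newly allocated, double-quoted JSON
 * string for str, or NULL on NULL input or allocation failure.
 * The caller frees the result. */
char *
json_quote_sketch (const char *str)
{
    const unsigned char *p;
    char *out, *q;

    if (str == NULL)
        return NULL;

    /* Worst case: every byte becomes a 6-byte \u00XX escape, plus two
     * quotes and the terminating NUL. */
    out = malloc (6 * strlen (str) + 3);
    if (out == NULL)
        return NULL;

    q = out;
    *q++ = '"';
    /* Iterate as unsigned char so bytes >= 0x80 are never negative. */
    for (p = (const unsigned char *) str; *p; p++) {
        switch (*p) {
        case '"':  *q++ = '\\'; *q++ = '"';  break;
        case '\\': *q++ = '\\'; *q++ = '\\'; break;
        case '\b': *q++ = '\\'; *q++ = 'b';  break;
        case '\f': *q++ = '\\'; *q++ = 'f';  break;
        case '\n': *q++ = '\\'; *q++ = 'n';  break;
        case '\r': *q++ = '\\'; *q++ = 'r';  break;
        case '\t': *q++ = '\\'; *q++ = 't';  break;
        default:
            if (*p < 0x20)
                q += sprintf (q, "\\u%04x", (unsigned int) *p);
            else
                *q++ = (char) *p;  /* pass other bytes through unchanged */
            break;
        }
    }
    *q++ = '"';
    *q = '\0';

    return out;
}
```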




Re: [notmuch] Rather simple optimization for notmuch tag

2009-12-18 Thread Carl Worth
On Fri, 18 Dec 2009 00:49:00 -0700, Mark Anderson markr.ander...@amd.com wrote:
> I was updating my poll script that tags messages, and a common idiom is
> to put
>  tag +mytag <search_terms> and not tag:mytag
> 
> I don't know anything about efficiency, but for the simple single-tag
> case, couldn't we imply the "and not tag:mytag" from the +mytag action
> list for the tag command?

On one level, it really shouldn't be a performance issue to tag messages
that already have a particular tag. (And in fact, the recently proposed
patches to fix Xapian defect 250 even address this I think.)

In the meantime, it is fairly annoying to have to type this, and yes,
the tag command could infer that and append it to the search string
automatically. That's a good idea, really.

> The similar (dual?, rusty math terminology, beware of Math-tetanus) case
> of "tag -mytag <search-terms> and tag:mytag" could be similarly optimized,
> since the tag removal action ought to be a null action in the case that
> the search terms matched on a thread or message, but the tag to be
> removed isn't attached to the message/thread returned.

Yes, that would work too.

One potential snag with both ideas is that the "notmuch tag"
command-line as currently implemented allows for multiple tag additions
and removals with a single search. So the optimization here couldn't be
used unless there was just a single tag action.

So that's another reason to really just want the lower-level
optimization to be in place.

-Carl





Re: [notmuch] Missing messages breaking threads

2009-12-18 Thread Carl Worth
On Fri, 18 Dec 2009 19:53:13 +, James Westby jw+deb...@jameswestby.net wrote:
> Or do we not index whatever dummy text we add? Or do we not
> even put it in? Or not even show it at all? I was just thinking
> of having "Missing messages..." showing up as the start of
> the thread, but maybe it's not needed.

Oh, I was assuming you wouldn't index any text. The UI can add "missing
message" for a document with no filename, for example.

> So, to summarise, I should first look at storing filesizes, then
> the collision code to make it index further when the filesize grows,
> and then finally the code to add documents for missing messages?

Some of the code areas to be touched will be changing soon, (at least as
far as when filenames appear and disappear). Hopefully I'll have
something posted for that sooner rather than later to avoid having to
redo too much work.

> The only thing I am unclear on is how to handle existing databases?
> Do we have any concept of versioning? Or should I just assume that
> filesize: may not be in the document and act appropriately?

My current, outstanding patch is going to be the first trigger for a
"flag day" where we'll all need to rewrite our databases.

We don't have any concept of versioning yet, but it would obviously be
easy to have a new version document with an increasing integer.

But even with my current patch I'm considering doing a graceful upgrade
of the database in-place rather than making the user do something like a
dump, delete, rebuild, restore. That would give a much better experience
than "Your database is out-of-date, please rebuild it", so we'll see if
I pursue that in the end.

-Carl






[notmuch] [PATCH] Store the size of the file for each message

2009-12-18 Thread James Westby
When indexing a message store the filesize along with it so that
when we store all the filenames for a message-id we can know if
any of them have different content cheaply.

The value stored is defined to be the largest filesize of any
of the files for that message.

This changes the API for efficiency reasons. The size is often
known to the caller, and so we save a second stat by asking them
to provide it. If they don't know it they can pass -1 and the
stat will be done for them.

We store the filesize such that we can query a range. Thus it
would be possible to query filesize:0..100 if you somehow
knew the raw message was less than 100 bytes.
---

  Here's the first part, storing the filesize. I'm using
  add_value so that we can make it sortable, is that valid
  for retrieving it as well?

  The only thing I'm not sure about is if it works. Is there
  a way to inspect a document to see the values that are
  stored? Doing a search isn't working, so I imagine I made
  a mistake.

  Thanks,

  James
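
The "pass -1 and the stat will be done for them" convention described above can be shown in isolation with a small C sketch. The function name resolve_file_size is made up for illustration; the patch itself does this inside _notmuch_message_set_filesize and returns a notmuch_status_t rather than a plain int.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: if the caller already knows the size
 * (known_size >= 0), use it and skip the system call; otherwise stat
 * the file. Returns 0 and stores the size in *size_out on success,
 * or -1 if stat fails. */
int
resolve_file_size (const char *filename, off_t known_size, off_t *size_out)
{
    struct stat st;

    if (known_size >= 0) {
        *size_out = known_size;   /* caller-provided size, no stat needed */
        return 0;
    }

    if (stat (filename, &st) != 0)
        return -1;

    *size_out = st.st_size;
    return 0;
}
```

A caller such as a directory walker, which has usually just done its own stat, would pass st_size straight through. Other callers pass -1.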

 lib/database.cc   |   17 +
 lib/message.cc|   25 +
 lib/notmuch-private.h |8 +++-
 lib/notmuch.h |5 +
 notmuch-new.c |2 +-
 5 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index b6c4d07..0ec77cd 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -454,6 +454,17 @@ notmuch_database_create (const char *path)
 return notmuch;
 }
 
+struct FilesizeValueRangeProcessor : public Xapian::ValueRangeProcessor {
+FilesizeValueRangeProcessor() {}
+
+Xapian::valueno operator()(std::string &begin, std::string &) {
+if (begin.substr(0, 9) != "filesize:")
+return Xapian::BAD_VALUENO;
+begin.erase(0, 9);
+return NOTMUCH_VALUE_FILESIZE;
+}
+};
+
 notmuch_database_t *
 notmuch_database_open (const char *path,
   notmuch_database_mode_t mode)
@@ -463,6 +474,7 @@ notmuch_database_open (const char *path,
 struct stat st;
 int err;
 unsigned int i;
+FilesizeValueRangeProcessor filesize_proc;
 
 if (asprintf (&notmuch_path, "%s/%s", path, ".notmuch") == -1) {
notmuch_path = NULL;
@@ -508,6 +520,7 @@ notmuch_database_open (const char *path,
notmuch->query_parser->set_stemmer (Xapian::Stem ("english"));
notmuch->query_parser->set_stemming_strategy (Xapian::QueryParser::STEM_SOME);
notmuch->query_parser->add_valuerangeprocessor (notmuch->value_range_processor);
+   notmuch->query_parser->add_valuerangeprocessor (&filesize_proc);
 
for (i = 0; i < ARRAY_SIZE (BOOLEAN_PREFIX_EXTERNAL); i++) {
prefix_t *prefix = BOOLEAN_PREFIX_EXTERNAL[i];
@@ -889,6 +902,7 @@ _notmuch_database_link_message (notmuch_database_t *notmuch,
 notmuch_status_t
 notmuch_database_add_message (notmuch_database_t *notmuch,
  const char *filename,
+ const off_t size,
  notmuch_message_t **message_ret)
 {
 notmuch_message_file_t *message_file;
@@ -992,6 +1006,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
_notmuch_message_set_filename (message, filename);
_notmuch_message_add_term (message, "type", "mail");
+   ret = _notmuch_message_set_filesize (message, filename, size);
+   if (ret)
+   goto DONE;
} else {
ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
goto DONE;
diff --git a/lib/message.cc b/lib/message.cc
index 49519f1..2bfc5ed 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -426,6 +426,31 @@ _notmuch_message_set_filename (notmuch_message_t *message,
 message->doc.set_data (s);
 }
 
+notmuch_status_t
+_notmuch_message_set_filesize (notmuch_message_t *message,
+  const char *filename,
+  const off_t size)
+{
+struct stat st;
+off_t realsize = size;
+notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
+
+if (realsize < 0) {
+   if (stat (filename, &st)) {
+   ret = NOTMUCH_STATUS_FILE_ERROR;
+   goto DONE;
+   } else {
+   realsize = st.st_size;
+   }
+}
+
+message->doc.add_value (NOTMUCH_VALUE_FILESIZE,
+Xapian::sortable_serialise (realsize));
+
+  DONE:
+return ret;
+}
+
 const char *
 notmuch_message_get_filename (notmuch_message_t *message)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 116f63d..1ba3055 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -100,7 +100,8 @@ _internal_error (const char *format, ...) PRINTF_ATTRIBUTE 
(1, 2);
 
 typedef enum {
 NOTMUCH_VALUE_TIMESTAMP = 0,
-NOTMUCH_VALUE_MESSAGE_ID
+NOTMUCH_VALUE_MESSAGE_ID,
+NOTMUCH_VALUE_FILESIZE
 } notmuch_value_t;
 
 /* Xapian (with flint backend) complains if we provide a term longer
@@ 

Re: [notmuch] Missing messages breaking threads

2009-12-18 Thread James Westby
On Fri, 18 Dec 2009 12:52:58 -0800, Carl Worth cwo...@cworth.org wrote:
> On Fri, 18 Dec 2009 19:53:13 +, James Westby jw+deb...@jameswestby.net wrote:
> Oh, I was assuming you wouldn't index any text. The UI can add "missing
> message" for a document with no filename, for example.

Works for me.

> > So, to summarise, I should first look at storing filesizes, then
> > the collision code to make it index further when the filesize grows,
> > and then finally the code to add documents for missing messages?
> 
> Some of the code areas to be touched will be changing soon, (at least as
> far as when filenames appear and disappear). Hopefully I'll have
> something posted for that sooner rather than later to avoid having to
> redo too much work.

That would be great. I'm learning all the code anyway, so there's not
a whole lot of knowledge being thrown away.

I've just sent an initial cut at the first step.

> > The only thing I am unclear on is how to handle existing databases?
> > Do we have any concept of versioning? Or should I just assume that
> > filesize: may not be in the document and act appropriately?
> 
> My current, outstanding patch is going to be the first trigger for a
> "flag day" where we'll all need to rewrite our databases.
> 
> We don't have any concept of versioning yet, but it would obviously be
> easy to have a new version document with an increasing integer.
> 
> But even with my current patch I'm considering doing a graceful upgrade
> of the database in-place rather than making the user do something like a
> dump, delete, rebuild, restore. That would give a much better experience
> than "Your database is out-of-date, please rebuild it", so we'll see if
> I pursue that in the end.

That sounds nice, I'd certainly prefer this sort of thing as it evolves.

Thanks,

James


Re: [notmuch] [PATCH] Store the size of the file for each message

2009-12-18 Thread Carl Worth
On Fri, 18 Dec 2009 21:21:03 +, James Westby jw+deb...@jameswestby.net wrote:
>   Here's the first part, storing the filesize. I'm using
>   add_value so that we can make it sortable, is that valid
>   for retrieving it as well?

Yes, a value makes sense here and should make the value easy to
retrieve.

>   The only thing I'm not sure about is if it works. Is there
>   a way to inspect a document to see the values that are
>   stored?

I usually use a little tool I wrote called xapian-dump. It currently
exists only in the git history of notmuch. Look at commit:

22691064666c03c5e76bc787395bfe586929f4cc

or so.

> Doing a search isn't working, so I imagine I made a mistake.

Let's see... (just reviewing here, not testing)..

> +struct FilesizeValueRangeProcessor : public Xapian::ValueRangeProcessor {
> +FilesizeValueRangeProcessor() {}
> +
> +Xapian::valueno operator()(std::string &begin, std::string &) {
> +if (begin.substr(0, 9) != "filesize:")
> +return Xapian::BAD_VALUENO;
> +begin.erase(0, 9);
> +return NOTMUCH_VALUE_FILESIZE;
> +}
> +};

If the file size is just an integer, then you shouldn't need a custom
ValueRangeProcessor. One of the existing processors in Xapian should
work fine.

Having not ever written a custom processor, I can't say whether the one
above is correct or not.

-Carl




[notmuch] wish: more informative citations

2009-12-18 Thread David Bremner

Wouldn't it be nice if citations showed the first line or so of the text
being cited?

Stealing text from another thread, 

  In case of a citation following immediately new contents. When the citation
  was collapsed:
  
  [1-line citation. Click/Enter to show.]
  Lorem ipsum dolor sit amet, consectetur adipisicin

Would be displayed as something like:

 In case of a citation following [ Click/Enter to show 3 more lines ]

Actually I'm not too sure about the format, but I thought I'd throw
that out there.

Happy hacking,

David


[notmuch] [PATCH] Reindex larger files that duplicate ids we have

2009-12-18 Thread James Westby
When we see a message where we already have the file
id stored, check if the size is larger. If it is then
re-index and set the file size and name to be the
new message.
---

  Here's the (quite simple) patch to implement indexing the
  largest copy of each mail that we have.

  Does the re-indexing replace the old terms? In the case
  where you had a collision with different text this could
  make a search return mails that don't contain that text.
  I don't think it's a big issue though, even if that is the
  case.

  Thanks,

  James
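
Stripped of the Xapian details, the rule "index the largest copy of each mail" reduces to the comparison sketched below in C. The struct and function names are made up for illustration and are not notmuch API. In the patch, the equivalent step also re-runs _notmuch_message_index_file and syncs the message.

```c
#include <sys/types.h>

/* Hypothetical record of the copy currently indexed for a message-id. */
struct indexed_copy {
    const char *filename;
    off_t size;
};

/* Keep whichever copy is larger. Returns 1 if the record was replaced
 * (the caller would then re-index from filename), 0 if the existing
 * copy is at least as large and nothing changes. */
int
keep_largest_copy (struct indexed_copy *current,
                   const char *filename, off_t size)
{
    if (size <= current->size)
        return 0;

    current->filename = filename;
    current->size = size;
    return 1;
}
```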

 lib/database.cc   |4 +++-
 lib/index.cc  |   27 +++
 lib/message.cc|   31 ++-
 lib/notmuch-private.h |   13 +
 lib/notmuch.h |5 +++--
 5 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index d834d94..64f29b9 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1000,7 +1000,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
if (ret)
goto DONE;
} else {
-   ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+   ret = _notmuch_message_possibly_reindex (message, filename, size);
+   if (!ret)
+   ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
goto DONE;
}
 
diff --git a/lib/index.cc b/lib/index.cc
index 125fa6c..14c3268 100644
--- a/lib/index.cc
+++ b/lib/index.cc
@@ -312,3 +312,30 @@ _notmuch_message_index_file (notmuch_message_t *message,
 
 return ret;
 }
+
+notmuch_status_t
+_notmuch_message_possibly_reindex (notmuch_message_t *message,
+const char *filename,
+const off_t size)
+{
+off_t realsize = size;
+off_t stored_size;
+notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
+
+ret = _notmuch_message_size_on_disk (message, filename, &realsize);
+if (ret)
+goto DONE;
+stored_size = _notmuch_message_get_filesize (message);
+if (realsize > stored_size) {
+   ret = _notmuch_message_index_file (message, filename);
+   if (ret)
+   goto DONE;
+   ret = _notmuch_message_set_filesize (message, filename, realsize);
+   _notmuch_message_set_filename (message, filename);
+   _notmuch_message_sync (message);
+}
+
+  DONE:
+return ret;
+
+}
diff --git a/lib/message.cc b/lib/message.cc
index 2bfc5ed..cc32741 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -427,23 +427,38 @@ _notmuch_message_set_filename (notmuch_message_t *message,
 }
 
 notmuch_status_t
-_notmuch_message_set_filesize (notmuch_message_t *message,
+_notmuch_message_size_on_disk (notmuch_message_t *message,
   const char *filename,
-  const off_t size)
+  off_t *size)
 {
 struct stat st;
-off_t realsize = size;
 notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
 
-if (realsize < 0) {
+if (*size < 0) {
if (stat (filename, &st)) {
ret = NOTMUCH_STATUS_FILE_ERROR;
goto DONE;
} else {
-   realsize = st.st_size;
+   *size = st.st_size;
}
 }
 
+  DONE:
+return ret;
+}
+
+notmuch_status_t
+_notmuch_message_set_filesize (notmuch_message_t *message,
+  const char *filename,
+  const off_t size)
+{
+off_t realsize = size;
+notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
+
+ret = _notmuch_message_size_on_disk (message, filename, &realsize);
+if (ret)
+goto DONE;
+
 message->doc.add_value (NOTMUCH_VALUE_FILESIZE,
 Xapian::sortable_serialise (realsize));
 
@@ -451,6 +466,12 @@ _notmuch_message_set_filesize (notmuch_message_t *message,
 return ret;
 }
 
+off_t
+_notmuch_message_get_filesize (notmuch_message_t *message)
+{
+return Xapian::sortable_unserialise (message->doc.get_value (NOTMUCH_VALUE_FILESIZE));
+}
+
 const char *
 notmuch_message_get_filename (notmuch_message_t *message)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 1ba3055..cf65fd9 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -199,6 +199,14 @@ _notmuch_message_set_filesize (notmuch_message_t *message,
   const char *filename,
   const off_t size);
 
+off_t
+_notmuch_message_get_filesize (notmuch_message_t *message);
+
+notmuch_status_t
+_notmuch_message_size_on_disk (notmuch_message_t *message,
+  const char *filename,
+  off_t *size);
+
 void
 _notmuch_message_ensure_thread_id (notmuch_message_t *message);
 
@@ -218,6 +226,11 @@ notmuch_status_t
 _notmuch_message_index_file (notmuch_message_t *message,
 const char *filename);
 
+notmuch_status_t
+_notmuch_message_possibly_reindex (notmuch_message_t *message,
+const 

Re: [notmuch] [PATCH] Store the size of the file for each message

2009-12-18 Thread James Westby
On Fri, 18 Dec 2009 16:57:16 -0800, Carl Worth cwo...@cworth.org wrote:
> You can, actually. Just set the NOTMUCH_CONFIG environment variable to
> your alternate configuration file. (And yes, we're missing any mention
> of this in our documentation.)

Sweet. Where would be the best place to document it? Just in the
man page?

Thanks,

James
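
For whoever ends up writing that documentation, the lookup order can be sketched in C as below. The function name config_path is made up, and the fallback to $HOME/.notmuch-config is my assumption about the default location. Only the NOTMUCH_CONFIG variable comes from the message quoted above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write the configuration file path into buf.
 * NOTMUCH_CONFIG, if set and non-empty, wins; otherwise fall back to
 * $HOME/.notmuch-config (assumed default).
 * Returns 0 on success, -1 if no path can be built or buf is too small. */
int
config_path (char *buf, size_t buf_size)
{
    const char *env = getenv ("NOTMUCH_CONFIG");
    const char *home;
    int written;

    if (env != NULL && env[0] != '\0') {
        written = snprintf (buf, buf_size, "%s", env);
    } else {
        home = getenv ("HOME");
        if (home == NULL)
            return -1;
        written = snprintf (buf, buf_size, "%s/.notmuch-config", home);
    }

    if (written < 0 || (size_t) written >= buf_size)
        return -1;

    return 0;
}
```

This makes it possible to try a patched build against a scratch database by pointing NOTMUCH_CONFIG at a second configuration file for a single run.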


[notmuch] [PATCH] Fix-up some outdated comments.

2009-12-18 Thread James Westby
---
 lib/message.cc |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index cc32741..7129d59 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -391,7 +391,7 @@ notmuch_message_get_replies (notmuch_message_t *message)
  * multiple filenames for email messages with identical message IDs.
  *
  * This change will not be reflected in the database until the next
- * call to _notmuch_message_set_sync. */
+ * call to _notmuch_message_sync. */
 void
 _notmuch_message_set_filename (notmuch_message_t *message,
   const char *filename)
@@ -622,7 +622,7 @@ _notmuch_message_close (notmuch_message_t *message)
  * names to prefix values.
  *
  * This change will not be reflected in the database until the next
- * call to _notmuch_message_set_sync. */
+ * call to _notmuch_message_sync. */
 notmuch_private_status_t
 _notmuch_message_add_term (notmuch_message_t *message,
   const char *prefix_name,
@@ -679,7 +679,7 @@ _notmuch_message_gen_terms (notmuch_message_t *message,
  * names to prefix values.
  *
  * This change will not be reflected in the database until the next
- * call to _notmuch_message_set_sync. */
+ * call to _notmuch_message_sync. */
 notmuch_private_status_t
 _notmuch_message_remove_term (notmuch_message_t *message,
  const char *prefix_name,
-- 
1.6.3.3



[notmuch] keeping a copy of sent mail locally

2009-12-18 Thread Alex Ghitza

Hello,

Many thanks to Marten and Carl for the advice on using scripts for
assigning tags automatically.  It works like a charm.

The next hurdle seems to be dealing with sent mail.  I would like each
message that I send to be saved in my local mail folder and treated the
same as all my other messages -- so it will get indexed and put in the
right thread, etc.  (For example, right now the thread that started with
my question about automatic tags only has the two replies in it, and its
subject is "Re: [notmuch] automatically...")  Bcc-ing myself on every
sent message is suboptimal for a number of reasons: (1) gmail throws
away the bcc-ed copy since it has the same message id as the one sitting
in the gmail sent mail, and so the bcc-ed copy never makes it back to my
local mail;  (2) even if this was working, it would be an unnecessary
waste of bandwidth. 

After looking around for a little bit, the only other option I could see
was to use the FCC mail header.  Unfortunately this wants a filename to
save to (rather than just a directory); so I have to manually add the
FCC: header, put in a filename that doesn't yet exist, type 'y' to
confirm that I want the file to be created.  It would be great if I
could just set the directory where sent mail should go to as a global
option, and then everything would happen automatically without any more
effort from me.

I realise that this is more of an emacs question than a notmuch
question, but I'm hoping that somebody on this list has an elegant
solution to this.


Best,
Alex


-- 
Alex Ghitza -- Lecturer in Mathematics -- The University of Melbourne
-- Australia -- http://www.ms.unimelb.edu.au/~aghitza/