A tool for printing from notmuch

2011-01-29 Thread Sebastian Spaeth
On Sat, 29 Jan 2011 15:09:14 -0500, Jesse Rosenthal wrote:
> So BS is the best I could find for this job

No doubt. I once tried to scrape http://theeconomist.com. Its HTML is so
broken that all parsers broke down. BeautifulSoup at least made it
through and didn't completely fail. So I agree it is the best thing for
surely broken HTML email.

Sebastian


A tool for printing from notmuch

2011-01-29 Thread Sebastian Spaeth
On Fri, 28 Jan 2011 15:25:28 -0500, Jesse Rosenthal wrote:
> Dear all,
> 
> Printing from notmuch is a bit of a pain.

Hi Jesse,

That sounds like a fantastic solution and I will look into it. So far I
have been printing the buffers, which does not include attachments at
all.

I prefer to not have dependencies outside the std lib in python, but for
xml/html parsing, there is really nothing appropriate, it seems.

Thanks

Sebastian


[PATCH 0/4] Versatile date/time parser

2011-01-29 Thread Sebastian Spaeth
On Sat, 29 Jan 2011 19:50:57 +0100, Michal Sojka  wrote:
> On Sat, 29 Jan 2011, Tom Prince wrote:
> > On 2011-01-23, Michal Sojka wrote:
> > > Hi all,
> > > 
> > > the following patch series brings into notmuch date/time parser stolen
> > > from GNU coreutils. It can be applied on top of custom query parser
> > > patches from Austin Clements.
> > > 
> > > > This is RFC and it is not meant for merging.
> > 
> > Another source for date parsing is perhaps date.c from git, which
> > (probably) has much smaller (none?) dependencies.
> 
> Hmm, but Git is GPLv2 and notmuch is GPLv3 and these are not compatible.

Keith Packard once sent date parsing code to this list which I had used
to create my date parsing branch (feels like ages ago). Perhaps that
might be useful? I don't have a message id handy though.

Sebastian


[PATCH 0/4] Versatile date/time parser

2011-01-29 Thread Michal Sojka
On Sat, 29 Jan 2011, Tom Prince wrote:
> On 2011-01-23, Michal Sojka wrote:
> > Hi all,
> > 
> > the following patch series brings into notmuch date/time parser stolen
> > from GNU coreutils. It can be applied on top of custom query parser
> > patches from Austin Clements.
> > 
> > This is RFC and it is not meant for merging.
> 
> Another source for date parsing is perhaps date.c from git, which
> (probably) has much smaller (none?) dependencies.

Hmm, but Git is GPLv2 and notmuch is GPLv3 and these are not compatible.

-Michal


notmuch's idea of concurrency / failing an invocation

2011-01-29 Thread Daniel Kahn Gillmor
On 01/28/2011 08:05 PM, Stewart Smith wrote:
> I'm about at the point where I'm going to take my git mail store
> experiments and get them really to work (and everyone will have to use
> 'notmuch cat' or the like to access the messages)

Would this hypothetical git-based mail store retain the atomicity and
lockless concurrent-access of a maildir?  That is, could it be used in a
server environment?

> which should provide
> both great storage efficiency, much faster backups of your Maildir as
> well as having way fewer paths to traverse checking for new mail.

When you say "backups of your Maildir", do you mean "backups of your
git-based mail store"? Or is this somehow a literal Maildir stored in git?

Intrigued,

--dkg



[PATCH 1/3] new: Do not defer maildir flag synchronization during the first run

2011-01-29 Thread Rob Browning
Austin Clements  writes:

> Sure. I've been wanting to take a crack at notmuch new's atomicity for
> a while. Though you'll have to get through some of my outstanding
> patches. I can only keep so many branches in my head. ]:--8)
>
> rlb, you expressed an interest in solving this problem, too. Did you
> make any headway?

No, I haven't done anything there yet.

Thanks
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4


Re: notmuch's idea of concurrency / failing an invocation

2011-01-29 Thread Stewart Smith
On Thu, 27 Jan 2011 13:40:25 -0500, micah anderson  wrote:
> Due to my harddisk in my laptop being slow (5400RPM), my notmuch
> database growing, and perhaps some fragmentation somewhere, this has
> become *incredibly* annoying for me. I am checking email every 30
> minutes, and I'm nicing and ionicing the processes so I can use my
> machine, but while those processes are running, I'm effectively locked
> out of a good portion of my email. 

I used to use spinning rust and also noticed things were slow. This
is in fact mostly not xapian - but rather crawling the Maildir. I
improved this early on in notmuch history by reducing the number of
seeks needed when traversing the Maildir hierarchy (e.g. stat in
i-node order, which is roughly on-disk order).
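
The inode-order idea can be illustrated with a small Python sketch. This is
not the notmuch implementation (which is C); the function name
stat_in_inode_order is made up for illustration, and the sketch only assumes
that inode numbers roughly follow on-disk order, as stated above.

```python
import os


def stat_in_inode_order(directory):
    """Stat the regular files in `directory`, visiting them by inode number.

    The directory listing is read first, then sorted by inode so that the
    later stat() calls touch the disk in roughly sequential order.
    """
    with os.scandir(directory) as entries:
        files = [
            (entry.inode(), entry.path)
            for entry in entries
            if entry.is_file(follow_symlinks=False)
        ]
    files.sort()  # ascending inode number
    return [(path, os.stat(path)) for _, path in files]
```

On a Maildir this would be called once per cur/ and new/ directory; whether
it helps depends on the filesystem and on the disk being a rotating one.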

I'm about at the point where I'm going to take my git mail store
experiments and get them really to work (and everyone will have to use
'notmuch cat' or the like to access the messages) which should provide
both great storage efficiency, much faster backups of your Maildir as
well as having way fewer paths to traverse checking for new mail.

-- 
Stewart Smith


A tool for printing from notmuch

2011-01-29 Thread Jesse Rosenthal
Hi Sebastian,

On Sat, 29 Jan 2011 20:58:53 +0100, Sebastian Spaeth wrote:
> I prefer to not have dependencies outside the std lib in python, but for
> xml/html parsing, there is really nothing appropriate, it seems.

I agree. And I'll admit I mainly chose BeautifulSoup out of
familiarity. But you really can't count on email html being well-formed
-- just vaguely renderable. And you certainly can't count on it being
xhtml. So the built-in parsers wouldn't be of much help. And, in fact,
if someone pastes a Word doc into Outlook, then the MS-specific tags and
styles will even choke libtidy. 

So BS is the best I could find for this job (putting a title into the
header and a table into the top of the body or html that might or might
not even have a header or a body tag). And it's always available in
Debian/Arch/Fedora/ports/MacPorts.

The alternative, since we're trying to leave the email's html alone, is
to do our business with splits and regexes. But that seems like a bad
road to head down.
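
To make the "bad road" concrete, here is roughly what the regex alternative
could look like in Python. It is only a sketch: the function name
insert_at_top_of_body is invented for illustration and this is not the code
of the printing tool itself.

```python
import re

# Matches an opening body tag with any attributes, in any letter case.
BODY_OPEN = re.compile(r"<body\b[^>]*>", re.IGNORECASE)


def insert_at_top_of_body(html, fragment):
    """Insert `fragment` right after the opening body tag, if there is one."""
    match = BODY_OPEN.search(html)
    if match is None:
        # No body tag at all: fall back to putting the fragment first.
        return fragment + html
    return html[:match.end()] + fragment + html[match.end():]
```

Even this small case is fragile: a "<body" inside a comment, or a ">" inside
an attribute value, is enough to put the fragment in the wrong place, which
is the argument for using a forgiving parser instead.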

Best,
Jesse



[PATCH] Add a few tests for searching LWN emails.

2011-01-29 Thread Matthieu Lemerre

> Yes, I believe this is related to the dot in the name. From my
> recollection a name with an address requires quoting. So the header that
> is currently formatted as:
> 
>   From: LWN.net Weekly Notification 
> 
> should instead be:
> 
>   From: "LWN.net Weekly Notification" 

Hi all,

I had already reported this problem in id:"87ipzvk2xh.fsf at free.fr".

Recent versions of GMime perform more robust parsing that fixes the
problem, but unfortunately Debian only ships old versions of the package.

I don't believe we will be able to make all people from whom we receive
email always send RFC2822-compliant email addresses :)
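
For comparison, Python's standard library takes the conservative route when
it generates such a header: it quotes any display name that contains a dot.
The sketch below only shows that behaviour; the address lwn@example.org is a
placeholder, since the real one is not shown in the quoted header.

```python
from email.utils import formataddr, parseaddr

name = "LWN.net Weekly Notification"
addr = "lwn@example.org"  # placeholder address, not the real sender

# formataddr() wraps the display name in double quotes because of the "."
header_value = formataddr((name, addr))
print(header_value)

# parseaddr() recovers the original name and address from the quoted form
print(parseaddr(header_value))
```

This says nothing about what GMime accepts; it only shows one common way of
producing a header that strict parsers have no trouble with.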

Matthieu


[PATCH 0/4] Versatile date/time parser

2011-01-29 Thread Austin Clements
What about CVS's getdate?  Is GPLv1 compatible?  As far as I can tell,
CVS's getdate depends only on yacc/bison and is probably
back-in-time-biased rather than forward-in-time-biased like the
coreutils getdate.

On Sat, Jan 29, 2011 at 1:50 PM, Michal Sojka  wrote:
> On Sat, 29 Jan 2011, Tom Prince wrote:
>> On 2011-01-23, Michal Sojka wrote:
>> > Hi all,
>> >
>> > the following patch series brings into notmuch date/time parser stolen
>> > from GNU coreutils. It can be applied on top of custom query parser
>> > patches from Austin Clements.
>> >
>> > This is RFC and it is not meant for merging.
>>
>> Another source for date parsing is perhaps date.c from git, which
>> (probably) has much smaller (none?) dependencies.
>
> Hmm, but Git is GPLv2 and notmuch is GPLv3 and these are not compatible.
>
> -Michal


[PATCH] Add a few tests for searching LWN emails.

2011-01-29 Thread Chris Wilson
On Fri, 28 Jan 2011 15:17:54 -0700, Jake Edge  wrote:
> Hi Carl and Thomas, 
> 
> On Sat, 29 Jan 2011 05:59:38 +1000 Carl Worth wrote:
> 
> > Yes, I believe this is related to the dot in the name. From my
> > recollection a name with an address requires quoting. So the header
> > that is currently formatted as:
> > 
> > From: LWN.net Weekly Notification 
> > 
> > should instead be:
> > 
> > From: "LWN.net Weekly Notification" 
> 
> I am by no means an expert, but http://www.ietf.org/rfc/rfc2821.txt
> would seem to indicate that names with a '.' in them don't need to be
> quoted as there are several lines in the Scenarios section that look
> like:
> 
> C: From: John Q. Public 
> 
> unless the problem is XXX.yyy (i.e. no spaces on either side of the
> '.'), but that seems like a pretty arbitrary differentiator (i.e. 'Q. '
> is fine, but 'LWN.net' isn't)

The syntax is defined in http://www.faqs.org/rfcs/rfc2822.html in
particular section 3.2.4. From that it appears the unquoted use of
[a-Z][.a-Z]* is valid. However, I shall leave the intricacies to those
who understand and appreciate the whole problem...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


notmuch's idea of concurrency / failing an invocation

2011-01-29 Thread Mike Kelly
On Fri, 28 Jan 2011 11:57:34 -0500
Austin Clements  wrote:

> Yes, exactly.  All of this.  Unfortunately, Xapian doesn't expose the
> ability to block on the lock (see the fcntl call in
> backends/flint_lock.cc, which is hard-coded to the non-blocking
> F_SETLK instead of F_SETLKW), so we'd either need a new Xapian
> option, or we would just have to wrap our own flock/fcntl lock around
> things as you suggest.

Hrm. Do you know if Xapian upstream would be open to a patch to support
optional blocking locks? We can't be the only ones hitting these sorts
of issues.
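
Until something like that exists upstream, the "wrap our own lock" option
could look roughly like the Python sketch below. It is an illustration under
assumptions: the names blocking_lock and run_serialized and the separate
lock file are hypothetical, and every process touching the database would
have to go through the same wrapper for it to help.

```python
import fcntl
import os
import subprocess
from contextlib import contextmanager


@contextmanager
def blocking_lock(lock_path):
    """Hold an exclusive advisory lock on `lock_path`, waiting if it is taken."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Without LOCK_NB this call waits for the lock (the F_SETLKW
        # behaviour) instead of failing immediately (F_SETLK).
        fcntl.lockf(fd, fcntl.LOCK_EX)
        yield fd
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def run_serialized(lock_path, argv):
    """Run `argv` while holding the lock and return its exit status."""
    with blocking_lock(lock_path):
        return subprocess.call(argv)
```

Usage would be along the lines of run_serialized("/path/to/lockfile",
["notmuch", "new"]), with the lock file path chosen by the user.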

-- 
Mike Kelly


About the json output and the number of results shown.

2011-01-29 Thread Mike Kelly
On Sat, 29 Jan 2011 06:44:40 +1000
Carl Worth  wrote:

> On Wed, 12 Jan 2011 22:39:45 +, Mike Kelly 
> wrote:
> > For starters, if I'm simply trying to retrieve a single message, the
> > interface is rather awkward. I seem to need to do something like:
> > 
> > my $json = `notmuch show --format=json id:$message_id`;
> > my $parsed_json = decode_json($json);
> > my $message = $parsed_json->[0][0][0];
> 
> That does seem fairly awkward, yes. Do you have a suggestion for how
> you'd like the output to be structured instead?

Well, if I ask for a single message, I'd expect to just get a single
message. So, $message = $parsed_json, without the extra single-entry
arrays.

> > And, when I'm doing my search earlier to even find those message
> > ids, I need to do a check to `notmuch count` first to see if I'll
> > even get any results, because the 0 result case is not valid JSON.
> 
> Yikes! That's a bug in notmuch that we should get fixed rather than
> you just working around it. I just started adding a test for this
> case. Currently:
> 
>   notmuch search --format=json "string that matches nothing"
> 
> returns nothing. Presumably, this should return just an empty json
> array instead, (that is, "[]")?

Yeah, should be "[]".
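
In the meantime, a caller can paper over both problems on its own side. The
Python sketch below is only an illustration of that workaround; the helper
names parse_search_output and first_message are made up, and the three
levels of nesting are taken from the Perl snippet quoted above.

```python
import json


def parse_search_output(raw):
    """Parse JSON output, treating empty output as an empty result list."""
    text = raw.strip()
    return json.loads(text) if text else []


def first_message(parsed):
    """Unwrap the [0][0][0] nesting; return None if any level is missing."""
    node = parsed
    for _ in range(3):
        if not isinstance(node, list) or not node:
            return None
        node = node[0]
    return node
```

With this, the extra call to `notmuch count` is no longer needed just to
avoid a parse error on zero results.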

Thanks.

-- 
Mike Kelly


[PATCH] Various small clean-ups to doc ID set code.

2011-01-29 Thread Carl Worth
On Wed, 8 Dec 2010 17:01:53 -0500, Austin Clements  wrote:
> Remove the repeated "sizeof (doc_ids->bitmap[0])" that bothered cworth
> by instead defining macros to compute the word and bit offset of a
> given bit in the bitmap.
> 
> Don't require the caller of _notmuch_doc_id_set_init to pass in a
> correct bound; instead compute it from the array.  This simplifies the
> caller and makes this interface easier to use correctly.
...
> +#define BITMAP_WORD(bit) ((bit) / sizeof (unsigned int))
> +#define BITMAP_BIT(bit) ((bit) % sizeof (unsigned int))

These macros look great, they definitely simplify the code.

>  _notmuch_doc_id_set_init (void *ctx,
> notmuch_doc_id_set_t *doc_ids,
> -   GArray *arr, unsigned int bound)
> +   GArray *arr)
...
> +for (unsigned int i = 0; i < arr->len; i++)
> + max = MAX(max, g_array_index (arr, unsigned int, i));

And computing an argument automatically definitely makes the interface
easier to use. So that's good too. But those two changes are independent
so really need to be in separate commits.

> -if (doc_id >= doc_ids->bound)
> +if (doc_id > doc_ids->max)

And this looks really like a *third* independent change to me.

A code change like the above has the chance to introduce (or fix) an
off-by-one bug---or even leave the code effectively unchanged as the
intent is here.
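
A tiny Python sketch of why the two comparisons can be equivalent. It
assumes, as the patch seems to intend, that the old bound was exactly one
past the largest doc id; the value 41 is arbitrary.

```python
def rejected_by_bound(doc_id, bound):
    """Old check: ids at or past the exclusive bound are rejected."""
    return doc_id >= bound


def rejected_by_max(doc_id, max_id):
    """New check: ids above the inclusive maximum are rejected."""
    return doc_id > max_id


max_id = 41         # arbitrary largest doc id in the set
bound = max_id + 1  # the exclusive bound the caller used to pass in

# Under that assumption both checks agree for every id.
for doc_id in range(100):
    assert rejected_by_bound(doc_id, bound) == rejected_by_max(doc_id, max_id)
```

If a caller ever passed a bound other than max + 1, the two checks would
differ, which is why the change deserves its own commit and message.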

In order to distinguish all of those cases, I'd like to see a change
like this as a minimal change, and described in the commit
message. (Rather than hidden amongst "various cleanups" that are mostly
about replacing some common code with a macro.)

So I'd be happy to see this patch broken up and sent again.

-Carl

-- 
carl.d.worth at intel.com


[PATCH 3/4] Optimize thread search using matched docid sets.

2011-01-29 Thread Carl Worth
On Wed, 8 Dec 2010 16:58:44 -0500, Austin Clements  wrote:
> Now that this is in (and I have a temporary respite from TA duties),
> I'm going to finish up and send out my other ~1.7X improvement, just
> to get it out of my queue.  Then I'll look at making a performance
> regression suite.  Were you thinking of some standard set of timed
> operations wrapped in a little script that can tell you if you've made
> things worse, or something more elaborate?

I recently started making a perf/notmuch-perf script for notmuch (see
below). I was doing this in preparation for my linux.conf.au talk on
notmuch, (though I ended up not talking about performance in concrete
terms).

I don't know how much further I'll run with this now, but if this is a
useful starting place for anyone, let me know and I can obviously add
this to the repository.

So the idea with this script is that the timed operations actually
depend on local data, (your current mail collection as indicated by
NOTMUCH_CONFIG). So the operations aren't standardized to enable
comparison between different people, (unless they also agree on some
common mail collection).

My script as attached runs only "notmuch new" to time the original
indexing. Beyond that I'd like to time some common operations,
(adding a new message, searching for a single message, searching for
many messages, searching for all messages, etc.).

And then on top of this, I'd like to have a little utility that could
compare several different runs captured previously. That would let me do
the regression testing I'd like to ensure we never make performance
worse.
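
As a rough idea of what such a timing and comparison utility might look
like, here is a Python sketch. It is not the attached notmuch-perf script;
the function names and the 10% tolerance are assumptions made for
illustration.

```python
import subprocess
import time


def time_command(argv, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = None
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        elapsed = time.perf_counter() - start
        best = elapsed if best is None else min(best, elapsed)
    return best


def regressions(baseline, current, tolerance=0.10):
    """Names of operations that got slower than baseline by more than `tolerance`."""
    return sorted(
        name
        for name, seconds in current.items()
        if name in baseline and seconds > baseline[name] * (1 + tolerance)
    )
```

Each run would record something like {"new": time_command(["notmuch",
"new"])} against the local mail collection, and regressions() would then
compare two such captured runs.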

Please feel free to run with this or with your own approach as you see
fit.

-Carl

-- next part --
A non-text attachment was scrubbed...
Name: notmuch-perf
Type: application/octet-stream
Size: 603 bytes
Desc: not available
URL: 
<http://notmuchmail.org/pipermail/notmuch/attachments/20110129/46532cee/attachment.obj>


[PATCH] Have to configure and build inside the source directory.

2011-01-29 Thread Carl Worth
> +if ! { : < configure; } 2> /dev/null; then
> +cat <<EOF
> +*** Error: You have to configure and build in the source directory.
> +
> +EOF
> +exit 1
> +fi

Rather than documenting a limitation here, why don't we do what people
actually want.

What do other build systems generally do when running configure from
some other directory? Copy/link the Makefiles and then construct them
carefully such that they can find all the source files?

That doesn't sound like it would be that hard.

-Carl

-- 
carl.d.worth at intel.com


[ANN] notmuch-deliver

2011-01-29 Thread Carl Worth
On Wed, 12 Jan 2011 15:50:24 -0500, Austin Clements  wrote:
> Out of curiosity, has anyone considered using inotify to monitor maildirs
> for new mail to hand to notmuch?  For systems supporting inotify (or
> equivalents), this would have the advantage of being compatible with any
> delivery mechanism, be it a mail server, procmail, or emacs fcc'ing a
> maildir.

The idea has definitely been floated before. If I search for:

notmuch search to:notmuch@notmuchmail.org inotify

in my collection I see 5 messages sent to the list, (other than
yours). But none of those include any code that I have seen.

It sounds like playing with inotify and notmuch might make a fun weekend
project for somebody. Is anyone looking for something like that?

-Carl

-- 
carl.d.worth at intel.com


[ANN] notmuch-deliver

2011-01-29 Thread Carl Worth
On Tue, 11 Jan 2011 16:01:00 +0200, Felipe Contreras  wrote:
> I think this should be part of notmuch itself,

I'll be happy to see any proposed additions for this. (And to the extent
that some of this functionality exists in patches already proposed and
just waiting for me, then I'm already happy about that too!)

> and there should be a
> configuration to use this as Fcc, instead of relying on the mail
> composer. This way both emacs and vim interfaces would share the same
> configuration regarding the Fcc/Bcc preference.

Sharing the configuration as much as possible is definitely good.

But Fcc is going to have to rely quite a bit on the mail
composer. Currently, notmuch isn't involved at all in the sending of a
mail, and it's not until a mail is actually sent that it's time to
deliver the message to the Fcc location. So even getting notmuch to
become informed about the message at Fcc time will require modification
of the mail composer.

-Carl

-- 
carl.d.worth at intel.com


[ANN] notmuch-deliver

2011-01-29 Thread Carl Worth
On Tue, 11 Jan 2011 12:46:38 +0100, Thomas Schwinge wrote:
> > What's the best way to advertise this to potential users?
> 
> I recently put a description and link onto the notmuch web pages.

Great. Thanks for doing this.

> > Should we include a separate utils directory in the notmuch repository
> > with auxiliary programs like this?
> 
> I wouldn't do so.  But that is not a very strong opinion of mine.

Well, your opinion matters quite a bit in a case like this where it's
your utility. I guess I was simply offering the "hosting" of the
repository if you thought it would be useful. I understand the desire to
keep things cleanly packaged separately.

> I'll also take the liberty to put stuff from the mailing list or IRC
> discussions into web pages, for we have to document this notmuch beast
> ;-), and it's better to have a generic place to refer people to, instead
> of discussing the same things more than once.

Yes! Please continue to improve the web pages, and everyone, please feel
free to grab useful stuff from the lists or IRC and stuff them into the
web pages.

Also, if things should get shoved into the notmuch man page, then we
should do that too.

-Carl

-- 
carl.d.worth at intel.com


About the json output and the number of results shown.

2011-01-29 Thread Carl Worth
On Wed, 12 Jan 2011 22:39:45 +, Mike Kelly  wrote:
> For starters, if I'm simply trying to retrieve a single message, the
> interface is rather awkward. I seem to need to do something like:
> 
> my $json = `notmuch show --format=json id:$message_id`;
> my $parsed_json = decode_json($json);
> my $message = $parsed_json->[0][0][0];

That does seem fairly awkward, yes. Do you have a suggestion for how
you'd like the output to be structured instead?
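
As a rough illustration of the nesting described above, here is a Python
counterpart of the Perl snippet. The sample string is a made-up stand-in
shaped only after the [0][0][0] indexing quoted above; the keys inside the
message object are assumptions, not actual notmuch output.

```python
import json

# Hypothetical stand-in for `notmuch show --format=json id:...` output.
# Only the three levels of list nesting come from the thread; the keys
# inside the message object are invented for illustration.
sample_output = '[[[{"id": "example@localhost", "headers": {"Subject": "hello"}}, []]]]'


def first_message(raw):
    """Return the first message object, or None if there is none."""
    if not raw.strip():
        # Empty output (the zero-result case) is not valid JSON.
        return None
    parsed = json.loads(raw)
    try:
        return parsed[0][0][0]
    except (IndexError, KeyError, TypeError):
        return None


message = first_message(sample_output)
print(message["id"])
```

Wrapping the indexing in a helper at least keeps the three-level lookup in
one place on the caller's side.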

> And, when I'm doing my search earlier to even find those message ids, I
> need to do a check to `notmuch count` first to see if I'll even get any
> results, because the 0 result case is not valid JSON.

Yikes! That's a bug in notmuch that we should get fixed rather than you
just working around it. I just started adding a test for this
case. Currently:

notmuch search --format=json "string that matches nothing"

returns nothing. Presumably, this should return just an empty json array
instead, (that is, "[]")?
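
To make the difference concrete, here is a small Python sketch of what a
client sees in each case. The helper name parse_search_output is
hypothetical; it shows a client-side workaround, not a fix in notmuch.

```python
import json


def parse_search_output(raw):
    """Parse search output, treating empty output as zero results."""
    # An empty string is not valid JSON, so json.loads("") raises an
    # error, whereas "[]" parses to an empty list.
    if not raw.strip():
        return []
    return json.loads(raw)


try:
    json.loads("")
    empty_is_valid = True
except ValueError:
    empty_is_valid = False

print(empty_is_valid)             # whether "" parsed as JSON
print(parse_search_output(""))    # workaround for empty output
print(parse_search_output("[]"))  # what an empty array would give
```

If the command emitted "[]" for the zero-result case, the special case in
the helper would no longer be needed.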

-Carl

-- 
carl.d.worth at intel.com


About the json output and the number of results shown.

2011-01-29 Thread Carl Worth
On Wed, 12 Jan 2011 19:37:21 +0100, Christophe-Marie Duquesne  wrote:
> So I am wondering: what is the point of having a tool that is able to
> output json and ending up not using it? Is there a solution to make
> the json output more useable? One solution I've been thinking about
> would be to add an option: the range of results to show (something
> like --range=25:50). Is it doable easily?

This is fairly easy to do, yes. We even had functionality like this
once, and I'll probably even add it back soon, (since a client like the
vim interface isn't able to do the kind of asynchronous processing that
you would really want).

One problem with the ranged output (for "notmuch search" at least) is
that small ranges with large initial offsets will take longer than
expected. This is because in this case notmuch can't directly use
Xapian's range offset support. The user is asking for an offset as a
number of threads, but within Xapian we only have messages stored. So
notmuch will have to search for messages from the beginning, construct a
bunch of useless threads, and then throw those threads away after doing
no more than counting them.
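
The cost can be sketched with a toy model in Python. This is not notmuch
or Xapian code: the list of (message id, thread id) pairs merely stands in
for a message-level index, and all names are hypothetical.

```python
def threads_in_range(messages, offset, limit):
    """Return (thread_ids, messages_examined) for threads [offset, offset+limit).

    `messages` is an iterable of (message_id, thread_id) pairs. Threads are
    numbered in order of first appearance, so the scan has to start at the
    first message even when `offset` is large.
    """
    seen = []        # thread ids in order of first appearance
    seen_set = set()
    examined = 0
    for _message_id, thread_id in messages:
        examined += 1
        if thread_id not in seen_set:
            seen_set.add(thread_id)
            seen.append(thread_id)
            if len(seen) == offset + limit:
                break
    return seen[offset:offset + limit], examined


# 100 messages, two per thread, giving 50 threads.
messages = [("m%d" % i, "t%d" % (i // 2)) for i in range(100)]

first_page, first_cost = threads_in_range(messages, offset=0, limit=5)
late_page, late_cost = threads_in_range(messages, offset=40, limit=5)
print(first_page, first_cost)
print(late_page, late_cost)
```

Both calls return five threads, but the second one has to walk past the
messages of the 40 skipped threads before it can return anything.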

The inefficiency in this API was one of the reasons I dropped this
functionality before. It's pretty ugly. But I don't see a really good
answer for that.

> feature request. In any case, do you have any proposal for making
> sense of this json output without modifications in the notmuch CLI?

We've run into basically the same issue with the emacs interface. We've
been avoiding using the json output precisely because the emacs JSON
parsing would need to see all the output before it could start
parsing. And that wouldn't give us the responsive user interface that we
want.

One idea I've had for this is to change the output (perhaps with a
command-line option) to avoid emitting the outer array. That is, the
results would instead be a series of independent JSON objects rather
than a single JSON object. That should let the application treat things
quickly by simply calling the JSON parser for each complete
object. (Though, here, the application would likely want a cheap way to
know when the input represented a complete object.)
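
A minimal sketch of how a client might consume such a stream, assuming the
output is a plain sequence of JSON objects separated by whitespace (no such
output mode exists in this thread; the "thread" key is made up). It relies
on json.JSONDecoder.raw_decode from the Python standard library, which
parses one value and reports where it ended.

```python
import json

_decoder = json.JSONDecoder()


def drain_complete_objects(buffer):
    """Split `buffer` into (complete_objects, unparsed_remainder)."""
    objects = []
    pos = 0
    while True:
        # Skip whitespace between objects.
        while pos < len(buffer) and buffer[pos].isspace():
            pos += 1
        if pos == len(buffer):
            break
        try:
            obj, end = _decoder.raw_decode(buffer, pos)
        except ValueError:
            # Treated as an incomplete object: keep it until more input
            # arrives. Malformed input looks the same to this sketch.
            break
        objects.append(obj)
        pos = end
    return objects, buffer[pos:]


# Simulate output arriving in two chunks, split in the middle of an object.
chunks = ['{"thread": "0001"}\n{"thr', 'ead": "0002"}\n']
pending = ""
received = []
for chunk in chunks:
    objects, pending = drain_complete_objects(pending + chunk)
    received.extend(objects)
print(received)
```

Trying the parser and catching the failure is a crude way to detect a
complete object; a delimiter such as one object per line would give the
cheaper check mentioned above.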

If anyone wants to help improve our JSON output here, then that would be
great.

For any change to the structure of the JSON output, I'd also like to see
some documentation added to specify that structure clearly.

-Carl

-- 
carl.d.worth at intel.com


About the json output and the number of results shown.

2011-01-29 Thread Carl Worth
On Thu, 13 Jan 2011 19:46:29 +0100, Christophe-Marie Duquesne  wrote:
> I've had a look at the python libnotmuch documentation. My problem
> with this API is that, unless I did not read it correctly, if I use
> one of its functions in a threaded fashion, I still have to wait for
> this function to finish until I get results.

The search function should return very quickly (or pretty close to that
anyway). It's only when you start iterating through the results that
there's a lot of time being spent in the library functions.

> When using the command
> line tool, I can process the text as it gets printed on stdout, and I
> have data to show to the user even though notmuch has not finished
> outputting it...

This functionality of the command-line tool is implemented with the same
library functions you would call. So you should get exactly the same
behavior that you want by calling the library directly.
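
The pattern can be pictured with a plain Python generator. This is a
generic stand-in, not the notmuch library API: search() and its result
strings are hypothetical, and work_log only records when work happens.

```python
work_log = []  # records when each result is actually computed


def search(query):
    """Stand-in for a library search call: returns a lazy iterator at once."""
    def results():
        for n in range(3):
            work_log.append(n)  # the expensive part happens here
            yield "%s result %d" % (query, n)
    return results()


results = search("tag:inbox")
work_before_iterating = list(work_log)  # still empty: nothing computed yet

first = next(results)                   # first result is available now
work_after_first = list(work_log)

remaining = list(results)               # the rest is computed on demand
print(work_before_iterating, work_after_first, first, remaining)
```

A caller written this way can display each result as soon as it is
yielded, which is the behavior the command-line tool shows on stdout.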

Please let us know if that's not the case.

-Carl

-- 
carl.d.worth at intel.com


[PATCH] Add a few tests for searching LWN emails.

2011-01-29 Thread Carl Worth
On Thu, 27 Jan 2011 03:31:49 -0700, Thomas Schwinge wrote:
> These tests should pass -- but they currently don't.

Hi Thomas,

Thanks for sending these test cases. This is actually my favorite way to
receive bug reports. I really appreciate it!

> What we get from these emails, is an author named ``LWN.net'', and the
> ``Weekly Notification'' / ``Mailing Lists'' bits are stripped away.  I
> suspect this may be a misinterpretation in the notmuch address parser,
> related to the dot in the name.  I have not yet looked at the relevant
> code.

Yes, I believe this is related to the dot in the name. From my
recollection a name with an address requires quoting. So the header that
is currently formatted as:

From: LWN.net Weekly Notification 

should instead be:

From: "LWN.net Weekly Notification" 

I verified that adding this quoting fixes the tests[*]. I've CCed
lwn at lwn.net in case the kind editors there would like to add quoting
here.

I don't have any of the relevant RFCs in front of me now, so I don't
know exactly how an address with this missing quoting should be
parsed. But I recall having failures trying to send mail to an address
like this formatted without the quoting.

So I'm not sure whether anything could reasonably be changed here in
GMime. I definitely do want to fix notmuch so that it indexes all of the
text here regardless of whether it's formatted in an RFC-compliant way.
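
For comparison only, here is how the two forms can be fed to a different
parser, Python's standard email.utils.parseaddr. It says nothing about
GMime's behavior. The address is a placeholder, since the real one was
scrubbed from the archive. As far as I can tell, RFC 2822 lists a bare "."
inside a display name only under its obsolete syntax (obs-phrase), which
may be why parsers differ on the unquoted form.

```python
from email.utils import parseaddr

# Placeholder address: the real one does not appear in the archive.
unquoted = "LWN.net Weekly Notification <list@example.org>"
quoted = '"LWN.net Weekly Notification" <list@example.org>'

for header in (unquoted, quoted):
    name, address = parseaddr(header)
    print(repr(name), repr(address))
```

A parser that keeps the whole display name for both forms would let all of
the words be indexed, whichever way the sender formatted the header.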

-Carl

[*] Except for test VI which has a bug in that it searches for the word
"mailing" in the subject header, but no such word exists in the
message's subject header.


Re: [PATCH] Add a few tests for searching LWN emails.

2011-01-29 Thread Matthieu Lemerre

> Yes, I believe this is related to the dot in the name. From my
> recollection a name with an address requires quoting. So the header that
> is currently formatted as:
> 
>   From: LWN.net Weekly Notification 
> 
> should instead be:
> 
>   From: "LWN.net Weekly Notification" 

Hi all,

I had already reported this problem in id:"87ipzvk2xh@free.fr".

Recent versions of GMime perform more robust parsing that fixes the
problem, but unfortunately Debian only ships old versions of the package.

I don't believe we will be able to make all people from whom we receive
email always send RFC2822-compliant email addresses :)

Matthieu
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH] Add a few tests for searching LWN emails.

2011-01-29 Thread Chris Wilson
On Fri, 28 Jan 2011 15:17:54 -0700, Jake Edge  wrote:
> Hi Carl and Thomas, 
> 
> On Sat, 29 Jan 2011 05:59:38 +1000 Carl Worth wrote:
> 
> > Yes, I believe this is related to the dot in the name. From my
> > recollection a name with an address requires quoting. So the header
> > that is currently formatted as:
> > 
> > From: LWN.net Weekly Notification 
> > 
> > should instead be:
> > 
> > From: "LWN.net Weekly Notification" 
> 
> I am by no means an expert, but http://www.ietf.org/rfc/rfc2821.txt
> would seem to indicate that names with a '.' in them don't need to be
> quoted as there are several lines in the Scenarios section that look
> like:
> 
> C: From: John Q. Public 
> 
> unless the problem is XXX.yyy (i.e. no spaces on either side of the
> '.'), but that seems like a pretty arbitrary differentiator (i.e. 'Q. '
> is fine, but 'LWN.net' isn't)

The syntax is defined in http://www.faqs.org/rfcs/rfc2822.html in
particular section 3.2.4. From that it appears the unquoted use of
[a-Z][.a-Z]* is valid. However, I shall leave the intricacies to those
who understand and appreciate the whole problem...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [PATCH] Add a few tests for searching LWN emails.

2011-01-29 Thread Jake Edge
Hi Carl and Thomas, 

On Sat, 29 Jan 2011 05:59:38 +1000 Carl Worth wrote:

> Yes, I believe this is related to the dot in the name. From my
> recollection a name with an address requires quoting. So the header
> that is currently formatted as:
> 
>   From: LWN.net Weekly Notification 
> 
> should instead be:
> 
>   From: "LWN.net Weekly Notification" 

I am by no means an expert, but http://www.ietf.org/rfc/rfc2821.txt
would seem to indicate that names with a '.' in them don't need to be
quoted as there are several lines in the Scenarios section that look
like:

C: From: John Q. Public 

unless the problem is XXX.yyy (i.e. no spaces on either side of the
'.'), but that seems like a pretty arbitrary differentiator (i.e. 'Q. '
is fine, but 'LWN.net' isn't)

aren't the '<' and '>' the real delimiters here?

If we do need to fix something, though, we'd be more than happy to do
so I suspect ...

jake

-- 
Jake Edge - LWN - j...@lwn.net - http://lwn.net