Add new dump/restore format and batch tagging.

2012-11-18 Thread Ethan Glasser-Camp
Ethan Glasser-Camp  writes:

> - Patch 4 still has a subject line that ends in a period. I don't think
>   this is mandatory for everyone but some people consider it best
>   practice.

Best practice, of course, would be to remove the period at the end of
the subject line. Patch 12 also has one.

Ethan


Add new dump/restore format and batch tagging.

2012-11-18 Thread Ethan Glasser-Camp
david at tethera.net writes:

> which was revied by Tomi and Ethan. I think I implemented their
> suggestions.

Actually, I don't think you implemented all of mine.

- Patch 4 still has a subject line that ends in a period. I don't think
  this is mandatory for everyone but some people consider it best
  practice. You still have the spelling "seperate" (also, patch 7 has
  "seperated").

- In patch 4, I still think this would look better if you switched the
  branches.

+if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+   _notmuch_message_add_term (message, "type", "mail");
+} else {
+   return NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+}

- In patch 5, I still think this looks funny:

+   int num_tags = random () % (max_tags + 1);
+   int this_mid_len = random () % message_id_len + 1;

Additionally, I would like a check that message_id_len is reasonable
(more than 1, say).

- Patch 8:

+int
+parse_tag_stream (void *ctx,
+ notmuch_database_t *notmuch,
+ FILE *input,
+ tag_callback_t callback,
+ tag_op_flag_t flags,
+ volatile sig_atomic_t *interrupted);

Am I going crazy, or does this function not get implemented?

- Patch 11: DUMP_FORMAT can change from DUMP_FORMAT_BATCH_TAG to
  DUMP_FORMAT_SUP if a paren is anywhere on the first line. I'd prefer
  this only happens if we have DUMP_FORMAT_AUTO.

You probably want to move the comment "Dump output is..." closer to the
regex.

I don't see why it's necessary to have this:

+   query_string = query_string + 3;

- Patch 13:

+cp /dev/null EXPECTED.$test_count
+notmuch dump --format=batch-tag -- from:cworth |\
+awk "{ print \"+$enc1 +$enc2 +$enc3 -- \" \$5 }" > EXPECTED.$test_count

What's the purpose of the CP here? It just creates an empty file. You
could do it with touch. Why even bother since you're going to fill it
with stuff in a second? Actually, care to explain the dump and sed call?
It looks like you're using this dump to encode the message IDs. If
format=batch-tag skips a message for some reason or terminates early,
the test won't fail.

- Patch 16:

+message-ids may contained arbitrary non-null characters. Note also

I think this should be "may contain", or something else entirely if
you're trying to describe past behavior of sup?

Ethan


Add new dump/restore format and batch tagging.

2012-11-18 Thread da...@tethera.net
This is a patch series with some history, if you will forgive the
semi-inadvertant pun. I list that at that end, for the curious.

If this series goes in, in the future we might consider whether
restore --accumulate provides important functionality over batch
tagging; at the moment I suggest leaving it as the extra code to
support it is minimal, and it does support peoples existing scripts
using the old dump/restore format.

History
---

About 1 year ago, Petter Reinholdtsen observed a problem with dumping
and restoring message-id's with spaces in them. 

id:2flfwhht87d.fsf at diskless.uio.no

There followed a proposed fix

  id:1323808075-7417-1-git-send-email-david at tethera.net

Which Dmitry had a few helpful things to say about the hex encoding
libs.

Jani took that foundation and proposed two versions of the batch
tagging, most recently at

id:cover.1334404979.git.jani at nikula.org. 

There was some discussion with Jamie about the file format for batch
tagging in the thread

 id:cover.1333231401.git.jani at nikula.org

id:1323808075-7417-1-git-send-email-david at tethera.net
The first 6 of these patches obsolete the series

id:1345382314-5330-1-git-send-email-david at tethera.net

which was revied by Tomi and Ethan. I think I implemented their
suggestions.

Although I didn't re-read that whole thread, I believe this version of
the patches address's Jamie's concerns by using exactly the same
format for restore and tag --batch (renamed from Jani's choice of
--stdin).



Re: Add new dump/restore format and batch tagging.

2012-11-18 Thread Ethan Glasser-Camp
Ethan Glasser-Camp  writes:

> - Patch 4 still has a subject line that ends in a period. I don't think
>   this is mandatory for everyone but some people consider it best
>   practice.

Best practice, of course, would be to remove the period at the end of
the subject line. Patch 12 also has one.

Ethan
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: Add new dump/restore format and batch tagging.

2012-11-18 Thread Ethan Glasser-Camp
da...@tethera.net writes:

> which was revied by Tomi and Ethan. I think I implemented their
> suggestions.

Actually, I don't think you implemented all of mine.

- Patch 4 still has a subject line that ends in a period. I don't think
  this is mandatory for everyone but some people consider it best
  practice. You still have the spelling "seperate" (also, patch 7 has
  "seperated").

- In patch 4, I still think this would look better if you switched the
  branches.

+if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+   _notmuch_message_add_term (message, "type", "mail");
+} else {
+   return NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+}

- In patch 5, I still think this looks funny:

+   int num_tags = random () % (max_tags + 1);
+   int this_mid_len = random () % message_id_len + 1;

Additionally, I would like a check that message_id_len is reasonable
(more than 1, say).

- Patch 8:

+int
+parse_tag_stream (void *ctx,
+ notmuch_database_t *notmuch,
+ FILE *input,
+ tag_callback_t callback,
+ tag_op_flag_t flags,
+ volatile sig_atomic_t *interrupted);

Am I going crazy, or does this function not get implemented?

- Patch 11: DUMP_FORMAT can change from DUMP_FORMAT_BATCH_TAG to
  DUMP_FORMAT_SUP if a paren is anywhere on the first line. I'd prefer
  this only happens if we have DUMP_FORMAT_AUTO.

You probably want to move the comment "Dump output is..." closer to the
regex.

I don't see why it's necessary to have this:

+   query_string = query_string + 3;

- Patch 13:

+cp /dev/null EXPECTED.$test_count
+notmuch dump --format=batch-tag -- from:cworth |\
+awk "{ print \"+$enc1 +$enc2 +$enc3 -- \" \$5 }" > EXPECTED.$test_count

What's the purpose of the CP here? It just creates an empty file. You
could do it with touch. Why even bother since you're going to fill it
with stuff in a second? Actually, care to explain the dump and sed call?
It looks like you're using this dump to encode the message IDs. If
format=batch-tag skips a message for some reason or terminates early,
the test won't fail.

- Patch 16:

+message-ids may contained arbitrary non-null characters. Note also

I think this should be "may contain", or something else entirely if
you're trying to describe past behavior of sup?

Ethan
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Add new dump/restore format and batch tagging.

2012-11-18 Thread david
This is a patch series with some history, if you will forgive the
semi-inadvertant pun. I list that at that end, for the curious.

If this series goes in, in the future we might consider whether
restore --accumulate provides important functionality over batch
tagging; at the moment I suggest leaving it as the extra code to
support it is minimal, and it does support peoples existing scripts
using the old dump/restore format.

History
---

About 1 year ago, Petter Reinholdtsen observed a problem with dumping
and restoring message-id's with spaces in them. 

id:2flfwhht87d@diskless.uio.no

There followed a proposed fix

  id:1323808075-7417-1-git-send-email-da...@tethera.net

Which Dmitry had a few helpful things to say about the hex encoding
libs.

Jani took that foundation and proposed two versions of the batch
tagging, most recently at

id:cover.1334404979.git.j...@nikula.org. 

There was some discussion with Jamie about the file format for batch
tagging in the thread

 id:cover.1333231401.git.j...@nikula.org

id:1323808075-7417-1-git-send-email-da...@tethera.net
The first 6 of these patches obsolete the series

id:1345382314-5330-1-git-send-email-da...@tethera.net

which was revied by Tomi and Ethan. I think I implemented their
suggestions.

Although I didn't re-read that whole thread, I believe this version of
the patches address's Jamie's concerns by using exactly the same
format for restore and tag --batch (renamed from Jani's choice of
--stdin).

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


New dump/restore format.

2012-01-14 Thread David Bremner
This version of this series add fairly extensive testing with strange
message ids full of spaces and punctuation, and some documentation.



New dump/restore format.

2012-01-14 Thread David Bremner
This version of this series add fairly extensive testing with strange
message ids full of spaces and punctuation, and some documentation.

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


New Dump/Restore Format

2011-12-13 Thread David Bremner
Hi All;

There are some style/doc issues remaining, but because bugs in dump
and restore really suck, I thought I would ask for early feedback on
functionality.  I'm particularly interested in how the new dump format
works for weird message-ids (spaces and so on). If you have public
messages with tricky message-id's, I'd appreciate adding those
messages to the test suite.

Things to bikeshed now: name(s) of the formats; sup and notmuch are
maybe not ideal.  The format itself? The encoding format? The latter
is chosen for compatibility with nmbug, but we could discussing using
a bigger character set.

Things I know about

  - not enough tests
  - no man page, online docs.
  - no API docs for hex_encode/blah.

I think the code in hex-escape.[ch] is otherwise ready for (second)
review; I'll probably do another review of the code in
notmuch-(dump|restore).c myself for clarity, so you might want to wait
for the next round before diving in.

If you prefer pull from git, you can get these patches on branch "new-dump" 
at git://pivot.cs.unb.ca/notmuch.git



New Dump/Restore Format

2011-12-13 Thread David Bremner
Hi All;

There are some style/doc issues remaining, but because bugs in dump
and restore really suck, I thought I would ask for early feedback on
functionality.  I'm particularly interested in how the new dump format
works for weird message-ids (spaces and so on). If you have public
messages with tricky message-id's, I'd appreciate adding those
messages to the test suite.

Things to bikeshed now: name(s) of the formats; sup and notmuch are
maybe not ideal.  The format itself? The encoding format? The latter
is chosen for compatibility with nmbug, but we could discussing using
a bigger character set.

Things I know about
  
  - not enough tests
  - no man page, online docs.
  - no API docs for hex_encode/blah.

I think the code in hex-escape.[ch] is otherwise ready for (second)
review; I'll probably do another review of the code in
notmuch-(dump|restore).c myself for clarity, so you might want to wait
for the next round before diving in.

If you prefer pull from git, you can get these patches on branch "new-dump" 
at git://pivot.cs.unb.ca/notmuch.git

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch