Re: Database location bug when database does not already exist
Hello,

On Sat 03 Sep 2022 at 08:34AM -03, David Bremner wrote:

> Sean Whitton writes:
>
>> Hello,
>>
>> In the following situation:
>>
>> - only notmuch config is ~/.notmuch-config
>> - database.mail_root is set
>> - database.path is not set
>> - notmuch database does not yet exist
>>
>> then notmuch wants to create its db under mail_root/.notmuch, contrary
>> to notmuch-config(1).
>>
>
> This bug should be fixed as of commit 84e4e130e2c920b3dee91901582c4ab6276e2630

Nice, thanks!

-- 
Sean Whitton
___
notmuch mailing list -- notmuch@notmuchmail.org
To unsubscribe send an email to notmuch-le...@notmuchmail.org
[PATCH 2/3] lib: parse index.as_text
We pre-parse into a list of compiled regular expressions to avoid
calling regcomp on the hot (indexing) path. As explained in the code
comment, this cannot be done lazily with reasonable error reporting,
at least not without touching a lot of the code in index.cc.
---
 lib/database-private.h | 4
 lib/open.cc | 53 ++
 2 files changed, 57 insertions(+)

diff --git a/lib/database-private.h b/lib/database-private.h
index b9be4e22..61232f1a 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -291,6 +291,10 @@ struct _notmuch_database {
 
     /* Track what parameters were specified when opening */
     notmuch_open_param_t params;
+
+    /* list of regular expressions to check for text indexing */
+    regex_t *index_as_text;
+    size_t index_as_text_length;
 };
 
 /* Prior to database version 3, features were implied by the database
diff --git a/lib/open.cc b/lib/open.cc
index 67ff868c..54d1faf3 100644
--- a/lib/open.cc
+++ b/lib/open.cc
@@ -320,6 +320,8 @@ _alloc_notmuch (const char *database_path, const char *config_path, const char *
     notmuch->transaction_count = 0;
     notmuch->transaction_threshold = 0;
     notmuch->view = 1;
+    notmuch->index_as_text = NULL;
+    notmuch->index_as_text_length = 0;
 
     notmuch->params = NOTMUCH_PARAM_NONE;
     if (database_path)
@@ -427,6 +429,53 @@ _load_database_state (notmuch_database_t *notmuch)
         notmuch, notmuch->xapian_db->get_uuid ().c_str ());
 }
 
+/* XXX This should really be done lazily, but the error reporting path in the indexing code
+ * would need to be redone to report any errors.
+ */
+notmuch_status_t
+_ensure_index_as_text (notmuch_database_t *notmuch, char **message)
+{
+    int nregex = 0;
+    regex_t *regexv = NULL;
+
+    if (notmuch->index_as_text)
+        return NOTMUCH_STATUS_SUCCESS;
+
+    for (notmuch_config_values_t *list = notmuch_config_get_values (notmuch,
+                                                                    NOTMUCH_CONFIG_INDEX_AS_TEXT);
+         notmuch_config_values_valid (list);
+         notmuch_config_values_move_to_next (list)) {
+        regex_t *new_regex;
+        int rerr;
+        const char *str = notmuch_config_values_get (list);
+        size_t len = strlen (str);
+
+        /* str must be non-empty, because n_c_get_values skips empty
+         * strings */
+        assert (len > 0);
+
+        regexv = talloc_realloc (notmuch, regexv, regex_t, nregex + 1);
+        new_regex = &regexv[nregex];
+
+        rerr = regcomp (new_regex, str, REG_EXTENDED | REG_NOSUB);
+        if (rerr) {
+            size_t error_size = regerror (rerr, new_regex, NULL, 0);
+            char *error = (char *) talloc_size (str, error_size);
+
+            regerror (rerr, new_regex, error, error_size);
+            IGNORE_RESULT (asprintf (message, "Error in index.as_text: %s: %s\n", error, str));
+
+            return NOTMUCH_STATUS_ILLEGAL_ARGUMENT;
+        }
+        nregex++;
+    }
+
+    notmuch->index_as_text = regexv;
+    notmuch->index_as_text_length = nregex;
+
+    return NOTMUCH_STATUS_SUCCESS;
+}
+
 static notmuch_status_t
 _finish_open (notmuch_database_t *notmuch,
               const char *profile,
@@ -531,6 +580,10 @@ _finish_open (notmuch_database_t *notmuch,
     if (status)
         goto DONE;
 
+    status = _ensure_index_as_text (notmuch, &message);
+    if (status)
+        goto DONE;
+
     autocommit_str = notmuch_config_get (notmuch, NOTMUCH_CONFIG_AUTOCOMMIT);
     if (unlikely (! autocommit_str)) {
         INTERNAL_ERROR ("missing configuration for autocommit");
-- 
2.35.2
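A minimal standalone sketch of the compile-once idea above, assuming only POSIX <regex.h> and plain calloc/free in place of talloc: every pattern is compiled with regcomp (REG_EXTENDED | REG_NOSUB) when the configuration is loaded, and the first bad pattern is reported through regerror. The helper names compile_patterns and free_patterns are hypothetical and are not part of libnotmuch.

```c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: release the first n compiled expressions and the array. */
static void
free_patterns (regex_t *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        regfree (&v[i]);
    free (v);
}

/* Hypothetical helper: compile every pattern once, up front, so that the
 * indexing path only ever calls regexec.  Returns NULL (after printing
 * the regerror() text) on the first pattern that fails to compile. */
static regex_t *
compile_patterns (const char *const *patterns, size_t n)
{
    regex_t *v = calloc (n ? n : 1, sizeof (regex_t));

    if (v == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        int rerr = regcomp (&v[i], patterns[i], REG_EXTENDED | REG_NOSUB);

        if (rerr != 0) {
            char msg[256];

            regerror (rerr, &v[i], msg, sizeof msg);
            fprintf (stderr, "Error in index.as_text: %s: %s\n", msg, patterns[i]);
            free_patterns (v, i);   /* only entries 0..i-1 were compiled */
            return NULL;
        }
    }
    return v;
}
```

This sketch releases the already-compiled expressions with regfree on the error path, since a plain free of the array would not release the memory regcomp allocates internally.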
[PATCH 1/3] lib: add config key INDEX_AS_TEXT
Higher-level processing (as a list of regular expressions) and
documentation will follow.
---
 lib/config.cc | 3 +++
 lib/notmuch.h | 1 +
 test/T030-config.sh | 1 +
 test/T055-path-config.sh | 1 +
 test/T590-libconfig.sh | 5 +
 5 files changed, 11 insertions(+)

diff --git a/lib/config.cc b/lib/config.cc
index 503a0c8b..2323860d 100644
--- a/lib/config.cc
+++ b/lib/config.cc
@@ -599,6 +599,8 @@ _notmuch_config_key_to_string (notmuch_config_key_t key)
         return "database.autocommit";
     case NOTMUCH_CONFIG_EXTRA_HEADERS:
         return "show.extra_headers";
+    case NOTMUCH_CONFIG_INDEX_AS_TEXT:
+        return "index.as_text";
     default:
         return NULL;
     }
@@ -642,6 +644,7 @@ _notmuch_config_default (notmuch_database_t *notmuch, notmuch_config_key_t key)
         else
             email = _get_email_from_passwd_file (notmuch);
         return email;
+    case NOTMUCH_CONFIG_INDEX_AS_TEXT:
     case NOTMUCH_CONFIG_NEW_IGNORE:
         return "";
     case NOTMUCH_CONFIG_AUTOCOMMIT:
diff --git a/lib/notmuch.h b/lib/notmuch.h
index 0b0540b1..935a8d59 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -2558,6 +2558,7 @@ typedef enum {
     NOTMUCH_CONFIG_USER_NAME,
     NOTMUCH_CONFIG_AUTOCOMMIT,
     NOTMUCH_CONFIG_EXTRA_HEADERS,
+    NOTMUCH_CONFIG_INDEX_AS_TEXT,
     NOTMUCH_CONFIG_LAST
 } notmuch_config_key_t;
 
diff --git a/test/T030-config.sh b/test/T030-config.sh
index 43bbce31..ea0b4012 100755
--- a/test/T030-config.sh
+++ b/test/T030-config.sh
@@ -57,6 +57,7 @@ database.mail_root=MAIL_DIR
 database.path=MAIL_DIR
 foo.list=this;is another;list value;
 foo.string=this is another string value
+index.as_text=
 maildir.synchronize_flags=true
 new.ignore=
 new.tags=unread;inbox
diff --git a/test/T055-path-config.sh b/test/T055-path-config.sh
index fe295324..efc79e8b 100755
--- a/test/T055-path-config.sh
+++ b/test/T055-path-config.sh
@@ -299,6 +299,7 @@ database.backup_dir
 database.hook_dir
 database.mail_root=MAIL_DIR
 database.path
+index.as_text=
 maildir.synchronize_flags=true
 new.ignore=
 new.tags=unread;inbox
diff --git a/test/T590-libconfig.sh b/test/T590-libconfig.sh
index 26a1f033..9326ba3e 100755
--- a/test/T590-libconfig.sh
+++ b/test/T590-libconfig.sh
@@ -440,6 +440,7 @@ cat <<'EOF' >EXPECTED
 10: 'USER_FULL_NAME'
 11: '8000'
 12: 'NULL'
+13: ''
 == stderr ==
 EOF
 unset MAILDIR
@@ -725,6 +726,7 @@ test_expect_equal_file EXPECTED OUTPUT
 test_begin_subtest "list by keys (ndlc)"
 notmuch config set search.exclude_tags "foo;bar;fub"
 notmuch config set new.ignore "sekrit_junk"
+notmuch config set index.as_text "text/"
 cat c_head2 - c_tail <<'EOF' | test_C ${MAIL_DIR} %NULL% %NULL%
 {
     notmuch_config_key_t key;
@@ -751,6 +753,7 @@ cat <<'EOF' >EXPECTED
 10: 'Notmuch Test Suite'
 11: '8000'
 12: 'NULL'
+13: 'text/'
 == stderr ==
 EOF
 test_expect_equal_file EXPECTED OUTPUT
@@ -785,6 +788,7 @@ cat <<'EOF' >EXPECTED
 10: 'USER_FULL_NAME'
 11: '8000'
 12: 'NULL'
+13: ''
 == stderr ==
 EOF
 test_expect_equal_file EXPECTED OUTPUT.clean
@@ -856,6 +860,7 @@ database.backup_dir MAIL_DIR/.notmuch/backups
 database.hook_dir MAIL_DIR/.notmuch/hooks
 database.mail_root MAIL_DIR
 database.path MAIL_DIR
+index.as_text text/
 key with spaces value, with, spaces!
 maildir.synchronize_flags true
 new.ignore sekrit_junk
-- 
2.35.2
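Going by the test expectations above (for example `new.tags=unread;inbox`), list-valued keys such as index.as_text are written in the config file as `;`-separated strings. A minimal sketch of splitting such a value into non-empty items, assuming plain `;` separators with no escaping; skipping empty items mirrors the remark in patch 2/3 that the value iterator skips empty strings. split_list is a hypothetical name, not a libnotmuch function.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split a ';'-separated value into at most max
 * non-empty items.  Each item is malloc'd and must be freed by the
 * caller.  Returns the number of items stored in items[]. */
static size_t
split_list (const char *value, char **items, size_t max)
{
    size_t n = 0;
    const char *p = value;

    while (*p != '\0' && n < max) {
        size_t len = strcspn (p, ";");   /* length up to the next ';' or end */

        if (len > 0) {                   /* skip empty entries, e.g. "a;;b" or a trailing ';' */
            char *item = malloc (len + 1);

            if (item == NULL)
                break;
            memcpy (item, p, len);
            item[len] = '\0';
            items[n++] = item;
        }
        p += len;
        if (*p == ';')
            p++;
    }
    return n;
}
```

With this sketch, `text/;application/x-diff` yields the two items `text/` and `application/x-diff`, while an empty value (the default for index.as_text) yields none.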
[PATCH 3/3] lib: index attachments with mime types matching index.as_text
Instead of skipping indexing of all attachments, we check for a
(user-configured) MIME type that is indexable as text.
---
 doc/man1/notmuch-config.rst | 7 +++
 lib/database.cc | 12
 lib/index.cc | 25 ++---
 lib/notmuch-private.h | 4
 test/T050-new.sh | 37 -
 5 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/doc/man1/notmuch-config.rst b/doc/man1/notmuch-config.rst
index 388315f6..caa3bd65 100644
--- a/doc/man1/notmuch-config.rst
+++ b/doc/man1/notmuch-config.rst
@@ -122,6 +122,13 @@ paths are presumed relative to `$HOME` for items in section
 
    Default tag prefix (filter) for :any:`notmuch-git`.
 
+.. nmconfig:: index.as_text
+
+   List of regular expressions (without delimiters) for MIME types to
+   be indexed as text. Currently this applies only to attachments.
+
+   History: This configuration value was introduced in notmuch 0.38.
+
 .. nmconfig:: index.decrypt
 
    Policy for decrypting encrypted messages during indexing. Must be
diff --git a/lib/database.cc b/lib/database.cc
index c05d70d3..6b962a15 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1573,3 +1573,15 @@ notmuch_database_status_string (const notmuch_database_t *notmuch)
 {
     return notmuch->status_string;
 }
+
+bool
+_notmuch_database_indexable_as_text (notmuch_database_t *notmuch, const char *mime_string)
+{
+    for (size_t i = 0; i < notmuch->index_as_text_length; i++) {
+        if (regexec (&notmuch->index_as_text[i], mime_string, 0, NULL, 0) == 0) {
+            return true;
+        }
+    }
+
+    return false;
+}
diff --git a/lib/index.cc b/lib/index.cc
index 728bfb22..629dcb22 100644
--- a/lib/index.cc
+++ b/lib/index.cc
@@ -380,6 +380,23 @@ _index_pkcs7_part (notmuch_message_t *message,
                    GMimeObject *part,
                    _notmuch_message_crypto_t *msg_crypto);
 
+static bool
+_indexable_as_text (notmuch_message_t *message, GMimeObject *part)
+{
+    GMimeContentType *content_type = g_mime_object_get_content_type (part);
+    notmuch_database_t *notmuch = notmuch_message_get_database (message);
+
+    if (content_type) {
+        char *mime_string = g_mime_content_type_get_mime_type (content_type);
+        if (mime_string) {
+            bool ret = _notmuch_database_indexable_as_text (notmuch, mime_string);
+            g_free (mime_string);
+            return ret;
+        }
+    }
+    return false;
+}
+
 /* Callback to generate terms for each mime part of a message. */
 static void
 _index_mime_part (notmuch_message_t *message,
@@ -497,9 +514,11 @@ _index_mime_part (notmuch_message_t *message,
         _notmuch_message_add_term (message, "tag", "attachment");
         _notmuch_message_gen_terms (message, "attachment", filename);
 
-        /* XXX: Would be nice to call out to something here to parse
-         * the attachment into text and then index that. */
-        goto DONE;
+        if (! _indexable_as_text (message, part)) {
+            /* XXX: Would be nice to call out to something here to parse
+             * the attachment into text and then index that. */
+            goto DONE;
+        }
     }
 
     byte_array = g_byte_array_new ();
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 1d3d2b0c..c19ee8e2 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -259,6 +259,10 @@ _notmuch_database_filename_to_direntry (void *ctx,
                                         notmuch_find_flags_t flags,
                                         char **direntry);
 
+bool
+_notmuch_database_indexable_as_text (notmuch_database_t *notmuch,
+                                     const char *mime_string);
+
 /* directory.cc */
 
 notmuch_directory_t *
diff --git a/test/T050-new.sh b/test/T050-new.sh
index cb67889c..427c5b22 100755
--- a/test/T050-new.sh
+++ b/test/T050-new.sh
@@ -455,12 +455,47 @@ Date: Fri, 17 Jun 2016 22:14:41 -0400
 EOF
 test_expect_equal_file EXPECTED OUTPUT
 
+add_email_corpus indexing
+test_begin_subtest "index text/* attachments, no config"
+messages=$(notmuch count id:20200930101213.2m2pt3jrspvcrxfx@localhost.localdomain)
+count=$(notmuch count id:20200930101213.2m2pt3jrspvcrxfx@localhost.localdomain and ersatz)
+test_expect_equal "$messages,$count" "1,0"
+
+notmuch config set index.as_text "text/"
 add_email_corpus indexing
 test_begin_subtest "index text/* attachments"
-test_subtest_known_broken
 notmuch search id:20200930101213.2m2pt3jrspvcrxfx@localhost.localdomain > EXPECTED
 notmuch search id:20200930101213.2m2pt3jrspvcrxfx@localhost.localdomain and ersatz > OUTPUT
 test_expect_equal_file_nonempty EXPECTED OUTPUT
+test_begin_subtest "reindex text/* attachments, no config"
+notmuch config set index.as_text
+notmuch reindex '*'
+messages=$(notmuch count id:20200930101213.2m2pt3jrspvcrxfx@localhost.localdomain)
+count=$(notmuch count id:20200930101213.2m2pt3jrspvcrxfx@localhost.localdomain and ersatz)
+test_expect_equal
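A minimal sketch of the per-part decision made by _notmuch_database_indexable_as_text, using only POSIX <regex.h>: an attachment is indexed as text if any configured pattern matches its MIME type, and skipped if none does (including when no patterns are configured). Because regexec searches for a match anywhere in the string, an unanchored pattern such as `xml` matches both `text/xml` and `application/xhtml+xml`, while `^text/xml$` matches only the former. The name matches_any is hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if any of the n precompiled patterns matches
 * somewhere in mime_type.  The patterns are assumed to have been compiled
 * with REG_EXTENDED | REG_NOSUB, so no match offsets are requested. */
static bool
matches_any (const regex_t *patterns, size_t n, const char *mime_type)
{
    for (size_t i = 0; i < n; i++) {
        if (regexec (&patterns[i], mime_type, 0, NULL, 0) == 0)
            return true;
    }
    return false;   /* nothing configured, or nothing matched: skip the part */
}
```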
index user selected MIME types of attachments
This series obsoletes the WIP patch [1]. Most of the work is in making
it configurable.

[1]: id:20220820185007.289543-4-da...@tethera.net
Re: [PATCH 1/3] test: rename indexing corpus
David Bremner writes:

> The corpus is not really suitable for general indexing test since the
> sole message is ignored (and will most likely continue to be ignored)
> by notmuch-new.

applied the first two patches (test changes) to master
Re: Feature request: search for last N modified mails
Ico writes:

> My normal mail workflow is to keep todo's in my inbox; these can be a few days
> or weeks old if I'm lazy. Every now and then I find myself accidentally
> removing things from the inbox, and I have a very hard time finding things
> back. As discussed on #notmuch IRC today: I'd like to request a feature for
> searching for the last N modified mails, as notmuch already has this info in
> the database.
>
> bremner suggested this could be implemented using a query like
>
>    lastmod:-10
>
> 14:39 < Zevv> basically, I'd like to be able to search for the last 10 modified
>               mails. Where 'modified' would probably mean changing tags
> 14:41 < bremner> Zevv: you can get the current value from notmuch count --lastmod '*'
> 14:41 < bremner> then basically every tag change increments it
> 14:42 < bremner> so I guess subtract 10 and search?
> 14:44 < Zevv> Hm that would make it hard to use it as a simple query for in alot
> 14:45 < bremner> yes.
> 14:45 < bremner> I never thought about it before, but it would be possible to implement lastmod:-10
> 14:47 < Zevv> that would be extremely handy

The syntax lastmod:-10.. (or '(lastmod -10 *)') is now supported in git
master.
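The arithmetic bremner describes in the IRC log (read the current revision, subtract 10, search) can be sketched as follows. This assumes that a negative lower bound in lastmod:-10.. is taken relative to the current database revision; the clamp at 0 and the helper names are assumptions for illustration, not notmuch's implementation.

```c
#include <stdbool.h>

/* Hypothetical: resolve a possibly negative lastmod bound against the
 * current database revision (the value notmuch count --lastmod reports). */
static long
resolve_lastmod_bound (long bound, long current_revision)
{
    if (bound < 0) {
        long resolved = current_revision + bound;   /* e.g. -10 -> current - 10 */
        return resolved < 0 ? 0 : resolved;         /* assumed clamp at 0 */
    }
    return bound;
}

/* Hypothetical: does a message last modified at msg_revision fall in the
 * open-ended range lastmod:<bound>.. (up to the current revision)? */
static bool
in_lastmod_range (long msg_revision, long bound, long current_revision)
{
    return msg_revision >= resolve_lastmod_bound (bound, current_revision)
           && msg_revision <= current_revision;
}
```

For example, with a current revision of 1234, the bound -10 resolves to 1224, so a message last modified at revision 1230 is selected and one at revision 1200 is not.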
Re: [PATCH 1/3] lib/sexp: provide relative lastmod queries
David Bremner writes:

> Test the relatively trivial logic changes for the sexp query parser
> first before refactoring that logic to share with the infix query
> parser.

series applied to master.

d
Re: Database location bug when database does not already exist
Sean Whitton writes:

> Hello,
>
> In the following situation:
>
> - only notmuch config is ~/.notmuch-config
> - database.mail_root is set
> - database.path is not set
> - notmuch database does not yet exist
>
> then notmuch wants to create its db under mail_root/.notmuch, contrary
> to notmuch-config(1).
>

This bug should be fixed as of commit 84e4e130e2c920b3dee91901582c4ab6276e2630
Re: [PATCH 2/6] test: add test for creating database in various configurations.
David Bremner writes:

> The existing database creation (via add_email_corpus) was always done
> in the traditional configuration. The use of xapian-metadata is just
> to portably ensure that there is a database created where we expect
> there to be.

remainder of the series applied to master.

d
Re: [PATCH] emacs: new notmuch-tree-process-exit-functions
jao writes:

> Hook run when the tree insertion process finishes its job.
>

applied to master
Re: [PATCH] emacs/show: use read-shell-command instead of read-string
Antoine Beaupré writes:

> This enables auto-completion of commands, something which plain
> read-string does not do. It's otherwise a drop-in
> replacement. According to `C-h f`, read-shell-command was introduced
> in Emacs 23.1 or earlier.

Applied to master.

d