Re: how to search for hyphenated words? (was: how to search for Morse code?)
David Bremner writes:

> Matt Armstrong writes:
>
>> Carl Worth writes:
>>
>>> Hi Gregor,
>>>
>>> The trick here is that when notmuch is indexing body text it feeds it
>>> into a Xapian function that parses the text by finding "terms" in the
>>> text. And this parser considers both punctuation and whitespace as
>>> separators between terms.
>>
>> I notice that Xapian supports something called "phrase searches",
>> documented as:
>>
>> "A phrase surrounded with double quotes ("") matches documents
>> containing that exact phrase. Hyphenated words are also treated as
>> phrases, as are cases such as filenames and email addresses
>> (e.g. /etc/passwd or presid...@whitehouse.gov)."
>>
>> I assume that this particular Xapian feature is unavailable in notmuch?
>> If so, I wonder if enabling has ever been considered?
>
> It is enabled, and documented in notmuch-search-terms(7). Unfortunately
> I don't think it's related to the original request. The mention of
> hyphenated words is about the input to the query parser, not
> (necessarily) the retrieved text.

Ah, so it boils down to the Xapian definition of "exact phrase." Notably, "exact phrase" is not "identical sequence of characters" as some people might expect. Quick tests with various search engines reveal their phrase search as operating the same way.

E.g. searching for "org notmuch" finds all sorts of results:

    org-notmuch.el
    notmuchmail.org/notmuch-emacs/
    to:devicet...@vger.kernel.org notmuch tag +inbox +unread -new
    (require 'org-notmuch nil t)
    https://notmuchmail.org/notmuch-emacs/.
    * imaps://mail.example.org/Notmuch/search

For what it is worth, one thing I've taken to doing is using period separators in the notmuch phrase searches I use in scripts and even interactively. Using periods is generally immune to confusing issues related to quoting double quoted things, and always remains a single shell "word." They are also, most often, clearly not the exact content I'm searching for, so they make it clear that the match algorithm is inexact.

E.g.

    subject:notmuch.is.wonderful

instead of:

    subject:"notmuch is wonderful"

___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch
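The period trick described above could be wrapped in a small helper for use in scripts. This is only a sketch in Python: the name `period_phrase` is made up for illustration, and it assumes, as the message above describes, that notmuch parses a period-joined run of words as a phrase in the same way it parses the double-quoted form.

```python
import re


def period_phrase(words: str, field: str = "") -> str:
    """Build a period-separated phrase query from free text.

    Hypothetical helper: anything that is not a letter, digit or
    underscore is treated as a separator, and the remaining words are
    joined with periods so the result stays a single shell word.
    """
    parts = re.findall(r"\w+", words)
    if not parts:
        raise ValueError("no searchable words in input")
    query = ".".join(parts)
    # Optional field prefix such as "subject".
    return f"{field}:{query}" if field else query


if __name__ == "__main__":
    print(period_phrase("notmuch is wonderful", field="subject"))
    print(period_phrase("org-notmuch"))
```

The result contains no spaces or quotes, so it can be passed to `notmuch search` without any extra shell quoting.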
Re: how to search for hyphenated words? (was: how to search for Morse code?)
Hi David,

* David Bremner [2019-03-12; 07:41]:
> Gregor Zattler writes:
>
>
>> From: root@len.workgroup (Cron Daemon)
>> Subject: Cron ~/bin/mailwiederdurchschleusen
>> To: root@localhost
>> Date: Fri, 29 Dec 2017 17:00:09 +0100
>>
>> Date: Thu, 28 Dec 2017 21:04:52 -0500
>> From: Maxim Cournoyer
>> To: help-gnu-em...@gnu.org
>> Subject: Re: Gnus and emails sent by me
>> --
>> Date: Thu, 28 Dec 2017 22:00:56 -0400
>> From: David Bremner
>> To: David Edmondson , notmuch@notmuchmail.org
>> Subject: Re: Xapian exception leading to database corruption
>> --
>
> The line
>
> To: David Edmondson , notmuch@notmuchmail.org
>
> contains the phrase "org notmuch". You can see this easier by stripping
> all the punctuation.

Thanks, now I see (the light :-)

Ciao; Gregor
--
-... --- .-. . -.. ..--.. ...-.-
Re: how to search for hyphenated words? (was: how to search for Morse code?)
On Tue, Mar 12 2019, Gregor Zattler wrote:
> what I do not understand is that it doesn't matter if I search for
>
> org-notmuch
>
> or
>
> "org-notmuch"
>
> '"org-notmuch"'
>
> or even
>
> org ADJ/1 notmuch

Correct. All four of those forms are giving you phrase searches, (so a term "org" followed immediately by a term "notmuch").

> a typical example of a matched message is the attached one.
> Somehow the search matches the address of this very mailing list
> in the body of the email (I assume).

No, I don't think you are seeing a match on the mailing-list address itself, (which has "notmuch" two terms before "org").

> Therefore I wonder why notmuch matches 581 messages, not 16795
> messages or 77 messages.

David showed you one example from the message you copied:

> To: David Edmondson , notmuch@notmuchmail.org

And I showed one earlier in the thread. In each case, the message includes "org" followed (after some amount of punctuation and whitespace, perhaps including newlines) by "notmuch".

-Carl
Re: how to search for hyphenated words? (was: how to search for Morse code?)
Gregor Zattler writes:

> From: root@len.workgroup (Cron Daemon)
> Subject: Cron ~/bin/mailwiederdurchschleusen
> To: root@localhost
> Date: Fri, 29 Dec 2017 17:00:09 +0100
>
> Date: Thu, 28 Dec 2017 21:04:52 -0500
> From: Maxim Cournoyer
> To: help-gnu-em...@gnu.org
> Subject: Re: Gnus and emails sent by me
> --
> Date: Thu, 28 Dec 2017 22:00:56 -0400
> From: David Bremner
> To: David Edmondson , notmuch@notmuchmail.org
> Subject: Re: Xapian exception leading to database corruption
> --

The line

    To: David Edmondson , notmuch@notmuchmail.org

contains the phrase "org notmuch". You can see this easier by stripping all the punctuation.
Re: how to search for hyphenated words? (was: how to search for Morse code?)
Hi David, Matt, Carl, notmuch developers,

* David Bremner [2019-03-11; 22:13]:
> Matt Armstrong writes:
>> Carl Worth writes:
>>> The trick here is that when notmuch is indexing body text it feeds it
>>> into a Xapian function that parses the text by finding "terms" in the
>>> text. And this parser considers both punctuation and whitespace as
>>> separators between terms.
>>
>> I notice that Xapian supports something called "phrase searches",
>> documented as:
>>
>> "A phrase surrounded with double quotes ("") matches documents
>> containing that exact phrase. Hyphenated words are also treated as
>> phrases, as are cases such as filenames and email addresses
>> (e.g. /etc/passwd or presid...@whitehouse.gov)."
>>
>> I assume that this particular Xapian feature is unavailable in notmuch?
>> If so, I wonder if enabling has ever been considered?
>
> It is enabled, and documented in notmuch-search-terms(7). Unfortunately
> I don't think it's related to the original request. The mention of
> hyphenated words is about the input to the query parser, not
> (necessarily) the retrieved text.

what I do not understand is that it doesn't matter if I search for

    org-notmuch

or

    "org-notmuch"

    '"org-notmuch"'

or even

    org ADJ/1 notmuch

$ notmuch count --output=messages '"org-notmuch"'
581
$ notmuch count --output=messages 'org-notmuch'
581
$ notmuch count --output=messages org-notmuch
581
$ notmuch count --output=messages org ADJ/1 notmuch
581

a typical example of a matched message is the attached one. Somehow the search matches the address of this very mailing list in the body of the email (I assume).

But obviously there are many more emails with this address in them:

$ notmuch count --output=messages 'notmuch@notmuchmail.org'
27396
$ notmuch count --output=messages '"notmuch@notmuchmail.org"'
27396

Or with a naive search (no decoding of possible base64 encoded parts) there are

$ find /home/grfz/Mail/~ml/emacs-orgm...@gnu.org /home/grfz/Mail/~ml/notmuch@notmuchmail.org* -type f -print0 |
    xargs -0r grep -l -- 'notmuch@notmuchmail.org' |
    xargs -I {} sh -c "cat {} | sed -e '1,/^$/ d' | grep -c notmuch@notmuchmail.org " |
    egrep -c "1|2|3|4|5|6|7|8|9"
16795

emails with the address at least once in the body. Therefore I wonder why notmuch matches 581 messages.

A naive search for org-notmuch on the files (no decoding of possible base64 encoded parts) only shows 79 files (77 unique emails):

mkdir -vp /tmp/test/{cur,new,tmp}
$ find /home/grfz/Mail/~ml/emacs-orgm...@gnu.org /home/grfz/Mail/~ml/notmuch@notmuchmail.org* -type f -print0 |
    xargs -0r grep -l -- 'org-notmuch' |
    xargs ln -vs --target-directory=/tmp/kolp/cur/ |
    wc -l
79

Therefore I wonder why notmuch matches 581 messages, not 16795 messages or 77 messages. Somehow these numbers do not fit!?

Ciao; Gregor
--
-... --- .-. . -.. ..--.. ...-.-

--- Begin Message ---
Date: Thu, 28 Dec 2017 21:04:52 -0500
From: Maxim Cournoyer
To: help-gnu-em...@gnu.org
Subject: Re: Gnus and emails sent by me
--
Date: Thu, 28 Dec 2017 22:00:56 -0400
From: David Bremner
To: David Edmondson , notmuch@notmuchmail.org
Subject: Re: Xapian exception leading to database corruption
--
--- End Message ---
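The naive grep in the message above does not decode base64 or quoted-printable parts, which is one reason its counts need not line up with what notmuch indexes. A rough sketch of a literal-string check that decodes text parts first, using only Python's standard `email` package. The function name `body_contains` is made up for illustration; candidate files could come from something like `notmuch search --output=files org-notmuch`, which is not run here.

```python
import email
from email import policy


def body_contains(raw: bytes, needle: str) -> bool:
    """Return True if any decoded text part of the message contains needle.

    Headers are not searched; only parts whose main type is "text" are
    looked at, after undoing the content transfer encoding.
    """
    msg = email.message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        try:
            text = part.get_content()
        except (LookupError, UnicodeDecodeError):
            # Unknown or broken charset: skip this part.
            continue
        if needle in text:
            return True
    return False


if __name__ == "__main__":
    import sys

    # Usage: python3 body_contains.py org-notmuch FILE...
    needle, paths = sys.argv[1], sys.argv[2:]
    for path in paths:
        with open(path, "rb") as fh:
            if body_contains(fh.read(), needle):
                print(path)
```

Used as a post-filter on the files of a phrase search, this keeps only the messages that really contain the hyphenated string in a text part.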
Re: how to search for hyphenated words? (was: how to search for Morse code?)
Matt Armstrong writes:

> Carl Worth writes:
>
>> Hi Gregor,
>>
>> The trick here is that when notmuch is indexing body text it feeds it
>> into a Xapian function that parses the text by finding "terms" in the
>> text. And this parser considers both punctuation and whitespace as
>> separators between terms.
>
> I notice that Xapian supports something called "phrase searches",
> documented as:
>
> "A phrase surrounded with double quotes ("") matches documents
> containing that exact phrase. Hyphenated words are also treated as
> phrases, as are cases such as filenames and email addresses
> (e.g. /etc/passwd or presid...@whitehouse.gov)."
>
> I assume that this particular Xapian feature is unavailable in notmuch?
> If so, I wonder if enabling has ever been considered?

It is enabled, and documented in notmuch-search-terms(7). Unfortunately I don't think it's related to the original request. The mention of hyphenated words is about the input to the query parser, not (necessarily) the retrieved text.

d
Re: how to search for hyphenated words? (was: how to search for Morse code?)
Gregor Zattler writes:

> Hi David, notmuch developers,
> * David Bremner [2019-03-10; 20:22]:
>> Gregor Zattler writes:
>>> How would one search for hyphenated words with notmuch?
>>>
>>
>> In special cases, explained in notmuch-search-terms(7), one can use
>> regexp searches, which are slower, but don't drop punctuation.
>
> thanks, this works for the subject: field, which helps a lot.
>
> Regexes do not work on the body of messages and I assume they
> will not work with the upcoming "body:" field?

That's correct.

d
Re: how to search for hyphenated words? (was: how to search for Morse code?)
Hi David, notmuch developers,

* David Bremner [2019-03-10; 20:22]:
> Gregor Zattler writes:
>> How would one search for hyphenated words with notmuch?
>>
>
> In special cases, explained in notmuch-search-terms(7), one can use
> regexp searches, which are slower, but don't drop punctuation.

thanks, this works for the subject: field, which helps a lot.

Regexes do not work on the body of messages and I assume they will not work with the upcoming "body:" field?

Thanks for your attention, Gregor
Re: how to search for hyphenated words? (was: how to search for Morse code?)
Gregor Zattler writes:

>
> How would one search for hyphenated words with notmuch?
>

In special cases, explained in notmuch-search-terms(7), one can use regexp searches, which are slower, but don't drop punctuation.

d
Re: how to search for hyphenated words? (was: how to search for Morse code?)
Hi Gregor,

The trick here is that when notmuch is indexing body text it feeds it into a Xapian function that parses the text by finding "terms" in the text. And this parser considers both punctuation and whitespace as separators between terms.

So your messages are not being indexed in a way to let you distinguish between "org notmuch" and "org-notmuch".

(Of note, the query parser applies the same parsing to your query---so that even when you think you're typing an exact phrase like "org-notmuch" that gets parsed into separate terms "org" and "notmuch" for searching.)

> all these resulted in very many hits most or all of which do not
> contain the string "org-notmuch", one found email was e.g.
>
> id:20180904105723.15564-3-da...@tethera.net

That message does contain the following:

    +test_emacs '(notmuch-tree "id:000-real-r...@example.org")
    + (notmuch-test-wait)

Where you will notice that there's a term "org" followed (after some punctuation and whitespace separators) by a term "notmuch".

> How would one search for hyphenated words with notmuch?

You would need to arrange to have the indexer consider the hyphen as a letter-like character to be made part of terms. Or be extra clever and index something like "notmuch-test-wait" in multiple ways (such as a single term "notmuch-test-wait" as well as three adjacent terms "notmuch", "test", and "wait" as notmuch is doing currently).

-Carl
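The behaviour described in the message above can be imitated with a few lines of Python. This is a crude model, not Xapian's actual term generator: it assumes that every run of letters, digits and underscores is one term and that everything else is a separator, and it ignores stemming, prefixes and Xapian's special cases. The names `terms` and `phrase_match` are made up for this sketch.

```python
import re


def terms(text: str) -> list[str]:
    """Split text into lowercase terms; punctuation and whitespace separate terms."""
    return re.findall(r"\w+", text.lower())


def phrase_match(query: str, text: str) -> bool:
    """True if the terms of query occur as adjacent terms, in order, in text."""
    q = terms(query)
    t = terms(text)
    if not q:
        return False
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


if __name__ == "__main__":
    # The two diff lines quoted above, with the address left as archived.
    snippet = (
        "+test_emacs '(notmuch-tree \"id:000-real-r...@example.org\")\n"
        "+ (notmuch-test-wait)"
    )
    print(terms("org-notmuch"))                                    # ['org', 'notmuch']
    print(phrase_match("org-notmuch", snippet))                    # True
    print(phrase_match("org-notmuch", "notmuch@notmuchmail.org"))  # False
```

Under this model the query "org-notmuch" and the query "org notmuch" are the same two-term phrase, which is why the hyphen makes no difference to the results.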
how to search for hyphenated words? (was: how to search for Morse code?)
Hello,

* Gregor Zattler [2018-07-23; 14:20]:
> today I searched for emails containing
>
> -... --- .-. . -.. ..--.. ...-.-

today I searched for emails containing "org-notmuch" (which supports org links to notmuch searches), e.g. with

    notmuch search org-notmuch
    notmuch search -- org-notmuch
    notmuch search -- "org-notmuch"
    notmuch search -- '"org-notmuch"'
    notmuch search -- '+"org-notmuch"'
    notmuch search -- org ADJ/1 notmuch

all these resulted in very many hits most or all of which do not contain the string "org-notmuch", one found email was e.g.

    id:20180904105723.15564-3-da...@tethera.net

How would one search for hyphenated words with notmuch?

Ciao; Gregor
--
-... --- .-. . -.. ..--.. ...-.-
Re: how to search for Morse code?
On 18-07-23 15:16:07, Ben Oliver wrote:
> On 18-07-23 14:20:41, Gregor Zattler wrote:
>> Hello,
>>
>> today I searched for emails containing
>>
>> -... --- .-. . -.. ..--.. ...-.-
>
> Heh I suppose the problem is that Xapian won't take two periods ".."
> even in quotes.
>
> I asked on their IRC about how to escape it but it's quiet.

So it seems like Morse code would not be indexed, which makes sense.

Sorry!
Re: how to search for Morse code?
On 18-07-23 14:20:41, Gregor Zattler wrote:
> Hello,
>
> today I searched for emails containing
>
> -... --- .-. . -.. ..--.. ...-.-

Heh I suppose the problem is that Xapian won't take two periods ".." even in quotes.

I asked on their IRC about how to escape it but it's quiet.
how to search for Morse code?
Hello,

today I searched for emails containing

    -... --- .-. . -.. ..--.. ...-.-

tried with

    notmuch search "-... --- .-. . -.. ..--.. ...-.-"

and

    notmuch search '-... --- .-. . -.. ..--.. ...-.-'

and even

    notmuch search '"-... --- .-. . -.. ..--.. ...-.-"'

and also with double dashes in front of the search term:

    notmuch search -- "-... --- .-. . -.. ..--.. ...-.-"

All these searches produce

    notmuch search: A Xapian exception occurred
    A Xapian exception occurred parsing query: Unknown range operation
    Query string was: "-... --- .-. . -.. ..--.. ...-.-"

Is it possible to search for emails containing my supposedly funny signature? Obviously this is not much of a problem for me, but perhaps I hit a hidden bug?

Ciao; Gregor
--
-... --- .-. . -.. ..--.. ...-.-