Re: [incubator-ponymail-foal] 01/02: Add a separate header for short bodies for stats.py

Daniel Gruno Sun, 14 Nov 2021 08:22:31 -0800

On 14/11/2021 12.19, sebb wrote:

-1


I think this commit should be reverted.

I see no purpose for the body_short header.

It is absolutely needed. It is what makes it possible to search large lists with very large emails and not have everything crash around you.

When you only need the first bit of the text, there is no need to grab several megabytes of email body, especially not if you have thousands of emails you need to parse. In a production environment I have access to, this caused a significant speedup and went from grabbing 2GB of data per search to only 50MB - search time and overall backend response time was an order of magnitude faster after this had been implemented. It is a very vital component of a responsive service.

The only other way would be to use scripting inside the ES query, which is both dangerous and requires additional setup.

As for the length, 200 is the standard used throughout the interface. Yes, you could change it, and yes you might need to reindex if so, but who is realistically going to change it?
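
To illustrate the saving described above, here is a minimal sketch of a search body that asks Elasticsearch to return only the short field via `_source` filtering, so the full `body` never leaves the database. The helper name `build_search_body`, its parameters, and the chosen field list are assumptions made for illustration; this is not the query code used in the project.

```python
# Hypothetical helper: build an Elasticsearch search body that returns
# "body_short" instead of the (potentially multi-megabyte) "body" field.
def build_search_body(query_text, short_only=True, size=100):
    # Field names are taken from the archiver output shown in the diff;
    # which ones a real caller needs is an assumption.
    fields = ["from_raw", "references", "in-reply-to", "forum"]
    fields.append("body_short" if short_only else "body")
    return {
        "size": size,
        # "_source" restricts which stored fields come back with each hit.
        "_source": fields,
        # The search itself still runs against the full, indexed body.
        "query": {"match": {"body": query_text}},
    }


search_body = build_search_body("reindex")
print(search_body["_source"])
```

The query still matches against the full `body`; only the payload returned per hit shrinks, which is where the reduction in transferred data would come from.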


It wastes space in the database.
It increases the data sent from database to server code.

What if you want to change the length?
Are you going to update the entire database each time the length is changed?

Seems to me the sensible way to do this is in the status.lua plugin.
This can then pick up a config item for the length.

Sebb

On Sun, 17 Oct 2021 at 21:32, sebb <[email protected]> wrote:


On Sun, 17 Oct 2021 at 16:43, <[email protected]> wrote:


This is an automated email from the ASF dual-hosted git repository.

humbedooh pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-ponymail-foal.git

commit 2dff9351d119ddee5c5e0171991c54b1911f05b1
Author: Daniel Gruno <[email protected]>
AuthorDate: Sun Oct 17 17:41:51 2021 +0200

     Add a separate header for short bodies for stats.py
---
  tools/archiver.py   | 5 ++++-
  tools/mappings.yaml | 2 ++
  2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/archiver.py b/tools/archiver.py
index 1783d0e..dc12e39 100755
--- a/tools/archiver.py
+++ b/tools/archiver.py
@@ -584,6 +584,8 @@ class Archiver(object):  # N.B. Also used by import-mbox.py
              ghash = hashlib.md5(mailaddr.encode("utf-8")).hexdigest()

              notes.append(["ARCHIVE: Email archived as %s at %u" % (document_id, time.time())])
+            body_unflowed = body.unflow() if body else ""
+            body_shortened = body_unflowed[:210]  # 210 so that we can tell if > 200.


-1

What's so special about 200 and 210?

These numbers should be constants (with suitable documentation) or possibly 
configuration items.

The only bare numbers I would expect to see in code are 0 and 1 (or -1).
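
As a rough sketch of what named constants could look like here (the names `SHORT_BODY_LENGTH`, `SHORT_BODY_MARGIN`, `shorten_body` and `is_truncated` are hypothetical and do not appear in the commit):

```python
# Hypothetical constants replacing the bare 200 and 210 in the diff.
SHORT_BODY_LENGTH = 200  # preview length assumed to be shown in the interface
SHORT_BODY_MARGIN = 10   # extra characters kept so truncation can be detected


def shorten_body(body_unflowed, length=SHORT_BODY_LENGTH, margin=SHORT_BODY_MARGIN):
    """Return the first length + margin characters of the unflowed body."""
    return body_unflowed[: length + margin]


def is_truncated(body_short, length=SHORT_BODY_LENGTH):
    """True if the stored short body is longer than the preview length."""
    return len(body_short) > length


short = shorten_body("x" * 500)
print(len(short), is_truncated(short))
```

Keeping the margin separate from the length documents why the slice is 210 rather than 200, and either value could later be read from configuration.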

              output_json = {
                  "from_raw": msg_metadata["from"],
@@ -603,7 +605,8 @@ class Archiver(object):  # N.B. Also used by import-mbox.py
                  "private": private,
                  "references": msg_metadata["references"],
                  "in-reply-to": irt,
-                "body": body.unflow() if body else "",
+                "body": body_unflowed,
+                "body_short": body_shortened,
                  "html_source_only": body and body.html_as_source or False,
                  "attachments": attachments,
                  "forum": (lid or "").strip("<>").replace(".", "@", 1),
diff --git a/tools/mappings.yaml b/tools/mappings.yaml
index 4bb4978..6ad72d3 100644
--- a/tools/mappings.yaml
+++ b/tools/mappings.yaml
@@ -55,6 +55,8 @@ mbox:
            type: long
      body:
        type: text
+    body_short:
+      type: text
      cc:
        type: text
      date:
