On 14/11/2021 12.19, sebb wrote:
-1

I think this commit should be reverted.

I see no purpose for the body_short header.

It is absolutely needed. It is what makes it possible to search large lists with very large emails and not have everything crash around you.

When you only need the first bit of the text, there is no need to fetch several megabytes of email body, especially if you have thousands of emails to parse. In a production environment I have access to, this change cut the data fetched per search from 2GB to 50MB, and search time and overall backend response time were an order of magnitude faster once it was in place. It is a vital component of a responsive service.
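To illustrate the point, here is a minimal sketch of a search request body that uses Elasticsearch source filtering (the `_source` key) to return only the short field and never the full `body`. The helper name `build_search_body`, the `subject` field, and the `match_all` query are assumptions for illustration, not code from the Foal server.

```python
from typing import Any, Dict, List


def build_search_body(query: Dict[str, Any], fields: List[str], size: int = 100) -> Dict[str, Any]:
    """Build a search request body that returns only the named fields.

    Hypothetical helper: "_source" limits which stored fields come back,
    so leaving "body" out of `fields` keeps the large text in the database.
    """
    return {
        "size": size,
        "_source": fields,
        "query": query,
    }


# Ask for the short preview only; the full "body" is deliberately not listed.
request = build_search_body({"match_all": {}}, ["subject", "body_short"])
```

The resulting dict can be passed as the request body to whichever Elasticsearch client the server already uses.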

The only other way would be to use scripting inside the ES query, which is dangerous and requires additional setup.

As for the length, 200 is the standard used throughout the interface. Yes, you could change it, and yes, you might need to reindex if you do, but who is realistically going to change it?


It wastes space in the database.
It increases the data sent from database to server code.

What if you want to change the length?
Are you going to update the entire database each time the length is changed?

Seems to me the sensible way to do this is in the status.lua plugin.
This can then pick up a config item for the length.
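
A minimal sketch of that alternative, truncating at read time with a configurable length. It is written in Python rather than Lua, and the function name `make_snippet` and the config key `snippet_length` are hypothetical. Note that this approach still requires the full body to be fetched from the database before it is cut.

```python
DEFAULT_SNIPPET_LENGTH = 200  # the length quoted in this thread


def make_snippet(body: str, config: dict) -> str:
    """Cut `body` down to the configured length at read time.

    "snippet_length" is a hypothetical config key; a trailing "..." marks
    text that was actually shortened.
    """
    limit = int(config.get("snippet_length", DEFAULT_SNIPPET_LENGTH))
    if len(body) <= limit:
        return body
    return body[:limit] + "..."


preview = make_snippet("a" * 300, {"snippet_length": 50})
```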

Sebb

On Sun, 17 Oct 2021 at 21:32, sebb <[email protected]> wrote:

On Sun, 17 Oct 2021 at 16:43, <[email protected]> wrote:

This is an automated email from the ASF dual-hosted git repository.

humbedooh pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-ponymail-foal.git

commit 2dff9351d119ddee5c5e0171991c54b1911f05b1
Author: Daniel Gruno <[email protected]>
AuthorDate: Sun Oct 17 17:41:51 2021 +0200

     Add a separate header for short bodies for stats.py
---
  tools/archiver.py   | 5 ++++-
  tools/mappings.yaml | 2 ++
  2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/archiver.py b/tools/archiver.py
index 1783d0e..dc12e39 100755
--- a/tools/archiver.py
+++ b/tools/archiver.py
@@ -584,6 +584,8 @@ class Archiver(object):  # N.B. Also used by import-mbox.py
              ghash = hashlib.md5(mailaddr.encode("utf-8")).hexdigest()

              notes.append(["ARCHIVE: Email archived as %s at %u" % (document_id, time.time())])
+            body_unflowed = body.unflow() if body else ""
+            body_shortened = body_unflowed[:210]  # 210 so that we can tell if > 200.


-1

What's so special about 200 and 210?

These numbers should be constants (with suitable documentation) or possibly configuration items.

The only bare numbers I would expect to see in code are 0 and 1 (or -1).
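
As a sketch of what that could look like, the two numbers can be expressed as named, documented constants. All names below are hypothetical and not part of the commit; the reading of 210 as "display length plus a small margin" is taken from the code comment in the diff.

```python
# Number of characters the interface shows as a preview (200 in the commit).
SHORT_BODY_DISPLAY_LENGTH = 200
# Extra characters stored so a reader can tell the body ran past the preview.
SHORT_BODY_OVERFLOW_MARGIN = 10
SHORT_BODY_STORED_LENGTH = SHORT_BODY_DISPLAY_LENGTH + SHORT_BODY_OVERFLOW_MARGIN


def shorten_body(body_unflowed: str) -> str:
    """Return the prefix that would be stored as the short body."""
    return body_unflowed[:SHORT_BODY_STORED_LENGTH]


def was_truncated(body_short: str) -> bool:
    """True if the original body was longer than the display length."""
    return len(body_short) > SHORT_BODY_DISPLAY_LENGTH


stored = shorten_body("x" * 1000)
```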

              output_json = {
                  "from_raw": msg_metadata["from"],
@@ -603,7 +605,8 @@ class Archiver(object):  # N.B. Also used by import-mbox.py
                  "private": private,
                  "references": msg_metadata["references"],
                  "in-reply-to": irt,
-                "body": body.unflow() if body else "",
+                "body": body_unflowed,
+                "body_short": body_shortened,
                  "html_source_only": body and body.html_as_source or False,
                  "attachments": attachments,
                  "forum": (lid or "").strip("<>").replace(".", "@", 1),
diff --git a/tools/mappings.yaml b/tools/mappings.yaml
index 4bb4978..6ad72d3 100644
--- a/tools/mappings.yaml
+++ b/tools/mappings.yaml
@@ -55,6 +55,8 @@ mbox:
            type: long
      body:
        type: text
+    body_short:
+      type: text
      cc:
        type: text
      date:
