On Tue, Feb 06, 2018 at 02:24:21AM +1100, Daniel Axtens wrote: > Hi Veronika, > > So, having been convinced of the merits of this approach, here's my code > review. > > > Sometimes, multiple projects reside at the same mailing list. So far, > > Patchwork only allowed a single project per mailing list, which made it > > impossible for these projects to use Patchwork (unless they did some > > dirty hacks). > > > > Add a new property `subject_match` to projects and implement filtering > > on (list_id, subject_match) match instead of solely list_id. Instance > > admin can specify a regex on a per-project basis when the project is > > created. > > Ok, so let me see if I have the logic of this patch right. > > A message comes in. > > A set of possible projects is found based on the list-id. > > Then, starting from the first one returned by the db (with no intrinsic > ordering), the subject_match field is examined: > - if subject_match is empty, this project is returned as a match > - if subject_match is non-empty, the project is returned as a match if > the subject matches > > If this understanding is incorrect, bail out here :) > > Consider a list like netdev. It has at least a couple of userspace > projects (iproute2, ethtool) that could conceivably be spun off into > their own lists, and which could use subject matches, but most patches > are for the linux kernel and do not have usable subject matches. > > As I understand it, you would need to have 3 projects: > - for iproute2, subject-match \[[^\]]*iproute[^\]]*\] > - for ethtool, subject-match \[[^\]]*ethtool[^\]]*\] > Then how would you match the rest of the list? You could have an empty > subject-match, but then if that is loaded first out of the database, it > would match first and patches would never make it to the iproute2 or > ethtool projects. > > I think you need some sort of default or fall-back or last-resort - the > name is not important: just that it will match *only* if no other > project matches.
Good point. Internally in Red Hat we had a default bucket that caught stuff like this because regex's can't capture 100% of human behaviour (humans are great at doing dumb things :-) ). Periodically we just reviewed the bucket and manually moved patches to the right project. I think that is what you are thinking, right? Cheers, Don > > Does that sound right? > > I have some other minor comments which I have included throughout: > > > diff --git a/docs/api.yaml b/docs/api.yaml > > index 3e79f0b..3373226 100644 > > --- a/docs/api.yaml > > +++ b/docs/api.yaml > > @@ -374,6 +374,9 @@ definitions: > > list_id: > > type: string > > description: Mailing list identifier for project. > > + subject_match: > > + type: string > > + description: Regex used for email filtering. > This is good. > > > diff --git a/docs/deployment/management.rst b/docs/deployment/management.rst > > index c50b7b6..870d7ee 100644 > > --- a/docs/deployment/management.rst > > +++ b/docs/deployment/management.rst > > @@ -63,7 +63,7 @@ due to, for example, an outage. > > .. option:: --list-id <list-id> > > > > mailing list ID. If not supplied, this will be extracted from the mail > > - headers. > > + headers. Don't use this option if you require filtering based on > > subjects. > > > > .. option:: infile > > I don't understand why this would interfere with subject > filtering. I notice below that you've added a new method that matches by > list-id and subject and kept the old list-id matching one - does the > command-line option use the old method? > > I think I would prefer that there only be one system of mapping mails to > projects and for this restriction to go away, but if there's a good > reason for it to stay I am always open to persuasion. > > > diff --git a/patchwork/api/project.py b/patchwork/api/project.py > > index 446c473..597f605 100644 > > --- a/patchwork/api/project.py > > +++ b/patchwork/api/project.py > > @@ -39,8 +39,9 @@ class ProjectSerializer(HyperlinkedModelSerializer): > > class Meta: > > model = Project > > fields = ('id', 'url', 'name', 'link_name', 'list_id', > > 'list_email', > > - 'web_url', 'scm_url', 'webscm_url', 'maintainers') > > - read_only_fields = ('name', 'maintainers') > > + 'web_url', 'scm_url', 'webscm_url', 'maintainers', > > + 'subject_match') > > + read_only_fields = ('name', 'maintainers', 'subject_match') > This looks good - albeit I am not an API expert. > > > diff --git a/patchwork/models.py b/patchwork/models.py > > index 11886f1..907707f 100644 > > --- a/patchwork/models.py > > +++ b/patchwork/models.py > > @@ -71,8 +71,14 @@ class Project(models.Model): > > > > linkname = models.CharField(max_length=255, unique=True) > > name = models.CharField(max_length=255, unique=True) > > - listid = models.CharField(max_length=255, unique=True) > > + listid = models.CharField(max_length=255) > > listemail = models.CharField(max_length=200) > > + subject_match = models.CharField( > > + max_length=64, blank=True, default='', help_text='Regex to match > > the ' > > + 'subject against if only part of emails sent to the list belongs > > to ' > > + 'this project. Will be used with IGNORECASE and MULTILINE flags. > > If ' > > + 'rules for more projects match the first one returned from DB is ' > > + 'chosen.') > > I don't want to bikeshed this too much, but it definitely needs to say > that a blank string matches everything. > > > diff --git a/patchwork/parser.py b/patchwork/parser.py > > index ac7dc5f..015f709 100644 > > --- a/patchwork/parser.py > > +++ b/patchwork/parser.py > > @@ -157,9 +157,25 @@ def find_project_by_id(list_id): > > project = Project.objects.get(listid=list_id) > > except Project.DoesNotExist: > > logger.debug("'%s' if not a valid project list-id", list_id) > > + except Project.MultipleObjectsReturned: > > + logger.debug("Multiple projects with list-id '%s' found", list_id) > > return project > > > > > > +def find_project_by_id_and_subject(list_id, subject): > > + """Find a `project` object based on `list_id` and subject prefix > > match.""" > > + projects = Project.objects.filter(listid=list_id) > > + for project in projects: > > + if (not project.subject_match or > > + re.search(project.subject_match, subject, > > + re.MULTILINE | re.IGNORECASE)): > > + return project > > + > > + logger.debug("No project to match email with list-id '%s', subject > > '%s' " > > + "found", list_id, subject) > > + return None > > + > > + > > def find_project_by_header(mail): > > project = None > > listid_res = [re.compile(r'.*<([^>]+)>.*', re.S), > > @@ -181,7 +197,8 @@ def find_project_by_header(mail): > > > > listid = match.group(1) > > > > - project = find_project_by_id(listid) > > + project = find_project_by_id_and_subject(listid, > > + mail.get('Subject')) > > if project: > > break > Again here there is the 2 ways to find a project. It does look like > subject matching only happens if you don't specify --list-id and I think > that is probably not the approach I want to take. > > Patchwork also extracts the bits of the subject between square brackets > - would matching on that be more appropriate? (See subject_prefixes in > parser.py.) That would mean the regex will automatically select for > things in square brackets without having to encode that in the regex, > which might avoid some user confusion and misdirected mail. > > Lastly, please could you also add a simple test case for this: doesn't need to > be big, just enough to show that the feature works. > > Thanks again for your contribution to Patchwork! > > Regards, > Daniel > _______________________________________________ > Patchwork mailing list > Patchwork@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/patchwork _______________________________________________ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork