For the tags filter, I can also: * perform a query to fetch all documents that correspond to the given tags (it doesn't need to be permanent, but it would be better to cache it) * add a HQL filter on these pages (OR document IN :list_of_pages).
It's a little variation of solution A. It's ugly but it could do the job. 2018-06-27 12:58 GMT+02:00 Guillaume Delhumeau < [email protected]>: > Hi developers. > > I am trying to add a new filter to the notifications to be able to follow > pages > that are marked with a given tag. And it leads me to some questions about > the > technical implementation of the notifications. > > To remind the context: notifications are computed on top of the events > recorded > by the event stream (a.k.a activity stream). We take events from the event > stream SQL table, we apply some transformations on them, and we display > them to > the user. > > Then we have implemented the ability to filter on these events: for > example > "don't show events concerning the document A nor the wiki B". Filters are > implemented with 2 distinct ways: > > 1/ SQL injections: each filter can add SQL elements in the query we make > to > fetch the events from the event stream table. We made this mechanism > so we > can let the database do a lot of the filtering process. After all, > it's its > job and it's supposed to perform well. To be precise, Clement has > even > created an Abstract Syntax Tree (AST) so it's easier to inject some > content > in the query and it creates an abstraction over the SQL language so > we can > even consider to change the storage of the event stream someday. > > The bad thing is that some complex filtering are difficult to write > with > the SQL language (event with the AST) or even impossible. > > 2/ Post-filtering: after the events have been fetched from the database, > each > filter can still decide to keep or filter them. This is useful for > complex filtering that cannot be expressed with the SQL language. It > is > also needed by the real-time notification email sender, because it > takes > the events immediately when they occurs without fetching them in the > database (so SQL filters are bypassed). > > The bad thing is that some events are loaded in the memory to finally > be > rejected, and these filters can perform costly operations such as > loading > documents. > > Until now, this double mechanism was working quite well, with each > mechanism > filling the lacks of the other. > > However, we still have technical limitations in our design: > > 1/ Users who have a lot of filter preferences can end up with a giant > SQL > query that is almost impossible to perform by the database. Actually > we had > a user complaining about an OutOfMemory problem in the HQL to SQL > translator ! > > 2/ I cannot implement the tag filter ! > > The tag filter is supposed to show events that concern pages that hold a > given > tag, EVEN IF THE PAGE WAS EXCLUDED BY THE USER. Example of use-case: "I > don't > want to receive notifications about wiki A except for pages marked with > the tag > T". > > And it is not working. First because it is difficult to write a SQL query > for > that. It requires to make a join clause with the document and the object > tables, > which our SQL injection mechanism does not support. Even if it were > possible, > creating a SQL join with the document table will de-facto filter events > that do > not concern any pages or pages that do not have any objects: so many other > filters will be broken. I also don't consider creating a SQL subquery, I > think > the whole query would became too big. So I decided to just not inject any > SQL > code for this filter and only implement the post-filtering mechanism. > > But the other filter "EXCLUDE WIKI A" generates a SQL injection such as > "WIKI <> 'WIKI A'" so the events concerning the wiki A are not fetched > from the > database. Consequence: the tag filter never see the events that it is > supposed > to keep. It would be actually possible to by-pass the first SQL injections > by > injecting something like "OR 1=1". But doing something like this is like > dropping the all SQL injections mechanism. > > I see some solutions for this problem: > > A/ For each tag, create a permanent list of pages that hold it. So I can > inject "OR document IN (that_list)". I think this is heavy. > > B/ Drop the SQL injection mechanism and only rely on the post-filtering > mechanism. It would require to load from the database A LOT of events, > but maybe we could cache this. > > C/ Don't drop the SQL injection mechanism completely but use it as > little as > possible (for example, do not use it for LOCATION filtering). Seems > hard to > determine when a filter should use this feature or not. > > D/ Don't implement the "tags" filter, since it is the root of the issue. > But > it is like sweeping dirt under the carpet! > > Since we have the OutOfMemory problem with the SQL injections becoming too > huge, > I am more in favor of solution B or C. But I'm not sure for now, since I > do not > know how much it would impact the performances and the scalability of the > whole > notifications feature. > > This is a complex topic, but I hope this message will inspire you some > suggestions or things I have not seen with my own eyes. > > Thanks for your help, > > Guillaume > -- Guillaume Delhumeau ([email protected]) Research & Development Engineer at XWiki SAS Committer on the XWiki.org project

