For the tags filter, I can also:

* perform a query to fetch all documents that correspond to the given tags
(it doesn't need to be permanent, but it would be better to cache it)
* add a HQL filter on these pages (OR document IN :list_of_pages).

It's a little variation of solution A. It's ugly but it could do the job.

2018-06-27 12:58 GMT+02:00 Guillaume Delhumeau <
[email protected]>:

> Hi developers.
>
> I am trying to add a new filter to the notifications to be able to follow
> pages
> that are marked with a given tag. And it leads me to some questions about
> the
> technical implementation of the notifications.
>
> To remind the context: notifications are computed on top of the events
> recorded
> by the event stream (a.k.a activity stream). We take events from the event
> stream SQL table, we apply some transformations on them, and we display
> them to
> the user.
>
> Then we have implemented the ability to filter on these events: for
> example
> "don't show events concerning the document A nor the wiki B". Filters are
> implemented with 2 distinct ways:
>
>   1/ SQL injections: each filter can add SQL elements in the query we make
> to
>      fetch the events from the event stream table. We made this mechanism
> so we
>      can let the database do a lot of the filtering process. After all,
> it's its
>      job and it's supposed to perform well. To be precise, Clement has
> even
>      created an Abstract Syntax Tree (AST) so it's easier to inject some
> content
>      in the query and it creates an abstraction over the SQL language so
> we can
>      even consider to change the storage of the event stream someday.
>
>      The bad thing is that some complex filtering are difficult to write
> with
>      the SQL language (event with the AST) or even impossible.
>
>   2/ Post-filtering: after the events have been fetched from the database,
> each
>      filter can still decide to keep or filter them. This is useful for
>      complex filtering that cannot be expressed with the SQL language. It
> is
>      also needed by the real-time notification email sender, because it
> takes
>      the events immediately when they occurs without fetching them in the
>      database (so SQL filters are bypassed).
>
>      The bad thing is that some events are loaded in the memory to finally
> be
>      rejected, and these filters can perform costly operations such as
> loading
>      documents.
>
> Until now, this double mechanism was working quite well, with each
> mechanism
> filling the lacks of the other.
>
> However, we still have technical limitations in our design:
>
>   1/ Users who have a lot of filter preferences can end up with a giant
> SQL
>      query that is almost impossible to perform by the database. Actually
> we had
>      a user complaining about an OutOfMemory problem in the HQL to SQL
>      translator !
>
>   2/ I cannot implement the tag filter !
>
> The tag filter is supposed to show events that concern pages that hold a
> given
> tag, EVEN IF THE PAGE WAS EXCLUDED BY THE USER. Example of use-case: "I
> don't
> want to receive notifications about wiki A except for pages marked with
> the tag
> T".
>
> And it is not working. First because it is difficult to write a SQL query
> for
> that. It requires to make a join clause with the document and the object
> tables,
> which our SQL injection mechanism does not support. Even if it were
> possible,
> creating a SQL join with the document table will de-facto filter events
> that do
> not concern any pages or pages that do not have any objects: so many other
> filters will be broken. I also don't consider creating a SQL subquery, I
> think
> the whole query would became too big. So I decided to just not inject any
> SQL
> code for this filter and only implement the post-filtering mechanism.
>
> But the other filter "EXCLUDE WIKI A" generates a SQL injection such as
> "WIKI <> 'WIKI A'" so the events concerning the wiki A are not fetched
> from the
> database. Consequence: the tag filter never see the events that it is
> supposed
> to keep. It would be actually possible to by-pass the first SQL injections
> by
> injecting something like "OR 1=1". But doing something like this is like
> dropping the all SQL injections mechanism.
>
> I see some solutions for this problem:
>
>   A/ For each tag, create a permanent list of pages that hold it. So I can
>      inject "OR document IN (that_list)". I think this is heavy.
>
>   B/ Drop the SQL injection mechanism and only rely on the post-filtering
>      mechanism. It would require to load from the database A LOT of events,
>      but maybe we could cache this.
>
>   C/ Don't drop the SQL injection mechanism completely but use it as
> little as
>      possible (for example, do not use it for LOCATION filtering). Seems
> hard to
>      determine when a filter should use this feature or not.
>
>   D/ Don't implement the "tags" filter, since it is the root of the issue.
> But
>      it is like sweeping dirt under the carpet!
>
> Since we have the OutOfMemory problem with the SQL injections becoming too
> huge,
> I am more in favor of solution B or C. But I'm not sure for now, since I
> do not
> know how much it would impact the performances and the scalability of the
> whole
> notifications feature.
>
> This is a complex topic, but I hope this message will inspire you some
> suggestions or things I have not seen with my own eyes.
>
> Thanks for your help,
>
> Guillaume
>



-- 
Guillaume Delhumeau ([email protected])
Research & Development Engineer at XWiki SAS
Committer on the XWiki.org project

Reply via email to