Trying to answer my own question, I'm looking for a way to efficiently:
1. list all things shared with a user
2. sort those things by date
and I don't want the complexity of this query to grow with the amount of
data in the DB (it's a social network feed, so it will only ever grow). As
explained in the previous message, a naive approach (iterating over users,
then over their publications) consumes ever more CPU and memory.
My idea now is to denormalize the shares into a regular (non-edge)
collection where, for each share, I would store a document with a field of
the form {userId}_{time of the share}, and create a skiplist index on that
field.
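For illustration, such a share document could be created like this (a
sketch: the contentId field name is hypothetical, and I'm assuming an
ISO 8601 timestamp, which is convenient because it sorts lexicographically
in chronological order; the skiplist index on compoundField would be
created separately, e.g. with ensureIndex in arangosh):
insert {
  contentId: 'publications/<publication-id>',  /* hypothetical link to the shared content */
  compoundField: CONCAT('<user-id>', '_', DATE_ISO8601(DATE_NOW()))
} into shares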
The query would then be:
for s in shares
  filter s.compoundField >= 'userId_'        /* lower bound, so the skiplist can be used */
  sort s.compoundField desc                  /* sort on the indexed field */
  filter LIKE(s.compoundField, 'userId_%')   /* post-filter to this user's prefix */
  return s
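A variant that might be worth trying (a sketch, untested against the
optimizer) closes the range from above instead of post-filtering with LIKE,
and adds a limit since only the x most recent shares are needed:
for s in shares
  filter s.compoundField >= 'userId_'
  filter s.compoundField < 'userId`'   /* '`' is the character right after '_' in ASCII; worth verifying this upper bound against the server's string collation */
  sort s.compoundField desc
  limit 20   /* arbitrary example value for "the x most recent" */
  return s
With both bounds on the indexed field plus the limit, a descending range
scan over the skiplist should only ever touch the documents it returns; the
explain output will confirm whether the index is used for both the filter
and the sort.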
The explain output of the query looks good, as it seems to use the skiplist
directly both to target the userId and to sort. Your comments would be much
appreciated here!
Thanks,
Thomas
On Sunday, April 9, 2017 at 8:19:21 PM UTC+8, Thomas Weiss wrote:
>
> Working on a social network project, I've recently realized that the way I
> was fetching users' feeds was not scalable:
> for f in 1 outbound 'users/<user-id>' follows
>   for c, p in 1 outbound f hasPublished
>     sort p.publishedAt
>     return c
> As the number of users and published content grows, this query consumes
> ever more CPU and memory!
>
> So in order to optimize that, I've started to denormalize my data by using
> an 'isSharedWith' edge collection that links users to published content and
> has a skiplist index on a field named 'lastPublishedAt'.
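> For reference, an isSharedWith edge in this scheme looks roughly like this
> (a sketch; the edge direction is inferred from the inbound traversal below,
> and the timestamp value is only a placeholder):
> {
>   "_from": "publications/<publication-id>",
>   "_to": "users/<user-id>",
>   "lastPublishedAt": "<time of the latest share>"
> }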
> So now my query looks like:
> for c, s in 1 inbound 'users/<user-id>' isSharedWith
>   sort s.lastPublishedAt desc
>   return c
>
> The "explanation" of this query is:
> Execution plan:
>  Id   NodeType          Est.   Comment
>   1   SingletonNode        1   * ROOT
>   2   TraversalNode       84     - FOR c /* vertex */, s /* edge */ IN 1..1 /* min..maxPathDepth */ INBOUND 'users/283139442928' /* startnode */ isSharedWith
>   3   CalculationNode     84     - LET #4 = s.`lastPublishedAt` /* attribute expression */
>   4   SortNode            84     - SORT #4 DESC
>   5   ReturnNode          84     - RETURN c
>
>
> Indexes used:
>  By   Type   Collection     Unique   Sparse   Selectivity   Fields               Ranges
>   2   edge   isSharedWith   false    false    1.18 %        [ `_from`, `_to` ]   base INBOUND
>
>
> Traversals on graphs:
>  Id   Depth   Vertex collections   Edge collections   Options                                   Filter conditions
>   2   1..1                         isSharedWith       uniqueVertices: none, uniqueEdges: path
>
>
> Optimization rules applied:
> none
>
> But this still doesn't look good to me: it seems that a full traversal is
> first performed in order to retrieve lastPublishedAt, and the results are
> then sorted on that field; the skiplist on lastPublishedAt doesn't even
> appear under "Indexes used".
>
> So my question is: would there be a way to denormalize and query that kind
> of data so that the complexity of the query (getting the x most recent
> elements) doesn't grow with the amount of data?
>
> Thanks in advance,
> Thomas
>