[arangodb-google] Re: AQL / Foxx queries are long and messy - how to shorten?

ropeladder Wed, 18 May 2016 05:20:53 -0700

Thanks, Willi. Definitely looking forward to 3.0. I'm on linux (mint) but 
not much of a power user, I just switched over from Windows a few months 
ago. Regarding permission, I'm guessing I should either run Webstorm as an 
admin or change the permissions on the arango controllers directory?


Got the console working after a bit of work. Thanks.

~~~

A note about traversals as well for anyone who might be reading: I tried 
just doing a basic depth + 1 traversal in order to get all the extra edges 
and paths, and it ends up being significantly slower with my test node 
slower than the other queries using subqueries. Might be related to my 
graph structure.

Here are some of the different ways I've tried to do the query:

1. GRAPH_NEIGHBORS depth = 1 along a specific graph with subqueries* for 
each node; GRAPH_EDGES for edges; then sort and limit by (temporary, 
constructed from graph) node properties (as suggested by mhacki on 
stackoverflow, with added subqueries, sorting and limiting))

2. FOR ... GRAPH depth = 1 traversal along specific graph with subqueries 
for each node, storing nodes and edges; then sort and limit by (temporary, 
constructed from graph) node properties

3. FOR ... GRAPH depth = 2 traversal along combined graph, storing nodes 
and edges and paths; to be sorted/limited/rearranged later via javascript 
(the query itself, without javascript, is slower here than the others)

4. GRAPH NEIGHBORS depth = 1 with a subquery only for the field the nodes 
will be sorted (and limited) by; sort and limit the nodes; then run 
subqueries on the limited subset of nodes

5. Foxx traversal, depth = 2 along combined graph, pushing nodes and edges 
as necessary (as suggested by mhacki on stackoverflow, with additions)

* I've tried both "FOR ... GRAPH" traversals and "GRAPH_NEIGHBORS" for the 
subqueries; graph neighbors seems to work a bit faster

-Ben

On Tuesday, May 17, 2016 at 3:54:53 PM UTC+2, Wilfried Gösgens wrote:
>
> Hi, 
>
> Most probably the best way is to use a traversal of depth 2, and figuring 
> out the paths in .js afterwards.
> ArangoDB 3 will bring some major overhauls to the graphing features; they 
> will most probably solve some of your problems.
> Michael will have a deeper look at your post once the 3.0 Release is 
> available.
>
> Regarding your permission troubles, whats your host OS?
>
> Regarding console.log - if you don't run arangod as daemon, but on the 
> console, you will see your output there; else you will find it in the 
> ArangoDB logfiles.
>
>
> Cheers, 
> Willi
>
>
> On Friday, May 13, 2016 at 4:33:31 PM UTC+2, [email protected] wrote:
>>
>> Apologies for the gigantic post and thanks in advance to anyone who has 
>> the patience to wade through my code!
>>
>> This is basically a follow-up to a question on stack overflow 
>> <http://stackoverflow.com/questions/33855799/getting-data-for-d3-from-arangodb-using-aql-or-arangojs>.
>>  
>> That question got a very helpful answer, but now I am trying to do a bit 
>> more with the query and my AQL has gotten very long and slow. I've also 
>> tried building a Foxx service (using the suggestion in the SO answer as a 
>> starting point), but I'm finding Foxx pretty confusing and awkward to use. 
>> (I don't understand how/where to view console.log statements; I can't edit 
>> the foxx data.js files from WebStorm because ArangoDB is installed as an 
>> admin; etc...)
>>
>> Other than simply "how can I simplify these queries?", my two main 
>> questions are:
>> - When I return a document I want to return a lot of data that exists as 
>> the graph structure itself (e.g. lists of edges or neighbor nodes), but 
>> doing this requires subqueries that not only make the query slow, they also 
>> make the query itself long and unwieldy. (50+ lines in some cases) Is there 
>> a good way to modularize this code using custom functions or something 
>> else? Or am I simply asking unreasonable things of the database?
>>
>> - I am trying to set up a system for paging based on sorting by the 
>> temporary graph data that is returned for each node, e.g. the query will 
>> return the 30 neighbors that each in turn have the most edges. The graph 
>> structure here makes this particularly tricky because I would like to be 
>> able to do queries of depth > 1. I think I have it working in the Foxx 
>> example below but not the AQL version. Is there possibly a better way to do 
>> this?
>>
>> To recap, I'm trying to use ArangoDB as a backend for a d3 force-directed 
>> diagram display, and I need data in something resembling the following 
>> format:
>>
>> var nodes = [
>>               {
>>                 vertexData: {...document properties...},
>>                 tempEdgeData: {...vertex properties constructed from 
>> graph...},
>>                 inEdges: [...array of vertex ids...],
>>                 outEdges: [...array of vertex ids...]
>>               },
>>               {vertexData, tempEdgeData, inEdges, outEdges},
>>               {vertexData, tempEdgeData, inEdges, outEdges},
>>             ],
>>     links = [
>>               {source: nodes[0], target: nodes[1]},
>>               {source: nodes[1], target: nodes[2]}
>>             ]
>>
>> I am also returning a 'progress' object to facilitate paging, which I've 
>> gotten working in the Foxx query but not the AQL version.
>>
>> The AQL query I've come up with at this point is something like this:
>>  
>> LET docId = 'DocumentA/380945'
>> LET limiter = 30
>> LET limitOffset = 0
>> LET edgesTmp = GRAPH_EDGES('combinedGraph',docId,{direction:'any',
>> maxDepth:1,includeData:true}) 
>> LET verticesTmp = (
>>   FOR v IN GRAPH_NEIGHBORS('combinedGraph', docId, {direction: 'any', 
>> maxDepth:1, includeData: true})
>>     LET evidence = {evidence: (FOR n IN GRAPH_NEIGHBORS('type1Graph', v, 
>> {direction: 'inbound', maxDepth: 1, includeData: true}) RETURN {name: 
>> n.name, _id: n._id})}
>>     LET arguments = {supporting: (FOR n IN GRAPH_NEIGHBORS('type1Graph', 
>> v, {direction: 'inbound', edgeExamples: {type: 'supports'}, 
>> vertexCollectionRestriction: 'DocumentB', maxDepth: 1, includeData: 
>> true}) RETURN {name: n.name, _id: n._id}),
>>       supported: (FOR n IN GRAPH_NEIGHBORS('type1Graph', v, {direction: 
>> 'outbound', edgeExamples: {type: 'supports'}, vertexCollectionRestriction: 
>> 'DocumentB', maxDepth: 1, includeData: true}) RETURN {name: n.name, _id: 
>> n._id}), 
>>       contradicting: (FOR n IN GRAPH_NEIGHBORS('type1Graph', v, 
>> {direction: 'inbound', edgeExamples: {type: 'contradicts'}, 
>> vertexCollectionRestriction: 'DocumentB', maxDepth: 1, includeData: 
>> true}) RETURN {name: n.name, _id: n._id}), 
>>       contradicted: (FOR n IN GRAPH_NEIGHBORS('type1Graph', v, 
>> {direction: 'outbound', edgeExamples: {type: 'contradicts'}, 
>> vertexCollectionRestriction: 'DocumentB', maxDepth: 1, includeData: 
>> true}) RETURN {name: n.name, _id: n._id})}
>>     LET tags = {tags: (FOR n IN GRAPH_NEIGHBORS('tagLinkGraph', v, 
>> {direction: 'inbound', maxDepth: 1, includeData: true}) RETURN {name: 
>> n.name, code: n.code, _id: n._id})} 
>>     RETURN {
>>       vertexData: v, 
>>       tempEdgeData: {tags, arguments, evidence}, 
>>       outEdges: GRAPH_NEIGHBORS('combinedGraph', v, {direction: 
>> 'outbound', maxDepth: 1, includeData: false}),
>>       inEdges: GRAPH_NEIGHBORS('combinedGraph', v, {direction: 'inbound', 
>> maxDepth: 1, includeData: false}),
>>     })
>> LET evidence = {evidence: (FOR n IN GRAPH_NEIGHBORS('type1Graph', docId, 
>> {direction: 'inbound', maxDepth: 1, includeData: true}) RETURN {name: 
>> n.name, _id: n._id})}
>> LET arguments = {supporting: (FOR n IN GRAPH_NEIGHBORS('type1Graph', 
>> docId, {direction: 'inbound', edgeExamples: {type: 'supports'}, 
>> vertexCollectionRestriction: 'DocumentB', maxDepth: 1, includeData: 
>> true}) RETURN {name: n.name, _id: n._id}),
>>   supported: (FOR n IN GRAPH_NEIGHBORS('type1Graph', docId, {direction: 
>> 'outbound', edgeExamples: {type: 'supports'}, vertexCollectionRestriction: '
>> DocumentB', maxDepth: 1, includeData: true}) RETURN {name: n.name, _id: 
>> n._id}), 
>>   contradicting: (FOR n IN GRAPH_NEIGHBORS('type1Graph', docId, 
>> {direction: 'inbound', edgeExamples: {type: 'contradicts'}, 
>> vertexCollectionRestriction: 'DocumentB', maxDepth: 1, includeData: 
>> true}) RETURN {name: n.name, _id: n._id}), 
>>   contradicted: (FOR n IN GRAPH_NEIGHBORS('type1Graph', docId, 
>> {direction: 'outbound', edgeExamples: {type: 'contradicts'}, 
>> vertexCollectionRestriction: 'DocumentB', maxDepth: 1, includeData: 
>> true}) RETURN {name: n.name, _id: n._id})}
>>     LET tags = {tags: (FOR n IN GRAPH_NEIGHBORS('tagLinkGraph', docId, 
>> {direction: 'inbound', maxDepth: 1, includeData: true}) RETURN {name: 
>> n.name, code: n.code, _id: n._id})} 
>>     RETURN {
>>       vertexData: DOCUMENT(docId), 
>>       tempEdgeData: {tags, arguments, evidence}, 
>>       outEdges: GRAPH_NEIGHBORS('combinedGraph', v, {direction: 
>> 'outbound', maxDepth: 1, includeData: false}),
>>       inEdges: GRAPH_NEIGHBORS('combinedGraph', v, {direction: 'inbound', 
>> maxDepth: 1, includeData: false}),
>>     })
>> LET vertices = (FOR v IN verticesTmp2 
>>   SORT v.distFromOrigin ASC, LENGTH(v.inEdges) DESC 
>>   LIMIT 0, 30
>>   RETURN v) 
>> LET progress = {
>>   'totalVertices': LENGTH(verticesTmp2), 
>>   'currentVertex': limitOffset + limiter, 
>>   'limit': limiter, 
>>   'limitOffset': limitOffset, 
>>   'currentQuantityReturned': MIN([limiter, LENGTH(vertices)])
>> }
>> LET edges = (FOR edge IN edgesTmp FILTER POSITION(vertices[*].vertexData._id 
>> ,edge._to, false) && POSITION(vertices[*].vertexData._id ,edge._from, 
>> false)  RETURN edge) 
>> RETURN { edges, vertices, progress }
>>
>> Alternatively, the Foxx query I've gotten working is faster, but I don't 
>> understand Foxx very well and I'm having trouble converting this query into 
>>
>> var _ = require('underscore');
>> var joi = require('joi');
>> var Foxx = require('org/arangodb/foxx');
>> var ArangoError = require('org/arangodb').ArangoError;
>> var db = require('org/arangodb').db;
>> var console = require('console');
>>
>> var Data = require('../repositories/data');
>> var Datum = require('../models/datum');
>> var controller = new Foxx.Controller(applicationContext);
>>
>> var datumIdSchema = joi.string().required()
>>     .description('The id of the datum')
>>     .meta({allowMultiple: false});
>>
>> var data = new Data(
>>     applicationContext.collection('data'),
>>     {model: Datum}
>>
>> controller.get('/neighbors', function(req, res) {
>>     var fullId = req.parameters.fullId;
>>     var id = req.parameters.id;
>>     var originVertexName = req.parameters.originVertexName;
>>     var neighborVertexRestriction = JSON.parse(req.parameters.
>> neighborVertexRestriction); //this is an array of collection names
>>     var maxDepth = Number(req.parameters.maxDepth) + 1;
>>     var limitOffset = Number(req.parameters.limitOffset) || 0;
>>     var limit = Number(req.parameters.limit) || 30;
>>     var graphName = req.parameters.graphName;
>>     var progress = {}
>>
>>     var traversal = require("org/arangodb/graph/traversal");
>>     var result = {
>>         edges: [],
>>         vertices: {}
>>     };
>>     var myVisitor = function (config, result, vertex, path, connected) {
>>         var currentDepth = path.edges.length;
>>         if(currentDepth === 0) { //if we are on the origin vertex
>>             if (!result.vertices.hasOwnProperty(vertex._id)) {
>>                 // If we visit a vertex, we store its data and prepare 
>> out/in
>>                 result.vertices[vertex._id] = {
>>                     vertexData: vertex,
>>                     tempVertexData: {
>>                         evidence: [],
>>                         argument: {supporting: [], supported: [], 
>> contradicting: [], contradicted: []},
>>                         tags: [],
>>                         distFromQueried: currentDepth
>>                     },
>>                     outEdges: [],
>>                     inEdges: []
>>                 };
>>             }
>>         } else { //for all vertices in addition to the origin vertex
>>             var e = path.edges[currentDepth-1]; //get the edge data for 
>> the edge immediately leading to this node en route from the original node
>>             e._relativeDepth = currentDepth;
>>             if(currentDepth < maxDepth) {
>>                 result.edges.push(e);
>>                 if (!result.vertices.hasOwnProperty(vertex._id) && _.
>> contains(neighborVertexRestriction, vertex._id.split('/')[0])) {
>>                     // If we visit a vertex, we store its data and 
>> prepare out/in
>>                     result.vertices[vertex._id] = {
>>                         vertexData: vertex,
>>                         tempVertexData: {
>>                             evidence: [],
>>                             argument: {supporting: [], supported: [], 
>> contradicting: [], contradicted: []},
>>                             tags: [],
>>                             distFromQueried: currentDepth
>>                         },
>>                         outEdges: [],
>>                         inEdges: []
>>                     };
>>                 }
>>             }
>>
>>             // add the edge data to the vertices' tempVertexData property
>>             if (e._id.split('/')[0] === "edgeType1" && vertex._id.split(
>> '/')[0] === 'DocumentA') {
>>                 result.vertices[e._to].tempVertexData.evidence.push({name
>> : vertex.name, _id: vertex._id})
>>             }
>>             if (e._id.split('/')[0] === "edgeType1" && vertex._id.split(
>> '/')[0] === 'DocumentB' && e.type === "supports") {
>>                 result.vertices[e._to].tempVertexData.argument.supporting
>> .push({name: vertex.name, _id: vertex._id});
>>                 result.vertices[e._from].tempVertexData.argument.
>> supported.push({name: vertex.name, _id: vertex._id})
>>             }
>>             if (e._id.split('/')[0] === "edgeType1" && vertex._id.split(
>> '/')[0] === 'DocumentB' && e.type === "contradicts") {
>>                 result.vertices[e._to].tempVertexData.argument.
>> contradicting.push({name: vertex.name, _id: vertex._id})
>>                 result.vertices[e._from].tempVertexData.argument.
>> contradicted.push({
>>                     name: vertex.name,
>>                     _id: vertex._id
>>                 })
>>             }
>>             if (e._id.split('/')[0] === "edgeType2" && vertex._id.split(
>> '/')[0] === 'DocumentA') {
>>                 result.vertices[e._to].tempVertexData.tags.push({name: 
>> vertex.name, _id: vertex._id})
>>             }
>>
>>             // push in and out edges to lists on each vertex
>>             if (result.vertices.hasOwnProperty(e._from)) {
>>                 result.vertices[e._from].outEdges.push(e._to);
>>             }
>>             if (result.vertices.hasOwnProperty(e._to)) {
>>                 result.vertices[e._to].inEdges.push(e._from);
>>             }
>>         }
>>     };
>>     var config = {
>>         datasource: traversal.generalGraphDatasourceFactory(
>> 'combinedGraph'),
>>         strategy: "breadthfirst",
>>         order: "preorder",
>>         visitor: myVisitor,
>>         expander: traversal.anyExpander,
>>         minDepth: 0,
>>         maxDepth: maxDepth
>>     };
>>     var traverser = new traversal.Traverser(config);
>>     var slicedVertices = [],
>>         extraVertices1 = [],
>>         extraVertices2 = [],
>>         filteredEdges = [],
>>         filterList = [];
>>     traverser.traverse(result, {_id: fullId});
>>     progress.totalVertices = Object.keys(result.vertices).length;
>>     result.vertices[fullId].vertexData = db._document(fullId);
>>
>>     //sort by inEdges length and then distance from the queried node, so 
>> that the queried node is always first
>>     var sortedVertices = _.chain(result.vertices).sortBy(function(v) {
>> return v.inEdges.length}).reverse().sortBy(function(v) {return v.
>> tempVertexData.distFromQueried}).value(); 
>>     if(limitOffset===0) {
>>         result.vertices = sortedVertices.slice(limitOffset, limit + 
>> limitOffset);
>>         filterList = result.vertices.map(function(v){return v.vertexData.
>> _id}); //get list of the vertices to be returned in order to filter 
>> edges with
>>         result.edges = _.filter(result.edges, function(e){return (_.
>> contains(filterList, e._to) && _.contains(filterList, e._from))})
>>     } else {
>>         slicedVertices = sortedVertices.slice(limitOffset, limit + 
>> limitOffset);
>>         filterList = slicedVertices.map(function(v){return v.vertexData.
>> _id}); //get list of new vertices to be returned
>>         filteredEdges = _.filter(result.edges, function(e){return (_.
>> contains(filterList, e._to) || _.contains(filterList, e._from)) && 
>> e._relativeDepth 
>> < maxDepth}) //get edges that border the new vertices
>>         extraVertices1 = filteredEdges.map(function(e){return result.
>> vertices[e._from]});
>>         extraVertices2 = filteredEdges.map(function(e){return result.
>> vertices[e._to]});
>>         slicedVertices = _.uniq(slicedVertices.concat(extraVertices1, 
>> extraVertices2));
>>         filterList = slicedVertices.map(function(v){return v.vertexData.
>> _id}); //get list of the vertices to be returned in order to filter edges
>>         filteredEdges = _.filter(result.edges, function(e){return (_.
>> contains(filterList, e._to) && _.contains(filterList, e._from))});
>>         result.vertices = slicedVertices;
>>         result.edges = filteredEdges;
>>     }
>>
>>     progress.currentVertex = limitOffset + limit;
>>     progress.limit = limit;
>>     progress.limitOffset = limitOffset;
>>     progress.currentQuantityReturned = Math.min(limit, progress.
>> totalVertices);
>>     result.progress = progress;
>>     res.json(result);
>> }).errorResponse(ArangoError, 404, 'The datum could not be found');
>>
>> Let me know if I can clarify anything.
>>
>> -Ben
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"ArangoDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

[arangodb-google] Re: AQL / Foxx queries are long and messy - how to shorten?

Reply via email to