I have a question: *how can the attributes in a graph's triplets get updated after calling mapVertices()?*
My code:

```
import com.clearspring.analytics.stream.cardinality.HyperLogLog // assuming stream-lib's HyperLogLog

// Initialize the graph: give each vertex a counter that contains only its own vertex id
var anfGraph = graph.mapVertices { case (vid, _) =>
  val counter = new HyperLogLog(5)
  counter.offer(vid)
  counter
}

// Find a triplet whose source attribute is null
val nullTriplet = anfGraph.triplets.filter(t => t.srcAttr == null).first()
// Look up that source vertex directly (compare ids, not the triplet itself):
// here the vertex's attribute is NOT null
anfGraph.vertices.filter(_._1 == nullTriplet.srcId).first()

// messages = anfGraph.aggregateMessages(msgFun, mergeMessage) // <- NullPointerException
```

I found that some vertex attributes in some triplets are null, but not all of them.

Alcaid

2015-02-13 14:50 GMT+08:00 Reynold Xin <r...@databricks.com>:

> Then maybe you actually had a null in your vertex attribute?
>
> On Thu, Feb 12, 2015 at 10:47 PM, James <alcaid1...@gmail.com> wrote:
>
>> I changed the mapReduceTriplets() function to aggregateMessages(), but it
>> still failed.
>>
>> 2015-02-13 6:52 GMT+08:00 Reynold Xin <r...@databricks.com>:
>>
>>> Can you use the new aggregateNeighbors method? I suspect the null is
>>> coming from "automatic join elimination", which inspects the bytecode to
>>> see whether you need the src or dst vertex data. Occasionally this
>>> detection can fail. In the new aggregateNeighbors API, the caller must
>>> explicitly specify which fields are needed, which makes it more robust.
>>>
>>> On Thu, Feb 12, 2015 at 6:26 AM, James <alcaid1...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> When I ran the code on a much bigger graph, I hit a
>>>> NullPointerException.
>>>>
>>>> I found that this is because the sendMessage() function receives a
>>>> triplet whose edge.srcAttr or edge.dstAttr is null. I wonder why this
>>>> happens, as I am sure every vertex has an attribute.
>>>>
>>>> Any replies are appreciated.
>>>>
>>>> Alcaid
>>>>
>>>> 2015-02-11 19:30 GMT+08:00 James <alcaid1...@gmail.com>:
>>>>
>>>> > Hello,
>>>> >
>>>> > Recently I have been trying to estimate the average distance of a big
>>>> > graph using Spark with the help of
>>>> > [HyperAnf](http://dl.acm.org/citation.cfm?id=1963493).
>>>> >
>>>> > It works like the Connected Components algorithm, except that the
>>>> > attribute of each vertex is a HyperLogLog counter that, at the k-th
>>>> > iteration, estimates the number of vertices it can reach within k hops.
>>>> >
>>>> > I have successfully run the code on a graph with 20M vertices, but I
>>>> > still need help:
>>>> >
>>>> > *I think the code could work more efficiently, especially the "Send
>>>> > message" function, but I am not sure what will happen if a vertex
>>>> > receives no message in an iteration.*
>>>> >
>>>> > Here is my code: https://github.com/alcaid1801/Erdos
>>>> >
>>>> > Any replies are appreciated.
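Reynold's suggestion (tell GraphX explicitly which vertex fields the send function reads) and the question about vertices that receive no message can both be seen in one HyperANF-style round. The sketch below is a minimal illustration, not the code from the linked repository; it assumes Spark 1.2+ GraphX, where `aggregateMessages` takes a `TripletFields` argument, and stream-lib's `HyperLogLog`. The object `HyperAnfStep` is hypothetical, and treating edges as undirected (sending counters both ways) is also an assumption. Because the send function reads both `srcAttr` and `dstAttr`, it passes `TripletFields.All` explicitly; if you only send one way, pass `TripletFields.Src` or `TripletFields.Dst` to match what you actually read. `aggregateMessages` omits vertices that received no message from its result, so `outerJoinVertices` sees `None` for them and the old counter is kept.

```scala
import scala.reflect.ClassTag

import com.clearspring.analytics.stream.cardinality.HyperLogLog
import org.apache.spark.graphx.{EdgeContext, Graph, TripletFields, VertexId}

object HyperAnfStep {
  // Union of two counters. stream-lib's merge() builds a new counter,
  // so neither input (which may be shared vertex state) is mutated.
  def union(a: HyperLogLog, b: HyperLogLog): HyperLogLog =
    a.merge(b).asInstanceOf[HyperLogLog]

  // One round: every vertex absorbs its neighbours' counters (edges treated as undirected).
  def step[ED: ClassTag](g: Graph[HyperLogLog, ED]): Graph[HyperLogLog, ED] = {
    val msgs = g.aggregateMessages[HyperLogLog](
      (ctx: EdgeContext[HyperLogLog, ED, HyperLogLog]) => {
        ctx.sendToDst(ctx.srcAttr) // reads srcAttr
        ctx.sendToSrc(ctx.dstAttr) // reads dstAttr
      },
      (a, b) => union(a, b),
      TripletFields.All // state explicitly that both endpoint attributes are needed
    )
    // msgs contains only vertices that received at least one message.
    g.outerJoinVertices(msgs) { (_: VertexId, old: HyperLogLog, incoming: Option[HyperLogLog]) =>
      incoming match {
        case Some(c) => union(old, c)
        case None    => old // no message this round: keep the previous counter
      }
    }
  }
}
```

Iterating `step` until no counter changes gives the HyperANF balls; keeping the unchanged counter for silent vertices is what makes skipping their messages safe.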
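The HyperANF idea described in the original message (each vertex's ball of radius k grows by absorbing its neighbours' balls) can be checked on a toy graph without Spark. This single-machine sketch deliberately swaps the HyperLogLog counters for exact `Set`s, so it only suits tiny graphs, and it treats edges as undirected; the object `ExactAnf` and the function `averageDistance` are hypothetical names. Writing N(t) for the number of pairs (x, y) with y within t hops of x, the average distance over reachable pairs of distinct vertices is the sum over t ≥ 1 of t·(N(t) − N(t−1)), divided by N(T) − N(0), where T is the round after which N stops changing.

```scala
object ExactAnf {
  // Exact stand-in for HyperANF: each vertex keeps the exact set of vertices
  // within t hops instead of a HyperLogLog counter. Vertices are 0 until n.
  def averageDistance(n: Int, edges: Seq[(Int, Int)]): Double = {
    // Undirected adjacency (assumption).
    val adj: Map[Int, Seq[Int]] =
      (edges ++ edges.map(_.swap)).groupBy(_._1).map { case (k, v) => k -> v.map(_._2) }
    var balls: Vector[Set[Int]] = Vector.tabulate(n)(v => Set(v)) // B(v, 0) = {v}
    val n0 = balls.map(_.size.toLong).sum // N(0) = n
    var nPrev = n0
    var weighted = 0L // running sum of t * (N(t) - N(t-1))
    var t = 0
    var changed = true
    while (changed) {
      t += 1
      // B(v, t) = B(v, t-1) union the balls B(u, t-1) of all neighbours u.
      // A vertex with no neighbours receives nothing and keeps its ball.
      val next = Vector.tabulate(n) { v =>
        adj.getOrElse(v, Seq.empty).foldLeft(balls(v))((acc, u) => acc ++ balls(u))
      }
      val nCur = next.map(_.size.toLong).sum
      weighted += t * (nCur - nPrev)
      changed = nCur != nPrev
      nPrev = nCur
      balls = next
    }
    // No reachable pairs of distinct vertices: define the average as 0.
    if (nPrev == n0) 0.0 else weighted.toDouble / (nPrev - n0)
  }
}
```

For the path 0-1-2 the ordered distances are 1, 1, 1, 1, 2, 2, so the result is 8/6; adding an isolated vertex leaves it unchanged because unreachable pairs never enter N(t).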