[jira] [Updated] (SPARK-6378) srcAttr in graph.triplets don't update when the size of graph is huge
[ https://issues.apache.org/jira/browse/SPARK-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-6378:
-------------------------------
    Labels: bulk-closed  (was: )

> srcAttr in graph.triplets don't update when the size of graph is huge
> ---------------------------------------------------------------------
>
>                 Key: SPARK-6378
>                 URL: https://issues.apache.org/jira/browse/SPARK-6378
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.2.1
>            Reporter: zhangzhenyue
>            Priority: Major
>              Labels: bulk-closed
>         Attachments: TripletsViewDonotUpdate.scala
>
>
> When the graph is huge (0.2 billion vertices, 6 billion edges), the srcAttr and dstAttr in graph.triplets are not updated after calling Graph.outerJoinVertices (that is, when the vertex data changes).
> The code and the log are as follows:
> {quote}
> g = graph.outerJoinVertices()...
> g.vertices.count()
> g.edges.count()
> println("example edge " + g.triplets.filter(e => e.srcId == 51L).collect()
>   .map(e => (e.srcId + ":" + e.srcAttr + ", " + e.dstId + ":" + e.dstAttr)).mkString("\n"))
> println("example vertex " + g.vertices.filter(e => e._1 == 51L).collect()
>   .map(e => (e._1 + "," + e._2)).mkString("\n"))
> {quote}
> The result:
> {quote}
> example edge 51:0, 2467451620:61
> 51:0, 1962741310:83    // attr of vertex 51 is 0 in Graph.triplets
> example vertex 51,2    // attr of vertex 51 is 2 in Graph.vertices
> {quote}
> When the graph is smaller (10 million vertices), the code works: the triplets are updated when the vertex data changes.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
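The report compares vertex 51 as seen through g.triplets with the same vertex as seen through g.vertices. That comparison can be generalized to every source vertex. The following is a minimal sketch, assuming the standard GraphX Graph API; the object name TripletViewCheck and the helper countSrcAttrMismatches are hypothetical and are not part of GraphX or of the reporter's code.

```scala
import org.apache.spark.SparkContext._ // pair-RDD implicits on older Spark releases such as 1.2.x
import org.apache.spark.graphx._
import scala.reflect.ClassTag

object TripletViewCheck {
  // Hypothetical helper: counts the triplets whose srcAttr differs from the
  // attribute that graph.vertices reports for the same vertex id.
  def countSrcAttrMismatches[VD: ClassTag, ED: ClassTag](g: Graph[VD, ED]): Long = {
    // Copy out (srcId, srcAttr) right away instead of holding on to EdgeTriplet objects.
    val seenInTriplets = g.triplets.map(t => (t.srcId, t.srcAttr))
    seenInTriplets
      .join(g.vertices) // pairs each triplet-side attribute with the vertex-side attribute
      .filter { case (_, (fromTriplet, fromVertices)) => fromTriplet != fromVertices }
      .count()
  }
}
```

A result of 0 means the two views agree for every source vertex; the same pattern applies to dstId and dstAttr. Note that the join is a shuffle, so on a graph of the size described in the report the check is itself expensive.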
[jira] [Updated] (SPARK-6378) srcAttr in graph.triplets don't update when the size of graph is huge
[ https://issues.apache.org/jira/browse/SPARK-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhaokang Wang updated SPARK-6378:
--------------------------------
    Attachment: TripletsViewDonotUpdate.scala

I have run into a similar problem with triplets updates in GraphX, and I have a code demo that reproduces this issue on a small toy graph with only 3 vertices. The demo code is attached as [^TripletsViewDonotUpdate.scala].

Steps to reproduce the issue:
1. Construct a small graph ({{purGraph}} in the code) with only 3 vertices. The edges of the graph are 2->1, 3->1, and 2->3.
2. Run the collect-neighbors operation to get the {{inNeighborGraph}} of {{purGraph}}.
3. Outer join the {{inNeighborGraph}} vertices onto {{purGraph}} to get {{dataGraph}}. In {{dataGraph}}, each vertex stores an ArrayBuffer of its in-neighbors' vertex ids.
4. Examine the {{inNeighbor}} attribute in the {{dataGraph.vertices}} view and in the {{dataGraph.triplets}} view. The output shows that the two views are inconsistent on vertex 3's {{inNeighbor}} property:
{quote}
> dataGraph.vertices
vid: 1, inNeighbor:2,3
vid: 3, inNeighbor:2
vid: 2, inNeighbor:
> dataGraph.triplets.srcAttr
vid: 2, inNeighbor:
vid: 2, inNeighbor:
vid: 3, inNeighbor:
{quote}
5. If we comment out the {{purGraph.triplets.count()}} statement in the code, the bug disappears:
{code}
val purGraph = Graph(dataVertex, dataEdge).persist()
// purGraph.triplets.count() // !!!comment this
val inNeighborGraph = purGraph.collectNeighbors(EdgeDirection.In)
// Now join the in neighbor vertex id list to every vertex's property
val dataGraph = purGraph.outerJoinVertices(inNeighborGraph)((vid, property, inNeighborList) => {
  val inNeighborVertexIds = inNeighborList.getOrElse(Array[(VertexId, VertexProperty)]()).map(t => t._1)
  property.inNeighbor ++= inNeighborVertexIds.toBuffer
  property
})
{code}
It seems that the triplets view and the vertex view of the same graph may be inconsistent in some situations.
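The five steps above can be assembled into one self-contained program. This is only a sketch built from the description and the fragment in the comment, since the attachment itself is not reproduced in this thread. The shape of VertexProperty, the Int edge attribute, the object name TripletsViewSketch, and the local[2] master are all assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import scala.collection.mutable.ArrayBuffer

// Assumed shape of the vertex attribute; the attachment's definition is not shown in the thread.
case class VertexProperty(inNeighbor: ArrayBuffer[VertexId] = ArrayBuffer.empty[VertexId])

object TripletsViewSketch {
  // Returns (vertices view, triplets view) as (vertex id, in-neighbor ids) pairs.
  def run(sc: SparkContext): (Seq[(VertexId, Seq[VertexId])], Seq[(VertexId, Seq[VertexId])]) = {
    val dataVertex = sc.parallelize(Seq(1L, 2L, 3L).map(id => (id, VertexProperty())))
    // Step 1: edges 2->1, 3->1, 2->3; the Int edge attribute is a placeholder.
    val dataEdge = sc.parallelize(Seq(Edge(2L, 1L, 0), Edge(3L, 1L, 0), Edge(2L, 3L, 0)))

    val purGraph = Graph(dataVertex, dataEdge).persist()
    purGraph.triplets.count() // the statement that step 5 comments out

    // Step 2: collect the in-neighbors of every vertex.
    val inNeighborGraph = purGraph.collectNeighbors(EdgeDirection.In)

    // Step 3: join the in-neighbor ids into each vertex's property.
    val dataGraph = purGraph.outerJoinVertices(inNeighborGraph) { (vid, property, inNeighborList) =>
      val inNeighborVertexIds =
        inNeighborList.getOrElse(Array.empty[(VertexId, VertexProperty)]).map(_._1)
      property.inNeighbor ++= inNeighborVertexIds // in-place update, as in the fragment above
      property
    }

    // Step 4: read the same attribute through both views.
    val vertexView = dataGraph.vertices
      .map { case (vid, p) => (vid, p.inNeighbor.toList) }.collect().toSeq
    val tripletView = dataGraph.triplets
      .map(t => (t.srcId, t.srcAttr.inNeighbor.toList)).collect().toSeq
    (vertexView, tripletView)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("triplets-view-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (vertexView, tripletView) = run(sc)
      println("> dataGraph.vertices")
      vertexView.foreach { case (vid, ns) => println(s"vid: $vid, inNeighbor:${ns.mkString(",")}") }
      println("> dataGraph.triplets.srcAttr")
      tripletView.foreach { case (vid, ns) => println(s"vid: $vid, inNeighbor:${ns.mkString(",")}") }
    } finally {
      sc.stop()
    }
  }
}
```

The thread reports the mismatch only for Spark 1.2.1, so the printed triplets view may or may not match the output quoted in step 4 on other versions or with a different VertexProperty definition.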
[jira] [Updated] (SPARK-6378) srcAttr in graph.triplets don't update when the size of graph is huge
[ https://issues.apache.org/jira/browse/SPARK-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-6378:
----------------------------
    Target Version/s:   (was: 1.4.0)
[jira] [Updated] (SPARK-6378) srcAttr in graph.triplets don't update when the size of graph is huge
[ https://issues.apache.org/jira/browse/SPARK-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-6378:
----------------------------
    Target Version/s: 1.4.0  (was: 1.3.1, 1.4.0)