[jira] [Updated] (SPARK-6378) srcAttr in graph.triplets don't update when the size of graph is huge

2019-05-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-6378:

Labels: bulk-closed  (was: )

> srcAttr in graph.triplets don't update when the size of graph is huge
> -
>
> Key: SPARK-6378
> URL: https://issues.apache.org/jira/browse/SPARK-6378
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Affects Versions: 1.2.1
>Reporter: zhangzhenyue
>Priority: Major
>  Labels: bulk-closed
> Attachments: TripletsViewDonotUpdate.scala
>
>
> When the graph is huge (0.2 billion vertices, 6 billion edges), the srcAttr 
> and dstAttr in graph.triplets are not updated after the vertex data is 
> changed with Graph.outerJoinVertices.
> The code and the log are as follows:
> {quote}
> g = graph.outerJoinVertices()...
> g.vertices.count()
> g.edges.count()
> println("example edge " + g.triplets.filter(e => e.srcId == 51L).collect()
>   .map(e => (e.srcId + ":" + e.srcAttr + ", " + e.dstId + ":" + e.dstAttr)).mkString("\n"))
> println("example vertex " + g.vertices.filter(e => e._1 == 51L).collect()
>   .map(e => (e._1 + "," + e._2)).mkString("\n"))
> {quote}
> the result:
> {quote}
> example edge 51:0, 2467451620:61
> 51:0, 1962741310:83 // attr of vertex 51 is 0 in Graph.triplets
> example vertex 51,2 // attr of vertex 51 is 2 in Graph.vertices
> {quote}
> When the graph is smaller (10 million vertices), the code works: the 
> triplets are updated when the vertex data changes.
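
The comparison done by hand in the log above could be automated on a collected sample. The following is a minimal sketch in plain Scala, with no Spark dependency. `TripletViewCheck`, `Endpoint` and `Mismatch` are hypothetical names introduced here for illustration and are not GraphX classes; the input values are the ones from the reported log.

```scala
// Hypothetical helper, not part of GraphX: compares the attributes seen
// through collected triplets with the attributes in a collected vertex sample.
object TripletViewCheck {
  // Plain stand-in for the four EdgeTriplet fields printed in the report.
  final case class Endpoint[VD](srcId: Long, srcAttr: VD, dstId: Long, dstAttr: VD)

  // One disagreement: vertex id, value seen in triplets, value seen in vertices.
  final case class Mismatch[VD](vertexId: Long, inTriplets: VD, inVertices: VD)

  def findMismatches[VD](
      vertices: Map[Long, VD],
      triplets: Seq[Endpoint[VD]]): Seq[Mismatch[VD]] = {
    // Flatten every triplet into its two (id, attribute) endpoints.
    val seen = triplets.flatMap(t => Seq(t.srcId -> t.srcAttr, t.dstId -> t.dstAttr))
    // Keep only endpoints whose vertex was sampled and whose attribute differs.
    seen.distinct.flatMap { case (id, attr) =>
      vertices.get(id).filter(_ != attr).map(actual => Mismatch(id, attr, actual))
    }
  }

  def main(args: Array[String]): Unit = {
    // Values copied from the log in the report.
    val vertices = Map(51L -> 2)
    val triplets = Seq(
      Endpoint(51L, 0, 2467451620L, 61),
      Endpoint(51L, 0, 1962741310L, 83))
    findMismatches(vertices, triplets).foreach(println) // prints Mismatch(51,0,2)
  }
}
```

On a real graph the two inputs would come from the same `filter(...).collect()` calls used in the report, with each collected triplet mapped to an `Endpoint`. Endpoints whose vertex is not in the sample are skipped rather than reported.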



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6378) srcAttr in graph.triplets don't update when the size of graph is huge

2016-03-13 Thread Zhaokang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaokang Wang updated SPARK-6378:
-
Attachment: TripletsViewDonotUpdate.scala

I have run into a similar problem with triplets not updating in GraphX, and I 
have a code demo that reproduces this issue on a small toy graph with only 3 
vertices. The demo code is attached as [^TripletsViewDonotUpdate.scala].

Let me describe the steps to reproduce the issue:
1. We have constructed a small graph ({{purGraph}} in the code) with only 3 
vertices. The edges of the graph are: 2->1, 3->1, 2->3.
2. Run the collect-neighbors operation to get the {{inNeighborGraph}} of 
{{purGraph}}.
3. Outer join the {{inNeighborGraph}} vertices onto {{purGraph}} to get 
{{dataGraph}}. In {{dataGraph}}, each vertex stores an ArrayBuffer of its 
in-neighbors' vertex ids.
4. Now examine the {{inNeighbor}} attribute in the {{dataGraph.vertices}} view 
and the {{dataGraph.triplets}} view. The output shows that the two views are 
inconsistent on vertex 3's {{inNeighbor}} property:

{quote}
> dataGraph.vertices
vid: 1, inNeighbor:2,3
vid: 3, inNeighbor:2
vid: 2, inNeighbor:
> dataGraph.triplets.srcAttr
vid: 2, inNeighbor:
vid: 2, inNeighbor:
vid: 3, inNeighbor:
{quote}

5. If we comment out the {{purGraph.triplets.count()}} statement in the code, 
the bug disappears:
{code}
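import org.apache.spark.graphx.{EdgeDirection, Graph, VertexId}

// dataVertex, dataEdge and VertexProperty are defined elsewhere in the demo (not shown here)
val purGraph = Graph(dataVertex, dataEdge).persist()
// purGraph.triplets.count() // with this line commented out, the two views agree
val inNeighborGraph = purGraph.collectNeighbors(EdgeDirection.In)
// Now join the in-neighbor vertex id list to every vertex's property
val dataGraph = purGraph.outerJoinVertices(inNeighborGraph)((vid, property, inNeighborList) => {
  val inNeighborVertexIds =
    inNeighborList.getOrElse(Array[(VertexId, VertexProperty)]()).map(t => t._1)
  // Appends to the existing property object in place and returns that same object
  property.inNeighbor ++= inNeighborVertexIds.toBuffer
  property
})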
{code}

It seems that the triplets view and the vertex view of the same graph can be 
inconsistent in some situations.
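
The thread does not establish a root cause. One hypothesis worth testing, and it is only an assumption, is that the join function mutates {{property}} in place and returns the same object, so any comparison between the old and the new vertex attribute sees no change and already materialized triplets could stay stale. The sketch below shows a variant of the join function that builds a new value instead. It is plain Scala with no Spark dependency; `ImmutableJoinSketch` and `withInNeighbors` are hypothetical names, and the shape of `VertexProperty` is inferred from the `property.inNeighbor ++= ...` line because the attachment is not shown here.

```scala
import scala.collection.mutable.ArrayBuffer

object ImmutableJoinSketch {
  // Assumption: stand-in for GraphX's VertexId, which is an alias for Long.
  type VertexId = Long

  // Assumption: the attachment's VertexProperty is not shown in this thread;
  // this shape is inferred from the code fragment in the comment.
  final case class VertexProperty(
      inNeighbor: ArrayBuffer[VertexId] = ArrayBuffer.empty[VertexId])

  // Same role as the closure passed to outerJoinVertices in the fragment,
  // but it returns a new VertexProperty and leaves its argument untouched.
  def withInNeighbors(
      vid: VertexId,
      property: VertexProperty,
      inNeighborList: Option[Array[(VertexId, VertexProperty)]]): VertexProperty = {
    val inNeighborVertexIds =
      inNeighborList.getOrElse(Array.empty[(VertexId, VertexProperty)]).map(_._1)
    // `++` builds a fresh buffer; `++=` would append to the existing one.
    property.copy(inNeighbor = property.inNeighbor ++ inNeighborVertexIds.toSeq)
  }
}
```

In the reproduction this function would take the place of the closure, provided `VertexProperty` is declared this way. Whether it removes the inconsistency on a given Spark version is untested here.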




[jira] [Updated] (SPARK-6378) srcAttr in graph.triplets don't update when the size of graph is huge

2015-06-19 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-6378:
-
Target Version/s:   (was: 1.4.0)




[jira] [Updated] (SPARK-6378) srcAttr in graph.triplets don't update when the size of graph is huge

2015-05-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-6378:
-
Target Version/s: 1.4.0  (was: 1.3.1, 1.4.0)
