[ https://issues.apache.org/jira/browse/SPARK-10945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947289#comment-14947289 ]
Sean Owen commented on SPARK-10945:
-----------------------------------

My guess is there is possibly some issue in ...

{code}
  def upgrade(vertices: VertexRDD[VD], includeSrc: Boolean, includeDst: Boolean) {
    // Ship only the attributes that are requested and not already in the view.
    val shipSrc = includeSrc && !hasSrcId
    val shipDst = includeDst && !hasDstId
    if (shipSrc || shipDst) {
      // Partition the shipped attributes with the edges' partitioner so that
      // zipPartitions pairs each edge partition with its vertex blocks.
      val shippedVerts: RDD[(Int, VertexAttributeBlock[VD])] =
        vertices.shipVertexAttributes(shipSrc, shipDst)
          .setName("ReplicatedVertexView.upgrade(%s, %s) - shippedVerts %s %s (broadcast)".format(
            includeSrc, includeDst, shipSrc, shipDst))
          .partitionBy(edges.partitioner.get)
      val newEdges = edges.withPartitionsRDD(edges.partitionsRDD.zipPartitions(shippedVerts) {
        (ePartIter, shippedVertsIter) => ePartIter.map {
          case (pid, edgePartition) =>
            (pid, edgePartition.updateVertices(shippedVertsIter.flatMap(_._2.iterator)))
        }
      })
      edges = newEdges
      hasSrcId = includeSrc
      hasDstId = includeDst
    }
  }
{code}

but I still can't see exactly what. Maybe the partitioning somehow causes the src vertex data to fail to get joined into the new edge representation? This is beyond my knowledge now, so hopefully [~ankurd] can weigh in, since he created the code in question, to confirm or deny that there is something funny here.

> GraphX computes Pagerank with NaN (with some datasets)
> ------------------------------------------------------
>
>                 Key: SPARK-10945
>                 URL: https://issues.apache.org/jira/browse/SPARK-10945
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.3.0
>         Environment: Linux
>            Reporter: Khaled Ammar
>              Labels: test
>
> Hi,
> I run GraphX in a medium-size standalone Spark 1.3.0 installation. PageRank
> typically works fine, except with one dataset (Twitter:
> http://law.di.unimi.it/webdata/twitter-2010). This is a public dataset that
> is commonly used in research papers.
> I found that many vertices have NaN values. This is true even if the
> algorithm runs for only 1 iteration.
> Thanks,
> -Khaled