[jira] [Commented] (SPARK-10945) GraphX computes Pagerank with NaN (with some datasets)

Sean Owen (JIRA) Wed, 07 Oct 2015 11:09:07 -0700

    [ https://issues.apache.org/jira/browse/SPARK-10945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947289#comment-14947289 ]


Sean Owen commented on SPARK-10945:
-----------------------------------

My guess is that there is an issue somewhere in this method:

{code}
  def upgrade(vertices: VertexRDD[VD], includeSrc: Boolean, includeDst: Boolean) {
    val shipSrc = includeSrc && !hasSrcId
    val shipDst = includeDst && !hasDstId
    if (shipSrc || shipDst) {
      val shippedVerts: RDD[(Int, VertexAttributeBlock[VD])] =
        vertices.shipVertexAttributes(shipSrc, shipDst)
          .setName("ReplicatedVertexView.upgrade(%s, %s) - shippedVerts %s %s (broadcast)".format(
            includeSrc, includeDst, shipSrc, shipDst))
          .partitionBy(edges.partitioner.get)
      val newEdges = edges.withPartitionsRDD(edges.partitionsRDD.zipPartitions(shippedVerts) {
        (ePartIter, shippedVertsIter) => ePartIter.map {
          case (pid, edgePartition) =>
            (pid, edgePartition.updateVertices(shippedVertsIter.flatMap(_._2.iterator)))
        }
      })
      edges = newEdges
      hasSrcId = includeSrc
      hasDstId = includeDst
    }
  }
{code}

but I still can't see exactly what. Maybe the partitioning somehow causes the 
src vertex data to fail to get joined into the new edge representation? This is 
beyond my knowledge now, so hopefully [~ankurd] can weigh in, since he wrote 
the code in question, and confirm or deny that something is off here.
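
To make the hypothesis above concrete, here is a minimal sketch in plain Scala (no Spark involved) of how a source vertex whose attributes never reach the edge side could produce NaN after a single PageRank-style update. The object name, the toy edge list, and above all the assumption that a missing attribute reads as zero are hypothetical; this is not a confirmed description of what GraphX does, only an illustration of the arithmetic that would follow if it did.

```scala
object MissingSrcAttrSketch {
  // Toy edge list of (src, dst) pairs; the vertex ids are made up for illustration.
  val edges: Seq[(Long, Long)] = Seq((1L, 2L), (3L, 2L))

  // One PageRank-style update for vertex `dst`.
  // ASSUMPTION: a source whose attributes were never joined reads as
  // out-degree 0 and rank 0.0. This is a guess, not confirmed GraphX behavior.
  def updatedRank(dst: Long,
                  outDegree: Map[Long, Int],
                  rank: Map[Long, Double],
                  resetProb: Double = 0.15): Double = {
    val msgSum = edges
      .filter { case (_, d) => d == dst }
      .map { case (src, _) =>
        // Edge weight is 1 / outDegree(src); with a zero default this is Infinity.
        val weight = 1.0 / outDegree.getOrElse(src, 0)
        // A zero default rank times an infinite weight is NaN under IEEE 754.
        rank.getOrElse(src, 0.0) * weight
      }
      .sum
    // NaN propagates through the sum and the final update.
    resetProb + (1.0 - resetProb) * msgSum
  }

  def main(args: Array[String]): Unit = {
    val fullDeg = Map(1L -> 1, 3L -> 1)
    val fullRank = Map(1L -> 1.0, 3L -> 1.0)
    // All source attributes present: a finite rank.
    println(updatedRank(2L, fullDeg, fullRank))
    // Attributes for source vertex 3 missing: the rank of vertex 2 is NaN.
    println(updatedRank(2L, fullDeg - 3L, fullRank - 3L))
  }
}
```

If something like this were happening, it would be consistent with the report that NaN shows up even after one iteration, since a single missing source is enough to poison every destination it points to.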

> GraphX computes Pagerank with NaN (with some datasets)
> ------------------------------------------------------
>
>                 Key: SPARK-10945
>                 URL: https://issues.apache.org/jira/browse/SPARK-10945
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.3.0
>         Environment: Linux
>            Reporter: Khaled Ammar
>              Labels: test
>
> Hi,
> I run GraphX on a medium-sized standalone Spark 1.3.0 installation. PageRank 
> typically works fine, except with one dataset (Twitter: 
> http://law.di.unimi.it/webdata/twitter-2010). This is a public dataset that 
> is commonly used in research papers.
> I found that many vertices have NaN values. This is true even if the 
> algorithm runs for only 1 iteration.
> Thanks,
> -Khaled



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

