Re: How to set persistence level of graph in GraphX in spark 1.0.0

2014-10-28 Thread Yifan LI
Hi Arpit,

Try this:

val graph = GraphLoader.edgeListFile(sc, edgesFile,
  minEdgePartitions = numPartitions,
  edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
  vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)


Best,
Yifan LI




On 28 Oct 2014, at 11:17, Arpit Kumar arp8...@gmail.com wrote:

 Any help regarding this issue please?
 
 Regards,
 Arpit
 
 On Sat, Oct 25, 2014 at 8:56 AM, Arpit Kumar arp8...@gmail.com wrote:
 Hi all,
 I am using the GraphLoader class to load graphs from edge list files, but 
 I need to change the storage level of the graph to something other than 
 MEMORY_ONLY.
 
 val graph = GraphLoader.edgeListFile(sc, fname,
   minEdgePartitions = numEPart).persist(StorageLevel.MEMORY_AND_DISK_SER)
 
 The error I get when executing this is:
 Exception in thread "main" java.lang.UnsupportedOperationException: Cannot 
 change storage level of an RDD after it was already assigned a level
 
 
 Then I looked into the GraphLoader class. I know that the latest version of 
 Spark supports setting the persistence level in this class. Please suggest a 
 workaround for Spark 1.0.0, as I do not have the option of moving to the 
 latest release.
 
 Note: I tried copying the GraphLoader class into my own package as 
 GraphLoader1, with the following package and imports:
 
 package com.cloudera.xyz
 
 import org.apache.spark.storage.StorageLevel
 import org.apache.spark.graphx._
 import org.apache.spark.{Logging, SparkContext}
 import org.apache.spark.graphx.impl._
 
 and then changing the persistence level to suit my needs, using 
 .persist(gStorageLevel) instead of .cache().
 
 But when compiling I get the following error:
 
 GraphLoader1.scala:49: error: class EdgePartitionBuilder in package impl 
 cannot be accessed in package org.apache.spark.graphx.impl
 [INFO]   val builder = new EdgePartitionBuilder[Int, Int]
 
 I am also attaching the file to this mail. Maybe this approach is simply 
 not possible.
 
 
 Please suggest some workarounds so that I can set the persistence level to 
 MEMORY_AND_DISK_SER for the graph I read from the edge list file.
 
 
 
 -- 
 Arpit Kumar
 Fourth Year Undergraduate
 Department of Computer Science and Engineering
 Indian Institute of Technology, Kharagpur



Re: How to set persistence level of graph in GraphX in spark 1.0.0

2014-10-28 Thread Arpit Kumar
Hi Yifan LI,
I am currently working on Spark 1.0, in which edgeStorageLevel cannot be
passed as a parameter; GraphLoader implicitly caches the edges. So I am
looking for a workaround.

http://spark.apache.org/docs/1.0.0/api/scala/index.html#org.apache.spark.graphx.GraphLoader$

Regards,
Arpit


-- 
Arpit Kumar
Fourth Year Undergraduate
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur


Re: How to set persistence level of graph in GraphX in spark 1.0.0

2014-10-28 Thread Yifan LI
I am not sure whether that works on Spark 1.0, but give it a try.

Alternatively, you could:
1) construct the edge and vertex RDDs yourself, each with the desired storage 
level;
2) then build the graph with Graph(verticesRDD, edgesRDD).
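
A minimal sketch of these two steps, assuming a whitespace-separated edge list file like the one GraphLoader reads. The object name GraphFromRdds, the load helper, and the use of 1 as the attribute for every vertex and edge are illustrative assumptions, not part of GraphX. Persisting the input RDDs may not control the level GraphX uses for the RDDs it builds internally, so it is worth checking the storage levels on Spark 1.0 before relying on this.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical helper: builds the edge and vertex RDDs by hand, persists each
// at the requested level, and passes them to Graph(vertices, edges).
object GraphFromRdds {
  def load(sc: SparkContext, path: String, level: StorageLevel): Graph[Int, Int] = {
    // Step 1a: parse "src dst" lines into edges, skipping blanks and # comments.
    val edges: RDD[Edge[Int]] = sc.textFile(path)
      .filter(line => !line.isEmpty && line(0) != '#')
      .map { line =>
        val fields = line.split("\\s+")
        Edge(fields(0).toLong, fields(1).toLong, 1)
      }
      .persist(level)

    // Step 1b: derive the vertex set from the edge endpoints.
    val vertices: RDD[(VertexId, Int)] = edges
      .flatMap(e => Iterator((e.srcId, 1), (e.dstId, 1)))
      .distinct()
      .persist(level)

    // Step 2: build the graph from the two RDDs.
    Graph(vertices, edges)
  }
}
```

Usage would look like GraphFromRdds.load(sc, fname, StorageLevel.MEMORY_AND_DISK_SER).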

Best,
Yifan LI







Re: How to set persistence level of graph in GraphX in spark 1.0.0

2014-10-28 Thread Ankur Dave
At 2014-10-25 08:56:34 +0530, Arpit Kumar arp8...@gmail.com wrote:
 GraphLoader1.scala:49: error: class EdgePartitionBuilder in package impl
 cannot be accessed in package org.apache.spark.graphx.impl
 [INFO]   val builder = new EdgePartitionBuilder[Int, Int]

Here's a workaround:

1. Copy and modify the GraphLoader source as you did, but keep it in the 
org.apache.spark.graphx.impl package to fix the package-private error.
2. In addition to changing the persistence level of the edges RDD in 
GraphLoader, construct the VertexRDD and EdgeRDD yourself.
3. Call GraphImpl.fromExistingRDDs to construct the graph. This function will 
respect the existing EdgeRDD storage level.
4. Use the graph as desired. Be sure to avoid Graph#partitionBy, the Pregel 
API, and all of the built-in algorithms, because they call Graph#cache() on 
intermediate graphs.

Here is a modified version of GraphLoader that does 1-3: 
https://gist.github.com/ankurdave/0394d47809297eea76ff
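
Step 1 relies on Scala's qualified-private access: a class declared private[somepackage] is visible to any code compiled into that package, even when that code lives outside the library. A minimal sketch with hypothetical names (demo.graphx, Builder, and MyLoader are stand-ins for illustration and are not GraphX code):

```scala
// Stand-in for the library: Builder is visible only inside demo.graphx.
package demo.graphx.impl {
  private[graphx] class Builder {
    private var edges = List.empty[(Long, Long)]
    def add(src: Long, dst: Long): Unit = { edges = (src, dst) :: edges }
    def result: List[(Long, Long)] = edges.reverse
  }
}

// User code declared in the same package, so it may instantiate Builder.
// The same code in another package (e.g. com.example) would not compile.
package demo.graphx.impl {
  object MyLoader {
    def load(pairs: Seq[(Long, Long)]): List[(Long, Long)] = {
      val builder = new Builder
      pairs.foreach { case (src, dst) => builder.add(src, dst) }
      builder.result
    }
  }
}

object PackageAccessDemo extends App {
  println(demo.graphx.impl.MyLoader.load(Seq((1L, 2L), (3L, 4L))))
}
```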

Ankur

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



How to set persistence level of graph in GraphX in spark 1.0.0

2014-10-24 Thread Arpit Kumar
Hi all,
I am using the GraphLoader class to load graphs from edge list files, but
I need to change the storage level of the graph to something other than
MEMORY_ONLY.

val graph = GraphLoader.edgeListFile(sc, fname,
  minEdgePartitions = numEPart).persist(StorageLevel.MEMORY_AND_DISK_SER)

The error I get when executing this is:
Exception in thread "main" java.lang.UnsupportedOperationException: Cannot
change storage level of an RDD after it was already assigned a level
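
For context, the same exception can be reproduced on a plain RDD: once an RDD has a storage level, calling persist with a different level is rejected. The sketch below assumes a SparkContext is available; the names StorageLevelDemo and rePersistFails are hypothetical and used only for illustration.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

object StorageLevelDemo {
  /** Returns true if re-persisting a cached RDD at a different level is rejected. */
  def rePersistFails(sc: SparkContext): Boolean = {
    // cache() assigns the MEMORY_ONLY storage level.
    val rdd = sc.parallelize(1 to 10).cache()
    try {
      // Asking for a different level afterwards throws.
      rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)
      false
    } catch {
      case _: UnsupportedOperationException => true
    }
  }
}
```

This is why the storage level has to be chosen at the point where the RDDs are first persisted, rather than on the finished graph.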


Then I looked into the GraphLoader class. I know that the latest version of
Spark supports setting the persistence level in this class. Please suggest a
workaround for Spark 1.0.0, as I do not have the option of moving to the
latest release.

Note: I tried copying the GraphLoader class into my own package as
GraphLoader1, with the following package and imports:

package com.cloudera.xyz

import org.apache.spark.storage.StorageLevel
import org.apache.spark.graphx._
import org.apache.spark.{Logging, SparkContext}
import org.apache.spark.graphx.impl._

and then changing the persistence level to suit my needs, using
.persist(gStorageLevel) instead of .cache().

But when compiling I get the following error:

GraphLoader1.scala:49: error: class EdgePartitionBuilder in package impl
cannot be accessed in package org.apache.spark.graphx.impl
[INFO]   val builder = new EdgePartitionBuilder[Int, Int]

I am also attaching the file to this mail. Maybe this approach is simply
not possible.


Please suggest some workarounds so that I can set the persistence level to
MEMORY_AND_DISK_SER for the graph I read from the edge list file.

package com.cloudera.sparkwordcount

import org.apache.spark.storage.StorageLevel
import org.apache.spark.graphx._
import org.apache.spark.{Logging, SparkContext}
import org.apache.spark.graphx.impl._

/**
 * Provides utilities for loading [[Graph]]s from files.
 */
object GraphLoader1 extends Logging {

  /**
   * Loads a graph from an edge list formatted file where each line contains two integers: a source
   * id and a target id. Skips lines that begin with `#`.
   *
   * If desired the edges can be automatically oriented in the positive
   * direction (source Id < target Id) by setting `canonicalOrientation` to
   * true.
   *
   * @example Loads a file in the following format:
   * {{{
   * # Comment Line
   * # Source Id <\t> Target Id
   * 1   -5
   * 1    2
   * 2    7
   * 1    8
   * }}}
   *
   * @param sc SparkContext
   * @param path the path to the file (e.g., /home/data/file or hdfs://file)
   * @param canonicalOrientation whether to orient edges in the positive
   *        direction
   * @param minEdgePartitions the number of partitions for the edge RDD
   */
  def edgeListFile(
      sc: SparkContext,
      path: String,
      canonicalOrientation: Boolean = false,
      minEdgePartitions: Int = 1)
    : Graph[Int, Int] =
  {
    val startTime = System.currentTimeMillis
    val gStorageLevel = StorageLevel.MEMORY_AND_DISK_SER
    // Parse the edge data table directly into edge partitions
    val lines = sc.textFile(path, minEdgePartitions).coalesce(minEdgePartitions)
    val edges = lines.mapPartitionsWithIndex { (pid, iter) =>
      val builder = new EdgePartitionBuilder[Int, Int]
      iter.foreach { line =>
        if (!line.isEmpty && line(0) != '#') {
          val lineArray = line.split("\\s+")
          if (lineArray.length < 2) {
            logWarning("Invalid line: " + line)
          }
          val srcId = lineArray(0).toLong
          val dstId = lineArray(1).toLong
          if (canonicalOrientation && srcId > dstId) {
            builder.add(dstId, srcId, 1)
          } else {
            builder.add(srcId, dstId, 1)
          }
        }
      }
      Iterator((pid, builder.toEdgePartition))
    }.persist(gStorageLevel).setName("GraphLoader.edgeListFile - edges (%s)".format(path))
    edges.count()

    logInfo("It took %d ms to load the edges".format(System.currentTimeMillis - startTime))

    GraphImpl.fromEdgePartitions(edges, defaultVertexAttr = 1)
  } // end of edgeListFile

}