Re: representing RDF literals as vertex properties
OK, I have waded into implementing this and have gotten pretty far, but am now hitting something I don't understand: a NoSuchMethodError. The code looks like [...]

    val conf = new SparkConf().setAppName(appName)
    // conf.set("fs.default.name", "file://")
    val sc = new SparkContext(conf)

    val lines = sc.textFile(inFileArg)
    val foo = lines.count()

    // The following filters omit comment lines, so there is no need to
    // filter specifically for comments ("#").
    val edgeTmp = lines.map(line => line.split(" ").slice(0, 3)).
      filter(x => x(0).startsWith("<") && x(0).endsWith(">") &&
                  x(2).startsWith("<") && x(2).endsWith(">")).
      map(x => Edge(hashToVId(x(0)), hashToVId(x(2)), x(1)))
    edgeTmp.foreach(edge => print(edge + "\n"))
    val edges: RDD[Edge[String]] = edgeTmp
    println("edges.count=" + edges.count)

    val properties: RDD[(VertexId, Map[String, Any])] =
      lines.map(line => line.split(" ").slice(0, 3)).
        filter(x => !x(0).startsWith("#")).                        // omit RDF comments
        filter(x => !x(2).startsWith("<") || !x(2).endsWith(">")). // keep only literal objects
        map(x => {
          val m: Tuple2[VertexId, Map[String, Any]] =
            (hashToVId(x(0)), Map((x(1).toString, x(2))))
          m
        })
    properties.foreach(prop => print(prop + "\n"))

    val G = Graph(properties, edges)    // this is line 114
    println(G)

The (short) traceback looks like

    Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.graphx.Graph$.apply$default$4()Lorg/apache/spark/storage/StorageLevel;
        at com.cray.examples.spark.graphx.lubm.query9$.main(query9.scala:114)
        at com.cray.examples.spark.graphx.lubm.query9.main(query9.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Is the method that's not found (.../StorageLevel) something I need to initialize? Using this same code on a toy problem works fine.

BTW, this is Spark 1.0, running locally on my laptop.
Re: representing RDF literals as vertex properties
At 2014-12-08 12:12:16 -0800, spr s...@yarcdata.com wrote:
> OK, I have waded into implementing this and have gotten pretty far, but am now hitting something I don't understand: a NoSuchMethodError. [...]
> The (short) traceback looks like
>     Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.graphx.Graph$.apply$default$4()Lorg/apache/spark/storage/StorageLevel;
> [...]
> Is the method that's not found (.../StorageLevel) something I need to initialize? Using this same code on a toy problem works fine.

This is a binary compatibility error and shouldn't happen as long as you're compiling and running with the same Spark assembly jar. Is it possible there's a version mismatch between compiling and running?

Ankur
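One way to keep the two in sync is to pin the compile-time Spark artifacts to the exact version of the runtime assembly. A minimal sbt sketch of that idea (the version number here is an illustrative assumption, not taken from this thread):

    // build.sbt -- compile against the same Spark version that spark-submit runs
    name := "query9"
    scalaVersion := "2.10.4"
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"   % "1.0.2" % "provided",  // must match the runtime assembly
      "org.apache.spark" %% "spark-graphx" % "1.0.2" % "provided"
    )

The "provided" scope keeps the application jar from bundling its own Spark classes, so only the cluster's assembly jar is on the classpath at run time.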
representing RDF literals as vertex properties
@ankurdave's concise code at https://gist.github.com/ankurdave/587eac4d08655d0eebf9, responding to an earlier thread (http://apache-spark-user-list.1001560.n3.nabble.com/How-to-construct-graph-in-graphx-tt16335.html#a16355), shows how to build a graph with multiple edge types (predicates, in RDF-speak). I'm also looking at how to represent literals as vertex properties.

It seems one way to do this is via positional convention in an Array/Tuple/List that is the VD; i.e., to represent height, weight, and eyeColor, the VD could be a Tuple3[Double, Double, String]. If any of the properties may be present or absent, then the code needs to be precise about which elements of the Array/Tuple/List are present and which are not. E.g., to assign only weight, the VD could be a Tuple3[Option[Double], Option[Double], Option[String]] with value (None, Some(123.4), None).

Given that vertices can have a great many properties, it seems memory consumption for the properties should be as parsimonious as possible. Will any of Array/Tuple/List support sparse usage? Is Option the way to get there? Is this a reasonable approach for representing vertex properties, or is there a better way?
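For concreteness, the positional/Option representation described above might look like the following sketch (the field order and the VProps alias are illustrative, not from the original post):

    // Positional vertex properties: (height, weight, eyeColor),
    // with None marking an absent property.
    type VProps = (Option[Double], Option[Double], Option[String])

    // A vertex that only has weight set:
    val weightOnly: VProps = (None, Some(123.4), None)

Note that this is still a dense representation: every vertex carries all three slots whether or not they hold a value.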
Re: representing RDF literals as vertex properties
At 2014-12-04 16:26:50 -0800, spr s...@yarcdata.com wrote:
> I'm also looking at how to represent literals as vertex properties. It seems one way to do this is via positional convention in an Array/Tuple/List that is the VD; i.e., to represent height, weight, and eyeColor, the VD could be a Tuple3[Double, Double, String]. [...]
> Given that vertices can have a great many properties, it seems memory consumption for the properties should be as parsimonious as possible. Will any of Array/Tuple/List support sparse usage? Is Option the way to get there?

Storing vertex properties positionally with Array[Option[Any]] or any of the other sequence types will provide a dense representation. For a sparse representation, the right data type is a Map[String, Any], which will let you access properties by name and will only store the nonempty properties. Since the value type in the map has to be Any, or more precisely the least upper bound of the property types, this sacrifices type safety, and you'll have to downcast when retrieving properties.

If there are particular subsets of the properties that frequently go together, you could instead use a class hierarchy. For example, if the vertices are either people or products, you could use the following:

    sealed trait VertexProperty extends Serializable
    case class Person(name: String, weight: Int) extends VertexProperty
    case class Product(name: String, price: Int) extends VertexProperty

Then you could pattern match against the hierarchy instead of downcasting:

    List(Person("Bob", 180), Product("chair", 800), Product("desk", 200)).flatMap {
      case Person(name, weight) => Array.empty[Int]
      case Product(name, price) => Array(price)
    }.sum

Ankur
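To make the Map-based sparse representation concrete, a short sketch (the property names and downcasts here are illustrative):

    // Sparse vertex properties: only the keys that are actually set are stored.
    val props: Map[String, Any] = Map("weight" -> 123.4, "eyeColor" -> "brown")

    // Retrieval sacrifices type safety, so a downcast is needed:
    val weight: Option[Double] = props.get("weight").map(_.asInstanceOf[Double])
    val height: Option[Double] = props.get("height").map(_.asInstanceOf[Double])  // None -- never stored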