Re: representing RDF literals as vertex properties

2014-12-08 Thread spr
OK, have waded into implementing this and have gotten pretty far, but am now
hitting something I don't understand, a NoSuchMethodError.

The code looks like

  [...]
  // hashToVId (a String => VertexId hash helper) is defined in the elided code above
  val conf = new SparkConf().setAppName(appName)
  // conf.set("fs.default.name", "file://")
  val sc = new SparkContext(conf)

  val lines = sc.textFile(inFileArg)
  val foo = lines.count()
  // keep only triples whose subject and object are URIs ("<...>"); this
  // also omits comment lines, so no separate filter for "#" is needed
  val edgeTmp = lines.map( line => line.split(" ").slice(0, 3)).
    filter(x => x(0).startsWith("<") && x(0).endsWith(">") &&
                x(2).startsWith("<") && x(2).endsWith(">")).
    map(x => Edge(hashToVId(x(0)), hashToVId(x(2)), x(1)))
  edgeTmp.foreach( edge => print(edge + "\n"))
  val edges: RDD[Edge[String]] = edgeTmp
  println("edges.count=" + edges.count)

  // literal objects (non-URIs) become vertex properties, keyed by predicate
  val properties: RDD[(VertexId, Map[String, Any])] =
    lines.map( line => line.split(" ").slice(0, 3)).
      filter(x => !x(0).startsWith("#")).                        // omit RDF comments
      filter(x => !x(2).startsWith("<") || !x(2).endsWith(">")). // keep only literals
      map(x => { val m: Tuple2[VertexId, Map[String, Any]] =
        (hashToVId(x(0)), Map((x(1).toString, x(2)))); m })
  properties.foreach( prop => print(prop + "\n"))

  val G = Graph(properties, edges)   // <-- this is line 114
  println(G)

The (short) traceback looks like

Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.spark.graphx.Graph$.apply$default$4()Lorg/apache/spark/storage/StorageLevel;
at com.cray.examples.spark.graphx.lubm.query9$.main(query9.scala:114)
at com.cray.examples.spark.graphx.lubm.query9.main(query9.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Is the method that's not found (.../StorageLevel) something I need to
initialize?  Using this same code on a toy problem works fine.  

BTW, this is Spark 1.0, running locally on my laptop.






Re: representing RDF literals as vertex properties

2014-12-08 Thread Ankur Dave
At 2014-12-08 12:12:16 -0800, spr <s...@yarcdata.com> wrote:
> OK, have waded into implementing this and have gotten pretty far, but am now
> hitting something I don't understand, a NoSuchMethodError.
> [...]
> The (short) traceback looks like
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.spark.graphx.Graph$.apply$default$4()Lorg/apache/spark/storage/StorageLevel;
> [...]
> Is the method that's not found (.../StorageLevel) something I need to
> initialize?  Using this same code on a toy problem works fine.

This is a binary compatibility error and shouldn't happen as long as you're 
compiling and running with the same Spark assembly jar. Is it possible there's 
a version mismatch between compiling and running?
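
One way to guard against that is to pin the Spark version in the build and
mark it "provided", so the jar spark-submit runs against is the same one you
compiled against. A minimal build.sbt sketch, assuming sbt with Scala 2.10
and Spark 1.0.2 (the exact version numbers are illustrative and must match
your installed Spark):

  name := "query9"

  scalaVersion := "2.10.4"

  libraryDependencies ++= Seq(
    // "provided" keeps Spark out of the application jar, so the version
    // on the runtime classpath is the one from the Spark installation
    "org.apache.spark" %% "spark-core"   % "1.0.2" % "provided",
    "org.apache.spark" %% "spark-graphx" % "1.0.2" % "provided"
  )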

Ankur




representing RDF literals as vertex properties

2014-12-04 Thread spr
@ankurdave's concise code at
https://gist.github.com/ankurdave/587eac4d08655d0eebf9, responding to an
earlier thread
(http://apache-spark-user-list.1001560.n3.nabble.com/How-to-construct-graph-in-graphx-tt16335.html#a16355)
shows how to build a graph with multiple edge-types (predicates in
RDF-speak).  
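
As I understand it, the essence of that approach is to carry the predicate as
the edge attribute, so the graph's ED type is String. A minimal sketch (the
IDs and predicate name here are made up):

  import org.apache.spark.graphx.Edge
  val e = Edge(1L, 2L, "memberOf")   // subject ID, object ID, predicate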

I'm also looking at how to represent literals as vertex properties.  It
seems one way to do this is via positional convention in an Array/Tuple/List
that is the VD; i.e., to represent height, weight, and eyeColor, the VD
could be a Tuple3[Double, Double, String].  If any of the properties can be
present or not, then it seems the code needs to be precise about which
elements of the Array/Tuple/List are present and which are not.  E.g., to
assign only weight, it could be a Tuple3[Option[Double], Option[Double],
Option[String]] with value (None, Some(123.4), None).  Given that vertices
can have many, many properties, it seems memory consumption for the
properties should be as parsimonious as possible.  Will any of
Array/Tuple/List support sparse usage?  Is Option the way to get there?
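
For concreteness, a minimal sketch of what I mean (property names and values
made up):

  // positional VD: (height, weight, eyeColor), "sparse" via Option;
  // each None still occupies its slot, so this is really still dense
  type PersonVD = (Option[Double], Option[Double], Option[String])

  // a vertex with only weight assigned
  val weightOnly: PersonVD = (None, Some(123.4), None)

  // retrieval relies on the positional convention
  val weight: Option[Double] = weightOnly._2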

Is this a reasonable approach for representing vertex properties, or is
there a better way?






Re: representing RDF literals as vertex properties

2014-12-04 Thread Ankur Dave
At 2014-12-04 16:26:50 -0800, spr <s...@yarcdata.com> wrote:
> I'm also looking at how to represent literals as vertex properties. It seems
> one way to do this is via positional convention in an Array/Tuple/List that is
> the VD; i.e., to represent height, weight, and eyeColor, the VD could be a
> Tuple3[Double, Double, String].
> [...]
> Given that vertices can have many, many properties, it seems memory consumption
> for the properties should be as parsimonious as possible. Will any of
> Array/Tuple/List support sparse usage? Is Option the way to get there?

Storing vertex properties positionally with Array[Option[Any]] or any of the 
other sequence types will provide a dense representation. For a sparse 
representation, the right data type is a Map[String, Any], which will let you 
access properties by name and will only store the nonempty properties.
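
For example, a minimal sketch of the Map-based representation (the property
names are invented for illustration):

  // sparse VD: only the properties a vertex actually has are stored
  val props: Map[String, Any] = Map("weight" -> 123.4, "eyeColor" -> "brown")

  // lookup is by name, but requires a downcast from Any
  val weight = props.get("weight").map(_.asInstanceOf[Double])   // Some(123.4)
  val height = props.get("height").map(_.asInstanceOf[Double])   // None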

Since the value type in the map has to be Any, or more precisely the least 
upper bound of the property types, this sacrifices type safety and you'll have 
to downcast when retrieving properties. If there are particular subsets of the 
properties that frequently go together, you could instead use a class 
hierarchy. For example, if the vertices are either people or products, you 
could use the following:

sealed trait VertexProperty extends Serializable
case class Person(name: String, weight: Int) extends VertexProperty
case class Product(name: String, price: Int) extends VertexProperty

Then you could pattern match against the hierarchy instead of downcasting:

  List(Person("Bob", 180), Product("chair", 800), Product("desk", 200)).flatMap {
    case Person(name, weight) => Array.empty[Int]
    case Product(name, price) => Array(price)
  }.sum
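
(Here the flatMap keeps only the Product prices, so the result is 800 + 200 =
1000; the Person entry contributes nothing.)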

Ankur
