error accessing vertexRDD

2015-08-26 Thread dizzy5112
Hi all, a question on an issue I'm having with a VertexRDD. If I kick off my
spark shell with something like this:



then run:


it will finish and give me the count, but I see a few errors (see below).
This is okay for this small dataset, but with a large dataset it doesn't
finish because of the number of errors. It works okay if I kick off my
spark shell with master = local. Any help appreciated.
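
The original commands were stripped from the archive; below is a minimal
sketch of the kind of session described (the master URL, input path, and
graph construction are illustrative, not the poster's actual code):

// Launched against the cluster rather than with master = local:
//   spark-shell --master spark://master-host:7077
import org.apache.spark.graphx._

val edges = sc.textFile("hdfs:///data/edges.txt")
  .map(_.split(","))
  .map(a => Edge(a(0).toLong, a(1).toLong, 1))
val graph = Graph.fromEdges(edges, defaultValue = 0)
graph.vertices.count()   // the step that logs errors in cluster mode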








Re: graphx class not found error

2015-08-13 Thread dizzy5112
Oh, forgot to note: I'm using the Scala REPL for this.






graphx class not found error

2015-08-13 Thread dizzy5112
The code below works perfectly in both cluster and local modes:



but when I try to create a graph in cluster mode (it works in local mode)


I get the following error:



Any help appreciated.
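
The code and stack trace were stripped from the archive. A class-not-found
error that shows up only in cluster mode often means a class used by the
graph isn't on the executors' classpath; a sketch of one common shape of
the problem and fix (all names and paths illustrative):

// If the vertex/edge attribute classes live in your own jar, ship it to
// the executors when starting the shell:
//   spark-shell --master spark://master-host:7077 --jars /path/to/my-classes.jar
import org.apache.spark.graphx._

case class Person(name: String)   // example attribute class

val vertices = sc.parallelize(Seq((1L, Person("a")), (2L, Person("b"))))
val edges    = sc.parallelize(Seq(Edge(1L, 2L, 1)))
val graph    = Graph(vertices, edges)   // the step that fails in cluster mode
graph.vertices.count()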






pregel graphx job not finishing

2015-08-11 Thread dizzy5112
Hi, I'm currently using a Pregel message-passing function for my graph in Spark
and GraphX. The problem I have is that the code runs perfectly on Spark 1.0
and finishes in a couple of minutes, but as we have upgraded I'm now trying to
run the same code on 1.3 and it doesn't finish (I left it overnight and it was
still going), and I get a lot of messages as follows (this doesn't happen in v1.0).
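
The original code and log output were stripped from the archive; for
reference, a minimal Pregel skeleton of the kind described (the message
type and logic are illustrative, not the poster's algorithm):

import org.apache.spark.graphx._

// Toy example: propagate the maximum vertex id through the graph.
val g = GraphLoader.edgeListFile(sc, "hdfs:///data/edges.txt")

val result = g.mapVertices((id, _) => id).pregel(Long.MinValue)(
  (id, attr, msg) => math.max(attr, msg),            // vertex program
  t => if (t.srcAttr > t.dstAttr) Iterator((t.dstId, t.srcAttr))
       else Iterator.empty,                          // send messages
  math.max)                                          // merge messages
result.vertices.count()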











Re: flatmapping with other data

2015-06-15 Thread dizzy5112
Sorry, cut-and-paste error; the resulting data set I want is this:
({(101,S)=3},piece_of_data_1)
({(101,S)=3},piece_of_data_2)
({(101,S)=1},piece_of_data_3)
({(109,S)=2},piece_of_data_3)






propagating edges

2015-01-11 Thread dizzy5112
Hi all, looking for some help in propagating some values on edges. What I want
to achieve (see diagram) is, for each connected part of the graph, to assign an
incrementing value to each of the out-links from the root node. This value
restarts for the next part of the graph; i.e. node 1 has out-links
to nodes 2, 3, and 4, so the edge attributes for these will be 1, 2, and 3
respectively. Each out-link from those nodes keeps that
value right through to the final node on its path. Node 9, with an out-link
to 10, has an edge attribute of 1, etc. A sketch of one possible approach
follows the diagram. Thanks in advance for any help :) dave

http://apache-spark-user-list.1001560.n3.nabble.com/file/n21086/Drawing2.jpg 
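
One possible approach, sketched below: number each root's out-links, then
use Pregel to push each label down the paths (assumes the graph is a forest
of trees as in the diagram; all names illustrative):

import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Roots = vertices with no incoming edges.
val rootIds: Set[VertexId] =
  graph.vertices.leftOuterJoin(graph.inDegrees)
    .filter { case (_, (_, in)) => in.isEmpty }
    .keys.collect().toSet

// Number each root's out-links 1, 2, 3, ...; 0 marks "not yet labelled".
val seeded: RDD[Edge[Int]] = graph.edges
  .groupBy(_.srcId)
  .flatMap { case (src, es) =>
    if (rootIds.contains(src))
      es.zipWithIndex.map { case (e, i) => Edge(e.srcId, e.dstId, i + 1) }
    else
      es.map(e => Edge(e.srcId, e.dstId, 0))
  }

// Each vertex adopts the label of the branch that reached it, and
// downstream edges then inherit their source vertex's label.
val g = Graph(graph.vertices.mapValues(_ => 0), seeded)
val labelled = g.pregel(0)(
  (id, attr, msg) => math.max(attr, msg),
  t => {
    val lbl = if (t.attr > 0) t.attr else t.srcAttr
    if (lbl > t.dstAttr) Iterator((t.dstId, lbl)) else Iterator.empty
  },
  math.max)

val finalEdges = labelled.triplets.map { t =>
  Edge(t.srcId, t.dstId, if (t.attr > 0) t.attr else t.srcAttr)
}
finalEdges.collect().foreach(println)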







equivalent to sql in

2014-12-09 Thread dizzy5112
I have an RDD I want to filter, and for a single term all works well,
e.g.:
dataRDD.filter(x => x._2 == "apple")

How can I use multiple values? For example, I want to filter my RDD to
take out apples and oranges and pears without using … (this could
get long-winded, as there may be quite a few). Can you filter using a set or a
list?
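
For what it's worth, a set does exactly this; a minimal sketch, assuming
string values as above:

val wanted = Set("apple", "orange", "pear")
val filtered = dataRDD.filter(x => wanted.contains(x._2))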

thanks






graph x extracting the path

2014-11-02 Thread dizzy5112
Hi all, just wondering if there is a way to extract paths in GraphX. For
example, with the attached graph I would like to return results along the
lines of:

101 - 103
101 - 104 - 108
102 - 105
102 - 106 - 107

http://apache-spark-user-list.1001560.n3.nabble.com/file/n17936/graph.jpg 
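
One way to get these, sketched below, is a Pregel pass in which each vertex
accumulates the list of root-to-here paths that reach it (only sensible for
small, sparse graphs, since the number of paths can grow combinatorially;
all names illustrative):

import org.apache.spark.graphx._

// Roots = vertices with no incoming edges; they start with the one-element
// path containing themselves.
val rootIds = graph.vertices.leftOuterJoin(graph.inDegrees)
  .filter { case (_, (_, in)) => in.isEmpty }.keys.collect().toSet

val init = graph.mapVertices { (id, _) =>
  if (rootIds.contains(id)) List(List(id)) else List.empty[List[VertexId]]
}

val withPaths = init.pregel(List.empty[List[VertexId]])(
  (id, paths, msg) => (paths ++ msg).distinct,
  t => {
    val fresh = t.srcAttr.map(_ :+ t.dstId).filterNot(t.dstAttr.contains)
    if (fresh.nonEmpty) Iterator((t.dstId, fresh)) else Iterator.empty
  },
  (a, b) => a ++ b)

// Paths ending at leaves (no out-links) are the full root-to-leaf paths.
withPaths.vertices.leftOuterJoin(withPaths.outDegrees)
  .filter { case (_, (_, out)) => out.isEmpty }
  .flatMap { case (_, (paths, _)) => paths }
  .collect()
  .foreach(p => println(p.mkString(" - ")))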






kryo serializer

2014-10-14 Thread dizzy5112
Hi all, how can I tell if my Kryo serializer is actually working? I have a
class which extends Serializable, and I have included the following imports:
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

I have also included the class:
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[Person])
  }
}

My configuration includes:
spark.serializer  org.apache.spark.serializer.KryoSerializer

but I'm not sure if it's working (the code runs fine, although parts using
larger data sets are pretty slow). I see in the documentation that I should use
conf.set("spark.kryo.registrator", "mypackage.MyRegistrator"), but as I'm
using the Scala REPL I'm not sure where this should go or what format it should
be in, which leads me to wonder if I'm using the default Java serialization.
Is there a way I can tell, and how do I go about including the MyRegistrator
bit above?

Thanks in advance for any help
D
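
For reference, a couple of checks that may help (a sketch; the property
names are standard Spark configuration, while the registrator value and
jar path come from the post and are illustrative):

// 1. Ask the running shell which serializer it was configured with:
sc.getConf.get("spark.serializer", "org.apache.spark.serializer.JavaSerializer")

// 2. For the REPL, the settings can go on the command line when the shell
//    is launched (the jar containing MyRegistrator must also be shipped):
//   spark-shell \
//     --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
//     --conf spark.kryo.registrator=mypackage.MyRegistrator \
//     --jars /path/to/jar-with-registrator.jar

// 3. Setting spark.kryo.registrationRequired=true makes Spark fail fast on
//    any class that isn't registered, which confirms Kryo is in use.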









creating a subgraph with an edge predicate

2014-08-25 Thread dizzy5112
I'm currently creating a subgraph using the vertex predicate:
subgraph(vpred = (vid, attr) => attr.split(",")(2) != "999")

but I'm wondering if a subgraph can be created using the edge predicate; if
so, a sample would be great :)
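
For reference, subgraph also takes an edge predicate (epred) over the full
triplet; a minimal sketch, assuming string vertex attributes and integer
edge attributes as in the post:

// Keep only edges whose attribute is not 999:
val sub = graph.subgraph(epred = t => t.attr != 999)

// The two predicates can also be combined in one call:
val sub2 = graph.subgraph(
  epred = t => t.attr != 999,
  vpred = (vid, attr) => attr.split(",")(2) != "999")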

thanks
Dave






counting degrees graphx

2014-05-25 Thread dizzy5112
Hi, looking for a little help with counting the degrees in a graph. Currently
my graph consists of two subgraphs, and it looks like this:

val vertexArray = Array(
  (1L, (101, "x")),
  (2L, (102, "y")),
  (3L, (103, "y")),
  (4L, (104, "y")),
  (5L, (105, "y")),
  (6L, (106, "x")),
  (7L, (107, "x")),
  (8L, (108, "y"))
)

val edgeArray = Array(
  Edge(1L, 2L, 1),
  Edge(1L, 3L, 2),
  Edge(3L, 4L, 3),
  Edge(3L, 5L, 4),
  Edge(6L, 5L, 5),
  Edge(7L, 8L, 6)
)

Now I can summarize the graphs using connected components like this:
val cc = userGraph.connectedComponents
userGraph.vertices.leftJoin(cc.vertices) {
  (id, tfn, ent) => s"$tfn is in component $ent"
}.collect.foreach { case (id, str) => println(str) }
and I get the results I expect:
(101,SH,0,2) is in component Some(1)
(102,COY,1,0) is in component Some(1)
(103,COY,1,2) is in component Some(1)
(104,COY,1,0) is in component Some(1)
(105,COY,2,0) is in component Some(1)
(106,SH,0,1) is in component Some(1)
(107,SH,0,1) is in component Some(7)
(108,COY,1,0) is in component Some(7)
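
As an aside on the degrees: the third and fourth fields in the output above
match each vertex's in- and out-degree. A sketch of attaching them, assuming
userGraph as above:

// Attach in/out degree counts to each vertex (0 when a vertex has none).
val withDegrees = userGraph
  .outerJoinVertices(userGraph.inDegrees) { (id, attr, in) =>
    (attr, in.getOrElse(0)) }
  .outerJoinVertices(userGraph.outDegrees) { (id, attr, out) =>
    (attr._1, attr._2, out.getOrElse(0)) }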

This essentially gives me two clusters, with the root nodes being ID 1
and ID 7. What I really want to do is identify the third cluster that is
built into this graph (the connection between 106 and 105) as another
cluster. My results would identify that 105 was in cluster 1 and in the new
cluster. Any help appreciated.

cheers 





Re: counting degrees graphx

2014-05-25 Thread dizzy5112
Yes, that's correct: I want the vertex set for each source vertex in the
graph. Which of course leads me on to my next question, which is how to add
a level to each of these.
http://apache-spark-user-list.1001560.n3.nabble.com/file/n6383/image1.jpg 

For example, the image shows the in- and out-links of the graph and shows my
structure. I want the list of vertices under 1, 6, and 7. I need to show that
vertex 1 has members 2, 3, 4, 5, vertex 6 has members 5, and vertex 7 has
members 8.

Ideally I would also like to go one step further and identify which level
each vertex is on, i.e. vertices 1, 6, and 7 are level 0, vertices 2 and 3
are level 1, and vertices 4 and 5 are level 2 (5 is also a level 1 when looked
at through vertex 6). A sketch of computing such levels follows.
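
A sketch of computing such levels with Pregel, taking "level" to mean
distance from the nearest root (all names illustrative):

import org.apache.spark.graphx._

val rootIds = graph.vertices.leftOuterJoin(graph.inDegrees)
  .filter { case (_, (_, in)) => in.isEmpty }.keys.collect().toSet

val levels = graph
  .mapVertices((id, _) => if (rootIds.contains(id)) 0 else Int.MaxValue)
  .pregel(Int.MaxValue)(
    (id, lvl, msg) => math.min(lvl, msg),
    t => if (t.srcAttr != Int.MaxValue && t.srcAttr + 1 < t.dstAttr)
           Iterator((t.dstId, t.srcAttr + 1))
         else Iterator.empty,
    math.min)

// A vertex reachable from two roots (like 5 here) gets the smaller level,
// matching the "5 is also a level 1 through vertex 6" reading.
levels.vertices.collect().foreach(println)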

Hope that is clearer.

cheers



