Future of GraphX

2016-08-24 Thread mas
Hi, 

I am wondering whether there is any ongoing work on optimizing GraphX.
I am aware of GraphFrames, which is built on DataFrames. However, is there any
plan to build a version of GraphX on the newer Spark APIs, i.e., Datasets or
Spark 2.0?
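
For context, my understanding of GraphFrames sitting on top of DataFrames is roughly
the sketch below (the graphframes package has to be on the classpath, and the spark
session plus the example DataFrames and column values are made up):

import org.graphframes.GraphFrame

// GraphFrames expects a vertex DataFrame with an "id" column and an edge
// DataFrame with "src" and "dst" columns.
val vertices = spark.createDataFrame(Seq((1L, "alice"), (2L, "bob"))).toDF("id", "name")
val edges = spark.createDataFrame(Seq((1L, 2L, "follows"))).toDF("src", "dst", "relationship")
val g = GraphFrame(vertices, edges)
g.inDegrees.show()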

Furthermore, is there any plan to incorporate graph streaming?

Thanks,





Re: Incrementally add/remove vertices in GraphX

2015-10-19 Thread mas
Dear All,

Is there any update regarding graph streaming? I want to update a graph after it
has been created, i.e., add vertices and edges incrementally.

Any suggestions or recommendations on how to do that?
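
The only workaround I can think of is to rebuild the graph from the old RDDs plus the
new data, roughly as in the sketch below (the Graph[String, Double] types and the
helper name addToGraph are just placeholders):

import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// GraphX graphs are immutable, so "adding" vertices/edges means constructing a new
// Graph from the union of the old RDDs and the new ones.
def addToGraph(
    graph: Graph[String, Double],
    newVertices: RDD[(VertexId, String)],
    newEdges: RDD[Edge[Double]]): Graph[String, Double] = {
  // If a vertex id appears in both RDDs, Graph() picks an arbitrary attribute,
  // so duplicates may need to be reconciled (e.g., with reduceByKey) beforehand.
  val vertices = graph.vertices.union(newVertices)
  val edges = graph.edges.union(newEdges)
  Graph(vertices, edges)
}

This rebuilds the graph's internal indices from scratch on every update, which is
exactly the cost I was hoping some form of graph streaming support would avoid.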

Thanks,






Efficient way to fetch all records on a particular node/partition in GraphX

2015-05-17 Thread mas
Hi All,

I have distributed my RDD across, say, 10 nodes, and I want to fetch only the data
that resides on a particular node, say node 5. How can I achieve this?
I have tried filtering with mapPartitionsWithIndex (roughly as sketched below), but
it is pretty expensive.
Is there a more efficient way to do this?
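
For reference, this is roughly the filter I tried (the RDD name rdd and the partition
index 5 are placeholders; also, a partition index is not the same thing as a physical
node, since one node can host several partitions):

// Keep only the records that live in one particular partition.
val targetPartition = 5
val recordsOnPartition = rdd.mapPartitionsWithIndex(
  (idx, iter) => if (idx == targetPartition) iter else Iterator.empty,
  preservesPartitioning = true)
recordsOnPartition.collect()

The expensive part is that every partition still gets a task scheduled for it, even
though all but one of them returns an empty iterator.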






Custom Partitioning Spark

2015-04-20 Thread mas
Hi,

I aim to do custom partitioning on a text file. I first convert it into a pair RDD
and then try to use my custom partitioner. However, it does not seem to work. My
code snippet is given below.

import org.apache.spark.HashPartitioner

val file = sc.textFile(filePath)
val locLines = file.map(line => line.split("\t"))
  .map(fields => ((fields(2).toDouble, fields(3).toDouble), fields(5).toLong))
val ck = locLines.partitionBy(new HashPartitioner(50)) // new CustomPartitioner(50) -- neither way is working here

When reading the file with the textFile method, Spark partitions it automatically.
However, when I explicitly repartition the new pair RDD locLines, nothing seems to
happen: even the number of partitions stays the same as the one created by
sc.textFile().
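
For reference, the custom partitioner I am experimenting with looks roughly like the
sketch below (the 50 buckets and the hashing of the (Double, Double) key are arbitrary
placeholders); since partitionBy returns a new RDD, the result has to be captured, and
it is worth caching it before checking the partition count:

import org.apache.spark.Partitioner

// A placeholder partitioner for ((Double, Double), Long) pairs that buckets records
// by a non-negative hash of their coordinate key.
class CustomPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int = key match {
    case (x: Double, y: Double) =>
      ((x, y).hashCode % numPartitions + numPartitions) % numPartitions
    case _ => 0
  }
}

val partitioned = locLines.partitionBy(new CustomPartitioner(50)).cache()
println(partitioned.partitions.length) // expected to print 50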

Any help in this regard will be highly appreciated.







Path issue in running spark

2015-04-17 Thread mas
A very basic but strange problem:
When running the master I get the error below. My Java path is correct, yet
spark-class fails because "bin/java" is duplicated in the path it tries to execute.
Can anybody explain why this happens?

Error: 

/bin/spark-class: line 190: exec: /usr/lib/jvm/java-8-oracle/jre/bin/java/bin/java: cannot execute: Not a directory






Data partitioning and node tracking in Spark-GraphX

2015-04-16 Thread mas
I have a big data file and I aim to build an index over the data. I want to partition
the data using a user-defined function in Spark/GraphX (Scala).
Furthermore, I want to keep track of which node a particular data partition is sent to
and processed on, so that I can later fetch the required data by going directly to the
right node and partition.
How can I achieve this?
Any help in this regard will be highly appreciated.
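
For reference, the closest I can get to this is to record, for each partition, the
executor hostname and the keys it holds, roughly as in the sketch below (it assumes a
pair RDD named locLines and only observes the placement Spark chose; it does not force
data onto a specific node):

import java.net.InetAddress

// Build a small index on the driver: (partition id, executor hostname) -> keys held
// by that partition. This records where Spark placed the data; it does not pin it.
val placementIndex = locLines
  .mapPartitionsWithIndex { (idx, iter) =>
    val host = InetAddress.getLocalHost.getHostName
    iter.map { case (key, _) => ((idx, host), key) }
  }
  .groupByKey()
  .collectAsMap()

Of course, this mapping is only valid for one particular run; if a partition is
recomputed it may land on a different executor.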






Incrementally load big RDD file into Memory

2015-04-07 Thread mas

val locations = filelines.map(line => line.split("\t"))
  .map(t => (t(5).toLong, (t(2).toDouble, t(3).toDouble)))
  .distinct() // kept as an RDD (no collect()): cartesian is only defined on RDDs, not on Arrays

val cartesienProduct = locations.cartesian(locations).map(t =>
  Edge(t._1._1, t._2._1,
    distanceAmongPoints(t._1._2._1, t._1._2._2, t._2._2._1, t._2._2._2)))

The code executes fine up to this point, but when I try to use cartesienProduct it
gets stuck, i.e.

val count = cartesienProduct.count()

Any help on how to do this efficiently would be highly appreciated.
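
For what it is worth, the count probably stalls because cartesian over N locations
materializes on the order of N^2 pairs. One small improvement, assuming the distance
is symmetric, is to keep each unordered pair only once, which roughly halves the number
of edges produced (a sketch reusing my own distanceAmongPoints helper):

import org.apache.spark.graphx.Edge

// Keep each unordered pair of distinct locations once, since distance is symmetric.
val halfCartesian = locations.cartesian(locations)
  .filter { case ((id1, _), (id2, _)) => id1 < id2 }
  .map { case ((id1, (x1, y1)), (id2, (x2, y2))) =>
    Edge(id1, id2, distanceAmongPoints(x1, y1, x2, y2))
  }

The pair generation itself is still quadratic, so for a really large number of
locations some form of spatial pruning would be needed on top of this.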






UNRESOLVED DEPENDENCIES while building Spark 1.3.0

2015-04-04 Thread mas
Hi All,

I am trying to build Spark 1.3.0 on a standalone Ubuntu 14.04 machine using the
sbt/sbt assembly command. The command works fine with Spark 1.1.0, but for Spark 1.3.0
it gives the following error.
Any help or suggestions to resolve this problem would be highly appreciated.

[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[warn]  ::
[warn]  ::  UNRESOLVED DEPENDENCIES ::
[warn]  ::
[warn]  :: org.apache.spark#spark-network-common_2.10;1.3.0: configuration not public in org.apache.spark#spark-network-common_2.10;1.3.0: 'test'. It was required from org.apache.spark#spark-network-shuffle_2.10;1.3.0 test
[warn]  ::
[warn]
[warn]  Note: Unresolved dependencies path:
[warn]  org.apache.spark:spark-network-common_2.10:1.3.0 ((com.typesafe.sbt.pom.MavenHelper) MavenHelper.scala#L76)
[warn]    +- org.apache.spark:spark-network-shuffle_2.10:1.3.0
sbt.ResolveException: unresolved dependency: org.apache.spark#spark-network-common_2.10;1.3.0: configuration not public in org.apache.spark#spark-network-common_2.10;1.3.0: 'test'. It was required from org.apache.spark#spark-network-shuffle_2.10;1.3.0 test
        at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:278)
        at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:175)
        at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:157)
        at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:151)
        at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:151)
        at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:128)
        at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:56)
        at sbt.IvySbt$$anon$4.call(Ivy.scala:64)
        at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)
        at xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)
        at xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)
        at xsbt.boot.Using$.withResource(Using.scala:10)
        at xsbt.boot.Using$.apply(Using.scala:9)
        at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)
        at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)
        at xsbt.boot.Locks$.apply0(Locks.scala:31)
        at xsbt.boot.Locks$.apply(Locks.scala:28)
        at sbt.IvySbt.withDefaultLogger(Ivy.scala:64)
        at sbt.IvySbt.withIvy(Ivy.scala:123)
        at sbt.IvySbt.withIvy(Ivy.scala:120)
        at sbt.IvySbt$Module.withModule(Ivy.scala:151)
        at sbt.IvyActions$.updateEither(IvyActions.scala:157)
        at sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1318)
        at sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1315)
        at sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$85.apply(Defaults.scala:1345)
        at sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$85.apply(Defaults.scala:1343)
        at sbt.Tracked$$anonfun$lastOutput$1.apply(Tracked.scala:35)
        at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1348)
        at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1342)
        at sbt.Tracked$$anonfun$inputChanged$1.apply(Tracked.scala:45)
        at sbt.Classpaths$.cachedUpdate(Defaults.scala:1360)
        at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1300)
        at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1275)
        at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)

failed to launch workers on spark

2015-03-27 Thread mas
Hi all,
I am trying to install Spark on my standalone machine. I can run the master, but when
I try to start the slaves I get the following error. Any help in this regard will be
highly appreciated.
_
localhost: failed to launch org.apache.spark.deploy.worker.Worker:
localhost:   at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
localhost:   at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)






How does GraphX internally traverse the Graph?

2015-01-14 Thread mas
I want to understand how GraphX internally traverses a graph. Is the traversal based
on vertices and edges, or is it a sequential scan of the underlying RDDs? For example,
given a single vertex, I want to fetch only its neighbors, not the neighbors of all
vertices. How would GraphX traverse the graph in this case?
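
For reference, the kind of single-vertex lookup I mean is sketched below (graph and
the vertex id 42L are just placeholders); as far as I understand, collectNeighborIds
is still executed as a distributed aggregation over the edge RDD rather than a
pointer-chasing traversal:

import org.apache.spark.graphx.{EdgeDirection, VertexId}

// Neighbor ids of one vertex, via GraphX's neighborhood aggregation API.
val targetVertex: VertexId = 42L
val neighbors: Array[VertexId] = graph
  .collectNeighborIds(EdgeDirection.Either)
  .lookup(targetVertex)
  .flatten
  .toArray

An alternative that touches only the edge RDD is to filter graph.edges for edges
whose srcId or dstId equals the target vertex.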

Thanks in anticipation.


