Future of GraphX
Hi,

I am wondering whether there is any ongoing work on optimizing GraphX. I am aware of GraphFrames, which is built on DataFrames. However, is there any plan to port GraphX to the newer Spark APIs, i.e., Datasets or Spark 2.0? Furthermore, is there any plan to incorporate graph streaming?

Thanks,
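(For reference, a minimal sketch of what the DataFrame-based GraphFrames API looks like; the data and column values here are illustrative, and GraphFrames ships as a separate package, not as part of Spark itself.)

    import org.apache.spark.sql.SparkSession
    import org.graphframes.GraphFrame

    val spark = SparkSession.builder().appName("gf-sketch").getOrCreate()

    // Vertices need an "id" column; edges need "src" and "dst" columns.
    val vertices = spark.createDataFrame(Seq((1L, "a"), (2L, "b"))).toDF("id", "name")
    val edges = spark.createDataFrame(Seq((1L, 2L, "follows"))).toDF("src", "dst", "relationship")
    val g = GraphFrame(vertices, edges)
    g.inDegrees.show()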
Re: Incrementally add/remove vertices in GraphX
Dear all,

Is there any update regarding graph streaming? I want to update a graph after it has been created, i.e., add vertices and edges to an existing graph. Any suggestions or recommendations on how to do that?

Thanks,
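(A minimal sketch of the usual workaround in plain GraphX: Graph objects are immutable, so additions are expressed by building a new graph from the unioned underlying RDDs. The names addToGraph, newVertices, and newEdges are illustrative.)

    import scala.reflect.ClassTag
    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD

    // Build a new graph containing the old graph plus the additions.
    def addToGraph[VD: ClassTag, ED: ClassTag](
        g: Graph[VD, ED],
        newVertices: RDD[(VertexId, VD)],
        newEdges: RDD[Edge[ED]]): Graph[VD, ED] =
      Graph(g.vertices.union(newVertices), g.edges.union(newEdges))

Note that this re-materializes the graph; there is no in-place update.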
Efficient way to fetch all records on a particular node/partition in GraphX
Hi all,

I have distributed my RDD across, say, 10 nodes. I want to fetch the data that resides on a particular node, say node 5. How can I achieve this? I have tried the mapPartitionsWithIndex function to filter out the data of the corresponding partition, but it is pretty expensive. Is there a more efficient way to do this?
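(A sketch of a cheaper alternative, assuming "node 5" really means partition index 5 — user code in Spark addresses partitions, not nodes. SparkContext.runJob can evaluate only the requested partitions instead of scheduling tasks for all of them; rdd and partIndex are placeholders, and the exact runJob overloads vary slightly across Spark versions.)

    val partIndex = 5
    // Runs a task only for partition 5 and collects its records to the driver.
    val records: Array[Array[String]] =
      sc.runJob(rdd, (iter: Iterator[String]) => iter.toArray, Seq(partIndex))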
Custom Partitioning in Spark
Hi,

I aim to do custom partitioning on a text file. I first convert it into a pair RDD and then try to use my custom partitioner, but somehow it is not working. My code snippet is given below.

    val file = sc.textFile(filePath)
    val locLines = file.map(line => line.split("\t"))
      .map(line => ((line(2).toDouble, line(3).toDouble), line(5).toLong))
    val ck = locLines.partitionBy(new HashPartitioner(50)) // new CustomPartitioner(50) -- neither way works here

While reading the file, the textFile method partitions it automatically. However, when I explicitly want to partition the new RDD locLines, it doesn't appear to do anything: even the number of partitions is the same as the one created by sc.textFile(). Any help in this regard will be highly appreciated.
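(A sketch of a custom partitioner for the (Double, Double) keys above, plus the check that often resolves this confusion: partitionBy returns a new RDD, so the 50 partitions are visible on ck, not on locLines. The class name is illustrative.)

    import org.apache.spark.Partitioner

    class CustomPartitioner(parts: Int) extends Partitioner {
      override def numPartitions: Int = parts
      override def getPartition(key: Any): Int = {
        val k = key.asInstanceOf[(Double, Double)]
        // Non-negative modulo, since hashCode may be negative.
        val h = k.hashCode % parts
        if (h < 0) h + parts else h
      }
    }

    val ck = locLines.partitionBy(new CustomPartitioner(50))
    println(ck.partitions.length)       // 50
    println(locLines.partitions.length) // unchanged; RDDs are immutable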
Path issue in running Spark
A very basic but strange problem: on running the master I am getting the following error. My Java path is proper, yet spark-class fails because bin/java is duplicated in the path. Can anybody explain why this happens?

    Error: /bin/spark-class: line 190: exec: /usr/lib/jvm/java-8-oracle/jre/bin/java/bin/java: cannot execute: Not a directory
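(The duplicated bin/java in the error path suggests that JAVA_HOME is set to the java binary itself, /usr/lib/jvm/java-8-oracle/jre/bin/java, rather than to the JDK home; spark-class execs "$JAVA_HOME/bin/java", which appends the suffix a second time. A likely fix, assuming the install path shown in the error:)

    export JAVA_HOME=/usr/lib/jvm/java-8-oracle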
Data partitioning and node tracking in Spark-GraphX
I have a big data file and I aim to create an index on the data. I want to partition the data based on a user-defined function in Spark GraphX (Scala). Furthermore, I want to keep track of the node to which a particular data partition is sent and on which it is processed, so that I can fetch the required data by going directly to the right node and partition. How can I achieve this? Any help in this regard will be highly appreciated.
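(A sketch of one way to approximate this, with a caveat: Spark does not expose a stable partition-to-node mapping to user code, since tasks may be rescheduled elsewhere on failure. The partition index, however, is stable for a partitioned RDD and can serve as the index for targeted fetches; pairRdd is a placeholder for the keyed data.)

    import org.apache.spark.HashPartitioner

    // Partition by a user-defined scheme (any Partitioner works here),
    // then record which partition each key landed in.
    val partitioned = pairRdd.partitionBy(new HashPartitioner(10))
    val keyToPartition = partitioned.mapPartitionsWithIndex(
      (idx, iter) => iter.map { case (k, _) => (k, idx) },
      preservesPartitioning = true)

Fetching can then target a single partition, e.g. with the runJob approach sketched under the "fetch all records on a particular partition" question above.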
Incrementally load big RDD file into Memory
    val locations = filelines.map(line => line.split("\t"))
      .map(t => (t(5).toLong, (t(2).toDouble, t(3).toDouble)))
      .distinct()

    val cartesianProduct = locations.cartesian(locations).map(t =>
      Edge(t._1._1, t._2._1,
        distanceAmongPoints(t._1._2._1, t._1._2._2, t._2._2._1, t._2._2._2)))

The code executes perfectly fine up to this point, but when I try to use cartesianProduct it gets stuck, i.e.

    val count = cartesianProduct.count()

Any help to do this efficiently will be highly appreciated.
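(A note on what is likely happening, plus a small sketch: cartesian() is lazy, so nothing actually runs until count(), the first action, which must materialize all n * n pairs for n distinct locations. Assuming distanceAmongPoints is symmetric, keeping only one ordering of each pair and dropping self-pairs roughly halves the work.)

    // Keep each unordered pair once; the ids are the Long keys built above.
    val pairs = locations.cartesian(locations)
      .filter { case ((id1, _), (id2, _)) => id1 < id2 }

Even so, n squared pairs may simply be too large to compute in full; reducing n or pruning candidate pairs before computing distances is usually unavoidable.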
UNRESOLVED DEPENDENCIES while building Spark 1.3.0
Hi all,

I am trying to build Spark 1.3.0 on a standalone Ubuntu 14.04 machine, using the sbt/sbt assembly command. This command works fine with Spark 1.1.0, but for Spark 1.3.0 it gives the following error. Any help or suggestions to resolve this problem will be highly appreciated.

    [info] Resolving org.fusesource.jansi#jansi;1.4 ...
    [warn] ::::::::::::::::::::::::::::::::::::::::::::::
    [warn] ::          UNRESOLVED DEPENDENCIES         ::
    [warn] ::::::::::::::::::::::::::::::::::::::::::::::
    [warn] :: org.apache.spark#spark-network-common_2.10;1.3.0: configuration not public in org.apache.spark#spark-network-common_2.10;1.3.0: 'test'. It was required from org.apache.spark#spark-network-shuffle_2.10;1.3.0 test
    [warn] ::::::::::::::::::::::::::::::::::::::::::::::
    [warn]
    [warn] Note: Unresolved dependencies path:
    [warn]     org.apache.spark:spark-network-common_2.10:1.3.0 ((com.typesafe.sbt.pom.MavenHelper) MavenHelper.scala#L76)
    [warn]       +- org.apache.spark:spark-network-shuffle_2.10:1.3.0
    sbt.ResolveException: unresolved dependency: org.apache.spark#spark-network-common_2.10;1.3.0: configuration not public in org.apache.spark#spark-network-common_2.10;1.3.0: 'test'. It was required from org.apache.spark#spark-network-shuffle_2.10;1.3.0 test
        at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:278)
        at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:175)
        at sbt.IvyActions$$anonfun$updateEither$1.apply(IvyActions.scala:157)
        at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:151)
        at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:151)
        at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:128)
        at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:56)
        at sbt.IvySbt$$anon$4.call(Ivy.scala:64)
        at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:93)
        at xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:78)
        at xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:97)
        at xsbt.boot.Using$.withResource(Using.scala:10)
        at xsbt.boot.Using$.apply(Using.scala:9)
        at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:58)
        at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:48)
        at xsbt.boot.Locks$.apply0(Locks.scala:31)
        at xsbt.boot.Locks$.apply(Locks.scala:28)
        at sbt.IvySbt.withDefaultLogger(Ivy.scala:64)
        at sbt.IvySbt.withIvy(Ivy.scala:123)
        at sbt.IvySbt.withIvy(Ivy.scala:120)
        at sbt.IvySbt$Module.withModule(Ivy.scala:151)
        at sbt.IvyActions$.updateEither(IvyActions.scala:157)
        at sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1318)
        at sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1315)
        at sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$85.apply(Defaults.scala:1345)
        at sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$85.apply(Defaults.scala:1343)
        at sbt.Tracked$$anonfun$lastOutput$1.apply(Tracked.scala:35)
        at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1348)
        at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1342)
        at sbt.Tracked$$anonfun$inputChanged$1.apply(Tracked.scala:45)
        at sbt.Classpaths$.cachedUpdate(Defaults.scala:1360)
        at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1300)
        at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1275)
        at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
        at ...
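(A workaround that has been suggested for this kind of "configuration not public" resolution failure, assuming a stale Ivy cache of previously resolved Spark artifacts is the cause, is to clear the cached org.apache.spark entries and retry; alternatively, the Maven build, which is the build of reference for Spark 1.3, usually sidesteps it.)

    rm -rf ~/.ivy2/cache/org.apache.spark
    sbt/sbt assembly
    # or, with Maven installed:
    mvn -DskipTests clean package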
Failed to launch workers on Spark
Hi all!

I am trying to install Spark on my standalone machine. I am able to run the master, but when I try to run the slaves it gives me the following error. Any help in this regard will be highly appreciated.

    localhost: failed to launch org.apache.spark.deploy.worker.Worker:
    localhost:   at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
    localhost:   at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
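(These LauncherHelper frames usually mean the JVM could not load the worker's main class, which often points at an incomplete assembly jar or a Java version mismatch. One way to surface the full error, rather than the truncated one relayed by the launch scripts, is to start a worker in the foreground; the master URL below is a placeholder.)

    ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://localhost:7077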
How does GraphX internally traverse the Graph?
I want to understand how GraphX internally traverses a graph. Is the traversal vertex- and edge-based, or a sequential scan of the underlying RDDs? For example, given one vertex of the graph, I want to fetch only its neighbors, not the neighbors of all vertices. How would GraphX traverse the graph in this case? Thanks in anticipation.
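(For context, GraphX represents a graph as a pair of RDDs, a VertexRDD and an EdgeRDD, rather than a pointer-based structure, so "traversal" is expressed as data-parallel operations over those RDDs. A sketch of fetching one vertex's neighbors; graph and vid are placeholders.)

    import org.apache.spark.graphx._

    val vid: VertexId = 42L
    // collectNeighborIds builds neighbor lists for all vertices; lookup then
    // pulls out the one we want. This is still a distributed computation,
    // not a pointer walk starting from a single vertex.
    val neighborIds: Seq[Array[VertexId]] =
      graph.collectNeighborIds(EdgeDirection.Either).lookup(vid)

    // Alternatively, filter the edge RDD directly; this scans edge partitions.
    val incident = graph.edges.filter(e => e.srcId == vid || e.dstId == vid)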