Re: mLIb solving linear regression with sparse inputs

2018-11-05 Thread Robineast
Well I did eventually write this code in Java, and it was very long! See 
https://github.com/insidedctm/sparse-linear-regression
  



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: GraphX subgraph from list of VertexIds

2017-05-12 Thread Robineast
it would be listVertices.contains(vid) wouldn't it?
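
For example, a minimal sketch using that as the vertex predicate to subgraph (the
graph, the ids and the attribute types here are made up purely for illustration):

import org.apache.spark.graphx._

val vertices = sc.makeRDD(Seq((1L, "a"), (2L, "b"), (3L, "c"), (4L, "d")))
val edges    = sc.makeRDD(Seq(Edge(1L, 2L, 1), Edge(3L, 4L, 1)))
val graph    = Graph(vertices, edges)

val listVertices = Seq(1L, 2L, 3L)   // the vertex ids you want to keep

// vpred keeps only the listed vertices; an edge survives only if both its ends survive
val sub = graph.subgraph(vpred = (vid, attr) => listVertices.contains(vid))
sub.vertices.count   // 3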



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-subgraph-from-list-of-VertexIds-tp28677p28679.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: GraphX Pregel API: add vertices and edges

2017-03-23 Thread Robineast
From the section on the Pregel API in the GraphX programming guide: '... the
Pregel operator in GraphX is a bulk-synchronous parallel messaging
abstraction constrained to the topology of the graph.' Does that answer
your question? Did you read the programming guide?



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-Pregel-API-add-vertices-and-edges-tp28519p28529.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: GraphX Pregel API: add vertices and edges

2017-03-23 Thread Robineast
GraphX is not synonymous with Pregel. To quote the GraphX programming guide:
'GraphX exposes a variant of the Pregel API.' There is no compute()
function in GraphX - see the Pregel API section of the programming guide for
details on how GraphX implements a Pregel-like API.



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-Pregel-API-add-vertices-and-edges-tp28519p28527.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: GraphX Pregel API: add vertices and edges

2017-03-23 Thread Robineast
Not that I'm aware of. Where did you read that?



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-Pregel-API-add-vertices-and-edges-tp28519p28523.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Graphx triplet comparison

2016-12-14 Thread Robineast
You are trying to invoke one RDD action inside another; that won't work. If you
want to do what you are attempting you need to .collect() each triplet to
the driver and iterate over that.

HOWEVER, you almost certainly don't want to do that, not if your data are
anything other than a trivial size. In essence you are doing a Cartesian
join followed by a filter - that doesn't scale. You might want to consider
joining one triplet RDD to another and then evaluating the condition, along
the lines of the sketch below.
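
Here graph1 and graph2 stand in for whatever graphs your two triplet RDDs come from;
the join key (srcId) and the filter are placeholders for your condition3:

val t1 = graph1.triplets.map(t => (t.srcId, (t.dstId, t.attr)))
val t2 = graph2.triplets.map(t => (t.srcId, (t.dstId, t.attr)))

val matching = t1.join(t2)                                       // triplet pairs sharing a srcId
  .filter { case (_, ((dst1, attr1), (dst2, attr2))) => attr1 == attr2 }  // your condition here

This way the comparison is done as a distributed join rather than nesting one RDD
action inside another.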



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Graphx-triplet-comparison-tp28198p28208.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Graphx triplet comparison

2016-12-13 Thread Robineast
Not sure what you are asking. What's wrong with:

triplet1.filter(condition3)
triplet2.filter(condition3)




-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Graphx-triplet-comparison-tp28198p28202.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Does SparkR or SparkMLib support nonlinear optimization with non linear constraints

2016-11-25 Thread Robineast
I provided an answer to a similar question here: 
https://www.mail-archive.com/user@spark.apache.org/msg57697.html


---
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action 






> On 25 Nov 2016, at 09:55, elitejyo [via Apache Spark User List] 
>  wrote:
> 
> Our business application supports two types of function convex and S-shaped 
> curves and linear & non-linear constraints. These constraints can be combined 
> with any one type of functional form at a time. 
> 
> Example of convex curve – 
> 
> Y = c^k * pow(a^k, p^k) 
> 
> Example of S-shaped curve – 
> 
> Y = c^k * pow(a^k, p^k) / (b^k + pow(a^k, p^k)) 
> 
> Example of non-linear constraints – 
> 
> Min Bound (50%) < ∑_(k=0..n) c^k * pow(a^k, p^k) < Max Bound (150%) 
> 
> Example of linear constraints – 
> 
> Min Bound (50%) < a+b+c < Max Bound (150%) 
> 
> 
> At present we are using SAS to solve these business problems. We are looking 
> for SAS replacement software, which can solve similar kind of problems with 
> performance equivalent to SAS. 
> 
> Please share benchmarking of its performance. How does it perform as the no. of 
> variables keeps on increasing? 
> 
> If this feature is not available, do you plan to have it in your roadmap 
> anytime. 
> 
> TIA 
> Jyoti 
> 
> 
> 




-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Does-SparkR-or-SparkMLib-support-nonlinear-optimization-with-non-linear-constraints-tp28131p28133.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: GraphX Connected Components

2016-11-08 Thread Robineast
Have you tried this?
https://spark.apache.org/docs/2.0.1/api/scala/index.html#org.apache.spark.graphx.GraphLoader$



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-Connected-Components-tp10869p28049.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: mLIb solving linear regression with sparse inputs

2016-11-06 Thread Robineast
Here’s a way of creating sparse vectors in MLlib:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.rdd.RDD

val rdd = sc.textFile("A.txt").map(line => line.split(",")).
 map(ary => (ary(0).toInt, ary(1).toInt, ary(2).toDouble))

val pairRdd: RDD[(Int, (Int, Int, Double))] = rdd.map(el => (el._1, el))

val create = (first: (Int, Int, Double)) => (Array(first._2), Array(first._3))
val combine = (head: (Array[Int], Array[Double]), tail: (Int, Int, Double)) => 
(head._1 :+ tail._2, head._2 :+ tail._3)
val merge = (a: (Array[Int], Array[Double]), b: (Array[Int], Array[Double])) => 
(a._1 ++ b._1, a._2 ++ b._2)

val A = pairRdd.combineByKey(create,combine,merge).map(el => 
Vectors.sparse(3,el._2._1,el._2._2))

If you have a separate file of b’s then you would need to manipulate this 
slightly to join the b’s to the A RDD and then create LabeledPoints. I guess 
there is a way of doing this using the newer ML interfaces but it’s not 
particularly obvious to me how.
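
For instance, a rough sketch of that join, assuming the b’s arrive in the same
row,column,value text format and keeping the row key on A so the two RDDs can be joined:

import org.apache.spark.mllib.regression.LabeledPoint

// keep the row index instead of throwing it away
val rowsOfA = pairRdd.combineByKey(create, combine, merge).
  map { case (row, (indices, values)) => (row, Vectors.sparse(3, indices, values)) }

// hypothetical b.txt in row,column,value format - only the row and value matter here
val b = sc.textFile("b.txt").map(_.split(",")).
  map(ary => (ary(0).toInt, ary(2).toDouble))

val labelled = rowsOfA.join(b).
  map { case (_, (features, label)) => LabeledPoint(label, features) }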

One point: In the example you give the b’s are exactly the same as col 2 in the 
A matrix. I presume this is just a quick hacked together example because that 
would give a trivial result.

---
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action 






> On 3 Nov 2016, at 18:12, im281 [via Apache Spark User List] 
>  wrote:
> 
> I would like to use it. But how do I do the following 
> 1) Read sparse data (from text or database) 
> 2) pass the sparse data to the linearRegression class? 
> 
> For example: 
> 
> Sparse matrix A 
> row, column, value 
> 0,0,.42 
> 0,1,.28 
> 0,2,.89 
> 1,0,.83 
> 1,1,.34 
> 1,2,.42 
> 2,0,.23 
> 3,0,.42 
> 3,1,.98 
> 3,2,.88 
> 4,0,.23 
> 4,1,.36 
> 4,2,.97 
> 
> Sparse vector b 
> row, column, value 
> 0,2,.89 
> 1,2,.42 
> 3,2,.88 
> 4,2,.97 
> 
> Solve Ax = b??? 
> 
> 
> 
> 




-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/mLIb-solving-linear-regression-with-sparse-inputs-tp28006p28027.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: mLIb solving linear regression with sparse inputs

2016-11-03 Thread Robineast
Any reason why you can’t use the built-in linear regression, e.g. 
http://spark.apache.org/docs/latest/ml-classification-regression.html#regression
or 
http://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-least-squares-lasso-and-ridge-regression?
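
For instance, with the RDD-based API a minimal sketch would look something like this
(the rows are made up; in practice you would build the LabeledPoints from your sparse A and b):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

val training = sc.parallelize(Seq(
  LabeledPoint(0.89, Vectors.sparse(3, Array(0, 1, 2), Array(0.42, 0.28, 0.89))),
  LabeledPoint(0.42, Vectors.sparse(3, Array(0, 1, 2), Array(0.83, 0.34, 0.42))),
  LabeledPoint(0.88, Vectors.sparse(3, Array(0, 1, 2), Array(0.42, 0.98, 0.88)))
))

val model = LinearRegressionWithSGD.train(training, 100)   // 100 iterations
model.weights                                              // the fitted x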

---
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action 






> On 3 Nov 2016, at 16:08, im281 [via Apache Spark User List] 
>  wrote:
> 
> I want to solve the linear regression problem using spark with huge 
> martrices: 
> 
> Ax = b 
> using least squares: 
> x = Inverse(A-transpose * A) * A-transpose * b 
> 
> The A matrix is a large sparse matrix (as is the b vector). 
> 
> I have pondered several solutions to the Ax = b problem including: 
> 
> 1) directly solving the problem above where the matrix is transposed, 
> multiplied by itself, the inverse is taken and then multiplied by A-transpose 
> and then multiplied by b which will give the solution vector x 
> 
> 2) iterative solver (no need to take the inverse) 
> 
> My question is:
> 
> What is the best way to solve this problem using the MLib libraries, in JAVA 
> and using RDD and spark? 
> 
> Is there any code as an example? Has anyone done this? 
> 
> 
> 
> 
> 
> The code to take in data represented as a coordinate matrix and perform 
> transposition and multiplication is shown below but I need to take the 
> inverse if I use this strategy: 
> 
> //Read coordinate matrix from text or database 
> JavaRDD<String> fileA = sc.textFile(file); 
> 
> //map text file with coordinate data (sparse matrix) to 
> JavaRDD<MatrixEntry>
> JavaRDD<MatrixEntry> matrixA = fileA.map(new Function<String, MatrixEntry>() { 
> public MatrixEntry call(String x){ 
> String[] indeceValue = x.split(","); 
> long i = Long.parseLong(indeceValue[0]); 
> long j = Long.parseLong(indeceValue[1]); 
> double value = Double.parseDouble(indeceValue[2]); 
> return new MatrixEntry(i, j, value ); 
> } 
> }); 
> 
> //coordinate matrix from sparse data 
> CoordinateMatrix cooMatrixA = new 
> CoordinateMatrix(matrixA.rdd()); 
> 
> //create block matrix 
> BlockMatrix matA = cooMatrixA.toBlockMatrix(); 
> 
> //create block matrix after matrix multiplication (square 
> matrix) 
> BlockMatrix ata = matA.transpose().multiply(matA); 
> 
> //print out the original dense matrix 
> System.out.println(matA.toLocalMatrix().toString()); 
> 
> //print out the transpose of the dense matrix 
> 
> System.out.println(matA.transpose().toLocalMatrix().toString()); 
> 
> //print out the square matrix (after multiplication) 
> System.out.println(ata.toLocalMatrix().toString()); 
> 
> JavaRDD<MatrixEntry> entries = 
> ata.toCoordinateMatrix().entries().toJavaRDD(); 
> 
> 
> 
> 




-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/mLIb-solving-linear-regression-with-sparse-inputs-tp28006p28007.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Large-scale matrix inverse in Spark

2016-09-29 Thread Robineast
The paper you mention references a Spark-based LU decomposition approach. AFAIK 
there is no current implementation in Spark but there is a JIRA open 
(https://issues.apache.org/jira/browse/SPARK-8514) that covers this - seems to 
have gone quiet though.
---
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action 






> On 27 Sep 2016, at 03:05, Cooper [via Apache Spark User List] 
>  wrote:
> 
> How is the problem of large-scale matrix inversion approached in Apache Spark 
> ? 
> 
> This linear algebra operation is obviously the very base of a lot of other 
> algorithms (regression, classification, etc). However, I have not been able 
> to find a Spark API on parallel implementation of matrix inversion. Can you 
> please clarify approaching this operation on the Spark internals ? 
> 
> Here  is a paper on 
> the parallelized matrix inversion in Spark, however I am trying to use an 
> existing code instead of implementing one from scratch, if available. 
> 
> 




-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Large-scale-matrix-inverse-in-Spark-tp27796p27809.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: How to modify collection inside a spark rdd foreach

2016-06-06 Thread Robineast
It's not that clear what you are trying to achieve - what type is myRDD and
where do k and v come from?

Anyway, it seems you want to end up with a map or a dictionary, which is what
a PairRDD is for, e.g.

val rdd = sc.makeRDD(Array("1","2","3"))
val pairRDD = rdd.map(el => (el.toInt, el))
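
If what you actually need is a local Scala map on the driver (only sensible for
small data), collectAsMap is the follow-on step:

val localMap = pairRDD.collectAsMap()   // scala.collection.Map[Int, String]
// localMap(1) == "1", etc.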






-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-modify-collection-inside-a-spark-rdd-foreach-tp27088p27095.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Various Apache Spark's deployment problems

2016-04-29 Thread Robineast
Do you need --num-executors specified twice?

Sent from my iPhone

> On 29 Apr 2016, at 20:25, Ashish Sharma [via Apache Spark User List] 
>  wrote:
> 
> Submit Command1: 
> 
> spark-submit --class working.path.to.Main \ 
> --master yarn \ 
> --deploy-mode cluster \ 
> --num-executors 17 \ 
> --executor-cores 8 \ 
> --executor-memory 25g \ 
> --driver-memory 25g \ 
> --num-executors 5 \ 
> application-with-all-dependencies.jar 
> 
> Error Log1: 
> 
> User class threw exception: java.lang.RuntimeException: Unable to 
> instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient 
> 
> Submit Command2: 
> 
> spark-submit --class working.path.to.Main \ 
> --master yarn \ 
> --deploy-mode cluster \ 
> --num-executors 17 \ 
> --executor-cores 8 \ 
> --executor-memory 25g \ 
> --driver-memory 25g \ 
> --num-executors 5 \ 
> --files /etc/hive/conf/hive-site.xml \ 
> application-with-all-dependencies.jar 
> 
> Error Log2: 
> 
> User class threw exception: java.lang.NumberFormatException: For 
> input string: "5s" 
> 
> Since I don't have the administrative permissions, I cannot modify the 
> configuration. Well, I can contact to the IT engineer and make the changes, 
> but I'm looking for the 
> solution that involves less changes in the configuration files, if possible! 
> 
> Configuration changes were suggested in here: 
> https://hadoopist.wordpress.com/2016/02/23/how-to-resolve-error-yarn-applicationmaster-user-class-threw-exception-java-lang-runtimeexception-java-lang-numberformatexception-for-input-string-5s-in-spark-submit/
> 
> Then I tried passing various jar files as arguments as suggested in other 
> discussion forums. 
> 
> Submit Command3: 
> 
> spark-submit --class working.path.to.Main \ 
> --master yarn \ 
> --deploy-mode cluster \ 
> --num-executors 17 \ 
> --executor-cores 8 \ 
> --executor-memory 25g \ 
> --driver-memory 25g \ 
> --num-executors 5 \ 
> --jars 
> /usr/hdp/2.3.0.0-2557/spark/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/2.3.0.0-2557/spark/lib/datanucleus-core-3.2.10.jar,/usr/hdp/2.3.0.0-2557/spark/lib/datanucleus-rdbms-3.2.9.jar
>  \ 
> --files /etc/hive/conf/hive-site.xml \ 
> application-with-all-dependencies.jar 
> 
> Error Log3: 
> 
> User class threw exception: java.lang.NumberFormatException: For 
> input string: "5s" 
> 
> I didn't understand what happened with the following command and couldn't 
> analyze the error log. 
> 
> Submit Command4: 
> 
> spark-submit --class working.path.to.Main \ 
> --master yarn \ 
> --deploy-mode cluster \ 
> --num-executors 17 \ 
> --executor-cores 8 \ 
> --executor-memory 25g \ 
> --driver-memory 25g \ 
> --num-executors 5 \ 
> --jars /usr/hdp/2.3.0.0-2557/spark/lib/*.jar \ 
> --files /etc/hive/conf/hive-site.xml \ 
> application-with-all-dependencies.jar 
> 
> Submit Log4: 
> 
> Application application_1461686223085_0014 failed 2 times due to AM 
> Container for appattempt_1461686223085_0014_02 exited with exitCode: 10 
> For more detailed output, check application tracking page: 
> http://cluster-host:/cluster/app/application_1461686223085_0014 
> Then, click on links to logs of each attempt. 
> Diagnostics: Exception from container-launch. 
> Container id: container_e10_1461686223085_0014_02_01 
> Exit code: 10 
> Stack trace: ExitCodeException exitCode=10: 
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:545) 
> at org.apache.hadoop.util.Shell.run(Shell.java:456) 
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722) 
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>  
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  
> at 
> 

Re: How to use graphx to partition a graph which could assign topologically-close vertices on a same machine?

2016-03-09 Thread Robineast
In GraphX partitioning relates to edges, not to vertices - vertices are
partitioned according to however the RDD that was used to create the graph was
partitioned. You can choose the edge partitioning explicitly, as in the sketch
below.
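
The graph here is a throwaway one built just for illustration; the strategy and
partition count are whatever suits your data:

import org.apache.spark.graphx._

val edges = sc.makeRDD((1L to 100L).map(i => Edge(i, i % 10, 1)))
val graph = Graph.fromEdges(edges, 0)

// repartition the *edges* using one of the built-in strategies
// (RandomVertexCut, CanonicalRandomVertexCut, EdgePartition1D, EdgePartition2D)
val repartitioned = graph.partitionBy(PartitionStrategy.EdgePartition2D, 8)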



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-use-graphx-to-partition-a-graph-which-could-assign-topologically-close-vertices-on-a-same-mac-tp26443p26444.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark standalone peer2peer network

2016-02-23 Thread Robineast
Hi Thomas

I can confirm that I have had this working in the past. I'm pretty sure you
don't need password-less SSH for running a standalone cluster manually. Try
running the instructions at
http://spark.apache.org/docs/latest/spark-standalone.html for Starting a
Cluster manually.

Do you get the master running and are you able to log in to the web UI?
Get the spark://:7077 URL and start a slave on the same machine as the
master. Do you see the slave appear in the master web UI? If so, can you run
spark-shell by connecting to the master?

Now start slave on another machine. Do you see the new slave in the master
web ui?





-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-standalone-peer2peer-network-tp26308p26309.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Where to implement synchronization is GraphX Pregel API

2015-12-07 Thread Robineast
Not sure exactly what you're asking but:

1) if you are asking whether you need to implement synchronisation code - no, that
is built into the call to Pregel
2) if you are asking how synchronisation is implemented in GraphX - the
superstep starts and ends with the beginning and end of a while loop in the
Pregel implementation code (see
http://spark.apache.org/docs/latest/graphx-programming-guide.html#pregel-api
for pseudo-code or Pregel.scala in the source). This code runs in the
driver and orchestrates the receipt of messages, the vertex update program and
the sending of messages. All you need to do is supply the merge message, vertex update
and send message functions to the Pregel method (see the sketch below). Since GraphX objects
are backed by RDDs and RDDs provide distributed processing you get
synchronous distributed processing.
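
This is essentially the single-source shortest paths example from that section of
the programming guide, lightly trimmed, showing the three functions being passed in:

import org.apache.spark.graphx._
import org.apache.spark.graphx.util.GraphGenerators

val graph: Graph[Long, Double] =
  GraphGenerators.logNormalGraph(sc, numVertices = 100).mapEdges(e => e.attr.toDouble)
val sourceId: VertexId = 42L
val initialGraph = graph.mapVertices((id, _) =>
  if (id == sourceId) 0.0 else Double.PositiveInfinity)

val sssp = initialGraph.pregel(Double.PositiveInfinity)(
  (id, dist, newDist) => math.min(dist, newDist),          // vertex update program
  triplet => {                                             // send message
    if (triplet.srcAttr + triplet.attr < triplet.dstAttr) {
      Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
    } else {
      Iterator.empty
    }
  },
  (a, b) => math.min(a, b)                                 // merge message
)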



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Where-to-implement-synchronization-is-GraphX-Pregel-API-tp25612p25622.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Failing to execute Pregel shortest path on 22k nodes

2015-12-01 Thread Robineast
1. The for loop is executed in your driver program so will send each Pregel
request serially to be executed on the cluster
2. Whilst caching/persisting may improve the runtime it shouldn't affect the
memory bounds - if you ask to cache more than is available then cached RDDs
will be dropped out of the cache. How are you running the program? via
spark-submit - if so what parameters are you using?




-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Failing-to-execute-Pregel-shortest-path-on-22k-nodes-tp25528p25531.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: GraphX - How to make a directed graph an undirected graph?

2015-11-26 Thread Robineast
1. GraphX doesn't have a concept of undirected graphs; edges are always
specified with a srcId and dstId. However there is nothing to stop you
adding in edges that point in the other direction, i.e. if you have an edge
with srcId -> dstId you can add an edge dstId -> srcId (see the sketch below)

2. In general APIs will return a single Graph object even if the resulting
graph is partitioned. You should read the API docs for the specifics though
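
For point 1, a minimal sketch (graph stands in for whatever Graph[VD, ED] you already have):

import org.apache.spark.graphx._

// add a reversed copy of every edge so traversal works in both directions
val reversedEdges = graph.edges.map(e => Edge(e.dstId, e.srcId, e.attr))
val undirected    = Graph(graph.vertices, graph.edges.union(reversedEdges))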



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-How-to-make-a-directed-graph-an-undirected-graph-tp25495p25499.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Unable to build Spark 1.5, is build broken or can anyone successfully build?

2015-10-23 Thread Robineast
Both Spark 1.5 and 1.5.1 are released so it certainly shouldn't be a problem



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-build-Spark-1-5-is-build-broken-or-can-anyone-successfully-build-tp24513p25181.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Install via directions in "Learning Spark". Exception when running bin/pyspark

2015-10-13 Thread Robineast
What you have done should work.

A couple of things to try:

1) you should have a lib directory in your Spark deployment, it should have
a jar file called lib/spark-assembly-1.5.1-hadoop2.6.0.jar. Is it there?
2) Have you set the JAVA_HOME variable to point to your java8 deployment? If
not try doing that.

Robin



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Install-via-directions-in-Learning-Spark-Exception-when-running-bin-pyspark-tp25043p25048.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Constant Spark execution time with different # of slaves

2015-10-10 Thread Robineast
Do you have enough partitions of your RDDs to spread across all your
processing cores? Are all executors actually processing tasks?



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Constant-Spark-execution-time-with-different-of-slaves-tp24735p25009.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark GraphaX

2015-10-10 Thread Robineast
Well it depends on exactly what algorithms are involved in Network Root Cause
analysis (not something I'm familiar with). GraphX provides a number of
out-of-the-box algorithms like PageRank, connected components, strongly
connected components and label propagation, as well as an implementation of the
Pregel framework that allows many algorithms to be quickly and succinctly
specified for parallel machine execution.
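
For a flavour of what calling them looks like (graph is whatever Graph you have
loaded; the parameters are illustrative only):

import org.apache.spark.graphx.lib.LabelPropagation

val ranks      = graph.pageRank(0.0001).vertices             // PageRank score per vertex
val components = graph.connectedComponents().vertices        // component id per vertex
val scc        = graph.stronglyConnectedComponents(10).vertices
val labels     = LabelPropagation.run(graph, 10).vertices    // community label per vertex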



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-GraphaX-tp24408p25011.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Checkpointing in Iterative Graph Computation

2015-10-10 Thread Robineast
One other thought - you need to call SparkContext.setCheckpointDir, otherwise
nothing will happen.



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Checkpointing-in-Iterative-Graph-Computation-tp24443p25013.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Checkpointing in Iterative Graph Computation

2015-10-10 Thread Robineast
You need to checkpoint before you materialize. You'll find you probably only
want to checkpoint every 100 or so iterations, otherwise the checkpointing
will slow down your application excessively - something like the sketch below.
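
The checkpoint directory, the step function and the iteration count here are
placeholders - substitute your own:

sc.setCheckpointDir("/tmp/graphx-checkpoints")   // must be set before any checkpointing

var g = initialGraph
for (i <- 1 to numIterations) {
  g = step(g)                       // whatever your per-iteration update is
  if (i % 100 == 0) {
    g.vertices.checkpoint()         // mark for checkpointing *before* materializing
    g.edges.checkpoint()
    g.vertices.count()              // materializing writes the checkpoint and truncates the lineage
    g.edges.count()
  }
}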



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Checkpointing-in-Iterative-Graph-Computation-tp24443p25012.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: GraphX: How can I tell if 2 nodes are connected?

2015-10-05 Thread Robineast
GraphX doesn't implement Tinkerpop functionality but there is an external
effort to provide an implementation. See
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-4279



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-How-can-I-tell-if-2-nodes-are-connected-tp24926p24941.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: GraphX: How can I tell if 2 nodes are connected?

2015-10-05 Thread Robineast
GraphX has a Shortest Paths algorithm implementation which will tell you, for
all vertices in the graph, the shortest distance to a specific ('landmark')
vertex. The returned value is 'a graph where each vertex attribute is a map
containing the shortest-path distance to each reachable landmark vertex'.
If there is no path to the landmark vertex then the map for the source
vertex is empty.
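
A small sketch of using it to answer the connectivity question (the graph is made up;
vertex 4 is deliberately left unreachable):

import org.apache.spark.graphx._
import org.apache.spark.graphx.lib.ShortestPaths

val vertices = sc.makeRDD(Seq((1L, 0), (2L, 0), (3L, 0), (4L, 0)))
val edges    = sc.makeRDD(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
val graph    = Graph(vertices, edges)

// shortest-path distances to the 'landmark' vertex 3
val result = ShortestPaths.run(graph, Seq(3L))

// a vertex is connected to 3 iff its map has an entry for 3 - vertex 4's map is empty
result.vertices.filter { case (_, spMap) => spMap.contains(3L) }.keys.collect()
// e.g. Array(1, 2, 3)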



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-How-can-I-tell-if-2-nodes-are-connected-tp24926p24930.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Standalone Scala Project

2015-10-01 Thread Robineast
I've eyeballed the sbt file and it looks ok to me

Try 

sbt clean package

that should sort it out. If not please supply the full code you are running



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Standalone-Scala-Project-tp24892p24904.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to find how much data will be train in mllib or how much the spark job is completed ?

2015-09-29 Thread Robineast
This page gives details on the monitoring available:
http://spark.apache.org/docs/latest/monitoring.html. You can get a UI
showing Jobs, Stages and Tasks with an indication how far completed the job
is. The UI is usually on port 4040 of the machine where you run the spark
driver program.

The monitoring page also provides details of a REST API for monitoring the
same values



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-find-how-much-data-will-be-train-in-mllib-or-how-much-the-spark-job-is-completed-tp24858p24859.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to find how much data will be train in mllib or how much the spark job is completed ?

2015-09-29 Thread Robineast
So you could query the REST API in code, e.g. /applications//stages
provides details on the number of active and completed tasks in each stage.
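
For example, a rough sketch in code (host, port and application id are placeholders
for your own driver UI or history server):

import scala.io.Source

val appId = "app-20150929120000-0001"   // hypothetical application id
val stagesJson = Source.fromURL(
  s"http://localhost:4040/api/v1/applications/$appId/stages").mkString

// stagesJson contains one entry per stage, including fields such as numActiveTasks
// and numCompleteTasks, which you can parse with whatever JSON library you prefer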



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-find-how-much-data-will-be-train-in-mllib-or-how-much-the-spark-job-is-completed-tp24858p24871.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark mailing list confusion

2015-09-29 Thread Robineast
Does anyone have any idea why some topics on the mailing list end up on
https://www.mail-archive.com/user@spark.apache.org (e.g. this message thread), but
not on http://apache-spark-user-list.1001560.n3.nabble.com?

Whilst I get notified of all messages when I reply via email they never
appear in either of the archives (I can use the web interface for Nabble but
not for mail-archive).



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-mailing-list-confusion-tp24870.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Distance metrics in KMeans

2015-09-26 Thread Robineast
There is a Spark Package that gives some alternative distance metrics,
http://spark-packages.org/package/derrickburns/generalized-kmeans-clustering.
Not used it myself.



-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Distance-metrics-in-KMeans-tp24823p24829.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: GraphX create graph with multiple node attributes

2015-09-26 Thread Robineast
Vertices that aren't connected to anything are perfectly valid e.g.

import org.apache.spark.graphx._

val vertices = sc.makeRDD(Seq((1L,1),(2L,1),(3L,1)))
val edges = sc.makeRDD(Seq(Edge(1L,2L,1)))

val g = Graph(vertices, edges)
g.vertices.count

gives 3

Not sure why vertices appear to be dropping off. Could you show your full
code?

g.degrees.count gives 2 - as the scaladocs mention 'The degree of each
vertex in the graph. @note Vertices with no edges are not returned in the
resulting RDD'






-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-create-graph-with-multiple-node-attributes-tp24827p24831.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Calling a method parallel

2015-09-23 Thread Robineast
The following should give you what you need:

val results = sc.makeRDD(1 to n).map(X(_)).collect

This should return the results as an array. 

_
Robin East
Spark GraphX in Action - Michael Malak and Robin East
Manning Publications
http://manning.com/books/spark-graphx-in-action



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Calling-a-method-parallel-tp24786p24790.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: why when I double the number of workers, ml LogisticRegression fitting time is not reduced in half?

2015-09-16 Thread Robineast
In principle yes, however it depends on whether your application is actually
utilising the extra resources. Use the Task metrics available in the
application UI (usually available from the driver machine on port 4040) to
find out.

--
Robin East
Spark GraphX in Action - Michael S Malak and Robin East
Manning Publications
http://www.manning.com/books/spark-graphx-in-action




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/why-when-I-double-the-number-of-workers-ml-LogisticRegression-fitting-time-is-not-reduced-in-half-tp24714p24718.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Question about Google Books Ngrams with pyspark (1.4.1)

2015-09-01 Thread Robineast
Do you have LZO configured? see
http://stackoverflow.com/questions/14808041/how-to-have-lzo-compression-in-hadoop-mapreduce

---
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/malak/



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Question-about-Google-Books-Ngrams-with-pyspark-1-4-1-tp24542p24544.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Graphx CompactBuffer help

2015-08-28 Thread Robineast
My previous reply got mangled. This should work:

coon.filter(x => x.exists(el => Seq(1,15).contains(el)))

CompactBuffer is a specialised form of a Scala Iterator.

---
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/malak/



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Graphx-CompactBuffer-help-tp24481p24490.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark GraphaX

2015-08-23 Thread Robineast
GraphX is a graph analytics engine rather than a graph database. Its typical
use case is running large-scale graph algorithms like PageRank, connected
components, label propagation and so on. It can be an element of complex
processing pipelines that involve other Spark components such as Data
Frames, machine learning and Spark Streaming.

If you need to store, update and query graph structures you might be better
served looking at Neo4j or Titan. If you still need the analytics capability
you can integrate Spark with the database.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-GraphaX-tp24408p24411.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: what determine the task size?

2015-08-21 Thread Robineast
The OP wants to understand what determines the size of the task code that is 
shipped to each executor so it can run the task. I don't know the answer but 
would be interested to know too.

Sent from my iPhone

 On 21 Aug 2015, at 08:26, oubrik [via Apache Spark User List] 
 ml-node+s1001560n24380...@n3.nabble.com wrote:
 
 Hi 
 You mean user code ? 
 
 
 




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/what-determine-the-task-size-tp24363p24384.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Saving and loading MLlib models as standalone (no Hadoop)

2015-08-20 Thread Robineast
You can't serialize models out of Spark and then use them outside of the
Spark context. However there is support for the PMML format - have a look at
https://spark.apache.org/docs/latest/mllib-pmml-model-export.html
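
For models that mix in PMMLExportable (e.g. k-means and the linear models) a minimal
sketch looks like this (the data and paths are illustrative only):

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val data  = sc.parallelize(Seq(Vectors.dense(0.0, 0.0), Vectors.dense(9.0, 9.0)))
val model = KMeans.train(data, 2, 10)    // k = 2, 10 iterations

println(model.toPMML())                  // PMML document as a String
model.toPMML("/tmp/kmeans.pmml")         // or written straight to a local file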

Robin
---
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/malak/



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Saving-and-loading-MLlib-models-as-standalone-no-Hadoop-tp24216p24371.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: how to write any data (non RDD) to a file inside closure?

2015-08-18 Thread Robineast
Still not sure what you are trying to achieve. If you could post some code that 
doesn’t work the community can help you understand where the error (syntactic 
or conceptual) is.
 On 17 Aug 2015, at 17:42, dianweih001 [via Apache Spark User List] 
 ml-node+s1001560n24299...@n3.nabble.com wrote:
 
 Hi Robin, 
 
 I know how to write/read file outside of RDDs and executor closure. Just not 
 sure how to write data to file inside  closure because within closure we have 
 to define RDDs which will introduce SparkContext error sometimes. 
 
 Thank you for your reply. 
 
 Dianwei 
 




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-write-any-data-non-RDD-to-a-file-inside-closure-tp24243p24315.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: SparkR -Graphx Connected components

2015-08-11 Thread Robineast
To be part of a strongly connected component every vertex must be reachable
from every other vertex. Vertex 6 is not reachable from the other members
of SCC 0. Same goes for 7. So both 6 and 7 form their own strongly connected
components. 6 and 7 are part of the connected components of 0 and 3
respectively.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-Graphx-Connected-components-tp24165p24209.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: SparkR -Graphx Connected components

2015-08-07 Thread Robineast
Hi

The graph returned by SCC (strong_graphs in your code) has vertex data where
each vertex in a component is assigned the lowest vertex id of the
component. So if you have 6 vertices (1 to 6) and 2 strongly connected
components (1 and 3, and 2,4,5 and 6) then the strongly connected components
are 1 and 2 (the lowest vertices in each component). So vertices 1 and 3
will have vertex data = 1 and vertices 2,4,5 and 6 will have vertex data 2.
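
A sketch of that example in code (the edge layout is invented to match the description):

import org.apache.spark.graphx._

val edges = sc.makeRDD(Seq(
  Edge(1L, 3L, 1), Edge(3L, 1L, 1),                                    // component {1, 3}
  Edge(2L, 4L, 1), Edge(4L, 5L, 1), Edge(5L, 6L, 1), Edge(6L, 2L, 1))) // component {2, 4, 5, 6}
val graph = Graph.fromEdges(edges, 0)

val scc = graph.stronglyConnectedComponents(10)
scc.vertices.collect().sorted
// Array((1,1), (2,2), (3,1), (4,2), (5,2), (6,2))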

Robin
---
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/malak/



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-Graphx-Connected-components-tp24165p24166.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Scala problem when using g.vertices.map not a member of type parameter

2015-06-29 Thread Robineast
I can't see an obvious problem. Could you post the full minimal code that
reproduces the problem? Also, which versions of Spark and Scala are you using?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Scala-problem-when-using-g-vertices-map-not-a-member-of-type-parameter-tp23515p23528.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: java.lang.UnsupportedOperationException: empty collection

2015-04-28 Thread Robineast
I've tried running your code through spark-shell on both 1.3.0 (pre-built for
Hadoop 2.4 and above) and a recently built snapshot of master. Both work
fine. Running on OS X Yosemite. What's your configuration?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-UnsupportedOperationException-empty-collection-tp22677p22686.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: DAG info

2015-01-02 Thread Robineast
Do you have some example code of what you are trying to do?

Robin



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/DAG-info-tp20940p20941.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org