Re: Correct way of setting executor numbers and executor cores in Spark 1.6.1 for non-clustered mode?

2016-05-07 Thread kmurph
Hi Simon, Thanks. I did actually have "SPARK_WORKER_CORES=8" in spark-env.sh - it's commented as 'to set the number of cores to use on this machine'. I'm not sure how this interacts with SPARK_EXECUTOR_INSTANCES and SPARK_EXECUTOR_CORES, but I removed it and still see no scaleup with increasing
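
For reference, the same sizing can be done per-application via SparkConf rather than spark-env.sh, which avoids surprises from stale worker settings. A minimal sketch, assuming a standalone master rather than local[*]; the app name and core counts are placeholders, not the poster's actual config:

    import org.apache.spark.{SparkConf, SparkContext}

    // SPARK_EXECUTOR_CORES corresponds to spark.executor.cores;
    // spark.cores.max caps the total cores the application may claim
    val conf = new SparkConf()
      .setAppName("ScalingTest")          // placeholder name
      .set("spark.executor.cores", "4")   // cores per executor
      .set("spark.cores.max", "8")        // total cores across all executors
    val sc = new SparkContext(conf)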

Correct way of setting executor numbers and executor cores in Spark 1.6.1 for non-clustered mode?

2016-05-07 Thread kmurph
Hi, I'm running Spark 1.6.1 on a single machine, initially a small one (8 cores, 16GB RAM), passing "--master local[*]" to spark-submit, and I'm trying to see scaling with increasing cores, so far unsuccessfully. Initially I'm setting SPARK_EXECUTOR_INSTANCES=1 and increasing the cores for each executor.
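
Worth noting: under --master local[*] the whole application runs in a single JVM, so executor-count settings have no effect there; the N in local[N] is what sets the number of task threads. A minimal sketch of scaling cores that way (local[4] is a placeholder value):

    import org.apache.spark.{SparkConf, SparkContext}

    // in local mode the master URL, not executor settings, fixes parallelism
    val conf = new SparkConf().setAppName("CoreScaling").setMaster("local[4]")
    val sc = new SparkContext(conf)
    println(sc.defaultParallelism)   // reports the task-thread count (4 here)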

Spark MLlib benchmarks

2016-05-04 Thread kmurph
Hi, I'm benchmarking Spark (1.6) and MLlib TF-IDF (with HDFS) on a 20GB dataset, and I'm not seeing much scale-up when I increase cores/executors/RAM according to the Spark tuning documentation. I suspect I'm missing a trick in my configuration. I'm running on a shared-memory machine (96 cores, 256GB RAM) and
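
For context, a minimal sketch of the kind of TF-IDF pipeline being benchmarked, using the Spark 1.6 MLlib RDD API; the HDFS path and whitespace tokenization are placeholder assumptions:

    import org.apache.spark.mllib.feature.{HashingTF, IDF}
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.rdd.RDD

    // placeholder path and naive tokenization
    val docs: RDD[Seq[String]] =
      sc.textFile("hdfs:///data/corpus").map(_.split(" ").toSeq)
    val tf: RDD[Vector] = new HashingTF().transform(docs)
    tf.cache()                       // IDF.fit makes a second pass over tf
    val tfidf = new IDF(minDocFreq = 2).fit(tf).transform(tf)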

Re: Pagerank implementation

2014-12-15 Thread kmurph
Hiya, I too am looking for a PageRank solution in GraphX where the probabilities sum to 1. I tried a few modifications, including dividing by the total number of vertices in the first part of the equation, and returning the full rank instead of the delta (though not correctly, as evident
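
One workaround, sketched below under the assumption that post-hoc normalization is acceptable: leave GraphX's PageRank as-is (its ranks sum to roughly the vertex count) and divide each rank by the total so they sum to 1. The graph and tolerance are placeholders:

    import org.apache.spark.graphx.Graph

    // assumes a Graph[VD, ED] named `graph` is already loaded
    val ranks = graph.pageRank(0.0001).vertices   // (VertexId, rank) pairs
    val total = ranks.map(_._2).sum()
    val normalized = ranks.mapValues(_ / total)   // ranks now sum to 1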

Re: Spark 1.1.1, Hadoop 2.6 - Protobuf conflict

2014-12-12 Thread kmurph
I also had this problem with Spark 1.1.1. At the time I was using Hadoop 0.20. To get around it I installed Hadoop 2.5.2 and set protobuf.version to 2.5.0 in the build command, like so: mvn -Phadoop-2.5 -Dhadoop.version=2.5.2 -Dprotobuf.version=2.5.0 -DskipTests clean package So I