The usage of OpenBLAS

2015-06-26 Thread Tsai Li Ming
Hi, I found out that the instructions for OpenBLAS have been changed by the author of netlib-java in https://github.com/apache/spark/pull/4448 since Spark 1.3.0. In that PR, I asked whether there’s still a need to compile OpenBLAS with USE_THREAD=0, and also about Intel MKL. Is it still
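
A quick way to check which BLAS backend netlib-java actually resolved, as a minimal sketch to paste into spark-shell (assumes Spark was built with -Pnetlib-lgpl so netlib-java is on the classpath):

    // "NativeSystemBLAS" means a system library such as OpenBLAS or MKL
    // was picked up; "F2jBLAS" means the pure-Java fallback is in use.
    import com.github.fommil.netlib.BLAS
    println(BLAS.getInstance().getClass.getName)

The usual rationale for USE_THREAD=0 is to keep OpenBLAS single-threaded so its own thread pool does not fight with Spark's task threads, though whether that is still required is exactly the open question above.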

Issues building 1.4.0 using make-distribution

2015-06-17 Thread Tsai Li Ming
Hi, I downloaded the source from the Downloads page and ran the make-distribution.sh script: # ./make-distribution.sh --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package The script has “-x” set at the beginning. ++ /tmp/a/spark-1.4.0/build/mvn help:evaluate

Documentation for external shuffle service in 1.4.0

2015-06-17 Thread Tsai Li Ming
Hi, I can’t seem to find any documentation on this feature in 1.4.0? Regards, Liming
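
For reference, a minimal sketch of turning the feature on from application code; the property name is the real one, while the app name is illustrative. In standalone mode each worker also needs spark.shuffle.service.enabled=true so that it launches the service process:

    import org.apache.spark.{SparkConf, SparkContext}

    // Executors then fetch shuffle blocks from the per-node shuffle
    // service instead of directly from each other, so the blocks
    // survive executor loss.
    val conf = new SparkConf()
      .setAppName("shuffle-service-demo")            // illustrative
      .set("spark.shuffle.service.enabled", "true")
    val sc = new SparkContext(conf)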

Re: Not getting event logs = spark 1.3.1

2015-06-16 Thread Tsai Li Ming
Forgot to mention this is in standalone mode. Is my configuration wrong? Thanks, Liming On 15 Jun, 2015, at 11:26 pm, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, I have this in my spark-defaults.conf (same for hdfs): spark.eventLog.enabled true spark.eventLog.dir

Not getting event logs = spark 1.3.1

2015-06-15 Thread Tsai Li Ming
Hi, I have this in my spark-defaults.conf (same for hdfs): spark.eventLog.enabled true spark.eventLog.dir file:/tmp/spark-events spark.history.fs.logDirectory file:/tmp/spark-events While the app is running, there is a “.inprogress” directory. However, when the job
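
A minimal sketch of the same setup done programmatically; one detail that matters for the “.inprogress” behaviour is the clean shutdown, since the suffix is only dropped when the application stops normally (the paths mirror the config above, the app name is illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("event-log-demo")                        // illustrative
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "file:/tmp/spark-events")
    val sc = new SparkContext(conf)
    sc.parallelize(1 to 100000).count()
    // Without sc.stop() the log stays as *.inprogress and the history
    // server may never list the application.
    sc.stop()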

Re: Logstash as a source?

2015-02-01 Thread Tsai Li Ming
I have been using a logstash alternative, fluentd, to ingest the data into HDFS. I had to configure fluentd not to append to existing files, so that Spark Streaming is able to pick up the new logs. -Liming On 2 Feb, 2015, at 6:05 am, NORD SC jan.algermis...@nordsc.com wrote: Hi, I plan to
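
The no-append requirement comes from how file-based streams work: textFileStream only sees files that newly appear in the watched directory, so data appended to an existing file is never re-read. A minimal sketch, with the directory path an illustrative assumption:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("fluentd-ingest")  // illustrative
    val ssc = new StreamingContext(conf, Seconds(30))
    // Only files newly created (or atomically moved) into this directory
    // are picked up on each batch interval.
    val logs = ssc.textFileStream("hdfs:///logs/fluentd")    // illustrative
    logs.count().print()
    ssc.start()
    ssc.awaitTermination()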

Re: Confused why I'm losing workers/executors when writing a large file to S3

2015-01-21 Thread Tsai Li Ming
I’m getting the same issue on Spark 1.2.0. Despite having set “spark.core.connection.ack.wait.timeout” in spark-defaults.conf and verified it in the job UI (port 4040) environment tab, I still get the “no heartbeat in 60 seconds” error. spark.core.connection.ack.wait.timeout=3600 15/01/22
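
The same property can also be set programmatically; a minimal sketch, with the value mirroring the one quoted above:

    import org.apache.spark.{SparkConf, SparkContext}

    // A higher ack timeout stops long GC pauses during large S3 writes
    // from being mistaken for dead executors.
    val conf = new SparkConf()
      .setAppName("s3-write-demo")                             // illustrative
      .set("spark.core.connection.ack.wait.timeout", "3600")   // seconds
    val sc = new SparkContext(conf)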

Understanding stages in WebUI

2014-11-25 Thread Tsai Li Ming
Hi, I have the classic word count example: file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).collect() From the Job UI, I can only see 2 stages: 0-collect and 1-map. What happened to the ShuffledRDD in reduceByKey? And both flatMap and map operations are collapsed into a
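
The two-stage layout can be confirmed from the lineage itself; a minimal spark-shell sketch (file path illustrative). flatMap and map are narrow transformations, so they are pipelined into the same stage, and the ShuffledRDD is still there: the UI simply names each stage after its last operation:

    val counts = sc.textFile("file.txt")            // illustrative path
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    // Prints the lineage, including the ShuffledRDD introduced by
    // reduceByKey at the stage boundary.
    println(counts.toDebugString)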

RDD memory and storage level option

2014-11-20 Thread Tsai Li Ming
Hi, This is on version 1.1.0. I did a simple test on the MEMORY_AND_DISK storage level: var file = sc.textFile("file:///path/to/file.txt").persist(StorageLevel.MEMORY_AND_DISK) file.count() The file is 1.5GB and there is only 1 worker. I have requested 1GB of worker memory per node:
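
A sketch of how one might check what was cached versus spilled in this test, assuming spark-shell (getRDDStorageInfo is a developer API, so treat its availability in your version as an assumption):

    import org.apache.spark.storage.StorageLevel

    val file = sc.textFile("file:///path/to/file.txt")
      .persist(StorageLevel.MEMORY_AND_DISK)
    file.count()
    // With MEMORY_AND_DISK, the partitions of the 1.5GB file that do not
    // fit into the 1GB of worker memory are written to disk, not dropped.
    sc.getRDDStorageInfo.foreach { info =>
      println(s"${info.name}: memory=${info.memSize}B disk=${info.diskSize}B")
    }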

Re: When does Spark switch from PROCESS_LOCAL to NODE_LOCAL or RACK_LOCAL?

2014-09-12 Thread Tsai Li Ming
Another observation I had was reading from the local filesystem with "file://": it was reported as PROCESS_LOCAL, which was confusing. Regards, Liming On 13 Sep, 2014, at 3:12 am, Nicholas Chammas nicholas.cham...@gmail.com wrote: Andrew, This email was pretty helpful. I feel like this stuff
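
The scheduler's downgrade path between these levels is driven by the spark.locality.wait family of settings; a minimal sketch of tuning them (values illustrative; versions from this era take plain milliseconds rather than time suffixes):

    import org.apache.spark.{SparkConf, SparkContext}

    // How long to wait at each level before falling back along
    // PROCESS_LOCAL -> NODE_LOCAL -> RACK_LOCAL -> ANY.
    val conf = new SparkConf()
      .setAppName("locality-demo")              // illustrative
      .set("spark.locality.wait", "3000")       // ms, baseline for all levels
      .set("spark.locality.wait.node", "10000") // ms, illustrative override
    val sc = new SparkContext(conf)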

Re: Hadoop LR comparison

2014-04-01 Thread Tsai Li Ming
://alpinenow.com/ On Mon, Mar 31, 2014 at 11:38 PM, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, Is the code available for Hadoop to calculate the Logistic Regression hyperplane? I’m looking at the Examples: http://spark.apache.org/examples.html, where there is the 110s vs 0.9s
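
For context, the Spark half of that comparison is the iterative logistic regression sketched on the examples page; a condensed, self-contained version for spark-shell follows, with the parsing, dimension, and path all illustrative assumptions:

    import scala.math.exp

    case class Point(x: Array[Double], y: Double)
    // Assumed line format: label followed by D feature values.
    def parse(line: String): Point = {
      val nums = line.split(" ").map(_.toDouble)
      Point(nums.tail, nums.head)
    }

    val D = 10                               // feature count, illustrative
    val ITERATIONS = 10
    val points = sc.textFile("hdfs:///lr-data.txt")  // illustrative path
      .map(parse).cache()
    var w = Array.fill(D)(0.0)

    for (_ <- 1 to ITERATIONS) {
      // Logistic-loss gradient, summed across the cluster.
      val gradient = points.map { p =>
        val margin = (w, p.x).zipped.map(_ * _).sum
        val scale = (1.0 / (1.0 + exp(-p.y * margin)) - 1.0) * p.y
        p.x.map(_ * scale)
      }.reduce((a, b) => (a, b).zipped.map(_ + _))
      w = (w, gradient).zipped.map(_ - _)
    }

The 110s-vs-0.9s gap quoted on that page comes from the cache() call: the first iteration pays for reading the data, and every later one runs against memory.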

Re: Configuring shuffle write directory

2014-03-28 Thread Tsai Li Ming
tell, spark.local.dir should *not* be set there, so workers should get it from their spark-env.sh. It’s true that if you set spark.local.dir in the driver it would pass that on to the workers for that job. Matei On Mar 27, 2014, at 9:57 PM, Tsai Li Ming mailingl...@ltsai.com wrote
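
In other words, something like the sketch below is the anti-pattern for a heterogeneous cluster: set in the driver, the single path is pushed to every executor of that job, overriding each worker's own spark-env.sh value (app name and path illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("local-dir-demo")        // illustrative
      .set("spark.local.dir", "/data/tmp") // overrides the workers' setting
    val sc = new SparkContext(conf)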

Re: Configuring shuffle write directory

2014-03-27 Thread Tsai Li Ming
Can anyone help? How can I configure a different spark.local.dir for each executor? On 23 Mar, 2014, at 12:11 am, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, Each of my worker nodes has its own unique spark.local.dir. However, when I run spark-shell, the shuffle writes are always

Setting SPARK_MEM higher than available memory in driver

2014-03-27 Thread Tsai Li Ming
Hi, My worker nodes have more memory than the host from which I’m submitting my driver program, but it seems that SPARK_MEM also sets the Xmx of the spark-shell JVM? $ SPARK_MEM=100g MASTER=spark://XXX:7077 bin/spark-shell Java HotSpot(TM) 64-Bit Server VM warning: INFO:
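
The finer-grained property that later replaced SPARK_MEM for this purpose avoids the coupling; a minimal sketch (sizes and app name illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    // spark.executor.memory sizes only the executor JVMs; the shell or
    // driver heap is controlled separately, unlike SPARK_MEM, which set
    // the Xmx of both.
    val conf = new SparkConf()
      .setMaster("spark://XXX:7077")
      .setAppName("big-executors")           // illustrative
      .set("spark.executor.memory", "100g")
    val sc = new SparkContext(conf)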

Re: Kmeans example reduceByKey slow

2014-03-24 Thread Tsai Li Ming
On Sun, Mar 23, 2014 at 3:15 AM, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, At the reduceByKey stage, it takes a few minutes before the tasks start working. I have -Dspark.default.parallelism=127, the total core count minus one (n-1). CPU/network/IO are idle across all nodes when this is happening
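
One knob worth checking here is the reducer-side partition count, which can be passed to reduceByKey directly instead of relying on spark.default.parallelism; a toy spark-shell sketch standing in for the k-means assignment step (all numbers illustrative):

    // (clusterId, (partialSum, count)) pairs, reduced into per-cluster means.
    val pairs = sc.parallelize(1 to 1000000).map(i => (i % 50, (i.toDouble, 1L)))
    val means = pairs
      .reduceByKey({ case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }, 127)
      .mapValues { case (sum, count) => sum / count }
    means.collect()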

Re: Kmeans example reduceByKey slow

2014-03-24 Thread Tsai Li Ming
:53 PM, Tsai Li Ming mailingl...@ltsai.com wrote: Hi, This is on a 4-node cluster, each node with 32 cores/256GB RAM. Spark (0.9.0) is deployed in standalone mode. Each worker is configured with 192GB. Spark executor memory is also 192GB. This is on the first iteration. K=50. Here's

Configuring shuffle write directory

2014-03-22 Thread Tsai Li Ming
Hi, Each of my worker nodes has its own unique spark.local.dir. However, when I run spark-shell, the shuffle writes are always written to /tmp, despite the setting being applied when the worker node is started. Specifying spark.local.dir for the driver program seems to override the executors' setting? Is

Spark temp dir (spark.local.dir)

2014-03-13 Thread Tsai Li Ming
Hi, I'm confused about -Dspark.local.dir and SPARK_WORKER_DIR (--work-dir). What's the difference? I have set -Dspark.local.dir for all my worker nodes, but I'm still seeing directories being created in /tmp when the job is running. I have also tried setting -Dspark.local.dir when I run the
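
As far as I understand, the two settings cover different directories: SPARK_WORKER_DIR (--work-dir) is where a standalone worker keeps application jars and stdout/stderr logs, while spark.local.dir is the scratch space for shuffle files and spills, defaulting to /tmp, which would explain the directories above. A sketch of checking the effective value from a shell (assumes Spark 1.0+, where sc.getConf exists):

    // Falls back to the default when the property never reached the executor.
    println(sc.getConf.get("spark.local.dir", "/tmp"))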

Re: Spark temp dir (spark.local.dir)

2014-03-13 Thread Tsai Li Ming
spark.local.dir can and should be set both on the executors and on the driver (if the driver broadcasts variables, the files will be stored in this directory) Do you mean the worker nodes? I don't think they are Jetty connectors, and the directories are empty: