Driver OOM while using reduceByKey

2014-05-29 Thread haitao .yao
Hi, I used 1g of memory for the driver Java process and got an OOM error on the driver side before reduceByKey. After analyzing the heap dump, the biggest object is org.apache.spark.MapStatus, which occupied over 900MB of memory. Here's my question: 1. Are there any optimization switches that I can tune

Re: Driver OOM while using reduceByKey

2014-05-29 Thread haitao .yao
reduceByKey(_ + _, 100) to use only 100 tasks). Matei On May 29, 2014, at 2:03 AM, haitao .yao yao.e...@gmail.com wrote: Hi, I used 1g of memory for the driver Java process and got an OOM error on the driver side before reduceByKey. After analyzing the heap dump, the biggest object

spark_ec2.py for AWS region: cn-north-1, China

2014-11-04 Thread haitao .yao
Hi, Amazon AWS has started to provide service for mainland China; the region name is cn-north-1. But the script Spark provides, spark_ec2.py, queries the AMI ID from https://github.com/mesos/spark-ec2/tree/v4/ami-list and there's no AMI information for the cn-north-1 region. Can anybody update the

Re: spark_ec2.py for AWS region: cn-north-1, China

2014-11-04 Thread haitao .yao
://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html cn-north-1 is not a supported region for EC2, as far as I can tell. There may be other AWS services that can use that region, but spark-ec2 relies on EC2. Nick On Tue, Nov 4, 2014 at 8:09 PM, haitao .yao yao.e

Re: spark_ec2.py for AWS region: cn-north-1, China

2014-11-04 Thread haitao .yao
/jira/secure/Dashboard.jspa to track this request? I can do it if you've never opened a JIRA issue before. Nick On Tue, Nov 4, 2014 at 9:03 PM, haitao .yao yao.e...@gmail.com wrote: I'm afraid not. We have been using EC2 instances in cn-north-1 region for a while. And the latest version of boto

add support for separate GC log files for different executor

2014-11-05 Thread haitao .yao
Hey, guys. Here's my problem: While using the standalone mode, I always use the following args for executor: -XX:+PrintGCDetails -XX:+PrintGCDateStamps -verbose:gc -Xloggc:/tmp/spark.executor.gc.log But as we know, the HotSpot JVM does not support variable substitution on the -Xloggc parameter, which

unsubscribe

2020-03-07 Thread haitao .yao
unsubscribe -- haitao.yao