Re: Spark ec2 cluster lost worker

2015-06-24 Thread Anny Chen
Hi Jonathan, Thanks for this information! I will take a look into it. However, is there a way to reconnect the lost node? Or is there no way to recover the lost worker? Thanks! Anny On Wed, Jun 24, 2015 at 6:06 PM, Kelly, Jonathan jonat...@amazon.com wrote: Just curious, would

Re: Array[T].distinct doesn't work inside RDD

2015-04-07 Thread Anny Chen
Hi Sean, I didn't override hashCode. But the problem is that Array[T].toSet works while Array[T].distinct doesn't. If it were because I didn't override hashCode, then toSet shouldn't work either, right? I also tried using this Array[T].distinct outside the RDD, and it works fine there also,
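A minimal sketch of the point being debated (plain Scala, no Spark; `Point`/`CPoint` are made-up names): both `Array#distinct` and `toSet` deduplicate via `equals`/`hashCode`, so a missing override should affect both the same way.

```scala
// A class with no equals/hashCode override falls back to reference equality,
// so two structurally identical instances are treated as different elements.
class Point(val x: Int, val y: Int)

// A case class gets compiler-generated value equality (equals + hashCode).
case class CPoint(x: Int, y: Int)

object DistinctDemo {
  val plain = Array(new Point(1, 2), new Point(1, 2))
  val cased = Array(CPoint(1, 2), CPoint(1, 2))

  def main(args: Array[String]): Unit = {
    // Reference equality: neither distinct nor toSet collapses the duplicates.
    println(plain.distinct.length) // 2
    println(plain.toSet.size)      // 2
    // Value equality: both collapse them.
    println(cased.distinct.length) // 1
    println(cased.toSet.size)      // 1
  }
}
```

This supports Anny's objection: if the cause were a missing `hashCode`/`equals` pair, `toSet` and `distinct` would fail together, so a difference between the two inside an RDD likely points elsewhere (e.g. serialization effects).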

Re: How to configure SparkUI to use internal ec2 ip

2015-03-31 Thread Anny Chen
Hi Akhil, I tried editing the /etc/hosts on the master and on the workers, and it seems it is not working for me. I tried adding hostname internal-ip and it didn't work. I then tried adding internal-ip hostname and it didn't work either. I guess I should also edit the spark-env.sh file? Thanks!

Re: How to configure SparkUI to use internal ec2 ip

2015-03-31 Thread Anny Chen
it work. Thanks! Anny On Tue, Mar 31, 2015 at 10:26 AM, Petar Zecevic petar.zece...@gmail.com wrote: Did you try setting the SPARK_MASTER_IP parameter in spark-env.sh? On 31.3.2015. 19:19, Anny Chen wrote: Hi Akhil, I tried editing the /etc/hosts on the master and on the workers
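A sketch of the spark-env.sh approach Petar suggests, for Spark's standalone mode on EC2 (all addresses below are placeholders): binding the master and UI to the internal address rather than the public DNS name.

```shell
# conf/spark-env.sh (sketch; hostnames/IPs are placeholders)
export SPARK_MASTER_IP=ip-10-0-0-1.ec2.internal   # address the master binds to
export SPARK_LOCAL_IP=10.0.0.1                    # IP this node binds services to
export SPARK_PUBLIC_DNS=ip-10-0-0-1.ec2.internal  # hostname advertised in the UI
```

After editing, the standalone daemons need a restart (e.g. sbin/stop-all.sh then sbin/start-all.sh) for the new bindings to take effect.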

Re: output worker stdout to one place

2015-02-20 Thread Anny Chen
Thanks Marcelo! I will try to change the log4j.properties On Fri, Feb 20, 2015 at 11:37 AM, Marcelo Vanzin van...@cloudera.com wrote: Hi Anny, You could play with creating your own log4j.properties that will write the output somewhere else (e.g. to some remote mount, or remote syslog).
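A sketch of the log4j.properties change Marcelo describes, assuming log4j 1.x (as shipped with Spark at the time); the file path is a placeholder for whatever shared mount is available.

```properties
# conf/log4j.properties (sketch): route worker output to a single rolling
# file on a shared mount instead of per-worker stdout files.
log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/mnt/shared/spark-worker.log
log4j.appender.file.MaxFileSize=100MB
log4j.appender.file.MaxBackupIndex=5
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

For remote syslog instead of a mount, the same idea applies with `org.apache.log4j.net.SyslogAppender` pointed at a collector host.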

Re: How to output to S3 and keep the order

2015-01-20 Thread Anny Chen
Thanks Aniket! It is working now. Anny On Mon, Jan 19, 2015 at 5:56 PM, Aniket Bhatnagar aniket.bhatna...@gmail.com wrote: When you repartition, ordering can get lost. You would need to sort after repartitioning. Aniket On Tue, Jan 20, 2015, 7:08 AM anny9699 anny9...@gmail.com wrote:
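A sketch of Aniket's advice, assuming a running SparkContext `sc` (the bucket name is a placeholder): repartitioning shuffles data, so any prior ordering is lost and must be re-established before writing out.

```scala
// Repartitioning shuffles records across partitions, discarding order.
val data     = sc.parallelize(1 to 100)
val shuffled = data.repartition(8)       // ordering across partitions is now arbitrary
val ordered  = shuffled.sortBy(identity) // sort AFTER repartitioning to restore order
ordered.saveAsTextFile("s3n://my-bucket/sorted-output")
```

Sorting after the repartition works because `sortBy` performs its own range-partitioned shuffle, so part files come out in globally sorted order.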

Re: worker_instances vs worker_cores

2014-10-20 Thread Anny Chen
Thanks a lot Andrew! Yeah I actually realized that later. I made a silly mistake here. On Mon, Oct 20, 2014 at 6:03 PM, Andrew Ash and...@andrewash.com wrote: Hi Anny, SPARK_WORKER_INSTANCES is the number of copies of spark workers running on a single box. If you change the number you change
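The distinction Andrew draws can be sketched as a spark-env.sh fragment for standalone mode (values are illustrative): instances is the count of worker daemons per machine, cores is the per-worker allowance.

```shell
# conf/spark-env.sh (sketch)
export SPARK_WORKER_INSTANCES=2  # number of worker JVMs running on this box
export SPARK_WORKER_CORES=4      # cores each worker may use (2 x 4 = 8 total here)
```

Increasing `SPARK_WORKER_INSTANCES` multiplies worker processes, not cores, so both settings have to be sized together against the machine's actual CPU count.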

Re: Spark output to s3 extremely slow

2014-10-16 Thread Anny Chen
Hi Rafal, Thanks for the explanation and solution! I need to write maybe 100 GB to S3. I will try your way and see whether it works for me. Thanks again! On Wed, Oct 15, 2014 at 1:44 AM, Rafal Kwasny m...@entropy.be wrote: Hi, How large is the dataset you're saving into S3? Actually saving