Re: spark.shuffle.consolidateFiles seems not working
I see. I'll try Spark 1.1.

Jianshi

On Fri, Aug 1, 2014 at 9:58 AM, Aaron Davidson wrote:
> Make sure to set it before you start your SparkContext -- it cannot be
> changed afterwards. Be warned that there are some known issues with
> shuffle file consolidation, which should be fixed in 1.1.
Re: spark.shuffle.consolidateFiles seems not working
Make sure to set it before you start your SparkContext -- it cannot be
changed afterwards. Be warned that there are some known issues with shuffle
file consolidation, which should be fixed in 1.1.

On Thu, Jul 31, 2014 at 12:40 PM, Jianshi Huang wrote:
> I got the open-files limit from the Hadoop admin -- it's actually 1M. So I
> suspect the consolidation didn't work as expected. Could there be any
> other reason?
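For reference, "set it before you start your SparkContext" looks roughly
like the following (the app name is a placeholder, not something from this
thread):

    import org.apache.spark.{SparkConf, SparkContext}

    // Set the flag on the SparkConf *before* the SparkContext is created;
    // conf.set(...) calls made after "new SparkContext(conf)" have no
    // effect on the running context.
    val conf = new SparkConf()
      .setAppName("shuffle-consolidation-example")  // placeholder name
      .set("spark.shuffle.consolidateFiles", "true")

    val sc = new SparkContext(conf)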
Re: spark.shuffle.consolidateFiles seems not working
I got the open-files limit from the Hadoop admin -- it's actually 1M. So I
suspect the consolidation didn't work as expected. Could there be any other
reason?

Jianshi

On Thu, Jul 31, 2014 at 11:01 AM, Shao, Saisai wrote:
> I don't think this is a bug in consolidated shuffle; it's a Linux
> configuration problem. [...] Shuffle consolidation can reduce the total
> number of shuffle files, but the number of concurrently open files is the
> same as with the basic hash-based shuffle.
RE: spark.shuffle.consolidateFiles seems not working
I don't think this is a bug in consolidated shuffle; it's a Linux
configuration problem. The default open-files limit on Linux is 1024, and
when a process needs more open files than that, you get the error you
mentioned below. You can raise the limit with "ulimit -n xxx", or set it
permanently in /etc/security/limits.conf (e.g. on Ubuntu).

Shuffle consolidation can reduce the total number of shuffle files, but the
number of concurrently open files is the same as with the basic hash-based
shuffle.

Thanks,
Jerry

From: Jianshi Huang
Sent: Thursday, July 31, 2014 10:34 AM
To: user@spark.apache.org
Cc: xia...@sjtu.edu.cn
Subject: Re: spark.shuffle.consolidateFiles seems not working

> Ok... but my question is why spark.shuffle.consolidateFiles doesn't seem
> to be working (or is it?). Is this a bug?
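To make the second point concrete (an illustrative calculation, not numbers
from this thread): with the hash-based shuffle, every running map task keeps
one open file per reduce partition. If an executor runs 8 map tasks at once
against 2,000 reduce partitions, that is roughly 8 x 2,000 = 16,000 files
open simultaneously -- far above the default limit of 1024. Consolidation
merges the outputs of map tasks that run one after another on the same core,
so it shrinks the total number of files on disk, but the number open at any
given moment stays about the same.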
Re: spark.shuffle.consolidateFiles seems not working
Ok... but my question is why spark.shuffle.consolidateFiles doesn't seem to
be working (or is it?). Is this a bug?

Jianshi

On Wed, Jul 30, 2014 at 4:29 PM, Larry Xiao wrote:
> My solution was 'ulimit'; I set -n to 10240. [...] I see
> spark.shuffle.consolidateFiles helps by reusing open files (so I don't
> know to what extent it helps).
Re: spark.shuffle.consolidateFiles seems not working
Hi Jianshi,

I've met a similar situation before, and my solution was 'ulimit'. You can
use:

  -a to see your current settings
  -n to set the open-files limit
  (and other limits as well)

I set -n to 10240.

As I understand it, spark.shuffle.consolidateFiles helps by reusing open
files, so I don't know to what extent it helps.

Hope this helps.

Larry

On 7/30/14, 4:01 PM, Jianshi Huang wrote:
> I'm using Spark 1.0.1 in yarn-client mode. sortByKey always fails with a
> FileNotFoundException saying "too many open files". I already set
> spark.shuffle.consolidateFiles to true [...] but it doesn't seem to be
> working.
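For anyone following along, the commands Larry mentions look roughly like
this (10240 is just the value he used; the limits.conf lines are an
assumption about a typical setup, with <user> as a placeholder):

    # check the current limits (look for the "open files" entry)
    ulimit -a

    # raise the open-files limit for the current shell session
    ulimit -n 10240

    # or make it permanent in /etc/security/limits.conf, e.g.:
    #   <user>  soft  nofile  10240
    #   <user>  hard  nofile  10240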
spark.shuffle.consolidateFiles seems not working
I'm using Spark 1.0.1 in yarn-client mode.

sortByKey always fails with a FileNotFoundException whose message says "too
many open files".

I already set spark.shuffle.consolidateFiles to true:

  conf.set("spark.shuffle.consolidateFiles", "true")

But it doesn't seem to be working. What other reasons could there be, and
how can I fix it?

Jianshi

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
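For context, a minimal sketch of the kind of job being described -- the app
name and toy data below are placeholders, not details from this thread:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._  // pair-RDD functions on 1.0.x

    val conf = new SparkConf()
      .setAppName("sortByKey-repro")                  // placeholder
      .set("spark.shuffle.consolidateFiles", "true")  // before the context is created

    val sc = new SparkContext(conf)

    // A toy pair RDD; the real job presumably processes a much larger dataset.
    val pairs = sc.parallelize(1 to 1000000).map(i => (i % 1000, i))

    // The shuffle triggered here is where "too many open files" surfaces
    // when the process's open-files limit is too low.
    pairs.sortByKey().count()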