Hi Sean,

Thanks for the reply. I think both the driver and the workers have the problem. You are right that raising the ulimit fixed the driver-side "too many open files" error.
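(As a side note on the launch command quoted below: `ulimit -n 4096 & spark-shell` runs `ulimit` as a background job in its own subshell, so the shell that starts spark-shell never inherits the raised limit. A minimal sketch of the difference, assuming bash; it lowers the limit to 1024 rather than raising it, since raising can exceed the hard limit in some environments:)

```shell
# Chained with &&, the second command runs in the same shell and
# inherits the changed descriptor limit:
( ulimit -n 1024 && ulimit -n )   # prints 1024: the change took effect

# With a lone &, ulimit runs in a backgrounded subshell and the
# change is discarded; the limit reported afterwards is unchanged:
( ulimit -n 1024 & wait; ulimit -n )
```

So the launcher should read `ulimit -n 4096 && spark-shell` (or put both on one line separated by `;`) for the driver to actually see the higher limit.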
And there is a very big shuffle. My perhaps naive thought was to migrate the HQL scripts directly from Hive to Spark SQL and make them work. It seems that it won't be that easy. Is that correct? I had done that with Shark and it worked pretty well in the old days. Any suggestions if we are planning to migrate a large code base from Hive to Spark SQL with minimal code rewriting?

Many thanks.

Cao

On Friday, October 31, 2014, Sean Owen <so...@cloudera.com> wrote:
> It's almost surely the workers, not the driver (shell), that have too
> many files open. You can change their ulimit. But it's probably better
> to see why it happened -- a very big shuffle? -- and repartition or
> design differently to avoid it. The new sort-based shuffle might help
> in this regard.
>
> On Fri, Oct 31, 2014 at 3:25 PM, Bill Q <bill.q....@gmail.com> wrote:
> > Hi,
> > I am trying to make Spark SQL 1.1 work to replace part of our ETL
> > processes that are currently done by Hive 0.12.
> >
> > A common problem that I have encountered is the "Too many open files"
> > error. Once that happens, the query just fails. I started the
> > spark-shell by using "ulimit -n 4096 & spark-shell", and it still hits
> > the same error.
> >
> > Any solutions?
> >
> > Many thanks.
> >
> > Bill

-- 
Many thanks.

Bill
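(For reference, the sort-based shuffle Sean mentions is opt-in in Spark 1.1; the hash-based shuffle is still the default there and can open one file per map task per reduce partition, which is what exhausts descriptors on a very big shuffle. A sketch of the relevant settings, assuming the stock Spark 1.1 configuration mechanism:)

```
# conf/spark-defaults.conf (sketch for Spark 1.1)

# Switch to the sort-based shuffle, which writes one sorted output file
# per map task instead of one file per reduce partition:
spark.shuffle.manager        sort

# Alternatively, if staying on the hash-based shuffle, consolidating
# intermediate files also reduces the number of open descriptors:
spark.shuffle.consolidateFiles  true
```

Either setting reduces the number of simultaneously open shuffle files on the workers, independently of raising their ulimit.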