Hi Sean,
Thanks for the reply. I think both the driver and the workers have the problem.
You are right that raising the ulimit fixed the driver-side "too many files
open" error.

And yes, there is a very big shuffle. My perhaps naive thought was to migrate
the HQL scripts directly from Hive to Spark SQL and make them work as-is. It
seems that it won't be that easy. Is that correct? I did something similar with
Shark in the old days and it worked pretty well.
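
To be concrete, what I had in mind is roughly the following (just a sketch in
the 1.1 shell; the table and query below are placeholders, not our real ones):

    import org.apache.spark.sql.hive.HiveContext

    // sc is the SparkContext that spark-shell already provides. With
    // hive-site.xml on the classpath, HiveContext reads the same metastore
    // tables that the existing HQL scripts use.
    val hiveContext = new HiveContext(sc)

    // Run the existing HQL more or less verbatim (placeholder query).
    val result = hiveContext.sql(
      "SELECT key, count(*) AS cnt FROM some_table GROUP BY key")
    result.collect().foreach(println)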

Do you have any suggestions for migrating a large code base from Hive to
Spark SQL with minimal code rewriting?
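
And in the meantime, to cope with the big shuffle, I was planning to try what
you suggested, i.e. the sort-based shuffle plus more shuffle partitions,
roughly like this (the 400 below is just a placeholder, not a tuned value):

    // spark.shuffle.manager has to be set before the SparkContext starts,
    // e.g. in conf/spark-defaults.conf:  spark.shuffle.manager  sort
    // The SQL-side shuffle parallelism can be changed at runtime (default 200):
    hiveContext.setConf("spark.sql.shuffle.partitions", "400")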

Many thanks.


Cao

On Friday, October 31, 2014, Sean Owen <so...@cloudera.com> wrote:

> It's almost surely the workers, not the driver (shell) that have too
> many files open. You can change their ulimit. But it's probably better
> to see why it happened -- a very big shuffle? -- and repartition or
> design differently to avoid it. The new sort-based shuffle might help
> in this regard.
>
> On Fri, Oct 31, 2014 at 3:25 PM, Bill Q <bill.q....@gmail.com> wrote:
> > Hi,
> > I am trying to make Spark SQL 1.1 to work to replace part of our ETL
> > processes that are currently done by Hive 0.12.
> >
> > A common problem that I have encountered is the "Too many files open"
> error.
> > Once that happened, the query just failed. I started the spark-shell by
> > using "ulimit -n 4096 & spark-shell". And it still pops the same error.
> >
> > Any solutions?
> >
> > Many thanks.
> >
> >
> > Bill
> >
> >
> >
> > --
> > Many thanks.
> >
> >
> > Bill
> >
>


-- 
Many thanks.


Bill
