These files, if I read them correctly, are for spawning yet another process. I don't see how that would work for the shell.
I am also not convinced that spark-env is important for the client.

On Tue, Nov 11, 2014 at 2:09 PM, Pat Ferrel <[email protected]> wrote:

> I was thinking -Dx=y too, seems like a good idea.
>
> But we should also support setting them the way Spark documents in
> spark-env.sh, and the two links Andrew found may solve that in a
> maintainable way. Maybe we get the SparkConf from a new mahoutSparkConf
> function, which handles all env-supplied setup. For the drivers it can be
> done in the base class, allowing CLI overrides later. The SparkConf is then
> finally passed in to mahoutSparkContext, where as little as possible is
> changed in the conf.
>
> I’ll look at this for the drivers. It should be easy to add to the shell.
>
> On Nov 11, 2014, at 12:36 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> IMO you just need to modify `mahout spark-shell` to propagate -Dx=y
> parameters to the java startup call and all should be fine.
>
> On Tue, Nov 11, 2014 at 12:23 PM, Andrew Palumbo <[email protected]> wrote:
>
> > I've run into this problem starting $ mahout shell-script, i.e. needing
> > to set spark.kryoserializer.buffer.mb and spark.akka.frameSize. I've
> > been temporarily hard-coding them for now while developing.
> >
> > I'm just getting familiar with what you've done with the CLI drivers. For
> > #2, could we borrow option-parsing code/methods from Spark [1] [2] at each
> > (Spark) release and somehow add this to
> > MahoutOptionParser.parseSparkOptions?
> >
> > I'll hopefully be doing some CLI work soon and have a better understanding.
> >
> > [1]
> > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala
> > [2]
> > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
> >
> >> From: [email protected]
> >> Subject: Spark options
> >> Date: Wed, 5 Nov 2014 09:48:59 -0800
> >> To: [email protected]
> >>
> >> Spark has a launch script, as Hadoop does. We use the Hadoop launcher
> >> script but not the Spark one. When starting up your Spark cluster there is
> >> a spark-env.sh script that can set a bunch of environment variables. In our
> >> own mahoutSparkContext function, which takes the place of the Spark submit
> >> script and launcher, we don’t account for most of those environment variables.
> >>
> >> Unless I missed something, this means most of the documented options will
> >> be ignored unless a user of Mahout parses and sets them in their own
> >> SparkConf. The Mahout CLI drivers don’t do this for all possible options,
> >> only supporting a few like job name and spark.executor.memory.
> >>
> >> The question is how best to handle these Spark options. There seem to be
> >> two choices:
> >> 1) use Spark's launch mechanism for the drivers but allow some settings to
> >> be overridden in the CLI
> >> 2) add parsing of the env for options and set up the SparkConf defaults in
> >> mahoutSparkContext with those variables.
> >>
> >> The downside of #2 is that as variables change we’ll have to reflect
> >> those changes in our code. I forget why #1 is not an option, but Dmitriy
> >> has been consistently against it; in any case it would mean a fair bit of
> >> refactoring, I believe.
> >>
> >> Any opinions or corrections?
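For a concrete picture of the -Dx=y route discussed above, here is a minimal Scala sketch of how spark.* properties passed on the java startup call (e.g. -Dspark.kryoserializer.buffer.mb=200 -Dspark.akka.frameSize=30) could end up in the SparkConf handed to mahoutSparkContext. The mahoutSparkConf name and the explicit copy loop are assumptions for illustration, not the actual Mahout code.

  import org.apache.spark.SparkConf

  // Sketch only: fold any spark.* JVM system property (set with -Dx=y on the
  // java startup call) into a SparkConf before it reaches mahoutSparkContext.
  object SparkOptionDefaults {
    def mahoutSparkConf(): SparkConf = {
      // new SparkConf() (loadDefaults = true) already reads spark.* system
      // properties; the explicit loop just makes that visible and keeps
      // working if loadDefaults were ever turned off upstream.
      val conf = new SparkConf()
      for ((k, v) <- sys.props if k.startsWith("spark.")) conf.set(k, v)
      conf
    }
  }

Driver CLI overrides could then be applied with conf.set(...) after this call, so command-line flags still take precedence over -D and spark-env.sh values.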
