Re: Multiple spark interpreters in the same Zeppelin instance

2016-07-01 Thread Jongyoul Lee
Hi,

Concerning importing/exporting notebooks with aliases, my simple solution is
to store the referenced interpreter settings inside note.json. However, that
enlarges note.json, and the current and new interpreter setting ids may not
match. Second, when a user imports a note, we could provide a menu to choose
an interpreter mapping when there is no matching alias, or request the
interpreter-setting information from the previous server. But that looks
complicated for the user, and it cannot work when the user uploads only the
JSON. And finally, we could alert the user that an interpreter is not defined
and must be defined. That is very unfriendly.

There's no simple way for now. I'll think it through and try to find a
smarter solution.
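For illustration, a referenced setting stored inside note.json could look
roughly like this (a rough sketch only; the field names and ids below are
hypothetical, not the current note.json schema):

{
  "id": "2BQA35CJZ",
  "name": "my note",
  "interpreterBindings": [
    {
      "alias": "spark-prod",
      "interpreterSettingId": "2BUEZ6VJM",
      "group": "spark",
      "properties": { "master": "yarn-client" }
    }
  ],
  "paragraphs": [ ... ]
}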

Regards,
JL


On Sat, Jul 2, 2016 at 1:22 AM, moon soo Lee  wrote:

> Thanks Jongyoul for taking care of ZEPPELIN-1012 and sharing the plan.
>
> Could you share a little more detail about how export/import of notebooks
> will work after ZEPPELIN-1012? We assume export/import of notebooks
> works between different Zeppelin installations, and one installation might
> have '%myinterpreter-setting' while the other installation does not.
>
> In this case, the user will need to guess the type of interpreter from the
> '%interpreter-setting' name or the code text, and change each paragraph's
> interpreter selection one by one, to run the imported notebook in the other
> Zeppelin instance.
>
> Will there be any way to simplify importing and using a notebook once users
> are able to select an interpreter using an alias?
>
> Best,
> moon
>
> On Thu, Jun 30, 2016 at 10:27 PM Jongyoul Lee  wrote:
>
>> Hi,
>>
>> This is a somewhat late response, but I think it's useful to share
>> the current status and the future plan for this feature.
>>
>> For now, JdbcInterpreter supports parameters like '%jdbc(drill)',
>> '%jdbc(hive)' and so on. This is a JdbcInterpreter feature available from
>> 0.6.0-SNAPSHOT and it will be included in 0.6.0. Furthermore, the Zeppelin
>> interpreter supports the parameter mechanism of JdbcInterpreter as an
>> alias. Thus you can use %drill, %hive in your paragraphs when you set the
>> proper properties for JDBC on the interpreter tab. You can find more
>> information on the web[1]. However, this is only for JdbcInterpreter now. In
>> the next release, Zeppelin will support aliases for all interpreters. Then
>> you can create multiple interpreters like '%spark-dev', '%spark-prod' and so
>> on. This means different Spark interpreters on a single Zeppelin server, and
>> it will allow you to run multiple Spark interpreters in the same note
>> simultaneously. This will be handled in ZEPPELIN-1012[2]. Please watch it.
>>
>> Regards,
>> Jongyoul Lee
>>
>> [1]: http://zeppelin.apache.org/docs/0.6.0-SNAPSHOT/interpreter/jdbc.html
>> [2]: https://issues.apache.org/jira/browse/ZEPPELIN-1012
>>
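To illustrate the prefix/alias mechanism described above: a JDBC interpreter
setting with prefix-based properties (a sketch only, assuming the 0.6.0 JDBC
interpreter's prefix.driver / prefix.url / prefix.user property naming; the
URLs, hosts and drivers below are placeholders for your own environment) lets
paragraphs start with %jdbc(drill), %jdbc(hive), or the aliases %drill and
%hive:

  drill.driver   org.apache.drill.jdbc.Driver
  drill.url      jdbc:drill:zk=zk-host:2181
  drill.user     zeppelin
  hive.driver    org.apache.hive.jdbc.HiveDriver
  hive.url       jdbc:hive2://hive-host:10000
  hive.user      zeppelin

  %hive
  select count(*) from my_table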
>> On Tue, May 3, 2016 at 3:44 AM, John Omernik  wrote:
>>
>>> I see two components.
>>>
>>> 1. The ability to have multiple interpreters of the same type, but with
>>> different configuration options: a jdbc1, jdbc2, spark1, spark2, spark3,
>>> etc.  Whatever you want to name them is fine, but spark1 would use the
>>> SPARK_HOME that is configured, and spark2 would use a different SPARK_HOME
>>> or spark-submit options.  That's the top level.
>>>
>>> 2. The ability to alias %interpreter to whatever interpreters are defined.
>>> I.e. I could do %jdbc1 for Drill, %jdbc2 for MySQL. And then have a file
>>> that lets me, as a user, say "I want %mysql to point to %jdbc2, and %drill to
>>> point to %jdbc1."
>>>
>>> For #1, the idea here is that we will have multiple instances of any given
>>> interpreter type. For #2, it really should be easy for a user to make their
>>> environment easy to use and intuitive. Not to pick on your example Rick,
>>> but as a user, typing %spark:dev.sql is a pain... I need two shift
>>> characters and another non-alpha character, whereas if I could just type
>>> %dev.sql and had an alias in my notebook that said %dev pointed to
>>> %spark_dev, that would be handy. It may seem like not a big deal, but having
>>> to type something like that over and over again gets old :)
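A purely hypothetical sketch of the kind of per-user alias file being asked
for here (nothing like this exists in Zeppelin today; the file name and the
syntax are made up only to illustrate the idea):

  # ~/.zeppelin/aliases (hypothetical)
  mysql = jdbc2
  drill = jdbc1
  dev   = spark_dev
  prod  = spark_prod

With such a mapping, %dev.sql in a paragraph would resolve to the SQL
frontend of the %spark_dev interpreter.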
>>>
>>>
>>>
>>> On Mon, May 2, 2016 at 11:31 AM, Rick Moritz  wrote:
>>>
 I think the solution would be to distinguish between interpreter type
 and interpreter instance.
 The type should be relatively static, while the instance could be any
 alias/name and only generate a warning when unable to match with entries in
 interpreter.json. Finally the specific type would be added to distinguish
 the frontend-language (scala, python, R or sql/hive for spark, for 
 example).

 Since implementing this would also clear up some of the rather buggy
 and hard to maintain interpreter-group code, it would be a worthwhile thing
 to do, in any case.
 A final call could then look like this: %spark:dev.sql or
 %spark:prod.pyspark. (or jdbc:drill, jdbc:oracledw)
 Adding another separator (could be a period also - but the colon is
 semantically nice, since it's essentially a service and address that we're
 calling) makes for easy parsing of the string and keeps notes (somewhat)
 portable.

 What do you think?

Re: spark interpreter

2016-07-01 Thread Benjamin Kim
Moon,

I have downloaded and tested the bin-all tarball, and it has some deficiencies 
compared to the build-from-source version.
- CSV, TSV download is missing
- Doesn't work with HBase 1.2 in CDH 5.7.0
- Spark still does not work with Spark 1.6.0 in CDH 5.7.0 (JDK8)
  (using Livy is a good workaround)
- Doesn't work with Phoenix 4.7 in CDH 5.7.0

Everything else looks good, especially in the area of multi-tenancy and
security. I would like to know how to use the Credentials feature for securing
usernames and passwords. I couldn't find documentation on how to do it.

Thanks,
Ben

> On Jul 1, 2016, at 9:04 AM, moon soo Lee  wrote:
> 
> 0.6.0 is currently in vote in dev@ list.
> http://apache-zeppelin-dev-mailing-list.75694.x6.nabble.com/VOTE-Apache-Zeppelin-release-0-6-0-rc1-tp11505.html
>  
> 
> 
> Thanks,
> moon
> 
> On Thu, Jun 30, 2016 at 1:54 PM Leon Katsnelson wrote:
> What is the expected day for v0.6?
> 
> 
> 
> 
> From: moon soo Lee
> To: users@zeppelin.apache.org
> Date: 2016/06/30 11:36 AM
> Subject: Re: spark interpreter
> 
> 
> 
> Hi Ben,
> 
> Livy interpreter is included in 0.6.0. If it is not listed when you create
> an interpreter setting, could you check whether your 'zeppelin.interpreters'
> property lists the Livy interpreter classes? (conf/zeppelin-site.xml)
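For reference, that property is a comma-separated list of interpreter class
names; a sketch with the Livy classes appended might look like the following
(class names assumed from the 0.6.0 Livy module; keep only the interpreters
you actually build and use):

  <property>
    <name>zeppelin.interpreters</name>
    <value>org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.livy.LivySparkInterpreter,org.apache.zeppelin.livy.LivySparkSQLInterpreter,org.apache.zeppelin.livy.LivyPySparkInterpreter,org.apache.zeppelin.livy.LivySparkRInterpreter</value>
  </property>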
> 
> Thanks,
> moon
> 
> On Wed, Jun 29, 2016 at 11:52 AM Benjamin Kim wrote:
> On a side note…
> 
> Has anyone got the Livy interpreter to be added as an interpreter in the 
> latest build of Zeppelin 0.6.0? By the way, I have Shiro authentication on. 
> Could this interfere?
> 
> Thanks,
> Ben
> 
> 
> On Jun 29, 2016, at 11:18 AM, moon soo Lee wrote:
> 
> The Livy interpreter internally creates separate sessions for each user,
> independently of the 3 binding modes supported in Zeppelin.
> Therefore, in 'shared' mode the Livy interpreter will create a session per
> user, while 'scoped' or 'isolated' mode will create a session per notebook,
> per user.
>
> When a notebook is shared among users, they always use the same interpreter
> instance/process, for now. I think supporting a per-user interpreter
> instance/process would be future work.
> 
> Thanks,
> moon
> 
> On Wed, Jun 29, 2016 at 7:57 AM Chen Song wrote:
> Thanks for your explanation, Moon.
> 
> Following up on this, I can see the difference in terms of single or multiple 
> interpreter processes. 
> 
> With respect to spark drivers, since each interpreter spawns a separate Spark 
> driver in regular Spark interpreter setting, it is clear to me the different 
> implications of the 3 binding modes.
> 
> However, when it comes to a Livy server with impersonation turned on, I am a
> bit confused. Will the Livy interpreter always create a new Spark driver (along
> with a SparkContext instance) for each user session, regardless of the
> binding mode of the Livy interpreter? I am not very familiar with Livy, but from
> what I could tell, I see no difference between the different binding modes for
> Livy as far as Spark drivers are concerned.
> 
> Last question, when a notebook is shared among users, will they always use 
> the same interpreter instance/process already created?
> 
> Thanks
> Chen
> 
> 
> 
> On Fri, Jun 24, 2016 at 11:51 AM moon soo Lee wrote:
> Hi,
> 
> Thanks for asking. It's not a dumb question at all; the Zeppelin docs do
> not explain this very well.
>
> For the Spark interpreter:
>
> In 'shared' mode, a Spark interpreter setting spawns a single interpreter
> process to serve all notebooks bound to this interpreter setting.
> In 'scoped' mode, a Spark interpreter setting spawns multiple interpreter
> processes, one per notebook bound to this interpreter setting.
>
> Using the Livy interpreter:
>
> Zeppelin propagates the current user information to the Livy interpreter, and
> the Livy interpreter creates a different session per user via the Livy server.
> 
> 
> Hope this helps.
> 
> Thanks,
> moon
> 
> 
> On Tue, Jun 21, 2016 at 6:41 PM Chen Song wrote:
> Zeppelin provides 3 binding modes for each interpreter. With a `scoped` or
> `shared` Spark interpreter, every user shares the same SparkContext. Sorry for
> the dumb question, but how does it differ from running Spark via Livy server?
> 
> 
> -- 
> Chen Song
> 
> 
> 
> 



Re: Using yarn-cluster mode.

2016-07-01 Thread Jeff Zhang
Check the official doc and use the Livy interpreter:

https://zeppelin.incubator.apache.org/docs/0.6.0-SNAPSHOT/install/install.html#installation
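In short (a minimal sketch, assuming the %livy.* paragraph prefixes and the
zeppelin.livy.url property of the 0.6.0 Livy interpreter; the URL below is a
placeholder for your own Livy server), you point the Livy interpreter setting
at a running Livy server and then use the Livy prefixes in paragraphs:

  # Livy interpreter setting property
  zeppelin.livy.url   http://livy-host:8998

  # In a note paragraph
  %livy.spark
  sc.version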

On Fri, Jul 1, 2016 at 11:00 AM, Egor Pahomov 
wrote:

> How crazy would it be to build the current Zeppelin from master and use that?
> Does it mean that yarn-cluster would work there?
>
> 2016-07-01 10:56 GMT-07:00 Jeff Zhang :
>
>> AFAIK, for now only yarn-client mode is supported; the Livy interpreter
>> will support yarn-cluster mode, but that is not in a released Zeppelin yet.
>>
>> On Fri, Jul 1, 2016 at 10:44 AM, Egor Pahomov 
>> wrote:
>>
>>> Hi, I want to use yarn-cluster mode and am failing to do so. I couldn't
>>> find any documentation on whether it's possible, or any information that
>>> someone has done it.
>>>
>>> My motivation:
>>>
>>> I have many users, with 1 Zeppelin instance per user. They all live on the
>>> same machine and allocate a lot of memory with their drivers.
>>>
>>> --
>>>
>>>
>>> Sincerely yours,
>>> Egor Pakhomov
>>>
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
>
>
> --
>
>
> Sincerely yours,
> Egor Pakhomov
>



-- 
Best Regards

Jeff Zhang


Re: classnotfoundexception using zeppelin with spark authentication

2016-07-01 Thread Jonathan Esterhazy
Hyung, thx for your help. I opened these:

https://issues.apache.org/jira/browse/ZEPPELIN-1096 (this scala problem)
https://issues.apache.org/jira/browse/ZEPPELIN-1097 (similar looking python
problem)

LMK if I can provide more info or help in some way.

On Fri, Jul 1, 2016 at 5:08 AM, Hyung Sung Shim  wrote:

> Hi Jonathan.
> Unfortunately I got the same error in my test bed.
> Do you mind creating a JIRA issue for this?
>
> 2016-07-01 3:19 GMT+09:00 Jonathan Esterhazy:
>
>> I added this and still get the same exception. The same property is also
>> set in spark-defaults.conf.
>>
>> After that didn't work, I also tried adding --conf
>> spark.authenticate=true and --conf
>> spark.authenticate.enableSaslEncryption=true, to match the other related
>> settings in spark-defaults.conf. Still get the same classnotfoundexception.
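Putting those attempts together, the conf/zeppelin-env.sh line being described
would look roughly like this (a sketch only; the secret value is a placeholder):

  export SPARK_SUBMIT_OPTIONS="--conf spark.authenticate=true --conf spark.authenticate.secret=<secret> --conf spark.authenticate.enableSaslEncryption=true"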
>>
>>
>> On Thu, Jun 30, 2016 at 10:45 AM, Hyung Sung Shim 
>> wrote:
>>
>>> Please add export SPARK_SUBMIT_OPTIONS="--conf
>>> spark.authenticate.secret=secret" in conf/zeppelin-env.sh, then restart
>>> zeppelin and retry your code.
>>>
>>>
>>> 2016-06-30 23:34 GMT+09:00 Jonathan Esterhazy <
>>> jonathan.esterh...@gmail.com>:
>>>
 Yes it does. I only see this problem in Zeppelin.

 On Thu, Jun 30, 2016 at 7:05 AM, Hyung Sung Shim 
 wrote:

> Hi Jonathan.
> It's not easy to build the test environment, but I am working on this.
> I have a question for you.
> Does your code work well in spark-shell in the spark.authenticate
> mode?
>
> 2016-06-30 22:47 GMT+09:00 Jonathan Esterhazy <
> jonathan.esterh...@gmail.com>:
>
>> Hyung, did you have any luck w/ zeppelin + spark authentication? I'm
>> quite stumped.
>>
>> thx.
>>
>> On Tue, Jun 28, 2016 at 9:11 PM, Hyung Sung Shim 
>> wrote:
>>
>>> Thank you.
>>> Let me try.
>>>
>>> 2016-06-28 22:18 GMT+09:00 Jonathan Esterhazy <
>>> jonathan.esterh...@gmail.com>:
>>>
 Hyung,

 Yes, here they are.

 zeppelin-env.sh:

 export ZEPPELIN_PORT=8890
 export ZEPPELIN_CONF_DIR=/etc/zeppelin/conf
 export ZEPPELIN_LOG_DIR=/var/log/zeppelin
 export ZEPPELIN_PID_DIR=/var/run/zeppelin
 export ZEPPELIN_PID=$ZEPPELIN_PID_DIR/zeppelin.pid
 export ZEPPELIN_NOTEBOOK_DIR=/var/lib/zeppelin/notebook
 export ZEPPELIN_WAR_TEMPDIR=/var/run/zeppelin/webapps
 export MASTER=yarn-client
 export SPARK_HOME=/usr/lib/spark
 export HADOOP_CONF_DIR=/etc/hadoop/conf
 export
 CLASSPATH=":/etc/hive/conf:/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*
 :/usr/share/aws/emr/emrfs/auxlib/*"
 export JAVA_HOME=/usr/lib/jvm/java-1.8.0
 export ZEPPELIN_NOTEBOOK_S3_BUCKET=mybucket
 export ZEPPELIN_NOTEBOOK_S3_USER=zeppelin
 export
 ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.S3NotebookRepo

 spark-defaults.conf:

 spark.master yarn
 spark.driver.extraClassPath
  
 /etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf
 :/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
 spark.driver.extraLibraryPath
  /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
 spark.executor.extraClassPath
  
 /etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf
 :/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
 spark.executor.extraLibraryPath
  /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
 spark.eventLog.enabled   true
 spark.eventLog.dir   hdfs:///var/log/spark/apps
 spark.history.fs.logDirectory    hdfs:///var/log/spark/apps
 spark.yarn.historyServer.address ip-172-30-54-30.ec2.internal:18080
 spark.history.ui.port            18080
 spark.shuffle.service.enabled    true
 spark.driver.extraJavaOptions
  -Dlog4j.configuration=file:///etc/spark/conf/log4j.properties
 -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70
 -XX:MaxHeapFreeRatio=70
 -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=512M
 -XX:OnOutOfMemoryError='kill -9 %p'
 spark.dynamicAllocation.enabled  true
 spark.executor.extraJavaOptions  -verbose:gc -XX:+PrintGCDetails
 -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC
 -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70
 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'
 spark.executor.memory            8640m
>>>

Re: Error with sql interpreter

2016-07-01 Thread Juan Pablo Briganti
Hi all!

Any other suggestions about these issues would really help us and would be
really appreciated.
It looks like a very interesting tool, so we want to work more with it.
If you need more info, please let me know.

Thanks!!!

2016-06-27 9:56 GMT-03:00 Juan Pablo Briganti :

> Hello Meilong!
>
> I exported SPARK_HOME inside the zeppelin-env.sh configuration file. The
> documentation says that hive-site.xml should be placed inside the zeppelin
> installation folder, so I did that, but just to be sure I tried what you
> said (also exporting the SPARK_HOME environment variable), but that didn't work
> either.
> Thanks, any other suggestion would be really appreciated.
>
> 2016-06-27 3:46 GMT-03:00 Meilong Huang :
>
>> Did you export $SPARK_HOME? Is the hive-site.xml file in $SPARK_HOME/conf?
>>
>> 2016-06-25 2:50 GMT+08:00 Juan Pablo Briganti:
>>
>>> Hello:
>>>
>>>   I'm configuring Zeppelin to work with our Cloudera cluster and I'm
>>> facing some errors. I would like to see if these are known errors or if I'm
>>> doing something wrong:
>>>
>>> Zeppelin version: 0.6
>>> Cloudera version: 5.7.0
>>> Hadoop version: 2.6
>>> Spark version: 1.6
>>>
>>> Build maven command: mvn clean package -Pspark-1.6
>>> -Dspark.version=1.6.0-cdh5.7.0 -Dhadoop.version=2.6.0-cdh5.7.0 -Phadoop-2.6
>>> -Pvendor-repo -DskipTests
>>>
>>> I'm running applications using the %spark and %pyspark interpreters and they
>>> seem to run ok under the yarn-client configuration, using both my own and the
>>> example scripts. But when I try to run any script with %sql I get one of
>>> the 2 errors detailed at the end of the email. It's not clear when I
>>> get each error; I just keep getting the first one until I start getting the
>>> second one without making any configuration change, just running the
>>> same script again.
>>> I have my hive-site.xml file inside the zeppelin/conf folder.
>>> The log folder provides those exceptions only, no extra information.
>>>
>>>
>>> Any hint you may have about this would help to solve the problem.
>>> Thanks.
>>>
>>> java.lang.NullPointerException
>>> at
>>> org.apache.spark.sql.hive.client.ClientWrapper.conf(ClientWrapper.scala:205)
>>> at
>>> org.apache.spark.sql.hive.client.ClientWrapper.client(ClientWrapper.scala:261)
>>> at
>>> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:273)
>>> at
>>> org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:228)
>>> at
>>> org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:227)
>>> at
>>> org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:270)
>>> at org.apache.spark.sql.hive.HiveQLDialect.parse(HiveContext.scala:65)
>>> at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211)
>>> at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211)
>>> at
>>> org.apache.spark.sql.execution.SparkSQLParser$$anonfun$org$apache$spark$sql$execution$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:114)
>>> at
>>> org.apache.spark.sql.execution.SparkSQLParser$$anonfun$org$apache$spark$sql$execution$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:113)
>>> at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
>>> at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
>>> at
>>> scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
>>> at
>>> scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
>>> at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
>>> at
>>> scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
>>> at
>>> scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
>>> at
>>> scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
>>> at
>>> scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
>>> at
>>> scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
>>> at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
>>> at
>>> scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
>>> at
>>> scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
>>> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>>> at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
>>> at
>>> scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
>>> at
>>> org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34)
>>> at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:208)
>>> at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:208)
>>> at
>>> org.apache.spark.sql.execution.datasources.DDLParser.parse(DDLParser.scala:43)
>>> at org.apache.spark.

Re: Using yarn-cluster mode.

2016-07-01 Thread Egor Pahomov
How crazy would it be to build the current Zeppelin from master and use that?
Does it mean that yarn-cluster would work there?

2016-07-01 10:56 GMT-07:00 Jeff Zhang :

> AFAIK, for now only yarn-client mode is supported; the Livy interpreter
> will support yarn-cluster mode, but that is not in a released Zeppelin yet.
>
> On Fri, Jul 1, 2016 at 10:44 AM, Egor Pahomov 
> wrote:
>
>> Hi, I want to use yarn-cluster mode and am failing to do so. I couldn't find
>> any documentation on whether it's possible, or any information that someone
>> has done it.
>>
>> My motivation:
>>
>> I have many users, with 1 Zeppelin instance per user. They all live on the
>> same machine and allocate a lot of memory with their drivers.
>>
>> --
>>
>>
>> Sincerely yours,
>> Egor Pakhomov
>>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>



-- 


Sincerely yours,
Egor Pakhomov


Re: Using yarn-cluster mode.

2016-07-01 Thread Jeff Zhang
AFAIK, for now only yarn-client mode is supported; the Livy interpreter
will support yarn-cluster mode, but that is not in a released Zeppelin yet.

On Fri, Jul 1, 2016 at 10:44 AM, Egor Pahomov 
wrote:

> Hi, I want to use yarn-cluster mode and am failing to do so. I couldn't find
> any documentation on whether it's possible, or any information that someone
> has done it.
>
> My motivation:
>
> I have many users, with 1 Zeppelin instance per user. They all live on the
> same machine and allocate a lot of memory with their drivers.
>
> --
>
>
> Sincerely yours,
> Egor Pakhomov
>



-- 
Best Regards

Jeff Zhang


Using yarn-cluster mode.

2016-07-01 Thread Egor Pahomov
Hi, I want to use yarn-cluster mode and am failing to do so. I couldn't find
any documentation on whether it's possible, or any information that someone
has done it.

My motivation:

I have many users, with 1 Zeppelin instance per user. They all live on the same
machine and allocate a lot of memory with their drivers.

-- 


Sincerely yours,
Egor Pakhomov


Re: Multiple spark interpreters in the same Zeppelin instance

2016-07-01 Thread moon soo Lee
Thanks Jongyoul for taking care of ZEPPELIN-1012 and sharing the plan.

Could you share a little more detail about how export/import of notebooks
will work after ZEPPELIN-1012? We assume export/import of notebooks
works between different Zeppelin installations, and one installation might
have '%myinterpreter-setting' while the other installation does not.

In this case, the user will need to guess the type of interpreter from the
'%interpreter-setting' name or the code text, and change each paragraph's
interpreter selection one by one, to run the imported notebook in the other
Zeppelin instance.

Will there be any way to simplify importing and using a notebook once users
are able to select an interpreter using an alias?

Best,
moon

On Thu, Jun 30, 2016 at 10:27 PM Jongyoul Lee  wrote:

> Hi,
>
> This is a somewhat late response, but I think it's useful to share the
> current status and the future plan for this feature.
>
> For now, JdbcInterpreter supports parameters like '%jdbc(drill)',
> '%jdbc(hive)' and so on. This is a JdbcInterpreter feature available from
> 0.6.0-SNAPSHOT and it will be included in 0.6.0. Furthermore, the Zeppelin
> interpreter supports the parameter mechanism of JdbcInterpreter as an
> alias. Thus you can use %drill, %hive in your paragraphs when you set the
> proper properties for JDBC on the interpreter tab. You can find more
> information on the web[1]. However, this is only for JdbcInterpreter now. In
> the next release, Zeppelin will support aliases for all interpreters. Then
> you can create multiple interpreters like '%spark-dev', '%spark-prod' and so
> on. This means different Spark interpreters on a single Zeppelin server, and
> it will allow you to run multiple Spark interpreters in the same note
> simultaneously. This will be handled in ZEPPELIN-1012[2]. Please watch it.
>
> Regards,
> Jongyoul Lee
>
> [1]: http://zeppelin.apache.org/docs/0.6.0-SNAPSHOT/interpreter/jdbc.html
> [2]: https://issues.apache.org/jira/browse/ZEPPELIN-1012
>
> On Tue, May 3, 2016 at 3:44 AM, John Omernik  wrote:
>
>> I see two components.
>>
>> 1. The ability to have multiple interpreters of the same type, but with
>> different configuration options: a jdbc1, jdbc2, spark1, spark2, spark3,
>> etc.  Whatever you want to name them is fine, but spark1 would use the
>> SPARK_HOME that is configured, and spark2 would use a different SPARK_HOME
>> or spark-submit options.  That's the top level.
>>
>> 2. The ability to alias %interpreter to whatever interpreters are defined.
>> I.e. I could do %jdbc1 for Drill, %jdbc2 for MySQL. And then have a file
>> that lets me, as a user, say "I want %mysql to point to %jdbc2, and %drill to
>> point to %jdbc1."
>>
>> For #1, the idea here is that we will have multiple instances of any given
>> interpreter type. For #2, it really should be easy for a user to make their
>> environment easy to use and intuitive. Not to pick on your example Rick,
>> but as a user, typing %spark:dev.sql is a pain... I need two shift
>> characters and another non-alpha character, whereas if I could just type
>> %dev.sql and had an alias in my notebook that said %dev pointed to
>> %spark_dev, that would be handy. It may seem like not a big deal, but having
>> to type something like that over and over again gets old :)
>>
>>
>>
>> On Mon, May 2, 2016 at 11:31 AM, Rick Moritz  wrote:
>>
>>> I think the solution would be to distinguish between interpreter type
>>> and interpreter instance.
>>> The type should be relatively static, while the instance could be any
>>> alias/name and only generate a warning when unable to match with entries in
>>> interpreter.json. Finally the specific type would be added to distinguish
>>> the frontend-language (scala, python, R or sql/hive for spark, for example).
>>>
>>> Since implementing this would also clear up some of the rather buggy and
>>> hard to maintain interpreter-group code, it would be a worthwhile thing to
>>> do, in any case.
>>> A final call could then look like this: %spark:dev.sql or
>>> %spark:prod.pyspark. (or jdbc:drill, jdbc:oracledw)
>>> Adding another separator (could be a period also - but the colon is
>>> semantically nice, since it's essentially a service and address that we're
>>> calling) makes for easy parsing of the string and keeps notes (somewhat)
>>> portable.
>>>
>>> What do you think?
>>>
>>
>>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>


Re: spark interpreter

2016-07-01 Thread moon soo Lee
0.6.0 is currently in vote in dev@ list.
http://apache-zeppelin-dev-mailing-list.75694.x6.nabble.com/VOTE-Apache-Zeppelin-release-0-6-0-rc1-tp11505.html

Thanks,
moon

On Thu, Jun 30, 2016 at 1:54 PM Leon Katsnelson  wrote:

> What is the expected day for v0.6?
>
>
>
>
> From: moon soo Lee
> To: users@zeppelin.apache.org
> Date: 2016/06/30 11:36 AM
> Subject: Re: spark interpreter
> --
>
>
>
> Hi Ben,
>
> Livy interpreter is included in 0.6.0. If it is not listed when you create
> an interpreter setting, could you check whether your 'zeppelin.interpreters'
> property lists the Livy interpreter classes? (conf/zeppelin-site.xml)
>
> Thanks,
> moon
>
> On Wed, Jun 29, 2016 at 11:52 AM Benjamin Kim wrote:
> On a side note…
>
> Has anyone got the Livy interpreter to be added as an interpreter in the
> latest build of Zeppelin 0.6.0? By the way, I have Shiro authentication on.
> Could this interfere?
>
> Thanks,
> Ben
>
>
> On Jun 29, 2016, at 11:18 AM, moon soo Lee wrote:
>
> The Livy interpreter internally creates separate sessions for each user,
> independently of the 3 binding modes supported in Zeppelin.
> Therefore, in 'shared' mode the Livy interpreter will create a session per
> user, while 'scoped' or 'isolated' mode will create a session per notebook,
> per user.
>
> When a notebook is shared among users, they always use the same interpreter
> instance/process, for now. I think supporting a per-user interpreter
> instance/process would be future work.
>
> Thanks,
> moon
>
> On Wed, Jun 29, 2016 at 7:57 AM Chen Song wrote:
> Thanks for your explanation, Moon.
>
> Following up on this, I can see the difference in terms of single or
> multiple interpreter processes.
>
> With respect to spark drivers, since each interpreter spawns a separate
> Spark driver in regular Spark interpreter setting, it is clear to me the
> different implications of the 3 binding modes.
>
> However, when it comes to a Livy server with impersonation turned on, I am a
> bit confused. Will the Livy interpreter always create a new Spark driver (along
> with a SparkContext instance) for each user session, regardless of the
> binding mode of the Livy interpreter? I am not very familiar with Livy, but
> from what I could tell, I see no difference between the different binding modes
> for Livy as far as Spark drivers are concerned.
>
> Last question, when a notebook is shared among users, will they always use
> the same interpreter instance/process already created?
>
> Thanks
> Chen
>
>
>
> On Fri, Jun 24, 2016 at 11:51 AM moon soo Lee wrote:
> Hi,
>
> Thanks for asking. It's not a dumb question at all; the Zeppelin docs
> do not explain this very well.
>
> For the Spark interpreter:
>
> In 'shared' mode, a Spark interpreter setting spawns a single interpreter
> process to serve all notebooks bound to this interpreter setting.
> In 'scoped' mode, a Spark interpreter setting spawns multiple interpreter
> processes, one per notebook bound to this interpreter setting.
>
> Using the Livy interpreter:
>
> Zeppelin propagates the current user information to the Livy interpreter, and
> the Livy interpreter creates a different session per user via the Livy server.
>
>
> Hope this helps.
>
> Thanks,
> moon
>
>
> On Tue, Jun 21, 2016 at 6:41 PM Chen Song wrote:
> Zeppelin provides 3 binding modes for each interpreter. With a `scoped` or
> `shared` Spark interpreter, every user shares the same SparkContext. Sorry
> for the dumb question, but how does it differ from running Spark via Livy server?
>
>
> --
> Chen Song
>
>
>
>
>


Re: classnotfoundexception using zeppelin with spark authentication

2016-07-01 Thread Hyung Sung Shim
Hi Jonathan.
Unfortunately I got the same error in my test bed.
Do you mind creating a JIRA issue for this?

2016-07-01 3:19 GMT+09:00 Jonathan Esterhazy :

> I added this and still get the same exception. The same property is also
> set in spark-defaults.conf.
>
> After that didn't work, I also tried adding --conf spark.authenticate=true
> and --conf spark.authenticate.enableSaslEncryption=true, to match the other
> related settings in spark-defaults.conf. Still get the same
> classnotfoundexception.
>
>
> On Thu, Jun 30, 2016 at 10:45 AM, Hyung Sung Shim 
> wrote:
>
>> Please add export SPARK_SUBMIT_OPTIONS="--conf
>> spark.authenticate.secret=secret" in conf/zeppelin-env.sh, then restart
>> zeppelin and retry your code.
>>
>>
>> 2016-06-30 23:34 GMT+09:00 Jonathan Esterhazy <
>> jonathan.esterh...@gmail.com>:
>>
>>> Yes it does. I only see this problem in Zeppelin.
>>>
>>> On Thu, Jun 30, 2016 at 7:05 AM, Hyung Sung Shim 
>>> wrote:
>>>
 Hi Jonathan.
 It's not easy to build the test environment, but I am working on this.
 I have a question for you.
 Does your code work well in spark-shell in the spark.authenticate
 mode?

 2016-06-30 22:47 GMT+09:00 Jonathan Esterhazy <
 jonathan.esterh...@gmail.com>:

> Hyung, did you have any luck w/ zeppelin + spark authentication? I'm
> quite stumped.
>
> thx.
>
> On Tue, Jun 28, 2016 at 9:11 PM, Hyung Sung Shim 
> wrote:
>
>> Thank you.
>> Let me try.
>>
>> 2016-06-28 22:18 GMT+09:00 Jonathan Esterhazy <
>> jonathan.esterh...@gmail.com>:
>>
>>> Hyung,
>>>
>>> Yes, here they are.
>>>
>>> zeppelin-env.sh:
>>>
>>> export ZEPPELIN_PORT=8890
>>> export ZEPPELIN_CONF_DIR=/etc/zeppelin/conf
>>> export ZEPPELIN_LOG_DIR=/var/log/zeppelin
>>> export ZEPPELIN_PID_DIR=/var/run/zeppelin
>>> export ZEPPELIN_PID=$ZEPPELIN_PID_DIR/zeppelin.pid
>>> export ZEPPELIN_NOTEBOOK_DIR=/var/lib/zeppelin/notebook
>>> export ZEPPELIN_WAR_TEMPDIR=/var/run/zeppelin/webapps
>>> export MASTER=yarn-client
>>> export SPARK_HOME=/usr/lib/spark
>>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>> export
>>> CLASSPATH=":/etc/hive/conf:/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*
>>> :/usr/share/aws/emr/emrfs/auxlib/*"
>>> export JAVA_HOME=/usr/lib/jvm/java-1.8.0
>>> export ZEPPELIN_NOTEBOOK_S3_BUCKET=mybucket
>>> export ZEPPELIN_NOTEBOOK_S3_USER=zeppelin
>>> export
>>> ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.S3NotebookRepo
>>>
>>> spark-defaults.conf:
>>>
>>> spark.master yarn
>>> spark.driver.extraClassPath
>>>  
>>> /etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf
>>> :/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
>>> spark.driver.extraLibraryPath
>>>  /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
>>> spark.executor.extraClassPath
>>>  
>>> /etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf
>>> :/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
>>> spark.executor.extraLibraryPath
>>>  /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
>>> spark.eventLog.enabled   true
>>> spark.eventLog.dir   hdfs:///var/log/spark/apps
>>> spark.history.fs.logDirectory    hdfs:///var/log/spark/apps
>>> spark.yarn.historyServer.address ip-172-30-54-30.ec2.internal:18080
>>> spark.history.ui.port            18080
>>> spark.shuffle.service.enabled    true
>>> spark.driver.extraJavaOptions
>>>  -Dlog4j.configuration=file:///etc/spark/conf/log4j.properties
>>> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70
>>> -XX:MaxHeapFreeRatio=70
>>> -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=512M
>>> -XX:OnOutOfMemoryError='kill -9 %p'
>>> spark.dynamicAllocation.enabled  true
>>> spark.executor.extraJavaOptions  -verbose:gc -XX:+PrintGCDetails
>>> -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC
>>> -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70
>>> -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'
>>> spark.executor.memory            8640m
>>> spark.executor.cores 7
>>> spark.authenticate.enableSaslEncryption true
>>> spark.driver.memory  1g
>>> spark.network.sasl.serverAlwaysEncrypt true
>>> spark.driver.cores   1
>>> spark.ssl.protocol   TLSv1.2
>>> spark.ssl.keyStorePassword   password
>>> spark.yarn.maxAppAttempts1
>>> spark.ssl.keyStore   /etc/emr/securi