Python interpreter getting hung

2018-10-08 Thread Ben Vogan
Hi there,

We are using Zeppelin in a shared environment and are having persistent
problems with the Python interpreter getting into a state where paragraphs
are PENDING forever.  I have looked at the zeppelin-interpreter-python*.log
files and there is virtually nothing in them - just starting/finished
events.  Is there any way to see a log of errors from Python scripts?
Any suggestions on how to debug this?

Help is greatly appreciated!

Thanks,
-- 
*BENJAMIN VOGAN* | Director of Architecture




Re: Zeppelin Stops Loading Notes

2017-08-19 Thread Ben Vogan
I have seen Zeppelin get into this state once.  I restarted it without
investigating the logs, however, so I don't have anything useful to go on
as to why.

--Ben

On Sat, Aug 19, 2017 at 8:17 AM, Paul Brenner wrote:

> You were correct. We had "export ZEPPELIN_SSL_PORT=false" in our
> zeppelin-env.sh. I'm going to comment that out. I suspect it is actually
> unrelated to the behavior we are seeing where pages stop loading though.
> Anyone else see this happen?
>
> I’ll report back if that happens again after the fix.
>
>
> Paul Brenner
> DATA SCIENTIST
>
> On Fri, Aug 18, 2017 at 6:37 PM moon soo Lee wrote:
>
>> Hi,
>>
>> One of the configuration values in your conf/zeppelin-env.sh or
>> conf/zeppelin-site.xml seems to be "false" where a number is expected.
>>
>> Do you have any environment variable or property set to "false" for the
>> configurations below?
>>
>> ZEPPELIN_PORT, zeppelin.server.port
>> ZEPPELIN_SSL_PORT, zeppelin.server.ssl.port
>> ZEPPELIN_INTERPRETER_CONNECT_TIMEOUT, zeppelin.interpreter.connect.timeout
>> ZEPPELIN_INTERPRETER_MAX_POOL_SIZE, zeppelin.interpreter.max.poolsize
>> ZEPPELIN_INTERPRETER_OUTPUT_LIMIT, zeppelin.interpreter.output.limit
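>>
>> For example, in zeppelin-env.sh each of these should be a number; the
>> values below are only illustrative:
>>
>> export ZEPPELIN_PORT=8080
>> export ZEPPELIN_SSL_PORT=8443
>> export ZEPPELIN_INTERPRETER_CONNECT_TIMEOUT=30000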
>>
>> Thanks,
>> moon
>>
>> On Fri, Aug 18, 2017 at 2:30 PM Paul Brenner wrote:
>>
>>> We have a team of 5 users who all use the same zeppelin server. Lately a
>>> few times we have run into a case where zeppelin notes stop responding and
>>> then when we try refreshing the webpage for the note all that loads is the
>>> zeppelin header with no note. When I look at the logs I see:
>>>  INFO [2017-08-18 21:23:06,569] ({qtp1286783232-14114}
>>> NotebookServer.java[sendNote]:705) - New operation from 10.201.12.26 :
>>> 55178 : nshah : GET_NOTE : 2CR2ANDEX
>>>  INFO [2017-08-18 21:24:05,740] ({qtp1286783232-14115}
>>> NotebookServer.java[onClose]:363) - Closed connection to 10.201.12.22 :
>>> 57366. (1001) Idle Timeout
>>>  INFO [2017-08-18 21:24:08,084] ({qtp1286783232-14121}
>>> NotebookServer.java[onClose]:363) - Closed connection to 10.201.12.22 :
>>> 57461. (1001) Idle Timeout
>>>  INFO [2017-08-18 21:25:10,133] ({qtp1286783232-14122}
>>> AuthorizingRealm.java[getAuthorizationCacheLazy]:248) - No cache or
>>> cacheManager properties have been set.  Authorization cache cannot be
>>> obtained.
>>>  INFO [2017-08-18 21:25:10,157] ({qtp1286783232-14122}
>>> AuthorizingRealm.java[getAuthorizationCacheLazy]:248) - No cache or
>>> cacheManager properties have been set.  Authorization cache cannot be
>>> obtained.
>>>  INFO [2017-08-18 21:25:10,172] 

Re: Showing pandas dataframe with utf8 strings

2017-07-11 Thread Ben Vogan
Here is the specific example that is failing:

import pandas
z.show(pandas.DataFrame([u'Jalape\xf1os.'],[1],['Menu']))
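
The same failure reproduces in plain Python 2 without Zeppelin, and
pre-encoding the cells works around it.  A minimal sketch, assuming z is the
usual ZeppelinContext and Python 2.7 (df and cell are just illustrative names):

import pandas

cell = u'Jalape\xf1os.'
# str() implicitly encodes unicode with the ascii codec, so this raises the
# same UnicodeEncodeError that Zeppelin's body_buf.write(str(cell)) hits.
try:
    str(cell)
except UnicodeEncodeError as e:
    print(e)

# Workaround: encode every unicode cell to UTF-8 before calling z.show.
df = pandas.DataFrame([cell], [1], ['Menu'])
z.show(df.applymap(lambda c: c.encode('utf-8') if isinstance(c, unicode) else c))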

On Tue, Jul 11, 2017 at 2:32 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
wrote:

> Hi Ben,
>
> I can't reproduce this
>
> from pyspark.sql.types import *
>> rdd = sc.parallelize([[u'El Niño']])
>> df = sqlc.createDataFrame(
>>   rdd, schema=StructType([StructField("unicode data",
>> StringType(), True)])
>> )
>> df.show()
>> z.show(df)
>
>
> shows unicode character fine.
>
>
>
> --
> Ruslan Dautkhanov
>
> On Tue, Jul 11, 2017 at 11:37 AM, Ben Vogan <b...@shopkick.com> wrote:
>
>> Hi Ruslan,
>>
>> I tried adding:
>>
>>  export LC_ALL="en_US.utf8"
>>
>> To my zeppelin-env.sh script and restarted Zeppelin, but I still have the
>> same problem.  The print statement:
>>
>> python -c "print (u'\xf1')"
>>
>> works from the note.  I think the problem is the use of the str
>> function.  Looking at the stack you can see that the zeppelin code is
>> calling body_buf.write(str(cell)).  If you call str(u'\xf1') you will get
>> the error.
>>
>> --Ben
>>
>> On Tue, Jul 11, 2017 at 10:19 AM, Ruslan Dautkhanov <dautkha...@gmail.com
>> > wrote:
>>
>>> $ env | grep LC
>>>> $
>>>> $ python -c "print (u'\xf1')"
>>>> ñ
>>>>
>>>
>>>
>>>> $ export LC_ALL="C"
>>>> $ python -c "print (u'\xf1')"
>>>> Traceback (most recent call last):
>>>>   File "", line 1, in 
>>>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in
>>>> position 0: ordinal not in range(128)
>>>>
>>>
>>>
>>>> $ export LC_ALL="en_US.utf8"
>>>> $ python -c "print (u'\xf1')"
>>>> ñ
>>>>
>>>
>>>
>>>> $ unset LC_ALL
>>>> $ env | grep LC
>>>> $
>>>> $ python -c "print (u'El Ni\xf1o')"
>>>> El Niño
>>>
>>>
>>> You could add LC_ALL export to your zeppelin-env.sh script.
>>>
>>>
>>>
>>> --
>>> Ruslan Dautkhanov
>>>
>>> On Tue, Jul 11, 2017 at 9:35 AM, Ben Vogan <b...@shopkick.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am trying to use the zeppelin context to show the contents of a
>>>> pandas DataFrame and getting the following error:
>>>>
>>>> Traceback (most recent call last):
>>>>   File "/tmp/zeppelin_python-7554503996532642522.py", line 278, in
>>>> 
>>>> raise Exception(traceback.format_exc())
>>>> Exception: Traceback (most recent call last):
>>>>   File "/tmp/zeppelin_python-7554503996532642522.py", line 271, in
>>>> 
>>>> exec(code)
>>>>   File "", line 2, in 
>>>>   File "/tmp/zeppelin_python-7554503996532642522.py", line 93, in show
>>>> self.show_dataframe(p, **kwargs)
>>>>   File "/tmp/zeppelin_python-7554503996532642522.py", line 121, in
>>>> show_dataframe
>>>> body_buf.write(str(cell))
>>>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in
>>>> position 79: ordinal not in range(128)
>>>>
>>>> How do I go about resolving this?
>>>>
>>>> I'm running version 0.7.1 with python 2.7.
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>> *BENJAMIN VOGAN* | Data Platform Team Lead
>>>>
>>>>
>>>
>>>
>>
>>
>> --
>> *BENJAMIN VOGAN* | Data Platform Team Lead
>>
>>
>
>


-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



Showing pandas dataframe with utf8 strings

2017-07-11 Thread Ben Vogan
Hi all,

I am trying to use the zeppelin context to show the contents of a pandas
DataFrame and getting the following error:

Traceback (most recent call last):
  File "/tmp/zeppelin_python-7554503996532642522.py", line 278, in 
raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/tmp/zeppelin_python-7554503996532642522.py", line 271, in 
exec(code)
  File "", line 2, in 
  File "/tmp/zeppelin_python-7554503996532642522.py", line 93, in show
self.show_dataframe(p, **kwargs)
  File "/tmp/zeppelin_python-7554503996532642522.py", line 121, in
show_dataframe
body_buf.write(str(cell))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in
position 79: ordinal not in range(128)

How do I go about resolving this?

I'm running version 0.7.1 with python 2.7.

Thanks,

-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



Re: Centos 7 Compatibility

2017-06-21 Thread Ben Vogan
I've been running Zeppelin 0.7.1 and, no, I didn't have to make any
non-standard configuration changes that I recall.  I was very pleased with
how easy it was to get up and running.

--Ben

On Wed, Jun 21, 2017 at 1:43 PM, Jim Lola <jim.l...@gmail.com> wrote:

> Which version of Zeppelin do you have working on CentOS 7.2?  Did you make
> any different/non-standard configuration changes to get it to work
> properly?  If so, could you please share them.
>
> On Wed, Jun 21, 2017 at 12:30 PM, Ben Vogan <b...@shopkick.com> wrote:
>
>> I have been running Zeppelin on CentOS 7.2 for the last couple of months
>> without issue.
>>
>> --Ben
>>
>> On Wed, Jun 21, 2017 at 12:37 PM, Jim Lola <jim.l...@gmail.com> wrote:
>>
>>> The beauty of Open Source, like Apache Zeppelin, is that you can try SW
>>> on new OS's.
>>>
>>> Per the Apache Zeppelin documentation, CentOS 6 is supported.  CentOS 7
>>> is NOT mentioned.
>>>
>>> There is actually a very large difference in Linux OS kernels between
>>> CentOS 6 and CentOS 7.  CentOS 6 is based on Linux kernel version
>>> 2.6.32-71 while CentOS 7 is based on Linux kernel version 3.10.0-123.  The
>>> default file system is different, as are the run levels.  CentOS 7's init
>>> system is now systemd, so init is being replaced/updated.  There are many
>>> more changes from CentOS 6 to CentOS 7.
>>>
>>> It sounds like a good opportunity to get involved w/ future development
>>> of Apache Zeppelin.
>>>
>>>
>>>
>>> On Wed, Jun 21, 2017 at 11:10 AM, Benjamin Kim <bbuil...@gmail.com>
>>> wrote:
>>>
>>>> All,
>>>>
>>>> I’m curious to know if Zeppelin will work with CentOS 7. I don’t see it
>>>> in the list of OS’s supported.
>>>>
>>>> Thanks,
>>>> Ben
>>>
>>>
>>>
>>
>>
>> --
>> *BENJAMIN VOGAN* | Data Platform Team Lead
>>
>>
>
>


-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



Re: Centos 7 Compatibility

2017-06-21 Thread Ben Vogan
I have been running Zeppelin on CentOS 7.2 for the last couple of months
without issue.

--Ben

On Wed, Jun 21, 2017 at 12:37 PM, Jim Lola  wrote:

> The beauty of Open Source, like Apache Zeppelin, is that you can try SW on
> new OS's.
>
> Per the Apache Zeppelin documentation, CentOS 6 is supported.  CentOS 7 is
> NOT mentioned.
>
> There is actually a very large difference in Linux OS kernels between
> CentOS 6 and CentOS 7.  CentOS 6 is based on Linux kernel version
> 2.6.32-71 while CentOS 7 is based on Linux kernel version 3.10.0-123.  The
> default file system is different, as are the run levels.  CentOS 7's init
> system is now systemd, so init is being replaced/updated.  There are many
> more changes from CentOS 6 to CentOS 7.
>
> It sounds like a good opportunity to get involved w/ future development of
> Apache Zeppelin.
>
>
>
> On Wed, Jun 21, 2017 at 11:10 AM, Benjamin Kim  wrote:
>
>> All,
>>
>> I’m curious to know if Zeppelin will work with CentOS 7. I don’t see it
>> in the list of OS’s supported.
>>
>> Thanks,
>> Ben
>
>
>


-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



Re: Livy - add external libraries from additional maven repo

2017-05-30 Thread Ben Vogan
For what it's worth, I have successfully added jar files and maven packages
to sessions using Zeppelin & Livy 0.3 - although not using %dep.  In the
interpreter settings I set the livy.spark.jars setting for jars that are on
my HDFS cluster, and livy.spark.jars.packages for maven packages - although
only using Maven Central and not a local repo.
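
Concretely, the two interpreter properties look something like this (a sketch
with hypothetical values; the HDFS path and maven coordinates below are
placeholders, not from my actual setup):

livy.spark.jars            hdfs:///user/zeppelin/jars/my-udfs.jar
livy.spark.jars.packages   com.databricks:spark-csv_2.10:1.5.0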

--Ben

On Tue, May 30, 2017 at 12:36 PM, Felix Cheung 
wrote:

> To add, this might be an issue with Livy.
>
> I'm seeing something similar as well.
>
> If you can get a repro by calling the Livy REST API directly, it will be
> worthwhile to follow up with the Livy community separately.
>
>
> --
> *From:* Felix Cheung 
> *Sent:* Tuesday, May 30, 2017 11:34:31 AM
> *To:* users@zeppelin.apache.org; users@zeppelin.apache.org
> *Subject:* Re: Livy - add external libraries from additional maven repo
>
> if I recall, %dep only works with the built in Spark interpreter and not
> the Livy interpreter.
>
> To manage dependencies with Livy you will need to set Spark conf with Livy.
>
> --
> *From:* Theofilos Kakantousis 
> *Sent:* Tuesday, May 30, 2017 9:05:15 AM
> *To:* users@zeppelin.apache.org
> *Subject:* Livy - add external libraries from additional maven repo
>
> Hi everyone,
>
> I'm using Zeppelin with Livy 0.4 and trying to add external libraries from
> an additional maven repo to my application according to the documentation
> available here.
> The example works fine, but when I set the livy.spark.jars.packages to my
> library the interpreter throws an unresolved dependency error.
>
> I have added the additional maven repository in the interpreter settings
> and have also tried setting livy.spark.jars.ivy but without luck. However,
> if I use the Spark interpreter with the following code it works fine.
>
> "%dep
> z.reset();
> z.addRepo("my repo").url("http://myrepo; ).snapshot
> z.load("mygroup:myartifact:myversion");
>
> Has anyone managed to do that with Livy? Thanks!
>
> Cheers,
> Theo
>



-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



Re: Integrating with Airflow

2017-05-19 Thread Ben Vogan
Thanks for sharing this Ruslan - I will take a look.

I agree that paragraphs can form tasks within a DAG.  My point was that
ideally a DAG could encompass multiple notes.  I.e. the completion of one
note triggers another and so on to complete an entire chain of dependent
tasks.

For example team A has a note that generates data set A*.  Teams B & C each
have notes that depend on A* to generate B* & C* for their specific
purposes.  It doesn't make sense for all of that to have to live in one
note, but they are all part of a single workflow.

Best,
--Ben

On Fri, May 19, 2017 at 9:02 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
wrote:

> Thanks for sharing this Ben.
>
> I agree Zeppelin is a better fit with tighter integration with Spark and
> built-in visualizations.
>
> We have pretty much standardized on pySpark, so here's one of the scripts
> we use internally
> to extract %pyspark, %sql and %md paragraphs into a standalone script
> (that can be scheduled in Airflow for example)
> https://github.com/Tagar/stuff/blob/master/znote.py (patches are welcome
> :-)
>
> Hope this helps.
>
> ps. In my opinion adding dependencies between paragraphs wouldn't be that
> hard for simple cases,
> and could be a first step to defining a DAG in Zeppelin directly. It would be
> really awesome if we see this type of
> integration in the future.
>
> Otherwise I don't see much value if a whole note / whole workflow would run
> as a single task in Airflow.
> In my opinion, each paragraph has to be a task... then it'll be very
> useful.
>
>
> Thanks,
> Ruslan
>
>
> On Fri, May 19, 2017 at 4:55 PM, Ben Vogan <b...@shopkick.com> wrote:
>
>> I do not expect the relationship between DAGs to be described in Zeppelin
>> - that would be done in Airflow.  It just seems that Zeppelin is such a
>> great tool for a data scientists workflow that it would be nice if once
>> they are done with the work the note could be productionized directly.  I
>> could envision a couple of scenarios:
>>
>> 1. Using a zeppelin instance to run the note via the REST API.  The
>> instance could be containerized and spun up specifically for a DAG or it
>> could be a permanently available one.
>> 2. A note could be pulled from git and some part of the Zeppelin engine
>> could execute the note without the web UI at all.
>>
>> I would expect on the airflow side there to be some special operators for
>> executing these.
>>
>> If the scheduler is pluggable, then it should be possible to create a
>> plug-in that talks to the Airflow REST API.
>>
>> I happen to prefer Zeppelin to Jupyter - although I get your point about
>> both being python.  I don't really view that as a problem - most of the big
>> data platforms I'm talking to are implemented on the JVM after all.  The
>> python part of Airflow is really just describing what gets run and it isn't
>> hard to run something that isn't written in python.
>>
>> On Fri, May 19, 2017 at 2:52 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
>> wrote:
>>
>>> We also use both Zeppelin and Airflow.
>>>
>>> I'm interested in hearing what others are doing here too.
>>>
>>> Although honestly there might be some challenges
>>> - Airflow expects a DAG structure, while a notebook has pretty linear
>>> structure;
>>> - Airflow is Python-based; Zeppelin is all Java (REST API might be of
>>> help?).
>>> Jupyter+Airflow might be a more natural fit to integrate?
>>>
>>> On top of that, the way we use Zeppelin is a lot of ad-hoc queries,
>>> while Airflow is for more finalized workflows I guess?
>>>
>>> Thanks for bringing this up.
>>>
>>>
>>>
>>> --
>>> Ruslan Dautkhanov
>>>
>>> On Fri, May 19, 2017 at 2:20 PM, Ben Vogan <b...@shopkick.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We are really enjoying the workflow of interacting with our data via
>>>> Zeppelin, but are not sold on using the built in cron scheduling
>>>> capability.  We would like to be able to create more complex DAGs that are
>>>> better suited for something like Airflow.  I was curious as to whether
>>>> anyone has done an integration of Zeppelin with Airflow.
>>>>
>>>> Either directly from within Zeppelin, or from the Airflow side.
>>>>
>>>> Thanks,
>>>> --
>>>> *BENJAMIN VOGAN* | Data Platform Team Lead
>>>>
>

Re: Integrating with Airflow

2017-05-19 Thread Ben Vogan
I do not expect the relationship between DAGs to be described in Zeppelin -
that would be done in Airflow.  It just seems that Zeppelin is such a great
tool for a data scientist's workflow that it would be nice if, once they are
done with the work, the note could be productionized directly.  I could
envision a couple of scenarios:

1. Using a zeppelin instance to run the note via the REST API.  The
instance could be containerized and spun up specifically for a DAG or it
could be a permanently available one.
2. A note could be pulled from git and some part of the Zeppelin engine
could execute the note without the web UI at all.

I would expect on the airflow side there to be some special operators for
executing these.
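
A rough sketch of what such an operator might look like, assuming an Airflow
PythonOperator and Zeppelin's documented "run all paragraphs" REST call; the
server URL, note id, and dag object below are placeholders, not a tested
integration:

import requests
from airflow.operators.python_operator import PythonOperator

ZEPPELIN_URL = "http://zeppelin-host:8080"  # placeholder Zeppelin server
NOTE_ID = "2A94M5J1Z"                       # placeholder note id

def run_zeppelin_note():
    # POST /api/notebook/job/{noteId} asks Zeppelin to run every paragraph
    # in the note.  The call returns quickly, so a production operator would
    # poll the note's job status until the run completes.
    resp = requests.post("%s/api/notebook/job/%s" % (ZEPPELIN_URL, NOTE_ID))
    resp.raise_for_status()

run_note = PythonOperator(
    task_id="run_zeppelin_note",
    python_callable=run_zeppelin_note,
    dag=dag,  # assumes a DAG object defined elsewhere in the Airflow file
)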

If the scheduler is pluggable, then it should be possible to create a
plug-in that talks to the Airflow REST API.

I happen to prefer Zeppelin to Jupyter - although I get your point about
both being python.  I don't really view that as a problem - most of the big
data platforms I'm talking to are implemented on the JVM after all.  The
python part of Airflow is really just describing what gets run and it isn't
hard to run something that isn't written in python.

On Fri, May 19, 2017 at 2:52 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
wrote:

> We also use both Zeppelin and Airflow.
>
> I'm interested in hearing what others are doing here too.
>
> Although honestly there might be some challenges
> - Airflow expects a DAG structure, while a notebook has pretty linear
> structure;
> - Airflow is Python-based; Zeppelin is all Java (REST API might be of
> help?).
> Jupyter+Airflow might be a more natural fit to integrate?
>
> On top of that, the way we use Zeppelin is a lot of ad-hoc queries,
> while Airflow is for more finalized workflows I guess?
>
> Thanks for bringing this up.
>
>
>
> --
> Ruslan Dautkhanov
>
> On Fri, May 19, 2017 at 2:20 PM, Ben Vogan <b...@shopkick.com> wrote:
>
>> Hi all,
>>
>> We are really enjoying the workflow of interacting with our data via
>> Zeppelin, but are not sold on using the built in cron scheduling
>> capability.  We would like to be able to create more complex DAGs that are
>> better suited for something like Airflow.  I was curious as to whether
>> anyone has done an integration of Zeppelin with Airflow.
>>
>> Either directly from within Zeppelin, or from the Airflow side.
>>
>> Thanks,
>> --
>> *BENJAMIN VOGAN* | Data Platform Team Lead
>>
>>
>
>


-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



Re: Hive interpreter Error as soon as Hive query uses MapRed

2017-05-19 Thread Ben Vogan
I am running CDH 5.7 and Spark 1.6 as well, and Hive is working for me with
the following configuration:

Properties
name                                      value
common.max_count                          1000
default.driver                            org.apache.hive.jdbc.HiveDriver
default.password
default.url                               jdbc:hive2://hdfs004:1
default.user                              hive
zeppelin.interpreter.localRepo            /services/zeppelin/zeppelin-0.7.1/local-repo/2CECB8FBV
zeppelin.jdbc.auth.type
zeppelin.jdbc.concurrent.max_connection   10
zeppelin.jdbc.concurrent.use              true
zeppelin.jdbc.keytab.location
zeppelin.jdbc.principal

Dependencies
artifact                                  exclude
org.apache.hive:hive-jdbc:0.14.0
org.apache.hadoop:hadoop-common:2.6.0

I admit to not having spent time figuring out whether there are any edge
cases that are broken because I am using the open-source version of the
JDBC driver rather than the Cloudera jars.  However, it definitely returns
results from complex select queries and has no issues with any DDL statements
that I've tried.
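
If it helps, a quick smoke test for the MapReduce path specifically - a sketch
assuming the interpreter above is bound under the %jdbc prefix and that
some_table stands in for a real table; an aggregate forces a MapRedTask, which
is exactly the step failing for you:

%jdbc
select count(*) from some_table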

Good luck!
--Ben

On Fri, May 19, 2017 at 12:10 PM, Meier, Alexander <
alexander.me...@t-systems-dmc.com> wrote:

> Yes, the script (i.e. the select statement) runs fine in the Hive CLI, Hue,
> and also in Spark SQL (Spark SQL also in Zeppelin).
> Just not when using the hive interpreter in zeppelin.
>
>
>
> Sent from my iPhone
>
> Am 19.05.2017 um 19:35 schrieb Jongyoul Lee :
>
> Can you check your script works in native hive environment?
>
> On Fri, May 19, 2017 at 10:20 AM, Meier, Alexander <
> alexander.me...@t-systems-dmc.com> wrote:
>
>> Hi list
>>
>> I’m trying to get a Hive interpreter correctly running on a CDH 5.7
>> Cluster with Spark 1.6. Simple queries are running fine, but as soon as a
>> query needs a MapRed tasks in order to complete, the query fails with:
>>
>> java.sql.SQLException: Error while processing statement: FAILED:
>> Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec
>> .mr.MapRedTask
>> at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.
>> java:279)
>> at org.apache.commons.dbcp2.DelegatingStatement.execute(Delegat
>> ingStatement.java:291)
>> at org.apache.commons.dbcp2.DelegatingStatement.execute(Delegat
>> ingStatement.java:291)
>> at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInte
>> rpreter.java:580)
>> at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInter
>> preter.java:692)
>> at org.apache.zeppelin.interpreter.LazyOpenInterpreter.
>> interpret(LazyOpenInterpreter.java:95)
>> at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServ
>> er$InterpretJob.jobRun(RemoteInterpreterServer.java:490)
>> at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
>> at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOSchedu
>> ler.java:139)
>> at java.util.concurrent.Executors$RunnableAdapter.call(
>> Executors.java:471)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFu
>> tureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>> etc…
>>
>> I’ve got the interpreter set up as follows:
>>
>> Properties
>> name                            value
>> default.driver                  org.apache.hive.jdbc.HiveDriver
>> default.url                     jdbc:hive2://[hostname]:1
>> hive.driver                     org.apache.hive.jdbc.HiveDriver
>> hive.url                        jdbc:hive2://[hostname]:1
>> zeppelin.interpreter.localRepo  /opt/zeppelin/local-repo/2CJ4XM2Z4
>>
>> Dependencies
>> artifact
>> /opt/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc.jar
>> /opt/cloudera/parcels/CDH/lib/hive/lib/hive-service.jar
>> /opt/cloudera/parcels/CDH/lib/hadoop/client/hadoop-common.jar
>> /opt/cloudera/parcels/CDH/lib/hive/lib/hive-common.jar
>> /opt/cloudera/parcels/CDH/lib/hive/lib/hive-metastore.jar
>>
>>
>> Unfortunately I haven’t found any help googling around… anyone here with
>> some helpful input?
>>
>> Best regards and many thanks in advance,
>> Alex
>
>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>
>


-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



Re: Illegal Inheritance error

2017-05-15 Thread Ben Vogan
-jars/livy-core_2.10-0.3.0.jar
at spark://10.19.194.147:53267/jars/livy-core_2.10-0.3.0.jar with
timestamp 1494893058609
17/05/16 00:04:18 INFO cluster.YarnClusterScheduler: Created
YarnClusterScheduler
17/05/16 00:04:18 INFO util.Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port
57551.
17/05/16 00:04:18 INFO netty.NettyBlockTransferService: Server created on 57551
17/05/16 00:04:18 INFO storage.BlockManager: external shuffle service
port = 7337
17/05/16 00:04:18 INFO storage.BlockManagerMaster: Trying to register
BlockManager
17/05/16 00:04:18 INFO storage.BlockManagerMasterEndpoint: Registering
block manager 10.19.194.147:57551 with 1966.1 MB RAM,
BlockManagerId(driver, 10.19.194.147, 57551)
17/05/16 00:04:18 INFO storage.BlockManagerMaster: Registered BlockManager
17/05/16 00:04:19 INFO scheduler.EventLoggingListener: Logging events
to 
hdfs://jarvis-nameservice001/user/spark/applicationHistory/application_1494373289850_0336_1
17/05/16 00:04:19 INFO cluster.YarnClusterSchedulerBackend:
SchedulerBackend is ready for scheduling beginning after reached
minRegisteredResourcesRatio: 0.8
17/05/16 00:04:19 INFO cluster.YarnClusterScheduler:
YarnClusterScheduler.postStartHook done
17/05/16 00:04:19 INFO
cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster
registered as NettyRpcEndpointRef(spark://YarnAM@10.19.194.147:53267)
17/05/16 00:04:19 INFO yarn.YarnRMClient: Registering the ApplicationMaster
17/05/16 00:04:19 INFO yarn.ApplicationMaster: Started progress
reporter thread with (heartbeat : 3000, initial allocation : 200)
intervals
17/05/16 00:04:19 INFO hive.HiveContext: Initializing execution hive,
version 1.1.0
17/05/16 00:04:19 INFO client.ClientWrapper: Inspected Hadoop version:
2.6.0-cdh5.7.0
17/05/16 00:04:19 INFO client.ClientWrapper: Loaded
org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version
2.6.0-cdh5.7.0
17/05/16 00:04:20 INFO hive.metastore: Trying to connect to metastore
with URI thrift://jarvis-hdfs003.internal.shopkick.com:9083
17/05/16 00:04:20 INFO hive.metastore: Opened a connection to
metastore, current connections: 1
17/05/16 00:04:20 INFO hive.metastore: Connected to metastore.
17/05/16 00:04:20 INFO session.SessionState: Created HDFS directory:
file:/yarn/nm/usercache/hdfs/appcache/application_1494373289850_0336/container_e14_1494373289850_0336_01_01/tmp/spark-2217d267-a3c0-4cf4-9565-45f80517d41c/scratch/hdfs
17/05/16 00:04:20 INFO session.SessionState: Created local directory:
/yarn/nm/usercache/hdfs/appcache/application_1494373289850_0336/container_e14_1494373289850_0336_01_01/tmp/yarn
17/05/16 00:04:20 INFO session.SessionState: Created local directory:
/yarn/nm/usercache/hdfs/appcache/application_1494373289850_0336/container_e14_1494373289850_0336_01_01/tmp/478f39e9-5295-4e8e-97aa-40b5828f9440_resources
17/05/16 00:04:20 INFO session.SessionState: Created HDFS directory:
file:/yarn/nm/usercache/hdfs/appcache/application_1494373289850_0336/container_e14_1494373289850_0336_01_01/tmp/spark-2217d267-a3c0-4cf4-9565-45f80517d41c/scratch/hdfs/478f39e9-5295-4e8e-97aa-40b5828f9440
17/05/16 00:04:20 INFO session.SessionState: Created local directory:
/yarn/nm/usercache/hdfs/appcache/application_1494373289850_0336/container_e14_1494373289850_0336_01_01/tmp/yarn/478f39e9-5295-4e8e-97aa-40b5828f9440
17/05/16 00:04:20 INFO session.SessionState: Created HDFS directory:
file:/yarn/nm/usercache/hdfs/appcache/application_1494373289850_0336/container_e14_1494373289850_0336_01_01/tmp/spark-2217d267-a3c0-4cf4-9565-45f80517d41c/scratch/hdfs/478f39e9-5295-4e8e-97aa-40b5828f9440/_tmp_space.db
17/05/16 00:04:20 INFO session.SessionState: No Tez session required
at this point. hive.execution.engine=mr.
17/05/16 00:04:20 INFO repl.SparkInterpreter: Created sql context
(with Hive support).


On Mon, May 15, 2017 at 5:43 PM, Jeff Zhang <zjf...@gmail.com> wrote:

>
> Which version of zeppelin do you use ? And can you check the yarn app log ?
>
>
> Ben Vogan <b...@shopkick.com>于2017年5月15日周一 下午5:56写道:
>
>> Hi all,
>>
>> For some reason today I'm getting a stack:
>>
>> org.apache.zeppelin.livy.LivyException: Fail to create
>> SQLContext,:4: error: illegal inheritance;
>> at org.apache.zeppelin.livy.LivySparkSQLInterpreter.open(
>> LivySparkSQLInterpreter.java:76)
>> at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(
>> LazyOpenInterpreter.java:70)
>> at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$
>> InterpretJob.jobRun(RemoteInterpreterServer.java:483)
>> at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
>> at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(
>> FIFOScheduler.java:139)
>> at java.util.concurrent.Executors$RunnableAdapter.
>> call(Executors.java:511)
>> at java.util.concurrent.FutureTask.run(FutureTas

Re: Illegal Inheritance error

2017-05-15 Thread Ben Vogan
Hi all,

For some reason today I'm getting a stack trace:

org.apache.zeppelin.livy.LivyException: Fail to create
SQLContext,:4: error: illegal inheritance;
at
org.apache.zeppelin.livy.LivySparkSQLInterpreter.open(LivySparkSQLInterpreter.java:76)
at
org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
at
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:483)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

On the Livy server I see no errors and there is an open session on yarn.

Some help on this would be greatly appreciated!

--Ben

On Sun, May 14, 2017 at 6:16 AM, Ben Vogan <b...@shopkick.com> wrote:

> Hi all,
>
> I've been using Zeppelin for a couple of weeks now with a stable
> configuration, but all of a sudden I am getting "Illegal inheritance"
> errors like so:
>
>  INFO [2017-05-14 03:25:32,678] ({pool-2-thread-56}
> Paragraph.java[jobRun]:362) - run paragraph 20170514-032326_663206142 using
> livy org.apache.zeppelin.interpreter.LazyOpenInterpreter@505a171c
>  WARN [2017-05-14 03:25:33,696] ({pool-2-thread-56} 
> NotebookServer.java[afterStatusChange]:2058)
> - Job 20170514-032326_663206142 is finished, status: ERROR, exception:
> null, result: %text :4: error: illegal inheritance;
>
> It happens across multiple notebooks and across both my spark and livy
> interpreters.  I don't know where to look for more information about what
> is wrong.  I don't see any errors in spark/yarn at all.  The driver got
> created, but it looks like no jobs were ever submitted to spark.
>
> Help would be greatly appreciated.
>
> Thanks,
>
> --
> *BENJAMIN VOGAN* | Data Platform Team Lead
>
>



-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



ZeppelinContext textbox for passwords

2017-05-09 Thread Ben Vogan
Hi there,

Is it possible to create a textbox for accepting passwords via the
ZeppelinContext (i.e. one that masks input)?  I do not see any way to do
so, but I hope I'm missing something.
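
For context, the closest I've found is the plain dynamic-form textbox, which
echoes whatever is typed - a sketch assuming the standard ZeppelinContext
input API, with "password" being just the form label:

%pyspark
secret = z.input("password", "")
print("length: %d" % len(secret))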

Thanks,

-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



org.apache.spark.SparkException: Could not parse Master URL: 'yarn'

2017-04-12 Thread Ben Vogan
Hello all,

I am trying to install Zeppelin 0.7.1 on my CDH 5.7 Cluster.  I have been
following the instructions here:

https://zeppelin.apache.org/docs/0.7.1/install/install.html
https://zeppelin.apache.org/docs/0.7.1/install/configuration.html
https://zeppelin.apache.org/docs/0.7.1/interpreter/spark.html

I copied the zeppelin-env.sh.template into zeppelin-env.sh and made the
following changes:
export JAVA_HOME=/usr/java/latest
export MASTER=yarn-client

export ZEPPELIN_LOG_DIR=/var/log/services/zeppelin
export ZEPPELIN_PID_DIR=/services/zeppelin/data
export ZEPPELIN_WAR_TEMPDIR=/services/zeppelin/data/jetty_tmp
export ZEPPELIN_NOTEBOOK_DIR=/services/zeppelin/data/notebooks
export ZEPPELIN_NOTEBOOK_PUBLIC=true

export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export HADOOP_CONF_DIR=/etc/spark/conf/yarn-conf
export PYSPARK_PYTHON=/usr/lib/python

I then start Zeppelin and hit the UI in my browser and create a spark note:

%spark
sqlContext.sql("select 1+1").collect().foreach(println)

And I get this error:

org.apache.spark.SparkException: Could not parse Master URL: 'yarn'
at
org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2746)
at org.apache.spark.SparkContext.(SparkContext.scala:533)
at
org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_1(SparkInterpreter.java:484)
at
org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:382)
at
org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
at
org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:828)
at
org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
at
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:483)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I specified "yarn-client" as indicated by the instructions, so I'm not sure
where it is getting "yarn" from.  In my spark-defaults.conf,
spark.master=yarn-client is set as well.

Help would be greatly appreciated.

Thanks,
-- 
*BENJAMIN VOGAN* | Data Platform Team Lead