Python interpreter getting hung

2018-10-08 Thread Ben Vogan
Hi there,

We are using Zeppelin in a shared environment and are having persistent
problems with the Python interpreter getting into a state where paragraphs
are PENDING forever.  I have looked at the zeppelin-interpreter-python*.log
files and there is virtually nothing in them - just starting/finished
events.  Is there any way to see a log of errors from Python scripts?
Any suggestions on how to debug this?

Help is greatly appreciated!

Thanks,
-- 
*BENJAMIN VOGAN* | Director of Architecture




Re: Zeppelin Stops Loading Notes

2017-08-19 Thread Ben Vogan
I have seen Zeppelin get into this state once.  I restarted it without
investigating the logs, however, so I don't have anything useful to go on
as to why.

--Ben

On Sat, Aug 19, 2017 at 8:17 AM, Paul Brenner wrote:

> You were correct. We had "export ZEPPELIN_SSL_PORT=false" in our
> zeppelin-env.sh. I'm going to comment that out. I suspect it is actually
> unrelated to the behavior we are seeing where pages stop loading though.
> Anyone else see this happen?
>
> I’ll report back if that happens again after the fix.
>
>
> Paul Brenner
> DATA SCIENTIST
>
> On Fri, Aug 18, 2017 at 6:37 PM moon soo Lee wrote:
>
>> Hi,
>>
>> One of the configuration values in your conf/zeppelin-env.sh or
>> conf/zeppelin-site.xml seems to be "false" where a number is expected.
>>
>> Do you have any environment variable or property set to "false" for the
>> configurations below?
>>
>> ZEPPELIN_PORT, zeppelin.server.port
>> ZEPPELIN_SSL_PORT, zeppelin.server.ssl.port
>> ZEPPELIN_INTERPRETER_CONNECT_TIMEOUT, zeppelin.interpreter.connect.timeout
>> ZEPPELIN_INTERPRETER_MAX_POOL_SIZE, zeppelin.interpreter.max.poolsize
>> ZEPPELIN_INTERPRETER_OUTPUT_LIMIT, zeppelin.interpreter.output.limit
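>>
>> For example, in zeppelin-env.sh each of these should be a number; the
>> values below are only illustrative:
>>
>> export ZEPPELIN_PORT=8080
>> export ZEPPELIN_SSL_PORT=8443
>> export ZEPPELIN_INTERPRETER_CONNECT_TIMEOUT=30000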
>>
>> Thanks,
>> moon
>>
>> On Fri, Aug 18, 2017 at 2:30 PM Paul Brenner wrote:
>>
>>> We have a team of 5 users who all use the same zeppelin server. Lately a
>>> few times we have run into a case where zeppelin notes stop responding and
>>> then when we try refreshing the webpage for the note all that loads is the
>>> zeppelin header with no note. When I look at the logs I see:
>>>  INFO [2017-08-18 21:23:06,569] ({qtp1286783232-14114}
>>> NotebookServer.java[sendNote]:705) - New operation from 10.201.12.26 :
>>> 55178 : nshah : GET_NOTE : 2CR2ANDEX
>>>  INFO [2017-08-18 21:24:05,740] ({qtp1286783232-14115}
>>> NotebookServer.java[onClose]:363) - Closed connection to 10.201.12.22 :
>>> 57366. (1001) Idle Timeout
>>>  INFO [2017-08-18 21:24:08,084] ({qtp1286783232-14121}
>>> NotebookServer.java[onClose]:363) - Closed connection to 10.201.12.22 :
>>> 57461. (1001) Idle Timeout
>>>  INFO [2017-08-18 21:25:10,133] ({qtp1286783232-14122}
>>> AuthorizingRealm.java[getAuthorizationCacheLazy]:248) - No cache or
>>> cacheManager properties have been set.  Authorization cache cannot be
>>> obtained.
>>>  INFO [2017-08-18 21:25:10,157] ({qtp1286783232-14122}
>>> AuthorizingRealm.java[getAuthorizationCacheLazy]:248) - No cache or
>>> cacheManager properties have been set.  Authorization cache cannot be
>>> obtained.
>>>  INFO [2017-08-18 21:25:10,172] 

Re: Showing pandas dataframe with utf8 strings

2017-07-11 Thread Ben Vogan
Here is the specific example that is failing:

import pandas
z.show(pandas.DataFrame([u'Jalape\xf1os.'],[1],['Menu']))
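
The same failure reproduces in plain Python 2 without Zeppelin, and
pre-encoding the cells works around it.  A minimal sketch, assuming z is the
usual ZeppelinContext and Python 2.7 (df and cell are just illustrative names):

import pandas

cell = u'Jalape\xf1os.'
# str() implicitly encodes unicode with the ascii codec, so this raises the
# same UnicodeEncodeError that Zeppelin's body_buf.write(str(cell)) hits.
try:
    str(cell)
except UnicodeEncodeError as e:
    print(e)

# Workaround: encode every unicode cell to UTF-8 before calling z.show.
df = pandas.DataFrame([cell], [1], ['Menu'])
z.show(df.applymap(lambda c: c.encode('utf-8') if isinstance(c, unicode) else c))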

On Tue, Jul 11, 2017 at 2:32 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
wrote:

> Hi Ben,
>
> I can't reproduce this
>
> from pyspark.sql.types import *
>> rdd = sc.parallelize([[u'El Niño']])
>> df = sqlc.createDataFrame(
>>   rdd, schema=StructType([StructField("unicode data",
>> StringType(), True)])
>> )
>> df.show()
>> z.show(df)
>
>
> shows unicode character fine.
>
>
>
> --
> Ruslan Dautkhanov
>
> On Tue, Jul 11, 2017 at 11:37 AM, Ben Vogan <b...@shopkick.com> wrote:
>
>> Hi Ruslan,
>>
>> I tried adding:
>>
>>  export LC_ALL="en_US.utf8"
>>
>> To my zeppelin-env.sh script and restarted Zeppelin, but I still have the
>> same problem.  The print statement:
>>
>> python -c "print (u'\xf1')"
>>
>> works from the note.  I think the problem is the use of the str
>> function.  Looking at the stack you can see that the zeppelin code is
>> calling body_buf.write(str(cell)).  If you call str(u'\xf1') you will get
>> the error.
>>
>> --Ben
>>
>> On Tue, Jul 11, 2017 at 10:19 AM, Ruslan Dautkhanov <dautkha...@gmail.com
>> > wrote:
>>
>>> $ env | grep LC
>>>> $
>>>> $ python -c "print (u'\xf1')"
>>>> ñ
>>>>
>>>
>>>
>>>> $ export LC_ALL="C"
>>>> $ python -c "print (u'\xf1')"
>>>> Traceback (most recent call last):
>>>>   File "", line 1, in 
>>>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in
>>>> position 0: ordinal not in range(128)
>>>>
>>>
>>>
>>>> $ export LC_ALL="en_US.utf8"
>>>> $ python -c "print (u'\xf1')"
>>>> ñ
>>>>
>>>
>>>
>>>> $ unset LC_ALL
>>>> $ env | grep LC
>>>> $
>>>> $ python -c "print (u'El Ni\xf1o')"
>>>> El Niño
>>>
>>>
>>> You could add LC_ALL export to your zeppelin-env.sh script.
>>>
>>>
>>>
>>> --
>>> Ruslan Dautkhanov
>>>
>>> On Tue, Jul 11, 2017 at 9:35 AM, Ben Vogan <b...@shopkick.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am trying to use the zeppelin context to show the contents of a
>>>> pandas DataFrame and getting the following error:
>>>>
>>>> Traceback (most recent call last):
>>>>   File "/tmp/zeppelin_python-7554503996532642522.py", line 278, in
>>>> 
>>>> raise Exception(traceback.format_exc())
>>>> Exception: Traceback (most recent call last):
>>>>   File "/tmp/zeppelin_python-7554503996532642522.py", line 271, in
>>>> 
>>>> exec(code)
>>>>   File "", line 2, in 
>>>>   File "/tmp/zeppelin_python-7554503996532642522.py", line 93, in show
>>>> self.show_dataframe(p, **kwargs)
>>>>   File "/tmp/zeppelin_python-7554503996532642522.py", line 121, in
>>>> show_dataframe
>>>> body_buf.write(str(cell))
>>>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in
>>>> position 79: ordinal not in range(128)
>>>>
>>>> How do I go about resolving this?
>>>>
>>>> I'm running version 0.7.1 with python 2.7.
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>> *BENJAMIN VOGAN* | Data Platform Team Lead
>>>>
>>>>
>>>
>>>
>>
>>
>> --
>> *BENJAMIN VOGAN* | Data Platform Team Lead
>>
>>
>
>


-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



Showing pandas dataframe with utf8 strings

2017-07-11 Thread Ben Vogan
Hi all,

I am trying to use the zeppelin context to show the contents of a pandas
DataFrame and getting the following error:

Traceback (most recent call last):
  File "/tmp/zeppelin_python-7554503996532642522.py", line 278, in 
raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/tmp/zeppelin_python-7554503996532642522.py", line 271, in 
exec(code)
  File "", line 2, in 
  File "/tmp/zeppelin_python-7554503996532642522.py", line 93, in show
self.show_dataframe(p, **kwargs)
  File "/tmp/zeppelin_python-7554503996532642522.py", line 121, in
show_dataframe
body_buf.write(str(cell))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in
position 79: ordinal not in range(128)

How do I go about resolving this?

I'm running version 0.7.1 with python 2.7.

Thanks,

-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



Re: Centos 7 Compatibility

2017-06-21 Thread Ben Vogan
I've been running Zeppelin 0.7.1 and, no, I didn't have to make any
non-standard configuration changes that I recall.  I was very pleased with
how easy it was to get up and running.

--Ben

On Wed, Jun 21, 2017 at 1:43 PM, Jim Lola <jim.l...@gmail.com> wrote:

> Which version of Zeppelin do you have working on CentOS 7.2?  Did you make
> any different/non-standard configuration changes to get it to work
> properly?  If so, could you please share them.
>
> On Wed, Jun 21, 2017 at 12:30 PM, Ben Vogan <b...@shopkick.com> wrote:
>
>> I have been running Zeppelin on CentOS 7.2 for the last couple of months
>> without issue.
>>
>> --Ben
>>
>> On Wed, Jun 21, 2017 at 12:37 PM, Jim Lola <jim.l...@gmail.com> wrote:
>>
>>> The beauty of Open Source, like Apache Zeppelin, is that you can try SW
>>> on new OS's.
>>>
>>> Per the Apache Zeppelin documentation, CentOS 6 is supported.  CentOS 7
>>> is NOT mentioned.
>>>
>>> There is actually a very large difference in Linux OS kernels between
>>> CentOS 6 and CentOS 7.  CentOS 6 is based on Linux kernel version
>>> 2.6.32-71 while CentOS 7 is based on Linux kernel version 3.10.0-123.  The
>>> default file system is different, as are the run levels.  CentOS 7's init
>>> system is now systemd, so init is being replaced/updated.  There are many
>>> more changes from CentOS 6 to CentOS 7.
>>>
>>> It sounds like a good opportunity to get involved w/ future development
>>> of Apache Zeppelin.
>>>
>>>
>>>
>>> On Wed, Jun 21, 2017 at 11:10 AM, Benjamin Kim <bbuil...@gmail.com>
>>> wrote:
>>>
>>>> All,
>>>>
>>>> I’m curious to know if Zeppelin will work with CentOS 7. I don’t see it
>>>> in the list of OS’s supported.
>>>>
>>>> Thanks,
>>>> Ben
>>>
>>>
>>>
>>
>>
>> --
>> *BENJAMIN VOGAN* | Data Platform Team Lead
>>
>>
>
>


-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



Re: Centos 7 Compatibility

2017-06-21 Thread Ben Vogan
I have been running Zeppelin on CentOS 7.2 for the last couple of months
without issue.

--Ben

On Wed, Jun 21, 2017 at 12:37 PM, Jim Lola  wrote:

> The beauty of Open Source, like Apache Zeppelin, is that you can try SW on
> new OS's.
>
> Per the Apache Zeppelin documentation, CentOS 6 is supported.  CentOS 7 is
> NOT mentioned.
>
> There is actually a very large difference in Linux OS kernels between
> CentOS 6 and CentOS 7.  CentOS 6 is based on Linux kernel version
> 2.6.32-71 while CentOS 7 is based on Linux kernel version 3.10.0-123.  The
> default file system is different, as are the run levels.  CentOS 7's init
> system is now systemd, so init is being replaced/updated.  There are many
> more changes from CentOS 6 to CentOS 7.
>
> It sounds like a good opportunity to get involved w/ future development of
> Apache Zeppelin.
>
>
>
> On Wed, Jun 21, 2017 at 11:10 AM, Benjamin Kim  wrote:
>
>> All,
>>
>> I’m curious to know if Zeppelin will work with CentOS 7. I don’t see it
>> in the list of OS’s supported.
>>
>> Thanks,
>> Ben
>
>
>


-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



Re: Livy - add external libraries from additional maven repo

2017-05-30 Thread Ben Vogan
For what it's worth, I have successfully added jar files and maven packages
to sessions using Zeppelin & Livy 0.3 - although not using %dep.  In the
interpreter settings I set the livy.spark.jars setting for jars that are on
my HDFS cluster, and livy.spark.jars.packages for maven packages - although
only using Maven Central and not a local repo.
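
Concretely, the two interpreter properties look something like this (a sketch
with hypothetical values; the HDFS path and maven coordinates below are
placeholders, not from my actual setup):

livy.spark.jars            hdfs:///user/zeppelin/jars/my-udfs.jar
livy.spark.jars.packages   com.databricks:spark-csv_2.10:1.5.0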

--Ben

On Tue, May 30, 2017 at 12:36 PM, Felix Cheung 
wrote:

> To add, this might be an issue with Livy.
>
> I'm seeing something similar as well.
>
> If you can get a repro by calling the Livy REST API directly, it will be
> worthwhile to follow up with the Livy community separately.
>
>
> --
> *From:* Felix Cheung 
> *Sent:* Tuesday, May 30, 2017 11:34:31 AM
> *To:* users@zeppelin.apache.org; users@zeppelin.apache.org
> *Subject:* Re: Livy - add external libraries from additional maven repo
>
> if I recall, %dep only works with the built in Spark interpreter and not
> the Livy interpreter.
>
> To manage dependencies with Livy you will need to set Spark conf with Livy.
>
> --
> *From:* Theofilos Kakantousis 
> *Sent:* Tuesday, May 30, 2017 9:05:15 AM
> *To:* users@zeppelin.apache.org
> *Subject:* Livy - add external libraries from additional maven repo
>
> Hi everyone,
>
> I'm using Zeppelin with Livy 0.4 and trying to add external libraries from
> an additional maven repo to my application according to the documentation
> available here.
> The example works fine, but when I set the livy.spark.jars.packages to my
> library the interpreter throws an unresolved dependency error.
>
> I have added the additional maven repository in the interpreter settings
> and have also tried setting livy.spark.jars.ivy but without luck. However,
> if I use the Spark interpreter with the following code it works fine.
>
> "%dep
> z.reset();
> z.addRepo("my repo").url("http://myrepo; ).snapshot
> z.load("mygroup:myartifact:myversion");
>
> Has anyone managed to do that with Livy? Thanks!
>
> Cheers,
> Theo
>



-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



Re: Integrating with Airflow

2017-05-19 Thread Ben Vogan
Thanks for sharing this Ruslan - I will take a look.

I agree that paragraphs can form tasks within a DAG.  My point was that
ideally a DAG could encompass multiple notes.  I.e. the completion of one
note triggers another and so on to complete an entire chain of dependent
tasks.

For example team A has a note that generates data set A*.  Teams B & C each
have notes that depend on A* to generate B* & C* for their specific
purposes.  It doesn't make sense for all of that to have to live in one
note, but they are all part of a single workflow.

Best,
--Ben

On Fri, May 19, 2017 at 9:02 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
wrote:

> Thanks for sharing this Ben.
>
> I agree Zeppelin is a better fit with tighter integration with Spark and
> built-in visualizations.
>
> We have pretty much standardized on pySpark, so here's one of the scripts
> we use internally
> to extract %pyspark, %sql and %md paragraphs into a standalone script
> (that can be scheduled in Airflow for example)
> https://github.com/Tagar/stuff/blob/master/znote.py (patches are welcome
> :-)
>
> Hope this helps.
>
> ps. In my opinion adding dependencies between paragraphs wouldn't be that
> hard for simple cases,
> and could be a first step to defining a DAG in Zeppelin directly. It would be
> really awesome if we see this type of
> integration in the future.
>
> Otherwise I don't see much value if a whole note / whole workflow would run
> as a single task in Airflow.
> In my opinion, each paragraph has to be a task... then it'll be very
> useful.
>
>
> Thanks,
> Ruslan
>
>
> On Fri, May 19, 2017 at 4:55 PM, Ben Vogan <b...@shopkick.com> wrote:
>
>> I do not expect the relationship between DAGs to be described in Zeppelin
>> - that would be done in Airflow.  It just seems that Zeppelin is such a
>> great tool for a data scientists workflow that it would be nice if once
>> they are done with the work the note could be productionized directly.  I
>> could envision a couple of scenarios:
>>
>> 1. Using a zeppelin instance to run the note via the REST API.  The
>> instance could be containerized and spun up specifically for a DAG or it
>> could be a permanently available one.
>> 2. A note could be pulled from git and some part of the Zeppelin engine
>> could execute the note without the web UI at all.
>>
>> I would expect on the airflow side there to be some special operators for
>> executing these.
>>
>> If the scheduler is pluggable, then it should be possible to create a
>> plug-in that talks to the Airflow REST API.
>>
>> I happen to prefer Zeppelin to Jupyter - although I get your point about
>> both being python.  I don't really view that as a problem - most of the big
>> data platforms I'm talking to are implemented on the JVM after all.  The
>> python part of Airflow is really just describing what gets run and it isn't
>> hard to run something that isn't written in python.
>>
>> On Fri, May 19, 2017 at 2:52 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
>> wrote:
>>
>>> We also use both Zeppelin and Airflow.
>>>
>>> I'm interested in hearing what others are doing here too.
>>>
>>> Although honestly there might be some challenges
>>> - Airflow expects a DAG structure, while a notebook has pretty linear
>>> structure;
>>> - Airflow is Python-based; Zeppelin is all Java (REST API might be of
>>> help?).
>>> Jupyter+Airflow might be a more natural fit to integrate?
>>>
>>> On top of that, the way we use Zeppelin is a lot of ad-hoc queries,
>>> while Airflow is for more finalized workflows I guess?
>>>
>>> Thanks for bringing this up.
>>>
>>>
>>>
>>> --
>>> Ruslan Dautkhanov
>>>
>>> On Fri, May 19, 2017 at 2:20 PM, Ben Vogan <b...@shopkick.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We are really enjoying the workflow of interacting with our data via
>>>> Zeppelin, but are not sold on using the built in cron scheduling
>>>> capability.  We would like to be able to create more complex DAGs that are
>>>> better suited for something like Airflow.  I was curious as to whether
>>>> anyone has done an integration of Zeppelin with Airflow.
>>>>
>>>> Either directly from within Zeppelin, or from the Airflow side.
>>>>
>>>> Thanks,
>>>> --
>>>> *BENJAMIN VOGAN* | Data Platform Team Lead
>>>>
>

Re: Integrating with Airflow

2017-05-19 Thread Ben Vogan
I do not expect the relationship between DAGs to be described in Zeppelin -
that would be done in Airflow.  It just seems that Zeppelin is such a great
tool for a data scientist's workflow that it would be nice if, once they are
done with the work, the note could be productionized directly.  I could
envision a couple of scenarios:

1. Using a zeppelin instance to run the note via the REST API.  The
instance could be containerized and spun up specifically for a DAG or it
could be a permanently available one.
2. A note could be pulled from git and some part of the Zeppelin engine
could execute the note without the web UI at all.

I would expect on the airflow side there to be some special operators for
executing these.
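
A rough sketch of what such an operator might look like, assuming an Airflow
PythonOperator and Zeppelin's documented "run all paragraphs" REST call; the
server URL, note id, and dag object below are placeholders, not a tested
integration:

import requests
from airflow.operators.python_operator import PythonOperator

ZEPPELIN_URL = "http://zeppelin-host:8080"  # placeholder Zeppelin server
NOTE_ID = "2A94M5J1Z"                       # placeholder note id

def run_zeppelin_note():
    # POST /api/notebook/job/{noteId} asks Zeppelin to run every paragraph
    # in the note.  The call returns quickly, so a production operator would
    # poll the note's job status until the run completes.
    resp = requests.post("%s/api/notebook/job/%s" % (ZEPPELIN_URL, NOTE_ID))
    resp.raise_for_status()

run_note = PythonOperator(
    task_id="run_zeppelin_note",
    python_callable=run_zeppelin_note,
    dag=dag,  # assumes a DAG object defined elsewhere in the Airflow file
)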

If the scheduler is pluggable, then it should be possible to create a
plug-in that talks to the Airflow REST API.

I happen to prefer Zeppelin to Jupyter - although I get your point about
both being python.  I don't really view that as a problem - most of the big
data platforms I'm talking to are implemented on the JVM after all.  The
python part of Airflow is really just describing what gets run and it isn't
hard to run something that isn't written in python.

On Fri, May 19, 2017 at 2:52 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
wrote:

> We also use both Zeppelin and Airflow.
>
> I'm interested in hearing what others are doing here too.
>
> Although honestly there might be some challenges
> - Airflow expects a DAG structure, while a notebook has pretty linear
> structure;
> - Airflow is Python-based; Zeppelin is all Java (REST API might be of
> help?).
> Jupyter+Airflow might be a more natural fit to integrate?
>
> On top of that, the way we use Zeppelin is a lot of ad-hoc queries,
> while Airflow is for more finalized workflows I guess?
>
> Thanks for bringing this up.
>
>
>
> --
> Ruslan Dautkhanov
>
> On Fri, May 19, 2017 at 2:20 PM, Ben Vogan <b...@shopkick.com> wrote:
>
>> Hi all,
>>
>> We are really enjoying the workflow of interacting with our data via
>> Zeppelin, but are not sold on using the built in cron scheduling
>> capability.  We would like to be able to create more complex DAGs that are
>> better suited for something like Airflow.  I was curious as to whether
>> anyone has done an integration of Zeppelin with Airflow.
>>
>> Either directly from within Zeppelin, or from the Airflow side.
>>
>> Thanks,
>> --
>> *BENJAMIN VOGAN* | Data Platform Team Lead
>>
>>
>
>


-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



Re: Hive interpreter Error as soon as Hive query uses MapRed

2017-05-19 Thread Ben Vogan
I am running CDH 5.7 and Spark 1.6 as well, and Hive is working for me with
the following configuration:

Properties
name                                      value
common.max_count                          1000
default.driver                            org.apache.hive.jdbc.HiveDriver
default.password
default.url                               jdbc:hive2://hdfs004:1
default.user                              hive
zeppelin.interpreter.localRepo            /services/zeppelin/zeppelin-0.7.1/local-repo/2CECB8FBV
zeppelin.jdbc.auth.type
zeppelin.jdbc.concurrent.max_connection   10
zeppelin.jdbc.concurrent.use              true
zeppelin.jdbc.keytab.location
zeppelin.jdbc.principal

Dependencies
artifact                                  exclude
org.apache.hive:hive-jdbc:0.14.0
org.apache.hadoop:hadoop-common:2.6.0

I admit to not having spent time figuring out whether there are any edge
cases that are broken because I am using the open-source version of the
JDBC driver rather than the Cloudera jars.  However, it definitely returns
results from complex select queries and has no issues with any DDL statements
that I've tried.
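
If it helps, a quick smoke test for the MapReduce path specifically - a sketch
assuming the interpreter above is bound under the %jdbc prefix and that
some_table stands in for a real table; an aggregate forces a MapRedTask, which
is exactly the step failing for you:

%jdbc
select count(*) from some_table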

Good luck!
--Ben

On Fri, May 19, 2017 at 12:10 PM, Meier, Alexander <
alexander.me...@t-systems-dmc.com> wrote:

> Yes, the script (i.e. the select statement) runs fine in the Hive CLI, Hue,
> and also in Spark SQL (Spark SQL also in Zeppelin).
> Just not when using the hive interpreter in zeppelin.
>
>
>
> Sent from my iPhone
>
> Am 19.05.2017 um 19:35 schrieb Jongyoul Lee :
>
> Can you check your script works in native hive environment?
>
> On Fri, May 19, 2017 at 10:20 AM, Meier, Alexander <
> alexander.me...@t-systems-dmc.com> wrote:
>
>> Hi list
>>
>> I’m trying to get a Hive interpreter correctly running on a CDH 5.7
>> Cluster with Spark 1.6. Simple queries are running fine, but as soon as a
>> query needs a MapRed tasks in order to complete, the query fails with:
>>
>> java.sql.SQLException: Error while processing statement: FAILED:
>> Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec
>> .mr.MapRedTask
>> at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.
>> java:279)
>> at org.apache.commons.dbcp2.DelegatingStatement.execute(Delegat
>> ingStatement.java:291)
>> at org.apache.commons.dbcp2.DelegatingStatement.execute(Delegat
>> ingStatement.java:291)
>> at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInte
>> rpreter.java:580)
>> at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInter
>> preter.java:692)
>> at org.apache.zeppelin.interpreter.LazyOpenInterpreter.
>> interpret(LazyOpenInterpreter.java:95)
>> at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServ
>> er$InterpretJob.jobRun(RemoteInterpreterServer.java:490)
>> at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
>> at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOSchedu
>> ler.java:139)
>> at java.util.concurrent.Executors$RunnableAdapter.call(
>> Executors.java:471)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFu
>> tureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>> etc…
>>
>> I’ve got the interpreter set up as follows:
>>
>> Properties
>> name                            value
>> default.driver                  org.apache.hive.jdbc.HiveDriver
>> default.url                     jdbc:hive2://[hostname]:1
>> hive.driver                     org.apache.hive.jdbc.HiveDriver
>> hive.url                        jdbc:hive2://[hostname]:1
>> zeppelin.interpreter.localRepo  /opt/zeppelin/local-repo/2CJ4XM2Z4
>>
>> Dependencies
>> artifact
>> /opt/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc.jar
>> /opt/cloudera/parcels/CDH/lib/hive/lib/hive-service.jar
>> /opt/cloudera/parcels/CDH/lib/hadoop/client/hadoop-common.jar
>> /opt/cloudera/parcels/CDH/lib/hive/lib/hive-common.jar
>> /opt/cloudera/parcels/CDH/lib/hive/lib/hive-metastore.jar
>>
>>
>> Unfortunately I haven’t found any help googling around… anyone here with
>> some helpful input?
>>
>> Best regards and many thanks in advance,
>> Alex
>
>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>
>


-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



Re: Illegal Inheritance error

2017-05-15 Thread Ben Vogan
-jars/livy-core_2.10-0.3.0.jar
at spark://10.19.194.147:53267/jars/livy-core_2.10-0.3.0.jar with
timestamp 1494893058609
17/05/16 00:04:18 INFO cluster.YarnClusterScheduler: Created
YarnClusterScheduler
17/05/16 00:04:18 INFO util.Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port
57551.
17/05/16 00:04:18 INFO netty.NettyBlockTransferService: Server created on 57551
17/05/16 00:04:18 INFO storage.BlockManager: external shuffle service
port = 7337
17/05/16 00:04:18 INFO storage.BlockManagerMaster: Trying to register
BlockManager
17/05/16 00:04:18 INFO storage.BlockManagerMasterEndpoint: Registering
block manager 10.19.194.147:57551 with 1966.1 MB RAM,
BlockManagerId(driver, 10.19.194.147, 57551)
17/05/16 00:04:18 INFO storage.BlockManagerMaster: Registered BlockManager
17/05/16 00:04:19 INFO scheduler.EventLoggingListener: Logging events
to 
hdfs://jarvis-nameservice001/user/spark/applicationHistory/application_1494373289850_0336_1
17/05/16 00:04:19 INFO cluster.YarnClusterSchedulerBackend:
SchedulerBackend is ready for scheduling beginning after reached
minRegisteredResourcesRatio: 0.8
17/05/16 00:04:19 INFO cluster.YarnClusterScheduler:
YarnClusterScheduler.postStartHook done
17/05/16 00:04:19 INFO
cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster
registered as NettyRpcEndpointRef(spark://YarnAM@10.19.194.147:53267)
17/05/16 00:04:19 INFO yarn.YarnRMClient: Registering the ApplicationMaster
17/05/16 00:04:19 INFO yarn.ApplicationMaster: Started progress
reporter thread with (heartbeat : 3000, initial allocation : 200)
intervals
17/05/16 00:04:19 INFO hive.HiveContext: Initializing execution hive,
version 1.1.0
17/05/16 00:04:19 INFO client.ClientWrapper: Inspected Hadoop version:
2.6.0-cdh5.7.0
17/05/16 00:04:19 INFO client.ClientWrapper: Loaded
org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version
2.6.0-cdh5.7.0
17/05/16 00:04:20 INFO hive.metastore: Trying to connect to metastore
with URI thrift://jarvis-hdfs003.internal.shopkick.com:9083
17/05/16 00:04:20 INFO hive.metastore: Opened a connection to
metastore, current connections: 1
17/05/16 00:04:20 INFO hive.metastore: Connected to metastore.
17/05/16 00:04:20 INFO session.SessionState: Created HDFS directory:
file:/yarn/nm/usercache/hdfs/appcache/application_1494373289850_0336/container_e14_1494373289850_0336_01_01/tmp/spark-2217d267-a3c0-4cf4-9565-45f80517d41c/scratch/hdfs
17/05/16 00:04:20 INFO session.SessionState: Created local directory:
/yarn/nm/usercache/hdfs/appcache/application_1494373289850_0336/container_e14_1494373289850_0336_01_01/tmp/yarn
17/05/16 00:04:20 INFO session.SessionState: Created local directory:
/yarn/nm/usercache/hdfs/appcache/application_1494373289850_0336/container_e14_1494373289850_0336_01_01/tmp/478f39e9-5295-4e8e-97aa-40b5828f9440_resources
17/05/16 00:04:20 INFO session.SessionState: Created HDFS directory:
file:/yarn/nm/usercache/hdfs/appcache/application_1494373289850_0336/container_e14_1494373289850_0336_01_01/tmp/spark-2217d267-a3c0-4cf4-9565-45f80517d41c/scratch/hdfs/478f39e9-5295-4e8e-97aa-40b5828f9440
17/05/16 00:04:20 INFO session.SessionState: Created local directory:
/yarn/nm/usercache/hdfs/appcache/application_1494373289850_0336/container_e14_1494373289850_0336_01_01/tmp/yarn/478f39e9-5295-4e8e-97aa-40b5828f9440
17/05/16 00:04:20 INFO session.SessionState: Created HDFS directory:
file:/yarn/nm/usercache/hdfs/appcache/application_1494373289850_0336/container_e14_1494373289850_0336_01_01/tmp/spark-2217d267-a3c0-4cf4-9565-45f80517d41c/scratch/hdfs/478f39e9-5295-4e8e-97aa-40b5828f9440/_tmp_space.db
17/05/16 00:04:20 INFO session.SessionState: No Tez session required
at this point. hive.execution.engine=mr.
17/05/16 00:04:20 INFO repl.SparkInterpreter: Created sql context
(with Hive support).


On Mon, May 15, 2017 at 5:43 PM, Jeff Zhang <zjf...@gmail.com> wrote:

>
> Which version of zeppelin do you use ? And can you check the yarn app log ?
>
>
> Ben Vogan <b...@shopkick.com>于2017年5月15日周一 下午5:56写道:
>
>> Hi all,
>>
>> For some reason today I'm getting a stack:
>>
>> org.apache.zeppelin.livy.LivyException: Fail to create
>> SQLContext,:4: error: illegal inheritance;
>> at org.apache.zeppelin.livy.LivySparkSQLInterpreter.open(
>> LivySparkSQLInterpreter.java:76)
>> at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(
>> LazyOpenInterpreter.java:70)
>> at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$
>> InterpretJob.jobRun(RemoteInterpreterServer.java:483)
>> at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
>> at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(
>> FIFOScheduler.java:139)
>> at java.util.concurrent.Executors$RunnableAdapter.
>> call(Executors.java:511)
>> at java.util.concurrent.FutureTask.run(FutureTas

Re: Illegal Inheritance error

2017-05-15 Thread Ben Vogan
Hi all,

For some reason today I'm getting a stack trace:

org.apache.zeppelin.livy.LivyException: Fail to create
SQLContext,:4: error: illegal inheritance;
at
org.apache.zeppelin.livy.LivySparkSQLInterpreter.open(LivySparkSQLInterpreter.java:76)
at
org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
at
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:483)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

On the Livy server I see no errors and there is an open session on yarn.

Some help on this would be greatly appreciated!

--Ben

On Sun, May 14, 2017 at 6:16 AM, Ben Vogan <b...@shopkick.com> wrote:

> Hi all,
>
> I've been using Zeppelin for a couple of weeks now with a stable
> configuration, but all of a sudden I am getting "Illegal inheritance"
> errors like so:
>
>  INFO [2017-05-14 03:25:32,678] ({pool-2-thread-56}
> Paragraph.java[jobRun]:362) - run paragraph 20170514-032326_663206142 using
> livy org.apache.zeppelin.interpreter.LazyOpenInterpreter@505a171c
>  WARN [2017-05-14 03:25:33,696] ({pool-2-thread-56} 
> NotebookServer.java[afterStatusChange]:2058)
> - Job 20170514-032326_663206142 is finished, status: ERROR, exception:
> null, result: %text :4: error: illegal inheritance;
>
> It happens across multiple notebooks and across both my spark and livy
> interpreters.  I don't know where to look for more information about what
> is wrong.  I don't see any errors in spark/yarn at all.  The driver got
> created, but it looks like no jobs were ever submitted to spark.
>
> Help would be greatly appreciated.
>
> Thanks,
>
> --
> *BENJAMIN VOGAN* | Data Platform Team Lead
>
>



-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



ZeppelinContext textbox for passwords

2017-05-09 Thread Ben Vogan
Hi there,

Is it possible to create a textbox for accepting passwords via the
ZeppelinContext (i.e. one that masks input)?  I do not see any way to do
so, but I hope I'm missing something.
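
For context, the closest I've found is the plain dynamic-form textbox, which
echoes whatever is typed - a sketch assuming the standard ZeppelinContext
input API, with "password" being just the form label:

%pyspark
secret = z.input("password", "")
print("length: %d" % len(secret))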

Thanks,

-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



org.apache.spark.SparkException: Could not parse Master URL: 'yarn'

2017-04-12 Thread Ben Vogan
Hello all,

I am trying to install Zeppelin 0.7.1 on my CDH 5.7 Cluster.  I have been
following the instructions here:

https://zeppelin.apache.org/docs/0.7.1/install/install.html
https://zeppelin.apache.org/docs/0.7.1/install/configuration.html
https://zeppelin.apache.org/docs/0.7.1/interpreter/spark.html

I copied the zeppelin-env.sh.template into zeppelin-env.sh and made the
following changes:
export JAVA_HOME=/usr/java/latest
export MASTER=yarn-client

export ZEPPELIN_LOG_DIR=/var/log/services/zeppelin
export ZEPPELIN_PID_DIR=/services/zeppelin/data
export ZEPPELIN_WAR_TEMPDIR=/services/zeppelin/data/jetty_tmp
export ZEPPELIN_NOTEBOOK_DIR=/services/zeppelin/data/notebooks
export ZEPPELIN_NOTEBOOK_PUBLIC=true

export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export HADOOP_CONF_DIR=/etc/spark/conf/yarn-conf
export PYSPARK_PYTHON=/usr/lib/python

I then start Zeppelin and hit the UI in my browser and create a spark note:

%spark
sqlContext.sql("select 1+1").collect().foreach(println)

And I get this error:

org.apache.spark.SparkException: Could not parse Master URL: 'yarn'
at
org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2746)
at org.apache.spark.SparkContext.(SparkContext.scala:533)
at
org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_1(SparkInterpreter.java:484)
at
org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:382)
at
org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
at
org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:828)
at
org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
at
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:483)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I specified "yarn-client" as indicated by the instructions, so I'm not sure
where it is getting "yarn" from.  In my spark-defaults.conf,
spark.master=yarn-client is set as well.

Help would be greatly appreciated.

Thanks,
-- 
*BENJAMIN VOGAN* | Data Platform Team Lead