Re: Integrating with Airflow

2017-05-19 Thread Ben Vogan
Thanks for sharing this Ruslan - I will take a look.

I agree that paragraphs can form tasks within a DAG.  My point was that
ideally a DAG could encompass multiple notes, i.e. the completion of one
note triggers another, and so on, until an entire chain of dependent tasks
is complete.

For example, team A has a note that generates data set A*.  Teams B & C each
have notes that depend on A* to generate B* & C* for their specific
purposes.  It doesn't make sense for all of that to live in one note, but
they are all part of a single workflow.
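
As a rough sketch (not something we actually run), the Airflow side of that
workflow could look like the following, assuming a run_note(note_id) helper
that executes one Zeppelin note to completion (for example via the REST API,
as sketched later in this thread); the note ids are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def run_note(note_id):
    # Hypothetical helper: run the given Zeppelin note and block until it
    # finishes, e.g. by driving Zeppelin's REST API (sketched further down
    # in this thread).
    raise NotImplementedError(note_id)


dag = DAG("multi_note_workflow",
          start_date=datetime(2017, 5, 19),
          schedule_interval="@daily")

# One task per note; the note ids are placeholders.
note_a = PythonOperator(task_id="team_a_note", python_callable=run_note,
                        op_args=["NOTE_A_ID"], dag=dag)
note_b = PythonOperator(task_id="team_b_note", python_callable=run_note,
                        op_args=["NOTE_B_ID"], dag=dag)
note_c = PythonOperator(task_id="team_c_note", python_callable=run_note,
                        op_args=["NOTE_C_ID"], dag=dag)

# B* and C* are both derived from A*, so team A's note must finish first.
note_a >> note_b
note_a >> note_c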

Best,
--Ben



-- 
*BENJAMIN VOGAN* | Data Platform Team Lead

Re: Integrating with Airflow

2017-05-19 Thread Ruslan Dautkhanov
Thanks for sharing this Ben.

I agree Zeppelin is a better fit, with its tighter Spark integration and
built-in visualizations.

We have pretty much standardized on pySpark, so here's one of the scripts
we use internally
to extract %pyspark, %sql and %md paragraphs into a standalone script (that
can be scheduled in Airflow for example)
https://github.com/Tagar/stuff/blob/master/znote.py (patches are welcome :-)
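
In case the repo isn't reachable, here's a minimal sketch of the same idea
(not the actual znote.py); it assumes Zeppelin's note.json layout, where
each entry in "paragraphs" carries its source in "text":

#!/usr/bin/env python
"""Flatten a Zeppelin note.json into a standalone pySpark script (sketch).

%pyspark paragraphs are copied verbatim, %sql paragraphs are wrapped in
sqlContext.sql(...), %md paragraphs become comments; everything else is
skipped.
"""
import json
import sys


def note_to_script(note_path):
    with open(note_path) as f:
        note = json.load(f)
    chunks = []
    for para in note.get("paragraphs", []):
        text = (para.get("text") or "").strip()
        if text.startswith("%pyspark"):
            chunks.append(text[len("%pyspark"):].strip())
        elif text.startswith("%sql"):
            # naive: assumes the query itself contains no triple quotes
            query = text[len("%sql"):].strip()
            chunks.append('sqlContext.sql("""%s""").show()' % query)
        elif text.startswith("%md"):
            body = text[len("%md"):].strip()
            chunks.append("\n".join("# " + line for line in body.splitlines()))
        # other interpreters (%sh, %angular, ...) are skipped
    return "\n\n".join(chunks)


if __name__ == "__main__":
    print(note_to_script(sys.argv[1]))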

Hope this helps.

ps. In my opinion, adding dependencies between paragraphs wouldn't be that
hard for simple cases, and could be a first step toward defining a DAG in
Zeppelin directly. It would be really awesome to see this type of
integration in the future.

Otherwise I don't see much value if a whole note / whole workflow runs as a
single task in Airflow. In my opinion, each paragraph has to be a task...
then it'll be very useful.


Thanks,
Ruslan




Re: Create cassandra interpreter for Zeppelin on AWS EMR spark cluster

2017-05-19 Thread shyla deshpande
I now have the cassandra interpreter set up.

But I am getting the following error:

java.lang.NoClassDefFoundError: Could not initialize class
org.apache.zeppelin.cassandra.DisplaySystem$NoResultDisplay$
        at org.apache.zeppelin.cassandra.EnhancedSession.<init>(EnhancedSession.scala:40)

Please help.



Re: Integrating with Airflow

2017-05-19 Thread Ben Vogan
I do not expect the relationship between DAGs to be described in Zeppelin -
that would be done in Airflow.  It just seems that Zeppelin is such a great
tool for a data scientist's workflow that it would be nice if, once they are
done with the work, the note could be productionized directly.  I could
envision a couple of scenarios:

1. Using a zeppelin instance to run the note via the REST API.  The
instance could be containerized and spun up specifically for a DAG or it
could be a permanently available one.
2. A note could be pulled from git and some part of the Zeppelin engine
could execute the note without the web UI at all.

On the Airflow side, I would expect there to be some special operators for
executing these.
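
As a rough illustration of such an operator-style task (a sketch, not a
finished operator), assuming Zeppelin 0.7's documented REST endpoints for
running all paragraphs of a note (POST /api/notebook/job/{noteId}) and
reading their statuses (GET on the same path) - the host and note id below
are placeholders:

import time

import requests

ZEPPELIN_URL = "http://zeppelin-host:8080"  # placeholder


def run_note(note_id, poll_seconds=10, timeout_seconds=3600):
    """Run all paragraphs of a note and block until they finish or fail."""
    base = "%s/api/notebook/job/%s" % (ZEPPELIN_URL, note_id)
    requests.post(base).raise_for_status()  # kick off the whole note
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        paragraphs = requests.get(base).json().get("body", [])
        statuses = {p.get("status") for p in paragraphs}
        if "ERROR" in statuses:
            raise RuntimeError("note %s failed" % note_id)
        if paragraphs and statuses <= {"FINISHED"}:
            return  # every paragraph finished
        time.sleep(poll_seconds)
    raise RuntimeError("note %s timed out" % note_id)

An Airflow DAG would then wrap run_note in one PythonOperator per note, as
in the earlier sketch in this thread.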

If the scheduler is pluggable then it should be possible to create a plugin
that talks to the Airflow REST API.

I happen to prefer Zeppelin to Jupyter - although I get your point about
both being Python.  I don't really view that as a problem - most of the big
data platforms I'm talking to are implemented on the JVM, after all.  The
Python part of Airflow is really just describing what gets run, and it isn't
hard to run something that isn't written in Python.

On Fri, May 19, 2017 at 2:52 PM, Ruslan Dautkhanov 
wrote:

> We also use both Zeppelin and Airflow.
>
> I'm interested in hearing what others are doing here too.
>
> Although honestly there might be some challenges:
> - Airflow expects a DAG structure, while a notebook has a pretty linear
> structure;
> - Airflow is Python-based; Zeppelin is all Java (REST API might be of
> help?).
> Jupyter+Airflow might be a more natural fit to integrate?
>
> On top of that, the way we use Zeppelin is a lot of ad-hoc queries,
> while Airflow is for more finalized workflows I guess?
>
> Thanks for bringing this up.
>
>
>
> --
> Ruslan Dautkhanov
>
> On Fri, May 19, 2017 at 2:20 PM, Ben Vogan  wrote:
>
>> Hi all,
>>
>> We are really enjoying the workflow of interacting with our data via
>> Zeppelin, but are not sold on using the built-in cron scheduling
>> capability.  We would like to be able to create more complex DAGs that are
>> better suited for something like Airflow.  I was curious as to whether
>> anyone has done an integration of Zeppelin with Airflow.
>>
>> Either directly from within Zeppelin, or from the Airflow side.
>>
>> Thanks,
>> --
>> *BENJAMIN VOGAN* | Data Platform Team Lead
>>
>>
>
>


-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



Re: Create cassandra interpreter for Zeppelin on AWS EMR spark cluster

2017-05-19 Thread DuyHai Doan
Just download a Zeppelin binary with all interpreters onto your AWS EMR
cluster and install it there.



Re: Create cassandra interpreter for Zeppelin on AWS EMR spark cluster

2017-05-19 Thread shyla deshpande
I don't see the cassandra interpreter, so I need to create the interpreter.

Locally, I installed the binary package with all interpreters and it works
fine.

On AWS EMR, all the binaries may not be there, and that's where I need help.

Thanks



Re: Create cassandra interpreter for Zeppelin on AWS EMR spark cluster

2017-05-19 Thread DuyHai Doan
So just use the Cassandra interpreter. I don't see what the problem is;
there is even documentation here:

http://zeppelin.apache.org/docs/0.7.1/interpreter/cassandra.html

On Fri, May 19, 2017 at 11:51 PM, shyla deshpande 
wrote:

> Yes, we do have cassandra cluster on AWS-EC2.
>
> On Fri, May 19, 2017 at 2:39 PM, DuyHai Doan  wrote:
>
>> If you're running on an AWS EMR Spark cluster, why do you need a Cassandra
>> interpreter? Is Cassandra installed on your AWS cluster?
>>
>>
>


Create cassandra interpreter for Zeppelin on AWS EMR spark cluster

2017-05-19 Thread shyla deshpande
Hello all,

I need help creating a Cassandra interpreter for Zeppelin on an AWS EMR Spark
cluster. Please give me info on what to install and configure.

Thanks


Re: Hive interpreter Error as soon as Hive query uses MapRed

2017-05-19 Thread Ben Vogan
I am running CDH 5.7 and Spark 1.6 as well, and Hive is working for me with
the following configuration:

Properties
name                                     value
common.max_count                         1000
default.driver                           org.apache.hive.jdbc.HiveDriver
default.password
default.url                              jdbc:hive2://hdfs004:1
default.user                             hive
zeppelin.interpreter.localRepo           /services/zeppelin/zeppelin-0.7.1/local-repo/2CECB8FBV
zeppelin.jdbc.auth.type
zeppelin.jdbc.concurrent.max_connection  10
zeppelin.jdbc.concurrent.use             true
zeppelin.jdbc.keytab.location
zeppelin.jdbc.principal

Dependencies
artifact                                 exclude
org.apache.hive:hive-jdbc:0.14.0
org.apache.hadoop:hadoop-common:2.6.0

I admit to not having spent time figuring out whether there are any edge
cases that are broken because I am using the open source version of the
JDBC driver vs. the Cloudera jars.  However, it definitely returns
results from complex select queries and has no issues with the DDL
statements that I've tried.

Good luck!
--Ben



-- 
*BENJAMIN VOGAN* | Data Platform Team Lead



Re: Hive interpreter Error as soon as Hive query uses MapRed

2017-05-19 Thread Meier, Alexander
Yes, the script (i.e. the select statement) runs fine in the Hive CLI, Hue,
and also in Spark SQL (Spark SQL also in Zeppelin).
Just not when using the Hive interpreter in Zeppelin.



Sent from my iPhone



Re: Setting Note permission

2017-05-19 Thread Jongyoul Lee
Simply put, you'd better enable personalized mode at the top of the note.
Then one user's behavior doesn't affect another's.

Try it and leave comments.

Thanks,
Jongyoul Lee

On Tue, May 16, 2017 at 1:36 PM, shyla deshpande 
wrote:

> I want to know if this is possible. It works great for a single user, but
> in a multi-user environment we need more granular control over who can do
> what. The readers permission is not useful, because the user cannot execute
> or even change the display type.
>
> Please share your experience of how you are using it in a multi-user
> environment.
>
> Thanks
>
> On Sun, May 14, 2017 at 9:53 PM, shyla deshpande wrote:
>
>> How do I set permissions on a Note to allow only the following:
>> 1. execute the paragraphs
>> 2. choose a different value from dynamic dropdown
>> 3. change display type from bar chart to tabular
>> 4. download the result data as a csv or tsv file
>>
>> I do not want the users to change the code or access the Interpreter menu
>> or change the configuration.
>>
>
>


-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


Re: Hive interpreter Error as soon as Hive query uses MapRed

2017-05-19 Thread Jongyoul Lee
Can you check whether your script works in a native Hive environment?


-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


Re: Does Zeppelin 0.7.1 work with Spark Structured Streaming 2.1.1?

2017-05-19 Thread kant kodali
I'm looking for something more like what's in this video, where the graphs
automatically update themselves. Is that possible in Zeppelin?
https://www.youtube.com/watch?v=IJmFTXvUZgY

You can watch it from 9:20.


On Fri, May 19, 2017 at 9:21 AM, kant kodali  wrote:

> Hi All,
>
> I have the following code
>
> StreamingQuery query = df2.writeStream().outputMode("complete")
>     .queryName("foo").option("truncate", "false").format("console").start();
>
> query.awaitTermination();
>
>
> and it works fine. However, when I change it to the code below, I do get
> the output, but only once. I tried running *%spark.sql select * from foo*
> over and over again, but I don't see the results getting updated. In
> console format like above it works perfectly fine and I can see updates on
> each batch, so should I be doing something else for the memory sink?
>
> StreamingQuery query = df2.writeStream().outputMode("complete")
>     .queryName("foo").option("truncate", "false").format("memory").start();
>
> %spark.sql select * from foo
>
>
> Thanks!
>
>
>


Hive interpreter Error as soon as Hive query uses MapRed

2017-05-19 Thread Meier, Alexander
Hi list

I’m trying to get a Hive interpreter correctly running on a CDH 5.7 cluster
with Spark 1.6. Simple queries run fine, but as soon as a query needs a
MapRed task in order to complete, the query fails with:

java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
        at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:279)
        at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
        at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
        at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:580)
        at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:692)
        at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:95)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:490)
        at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
        at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
etc…

I’ve got the interpreter set up as follows:

Properties
name                            value
default.driver                  org.apache.hive.jdbc.HiveDriver
default.url                     jdbc:hive2://[hostname]:1
hive.driver                     org.apache.hive.jdbc.HiveDriver
hive.url                        jdbc:hive2://[hostname]:1
zeppelin.interpreter.localRepo  /opt/zeppelin/local-repo/2CJ4XM2Z4

Dependencies
artifact
/opt/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc.jar
/opt/cloudera/parcels/CDH/lib/hive/lib/hive-service.jar
/opt/cloudera/parcels/CDH/lib/hadoop/client/hadoop-common.jar
/opt/cloudera/parcels/CDH/lib/hive/lib/hive-common.jar
/opt/cloudera/parcels/CDH/lib/hive/lib/hive-metastore.jar


Unfortunately I haven’t found any help googling around… does anyone here
have some helpful input?

Best regards and many thanks in advance,
Alex