Re: [DISCUSS] Update Roadmap

2016-03-01 Thread Zhong Wang
+1 on @rick. Quality is really important... I am still encountering bugs
consistently.

On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV wrote:

> +1 on @rick

Re: error "Could not find creator property with name 'id' "

2016-03-01 Thread enzo
Hi Moon

Thanks!!  The fixes proposed in the post resolved my problem.

On the other hand, if this is happening to everybody (as I assume), maybe this
should be addressed a bit more systematically?

Thanks again!

Enzo
e...@smartinsightsfromdata.com



> On 1 Mar 2016, at 19:13, moon soo Lee  wrote:
> 
> Hi Enzo,
> 
> It happens when you have multiple versions of the Jackson library in your
> classpath. Please check the following email thread:
> http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/com-fasterxml-jackson-databind-JsonMappingException-td1607.html
>  
> 
> 
> Thanks,
> moon
> 
> [quoted original message and stack trace trimmed; see the original post below]

Re: error "Could not find creator property with name 'id' "

2016-03-01 Thread moon soo Lee
Hi Enzo,

It happens when you have multiple versions of the Jackson library in your
classpath. Please check the following email thread:
http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/com-fasterxml-jackson-databind-JsonMappingException-td1607.html
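One quick way to confirm the duplicate is to ask the classloader, from a
notebook paragraph, where ObjectMapper is being loaded from; more than one
result usually means conflicting Jackson copies. A minimal sketch, not
Zeppelin-specific:

  %spark
  import scala.collection.JavaConverters._

  // Print every classpath entry that provides Jackson's ObjectMapper;
  // two or more hits would explain this JsonMappingException.
  getClass.getClassLoader
    .getResources("com/fasterxml/jackson/databind/ObjectMapper.class")
    .asScala
    .foreach(println)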

Thanks,
moon

On Tue, Mar 1, 2016 at 8:46 AM enzo wrote:

> I get the following error in a variety of circumstances.
>
> I downloaded Zeppelin a couple of days ago. I use Spark 1.6.0.
>
>
> For example:
>
> %spark
>
> val raw = sc.textFile("/tmp/github.json")  // reading a 25 MB file from /tmp
>
> This gives the following error. Help please!!
>
>
> com.fasterxml.jackson.databind.JsonMappingException: Could not find
> creator property with name 'id' (in class
> org.apache.spark.rdd.RDDOperationScope)
> at [Source: {"id":"0","name":"textFile"}; line: 1, column: 1]
> [stack trace trimmed; the full trace appears in the original post below]

Re: [DISCUSS] Update Roadmap

2016-03-01 Thread TEJA SRIVASTAV
+1 on @rick

On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim  wrote:

> I see in the Enterprise section that multi-tenancy will be included; will
> this have user impersonation too? That way, the user executing the process
> would also be the user owning it.
>

Re: [DISCUSS] Update Roadmap

2016-03-01 Thread Benjamin Kim
I see in the Enterprise section that multi-tenancy will be included; will this
have user impersonation too? That way, the user executing the process would
also be the user owning it.

> On Mar 1, 2016, at 12:51 AM, Shabeel Syed  wrote:
> 
> +1
> 
> Hi Tamas,
>Pluggable external visualization is really a GREAT feature to have. I'm 
> looking forward to this :)
> 
> Regards
> Shabeel
> 
> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi wrote:
> Hey,
> 
> Really promising roadmap.
> 
> I'd only push for more visualization options. I agree a built-in
> visualization with limited charting options is needed, but I think we also
> need a way to 'inject' external JS visualizations.
> 
> 
> For scheduling Zeppelin notebooks we use https://github.com/airbnb/airflow
> through the job REST API. It's an enterprise-ready and very robust solution
> right now.
> 
> Tamas
> 
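As an aside on the REST route Tamas mentions: triggering a notebook run
externally is a single HTTP call. A minimal sketch in Scala, assuming the job
REST API exposes POST /api/notebook/job/{noteId} as documented; the host,
port, and note ID below are placeholders:

  import java.net.{HttpURLConnection, URL}

  object TriggerNote {
    def main(args: Array[String]): Unit = {
      val noteId = "2A94M5J1Z"  // placeholder note ID
      val url = new URL(s"http://localhost:8080/api/notebook/job/$noteId")
      val conn = url.openConnection().asInstanceOf[HttpURLConnection]
      conn.setRequestMethod("POST")  // asks Zeppelin to run all paragraphs
      println(s"Zeppelin responded: ${conn.getResponseCode}")
      conn.disconnect()
    }
  }

An external scheduler such as Airflow simply wraps this kind of call in a
recurring task.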
> 
> On 1 March 2016 at 09:12, Eran Witkon wrote:
> One point to clarify: I don't want to suggest Oozie specifically. I want to
> think about which features we develop ourselves and which ones we integrate
> from external, preferably Apache, technology. We don't think about building
> our own storage services, so why build our own scheduler?
> Eran
> On Tue, 1 Mar 2016 at 09:49 moon soo Lee wrote:
> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
> Now I can see a lot of demand around enterprise-level job scheduling. Either
> external or built-in, I completely agree with having enterprise-level job
> scheduling support on the roadmap.
> ZEPPELIN-137 and ZEPPELIN-531 are the related issues I can find in our JIRA.
> 
> @Vinayak
> Regarding importing notebooks from GitHub: Zeppelin has a pluggable notebook
> storage layer (see the related package). So GitHub notebook sync can be
> implemented easily.
> 
> @Shabeel
> Right, we need better memory management to prevent such OOMs.
> And I think the table is one of the most frequently used ways of displaying
> data, so we'll definitely need more features like filter, sort, etc.
> After this roadmap discussion, a discussion for the next release will follow.
> Then we'll get an idea of when those features will be available.
> 
> @Prasad
> Thanks for mentioning HA and DR. They're really important subjects for
> enterprise use; Zeppelin will definitely need to address them.
> And displaying meta information about notebooks on the top-level page is a
> good idea.
> 
> It's really great to hear so many opinions and ideas.
> And thanks @Rick for sharing a valuable view of the Zeppelin project.
> 
> Thanks,
> moon
> 
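To make the pluggable storage layer moon describes a bit more concrete: a
GitHub sync could implement the storage interface and commit on every save.
The trait below is invented for this sketch - it is not Zeppelin's actual
storage API (see the package moon links for the real interface) - and the git
handling is deliberately minimal.

  import java.io.{File, PrintWriter}
  import scala.sys.process._

  // Hypothetical storage-plugin interface, defined here only for illustration.
  trait NoteStore {
    def save(noteId: String, json: String): Unit
    def load(noteId: String): String
  }

  // Keeps each note as JSON inside a local git clone and commits on save;
  // a real plugin would also push to and pull from the GitHub remote.
  class GitNoteStore(repoDir: String) extends NoteStore {
    def save(noteId: String, json: String): Unit = {
      val noteFile = new File(repoDir, s"$noteId/note.json")
      noteFile.getParentFile.mkdirs()
      val out = new PrintWriter(noteFile)
      try out.write(json) finally out.close()
      Process(Seq("git", "add", "."), new File(repoDir)).!
      Process(Seq("git", "commit", "-m", s"sync note $noteId"), new File(repoDir)).!
    }
    def load(noteId: String): String =
      scala.io.Source.fromFile(new File(repoDir, s"$noteId/note.json")).mkString
  }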
> 
> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz wrote:
> Hi,
> 
> For one, I know that there is rudimentary scheduling built into Zeppelin
> already (at least I fixed a bug in the test for a scheduling feature a few
> months ago).
> But another point is that Zeppelin should also focus on quality,
> reproducibility, and portability.
> Although this doesn't offer exciting new features, it would make development
> much easier.
> 
> Cross-platform testability, tests that pass when run sequentially,
> compatibility with Firefox, and many more open issues that make it so much
> harder to enhance Zeppelin and add features should be addressed soon,
> preferably before more features are added. Zeppelin is already suffering - in
> my opinion - from quite a lot of feature creep, and we should avoid putting
> in the kitchen sink at the cost of quality and maintainability. Instead,
> modularity (ZEPPELIN-533 in particular) should be targeted.
> 
> Oozie, in my opinion, is a dead end - it may de facto still be in use on many
> clusters, but it's not getting the love it needs, and I wouldn't bet on it
> when it comes to integrating scheduling. Instead, any external tool should be
> able to use the REST API to trigger executions, if you want external
> scheduling.
> 
> So, in conclusion, if we take Moon's list as a list of descending priorities,
> I fully agree, under the condition that code quality is included as a subset
> of enterprise-readiness. Auth* is paramount (Kerberos SPNEGO SSO support is
> what we really want), with user and group rights assignment at the notebook
> level. We probably also need Knox integration (ODP members looking at
> integrating Zeppelin should consider contributing this), and integration of
> something like Spree (https://github.com/hammerlab/spree) to be able to
> profile jobs.
> 
> I'm hopeful that soon I can resume contributing some quality-oriented code, 
> to drive this 

Re: Can zeppelin send email by using scheduler?

2016-03-01 Thread Felix Cheung
Sounds like it could be an interesting feature to add.
Would you like to contribute? :)


On Tue, Mar 1, 2016 at 3:49 AM -0800, "魏龙星"  wrote:

In that case, users have to write code for every notebook.


Eran Witkon wrote on Tue, Mar 1, 2016 at 7:48 PM:

> I guess that if the scheduler can run a notebook, then the notebook code
> can send the mail.
> Eran
> On Tue, 1 Mar 2016 at 13:38 魏龙星  wrote:
>
>> Zeppelin already supports a scheduler. However, users can only check the
>> results on the web, which makes the scheduler much less useful, since users
>> could just run the notebook on the web themselves. I am wondering whether
>> Zeppelin can send the result to users, the way crontab does.
>>
>> Any suggestions?
>>
>> Thanks.
>> Longxing
>>
>
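As Eran suggests above, the notebook can do the mailing itself. A minimal
sketch of a final paragraph that mails its result, assuming the JavaMail
(javax.mail) jars have been added to the Spark interpreter's classpath; the
SMTP host and addresses are placeholders:

  %spark
  import java.util.Properties
  import javax.mail.{Message, Session, Transport}
  import javax.mail.internet.{InternetAddress, MimeMessage}

  // Whatever the scheduled job computes; a row count stands in here.
  val result = sc.textFile("/tmp/github.json").count()

  // Placeholder SMTP relay and addresses; adjust for your environment.
  val props = new Properties()
  props.put("mail.smtp.host", "smtp.example.com")
  val session = Session.getInstance(props)

  val msg = new MimeMessage(session)
  msg.setFrom(new InternetAddress("zeppelin@example.com"))
  msg.setRecipient(Message.RecipientType.TO, new InternetAddress("user@example.com"))
  msg.setSubject("Scheduled notebook result")
  msg.setText(s"Row count: $result")
  Transport.send(msg)

With the notebook's cron schedule enabled, this paragraph runs at the end of
every scheduled execution, so the result arrives by mail without anyone
opening the web UI.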


error "Could not find creator property with name 'id' "

2016-03-01 Thread enzo
I get the following error in a variety of circumstances.

I downloaded Zeppelin a couple of days ago. I use Spark 1.6.0.


For example:

%spark

val raw = sc.textFile("/tmp/github.json")  // reading a 25 MB file from /tmp

This gives the following error. Help please!!


com.fasterxml.jackson.databind.JsonMappingException: Could not find creator 
property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
 at [Source: {"id":"0","name":"textFile"}; line: 1, column: 1]
at 
com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
at 
com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
at 
com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:409)
at 
com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:358)
at 
com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:265)
at 
com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:245)
at 
com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
at 
com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
at 
com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3666)
at 
com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3558)
at 
com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2578)
at 
org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:85)
at 
org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:136)
at scala.Option.map(Option.scala:145)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1011)
at 
org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:832)
at 
org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:830)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
at org.apache.spark.SparkContext.textFile(SparkContext.scala:830)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:38)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:43)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:45)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:47)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:49)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:51)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:53)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:55)
at $iwC$$iwC$$iwC$$iwC$$iwC.(:57)
at $iwC$$iwC$$iwC$$iwC.(:59)
at $iwC$$iwC$$iwC.(:61)
at $iwC$$iwC.(:63)
at $iwC.(:65)
at (:67)
at .(:71)
at .()
at .(:7)
at .()
at $print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at 
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at 
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at 
org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:780)
at 
org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:744)
at 

Re: Math formula on %md model

2016-03-01 Thread Aish Fenton
+1 that'd be amazing.
On Mon, Feb 29, 2016 at 6:04 PM Trevor Grant wrote:

> +1 for math formula support in the markdown interpreter,
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Mon, Feb 29, 2016 at 7:57 PM, Jun Chen  wrote:
>
>> Hi,
>>
>> I found that both Jupyter and GitBook (using KaTeX or MathJax) support math
>> formulas in markdown mode, so how about Zeppelin? The $$ math formula $$
>> syntax does not work in Zeppelin.
>>
>> BR
>> mufeng
>>
>
>
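For reference, the syntax being requested is standard LaTeX math, which
Jupyter and GitBook render through MathJax or KaTeX; shown here as plain
LaTeX, since Zeppelin's markdown interpreter does not render it yet:

  % display math, written between $$ ... $$ delimiters
  $$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i $$

  % inline math, written between \( ... \) delimiters
  \( e^{i\pi} + 1 = 0 \)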


Re: problem with start H2OContent

2016-03-01 Thread Aleksandr Modestov
Zeppelin doesn't work with the external Apache Spark, but I can launch Spark
with the h2o package from the shell.
I use the line export SPARK_HOME=/path_with_spark_1.5 in zeppelin-env.sh,
but I'm not sure that Zeppelin sees the external interpreter.

On Mon, Feb 29, 2016 at 7:35 PM, Silvio Fiorito <
silvio.fior...@granturing.com> wrote:

>
>
> Can you try running it from just a Spark shell to confirm it works that
> way (no other conflict)?
>
>
>
> bin/spark-shell --master local[*] --packages
> ai.h2o:sparkling-water-core_2.10:1.5.10
>
>
>
> Also, are you able to run the Spark interpreter without the h2o package?
>
>
>
> Thanks,
>
> Silvio
>
>
>
> *From: *Aleksandr Modestov 
> *Sent: *Monday, February 29, 2016 11:30 AM
> *To: *users@zeppelin.incubator.apache.org
> *Subject: *Re: problem with start H2OContent
>
>
> I use Spark 1.5.
> The problem is with the external Spark; with the internal Spark I cannot
> launch H2OContext :)
> The error is:
>
> "ERROR [2016-02-29 19:28:16,609] ({pool-1-thread-3}
> NotebookServer.java[afterStatusChange]:766) - Error
> org.apache.zeppelin.interpreter.InterpreterException:
> org.apache.zeppelin.interpreter.InterpreterException:
> org.apache.thrift.transport.TTransportException: java.net.ConnectException:
> Connection refused
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:268)
> at
> org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104)
> at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:198)
> at org.apache.zeppelin.scheduler.Job.run(Job.java:169)
> at
> org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:322)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.zeppelin.interpreter.InterpreterException:
> org.apache.thrift.transport.TTransportException: java.net.ConnectException:
> Connection refused
> at
> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53)
> at
> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
> at
> org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
> at
> org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
> at
> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
> at
> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:139)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:266)
> ... 11 more
> Caused by: org.apache.thrift.transport.TTransportException:
> java.net.ConnectException: Connection refused
> at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
> at
> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
> ... 18 more
> Caused by: java.net.ConnectException: Connection refused
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
> at
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
> at
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:579)
> at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
> ... 19 more"
>
> On Mon, Feb 29, 2016 at 7:07 PM, Silvio Fiorito <
> silvio.fior...@granturing.com> wrote:
>
>> In your zeppelin-env did you set SPARK_HOME and SPARK_SUBMIT_OPTIONS?
>> Anything in the logs? It looks like the interpreter failed to start.
>>
>>
>>
>> Also, Sparkling Water currently supports up to 1.5 only, last I checked.
>>
>>
>>
>> Thanks,
>>
>> Silvio
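For reference, the zeppelin-env.sh settings Silvio asks about might look
roughly like this - a sketch with placeholder paths, pairing the Spark 1.5
home Aleksandr mentions with the same Sparkling Water package used in the
spark-shell test above:

  # conf/zeppelin-env.sh (paths are placeholders)
  export SPARK_HOME=/path/to/spark-1.5
  export SPARK_SUBMIT_OPTIONS="--packages ai.h2o:sparkling-water-core_2.10:1.5.10"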
>>
>>
>>
>>
>>
>>
>>
>> *From: *Aleksandr Modestov 
>> *Sent: *Monday, February 29, 2016 10:43 AM
>> *To: *users@zeppelin.incubator.apache.org
>> *Subject: *Re: problem with start H2OContent
>>
>>
>> When I use the external Spark I get an exception:
>>
>> java.net.ConnectException: Connection refused at
>> java.net.PlainSocketImpl.socketConnect(Native Method) at
>> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>> at
>> 

Can zeppelin send email by using scheduler?

2016-03-01 Thread 魏龙星
Zeppelin already supports a scheduler. However, users can only check the
results on the web, which makes the scheduler much less useful, since users
could just run the notebook on the web themselves. I am wondering whether
Zeppelin can send the result to users, the way crontab does.

Any suggestions?

Thanks.
Longxing


Zeppelin Real Time usecases

2016-03-01 Thread Shabeel Syed
Hi All,

   I'm planning to give a demo on Zeppelin to my team here tomorrow.

   It would be great if I could get a list of some real-time use cases of
Zeppelin at prominent companies.

Regards,
Shabeel