Re: Order of paragraphs vs. different interpreters (spark vs. pyspark)

2016-07-13 Thread CloverHearts
Nice to meet you.

I have created a pull request: <https://github.com/apache/zeppelin/pull/1176>.

Do you need the feature to run all paragraphs in a note?

I think that function is needed, so I will implement it.
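(As an aside, running all paragraphs in a note can already be triggered over Zeppelin's REST API, though that by itself does not fix the ordering across interpreters discussed below. A minimal sketch, assuming the documented run-all endpoint; the host, port, and note id are placeholders:)

import requests

# Zeppelin's REST API exposes "run all paragraphs" as
# POST /api/notebook/job/{noteId}. The note id below is made up
# for illustration; 8080 is Zeppelin's default port.
resp = requests.post("http://localhost:8080/api/notebook/job/2ABCDEFGH")
print(resp.status_code, resp.text)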

 

Thank you.

 


On Wednesday, July 13, 2016, Ahmed Sobhi <ahmed.so...@gmail.com> wrote:

I think this PR addresses what I need. Case 2 seems to describe the issue I'm
having, if I'm reading it correctly.

 

The proposed solution, however, is not that clear to me.

 

Is it that you define workflows, where a workflow is a sequence of (notebook,
paragraph) pairs that are to be run in a specific order?

If that's the case, then this definitely solves my problem, but it's really
cumbersome from a usability point of view. I think a better solution for my use
case is to just have an option to run all paragraphs in the order they appear
in the notebook, regardless of which interpreter they use.

 

On Wed, Jul 13, 2016 at 12:31 PM, Hyung Sung Shim <hss...@nflabs.com> wrote:

Hi.

Maybe https://github.com/apache/zeppelin/pull/1176 is related to what you want.

Please check this PR.


Re: Order of paragraphs vs. different interpreters (spark vs. pyspark)

2016-07-13 Thread Hyung Sung Shim
Hi.
I think you can run the workflows that you defined just by running a paragraph,
and I believe the view functionality is going to get better. :)



Re: Order of paragraphs vs. different interpreters (spark vs. pyspark)

2016-07-13 Thread xiufeng liu
It is easy to change the code. I did it myself and use it as an ETL tool. It
is very powerful.

Afancy



Re: Order of paragraphs vs. different interpreters (spark vs. pyspark)

2016-07-13 Thread xiufeng liu
You have to change the source code to add dependencies between running
paragraphs. I think it is a really interesting feature; for example, it can
be used as an ETL tool. But, unfortunately, there is no configuration option
right now.

/afancy



Order of paragraphs vs. different interpreters (spark vs. pyspark)

2016-07-13 Thread Ahmed Sobhi
Hello,

I have been working on a large Spark Scala notebook. I recently had the
requirement to produce graphs/plots out of this data. Python and PySpark
seemed like a natural fit, but since I've already invested a lot of time and
effort into the Scala version, I want to restrict my usage of Python to
just plotting.

I found a good workflow where in the Scala paragraphs I can use
registerTempTable and in Python I can just use sqlContext.table to retrieve
that table.
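For concreteness, a minimal sketch of that workflow as two Zeppelin
paragraphs; the table name and input path are made up for illustration:

%spark
// Scala paragraph: build a DataFrame and register it as a temp table,
// so the PySpark paragraph can look it up by name through the shared
// SQLContext of the spark interpreter group.
val df = sqlContext.read.json("/tmp/events.json")
df.registerTempTable("events")

%pyspark
# Python paragraph: retrieve the same temp table and pull it into pandas
# for plotting.
pdf = sqlContext.table("events").toPandas()
print(pdf.head())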

The problem now is that if I try to run all paragraphs to get the notebook
updated, the Python paragraphs fail because they run before the Scala ones,
even though they are placed after them.

It seems like the behavior in Zeppelin is that it attempts to run
paragraphs concurrently if they belong to different interpreters, which
might seem fine on the surface. But now that I want to introduce a
dependency between Spark/PySpark paragraphs, is there any way to do that?
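(One possible workaround sketch: some Zeppelin builds expose z.run on the
ZeppelinContext for triggering another paragraph, which would let the Scala
paragraph kick off the Python one only after it finishes. Whether z.run and
this signature are available depends on your Zeppelin version, so verify
first; the paragraph index 3 is a placeholder:)

%spark
// After the temp table is registered, explicitly trigger the plotting
// paragraph. The index 3 here stands in for wherever the %pyspark
// paragraph sits in the note.
df.registerTempTable("events")
z.run(3)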

-- 
Cheers,
Ahmed