Re: More than 2 notebooks in R failing with error sparkr interpreter not responding

2018-01-02 Thread Meethu Mathew
Hi Jeff,

Please find below the interpreter log.

INFO [2018-01-03 12:10:05,960] ({pool-2-thread-9}
Logging.scala[logInfo]:58) - Starting HTTP Server
 INFO [2018-01-03 12:10:05,961] ({pool-2-thread-9}
Server.java[doStart]:272) - jetty-8.y.z-SNAPSHOT
 INFO [2018-01-03 12:10:05,963] ({pool-2-thread-9}
AbstractConnector.java[doStart]:338) - Started SocketConnector@0.0.0.0:58989
 INFO [2018-01-03 12:10:05,963] ({pool-2-thread-9}
Logging.scala[logInfo]:58) - Successfully started service 'HTTP class
server' on port 58989.
 INFO [2018-01-03 12:10:06,094] ({dispatcher-event-loop-1}
Logging.scala[logInfo]:58) - Removed broadcast_1_piece0 on localhost:42453
in memory (size: 854.0 B, free: 511.1 MB)
 INFO [2018-01-03 12:10:07,049] ({pool-2-thread-9}
ZeppelinR.java[createRScript]:353) - File
/tmp/zeppelin_sparkr-5046601627391341672.R created
ERROR [2018-01-03 12:10:17,051] ({pool-2-thread-9} Job.java[run]:188) - Job failed
org.apache.zeppelin.interpreter.InterpreterException: sparkr is not responding


R version 3.4.1 (2017-06-30) -- "Single Candle"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)


> args <- commandArgs(trailingOnly = TRUE)

> hashCode <- as.integer(args[1])

> port <- as.integer(args[2])

> libPath <- args[3]

> version <- as.integer(args[4])

> rm(args)

>
> print(paste("Port ", toString(port)))
[1]
 "Port  58063"

> print(paste("LibPath ", libPath))
[1]

 "LibPath  /home/meethu/spark-1.6.1-bin-hadoop2.6/R/lib"
>

> .libPaths(c(file.path(libPath), .libPaths()))

> library(SparkR)

Attaching package: ‘SparkR’

The following objects are masked from ‘package:stats’:

cov, filter, lag, na.omit, predict, sd, var

The following objects are masked from ‘package:base’:

colnames, colnames<-, endsWith, intersect, rank, rbind, sample,
startsWith, subset, summary, table, transform

> SparkR:::connectBackend("localhost", port, 6000)
A connection with
description "->localhost:58063"
class       "sockconn"
mode        "wb"
text        "binary"
opened      "opened"
can read    "yes"
can write   "yes"
>

> # scStartTime is needed by R/pkg/R/sparkR.R

> assign(".scStartTime", as.integer(Sys.time()), envir =
SparkR:::.sparkREnv)

> # getZeppelinR

> *.zeppelinR = SparkR:::callJStatic("org.apache.zeppelin.spark.ZeppelinR",
"getZeppelinR", hashCode)*

at org.apache.zeppelin.spark.ZeppelinR.waitForRScriptInitialized(ZeppelinR.java:285)
at org.apache.zeppelin.spark.ZeppelinR.request(ZeppelinR.java:227)
at org.apache.zeppelin.spark.ZeppelinR.eval(ZeppelinR.java:176)
at org.apache.zeppelin.spark.ZeppelinR.open(ZeppelinR.java:165)
at org.apache.zeppelin.spark.SparkRInterpreter.open(SparkRInterpreter.java:90)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:491)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
 INFO [2018-01-03 12:10:17,070] ({pool-2-thread-9}
SchedulerFactory.java[jobFinished]:137) - Job
remoteInterpretJob_1514961605951 finished by scheduler
org.apache.zeppelin.spark.SparkRInterpreter392022746
 INFO [2018-01-03 12:39:22,664] ({Spark Context Cleaner}
Logging.scala[logInfo]:58) - Cleaned accumulator 2


Please find below the output of the command ps -ef | grep /usr/lib/R/bin/exec/R:

meethu    6647  6470  0 12:09 pts/1    00:00:00 /usr/lib/R/bin/exec/R --no-save --no-restore -f /tmp/zeppelin_sparkr-1100854828050763213.R --args 214655664 58063 /home/meethu/spark-1.6.1-bin-hadoop2.6/R/lib 10601
meethu    6701  6470  0 12:09 pts/1    00:00:00 /usr/lib/R/bin/exec/R --no-save --no-restore -f /tmp/zeppelin_sparkr-4152305170353311178.R --args 1642312173 58063 /home/meethu/spark-1.6.1-bin-hadoop2.6/R/lib 10601
meethu    6745  6470  0 12:10 pts/1    00:00:00 /usr/lib/R/bin/exec/R --no-save --no-restore -f /tmp/zeppelin_sparkr-5046601627391341672.R --args 1158632477 58063 /home/meethu/spark-1.6.1-bin-hadoop2.6/R/lib 10601


Regards,
Meethu Mathew


On Wed, Jan 3, 2018 at 12:56 PM, Jeff Zhang <zjf...@gmail.com> wrote:

>
> Could

More than 2 notebooks in R failing with error sparkr interpreter not responding

2018-01-02 Thread Meethu Mathew
Hi,

I have hit a strange issue when running R notebooks in Zeppelin (0.7.2).
The Spark interpreter is in per-note scoped mode and the Spark version is 1.6.2.

Please find the steps below to reproduce the issue:
1. Create a notebook (Note1) and run any R code in a paragraph. I ran the
following code.

> %r
> rdf <- data.frame(c(1,2,3,4))
> colnames(rdf) <- c("myCol")
> sdf <- createDataFrame(sqlContext, rdf)
> withColumn(sdf, "newCol", sdf$myCol * 2.0)
2. Create another notebook (Note2) and run any R code in a paragraph. I ran
the same code as above.

Up to this point everything works fine.

3. Create a third notebook (Note3) and run any R code in a paragraph. I ran
the same code. This notebook fails with the error

> org.apache.zeppelin.interpreter.InterpreterException: sparkr is not
> responding


What I understood from the analysis is that the process created for the
sparkr interpreter is not getting killed properly, and this makes every
third notebook throw an error while executing. The process gets killed on
restarting the sparkr interpreter, after which another two notebooks can be
executed successfully; that is, the error is thrown for every third notebook
run using the sparkr interpreter. We suspect this is a limitation of Zeppelin.

Please help to solve this issue.

Regards,
Meethu Mathew


Re: Zeppelin framework is not getting unregistered from Mesos

2017-04-27 Thread Meethu Mathew
Hi Moon,

Yes, it's fixed in 0.7.1. Thank you.

Regards,
Meethu Mathew


On Wed, Apr 26, 2017 at 10:42 PM, moon soo Lee <m...@apache.org> wrote:

> Some bugs related to interpreter process management has been fixed in
> 0.7.1 release [1]. Could you try 0.7.1 or master branch and see if the same
> problem occurs?
>
> Thanks,
> moon
>
> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1832
>
> On Wed, Apr 26, 2017 at 1:13 AM Meethu Mathew <meethu.mat...@flytxt.com>
> wrote:
>
>> Hi,
>>
>> We have connected our zeppelin to mesos. But the issue we are facing is
>> that Zeppelin framework is not getting unregistered from Mesos  even if the
>> notebook is closed.
>>
>> Another problem is if the user logout from zeppelin, the SparkContext is
>> getting stopped. When the same user login again, it creates another
>> SparkContext and then the previous SparkContext will become a dead process
>> and exist.
>>
>> Is it a bug of zeppelin or is there any other proper way to unbind the
>> zeppelin framework?
>>
>> Zeppelin version is 0.7.0
>>
>> Regards,
>>
>>
>> Meethu Mathew
>>
>>


Zeppelin framework is not getting unregistered from Mesos

2017-04-26 Thread Meethu Mathew
Hi,

We have connected our Zeppelin to Mesos, but the issue we are facing is
that the Zeppelin framework is not getting unregistered from Mesos even if the
notebook is closed.

Another problem is that if the user logs out from Zeppelin, the SparkContext
gets stopped. When the same user logs in again, another SparkContext is
created and the previous SparkContext remains as a dead process.

Is this a bug in Zeppelin, or is there a proper way to unbind the
Zeppelin framework?

Zeppelin version is 0.7.0

Regards,
Meethu Mathew


Re: UnicodeDecodeError in zeppelin 0.7.1

2017-04-20 Thread Meethu Mathew
Hi,

Thanks for the response.

@moon soo Lee: The interpreter setting is the same in 0.7.0 and 0.7.1.

@Felix Cheung: The Python version is the same.

The code is as follows:

*PYSPARK*

> def textPreProcessor(text):
>     for w in text.split():
>         regex = re.compile('[%s]' % re.escape(string.punctuation))
>         no_punctuation = unicode(regex.sub(' ', w), 'utf8')
>         tokens = word_tokenize(no_punctuation)
>         lowercased = [t.lower() for t in tokens]
>         no_stopwords = [w for w in lowercased if not w in stopwordsX]
>         stemmed = [stemmerX.stem(w) for w in no_stopwords]
>         return [w for w in stemmed if w]



- docs = sc.textFile(hdfs_path + training_data, use_unicode=False).repartition(96)
- docs.map(lambda features: sentimentObject.textPreProcessor(features.split(delimiter)[text_colum])).count()

*Error:*

- UnicodeDecodeError: 'utf8' codec can't decode byte 0x9b in position 17: invalid start byte

- The same error occurs when use_unicode=False is not used.

- The error changes to 'ascii' codec can't decode byte 0x97 in position 3: ordinal not in range(128) when no_punctuation = regex.sub(' ', w) is used instead of no_punctuation = unicode(regex.sub(' ', w), 'utf8').

Note: In version 0.7.0 the code ran fine without using use_unicode or unicode(regex.sub(' ', w), 'utf8').
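[Editorial aside] A minimal sketch of a more defensive variant of the preprocessing step, assuming the raw records may contain bytes that are not valid UTF-8. The stopword set, the stemmer and the 'replace' decode policy are illustrative assumptions, not part of the original code:

# Sketch only (Python 2 style, matching the unicode() call above).
import re
import string
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

stopwordsX = set(stopwords.words('english'))   # assumption: English stopwords
stemmerX = PorterStemmer()                     # assumption: Porter stemmer

def textPreProcessorSafe(text):
    regex = re.compile('[%s]' % re.escape(string.punctuation))
    result = []
    for w in text.split():
        # Replace undecodable bytes instead of raising UnicodeDecodeError.
        no_punctuation = regex.sub(' ', w).decode('utf-8', 'replace')
        tokens = word_tokenize(no_punctuation)
        lowercased = [t.lower() for t in tokens]
        no_stopwords = [t for t in lowercased if t not in stopwordsX]
        stemmed = [stemmerX.stem(t) for t in no_stopwords]
        result.extend([t for t in stemmed if t])
    return result

With 'replace', malformed bytes become the U+FFFD replacement character instead of aborting the job, which may or may not be acceptable for the downstream model.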

*PYTHON*

> def textPreProcessor(text_column):
>     processed_text = []
>     for text in text_column:
>         for w in text.split():
>             regex = re.compile('[%s]' % re.escape(string.punctuation))  # regex for punctuation
>             no_punctuation = unicode(regex.sub(' ', text_), 'utf8')
>             tokens = word_tokenize(no_punctuation)
>             lowercased = [t.lower() for t in tokens]
>             no_stopwords = [w for w in lowercased if not w in stopwordsX]
>             stemmed = [stemmerX.stem(w) for w in no_stopwords]
>             processed_text.append([w for w in stemmed if w])
>     return processed_text


- new_training = pd.read_csv(training_data, header=None, delimiter=delimiter, error_bad_lines=False, usecols=[label_column, text_column], names=['label','msg']).dropna()
- new_training['processed_msg'] = textPreProcessor(new_training['msg'])

This Python code works and I am getting a result. In version 0.7.0, I was
getting output without using the unicode function.

Hope the problem is clear now.

Regards,
Meethu Mathew


On Fri, Apr 21, 2017 at 3:07 AM, Felix Cheung <felixcheun...@hotmail.com>
wrote:

> And are they running with the same Python version? What is the Python
> version?
>
> _
> From: moon soo Lee <m...@apache.org>
> Sent: Thursday, April 20, 2017 11:53 AM
> Subject: Re: UnicodeDecodeError in zeppelin 0.7.1
> To: <users@zeppelin.apache.org>
>
>
>
> Hi,
>
> 0.7.1 didn't changed any encoding type as far as i know.
> One difference is 0.7.1 official artifact has been built with JDK8 while
> 0.7.0 built with JDK7 (we'll use JDK7 to build upcoming 0.7.2 binary). But
> i'm not sure that can make pyspark and spark encoding type changes.
>
> Do you have exactly the same interpreter setting in 0.7.1 and 0.7.0?
>
> Thanks,
> moon
>
> On Wed, Apr 19, 2017 at 5:30 AM Meethu Mathew <meethu.mat...@flytxt.com>
> wrote:
>
>> Hi,
>>
>> I just migrated from zeppelin 0.7.0 to zeppelin 0.7.1 and I am facing
>> this error while creating an RDD(in pyspark).
>>
>> UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0:
>>> invalid start byte
>>
>>
>> I was able to create the RDD without any error after adding
>> use_unicode=False as follows
>>
>>> sc.textFile("file.csv",use_unicode=False)
>>
>>
>> ​But it fails when I try to stem the text. I am getting similar error
>> when trying to apply stemming to the text using python interpreter.
>>
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4:
>>> ordinal not in range(128)
>>
>> All these code is working in 0.7.0 version. There is no change in the
>> dataset and code. ​Is there any change in the encoding type in the new
>> version of zeppelin?
>>
>> Regards,
>>
>>
>> Meethu Mathew
>>
>>
>
>


UnicodeDecodeError in zeppelin 0.7.1

2017-04-19 Thread Meethu Mathew
Hi,

I just migrated from Zeppelin 0.7.0 to Zeppelin 0.7.1 and I am facing this
error while creating an RDD (in pyspark).

UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0:
> invalid start byte


I was able to create the RDD without any error after adding
use_unicode=False as follows:

> sc.textFile("file.csv",use_unicode=False)


But it fails when I try to stem the text. I am getting a similar error when
trying to apply stemming to the text using the python interpreter.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4:
> ordinal not in range(128)

All this code works in version 0.7.0. There is no change in the
dataset or the code. Is there any change in the encoding type in the new
version of Zeppelin?
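
[Editorial aside] A minimal sketch of one way to keep use_unicode=False while still getting unicode strings for stemming; the explicit decode and the 'replace' policy are assumptions for illustration, not a confirmed fix:

# Sketch only: with use_unicode=False, sc.textFile returns raw byte strings,
# so decode each record explicitly before tokenizing or stemming.
raw = sc.textFile("file.csv", use_unicode=False)
decoded = raw.map(lambda line: line.decode('utf-8', 'replace'))
print(decoded.take(2))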

Regards,
Meethu Mathew


sqlContext not available as hiveContext in notebook

2017-04-04 Thread Meethu Mathew
Hi,

I am running Zeppelin 0.7.0. The sqlContext already created in the Zeppelin
notebook returns a SQLContext rather than a HiveContext,
even though my Spark is built with Hive.

"zeppelin.spark.useHiveContext" in the Spark properties is set to true.

As mentioned in https://issues.apache.org/jira/browse/ZEPPELIN-1728, I
tried

  hc = HiveContext.getOrCreate(sc)

but it still returns a SQLContext.

My pyspark shell and Jupyter notebook return a HiveContext without doing
anything extra.

How can I get a HiveContext in the Zeppelin notebook?
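
[Editorial aside] A minimal sketch of constructing a HiveContext by hand in a %pyspark paragraph, assuming Spark 1.6 with Hive support on the driver classpath; this is a workaround sketch, not a confirmed fix for the useHiveContext setting:

%pyspark
from pyspark.sql import HiveContext

# Build a HiveContext from the SparkContext that Zeppelin already provides.
hc = HiveContext(sc)
print(type(hc))            # expect pyspark.sql.context.HiveContext
hc.sql("SHOW DATABASES").show()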

Regards,
Meethu Mathew


Separate interpreter running scope Per user or Per Note documentation

2017-03-28 Thread Meethu Mathew
Hi,

I couldn't find the documentation for the feature "Separate interpreter
running scope per user or per note" at
https://zeppelin.apache.org/docs/0.7.0/manual/interpreters.html#interpreter-binding-mode
.
Can somebody help me understand the per-note scoped mode and the per-user
scoped mode?

Regards,
Meethu Mathew


Auto completion for defined variable names

2017-03-20 Thread Meethu Mathew
Hi,

Is there any way to get auto-completion or suggestions for defined
variable names? In Jupyter notebooks, variables show up under suggestions
once they are defined.
Ctrl+. also gives awkward suggestions for functions; for a
Spark data frame, it won't show the relevant functions.

Please improve the suggestion functionality.

Regards,
Meethu Mathew


--files in SPARK_SUBMIT_OPTIONS not working - ZEPPELIN-2136

2017-03-17 Thread Meethu Mathew
Hi,

According to the Zeppelin documentation, to pass a Python package to the
Zeppelin pyspark interpreter, you can export it through the --files option in
SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.sh.

When I add a .egg file through the --files option in SPARK_SUBMIT_OPTIONS,
the Zeppelin notebook does not throw an error, but I am not able to import
the module inside the notebook.

The Spark version is 1.6.2 and the zeppelin-env.sh (Zeppelin 0.7.0) file looks
like:
export SPARK_HOME=/home/me/spark-1.6.1-bin-hadoop2.6
export SPARK_SUBMIT_OPTIONS="--jars /home/me/spark-csv-1.5.0-s_2.10.jar,/home/me/commons-csv-1.4.jar --files /home/me/models/Churn/package/build/dist/fly_libs-1.1-py2.7.egg"

Any progress on the ticket ZEPPELIN-2136
<https://issues.apache.org/jira/browse/ZEPPELIN-2136>?
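
[Editorial aside] A minimal sketch of a possible runtime workaround using sc.addPyFile instead of --files; the egg path is taken from the snippet above, while the module name fly_libs is a hypothetical guess at what the egg exposes:

%pyspark
# Sketch only: ship the egg from the driver at runtime rather than via SPARK_SUBMIT_OPTIONS.
sc.addPyFile("/home/me/models/Churn/package/build/dist/fly_libs-1.1-py2.7.egg")

import fly_libs  # hypothetical module name inside the egg
print(fly_libs.__file__)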

Regards,
Meethu Mathew


python prints "..." in the place of comments in output

2017-03-16 Thread Meethu Mathew
Hi,

The output of the following code prints unexpected dots in the result if there
is a comment in the code. Is this a bug in Zeppelin?

*Code :*

%python
v = [1,2,3]
#comment 1
#comment
print v

*output*
... ... [1, 2, 3]

Regards,
Meethu Mathew


Re: "spark ui" button in spark interpreter does not show Spark web-ui

2017-03-13 Thread Meethu Mathew
Hi,

I have noticed the same problem.

Regards,
Meethu Mathew


On Mon, Mar 13, 2017 at 9:56 AM, Xiaohui Liu <hero...@gmail.com> wrote:

> Hi,
>
> We used 0.7.1-snapshot with our Mesos cluster, almost all our needed
> features (ldap login, notebook acl control, livy/pyspark/rspark/scala,
> etc.) work pretty well.
>
> But one thing does not work for us is the 'spark ui' button does not
> response to user clicks. No errors in browser side.
>
> Anyone has met similar issues? Any suggestions about where I should check?
>
> Regards
> Xiaohui
>


Adding images in the %md interpreter

2017-03-02 Thread Meethu Mathew
Hi all,

I am trying to display images in the %md interpreter of a Zeppelin (version
0.7.0) notebook using the following code:

![](model-files/sentiment_donut_viz.png)

But I am facing the following problems:

1. I am not able to give a local path.
2. I put the file inside {zeppelin_home}/webapps/webapp and it worked, but
files or folders added to this folder (the ZEPPELIN_WAR_TEMPDIR) are deleted
after a restart.

How can I add images in the markdown interpreter without using other
web servers?
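
[Editorial aside] A minimal sketch of a commonly used workaround: render the image from a %python paragraph as an inline base64 data URI, so no web server or webapp folder is needed. The file path and display width are assumptions (Python 2 style):

%python
# Sketch only: embed the PNG as a base64 data URI and let Zeppelin render the HTML output.
import base64

with open("/home/meethu/models/sentiment_donut_viz.png", "rb") as f:  # hypothetical local path
    encoded = base64.b64encode(f.read())

print('%html <img src="data:image/png;base64,' + encoded + '" width="400"/>')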

Regards,
Meethu Mathew