Re: More than 2 notebooks in R failing with error sparkr interpreter not responding
Hi Jeff,

PFB the interpreter log.

INFO [2018-01-03 12:10:05,960] ({pool-2-thread-9} Logging.scala[logInfo]:58) - Starting HTTP Server
INFO [2018-01-03 12:10:05,961] ({pool-2-thread-9} Server.java[doStart]:272) - jetty-8.y.z-SNAPSHOT
INFO [2018-01-03 12:10:05,963] ({pool-2-thread-9} AbstractConnector.java[doStart]:338) - Started SocketConnector@0.0.0.0:58989
INFO [2018-01-03 12:10:05,963] ({pool-2-thread-9} Logging.scala[logInfo]:58) - Successfully started service 'HTTP class server' on port 58989.
INFO [2018-01-03 12:10:06,094] ({dispatcher-event-loop-1} Logging.scala[logInfo]:58) - Removed broadcast_1_piece0 on localhost:42453 in memory (size: 854.0 B, free: 511.1 MB)
INFO [2018-01-03 12:10:07,049] ({pool-2-thread-9} ZeppelinR.java[createRScript]:353) - File /tmp/zeppelin_sparkr-5046601627391341672.R created
ERROR [2018-01-03 12:10:17,051] ({pool-2-thread-9} Job.java[run]:188) - Job failed
*org.apache.zeppelin.interpreter.InterpreterException: sparkr is not responding *

R version 3.4.1 (2017-06-30) -- "Single Candle"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> args <- commandArgs(trailingOnly = TRUE)
> hashCode <- as.integer(args[1])
> port <- as.integer(args[2])
> libPath <- args[3]
> version <- as.integer(args[4])
> rm(args)
>
> print(paste("Port ", toString(port)))
[1] "Port 58063"
> print(paste("LibPath ", libPath))
[1] "LibPath /home/meethu/spark-1.6.1-bin-hadoop2.6/R/lib"
>
> .libPaths(c(file.path(libPath), .libPaths()))
> library(SparkR)

Attaching package: ‘SparkR’

The following objects are masked from ‘package:stats’:

    cov, filter, lag, na.omit, predict, sd, var

The following objects are masked from ‘package:base’:

    colnames, colnames<-, endsWith, intersect, rank, rbind, sample, startsWith, subset, summary, table, transform

> SparkR:::connectBackend("localhost", port, 6000)
A connection with
description "->localhost:58063"
class       "sockconn"
mode        "wb"
text        "binary"
opened      "opened"
can read    "yes"
can write   "yes"
>
> # scStartTime is needed by R/pkg/R/sparkR.R
> assign(".scStartTime", as.integer(Sys.time()), envir = SparkR:::.sparkREnv)
> # getZeppelinR
> *.zeppelinR = SparkR:::callJStatic("org.apache.zeppelin.spark.ZeppelinR", "getZeppelinR", hashCode)*

    at org.apache.zeppelin.spark.ZeppelinR.waitForRScriptInitialized(ZeppelinR.java:285)
    at org.apache.zeppelin.spark.ZeppelinR.request(ZeppelinR.java:227)
    at org.apache.zeppelin.spark.ZeppelinR.eval(ZeppelinR.java:176)
    at org.apache.zeppelin.spark.ZeppelinR.open(ZeppelinR.java:165)
    at org.apache.zeppelin.spark.SparkRInterpreter.open(SparkRInterpreter.java:90)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:491)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

INFO [2018-01-03 12:10:17,070] ({pool-2-thread-9} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1514961605951 finished by scheduler org.apache.zeppelin.spark.SparkRInterpreter392022746
INFO [2018-01-03 12:39:22,664] ({Spark Context Cleaner} Logging.scala[logInfo]:58) - Cleaned accumulator 2

PFB the output of the command *ps -ef | grep /usr/lib/R/bin/exec/R*

meethu 6647 6470 0 12:09 pts/1 00:00:00 /usr/lib/R/bin/exec/R --no-save --no-restore -f /tmp/zeppelin_sparkr-1100854828050763213.R --args 214655664 58063 /home/meethu/spark-1.6.1-bin-hadoop2.6/R/lib 10601
meethu 6701 6470 0 12:09 pts/1 00:00:00 /usr/lib/R/bin/exec/R --no-save --no-restore -f /tmp/zeppelin_sparkr-4152305170353311178.R --args 1642312173 58063 /home/meethu/spark-1.6.1-bin-hadoop2.6/R/lib 10601
meethu 6745 6470 0 12:10 pts/1 00:00:00 /usr/lib/R/bin/exec/R --no-save --no-restore -f /tmp/zeppelin_sparkr-5046601627391341672.R --args 1158632477 58063 /home/meethu/spark-1.6.1-bin-hadoop2.6/R/lib 10601

Regards,
Meethu Mathew

On Wed, Jan 3, 2018 at 12:56 PM, Jeff Zhang <zjf...@gmail.com> wrote:

>
> Could
More than 2 notebooks in R failing with error sparkr interpreter not responding
Hi,

I have run into a strange issue when running R notebooks in Zeppelin (0.7.2). The Spark interpreter is in per-note scoped mode and the Spark version is 1.6.2.

Please find the steps to reproduce the issue below:

1. Create a notebook (Note1) and run any R code in a paragraph. I ran the following code:

> %r
> rdf <- data.frame(c(1,2,3,4))
> colnames(rdf) <- c("myCol")
> sdf <- createDataFrame(sqlContext, rdf)
> withColumn(sdf, "newCol", sdf$myCol * 2.0)

2. Create another notebook (Note2) and run any R code in a paragraph. I ran the same code as above. Up to this point everything works fine.

3. Create a third notebook (Note3) and run any R code in a paragraph. I ran the same code. This notebook fails with the error:

> org.apache.zeppelin.interpreter.InterpreterException: sparkr is not responding

From my analysis, the process created for the sparkr interpreter is not being killed properly, and this makes every third model run throw an error. Restarting the sparkr interpreter kills the process, after which another two models can be executed successfully. In other words, the error is thrown on every third model run that uses the sparkr interpreter. We suspect this is a limitation of Zeppelin. Please help us solve this issue.

Regards,
Meethu Mathew
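To check whether R processes are left behind after a notebook finishes, the `ps -ef` output can be filtered for the temporary `/tmp/zeppelin_sparkr-*.R` scripts that Zeppelin passes to R. The following is only a minimal sketch: the function name `find_sparkr_processes` and the sample line are illustrative, and it assumes the usual `ps -ef` column order (UID, PID, PPID, C, STIME, TTY, TIME, CMD).

```python
import re

# Temporary R script names follow the pattern seen in the interpreter log.
SCRIPT_PATTERN = re.compile(r"/tmp/zeppelin_sparkr-\d+\.R")

# Illustrative sample in the assumed `ps -ef` column layout.
SAMPLE = (
    "meethu 6647 6470 0 12:09 pts/1 00:00:00 /usr/lib/R/bin/exec/R "
    "--no-save --no-restore -f /tmp/zeppelin_sparkr-1100854828050763213.R "
    "--args 214655664 58063 /home/meethu/spark-1.6.1-bin-hadoop2.6/R/lib 10601\n"
    "meethu 6800 6470 0 12:11 pts/1 00:00:00 grep /usr/lib/R/bin/exec/R\n"
)


def find_sparkr_processes(ps_output):
    """Return (pid, script_path) pairs for R processes running a Zeppelin SparkR script."""
    found = []
    for line in ps_output.splitlines():
        match = SCRIPT_PATTERN.search(line)
        fields = line.split()
        # The PID is the second column of `ps -ef` output.
        if match and len(fields) >= 8 and fields[1].isdigit():
            found.append((int(fields[1]), match.group(0)))
    return found
```

Calling `find_sparkr_processes` on the output captured before and after an interpreter restart would show which PIDs survive; the `grep` line in the sample is skipped because it does not reference a temporary script.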
Re: Zeppelin framework is not getting unregistered from Mesos
Hi Moon,

Yes, it's fixed in 0.7.1. Thank you.

Regards,
Meethu Mathew

On Wed, Apr 26, 2017 at 10:42 PM, moon soo Lee <m...@apache.org> wrote:

> Some bugs related to interpreter process management has been fixed in 0.7.1 release [1]. Could you try 0.7.1 or master branch and see if the same problem occurs?
>
> Thanks,
> moon
>
> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1832
>
> On Wed, Apr 26, 2017 at 1:13 AM Meethu Mathew <meethu.mat...@flytxt.com> wrote:
>
>> Hi,
>>
>> We have connected our zeppelin to mesos. But the issue we are facing is that Zeppelin framework is not getting unregistered from Mesos even if the notebook is closed.
>>
>> Another problem is if the user logout from zeppelin, the SparkContext is getting stopped. When the same user login again, it creates another SparkContext and then the previous SparkContext will become a dead process and exist.
>>
>> Is it a bug of zeppelin or is there any other proper way to unbind the zeppelin framework?
>>
>> Zeppelin version is 0.7.0
>>
>> Regards,
>> Meethu Mathew
Zeppelin framework is not getting unregistered from Mesos
Hi,

We have connected our Zeppelin to Mesos. The issue we are facing is that the Zeppelin framework is not unregistered from Mesos even after the notebook is closed.

Another problem is that if the user logs out of Zeppelin, the SparkContext is stopped. When the same user logs in again, another SparkContext is created, and the previous SparkContext remains as a dead process.

Is this a bug in Zeppelin, or is there a proper way to unbind the Zeppelin framework?

The Zeppelin version is 0.7.0.

Regards,
Meethu Mathew
Re: UnicodeDecodeError in zeppelin 0.7.1
Hi,

Thanks for the response.

@ moon soo Lee: The interpreter setting is the same in 0.7.0 and 0.7.1.
@ Felix Cheung: The Python version is the same.

The code is as follows:

*PYSPARK*

def textPreProcessor(text):
    for w in text.split():
        regex = re.compile('[%s]' % re.escape(string.punctuation))
        *no_punctuation = unicode(regex.sub(' ', w),'utf8')*
        tokens = word_tokenize(no_punctuation)
        lowercased = [t.lower() for t in tokens]
        no_stopwords = [w for w in lowercased if not w in stopwordsX]
        stemmed = [stemmerX.stem(w) for w in no_stopwords]
    return [w for w in stemmed if w]

- docs = sc.textFile(hdfs_path+training_data, *use_unicode=False*).repartition(96)
- docs.map(lambda features: sentimentObject.textPreProcessor(features.split(delimiter)[text_colum])).count()

*Error:*

- UnicodeDecodeError: 'utf8' codec can't decode byte 0x9b in position 17: invalid start byte
- The same error occurs when *use_unicode=False* is not used.
- The error changes to *'ascii' codec can't decode byte 0x97 in position 3: ordinal not in range(128)* when *no_punctuation = regex.sub(' ', w)* is used instead of *no_punctuation = unicode(regex.sub(' ', w),'utf8')*.

*Note: In version 0.7.0 the code was running fine without using use_unicode and unicode(regex.sub(' ', w),'utf8').*

*PYTHON*

def textPreProcessor(text_column):
    processed_text = []
    for text in text_column:
        for w in text.split():
            regex = re.compile('[%s]' % re.escape(string.punctuation))  # regular expression for punctuation
            no_punctuation = unicode(regex.sub(' ', text_),'utf8')
            tokens = word_tokenize(no_punctuation)
            lowercased = [t.lower() for t in tokens]
            no_stopwords = [w for w in lowercased if not w in stopwordsX]
            stemmed = [stemmerX.stem(w) for w in no_stopwords]
            processed_text.append([w for w in stemmed if w])
    return processed_text

- new_training = pd.read_csv(training_data, header=None, delimiter=delimiter, error_bad_lines=False, usecols=[label_column,text_column], names=['label','msg']).dropna()
- new_training['processed_msg'] = textPreProcessor(new_training['msg'])

This Python code works and I get a result. In version 0.7.0, I get output without using the unicode function. I hope the problem is clear now.

Regards,
Meethu Mathew

On Fri, Apr 21, 2017 at 3:07 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:

> And are they running with the same Python version? What is the Python version?
>
> From: moon soo Lee <m...@apache.org>
> Sent: Thursday, April 20, 2017 11:53 AM
> Subject: Re: UnicodeDecodeError in zeppelin 0.7.1
> To: <users@zeppelin.apache.org>
>
> Hi,
>
> 0.7.1 didn't changed any encoding type as far as i know. One difference is 0.7.1 official artifact has been built with JDK8 while 0.7.0 built with JDK7 (we'll use JDK7 to build upcoming 0.7.2 binary). But i'm not sure that can make pyspark and spark encoding type changes.
>
> Do you have exactly the same interpreter setting in 0.7.1 and 0.7.0?
>
> Thanks,
> moon
>
> On Wed, Apr 19, 2017 at 5:30 AM Meethu Mathew <meethu.mat...@flytxt.com> wrote:
>
>> Hi,
>>
>> I just migrated from zeppelin 0.7.0 to zeppelin 0.7.1 and I am facing this error while creating an RDD(in pyspark).
>>
>>> UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte
>>
>> I was able to create the RDD without any error after adding use_unicode=False as follows
>>
>>> sc.textFile("file.csv",use_unicode=False)
>>
>> But it fails when I try to stem the text. I am getting similar error when trying to apply stemming to the text using python interpreter.
>>
>>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
>>
>> All these code is working in 0.7.0 version. There is no change in the dataset and code. Is there any change in the encoding type in the new version of zeppelin?
>>
>> Regards,
>> Meethu Mathew
UnicodeDecodeError in zeppelin 0.7.1
Hi,

I just migrated from Zeppelin 0.7.0 to Zeppelin 0.7.1 and I am facing this error while creating an RDD (in pyspark):

> UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte

I was able to create the RDD without any error after adding use_unicode=False as follows:

> sc.textFile("file.csv",use_unicode=False)

But it fails when I try to stem the text. I get a similar error when trying to apply stemming to the text using the python interpreter:

> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

All of this code works in version 0.7.0. There is no change in the dataset or the code. Is there any change in the encoding type in the new version of Zeppelin?

Regards,
Meethu Mathew
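The bytes reported in these errors (0x80, 0x9b, 0x97) cannot start a UTF-8 sequence, but they are printable characters in Windows-1252, so one possible reading is that part of the dataset is not UTF-8 encoded. That is an assumption the thread does not confirm. Under that assumption, a decoding helper with an explicit fallback might look like the sketch below. It is written for Python 3 (the code in the thread is Python 2), and the name `decode_text` and the fallback order are illustrative choices.

```python
def decode_text(raw, encodings=("utf-8", "cp1252")):
    """Decode bytes by trying each encoding in order.

    If every encoding fails, decode as UTF-8 and substitute U+FFFD
    for the bytes that cannot be decoded.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")
```

Decoding once, at the point where the bytes enter the program, avoids the implicit ASCII decoding that produces the `'ascii' codec can't decode byte` variant of the error further down the pipeline.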
sqlContext not available as hiveContext in notebook
Hi,

I am running Zeppelin 0.7.0. The sqlContext already created in the Zeppelin notebook returns a SQLContext, even though my Spark is built with Hive. "zeppelin.spark.useHiveContext" in the Spark properties is set to true.

As mentioned in https://issues.apache.org/jira/browse/ZEPPELIN-1728, I tried

hc = HiveContext.getOrCreate(sc)

but it still returns a SQLContext. My pyspark shell and Jupyter notebook return a HiveContext without any extra steps. How can I get a HiveContext in the Zeppelin notebook?

Regards,
Meethu Mathew
Separate interpreter running scope Per user or Per Note documentation
Hi,

I couldn't find the documentation for the feature "Separate interpreter running scope Per user or Per Note" at https://zeppelin.apache.org/docs/0.7.0/manual/interpreters.html#interpreter-binding-mode . Can somebody help me understand the per-note scoped mode and the per-user scoped mode?

Regards,
Meethu Mathew
Auto completion for defined variable names
Hi,

Is there any way to get auto-completion or suggestions for defined variable names? In Jupyter notebooks, variables show up under suggestions once they are defined. Ctrl+. also gives awkward suggestions for related functions. For a Spark data frame, it won't show the relevant functions. Please improve the suggestion functionality.

Regards,
Meethu Mathew
--files in SPARK_SUBMIT_OPTIONS not working - ZEPPELIN-2136
Hi,

According to the Zeppelin documentation, to pass a Python package to the Zeppelin pyspark interpreter, you can export it through the --files option in SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.sh.

When I add a .egg file through the --files option in SPARK_SUBMIT_OPTIONS, the Zeppelin notebook does not throw an error, but I am not able to import the module inside the notebook.

The Spark version is 1.6.2 and the zeppelin-env.sh file (version 0.7.0) looks like:

export SPARK_HOME=/home/me/spark-1.6.1-bin-hadoop2.6
export SPARK_SUBMIT_OPTIONS="--jars /home/me/spark-csv-1.5.0-s_2.10.jar,/home/me/commons-csv-1.4.jar --files /home/me/models/Churn/package/build/dist/fly_libs-1.1-py2.7.egg"

Is there any progress on the ticket ZEPPELIN-2136 <https://issues.apache.org/jira/browse/ZEPPELIN-2136>?

Regards,
Meethu Mathew
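For background, spark-submit's `--files` option only copies files into the working directory of each executor, while `--py-files` is the option that places .zip, .egg, or .py files on the PYTHONPATH. Whether that difference is the cause here is not confirmed in this thread. The sketch below only illustrates the underlying Python behaviour: an egg is a zip archive, and its modules become importable once the archive itself is on `sys.path`. The archive name, the module name `demo_fly_module`, and both helper functions are hypothetical.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def make_demo_egg(directory, module_name="demo_fly_module"):
    """Create a tiny zip archive with an .egg name containing one module."""
    egg_path = os.path.join(directory, "demo_libs-0.1-py2.7.egg")
    with zipfile.ZipFile(egg_path, "w") as archive:
        archive.writestr(module_name + ".py", "ANSWER = 42\n")
    return egg_path


def import_from_egg(egg_path, module_name):
    """Put the archive on sys.path, then import a module stored inside it."""
    if egg_path not in sys.path:
        sys.path.insert(0, egg_path)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


def demo():
    """Build the demo archive in a temporary directory and import from it."""
    egg_path = make_demo_egg(tempfile.mkdtemp())
    return import_from_egg(egg_path, "demo_fly_module").ANSWER
```

In a pyspark paragraph, `sc.addPyFile(path)` is the PySpark call that ships an archive to the executors and adds it to the Python path, which may be worth trying as an alternative to `--files`.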
python prints "..." in the place of comments in output
Hi,

The output of the following code contains unexpected dots if there is a comment in the code. Is it a bug in Zeppelin?

*Code:*

%python
v = [1,2,3]
#comment 1
#comment
print v

*Output:*

...
...
[1, 2, 3]

Regards,
Meethu Mathew
Re: "spark ui" button in spark interpreter does not show Spark web-ui
Hi,

I have noticed the same problem.

Regards,
Meethu Mathew

On Mon, Mar 13, 2017 at 9:56 AM, Xiaohui Liu <hero...@gmail.com> wrote:

> Hi,
>
> We used 0.7.1-snapshot with our Mesos cluster, almost all our needed features (ldap login, notebook acl control, livy/pyspark/rspark/scala, etc.) work pretty well.
>
> But one thing does not work for us is the 'spark ui' button does not response to user clicks. No errors in browser side.
>
> Anyone has met similar issues? Any suggestions about where I should check?
>
> Regards
> Xiaohui
Adding images in the %md interpreter
Hi all,

I am trying to display images in the %md interpreter of a Zeppelin (version 0.7.0) notebook using the following code:

![](model-files/sentiment_donut_viz.png)

But I am facing the following problems:

1. I am not able to give a local path.
2. I put the file inside {zeppelin_home}/webapps/webapp and it worked. But files and folders added to this folder, which is the ZEPPELIN_WAR_TEMPDIR, are deleted after a restart.

How can I add images in the Markdown interpreter without using other web servers?

Regards,
Meethu Mathew
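One approach that avoids serving the file at all is to embed the image bytes directly in the paragraph output as a base64 data URI. The sketch below assumes a %python paragraph whose printed output starts with `%html`, which Zeppelin's display system renders as HTML; whether the %md interpreter itself accepts data URIs is not verified here. The function name `image_as_html` is illustrative.

```python
import base64


def image_as_html(image_bytes, mime_type="image/png"):
    """Build a Zeppelin %html output string embedding the image as a data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return '%html <img src="data:{};base64,{}" />'.format(mime_type, encoded)
```

In a notebook paragraph this would be used by reading the file in binary mode and printing the returned string, for example `print(image_as_html(open(path, "rb").read()))`. Since the image travels with the paragraph output, nothing has to be placed under ZEPPELIN_WAR_TEMPDIR.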