[jira] [Created] (ZEPPELIN-2740) Pyspark not working error thrown after installing Zeppelin
Nassir created ZEPPELIN-2740:
Summary: Pyspark not working error thrown after installing Zeppelin
Key: ZEPPELIN-2740
URL: https://issues.apache.org/jira/browse/ZEPPELIN-2740
Project: Zeppelin
Issue Type: Bug
Reporter: Nassir

Hi, I get the following error when running a simple cell:

%pyspark
x = 5

error: pyspark is not responding

Some of the log output from the command window is below, in case it is useful:

ZeppelinServer
DEBUG [2017-07-06 11:16:21,207] ({Thread-39} InterpreterSettingManager.java[getInterpreterSessionKey]:831) - Interpreter session key: shared_session, for note: 2CNACUTPT, user: anonymous, InterpreterSetting Name: spark
DEBUG [2017-07-06 11:16:21,209] ({Thread-41} InterpreterSettingManager.java[getInterpreterSessionKey]:831) - Interpreter session key: shared_session, for note: 2CNACUTPT, user: anonymous, InterpreterSetting Name: spark
DEBUG [2017-07-06 11:16:21,230] ({pool-4-thread-1} AppendOutputRunner.java[run]:91) - Processing time for append-output took 0 milliseconds
DEBUG [2017-07-06 11:16:21,231] ({pool-4-thread-1} AppendOutputRunner.java[run]:107) - Processing size for append-output is 725 characters
DEBUG [2017-07-06 11:16:21,590] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) - INFO [2017-07-06 11:16:21,590] ({pool-2-thread-4} Logging.scala[logInfo]:54) - Starting job: count at :30
DEBUG [2017-07-06 11:16:21,603] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) - INFO [2017-07-06 11:16:21,603] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Got job 0 (count at :30) with 8 output partitions
DEBUG [2017-07-06 11:16:21,603] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) - INFO [2017-07-06 11:16:21,603] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Final stage: ResultStage 0 (count at :30)
DEBUG [2017-07-06 11:16:21,604] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) - INFO [2017-07-06 11:16:21,604] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Parents of final stage: List()
DEBUG [2017-07-06 11:16:21,607] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) - INFO [2017-07-06 11:16:21,606] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Missing parents: List()
DEBUG [2017-07-06 11:16:21,610] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) - INFO [2017-07-06 11:16:21,610] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Submitting ResultStage 0 (ParallelCollectionRDD[0] at parallelize at :27), which has no missing parents
DEBUG [2017-07-06 11:16:21,708] ({Thread-39} InterpreterSettingManager.java[getInterpreterSessionKey]:831) - Interpreter session key: shared_session, for note: 2CNACUTPT, user: anonymous, InterpreterSetting Name: spark
DEBUG [2017-07-06 11:16:21,712] ({Thread-41} InterpreterSettingManager.java[getInterpreterSessionKey]:831) - Interpreter session key: shared_session, for note: 2CNACUTPT, user: anonymous, InterpreterSetting Name: spark
DEBUG [2017-07-06 11:16:21,735] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) - INFO [2017-07-06 11:16:21,735] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Block broadcast_0 stored as values in memory (estimated size 1216.0 B, free 408.9 MB)
DEBUG [2017-07-06 11:16:21,767] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) - INFO [2017-07-06 11:16:21,767] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Block broadcast_0_piece0 stored as bytes in memory (estimated size 879.0 B, free 408.9 MB)
DEBUG [2017-07-06 11:16:21,770] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) - INFO [2017-07-06 11:16:21,770] ({dispatcher-event-loop-4} Logging.scala[logInfo]:54) - Added broadcast_0_piece0 in memory on 192.168.11.1:7299 (size: 879.0 B, free: 408.9 MB)
DEBUG [2017-07-06 11:16:21,775] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) - INFO [2017-07-06 11:16:21,774] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Created broadcast 0 from broadcast at DAGScheduler.scala:996
DEBUG [2017-07-06 11:16:21,778] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) - INFO [2017-07-06 11:16:21,778] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Submitting 8 missing tasks from ResultStage 0 (ParallelCollectionRDD[0] at parallelize at :27)
DEBUG [2017-07-06 11:16:21,779] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) - INFO [2017-07-06 11:16:21,779] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Adding task set 0.0 with 8 tasks
DEBUG [2017-07-06 11:16:21,788] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) - INFO [2017-07-06
[jira] [Created] (ZEPPELIN-2738) Zeppelin Interpreter crashing
Nassir created ZEPPELIN-2738:
Summary: Zeppelin Interpreter crashing
Key: ZEPPELIN-2738
URL: https://issues.apache.org/jira/browse/ZEPPELIN-2738
Project: Zeppelin
Issue Type: Bug
Reporter: Nassir

The Zeppelin interpreter crashes on Windows, where I run Zeppelin, when I try to save edits. The first edit saves fine, but if I try to save another edit, Zeppelin hangs and requires a full restart. In effect, I can make only one edit through the browser per Zeppelin start.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ZEPPELIN-2737) Unable to run pyspark in Zeppelin 0.7.2
Nassir created ZEPPELIN-2737:
Summary: Unable to run pyspark in Zeppelin 0.7.2
Key: ZEPPELIN-2737
URL: https://issues.apache.org/jira/browse/ZEPPELIN-2737
Project: Zeppelin
Issue Type: Bug
Reporter: Nassir

I have installed Zeppelin and can run Scala Spark code without any problems. However, if I try to add a reference to Python, either by creating a system environment variable or by specifying the Python path in the Zeppelin interpreter settings, not only does pyspark still not work, but Spark also throws a Java null exception.

Can anyone advise on how to set up Apache Zeppelin on Windows? I managed to get it working on another laptop, but it fails on this second machine.
[jira] [Created] (ZEPPELIN-2678) Pyspark cell fails to execute, but normal spark code in scala executes fine
Nassir created ZEPPELIN-2678:
Summary: Pyspark cell fails to execute, but normal spark code in scala executes fine
Key: ZEPPELIN-2678
URL: https://issues.apache.org/jira/browse/ZEPPELIN-2678
Project: Zeppelin
Issue Type: Bug
Reporter: Nassir

Hi, I have installed Zeppelin on Windows and can now run cells with the default Spark interpreter, i.e. Scala code. However, when I try to execute a pyspark cell, e.g.

%pyspark
x = 5

I get the error: "failed to start pyspark"

Any ideas on what is going wrong here? I can see %pyspark listed as an interpreter under Spark on the Interpreter page. Do I need to set some environment variables? I had Anaconda installed for running Python but did not add any path environment variables.

Thanks
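Both this report and ZEPPELIN-2737 come down to which Python executable the Spark interpreter launches for %pyspark. Zeppelin's Spark interpreter documentation describes a `zeppelin.pyspark.python` interpreter property and the `PYSPARK_PYTHON` environment variable for this; the precedence used below is an assumption for illustration, not taken from Zeppelin's source. The hypothetical helper `find_python_for_pyspark` reports which setting would be picked and whether it resolves to an executable, which is a quick way to check that an Anaconda `python.exe` is actually reachable:

```python
import os
import shutil


def find_python_for_pyspark(env=None, interpreter_property=None):
    """Guess which Python executable %pyspark would launch.

    Returns (source, resolved_path); resolved_path is None when the
    configured command cannot be found. The precedence below (interpreter
    property, then PYSPARK_PYTHON, then plain "python") is an assumption
    made for illustration.
    """
    env = os.environ if env is None else env
    candidates = [
        ("zeppelin.pyspark.python", interpreter_property),
        ("PYSPARK_PYTHON", env.get("PYSPARK_PYTHON")),
        ("default", "python"),
    ]
    for source, command in candidates:
        if command:
            # shutil.which accepts bare names (searched on PATH, using PATHEXT
            # on Windows) as well as full paths to an executable.
            return source, shutil.which(command, path=env.get("PATH"))
    return "default", None  # not reached: the "python" fallback is always set


if __name__ == "__main__":
    chosen_by, resolved = find_python_for_pyspark()
    print("Python chosen via %s: %s" % (chosen_by, resolved or "NOT FOUND"))
```

Run it from the same command prompt Zeppelin is started from. A `None` path means the configured name does not resolve to an executable, which would be consistent with "failed to start pyspark", though it does not prove that is the cause.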
[jira] [Created] (ZEPPELIN-2677) Zeppelin on windows does throws error after installation and trying to run a cell
Nassir created ZEPPELIN-2677:
Summary: Zeppelin on windows does throws error after installation and trying to run a cell
Key: ZEPPELIN-2677
URL: https://issues.apache.org/jira/browse/ZEPPELIN-2677
Project: Zeppelin
Issue Type: Bug
Reporter: Nassir

Hi, I have installed Zeppelin on Windows 10. Spark was already installed and runs correctly. I start Zeppelin from the command line and then open localhost:8080 in the browser. I can see the Zeppelin home screen, the indicator in the right corner is green, and I can create new notebooks. However, when I try to run a cell I see the following error:

org.apache.zeppelin.interpreter.InterpreterException: The filename, directory name, or volume label syntax is incorrect.
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterManagedProcess.start(RemoteInterpreterManagedProcess.java:143)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.reference(RemoteInterpreterProcess.java:73)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:265)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:430)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:111)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:387)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
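The exception text is Windows' standard message for an invalid path name (system error 123), raised while `RemoteInterpreterManagedProcess.start` launches the interpreter process, which suggests the launch command was built from a malformed path. One common culprit (an assumption, not confirmed by this report) is an environment variable such as `JAVA_HOME` or `SPARK_HOME` that contains stray quotes, characters Windows forbids in paths, or spaces that unquoted `.cmd` scripts split on. The hypothetical helper below flags such values; the list of variables it checks is illustrative only:

```python
import os

# Characters Windows rejects in file and directory names. ':' is checked
# separately because it is legal as the drive separator ("C:").
INVALID_PATH_CHARS = set('<>"|?*')

# Illustrative selection of variables involved in launching interpreters.
VARS_TO_CHECK = ("JAVA_HOME", "SPARK_HOME", "ZEPPELIN_HOME")


def suspicious_path_vars(env=None, names=VARS_TO_CHECK):
    """Return {variable: [problems]} for path variables that look malformed."""
    env = os.environ if env is None else env
    problems = {}
    for name in names:
        value = env.get(name)
        if not value:
            continue
        issues = []
        bad = sorted(INVALID_PATH_CHARS.intersection(value))
        if bad:
            issues.append("invalid characters: " + "".join(bad))
        if ":" in value[2:]:
            issues.append("':' outside the drive letter")
        if " " in value:
            issues.append("contains spaces (may break unquoted .cmd scripts)")
        if issues:
            problems[name] = issues
    return problems


if __name__ == "__main__":
    for var, found in suspicious_path_vars().items():
        print(var, "->", "; ".join(found))
```

A clean result does not rule out other causes; it only removes one easy-to-check suspect.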
[jira] [Created] (ZEPPELIN-2644) Import IPython Or any Databricks supported format Notebooks into Zeppelin
Nassir created ZEPPELIN-2644:
Summary: Import IPython Or any Databricks supported format Notebooks into Zeppelin
Key: ZEPPELIN-2644
URL: https://issues.apache.org/jira/browse/ZEPPELIN-2644
Project: Zeppelin
Issue Type: Bug
Reporter: Nassir

Can anyone help with importing IPython notebooks into Apache Zeppelin? I would also like to be able to use any of the Databricks export formats, such as DBC archive, IPython notebook, HTML, or source file. Can Zeppelin import any of these?

Thanks,
Nassir
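Since Databricks can export to the IPython notebook format listed above, one route is to convert `.ipynb` files into a Zeppelin note. The sketch below assumes a minimal subset of Zeppelin's note JSON (a `name` plus a `paragraphs` list whose entries carry `text`); real exported notes contain more fields, and whether a given Zeppelin version accepts this reduced form on import is an assumption to verify. The `.ipynb` side follows nbformat 4 (`cells`, `cell_type`, `source`). Code cells get a configurable interpreter prefix (defaulting to `%pyspark`), markdown cells become `%md`, and raw cells are skipped:

```python
import json
import sys


def ipynb_to_zeppelin_note(ipynb_text, name="Imported notebook",
                           code_interpreter="%pyspark"):
    """Convert nbformat-4 JSON text into a minimal Zeppelin-style note dict."""
    notebook = json.loads(ipynb_text)
    paragraphs = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines or one string
            source = "".join(source)
        kind = cell.get("cell_type")
        if kind == "code":
            prefix = code_interpreter
        elif kind == "markdown":
            prefix = "%md"
        else:
            continue  # raw cells have no obvious Zeppelin equivalent
        paragraphs.append({"text": prefix + "\n" + source})
    return {"name": name, "paragraphs": paragraphs}


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8") as f:
        note = ipynb_to_zeppelin_note(f.read())
    print(json.dumps(note, indent=2))
```

The printed JSON can be saved to a file and tried with "Import note" on the Zeppelin home page. DBC archives and HTML exports are not covered by this sketch.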
[jira] [Created] (ZEPPELIN-2608) Zeppelin not displaying d3 visualisation in cell
Nassir created ZEPPELIN-2608:
Summary: Zeppelin not displaying d3 visualisation in cell
Key: ZEPPELIN-2608
URL: https://issues.apache.org/jira/browse/ZEPPELIN-2608
Project: Zeppelin
Issue Type: Bug
Reporter: Nassir

When I run the code below in a Scala cell that references a rawJson variable, the bubble graph is not displayed in the cell. Instead, either nothing is displayed or the bubble graph appears behind my cells:

%spark
print(s"""%html
<style>
circle {
  fill: rgb(31, 119, 180);
  fill-opacity: 0.5;
  stroke: rgb(31, 119, 180);
  stroke-width: 1px;
}
.leaf circle {
  fill: #ff7f0e;
  fill-opacity: 1;
}
text {
  font: 14px sans-serif;
}
</style>
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.5/d3.min.js"></script>
<script>
var json = {
  "name": "data",
  "children": [
    {
      "name": "topics",
      "children": [
        ${rawJson}
      ]
    }
  ]
};

var r = 1500,
    format = d3.format(",d"),
    fill = d3.scale.category20c();

var bubble = d3.layout.pack()
    .sort(null)
    .size([r, r])
    .padding(1.5);

var vis = d3.select("body").append("svg")
    .attr("width", r)
    .attr("height", r)
    .attr("class", "bubble");

var node = vis.selectAll("g.node")
    .data(bubble.nodes(classes(json))
    .filter(function(d) { return !d.children; }))
  .enter().append("g")
    .attr("class", "node")
    .attr("transform", function(d) { return "translate(" + d.x + "," + d.y + ")"; });

var color = d3.scale.category20();

node.append("title")
    .text(function(d) { return d.className + ": " + format(d.value); });

node.append("circle")
    .attr("r", function(d) { return d.r; })
    .style("fill", function(d) { return color(d.topicName); });

var text = node.append("text")
    .attr("text-anchor", "middle")
    .attr("dy", ".3em")
    .text(function(d) { return d.className.substring(0, d.r / 3); });

text.append("tspan")
    .attr("dy", "1.2em")
    .attr("x", 0)
    .text(function(d) { return Math.ceil(d.value * 1) / 1; });

// Returns a flattened hierarchy containing all leaf nodes under the root.
function classes(root) {
  var classes = [];

  function recurse(term, node) {
    if (node.children) node.children.forEach(function(child) { recurse(node.term, child); });
    else classes.push({topicName: node.topicId, className: node.term, value: node.probability});
  }

  recurse(null, root);
  return {children: classes};
}
</script>
""")

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
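In the code above, `d3.select("body").append("svg")` attaches the chart to the end of the page body rather than to this paragraph's output, which likely explains the graph appearing behind the cells. A sketch of the alternative, written in Python for a %pyspark paragraph, creates a `<div>` with a unique id in the same %html output and selects that instead. It assumes, as the original cell does, that Zeppelin executes the `<script>` tags in %html output, and that each entry of the hypothetical `topics` list carries `term` and `probability` fields like the objects in `rawJson`. Unlike the original, it lets d3's pack layout read `probability` through `.value()` instead of flattening with `classes()`:

```python
import json
import uuid
from string import Template

# JavaScript for d3 v3 (the version the original cell loads). ${data},
# ${size} and ${div_id} are filled in by string.Template.
_CHART_JS = Template("""
var data = ${data};
var r = ${size};
var pack = d3.layout.pack()
    .sort(null)
    .size([r, r])
    .padding(1.5)
    .value(function(d) { return d.probability; });
var leaves = pack.nodes(data).filter(function(d) { return !d.children; });
var node = d3.select("#${div_id}").append("svg")
    .attr("width", r)
    .attr("height", r)
  .selectAll("g.node").data(leaves)
  .enter().append("g")
    .attr("class", "node")
    .attr("transform", function(d) { return "translate(" + d.x + "," + d.y + ")"; });
node.append("circle").attr("r", function(d) { return d.r; });
node.append("title").text(function(d) { return d.term + ": " + d.probability; });
""")


def bubble_chart_html(topics, size=500, div_id=None):
    """Build %html output that draws a bubble chart inside its own <div>."""
    div_id = div_id or "bubble-" + uuid.uuid4().hex
    tree = {"name": "data", "children": [{"name": "topics", "children": topics}]}
    # Escape "</" so a topic term cannot close the <script> element early.
    data = json.dumps(tree).replace("</", "<\\/")
    script = _CHART_JS.substitute(data=data, size=int(size), div_id=div_id)
    return (
        "%html\n"
        '<div id="' + div_id + '"></div>\n'
        '<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.5/d3.min.js"></script>\n'
        "<script>" + script + "</script>\n"
    )
```

In a %pyspark paragraph this would be used as `print(bubble_chart_html(topics))`. The unique id also keeps two charts on the same page from drawing into each other's containers.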
[jira] [Created] (ZEPPELIN-2398) comments in %pyspark mode do not highlight correctly as e.g. green
Nassir created ZEPPELIN-2398:
Summary: comments in %pyspark mode do not highlight correctly as e.g. green
Key: ZEPPELIN-2398
URL: https://issues.apache.org/jira/browse/ZEPPELIN-2398
Project: Zeppelin
Issue Type: Bug
Reporter: Nassir

Within a %pyspark cell, comments are not highlighted correctly (e.g. in green).
[jira] [Created] (ZEPPELIN-2351) View Apache Spark UI when using Zeppelin on EMR cluster and port forwarding 4040
Nassir created ZEPPELIN-2351:
Summary: View Apache Spark UI when using Zeppelin on EMR cluster and port forwarding 4040
Key: ZEPPELIN-2351
URL: https://issues.apache.org/jira/browse/ZEPPELIN-2351
Project: Zeppelin
Issue Type: Bug
Reporter: Nassir

Hi, I am trying to view the Spark UI to get an idea of resource usage, for debugging, etc. After creating an SSH tunnel and forwarding port 4040, I can view the "Zeppelin Application UI"; however, the pages do not load correctly and it appears as if the CSS is not loaded, so navigation does not work and the pages are difficult to understand. I am basically trying to view the Spark UI for Spark running in a Zeppelin installation on an EMR cluster.
[jira] [Created] (ZEPPELIN-2254) Select certain fields (long text fields) from a table results in lots of text showing under the formatted table when using z.show()
Nassir created ZEPPELIN-2254:
Summary: Select certain fields (long text fields) from a table results in lots of text showing under the formatted table when using z.show()
Key: ZEPPELIN-2254
URL: https://issues.apache.org/jira/browse/ZEPPELIN-2254
Project: Zeppelin
Issue Type: Bug
Reporter: Nassir

Please can you help resolve this issue? When I make a SQL selection or display the contents of a DataFrame, a lot of text sometimes shows under the nicely formatted tables. This makes it extremely difficult to work, as the text takes up a huge amount of space in the cell. I'm not sure why this additional text shows. (I am working in pyspark.)
[jira] [Created] (ZEPPELIN-2143) Unable to upload JSON files greater than 4MB
Nassir created ZEPPELIN-2143:
Summary: Unable to upload JSON files greater than 4MB
Key: ZEPPELIN-2143
URL: https://issues.apache.org/jira/browse/ZEPPELIN-2143
Project: Zeppelin
Issue Type: Bug
Environment: Apache Zeppelin running on Amazon EC2 Cluster
Reporter: Nassir
Priority: Blocker

I have managed to increase the upload limit on Apache Zeppelin from 1 MB to 4 MB by following this issue: https://issues.apache.org/jira/browse/ZEPPELIN-1979

However, why is there such a small restriction on file upload size? How can I increase the upload limit beyond 4 MB? Are there any upload restrictions for files from external URLs, such as GitHub or S3?
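Assuming the limit raised in ZEPPELIN-1979 is Zeppelin's `zeppelin.websocket.max.text.message.size` setting (note imports travel to the server over a WebSocket, and the value is given in bytes), going beyond 4 MB would mean writing a larger value into `conf/zeppelin-site.xml` and restarting Zeppelin. The hypothetical helper below only does the megabyte-to-byte conversion and emits the property block; whether other limits (browser memory, proxies in front of Zeppelin) apply beyond that is not established here:

```python
# Assumption: the limit raised in ZEPPELIN-1979 is this property, in bytes.
PROPERTY_NAME = "zeppelin.websocket.max.text.message.size"


def websocket_limit_property(megabytes):
    """Return a zeppelin-site.xml <property> block for a limit in megabytes."""
    if megabytes <= 0:
        raise ValueError("megabytes must be positive")
    size_bytes = int(megabytes * 1024 * 1024)  # 1 MB taken as 1024 * 1024 bytes
    return (
        "<property>\n"
        "  <name>%s</name>\n"
        "  <value>%d</value>\n"
        "</property>" % (PROPERTY_NAME, size_bytes)
    )


if __name__ == "__main__":
    print(websocket_limit_property(16))
```

For example, 16 MB becomes 16777216 bytes. For large data files, as opposed to notes, reading them directly from S3 inside a paragraph avoids the browser upload path altogether.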