[jira] [Created] (ZEPPELIN-2740) Pyspark not working error thrown after installing Zeppelin

2017-07-06 Thread Nassir (JIRA)
Nassir created ZEPPELIN-2740:


 Summary: Pyspark not working error thrown after installing Zeppelin
 Key: ZEPPELIN-2740
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2740
 Project: Zeppelin
  Issue Type: Bug
Reporter: Nassir


Hi,

I get this error message when running a simple script cell:

%pyspark
x = 5

error: pyspark is not responding

Some log output from the command window is below, in case it is useful:

 ZeppelinServer
DEBUG [2017-07-06 11:16:21,207] ({Thread-39} InterpreterSettingManager.java[getInterpreterSessionKey]:831) - Interpreter session key: shared_session, for note: 2CNACUTPT, user: anonymous, InterpreterSetting Name: spark
DEBUG [2017-07-06 11:16:21,209] ({Thread-41} InterpreterSettingManager.java[getInterpreterSessionKey]:831) - Interpreter session key: shared_session, for note: 2CNACUTPT, user: anonymous, InterpreterSetting Name: spark
DEBUG [2017-07-06 11:16:21,230] ({pool-4-thread-1} AppendOutputRunner.java[run]:91) - Processing time for append-output took 0 milliseconds
DEBUG [2017-07-06 11:16:21,231] ({pool-4-thread-1} AppendOutputRunner.java[run]:107) - Processing size for append-output is 725 characters
DEBUG [2017-07-06 11:16:21,590] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) -  INFO [2017-07-06 11:16:21,590] ({pool-2-thread-4} Logging.scala[logInfo]:54) - Starting job: count at <console>:30
DEBUG [2017-07-06 11:16:21,603] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) -  INFO [2017-07-06 11:16:21,603] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Got job 0 (count at <console>:30) with 8 output partitions
DEBUG [2017-07-06 11:16:21,603] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) -  INFO [2017-07-06 11:16:21,603] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Final stage: ResultStage 0 (count at <console>:30)
DEBUG [2017-07-06 11:16:21,604] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) -  INFO [2017-07-06 11:16:21,604] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Parents of final stage: List()
DEBUG [2017-07-06 11:16:21,607] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) -  INFO [2017-07-06 11:16:21,606] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Missing parents: List()
DEBUG [2017-07-06 11:16:21,610] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) -  INFO [2017-07-06 11:16:21,610] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Submitting ResultStage 0 (ParallelCollectionRDD[0] at parallelize at <console>:27), which has no missing parents
DEBUG [2017-07-06 11:16:21,708] ({Thread-39} InterpreterSettingManager.java[getInterpreterSessionKey]:831) - Interpreter session key: shared_session, for note: 2CNACUTPT, user: anonymous, InterpreterSetting Name: spark
DEBUG [2017-07-06 11:16:21,712] ({Thread-41} InterpreterSettingManager.java[getInterpreterSessionKey]:831) - Interpreter session key: shared_session, for note: 2CNACUTPT, user: anonymous, InterpreterSetting Name: spark
DEBUG [2017-07-06 11:16:21,735] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) -  INFO [2017-07-06 11:16:21,735] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Block broadcast_0 stored as values in memory (estimated size 1216.0 B, free 408.9 MB)
DEBUG [2017-07-06 11:16:21,767] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) -  INFO [2017-07-06 11:16:21,767] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Block broadcast_0_piece0 stored as bytes in memory (estimated size 879.0 B, free 408.9 MB)
DEBUG [2017-07-06 11:16:21,770] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) -  INFO [2017-07-06 11:16:21,770] ({dispatcher-event-loop-4} Logging.scala[logInfo]:54) - Added broadcast_0_piece0 in memory on 192.168.11.1:7299 (size: 879.0 B, free: 408.9 MB)
DEBUG [2017-07-06 11:16:21,775] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) -  INFO [2017-07-06 11:16:21,774] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Created broadcast 0 from broadcast at DAGScheduler.scala:996
DEBUG [2017-07-06 11:16:21,778] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) -  INFO [2017-07-06 11:16:21,778] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Submitting 8 missing tasks from ResultStage 0 (ParallelCollectionRDD[0] at parallelize at <console>:27)
DEBUG [2017-07-06 11:16:21,779] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) -  INFO [2017-07-06 11:16:21,779] ({dag-scheduler-event-loop} Logging.scala[logInfo]:54) - Adding task set 0.0 with 8 tasks
DEBUG [2017-07-06 11:16:21,788] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:206) -  INFO [2017-07-06

[jira] [Created] (ZEPPELIN-2738) Zeppelin Interpreter crashing

2017-07-06 Thread Nassir (JIRA)
Nassir created ZEPPELIN-2738:


 Summary: Zeppelin Interpreter crashing
 Key: ZEPPELIN-2738
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2738
 Project: Zeppelin
  Issue Type: Bug
Reporter: Nassir


The Zeppelin interpreter crashes on Windows when I try to save any edits.

Note that the first edit saves fine, but if I try to save another edit,
Zeppelin just hangs and requires a full restart.

In effect, I can only make one edit through the browser per Zeppelin start.

This problem occurs on Windows, where I run Zeppelin.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ZEPPELIN-2737) Unable to run pyspark in Zeppelin 0.7.2

2017-07-06 Thread Nassir (JIRA)
Nassir created ZEPPELIN-2737:


 Summary: Unable to run pyspark in Zeppelin 0.7.2
 Key: ZEPPELIN-2737
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2737
 Project: Zeppelin
  Issue Type: Bug
Reporter: Nassir


I have installed Zeppelin and can run Scala Spark code without any problems.

However, if I try to add a reference to Python, either by setting a system
environment variable or by specifying the Python path in the Zeppelin
interpreter settings, not only does pyspark still not work, but Spark also
throws an error reporting a Java null exception.

Can anyone advise on how to set up Apache Zeppelin on Windows? I managed to
get it working on another laptop, but it fails on the second machine.





[jira] [Created] (ZEPPELIN-2678) Pyspark cell fails to execute, but normal spark code in scala executes fine

2017-06-22 Thread Nassir (JIRA)
Nassir created ZEPPELIN-2678:


 Summary: Pyspark cell fails to execute, but normal spark code in 
scala executes fine
 Key: ZEPPELIN-2678
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2678
 Project: Zeppelin
  Issue Type: Bug
Reporter: Nassir


Hi,

I have installed Zeppelin on Windows and can now run cells with the default
Spark interpreter, i.e. Scala code.

However, when I try to execute a pyspark cell, e.g.

%pyspark
x = 5

I get an error: 

"failed to start pyspark"

Any ideas on what is going wrong here? I can see %pyspark listed as an
interpreter under Spark on the Interpreter page.

Do I need to set some environment variables? I have Anaconda installed for
running Python, but I did not add any path environment variables.

Thanks
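
One way to approach the environment-variable question is to print the variables a PySpark setup commonly relies on. The following is a minimal sketch, assuming the relevant names are SPARK_HOME, PYSPARK_PYTHON, and PYTHONPATH; which of these Zeppelin actually requires on Windows is not confirmed by this report, and the helper name `report_env` is made up for illustration.

```python
import os

# Variable names are assumptions for illustration; adjust them to your setup.
CANDIDATE_VARS = ("SPARK_HOME", "PYSPARK_PYTHON", "PYTHONPATH")


def report_env(names=CANDIDATE_VARS, environ=None):
    """Return a dict mapping each variable name to its value, or None if unset."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in names}


if __name__ == "__main__":
    for name, value in report_env().items():
        print(f"{name} = {value if value is not None else '<not set>'}")
```

Running this with the same Python that Zeppelin is expected to use shows at a glance which variables are missing in that environment.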





[jira] [Created] (ZEPPELIN-2677) Zeppelin on windows does throws error after installation and trying to run a cell

2017-06-22 Thread Nassir (JIRA)
Nassir created ZEPPELIN-2677:


 Summary: Zeppelin on windows does throws error after installation 
and trying to run a cell
 Key: ZEPPELIN-2677
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2677
 Project: Zeppelin
  Issue Type: Bug
Reporter: Nassir


Hi,

I have installed Zeppelin on Windows 10. Spark has already been installed and
runs correctly.

I start Zeppelin on the command line and then open the browser at
localhost:8080. I can see the Zeppelin home screen, the right corner
shows a green symbol, and I can create new notebooks. However, when I try to
run a cell I see the following error:

org.apache.zeppelin.interpreter.InterpreterException: The filename, directory name, or volume label syntax is incorrect.
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterManagedProcess.start(RemoteInterpreterManagedProcess.java:143)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.reference(RemoteInterpreterProcess.java:73)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:265)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:430)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:111)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:387)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
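
The message "The filename, directory name, or volume label syntax is incorrect" is the Windows error for a syntactically invalid path. The report does not say which path is at fault, so the following is only a sketch of how one might screen candidate values (for example, paths configured for Java, Spark, or Zeppelin) for characters Windows rejects. The function name `suspicious_path_chars` is hypothetical.

```python
import re

# Characters Windows does not allow in file or directory names.
INVALID_CHARS = set('<>"|?*')


def suspicious_path_chars(path):
    """Return a sorted list of characters in `path` that Windows rejects.

    A colon is accepted only as part of a leading drive letter such as "C:".
    """
    rest = path[2:] if re.match(r"^[A-Za-z]:", path) else path
    return sorted({ch for ch in rest if ch in INVALID_CHARS or ch == ":"})


if __name__ == "__main__":
    # A path wrapped in literal quote characters is flagged; a plain one is not.
    print(suspicious_path_chars('"C:\\Java\\jdk"'))
    print(suspicious_path_chars("C:\\Java\\jdk"))
```

A value that was set with literal quotation marks around it is one example of what such a check would flag; whether that is the cause here is not established.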





[jira] [Created] (ZEPPELIN-2644) Import IPython Or any Databricks supported format Notebooks into Zeppelin

2017-06-13 Thread Nassir (JIRA)
Nassir created ZEPPELIN-2644:


 Summary: Import IPython Or any Databricks supported format 
Notebooks into Zeppelin
 Key: ZEPPELIN-2644
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2644
 Project: Zeppelin
  Issue Type: Bug
Reporter: Nassir


Can anyone help with importing IPython notebooks into Apache Zeppelin?

I would also like to be able to use any of the export formats from Databricks,
such as DBC archive, IPython notebook, HTML, or source file.

Can Zeppelin import any of these?

Thanks
Nassir
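
For illustration, a conversion from the IPython notebook format could look roughly like the sketch below. It reads the `cells` list of an nbformat 4 notebook and emits a JSON structure with `name` and `paragraphs[].text`. That this minimal structure is accepted by Zeppelin's note import is an assumption, not something verified against a specific Zeppelin version, and the function name and default `%pyspark` prefix are choices made for this example.

```python
import json


def ipynb_to_zeppelin_note(ipynb, name="Imported notebook", interpreter="%pyspark"):
    """Convert a parsed .ipynb dict into a minimal Zeppelin-style note dict."""
    paragraphs = []
    for cell in ipynb.get("cells", []):
        source = cell.get("source", "")
        # nbformat stores cell source either as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            text = f"{interpreter}\n{source}"
        elif cell.get("cell_type") == "markdown":
            text = f"%md\n{source}"
        else:
            continue  # skip raw and unknown cell types
        paragraphs.append({"text": text})
    return {"name": name, "paragraphs": paragraphs}


if __name__ == "__main__":
    notebook = {"cells": [{"cell_type": "code", "source": ["x = 5\n", "print(x)"]}]}
    print(json.dumps(ipynb_to_zeppelin_note(notebook), indent=2))
```

Cell outputs and metadata are dropped in this sketch; only the source text of code and markdown cells is carried over.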





[jira] [Created] (ZEPPELIN-2608) Zeppelin not displaying d3 visualisation in cell

2017-06-01 Thread Nassir (JIRA)
Nassir created ZEPPELIN-2608:


 Summary: Zeppelin not displaying d3 visualisation in cell
 Key: ZEPPELIN-2608
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2608
 Project: Zeppelin
  Issue Type: Bug
Reporter: Nassir


When I run the code below in a cell, referencing a rawJson value in Scala, the
bubble graph is not displayed correctly. Either I see no output at all, or the
bubble graph appears behind my cells:

%spark
print(s"""%html
<style>

circle {
  fill: rgb(31, 119, 180);
  fill-opacity: 0.5;
  stroke: rgb(31, 119, 180);
  stroke-width: 1px;
}

.leaf circle {
  fill: #ff7f0e;
  fill-opacity: 1;
}

text {
  font: 14px sans-serif;
}

</style>

<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.5/d3.min.js"></script>

<script>
var json = {
  "name": "data",
  "children": [
    {
      "name": "topics",
      "children": [
        ${rawJson}
      ]
    }
  ]
};

var r = 1500,
    format = d3.format(",d"),
    fill = d3.scale.category20c();

var bubble = d3.layout.pack()
    .sort(null)
    .size([r, r])
    .padding(1.5);

var vis = d3.select("body").append("svg")
    .attr("width", r)
    .attr("height", r)
    .attr("class", "bubble");

var node = vis.selectAll("g.node")
    .data(bubble.nodes(classes(json))
      .filter(function(d) { return !d.children; }))
  .enter().append("g")
    .attr("class", "node")
    .attr("transform", function(d) { return "translate(" + d.x + "," + d.y + ")"; });
color = d3.scale.category20();

node.append("title")
    .text(function(d) { return d.className + ": " + format(d.value); });

node.append("circle")
    .attr("r", function(d) { return d.r; })
    .style("fill", function(d) { return color(d.topicName); });

var text = node.append("text")
    .attr("text-anchor", "middle")
    .attr("dy", ".3em")
    .text(function(d) { return d.className.substring(0, d.r / 3); });

text.append("tspan")
    .attr("dy", "1.2em")
    .attr("x", 0)
    .text(function(d) { return Math.ceil(d.value * 1) / 1; });

// Returns a flattened hierarchy containing all leaf nodes under the root.
function classes(root) {
  var classes = [];

  function recurse(term, node) {
    if (node.children) node.children.forEach(function(child) { recurse(node.term, child); });
    else classes.push({topicName: node.topicId, className: node.term, value: node.probability});
  }

  recurse(null, root);
  return {children: classes};
}
</script>
""")



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (ZEPPELIN-2398) comments in %pyspark mode do not highlight correctly as e.g. green

2017-04-12 Thread Nassir (JIRA)
Nassir created ZEPPELIN-2398:


 Summary: comments in %pyspark mode do not highlight correctly as 
e.g. green
 Key: ZEPPELIN-2398
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2398
 Project: Zeppelin
  Issue Type: Bug
Reporter: Nassir


In a %pyspark cell, comments are not highlighted correctly (e.g. in green).





[jira] [Created] (ZEPPELIN-2351) View Apache Spark UI when using Zeppelin on EMR cluster and port forwarding 4040

2017-04-04 Thread Nassir (JIRA)
Nassir created ZEPPELIN-2351:


 Summary: View Apache Spark UI when using Zeppelin on EMR cluster 
and port forwarding 4040
 Key: ZEPPELIN-2351
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2351
 Project: Zeppelin
  Issue Type: Bug
Reporter: Nassir


Hi, I am trying to view the Spark UI to get an idea of resource usage,
debugging, etc. After creating an SSH tunnel and forwarding port 4040,
I can view the "Zeppelin Application UI". However, the pages do not load
correctly and it appears that the CSS is not loaded, so navigation does not
work and the pages are difficult to understand.

I am basically trying to view the Spark UI for Spark running with the
Zeppelin installation on an EMR cluster.
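
For reference, the tunnel described above can be sketched as a small helper that builds the `ssh` command line. The host name, the `hadoop` user, and the helper name `tunnel_command` are assumptions for illustration; only the forwarded port 4040 comes from the report.

```python
import shlex


def tunnel_command(host, user="hadoop", local_port=4040, remote_port=4040):
    """Build an ssh argument list that forwards local_port to remote_port on host."""
    return [
        "ssh",
        "-N",  # do not run a remote command, only forward ports
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # "emr-master.example.com" is a placeholder host name.
    print(" ".join(shlex.quote(arg) for arg in tunnel_command("emr-master.example.com")))
```

A tunnel like this only reaches content served on the forwarded port. If the UI loads its stylesheets or links from a different host or port, they would not be reachable this way, which is one possible but unconfirmed explanation for the missing CSS.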





[jira] [Created] (ZEPPELIN-2254) Select certain fields (long text fields) from a table results in lots of text showing under the formatted table when using z.show()

2017-03-13 Thread Nassir (JIRA)
Nassir created ZEPPELIN-2254:


 Summary: Select certain fields (long text fields) from a table 
results in lots of text showing under the formatted table when using z.show()
 Key: ZEPPELIN-2254
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2254
 Project: Zeppelin
  Issue Type: Bug
Reporter: Nassir


Please can you help resolve this issue? When I make a SQL selection or display
the contents of a DataFrame, sometimes a lot of text shows under the nicely
formatted table.

This makes it extremely difficult to work, as the text takes up a huge amount
of space in the cell.

I'm not sure why this additional text shows. (I am working in pyspark.)





[jira] [Created] (ZEPPELIN-2143) Unable to upload JSON files greater than 4MB

2017-02-21 Thread Nassir (JIRA)
Nassir created ZEPPELIN-2143:


 Summary: Unable to upload JSON files greater than 4MB
 Key: ZEPPELIN-2143
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2143
 Project: Zeppelin
  Issue Type: Bug
 Environment: Apache Zeppelin running on Amazon EC2 Cluster
Reporter: Nassir
Priority: Blocker


I have managed to increase the upload limit on Apache Zeppelin from 1 MB to
4 MB by following this issue: https://issues.apache.org/jira/browse/ZEPPELIN-1979

However, why is there such a tight restriction on file upload size?

How can I increase the upload limit beyond 4 MB?

Are there any upload restrictions when loading from external URLs, such as
GitHub or S3?
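
As a rough sketch of the kind of setting involved, the helper below renders a zeppelin-site.xml property entry for a given size in megabytes. It assumes the limit is governed by the `zeppelin.websocket.max.text.message.size` property and that its value is expressed in bytes. Whether raising it lifts the limit beyond 4 MB in this environment is not confirmed here, and the function name is made up for the example.

```python
def websocket_size_property(megabytes):
    """Render a zeppelin-site.xml <property> entry for the given size in MB."""
    size_bytes = int(megabytes * 1024 * 1024)
    return (
        "<property>\n"
        # Property name is an assumption based on the linked issue.
        "  <name>zeppelin.websocket.max.text.message.size</name>\n"
        f"  <value>{size_bytes}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    print(websocket_size_property(16))
```

The generated fragment would go inside the `<configuration>` element of the site configuration file, followed by a Zeppelin restart.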




