Upload data file for use in notebook
Hello - I am wondering if there is a simple way to allow a Zeppelin Notebook user to upload a data file to use in a notebook. I can FTP a file onto the server and access it through a notebook, but this method wouldn't work for a user with only front-end access. What I am looking for is a way for a user to run a paragraph that would allow them to input a path to a file on their local system and upload it to the server.

I have looked into ways to do this through Python and Spark code, but with no access to a DISPLAY environment variable through the web browser, and no access to the user's local machine, I don't see a good way to do this. I have a solution that allows them to paste their data file contents into a text box in a paragraph and run the paragraph to save the text to a file on the server, but this will only work for very simple text files.

The other solution I am looking into is adding an "Upload" button to the notebook UI, next to the "Export this Note" and "Version Control" buttons, that would allow file upload through the browser. But this requires modifying the Zeppelin UI and back-end, which I would like to avoid.

Thanks,
Chris Krentz
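For reference, the paste-into-a-textbox workaround looks roughly like this (a sketch only: the target directory and form name are placeholders, and `z` is Zeppelin's context object available in %python paragraphs):

```python
# %python paragraph: let the user paste file contents into a dynamic form,
# then write them out to a file on the Zeppelin server.
import os

def save_pasted_text(contents, target_dir="/tmp/zeppelin-uploads", name="upload.txt"):
    """Write pasted text to a file on the server and return its path."""
    os.makedirs(target_dir, exist_ok=True)
    # strip any directory components so the user can't write outside target_dir
    path = os.path.join(target_dir, os.path.basename(name))
    with open(path, "w") as f:
        f.write(contents)
    return path

# In Zeppelin this would be driven by a dynamic form, e.g.:
# contents = z.textbox("File contents")
# print(save_pasted_text(contents))
```

As noted above, this only works for small, simple text files; binary data would need a different transport entirely.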
Re: Spark job fails when zeppelin.spark.useNew is true
I add the jar by editing the Spark interpreter on the interpreters page and adding the path to the jar at the bottom. I am not familiar with the spark.jars method. Is there a guide for that somewhere? Could that cause the difference between spark.useNew being set to true versus false?

On Thu, May 23, 2019 at 9:16 PM Jeff Zhang wrote:
> >>> adding a Geomesa-Accumulo-Spark jar to the Spark interpreter.
>
> How do you add jar to spark interpreter? It is encouraged to add jar via spark.jars
>
> On Fri, May 24, 2019 at 4:53 AM Krentz wrote:
>> Hello - I am looking for insight into an issue I have been having with
>> our Zeppelin cluster for a while. We are adding a Geomesa-Accumulo-Spark
>> jar to the Spark interpreter. [...]
>
> --
> Best Regards
>
> Jeff Zhang
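From a quick look at the Spark docs, spark.jars seems to be a standard Spark property (a comma-separated list of jar paths to ship with the application), so if I understand correctly it would be set as a property on the Spark interpreter's settings page, something like the following (the jar path is a placeholder):

```
spark.jars    /path/to/geomesa-accumulo-spark-runtime.jar
```

Happy to be corrected if that is not the intended mechanism.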
Spark job fails when zeppelin.spark.useNew is true
Hello - I am looking for insight into an issue I have been having with our Zeppelin cluster for a while. We are adding a Geomesa-Accumulo-Spark jar to the Spark interpreter. The notebook paragraphs run fine until we try to access the data, at which point we get an "Unread Block Data" error from the Spark process. However, this error only occurs when the interpreter setting "zeppelin.spark.useNew" is set to true. If this parameter is set to false, the paragraph works just fine.

Here is a paragraph that fails:

    %sql
    select linktype, count(linktype) from linkageview group by linktype

The error we get as a result is this:

    java.lang.IllegalStateException: unread block data
        at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2783)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1605)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:258)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

If I drill down and inspect the Spark job itself, I get an error saying "readObject can't find class org.apache.accumulo.core.client.mapreduce.impl.BatchInputSplit." The full stack trace is attached.
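For clarity, the failing paragraph is just a count-by-key over the linktype column; in plain Python the equivalent aggregation would be (illustrative only; in the notebook this runs through Spark SQL):

```python
from collections import Counter

# Plain-Python equivalent of:
#   select linktype, count(linktype) from linkageview group by linktype
def count_by_linktype(rows):
    """rows: iterable of dicts with a 'linktype' key; returns {linktype: count}."""
    return dict(Counter(r["linktype"] for r in rows))
```

The query itself is trivial, which is part of why the serialization failure is so puzzling.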
We dug into the __spark_conf and __spark_libs archives associated with the Spark job (under /user/root/.sparkStaging/application_/), but they did not contain the jar file that provides this class. The jar was absent in both the spark.useNew=true and spark.useNew=false runs.

Basically I am just trying to figure out why the spark.useNew option would cause the error to happen when it works fine turned off. We can move forward with it turned off for now, but I would like to get to the bottom of this issue in case there is something deeper going wrong.

Thanks so much,
Chris Krentz

Attached stack trace:

    19/05/23 20:45:55 ERROR util.Utils: Exception encountered
    java.lang.RuntimeException: readObject can't find class org.apache.accumulo.core.client.mapreduce.impl.BatchInputSplit
        at org.apache.hadoop.io.ObjectWritable.loadClass(ObjectWritable.java:377)
        at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:228)
        at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:77)
        at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:45)
        at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply(SerializableWritable.scala:41)
        at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply(SerializableWritable.scala:41)
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1269)
        at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:41)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1170)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2178)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
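One quick way to verify whether a given jar actually contains the missing class is to scan its entries (a jar is just a zip archive); the jar path in the usage comment is a placeholder:

```python
import zipfile

def jar_contains_class(jar_path, class_name):
    """Return True if the jar (a zip file) contains the given fully-qualified class."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# e.g. (path is a placeholder):
# jar_contains_class("/path/to/geomesa-accumulo-spark.jar",
#                    "org.apache.accumulo.core.client.mapreduce.impl.BatchInputSplit")
```

Running this against the jars staged under .sparkStaging would confirm whether BatchInputSplit ever makes it onto the executor classpath.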
Spark Interpreter failing to start: NumberFormat exception
All - I am having an issue with a build I forked from master that is compiled as 0.9. We have another build running 0.8 that works just fine. The Spark interpreter is failing to start with a NumberFormatException. It looks like when Zeppelin runs interpreter.sh, the main method of RemoteInterpreterServer.java is pulling the IP address instead of the port number.

Here is the command it tries running:

    INFO [2019-04-17 19:33:17,507] ({SchedulerFactory2} RemoteInterpreterManagedProcess.java[start]:136) - Run interpreter process [/opt/zeppelin/bin/interpreter.sh, -d, /opt/zeppelin/interpreter/spark, -c, 11.3.64.129, -p, 38675, -r, :, -i, spark-shared_process, -l, /opt/zeppelin/local-repo/spark, -g, spark]

and here is the code from RemoteInterpreterServer.java starting at line 270:

    if (args.length > 0) {
      zeppelinServerHost = args[0];
      port = Integer.parseInt(args[1]);
      interpreterGroupId = args[2];
      if (args.length > 3) {
        portRange = args[3];
      }
    }

It gets a NumberFormatException because it tries to do Integer.parseInt() on the IP address passed in by interpreter.sh rather than the port. Here is the error:

    Exception in thread "main" java.lang.NumberFormatException: For input string: "11.3.64.129"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:580)
        at java.lang.Integer.parseInt(Integer.java:615)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.main(RemoteInterpreterServer.java:272)
        ...

Why is RemoteInterpreterServer pulling out the wrong arg index? Or alternately, why is Zeppelin attempting to run interpreter.sh with the wrong arguments? Has this issue been fixed somewhere that I missed? Am I on a bad snapshot? I believe I am up to date with master. My personal code changes were focused on the front-end and realms, so I haven't touched any of the code in zeppelin-zengine or zeppelin-interpreter.

Any help figuring out why I am running into this is appreciated!

Thanks,
Chris Krentz
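To make the mismatch concrete: the log shows the process being launched with flag-style arguments (-c host, -p port, ...), while the Java code above reads them positionally, so whatever lands at index 1 gets fed to Integer.parseInt(). A sketch of flag-style parsing, in Python for brevity (parse_flag_args is a hypothetical helper for illustration, not Zeppelin code):

```python
def parse_flag_args(argv):
    """Parse flag-style args like ['-c', host, '-p', port, ...] into a dict."""
    flags = {}
    it = iter(argv)
    for token in it:
        if token.startswith("-"):
            flags[token] = next(it, None)  # value following the flag
    return flags

# The argument list from the log line above, minus the script path:
argv = ["-d", "/opt/zeppelin/interpreter/spark", "-c", "11.3.64.129", "-p", "38675",
        "-r", ":", "-i", "spark-shared_process", "-l", "/opt/zeppelin/local-repo/spark",
        "-g", "spark"]
flags = parse_flag_args(argv)
port = int(flags["-p"])  # parses cleanly, whereas a positional read hits the IP or a path
```

This is only meant to illustrate why positional indexing fails against this command line, not to suggest what the actual fix in RemoteInterpreterServer should look like.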