Upload data file for use in notebook

2019-07-15 Thread Krentz
Hello - I am wondering if there is a simple way to allow a Zeppelin Notebook
user to upload a data file for use in a notebook. I can
FTP a file onto the server and access it through a notebook, but this
method wouldn't work for a user with only front-end access. What I am
looking for is a way for a user to run a paragraph that would allow them to
input a path to a file on their local system, and upload it to the server.
I have looked into ways to do this through Python and Spark code, but with
no access to a DISPLAY environment variable through the web browser, and no
access to the user's local machine, I don't see a good way to do this. I
have a solution that allows them to paste their data file contents into a
text box in a paragraph and run the paragraph to save the text to a file on
the server, but this will only work for very simple text files.
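
For reference, here is roughly what that workaround looks like as a Scala
paragraph using Zeppelin's dynamic forms (the form name "data" and the
destination path below are just placeholders):

%spark
import java.nio.file.{Files, Paths}
import java.nio.charset.StandardCharsets

// Dynamic form: the user pastes the file contents into the "data" textbox.
val contents = z.textbox("data", "").toString

// Write whatever was pasted out to a file on the Zeppelin server.
if (contents.nonEmpty) {
  Files.write(Paths.get("/tmp/uploaded_data.txt"),
    contents.getBytes(StandardCharsets.UTF_8))
  println(s"Wrote ${contents.length} characters to /tmp/uploaded_data.txt")
}

This is fine for plain text, but it falls apart for anything large or binary,
which is why I am looking for a proper upload path.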

The other solution I am looking into is adding an "Upload" button to the
Notebook UI, up where the "Export this Note" and "Version Control" buttons
are, that would allow file upload through the browser. However, this requires
modifying the Zeppelin UI and back-end, which I would like to avoid.

Thanks,
Chris Krentz


Re: Spark job fails when zeppelin.spark.useNew is true

2019-05-23 Thread Krentz
I add the jar by editing the Spark interpreter on the interpreters page and
adding the path to the jar at the bottom. I am not familiar with the
spark.jars method. Is there a guide for that somewhere? Could that cause
the difference between spark.useNew being set to true versus false?
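
If spark.jars just means setting the standard Spark property on the
interpreter settings page, I assume it would look something like this (the
jar path below is only an example):

spark.jars    /path/to/geomesa-accumulo-spark-runtime.jar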

On Thu, May 23, 2019 at 9:16 PM Jeff Zhang  wrote:

> >>> adding a Geomesa-Accumulo-Spark jar to the Spark interpreter.
>
> How do you add the jar to the Spark interpreter? It is encouraged to add
> jars via spark.jars
>
>
> Krentz  wrote on Fri, May 24, 2019 at 4:53 AM:
>
>> Hello - I am looking for insight into an issue I have been having with
>> our Zeppelin cluster for a while. We are adding a Geomesa-Accumulo-Spark
>> jar to the Spark interpreter. The notebook paragraphs run fine until we try
>> to access the data, at which point we get an "Unread Block Data" error from
>> the Spark process. However, this error only occurs when the interpreter
>> setting "zeppelin.spark.useNew" is set to true. If this parameter is set to
>> false, the paragraph works just fine. Here is a paragraph that fails:
>>
>> %sql
>> select linktype,count(linktype) from linkageview group by linktype
>>
>> The error we get as a result is this:
>> java.lang.IllegalStateException: unread block data
>> at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2783)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1605)
>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
>> at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>> at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:258)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>>
>>
>> If I drill down and inspect the Spark job itself, I get an error saying
>> "readObject can't find class
>> org.apache.accumulo.core.client.mapreduce.impl.BatchInputSplit." The full
>> stack trace is attached. We dug into the __spark_conf and __spark_libs
>> files associated with the Spark job (under
>> /user/root/.sparkStaging/application_/), but they did not include the jar
>> that provides this class. It was missing from both the spark.useNew=true
>> and spark.useNew=false versions.
>>
>> Basically I am just trying to figure out why the spark.useNew option
>> would cause this error when everything works fine with it turned off. We can move
>> forward with it turned off for now, but I would like to get to the bottom
>> of this issue in case there is something deeper going wrong.
>>
>> Thanks so much,
>> Chris Krentz
>>
>>
>>
>
> --
> Best Regards
>
> Jeff Zhang
>


Spark job fails when zeppelin.spark.useNew is true

2019-05-23 Thread Krentz
Hello - I am looking for insight into an issue I have been having with our
Zeppelin cluster for a while. We are adding a Geomesa-Accumulo-Spark jar to
the Spark interpreter. The notebook paragraphs run fine until we try to
access the data, at which point we get an "Unread Block Data" error from
the Spark process. However, this error only occurs when the interpreter
setting "zeppelin.spark.useNew" is set to true. If this parameter is set to
false, the paragraph works just fine. Here is a paragraph that fails:

%sql
select linktype,count(linktype) from linkageview group by linktype

The error we get as a result is this:
java.lang.IllegalStateException: unread block data
at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2783)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1605)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:258)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


If I drill down and inspect the Spark job itself, I get an error saying
"readObject can't find class
org.apache.accumulo.core.client.mapreduce.impl.BatchInputSplit." The full
stack trace is attached. We dug into the __spark_conf and __spark_libs files
associated with the Spark job (under /user/root/.sparkStaging/application_/),
but they did not include the jar that provides this class. It was missing from
both the spark.useNew=true and spark.useNew=false versions.
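
For what it's worth, a quick way to check whether that class is even loadable
on the driver and on the executors would be a throwaway paragraph along these
lines (just a sketch, nothing specific to our setup):

%spark
import scala.util.Try

// Fully qualified name of the class that readObject cannot find.
val cls = "org.apache.accumulo.core.client.mapreduce.impl.BatchInputSplit"

// Can the driver load it?
println("driver:    " + Try(Class.forName(cls)).isSuccess)

// Can the executors load it?
val onExecutors = sc.parallelize(1 to 2, 2)
  .map(_ => Try(Class.forName(cls)).isSuccess)
  .collect()
println("executors: " + onExecutors.mkString(", "))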

Basically I am just trying to figure out why the spark.useNew option would
cause this error when everything works fine with it turned off. We can move
forward with it turned off for now, but I would like to get to the bottom
of this issue in case there is something deeper going wrong.

Thanks so much,
Chris Krentz
19/05/23 20:45:55 ERROR util.Utils: Exception encountered
java.lang.RuntimeException: readObject can't find class org.apache.accumulo.core.client.mapreduce.impl.BatchInputSplit
at org.apache.hadoop.io.ObjectWritable.loadClass(ObjectWritable.java:377)
at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:228)
at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:77)
at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:45)
at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply(SerializableWritable.scala:41)
at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply(SerializableWritable.scala:41)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1269)
at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1170)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2178)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deseri

Spark Interpreter failing to start: NumberFormat exception

2019-04-18 Thread Krentz
All -


I am having an issue with a build I forked from master that is compiled as
0.9. We have another build running 0.8 that works just fine. The Spark
interpreter is failing to start, and giving a NumberFormatException. It
looks like when Zeppelin runs interpreter.sh, the
RemoteInterpreterServer.java main method is pulling the IP address instead
of the port number.


Here is the command it tries running:


INFO [2019-04-17 19:33:17,507] ({SchedulerFactory2}
RemoteInterpreterManagedProcess.java[start]:136) - Run interpreter
process [/opt/zeppelin/bin/interpreter.sh,
-d, /opt/zeppelin/interpreter/spark, -c, 11.3.64.129, -p, 38675, -r, :, -i,
spark-shared_process, -l, /opt/zeppelin/local-repo/spark, -g, spark]


and here is the code from RemoteInterpreterServer.java starting at line 270:

if (args.length > 0) {
  zeppelinServerHost = args[0];
  port = Integer.parseInt(args[1]);   // <-- this is the line that throws; args[1] is the IP address here
  interpreterGroupId = args[2];
  if (args.length > 3) {
    portRange = args[3];
  }
}

It gets a NumberFormatException because it tries to do Integer.parseInt() on
an IP address rather than on the port number it expects.
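
For reference, based on that parsing code the main method expects the
positional arguments to be:

args[0] = zeppelinServerHost
args[1] = port
args[2] = interpreterGroupId
args[3] = portRange (optional)

so whatever is landing in args[1] here is the IP address rather than the port
number, which is what blows up the parse.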


Here is the error:


Exception in thread "main" java.lang.NumberFormatException: For input string: "11.3.64.129"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.main(RemoteInterpreterServer.java:272)

...


Why is RemoteInterpreterServer pulling out the wrong arg index? Or
alternatively, why is Zeppelin attempting to run interpreter.sh with the
wrong arguments? Has this issue been fixed somewhere that I missed? Am I on
a bad snapshot? I believe I am up-to-date with master. My personal code
changes were focused on the front-end and realms, so I haven't touched any
of the code in zeppelin-zengine or zeppelin-interpreter. Any help figuring
out why I am running into this is appreciated!


Thanks,

Chris Krentz