Re: PySpark with livy

2017-10-01 Thread Jeff Zhang
I see your test with livy via curl command, but seems you are submitting it
as batch. Could you do it via interactive livy session, this is what livy
interpreter of zeppelin does.


Mauro Schneider 于2017年10月1日周日 上午4:55写道:

> Hi Jeff
>
> Yes, the code work with PySpark Shell and the Spark Submit in the same
> server where is running the Zeppelin and Livy. And I did an another test, I
> executed the same code with cUrl using to Livy and work ok.
>
>
>
>
>
> Mauro Schneider
>
>
> On Fri, Sep 29, 2017 at 8:26 PM, Jeff Zhang  wrote:
>
>> It is more likely your spark configuration issue, could you run this code
>> in pyspark shell ?
>>
>>
>>
>> Mauro Schneider 于2017年9月29日周五 下午11:24写道:
>>
>>>
>>> Hi
>>>
>>> I'm trying execute PySpark code with Zeppelin and Livy but without
>>> success. With Scala and Livy work well but when I execute the code below I
>>> getting a Exception from Zeppelin.
>>>
>>> 
>>> %livy.pyspark
>>> txtFile = sc.textFile ("/data/staging/zeppelin_test/data.txt")
>>> counts = txtFile.flatMap(lambda line: line.split(" ")).map(lambda word:
>>> (word, 1)).reduceByKey(lambda a, b: a + b)
>>> counts.collect()
>>> 
>>>
>>> 
>>> Version:0.9 StartHTML:000168 EndHTML:024858
>>> StartFragment:000204 EndFragment:024822 SourceURL:
>>> http://dtbhad02p.bvs.corp:8080/#/notebook/2CSK9TFZM
>>> An error occurred while calling o47.textFile.
>>> : java.lang.UnsatisfiedLinkError:
>>> org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
>>> at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
>>> at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
>>> at
>>> org.xerial.snappy.SnappyOutputStream.(SnappyOutputStream.java:79)
>>> at org.apache.spark.io
>>> .SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:156)
>>> at
>>> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:200)
>>> at
>>> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:200)
>>> at scala.Option.map(Option.scala:145)
>>> at
>>> org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:200)
>>> at
>>> org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102)
>>> at
>>> org.apache.spark.broadcast.TorrentBroadcast.(TorrentBroadcast.scala:85)
>>> at
>>> org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
>>> at
>>> org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
>>> at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1334)
>>> at
>>> org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1022)
>>> at
>>> org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1019)
>>> at
>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>>> at
>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>>> at org.apache.spark.SparkContext.withScope(SparkContext.scala:722)
>>> at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1019)
>>> at
>>> org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:840)
>>> at
>>> org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:838)
>>> at
>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>>> at
>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>>> at org.apache.spark.SparkContext.withScope(SparkContext.scala:722)
>>> at org.apache.spark.SparkContext.textFile(SparkContext.scala:838)
>>> at
>>> org.apache.spark.api.java.JavaSparkContext.textFile(JavaSparkContext.scala:188)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
>>> at py4j.Gateway.invoke(Gateway.java:259)
>>> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>>> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>> at py4j.GatewayConnection.run(GatewayConnection.java:209)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 
>>>
>>> I had too tested the code below with Curl and Livy and work correctly
>>>
>>> 
>>> import sys
>>> from pyspark import SparkContext
>>>
>>> if __name__ == "__main__":
>>> sc = SparkContext(appName="Hello Spark")
>>> txtFile = sc.textFile ("/data/staging/zeppelin_test/data.txt")
>>> counts = txtFile.flatMap(lambda line: line.split("
>>> ")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
>>> counts.saveAsTextFile("test_wc_py")
>>> 
>>>
>>> 
>>> curl  -i --negotiate -u : -X POST --data '{"file":
>>> "/us

Re: PySpark with livy

2017-09-30 Thread Mauro Schneider
Hi Jeff

Yes, the code work with PySpark Shell and the Spark Submit in the same
server where is running the Zeppelin and Livy. And I did an another test, I
executed the same code with cUrl using to Livy and work ok.





Mauro Schneider


On Fri, Sep 29, 2017 at 8:26 PM, Jeff Zhang  wrote:

> It is more likely your spark configuration issue, could you run this code
> in pyspark shell ?
>
>
>
> Mauro Schneider 于2017年9月29日周五 下午11:24写道:
>
>>
>> Hi
>>
>> I'm trying execute PySpark code with Zeppelin and Livy but without
>> success. With Scala and Livy work well but when I execute the code below I
>> getting a Exception from Zeppelin.
>>
>> 
>> %livy.pyspark
>> txtFile = sc.textFile ("/data/staging/zeppelin_test/data.txt")
>> counts = txtFile.flatMap(lambda line: line.split(" ")).map(lambda word:
>> (word, 1)).reduceByKey(lambda a, b: a + b)
>> counts.collect()
>> 
>>
>> 
>> Version:0.9 StartHTML:000168 EndHTML:024858
>> StartFragment:000204 EndFragment:024822 SourceURL:
>> http://dtbhad02p.bvs.corp:8080/#/notebook/2CSK9TFZM
>> An error occurred while calling o47.textFile.
>> : java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.
>> maxCompressedLength(I)I
>> at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
>> at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
>> at org.xerial.snappy.SnappyOutputStream.(
>> SnappyOutputStream.java:79)
>> at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(
>> CompressionCodec.scala:156)
>> at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.
>> apply(TorrentBroadcast.scala:200)
>> at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.
>> apply(TorrentBroadcast.scala:200)
>> at scala.Option.map(Option.scala:145)
>> at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(
>> TorrentBroadcast.scala:200)
>> at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(
>> TorrentBroadcast.scala:102)
>> at org.apache.spark.broadcast.TorrentBroadcast.(
>> TorrentBroadcast.scala:85)
>> at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(
>> TorrentBroadcastFactory.scala:34)
>> at org.apache.spark.broadcast.BroadcastManager.newBroadcast(
>> BroadcastManager.scala:63)
>> at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1334)
>> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(
>> SparkContext.scala:1022)
>> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(
>> SparkContext.scala:1019)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(
>> RDDOperationScope.scala:150)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(
>> RDDOperationScope.scala:111)
>> at org.apache.spark.SparkContext.withScope(SparkContext.scala:722)
>> at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1019)
>> at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(
>> SparkContext.scala:840)
>> at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(
>> SparkContext.scala:838)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(
>> RDDOperationScope.scala:150)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(
>> RDDOperationScope.scala:111)
>> at org.apache.spark.SparkContext.withScope(SparkContext.scala:722)
>> at org.apache.spark.SparkContext.textFile(SparkContext.scala:838)
>> at org.apache.spark.api.java.JavaSparkContext.textFile(
>> JavaSparkContext.scala:188)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(
>> NativeMethodAccessorImpl.java:57)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
>> DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
>> at py4j.Gateway.invoke(Gateway.java:259)
>> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>> at py4j.GatewayConnection.run(GatewayConnection.java:209)
>> at java.lang.Thread.run(Thread.java:745)
>> 
>>
>> I had too tested the code below with Curl and Livy and work correctly
>>
>> 
>> import sys
>> from pyspark import SparkContext
>>
>> if __name__ == "__main__":
>> sc = SparkContext(appName="Hello Spark")
>> txtFile = sc.textFile ("/data/staging/zeppelin_test/data.txt")
>> counts = txtFile.flatMap(lambda line: line.split(" ")).map(lambda
>> word: (word, 1)).reduceByKey(lambda a, b: a + b)
>> counts.saveAsTextFile("test_wc_py")
>> 
>>
>> 
>> curl  -i --negotiate -u : -X POST --data '{"file":
>> "/user/mulisses/test.py"}' -H "Content-Type: application/json"
>> dtbhad02p.bvs.corp:8998/batches
>> 
>>
>> Is anyone know how to solve it ? I had tested with Zeppelin 0.7.2 and
>> Zeppelin 0.7.3
>> Am I forgetting some configuration?
>>
>> Best regards,
>>
>> Mauro Schneider
>>
>>
>>


Re: PySpark with livy

2017-09-29 Thread Jeff Zhang
It is more likely your spark configuration issue, could you run this code
in pyspark shell ?



Mauro Schneider 于2017年9月29日周五 下午11:24写道:

>
> Hi
>
> I'm trying execute PySpark code with Zeppelin and Livy but without
> success. With Scala and Livy work well but when I execute the code below I
> getting a Exception from Zeppelin.
>
> 
> %livy.pyspark
> txtFile = sc.textFile ("/data/staging/zeppelin_test/data.txt")
> counts = txtFile.flatMap(lambda line: line.split(" ")).map(lambda word:
> (word, 1)).reduceByKey(lambda a, b: a + b)
> counts.collect()
> 
>
> 
> Version:0.9 StartHTML:000168 EndHTML:024858
> StartFragment:000204 EndFragment:024822 SourceURL:
> http://dtbhad02p.bvs.corp:8080/#/notebook/2CSK9TFZM
> An error occurred while calling o47.textFile.
> : java.lang.UnsatisfiedLinkError:
> org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
> at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
> at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
> at org.xerial.snappy.SnappyOutputStream.(SnappyOutputStream.java:79)
> at
> org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:156)
> at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:200)
> at
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:200)
> at scala.Option.map(Option.scala:145)
> at
> org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:200)
> at
> org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102)
> at
> org.apache.spark.broadcast.TorrentBroadcast.(TorrentBroadcast.scala:85)
> at
> org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
> at
> org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
> at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1334)
> at
> org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1022)
> at
> org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1019)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
> at org.apache.spark.SparkContext.withScope(SparkContext.scala:722)
> at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1019)
> at
> org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:840)
> at
> org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:838)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
> at org.apache.spark.SparkContext.withScope(SparkContext.scala:722)
> at org.apache.spark.SparkContext.textFile(SparkContext.scala:838)
> at
> org.apache.spark.api.java.JavaSparkContext.textFile(JavaSparkContext.scala:188)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
> at py4j.Gateway.invoke(Gateway.java:259)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:209)
> at java.lang.Thread.run(Thread.java:745)
> 
>
> I had too tested the code below with Curl and Livy and work correctly
>
> 
> import sys
> from pyspark import SparkContext
>
> if __name__ == "__main__":
> sc = SparkContext(appName="Hello Spark")
> txtFile = sc.textFile ("/data/staging/zeppelin_test/data.txt")
> counts = txtFile.flatMap(lambda line: line.split(" ")).map(lambda
> word: (word, 1)).reduceByKey(lambda a, b: a + b)
> counts.saveAsTextFile("test_wc_py")
> 
>
> 
> curl  -i --negotiate -u : -X POST --data '{"file":
> "/user/mulisses/test.py"}' -H "Content-Type: application/json"
> dtbhad02p.bvs.corp:8998/batches
> 
>
> Is anyone know how to solve it ? I had tested with Zeppelin 0.7.2 and
> Zeppelin 0.7.3
> Am I forgetting some configuration?
>
> Best regards,
>
> Mauro Schneider
>
>
>


PySpark with livy

2017-09-29 Thread Mauro Schneider
Hi

I'm trying execute PySpark code with Zeppelin and Livy but without success.
With Scala and Livy work well but when I execute the code below I getting a
Exception from Zeppelin.


%livy.pyspark
txtFile = sc.textFile ("/data/staging/zeppelin_test/data.txt")
counts = txtFile.flatMap(lambda line: line.split(" ")).map(lambda word:
(word, 1)).reduceByKey(lambda a, b: a + b)
counts.collect()



Version:0.9 StartHTML:000168 EndHTML:024858
StartFragment:000204 EndFragment:024822 SourceURL:
http://dtbhad02p.bvs.corp:8080/#/notebook/2CSK9TFZM
An error occurred while calling o47.textFile.
: java.lang.UnsatisfiedLinkError:
org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
at org.xerial.snappy.SnappyOutputStream.(SnappyOutputStream.java:79)
at
org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:156)
at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:200)
at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:200)
at scala.Option.map(Option.scala:145)
at
org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:200)
at
org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102)
at
org.apache.spark.broadcast.TorrentBroadcast.(TorrentBroadcast.scala:85)
at
org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
at
org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1334)
at
org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1022)
at
org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1019)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:722)
at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1019)
at
org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:840)
at
org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:838)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:722)
at org.apache.spark.SparkContext.textFile(SparkContext.scala:838)
at
org.apache.spark.api.java.JavaSparkContext.textFile(JavaSparkContext.scala:188)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)


I had too tested the code below with Curl and Livy and work correctly


import sys
from pyspark import SparkContext

if __name__ == "__main__":
sc = SparkContext(appName="Hello Spark")
txtFile = sc.textFile ("/data/staging/zeppelin_test/data.txt")
counts = txtFile.flatMap(lambda line: line.split(" ")).map(lambda
word: (word, 1)).reduceByKey(lambda a, b: a + b)
counts.saveAsTextFile("test_wc_py")



curl  -i --negotiate -u : -X POST --data '{"file":
"/user/mulisses/test.py"}' -H "Content-Type: application/json"
dtbhad02p.bvs.corp:8998/batches


Is anyone know how to solve it ? I had tested with Zeppelin 0.7.2 and
Zeppelin 0.7.3
Am I forgetting some configuration?

Best regards,

Mauro Schneider