Hi all,
I am new to Zeppelin and HDFS. I managed to install Zeppelin, and it works
fine when loading data from a local directory. But it fails when I try to
load the same file from HDFS (installed locally in standalone mode).
Here is my code:
val bankText = sc.textFile("/home/ranveer/Desktop/CSVs/bank-full.csv")
The above works fine.
But when I try to load from HDFS:
val bankText = sc.textFile("hdfs://127.0.0.1:9000/demo/csv/bank-full.csv")
it does not work and gives this error:
java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetAdditionalDatanodeRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
The complete code is:
val bankText = sc.textFile("hdfs://127.0.0.1:9000/demo/csv/bank-full.csv")
// val bankText = sc.textFile("/home/ranveer/Desktop/CSVs/bank-full.csv")
case class Bank(age: Integer, job: String, marital: String,
                education: String, balance: Integer)
val bank = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
  s => Bank(s(0).toInt,
            s(1).replaceAll("\"", ""),
            s(2).replaceAll("\"", ""),
            s(3).replaceAll("\"", ""),
            s(5).replaceAll("\"", "").toInt
  )
)
// Below line works only in spark 1.3.0.
// For spark 1.1.x and spark 1.2.x,
// use bank.registerTempTable("bank") instead.
bank.toDF().registerTempTable("bank")
println(bankText.count())
My environment is:
Spark version: 1.3.1 with Hadoop 2.6
Zeppelin: 0.5 binary from Apache
Hadoop version: 2.6 binary from Apache
Java: 1.8
Please help, I am stuck here.
Thanks,
Regards,
Ranveer