Re: Spark writing to secure zone throws : UnknownCryptoProtocolVersionException
Interesting. The Phoenix dependency wasn't shown in the classpath of your
previous email.

On Thu, Apr 28, 2016 at 4:12 AM, pierre lacave wrote:

> Narrowed down to some version incompatibility with Phoenix 4.7.
>
> Including $SPARK_HOME/lib/phoenix-4.7.0-HBase-1.1-client-spark.jar in
> extraClassPath is what triggers the issue above.
>
> I'll have a go at adding the individual dependencies, as opposed to this
> fat jar, and see how it goes.
>
> Thanks
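A small diagnostic sketch (not from the thread itself; the class names are the
standard Hadoop/HDFS ones) for checking which jar the driver actually loads the
HDFS client classes from, i.e. whether the Phoenix fat client jar shadows the
cluster's Hadoop 2.6.1 client:

import scala.util.Try

object ClasspathCheck {
  // Report the jar (or directory) each class resolves from on this classpath.
  def locate(className: String): String =
    Try {
      val cls = Class.forName(className)
      Option(cls.getProtectionDomain.getCodeSource)
        .map(_.getLocation.toString)
        .getOrElse("<no code source (bootstrap classloader?)>")
    }.getOrElse("<class not found on this classpath>")

  def main(args: Array[String]): Unit =
    Seq(
      "org.apache.hadoop.hdfs.DFSClient",
      "org.apache.hadoop.hdfs.DistributedFileSystem",
      "org.apache.hadoop.hdfs.protocol.CryptoProtocolVersion"
    ).foreach(name => println(s"$name -> ${locate(name)}"))
}

If DFSClient resolves from phoenix-4.7.0-HBase-1.1-client-spark.jar rather than
from the Hadoop 2.6.1 directories on SPARK_DIST_CLASSPATH, that would explain
the pre-encryption-zone client behaviour seen here.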
Re: Spark writing to secure zone throws : UnknownCryptoProtocolVersionException
Narrowed down to some version incompatibility with Phoenix 4.7.

Including $SPARK_HOME/lib/phoenix-4.7.0-HBase-1.1-client-spark.jar in
extraClassPath is what triggers the issue above.

I'll have a go at adding the individual dependencies, as opposed to this fat
jar, and see how it goes.

Thanks

On Thu, Apr 28, 2016 at 10:52 AM, pierre lacave wrote:

> Thanks Ted,
>
> I am actually using the Hadoop-free version of Spark
> (spark-1.5.0-bin-without-hadoop) over Hadoop 2.6.1, so it could very well
> be related indeed.
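For reference, one way to try the individual dependencies instead of the fat
client jar would be an sbt entry along these lines (a sketch only: the
phoenix-spark / phoenix-core coordinates and the Hadoop exclusions are
assumptions that would need to be verified against Phoenix 4.7):

// build.sbt sketch; artifact list and exclusions are assumptions to verify.
libraryDependencies ++= Seq(
  "org.apache.phoenix" % "phoenix-spark" % "4.7.0-HBase-1.1"
    exclude("org.apache.hadoop", "hadoop-client"),
  "org.apache.phoenix" % "phoenix-core" % "4.7.0-HBase-1.1"
    exclude("org.apache.hadoop", "hadoop-client")
)

Excluding the Hadoop artifacts keeps the cluster's 2.6.1 client (pulled in via
SPARK_DIST_CLASSPATH) as the one actually used at runtime.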
Re: Spark writing to secure zone throws : UnknownCryptoProtocolVersionException
Thanks Ted,

I am actually using the Hadoop-free version of Spark
(spark-1.5.0-bin-without-hadoop) over Hadoop 2.6.1, so it could very well be
related indeed.

I have configured spark-env.sh with
export SPARK_DIST_CLASSPATH=$($HADOOP_PREFIX/bin/hadoop classpath), which
points at the only version of Hadoop on the system (2.6.1) and is able to
interface with HDFS (on non-secured zones).

Interestingly, running this in the REPL works fine:

// Create a simple DataFrame, stored into a partition directory
val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
df1.write.parquet("/securedzone/test")

But if packaged as an app and run in local or YARN client/cluster mode, it
fails with the error described.

I am not including anything Hadoop-specific, so I am not sure where the
difference in DFSClient could come from.

[info] Loading project definition from /Users/zoidberg/Documents/demo/x/trunk/src/jobs/project
[info] Set current project to root (in build file:/Users/zoidberg/Documents/demo/x/trunk/src/jobs/)
[info] Updating {file:/Users/zoidberg/Documents/demo/x/trunk/src/jobs/}common...
[info] com.demo.project:root_2.10:0.2.3 [S]
[info] com.demo.project:common_2.10:0.2.3 [S]
[info]   +-joda-time:joda-time:2.8.2
[info]
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Updating {file:/Users/zoidberg/Documents/demo/x/trunk/src/jobs/}extract...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] com.demo.project:extract_2.10:0.2.3 [S]
[info]   +-com.demo.project:common_2.10:0.2.3 [S]
[info]   | +-joda-time:joda-time:2.8.2
[info]   |
[info]   +-com.databricks:spark-csv_2.10:1.3.0 [S]
[info]     +-com.univocity:univocity-parsers:1.5.1
[info]     +-org.apache.commons:commons-csv:1.1
[info]
[success] Total time: 9 s, completed 28-Apr-2016 10:40:25

I am assuming I do not need to rebuild Spark to use it with Hadoop 2.6.1, and
that the Spark build with user-provided Hadoop lets me do that.

$HADOOP_PREFIX/bin/hadoop classpath expands to:

/usr/local/project/hadoop/conf:/usr/local/project/hadoop/share/hadoop/common/lib/*:/usr/local/project/hadoop/share/hadoop/common/*:/usr/local/project/hadoop/share/hadoop/hdfs:/usr/local/project/hadoop/share/hadoop/hdfs/lib/*:/usr/local/project/hadoop/share/hadoop/hdfs/*:/usr/local/project/hadoop/share/hadoop/yarn/lib/*:/usr/local/project/hadoop/share/hadoop/yarn/*:/usr/local/project/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/project/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar

Thanks

On Sun, Apr 24, 2016 at 2:20 AM, Ted Yu wrote:

> Can you check that the DFSClient Spark uses is the same version as on the
> server side?
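In case it helps isolate the classpath difference, here is a minimal
packaged-app version of the REPL test (a sketch using the same Spark 1.5 calls
as above; only the object name and app name are invented):

// Minimal reproduction sketch: the same write the REPL performs, but packaged
// as an app so it exercises the application classpath rather than the shell's.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SecureZoneWriteTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SecureZoneWriteTest"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Write a trivial DataFrame into the encryption zone.
    val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
    df1.write.parquet("/securedzone/test")

    sc.stop()
  }
}

Running the same jar once with only SPARK_DIST_CLASSPATH and once with the
Phoenix jar on extraClassPath should show whether the extra jar alone makes the
write fail.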
Re: Spark writing to secure zone throws : UnknownCryptoProtocolVersionException
Can you check that the DFSClient Spark uses is the same version as on the
server side?

The client and server (NameNode) negotiate a "crypto protocol version" -
this is a forward-looking feature.

Please note:

bq. Client provided: []

Meaning the client didn't provide any supported crypto protocol version.

Cheers

On Wed, Apr 20, 2016 at 3:27 AM, pierre lacave wrote:

> Hi
>
> I am trying to use Spark to write to a protected zone in HDFS. I am able to
> create and list files using the hdfs client, but when writing via Spark I
> get this exception.
>
> I could not find any mention of CryptoProtocolVersion in the Spark doc.
>
> Any idea what could have gone wrong?
>
> spark (1.5.0), hadoop (2.6.1)
>
> Thanks
>
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.UnknownCryptoProtocolVersionException):
> No crypto protocol versions provided by the client are supported. Client
> provided: [] NameNode supports:
> [CryptoProtocolVersion{description='Unknown', version=1, unknownValue=null},
> CryptoProtocolVersion{description='Encryption zones', version=2, unknownValue=null}]
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.chooseProtocolVersion(FSNamesystem.java:2468)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2600)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2520)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:579)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:394)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)
>
>     at org.apache.hadoop.ipc.Client.call(Client.java:1411)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1364)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>     at com.sun.proxy.$Proxy13.create(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:264)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy14.create(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1612)
>     at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1488)
>     at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1413)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:387)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:383)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:383)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:327)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
>     at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
>     at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1104)
>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:88)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Exec
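As a footnote on the negotiation described above: an HDFS client new enough to
know about encryption zones advertises its versions via
CryptoProtocolVersion.supported() (a class introduced alongside encryption
zones in Hadoop 2.6). A quick sketch, assuming that class is the one actually
on the application's classpath:

// Sketch: print the crypto protocol versions the HDFS client classes on this
// classpath advertise. With a pre-encryption-zone client the class is absent
// entirely, which is consistent with the NameNode seeing "Client provided: []".
import org.apache.hadoop.hdfs.protocol.CryptoProtocolVersion

object CryptoVersionCheck {
  def main(args: Array[String]): Unit =
    CryptoProtocolVersion.supported().foreach(v => println(v))
}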