Thanks Ted,

I am actually using the Hadoop-free version of Spark
(spark-1.5.0-bin-without-hadoop) over Hadoop 2.6.1, so this could very well
be related indeed.

I have configured spark-env.sh with export
SPARK_DIST_CLASSPATH=$($HADOOP_PREFIX/bin/hadoop classpath). That Hadoop
install (2.6.1) is the only one on the system, and Spark is able to
interface with HDFS through it as long as I stay outside the secured zones.
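To answer your question about the DFSClient version, here is a quick check I
can run from the driver, both in the REPL and in the packaged app (a minimal
sketch, assuming hadoop-common is on the classpath, which
SPARK_DIST_CLASSPATH should guarantee):

// Print the Hadoop client version the Spark app actually sees,
// to compare the REPL, local and yarn runs against the 2.6.1 NameNode
import org.apache.hadoop.util.VersionInfo
println("Hadoop client version: " + VersionInfo.getVersion)
println("Built from: " + VersionInfo.getBuildVersion)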

Interestingly, running this in the REPL works fine:

// Create a simple DataFrame, stored into a partition directory
val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
df1.write.parquet("/securedzone/test")


but if it is packaged as an app and run in local or YARN client/cluster
mode, it fails with the error described.
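For completeness, the packaged app boils down to essentially the same write;
a minimal sketch (the object and app names below are illustrative, not the
actual job):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object WriteToSecuredZone {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WriteToSecuredZone"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Same DataFrame as in the REPL, written into the encryption zone
    val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
    df1.write.parquet("/securedzone/test")
  }
}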

I am not including anything Hadoop-specific in the build, so I am not sure
where the difference in DFSClient could come from. The dependency tree is
below (see the build.sbt sketch after it).

[info] Loading project definition from
/Users/zoidberg/Documents/demo/x/trunk/src/jobs/project
[info] Set current project to root (in build
file:/Users/zoidberg/Documents/demo/x/trunk/src/jobs/)
[info] Updating
{file:/Users/zoidberg/Documents/demo/x/trunk/src/jobs/}common...
[info] com.demo.project:root_2.10:0.2.3 [S]
[info] com.demo.project:common_2.10:0.2.3 [S]
[info]   +-joda-time:joda-time:2.8.2
[info]
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Updating
{file:/Users/zoidberg/Documents/demo/x/trunk/src/jobs/}extract...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] com.demo.project:extract_2.10:0.2.3 [S]
[info]   +-com.demo.project:common_2.10:0.2.3 [S]
[info]   | +-joda-time:joda-time:2.8.2
[info]   |
[info]   +-com.databricks:spark-csv_2.10:1.3.0 [S]
[info]     +-com.univocity:univocity-parsers:1.5.1
[info]     +-org.apache.commons:commons-csv:1.1
[info]
[success] Total time: 9 s, completed 28-Apr-2016 10:40:25
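For reference, the relevant part of the build, as a rough sketch (the exact
settings differ; I am assuming Spark sits in provided scope, which would
explain why it does not appear in the tree above and would keep a second
Hadoop client out of the assembly):

// build.sbt (sketch): Spark provided by the cluster, only app-level deps bundled
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "1.5.0" % "provided",
  "com.databricks"   %% "spark-csv" % "1.3.0",
  "joda-time"        %  "joda-time" % "2.8.2"
)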


I am assuming I do not need to rebuild Spark to use it with Hadoop 2.6.1,
and that the "user provided Hadoop" build is there precisely to let me do that.


$HADOOP_PREFIX/bin/hadoop classpath expands to:

/usr/local/project/hadoop/conf:/usr/local/project/hadoop/share/hadoop/common/lib/*:/usr/local/project/hadoop/share/hadoop/common/*:/usr/local/project/hadoop/share/hadoop/hdfs:/usr/local/project/hadoop/share/hadoop/hdfs/lib/*:/usr/local/project/hadoop/share/hadoop/hdfs/*:/usr/local/project/hadoop/share/hadoop/yarn/lib/*:/usr/local/project/hadoop/share/hadoop/yarn/*:/usr/local/project/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/project/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar

Thanks


On Sun, Apr 24, 2016 at 2:20 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Can you check that the DFSClient Spark uses is the same version as on the
> server side ?
>
> The client and server (NameNode) negotiate a "crypto protocol version" -
> this is a forward-looking feature.
> Please note:
>
> bq. Client provided: []
>
> Meaning client didn't provide any supported crypto protocol version.
>
> Cheers
>
> On Wed, Apr 20, 2016 at 3:27 AM, pierre lacave <pie...@lacave.me> wrote:
>
>> Hi
>>
>>
>> I am trying to use Spark to write to a protected zone in HDFS. I am able to
>> create and list files using the hdfs client, but when writing via Spark I get
>> this exception.
>>
>> I could not find any mention of CryptoProtocolVersion in the spark doc.
>>
>>
>> Any idea what could have gone wrong?
>>
>>
>> spark (1.5.0), hadoop (2.6.1)
>>
>>
>> Thanks
>>
>>
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.UnknownCryptoProtocolVersionException):
>>  No crypto protocol versions provided by the client are supported. Client 
>> provided: [] NameNode supports: 
>> [CryptoProtocolVersion{description='Unknown', version=1, unknownValue=null}, 
>> CryptoProtocolVersion{description='Encryption zones', version=2, 
>> unknownValue=null}]
>>      at 
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.chooseProtocolVersion(FSNamesystem.java:2468)
>>      at 
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2600)
>>      at 
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2520)
>>      at 
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:579)
>>      at 
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:394)
>>      at 
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>      at 
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>>      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>>      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
>>      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
>>      at java.security.AccessController.doPrivileged(Native Method)
>>      at javax.security.auth.Subject.doAs(Subject.java:415)
>>      at 
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>>      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)
>>
>>      at org.apache.hadoop.ipc.Client.call(Client.java:1411)
>>      at org.apache.hadoop.ipc.Client.call(Client.java:1364)
>>      at 
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>      at com.sun.proxy.$Proxy13.create(Unknown Source)
>>      at 
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:264)
>>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>      at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>      at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>      at java.lang.reflect.Method.invoke(Method.java:606)
>>      at 
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>>      at 
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>      at com.sun.proxy.$Proxy14.create(Unknown Source)
>>      at 
>> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1612)
>>      at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1488)
>>      at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1413)
>>      at 
>> org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:387)
>>      at 
>> org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:383)
>>      at 
>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>      at 
>> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:383)
>>      at 
>> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:327)
>>      at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
>>      at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
>>      at 
>> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
>>      at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
>>      at 
>> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1104)
>>      at 
>> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
>>      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>      at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>      at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>      at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>      at java.lang.Thread.run(Thread.java:745)
>>
>>
>>
>>
>
