Re: Spark writing to secure zone throws : UnknownCryptoProtocolVersionException

2016-04-28 Thread Ted Yu
Interesting.

The Phoenix dependency wasn't shown in the classpath in your previous email.
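
If it helps, a quick way to check is to dump the classpath from inside the
driver. A minimal sketch using plain JVM calls (nothing about your setup is
assumed here):

// Classpath the driver JVM was started with; check whether the Phoenix
// fat jar (and which Hadoop jars) actually show up.
println(System.getProperty("java.class.path"))

// Jars visible to the context classloader, which also catches anything
// added through extraClassPath.
import java.net.URLClassLoader
Thread.currentThread.getContextClassLoader match {
  case u: URLClassLoader => u.getURLs.foreach(println)
  case _ => println("context classloader is not a URLClassLoader")
}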

On Thu, Apr 28, 2016 at 4:12 AM, pierre lacave  wrote:

> Narrowed down to some version incompatibility with Phoenix 4.7.
>
> Including $SPARK_HOME/lib/phoenix-4.7.0-HBase-1.1-client-spark.jar in
> extraClassPath is what triggers the issue above.
>
> I'll have a go at adding the individual dependencies instead of this
> fat jar and see how it goes.
>
> Thanks
>
>
> On Thu, Apr 28, 2016 at 10:52 AM, pierre lacave  wrote:
>
>> Thanks Ted,
>>
>> I am actually using the Hadoop-free version of Spark
>> (spark-1.5.0-bin-without-hadoop) on top of Hadoop 2.6.1, so it could very
>> well be related indeed.
>>
>> I have configured spark-env.sh with export
>> SPARK_DIST_CLASSPATH=$($HADOOP_PREFIX/bin/hadoop classpath); that is the
>> only version of Hadoop on the system (2.6.1), and it can interface with
>> HDFS (on non-secured zones).
>>
>> Interestingly, running this in the REPL works fine:
>>
>> // Create a simple DataFrame, stored into a partition directory
>> val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
>> df1.write.parquet("/securedzone/test")
>>
>>
>> but if packaged as an app and run in local or YARN client/cluster mode,
>> it fails with the error described.
>>
>> I am not including anything Hadoop-specific, so I am not sure where the
>> difference in DFSClient could come from.
>>
>> [info] Loading project definition from
>> /Users/zoidberg/Documents/demo/x/trunk/src/jobs/project
>> [info] Set current project to root (in build
>> file:/Users/zoidberg/Documents/demo/x/trunk/src/jobs/)
>> [info] Updating
>> {file:/Users/zoidberg/Documents/demo/x/trunk/src/jobs/}common...
>> [info] com.demo.project:root_2.10:0.2.3 [S]
>> [info] com.demo.project:common_2.10:0.2.3 [S]
>> [info]   +-joda-time:joda-time:2.8.2
>> [info]
>> [info] Resolving org.fusesource.jansi#jansi;1.4 ...
>> [info] Done updating.
>> [info] Updating
>> {file:/Users/zoidberg/Documents/demo/x/trunk/src/jobs/}extract...
>> [info] Resolving org.fusesource.jansi#jansi;1.4 ...
>> [info] Done updating.
>> [info] com.demo.project:extract_2.10:0.2.3 [S]
>> [info]   +-com.demo.project:common_2.10:0.2.3 [S]
>> [info]   | +-joda-time:joda-time:2.8.2
>> [info]   |
>> [info]   +-com.databricks:spark-csv_2.10:1.3.0 [S]
>> [info] +-com.univocity:univocity-parsers:1.5.1
>> [info] +-org.apache.commons:commons-csv:1.1
>> [info]
>> [success] Total time: 9 s, completed 28-Apr-2016 10:40:25
>>
>>
>> I am assuming I do not need to rebuild Spark to use it with Hadoop 2.6.1,
>> and that the Spark build with user-provided Hadoop lets me do that.
>>
>>
>> $HADOOP_PREFIX/bin/hadoop classpath expands to:
>>
>>
>> /usr/local/project/hadoop/conf:/usr/local/project/hadoop/share/hadoop/common/lib/*:/usr/local/project/hadoop/share/hadoop/common/*:/usr/local/project/hadoop/share/hadoop/hdfs:/usr/local/project/hadoop/share/hadoop/hdfs/lib/*:/usr/local/project/hadoop/share/hadoop/hdfs/*:/usr/local/project/hadoop/share/hadoop/yarn/lib/*:/usr/local/project/hadoop/share/hadoop/yarn/*:/usr/local/project/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/project/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar
>>
>> Thanks
>>
>>
>> On Sun, Apr 24, 2016 at 2:20 AM, Ted Yu  wrote:
>>
>>> Can you check that the DFSClient Spark uses is the same version as on
>>> the server side?
>>>
>>> The client and server (NameNode) negotiate a "crypto protocol version" -
>>> this is a forward-looking feature.
>>> Please note:
>>>
>>> bq. Client provided: []
>>>
>>> Meaning the client didn't provide any supported crypto protocol version.
>>>
>>> Cheers
>>>
>>> On Wed, Apr 20, 2016 at 3:27 AM, pierre lacave  wrote:
>>>
 Hi


 I am trying to use Spark to write to a protected zone in HDFS. I am able
 to create and list files using the HDFS client, but when writing via Spark
 I get this exception.

 I could not find any mention of CryptoProtocolVersion in the Spark docs.


 Any idea what could have gone wrong?


 spark (1.5.0), hadoop (2.6.1)


 Thanks


 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.UnknownCryptoProtocolVersionException):
  No crypto protocol versions provided by the client are supported. Client 
 provided: [] NameNode supports: 
 [CryptoProtocolVersion{description='Unknown', version=1, 
 unknownValue=null}, CryptoProtocolVersion{description='Encryption zones', 
 version=2, unknownValue=null}]
at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.chooseProtocolVersion(FSNamesystem.java:2468)
at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2600)
at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2520)
at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:579)
at 
 org.apache.hadoop.hdfs.protoco

Re: Spark writing to secure zone throws : UnknownCryptoProtocolVersionException

2016-04-28 Thread pierre lacave
Narrowed down to some version incompatibility with Phoenix 4.7.

Including $SPARK_HOME/lib/phoenix-4.7.0-HBase-1.1-client-spark.jar in
extraClassPath is what triggers the issue above.

I'll have a go at adding the individual dependencies instead of this fat
jar and see how it goes.
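
Roughly what I plan to try in build.sbt, just a sketch (the coordinates
below are the standard Phoenix 4.7.0 module names; I have not verified yet
which modules and exclusions this job actually needs):

// Sketch: depend on the individual Phoenix modules rather than the
// pre-shaded client-spark fat jar, so whatever Hadoop/HDFS classes that
// jar bundles presumably stop sitting in front of the cluster's 2.6.1 jars.
libraryDependencies ++= Seq(
  "org.apache.phoenix" % "phoenix-core"  % "4.7.0-HBase-1.1",
  "org.apache.phoenix" % "phoenix-spark" % "4.7.0-HBase-1.1"
)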

Thanks


On Thu, Apr 28, 2016 at 10:52 AM, pierre lacave  wrote:

> Thanks Ted,
>
> I am actually using the Hadoop-free version of Spark
> (spark-1.5.0-bin-without-hadoop) on top of Hadoop 2.6.1, so it could very
> well be related indeed.
>
> I have configured spark-env.sh with export
> SPARK_DIST_CLASSPATH=$($HADOOP_PREFIX/bin/hadoop classpath); that is the
> only version of Hadoop on the system (2.6.1), and it can interface with
> HDFS (on non-secured zones).
>
> Interestingly, running this in the REPL works fine:
>
> // Create a simple DataFrame, stored into a partition directory
> val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
> df1.write.parquet("/securedzone/test")
>
>
> but if packaged as an app and run in local or YARN client/cluster mode, it
> fails with the error described.
>
> I am not including anything Hadoop-specific, so I am not sure where the
> difference in DFSClient could come from.
>
> [info] Loading project definition from
> /Users/zoidberg/Documents/demo/x/trunk/src/jobs/project
> [info] Set current project to root (in build
> file:/Users/zoidberg/Documents/demo/x/trunk/src/jobs/)
> [info] Updating
> {file:/Users/zoidberg/Documents/demo/x/trunk/src/jobs/}common...
> [info] com.demo.project:root_2.10:0.2.3 [S]
> [info] com.demo.project:common_2.10:0.2.3 [S]
> [info]   +-joda-time:joda-time:2.8.2
> [info]
> [info] Resolving org.fusesource.jansi#jansi;1.4 ...
> [info] Done updating.
> [info] Updating
> {file:/Users/zoidberg/Documents/demo/x/trunk/src/jobs/}extract...
> [info] Resolving org.fusesource.jansi#jansi;1.4 ...
> [info] Done updating.
> [info] com.demo.project:extract_2.10:0.2.3 [S]
> [info]   +-com.demo.project:common_2.10:0.2.3 [S]
> [info]   | +-joda-time:joda-time:2.8.2
> [info]   |
> [info]   +-com.databricks:spark-csv_2.10:1.3.0 [S]
> [info] +-com.univocity:univocity-parsers:1.5.1
> [info] +-org.apache.commons:commons-csv:1.1
> [info]
> [success] Total time: 9 s, completed 28-Apr-2016 10:40:25
>
>
> I am assuming I do not need to rebuild Spark to use it with Hadoop 2.6.1,
> and that the Spark build with user-provided Hadoop lets me do that.
>
>
> $HADOOP_PREFIX/bin/hadoop classpath expands to:
>
>
> /usr/local/project/hadoop/conf:/usr/local/project/hadoop/share/hadoop/common/lib/*:/usr/local/project/hadoop/share/hadoop/common/*:/usr/local/project/hadoop/share/hadoop/hdfs:/usr/local/project/hadoop/share/hadoop/hdfs/lib/*:/usr/local/project/hadoop/share/hadoop/hdfs/*:/usr/local/project/hadoop/share/hadoop/yarn/lib/*:/usr/local/project/hadoop/share/hadoop/yarn/*:/usr/local/project/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/project/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar
>
> Thanks
>
>
> On Sun, Apr 24, 2016 at 2:20 AM, Ted Yu  wrote:
>
>> Can you check that the DFSClient Spark uses is the same version as on
>> the server side?
>>
>> The client and server (NameNode) negotiate a "crypto protocol version" -
>> this is a forward-looking feature.
>> Please note:
>>
>> bq. Client provided: []
>>
>> Meaning the client didn't provide any supported crypto protocol version.
>>
>> Cheers
>>
>> On Wed, Apr 20, 2016 at 3:27 AM, pierre lacave  wrote:
>>
>>> Hi
>>>
>>>
>>> I am trying to use Spark to write to a protected zone in HDFS. I am able to
>>> create and list files using the HDFS client, but when writing via Spark I
>>> get this exception.
>>>
>>> I could not find any mention of CryptoProtocolVersion in the Spark docs.
>>>
>>>
>>> Any idea what could have gone wrong?
>>>
>>>
>>> spark (1.5.0), hadoop (2.6.1)
>>>
>>>
>>> Thanks
>>>
>>>
>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.UnknownCryptoProtocolVersionException):
>>>  No crypto protocol versions provided by the client are supported. Client 
>>> provided: [] NameNode supports: 
>>> [CryptoProtocolVersion{description='Unknown', version=1, 
>>> unknownValue=null}, CryptoProtocolVersion{description='Encryption zones', 
>>> version=2, unknownValue=null}]
>>> at 
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.chooseProtocolVersion(FSNamesystem.java:2468)
>>> at 
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2600)
>>> at 
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2520)
>>> at 
>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:579)
>>> at 
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:394)
>>> at 
>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>>  

Re: Spark writing to secure zone throws : UnknownCryptoProtocolVersionException

2016-04-28 Thread pierre lacave
Thanks Ted,

I am actually using the Hadoop-free version of Spark
(spark-1.5.0-bin-without-hadoop) on top of Hadoop 2.6.1, so it could very
well be related indeed.

I have configured spark-env.sh with export
SPARK_DIST_CLASSPATH=$($HADOOP_PREFIX/bin/hadoop classpath); that is the
only version of Hadoop on the system (2.6.1), and it can interface with
HDFS (on non-secured zones).

Interestingly, running this in the REPL works fine:

// Create a simple DataFrame, stored into a partition directory
val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
df1.write.parquet("/securedzone/test")


but if packaged as an app and run in local or YARN client/cluster mode, it
fails with the error described.

I am not including anything Hadoop-specific, so I am not sure where the
difference in DFSClient could come from.
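
For reference, the packaged version is essentially just the REPL snippet
wrapped in the usual app boilerplate; a trimmed sketch (object and app names
are illustrative, not the real job):

// Same two lines as the REPL test, wrapped in SparkContext/SQLContext
// setup (Spark 1.5 style).
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SecureZoneWriteTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SecureZoneWriteTest"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Create a simple DataFrame and write it into the encryption zone.
    val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
    df1.write.parquet("/securedzone/test")

    sc.stop()
  }
}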

[info] Loading project definition from
/Users/zoidberg/Documents/demo/x/trunk/src/jobs/project
[info] Set current project to root (in build
file:/Users/zoidberg/Documents/demo/x/trunk/src/jobs/)
[info] Updating
{file:/Users/zoidberg/Documents/demo/x/trunk/src/jobs/}common...
[info] com.demo.project:root_2.10:0.2.3 [S]
[info] com.demo.project:common_2.10:0.2.3 [S]
[info]   +-joda-time:joda-time:2.8.2
[info]
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Updating
{file:/Users/zoidberg/Documents/demo/x/trunk/src/jobs/}extract...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] com.demo.project:extract_2.10:0.2.3 [S]
[info]   +-com.demo.project:common_2.10:0.2.3 [S]
[info]   | +-joda-time:joda-time:2.8.2
[info]   |
[info]   +-com.databricks:spark-csv_2.10:1.3.0 [S]
[info] +-com.univocity:univocity-parsers:1.5.1
[info] +-org.apache.commons:commons-csv:1.1
[info]
[success] Total time: 9 s, completed 28-Apr-2016 10:40:25


I am assuming I do not need to rebuild Spark to use it with Hadoop 2.6.1,
and that the Spark build with user-provided Hadoop lets me do that.


$HADOOP_PREFIX/bin/hadoop classpath expands to:

/usr/local/project/hadoop/conf:/usr/local/project/hadoop/share/hadoop/common/lib/*:/usr/local/project/hadoop/share/hadoop/common/*:/usr/local/project/hadoop/share/hadoop/hdfs:/usr/local/project/hadoop/share/hadoop/hdfs/lib/*:/usr/local/project/hadoop/share/hadoop/hdfs/*:/usr/local/project/hadoop/share/hadoop/yarn/lib/*:/usr/local/project/hadoop/share/hadoop/yarn/*:/usr/local/project/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/project/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar

Thanks


On Sun, Apr 24, 2016 at 2:20 AM, Ted Yu  wrote:

> Can you check that the DFSClient Spark uses is the same version as on the
> server side?
>
> The client and server (NameNode) negotiate a "crypto protocol version" -
> this is a forward-looking feature.
> Please note:
>
> bq. Client provided: []
>
> Meaning the client didn't provide any supported crypto protocol version.
>
> Cheers
>
> On Wed, Apr 20, 2016 at 3:27 AM, pierre lacave  wrote:
>
>> Hi
>>
>>
>> I am trying to use Spark to write to a protected zone in HDFS. I am able to
>> create and list files using the HDFS client, but when writing via Spark I
>> get this exception.
>>
>> I could not find any mention of CryptoProtocolVersion in the Spark docs.
>>
>>
>> Any idea what could have gone wrong?
>>
>>
>> spark (1.5.0), hadoop (2.6.1)
>>
>>
>> Thanks
>>
>>
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.UnknownCryptoProtocolVersionException):
>>  No crypto protocol versions provided by the client are supported. Client 
>> provided: [] NameNode supports: 
>> [CryptoProtocolVersion{description='Unknown', version=1, unknownValue=null}, 
>> CryptoProtocolVersion{description='Encryption zones', version=2, 
>> unknownValue=null}]
>>  at 
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.chooseProtocolVersion(FSNamesystem.java:2468)
>>  at 
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2600)
>>  at 
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2520)
>>  at 
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:579)
>>  at 
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:394)
>>  at 
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>  at 
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
>>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
>>  at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at 
>> org.apache.hadoop.security.UserGroupInformation.do

Re: Spark writing to secure zone throws : UnknownCryptoProtocolVersionException

2016-04-23 Thread Ted Yu
Can you check that the DFSClient Spark uses is the same version as on the
server side?

The client and server (NameNode) negotiate a "crypto protocol version" -
this is a forward-looking feature.
Please note:

bq. Client provided: []

Meaning the client didn't provide any supported crypto protocol version.
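
One way to compare, as a rough sketch (plain Hadoop/JVM calls, nothing
specific to your deployment assumed): run the following from the Spark shell
or from inside the packaged job, and check the reported version and jar path
against what the NameNode is running.

// Hadoop version the Spark process actually loaded, and the jar the
// DFSClient class was loaded from; compare both with the server side.
println(org.apache.hadoop.util.VersionInfo.getVersion)
println(classOf[org.apache.hadoop.hdfs.DFSClient]
  .getProtectionDomain.getCodeSource.getLocation)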

Cheers

On Wed, Apr 20, 2016 at 3:27 AM, pierre lacave  wrote:

> Hi
>
>
> I am trying to use Spark to write to a protected zone in HDFS. I am able to
> create and list files using the HDFS client, but when writing via Spark I
> get this exception.
>
> I could not find any mention of CryptoProtocolVersion in the Spark docs.
>
>
> Any idea what could have gone wrong?
>
>
> spark (1.5.0), hadoop (2.6.1)
>
>
> Thanks
>
>
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.UnknownCryptoProtocolVersionException):
>  No crypto protocol versions provided by the client are supported. Client 
> provided: [] NameNode supports: [CryptoProtocolVersion{description='Unknown', 
> version=1, unknownValue=null}, CryptoProtocolVersion{description='Encryption 
> zones', version=2, unknownValue=null}]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.chooseProtocolVersion(FSNamesystem.java:2468)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2600)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2520)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:579)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:394)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)
>
>   at org.apache.hadoop.ipc.Client.call(Client.java:1411)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1364)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy13.create(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:264)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy14.create(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1612)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1488)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1413)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:387)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:383)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:383)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:327)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
>   at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1104)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:88)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Exec