Re: Spark Streaming Kerberos Issue

2017-11-22 Thread Georg Heiler
Did you check that the security extensions are installed (JCE)?
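For reference, here is an illustrative check (not from the original thread) for whether the unlimited-strength JCE policy is active; on the Oracle/OpenJDK 7 and 8 generation, a result of 128 for AES means only the limited default policy is installed:

import javax.crypto.Cipher

object JceCheck {
  def main(args: Array[String]): Unit = {
    // Int.MaxValue => unlimited-strength policy files are installed;
    // 128 => only the default (limited) JCE policy is present.
    println(s"Max allowed AES key length: ${Cipher.getMaxAllowedKeyLength("AES")}")
  }
}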

KhajaAsmath Mohammed wrote on Wed, 22 Nov 2017 at 19:36:

> [image: Inline image 1]
>
> This is what we are on.
>
> On Wed, Nov 22, 2017 at 12:33 PM, KhajaAsmath Mohammed <
> mdkhajaasm...@gmail.com> wrote:
>
>> We use Oracle JDK. We are on Unix.
>>
>> On Wed, Nov 22, 2017 at 12:31 PM, Georg Heiler wrote:
>>
>>> Do you use Oracle or OpenJDK? We recently had an issue with OpenJDK:
>>> formerly, the Java security extensions were installed by default; that is
>>> no longer the case on CentOS 7.3.
>>>
>>> Are these installed?
>>>
>>> KhajaAsmath Mohammed wrote on Wed, 22 Nov 2017 at 19:29:
>>>
I passed a keytab; renewal is enabled by running the script every eight
hours. The user's ticket gets renewed by the script every eight hours.

 On Wed, Nov 22, 2017 at 12:27 PM, Georg Heiler <
 georg.kf.hei...@gmail.com> wrote:

> Did you pass a keytab? Is renewal enabled in your KDC?
> KhajaAsmath Mohammed wrote on Wed, 22 Nov 2017 at 19:25:
>
>> Hi,
>>
>> I have written a Spark Streaming job, and it runs successfully for
>> more than 36 hours. After around 36 hours, the job fails with a Kerberos
>> issue. Any suggestions on how to resolve it?

Re: Spark Streaming Kerberos Issue

2017-11-22 Thread KhajaAsmath Mohammed
[image: Inline image 1]

This is what we are on.

On Wed, Nov 22, 2017 at 12:33 PM, KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:

> We use Oracle JDK. We are on Unix.
>
> On Wed, Nov 22, 2017 at 12:31 PM, Georg Heiler 
> wrote:
>
>> Do you use Oracle or OpenJDK? We recently had an issue with OpenJDK:
>> formerly, the Java security extensions were installed by default; that is
>> no longer the case on CentOS 7.3.
>>
>> Are these installed?
>>
>> KhajaAsmath Mohammed wrote on Wed, 22 Nov 2017 at 19:29:
>>
>>> I passed a keytab; renewal is enabled by running the script every eight
>>> hours. The user's ticket gets renewed by the script every eight hours.
>>>
>>> On Wed, Nov 22, 2017 at 12:27 PM, Georg Heiler <
>>> georg.kf.hei...@gmail.com> wrote:
>>>
Did you pass a keytab? Is renewal enabled in your KDC?
KhajaAsmath Mohammed wrote on Wed, 22 Nov 2017 at 19:25:

> Hi,
>
> I have written a Spark Streaming job, and it runs successfully for
> more than 36 hours. After around 36 hours, the job fails with a Kerberos
> issue. Any suggestions on how to resolve it?

Re: Spark Streaming Kerberos Issue

2017-11-22 Thread KhajaAsmath Mohammed
We use Oracle JDK. We are on Unix.

On Wed, Nov 22, 2017 at 12:31 PM, Georg Heiler 
wrote:

> Do you use Oracle or OpenJDK? We recently had an issue with OpenJDK:
> formerly, the Java security extensions were installed by default; that is
> no longer the case on CentOS 7.3.
>
> Are these installed?
>
> KhajaAsmath Mohammed wrote on Wed, 22 Nov 2017 at 19:29:
>
>> I passed a keytab; renewal is enabled by running the script every eight
>> hours. The user's ticket gets renewed by the script every eight hours.
>>
>> On Wed, Nov 22, 2017 at 12:27 PM, Georg Heiler wrote:
>>
>>> Did you pass a keytab? Is renewal enabled in your KDC?
>>> KhajaAsmath Mohammed wrote on Wed, 22 Nov 2017 at 19:25:
>>>
 Hi,

I have written a Spark Streaming job, and it runs successfully for
more than 36 hours. After around 36 hours, the job fails with a Kerberos
issue. Any suggestions on how to resolve it?


Re: Spark Streaming Kerberos Issue

2017-11-22 Thread Georg Heiler
Do you use Oracle or OpenJDK? We recently had an issue with OpenJDK:
formerly, the Java security extensions were installed by default; that is no
longer the case on CentOS 7.3.

Are these installed?
KhajaAsmath Mohammed wrote on Wed, 22 Nov 2017 at 19:29:

> I passed a keytab; renewal is enabled by running the script every eight
> hours. The user's ticket gets renewed by the script every eight hours.
>
> On Wed, Nov 22, 2017 at 12:27 PM, Georg Heiler 
> wrote:
>
>> Did you pass a keytab? Is renewal enabled in your KDC?
>> KhajaAsmath Mohammed wrote on Wed, 22 Nov 2017 at 19:25:
>>
>>> Hi,
>>>
>>> I have written a Spark Streaming job, and it runs successfully for more
>>> than 36 hours. After around 36 hours, the job fails with a Kerberos issue.
>>> Any suggestions on how to resolve it?

Re: Spark Streaming Kerberos Issue

2017-11-22 Thread KhajaAsmath Mohammed
I passed a keytab; renewal is enabled by running the script every eight
hours. The user's ticket gets renewed by the script every eight hours.
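For comparison, Spark on YARN can also re-login from a keytab itself, via spark-submit's --principal and --keytab options, instead of relying on an external renewal script. A hedged sketch; the principal, keytab path, class, and jar names below are placeholders:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal va_dflt@EXAMPLE.REALM \
  --keytab /etc/security/keytabs/va_dflt.keytab \
  --class com.example.StreamingJob \
  streaming-job.jar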

On Wed, Nov 22, 2017 at 12:27 PM, Georg Heiler 
wrote:

> Did you pass a keytab? Is renewal enabled in your KDC?
> KhajaAsmath Mohammed wrote on Wed, 22 Nov 2017 at 19:25:
>
>> Hi,
>>
>> I have written a Spark Streaming job, and it runs successfully for more
>> than 36 hours. After around 36 hours, the job fails with a Kerberos issue.
>> Any suggestions on how to resolve it?

Re: Spark Streaming Kerberos Issue

2017-11-22 Thread Georg Heiler
Did you pass a keytab? Is renewal enabled in your KDC?
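As a generic check (not specific to this thread), klist shows whether the current ticket is renewable, and kinit -R attempts an in-place renewal:

klist        # a renewable TGT shows a "renew until" line
kinit -R     # fails if the ticket or the KDC policy does not allow renewal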
KhajaAsmath Mohammed wrote on Wed, 22 Nov 2017 at 19:25:

> Hi,
>
> I have written a Spark Streaming job, and it runs successfully for more
> than 36 hours. After around 36 hours, the job fails with a Kerberos issue.
> Any suggestions on how to resolve it?

Spark Streaming Kerberos Issue

2017-11-22 Thread KhajaAsmath Mohammed
Hi,

I have written a Spark Streaming job, and it runs successfully for more
than 36 hours. After around 36 hours, the job fails with a Kerberos issue.
Any suggestions on how to resolve it?

org.apache.spark.SparkException: Task failed while writing rows.
        at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.writeToFile(hiveWriterContainers.scala:328)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:210)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:210)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: org.apache.hadoop.security.token.SecretManager$InvalidToken: token (kms-dt owner=va_dflt, renewer=yarn, realUser=, issueDate=1511262017635, maxDate=1511866817635, sequenceNumber=1854601, masterKeyId=3392) is expired
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:248)
        at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.newOutputWriter$1(hiveWriterContainers.scala:346)
        at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.writeToFile(hiveWriterContainers.scala:304)
        ... 8 more
Caused by: java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: org.apache.hadoop.security.token.SecretManager$InvalidToken: token (kms-dt owner=va_dflt, renewer=yarn, realUser=, issueDate=1511262017635, maxDate=1511866817635, sequenceNumber=1854601, masterKeyId=3392) is expired
        at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:216)
        at org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388)
        at org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:1440)
        at org.apache.hadoop.hdfs.DFSClient.createWrappedOutputStream(DFSClient.java:1542)
        at org.apache.hadoop.hdfs.DFSClient.createWrappedOutputStream(DFSClient.java:1527)
        at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:428)
        at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:421)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:421)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:362)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:925)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
        at parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:220)
        at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:311)
        at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:287)
        at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.<init>(ParquetRecordWriterWrapper.java:65)
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getParquerRecordWriterWrapper(MapredParquetOutputFormat.java:125)
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getHiveRecordWriter(MapredParquetOutputFormat.java:114)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:260)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:245)
        ... 10 more
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: org.apache.hadoop.security.token.SecretManager$InvalidToken: token (kms-dt owner=va_dflt, renewer=yarn, realUser=, issueDate=1511262017635, maxDate=1511866817635, sequenceNumber=1854601, masterKeyId=3392) is expired
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at

Re: spark with kerberos

2016-10-19 Thread Steve Loughran

On 19 Oct 2016, at 00:18, Michael Segel wrote:

(Sorry sent reply via wrong account.. )

Steve,

Kinda hijacking the thread, but I promise it's still on topic to the OP’s issue.. ;-)

Usually you will end up having a local Kerberos set up per cluster.
So your machine accounts (hive, yarn, hbase, etc.) are going to be local to
the cluster.


not necessarily: you can share a KDC. And in a land of Active Directory you'd
need some trust relationship set up.



So you will have to set up some sort of realm trusts between the clusters.

If you’re going to be setting up security (Kerberos … ick! shivers… ;-) you’re 
going to want to keep the machine accounts isolated to the cluster.
And the OP said that he didn’t control the other cluster which makes me believe 
that they are separate.


good point; you may not be able to get the tickets for cluster C accounts. But
if you can log in as a user for cluster C, that may be enough (see the kinit
example below).


I would also think that you would have trouble with the credential… isn't it
tied to a user at a specific machine?

there are two types of kerberos identity, simple "hdfs@REALM" and
server-specific "hdfs/server@REALM". The simple ones work just as well in small
clusters; it's just that in larger clusters your KDCs (especially AD) tend to
interpret an attempt by 200 machines to log in as user "hdfs@REALM" in 30s as
an attempt to brute-force a password, and start rejecting logins. The
separation into the hdfs/_HOST@REALM style avoids that, and may reduce the
damage if the keytab leaks.

If the user submitting work is logged into the KDC of cluster C, e.g.:


kinit user@CLUSTERC


and spark is configured to ask for the extra namenode tokens,

spark.yarn.access.namenodes hdfs://cluster-c:8020


..then spark MAY ask for those tokens, pass them up to cluster B and so have 
them available for talking to cluster C. The submitted job is using the block 
tokens, so doesn't need to log in to kerberos itself, and if cluster B is 
insecure, doesn't need to worry about credentials and identity there. The HDFS 
client code just returns the block token to talk to cluster C when an attempt 
to talk to the DN of cluster C is rejected with an "authenticate yourself" 
response.

The main issue to me is: will that token get picked up and propagated to an
insecure cluster, so as to support this operation? Because there's a risk that
the ubiquitous static method UserGroupInformation.isSecurityEnabled() is being
checked in places, and the cluster itself isn't secure
(hadoop.security.authentication in core-site.xml == "simple"), so the check
fails. It looks like
org.apache.spark.deploy.yarn.security.HDFSCredentialProvider is doing exactly
that (as do HBase and Hive), meaning job submission doesn't fetch tokens
unless the destination cluster is secure.
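For illustration only, the guard being described looks roughly like this; a paraphrased sketch of the pattern, not the actual Spark source:

import org.apache.hadoop.security.UserGroupInformation

// Sketch: token-fetching code paths are typically gated on the *local*
// security setting, so no delegation tokens are requested when the
// submitting cluster runs with hadoop.security.authentication=simple,
// even if a remote secure namenode is listed.
def maybeFetchTokens(fetch: () => Unit): Unit =
  if (UserGroupInformation.isSecurityEnabled) {
    fetch()
  }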

One thing that could be attempted would be turning authentication on to
kerberos just in the job launch config, and seeing if that will collect all 
required tokens *without* getting confused by the fact that YARN and HDFS don't 
need them.

spark.hadoop.hadoop.security.authentication=kerberos

I have no idea if this works; you'd have to try it and see.
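Spelled out, the experiment would be something like the following (untested, as Steve says; the class and jar names are placeholders):

spark-submit \
  --conf spark.hadoop.hadoop.security.authentication=kerberos \
  --conf spark.yarn.access.namenodes=hdfs://cluster-c:8020 \
  --class com.example.WriterJob \
  writer-job.jar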

(It's been a while since I looked at this and I drank heavily to forget
Kerberos… so I may be a bit fuzzy here.)


denying all knowledge of Kerberos is always a good tactic.


Re: spark with kerberos

2016-10-18 Thread Michael Segel
(Sorry sent reply via wrong account.. )

Steve,

Kinda hijacking the thread, but I promise it's still on topic to the OP’s issue.. ;-)

Usually you will end up having a local Kerberos set up per cluster.
So your machine accounts (hive, yarn, hbase, etc.) are going to be local to
the cluster.

So you will have to set up some sort of realm trusts between the clusters.

If you’re going to be setting up security (Kerberos … ick! shivers… ;-) you’re 
going to want to keep the machine accounts isolated to the cluster.
And the OP said that he didn’t control the other cluster which makes me believe 
that they are separate.


I would also think that you would have trouble with the credential… isn't it
tied to a user at a specific machine?
(It's been a while since I looked at this and I drank heavily to forget
Kerberos… so I may be a bit fuzzy here.)

Thx

-Mike

Re: spark with kerberos

2016-10-18 Thread Steve Loughran

On 17 Oct 2016, at 22:11, Michael Segel wrote:

@Steve you are going to have to explain what you mean by ‘turn Kerberos on’.

Taken one way… it could mean making cluster B secure and running Kerberos and 
then you’d have to create some sort of trust between B and C,



I'd imagined making cluster B a kerberized cluster.

I don't think you need to go near trust relations though —ideally you'd just 
want the same accounts everywhere if you can, if not, the main thing is that 
the user submitting the job can get a credential for  that far NN at job 
submission time, and that credential is propagated all the way to the executors.


Did you mean to turn on kerberos on the nodes in Cluster B so that each node
becomes a trusted client that can connect to C?

OR

Did you mean to turn on kerberos on the master node (e.g. edge node) where the
data persists if you collect() it, so it's off the cluster on to a single
machine, and then push it from there so that only that machine has to have
kerberos running and is a trusted server to Cluster C?


Note: In option 3, I hope I said it correctly, but I believe that you would be 
collecting the data to a client (edge node) before pushing it out to the 
secured cluster.





Does that make sense?





Re: spark with kerberos

2016-10-14 Thread Steve Loughran

On 13 Oct 2016, at 10:50, dbolshak wrote:

Hello community,

We have a challenge and no idea how to solve it.

The problem,

Say we have the following environment:
1. `cluster A`, the cluster does not use kerberos and we use it as a source
of data, important thing is - we don't manage this cluster.
2. `cluster B`, small cluster where our spark application is running and
performing some logic. (we manage this cluster and it does not have
kerberos).
3. `cluster C`, the cluster uses kerberos and we use it to keep results of
our spark application, we manage this cluster

Our requirements and conditions that are not mentioned yet:
1. All clusters are in a single data center, but in the different
subnetworks.
2. We cannot turn on kerberos on `cluster A`
3. We cannot turn off kerberos on `cluster C`
4. We can turn on/off kerberos on `cluster B`, currently it's turned off.
5. Spark app is built on top of RDD and does not depend on spark-sql.

Does anybody know how to write data using RDD api to remote cluster which is
running with Kerberos?

If you want to talk to the secure cluster, C, from code running in cluster B,
you'll need to turn kerberos on there. Maybe, maybe, you could just get away
with kerberos being turned off, with you, the user, launching the application
while logged in to kerberos yourself and so trusted by Cluster C.

one of the problems you are likely to hit with Spark here is that it's only 
going to collect the tokens you need to talk to HDFS at the time you launch the 
application, and by default, it only knows about the cluster FS. You will need 
to tell spark about the other filesystem at launch time, so it will know to 
authenticate with it as you, then collect the tokens needed for the application 
itself to work with kerberos.

spark.yarn.access.namenodes=hdfs://cluster-c:8020

-Steve

ps: https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/


Re: spark with kerberos

2016-10-13 Thread Saisai Shao
I think security has nothing to do with what API you use, spark sql or the RDD
API.

Assuming you're running on a YARN cluster (that is the only cluster manager
that supports Kerberos currently).

First you need to get a Kerberos TGT in your local spark-submit process. After
being authenticated by Kerberos, Spark can obtain delegation tokens from HDFS,
so that you can communicate with a secure Hadoop cluster. In your case, since
you have to communicate with other remote HDFS clusters, you have to get
tokens from all of them: configure "spark.yarn.access.namenodes" to list all
the secure HDFS clusters you want to access, and the Hadoop client API will
get tokens from all of these clusters.
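A sketch of what that could look like at submission time; the realm, NameNode port, and jar name are placeholders:

kinit user@CLUSTERC.REALM
spark-submit \
  --master yarn \
  --conf spark.yarn.access.namenodes=hdfs://cluster-c:8020 \
  app.jar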

For the details you could refer to
https://spark.apache.org/docs/latest/running-on-yarn.html.

I haven't tried this personally since I don't have such requirements. It may
require additional steps that I missed. You could give it a try.


>


Re: spark with kerberos

2016-10-13 Thread Denis Bolshakov
The problem happens when writing (reading works fine):

rdd.saveAsNewAPIHadoopFile
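For context, a minimal shape of such a write. This is an illustrative sketch only; the record type, output format, and destination path are assumptions, not details from the actual job:

import org.apache.hadoop.io.{NullWritable, Text}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

// Hypothetical: lines is an existing RDD[String] computed on cluster B.
val pairs = lines.map(l => (NullWritable.get(), new Text(l)))
pairs.saveAsNewAPIHadoopFile[TextOutputFormat[NullWritable, Text]](
  "hdfs://cluster-c:8020/user/example/output")  // remote, kerberized cluster C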

We use just RDD and HDFS, no other things.
Spark 1.6.1 version.
`Cluster A` - CDH 5.7.1
`Cluster B` - vanilla hadoop 2.6.5
`Cluster C` - CDH 5.8.0

Best regards,
Denis




-- 
//with Best Regards
--Denis Bolshakov
e-mail: bolshakov.de...@gmail.com


Re: spark with kerberos

2016-10-13 Thread ayan guha
And a few more details on the Spark version, Hadoop version, and distribution
would also help...



-- 
Best Regards,
Ayan Guha


Re: spark with kerberos

2016-10-13 Thread ayan guha
I think one point you need to mention is your target - HDFS, Hive or HBase
(or something else) - and which endpoints are used.



-- 
Best Regards,
Ayan Guha


spark with kerberos

2016-10-13 Thread dbolshak
Hello community,

We have a challenge and no idea how to solve it.

The problem,

Say we have the following environment:
1. `cluster A`, the cluster does not use kerberos and we use it as a source
of data, important thing is - we don't manage this cluster. 
2. `cluster B`, small cluster where our spark application is running and
performing some logic. (we manage this cluster and it does not have
kerberos).
3. `cluster C`, the cluster uses kerberos and we use it to keep results of
our spark application, we manage this cluster

Our requirements and conditions that are not mentioned yet:
1. All clusters are in a single data center, but in the different
subnetworks.
2. We cannot turn on kerberos on `cluster A`
3. We cannot turn off kerberos on `cluster C`
4. We can turn on/off kerberos on `cluster B`, currently it's turned off.
5. Spark app is built on top of RDD and does not depend on spark-sql.

Does anybody know how to write data using RDD api to remote cluster which is
running with Kerberos?

-- 
//with Best Regards
--Denis Bolshakov
e-mail: bolshakov.de...@gmail.com






Spark with kerberos

2016-10-13 Thread Denis Bolshakov
Hello community,

We have a challenge and no idea how to solve it.

The problem,

Say we have the following environment:
1. `cluster A`, the cluster does not use kerberos and we use it as a source
of data, important thing is - we don't manage this cluster.
2. `cluster B`, small cluster where our spark application is running and
performing some logic. (we manage this cluster and it does not have
kerberos).
3. `cluster C`, the cluster uses kerberos and we use it to keep results of
our spark application, we manage this cluster

Our requirements and conditions that are not mentioned yet:
1. All clusters are in a single data center, but in the different
subnetworks.
2. We cannot turn on kerberos on `cluster A`
3. We cannot turn off kerberos on `cluster C`
4. We can turn on/off kerberos on `cluster B`, currently it's turned off.
5. Spark app is built on top of RDD and does not depend on spark-sql.

Does anybody know how to write data using RDD api to remote cluster which
is running with Kerberos?

-- 
//with Best Regards
--Denis Bolshakov
e-mail: bolshakov.de...@gmail.com


Re: Spark + Sentry + Kerberos don't add up?

2016-02-24 Thread Ruslan Dautkhanov
Turns out it is a Spark issue:

https://issues.apache.org/jira/browse/SPARK-13478




-- 
Ruslan Dautkhanov

On Mon, Jan 18, 2016 at 4:25 PM, Ruslan Dautkhanov 
wrote:

> Hi Romain,
>
> Thank you for your response.
>
> Adding Kerberos support might be as simple as
> https://issues.cloudera.org/browse/LIVY-44 ? I.e. add Livy --principal
> and --keytab parameters to be passed to spark-submit.
>
> As a workaround I just did kinit (using hue's keytab) and then launched
> Livy Server. It probably will work as long as the kerberos ticket doesn't
> expire. So it would be great to have support for --principal and
> --keytab parameters for spark-submit, as explained in
> http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cm_sg_yarn_long_jobs.html
>
>
> The only problem I have currently is the above error stack in my previous
> email:
>
> The Spark session could not be created in the cluster:
>> at org.apache.hadoop.security.*UserGroupInformation.doAs*(
>> UserGroupInformation.java:1671)
>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(
>> SparkSubmit.scala:160)
>
>
>
> >> AFAIK Hive impersonation should be turned off when using Sentry
>
> Yep, exactly. That's what I did. It is disabled now. But it looks like, on
> the other hand, Spark or Spark Notebook wants that enabled?
> It tries to do org.apache.hadoop.security.UserGroupInformation.doAs(),
> hence the error.
>
> So Sentry isn't compatible with Spark in kerberized clusters? Is there any
> workaround for this problem?
>
>
> --
> Ruslan Dautkhanov
>
> On Mon, Jan 18, 2016 at 3:52 PM, Romain Rigaux 
> wrote:
>
>> Livy does not support any Kerberos yet
>> https://issues.cloudera.org/browse/LIVY-3
>>
>> Are you asking instead about HS2 + Kerberos with Sentry?
>>
>> AFAIK Hive impersonation should be turned off when using Sentry:
>> http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/sg_sentry_service_config.html
>>
>> On Sun, Jan 17, 2016 at 10:04 PM, Ruslan Dautkhanov wrote:
>>
>>> Getting the following error stack:
>>>
>>> The Spark session could not be created in the cluster:
 at org.apache.hadoop.security.UserGroupInformation.doAs
 (UserGroupInformation.java:1671)
 at
 org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:160)
 at
 org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
 at org.apache.hadoop.hive.metastore.HiveMetaStoreClient
 .open(HiveMetaStoreClient.java:466)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:234)
 at
 org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
 ... 35 more
>>>
>>>
>>> My understanding is that hive.server2.enable.impersonation and
>>> hive.server2.enable.doAs should be enabled to make
>>> UserGroupInformation.doAs() work?
>>>
>>> When I try to enable these parameters, Cloudera Manager shows an error:
>>>
>>> Hive Impersonation is enabled for Hive Server2 role 'HiveServer2
 (hostname)'.
 Hive Impersonation should be disabled to enable Hive authorization
 using Sentry
>>>
>>>
>>> So Spark-Hive conflicts with Sentry!?
>>>
>>> Environment: Hue 3.9 Spark Notebooks + Livy Server (built from master).
>>> CDH 5.5.
>>>
>>> This is a kerberized cluster with Sentry.
>>>
>>> I was using hue's keytab, as the hue user is normally (by default in CDH)
>>> allowed to impersonate other users.
>>> So it's very convenient for Spark Notebooks.
>>>
>>> Any information to help solve this will be highly appreciated.
>>>
>>>
>>> --
>>> Ruslan Dautkhanov
>>>
>>
>>
>


Re: Spark + Sentry + Kerberos don't add up?

2016-01-20 Thread Ruslan Dautkhanov
I took the liberty of creating an issue: https://github.com/cloudera/livy/issues/36
Feel free to close it if it doesn't belong to the Livy project.
I really don't know if this is a Spark or a Livy/Sentry problem.

Any ideas for possible workarounds?

Thank you.



-- 
Ruslan Dautkhanov

On Mon, Jan 18, 2016 at 4:25 PM, Ruslan Dautkhanov 
wrote:

> Hi Romain,
>
> Thank you for your response.
>
> Adding Kerberos support might be as simple as
> https://issues.cloudera.org/browse/LIVY-44 ? I.e. add Livy --principal
> and --keytab parameters to be passed to spark-submit.
>
> As a workaround I just did kinit (using hue's keytab) and then launched
> Livy Server. It will probably work as long as the kerberos ticket doesn't
> expire. That's why it would be great to have support for the --principal and
> --keytab parameters for spark-submit, as explained in
> http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cm_sg_yarn_long_jobs.html
>
>
> The only problem I have currently is the above error stack in my previous
> email:
>
> The Spark session could not be created in the cluster:
>> at org.apache.hadoop.security.*UserGroupInformation.doAs*(
>> UserGroupInformation.java:1671)
>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(
>> SparkSubmit.scala:160)
>
>
>
> >> AFAIK Hive impersonation should be turned off when using Sentry
>
> Yep, exactly. That's what I did. It is disabled now. But it looks like, on
> the other hand, Spark or Spark Notebook wants that enabled?
> It tries to do org.apache.hadoop.security.UserGroupInformation.doAs(),
> hence the error.
>
> So Sentry isn't compatible with Spark in kerberized clusters? Is there any
> workaround for this problem?
>
>
> --
> Ruslan Dautkhanov
>
> On Mon, Jan 18, 2016 at 3:52 PM, Romain Rigaux 
> wrote:
>
>> Livy does not support any Kerberos yet
>> https://issues.cloudera.org/browse/LIVY-3
>>
>> Are you asking instead about HS2 + Kerberos with Sentry?
>>
>> AFAIK Hive impersonation should be turned off when using Sentry:
>> http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/sg_sentry_service_config.html
>>
>> On Sun, Jan 17, 2016 at 10:04 PM, Ruslan Dautkhanov wrote:
>>
>>> Getting the following error stack:
>>>
>>> The Spark session could not be created in the cluster:
 at org.apache.hadoop.security.UserGroupInformation.doAs
 (UserGroupInformation.java:1671)
 at
 org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:160)
 at
 org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
 at org.apache.hadoop.hive.metastore.HiveMetaStoreClient
 .open(HiveMetaStoreClient.java:466)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:234)
 at
 org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
 ... 35 more
>>>
>>>
>>> My understanding is that hive.server2.enable.impersonation and
>>> hive.server2.enable.doAs should be enabled to make
>>> UserGroupInformation.doAs() work?
>>>
>>> When I try to enable these parameters, Cloudera Manager shows an error:
>>>
>>> Hive Impersonation is enabled for Hive Server2 role 'HiveServer2
 (hostname)'.
 Hive Impersonation should be disabled to enable Hive authorization
 using Sentry
>>>
>>>
>>> So Spark-Hive conflicts with Sentry!?
>>>
>>> Environment: Hue 3.9 Spark Notebooks + Livy Server (built from master).
>>> CDH 5.5.
>>>
>>> This is a kerberized cluster with Sentry.
>>>
>>> I was using hue's keytab, as the hue user is normally (by default in CDH)
>>> allowed to impersonate other users.
>>> So it's very convenient for Spark Notebooks.
>>>
>>> Any information to help solve this will be highly appreciated.
>>>
>>>
>>> --
>>> Ruslan Dautkhanov
>>>
>>>
>>
>>
>


Re: Error in Spark Executors when trying to read HBase table from Spark with Kerberos enabled

2016-01-18 Thread Vinay Kashyap
Hi Guys,

Any help regarding this issue?



On Wed, Jan 13, 2016 at 6:39 PM, Vinay Kashyap  wrote:

> Hi all,
>
> I am using Spark 1.5.1 in YARN cluster mode on CDH 5.5.
> I am trying to create an RDD by reading an HBase table with Kerberos enabled.
> I am able to launch the spark job to read the HBase table, but I notice
> that the executors launched for the job cannot proceed due to an issue with
> Kerberos, and they are stuck indefinitely.
>
> Below is my code to read an HBase table.
>
>
> Configuration configuration = HBaseConfiguration.create();
> configuration.set(TableInputFormat.INPUT_TABLE,
>     frameStorage.getHbaseStorage().getTableId());
> String hbaseKerberosUser = "sparkUser";
> String hbaseKerberosKeytab = "";
> if (!hbaseKerberosUser.trim().isEmpty() &&
>     !hbaseKerberosKeytab.trim().isEmpty()) {
>   configuration.set("hadoop.security.authentication", "kerberos");
>   configuration.set("hbase.security.authentication", "kerberos");
>   configuration.set("hbase.security.authorization", "true");
>   configuration.set("hbase.rpc.protection", "authentication");
>   configuration.set("hbase.master.kerberos.principal", "hbase/_HOST@CERT.LOCAL");
>   configuration.set("hbase.regionserver.kerberos.principal", "hbase/_HOST@CERT.LOCAL");
>   configuration.set("hbase.rest.kerberos.principal", "hbase/_HOST@CERT.LOCAL");
>   configuration.set("hbase.thrift.kerberos.principal", "hbase/_HOST@CERT.LOCAL");
>   configuration.set("hbase.master.keytab.file", hbaseKerberosKeytab);
>   configuration.set("hbase.regionserver.keytab.file", hbaseKerberosKeytab);
>   configuration.set("hbase.rest.authentication.kerberos.keytab", hbaseKerberosKeytab);
>   configuration.set("hbase.thrift.keytab.file", hbaseKerberosKeytab);
>   UserGroupInformation.setConfiguration(configuration);
>   if (UserGroupInformation.isSecurityEnabled()) {
>     UserGroupInformation ugi = UserGroupInformation
>         .loginUserFromKeytabAndReturnUGI(hbaseKerberosUser, hbaseKerberosKeytab);
>     TokenUtil.obtainAndCacheToken(configuration, ugi);
>   }
> }
>
> System.out.println("loading HBase Table RDD ...");
> // <ImmutableBytesWritable, Result> restored from the newAPIHadoopRDD
> // arguments; the mail archive stripped the angle brackets.
> JavaPairRDD<ImmutableBytesWritable, Result> hbaseTableRDD =
>     this.sparkContext.newAPIHadoopRDD(
>         configuration, TableInputFormat.class,
>         ImmutableBytesWritable.class, Result.class);
> JavaRDD tableRDD = getTableRDD(hbaseTableRDD, dataFrameModel);
> System.out.println("Count :: " + tableRDD.count());
> The following is the error I can see in the container logs:
>
> 16/01/13 10:01:42 WARN security.UserGroupInformation:
> PriviledgedActionException as:sparkUser (auth:SIMPLE)
> cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Failed to
> find any Kerberos tgt)]
> 16/01/13 10:01:42 WARN ipc.RpcClient: Exception encountered while
> connecting to the server : javax.security.sasl.SaslException: GSS initiate
> failed [Caused by GSSException: No valid credentials provided (Mechanism
> level: Failed to find any Kerberos tgt)]
> 16/01/13 10:01:42 ERROR ipc.RpcClient: SASL authentication failed. The
> most likely cause is missing or invalid credentials. Consider 'kinit'.
> javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Failed to
> find any Kerberos tgt)]
> at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
> at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
> at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:770)
> at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$600(RpcClient.java:357)
> at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:891)
> at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:888)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
>
> I have a valid Kerberos token, as can be seen below:
>
> sparkUser@infra:/ebs1/agent$ klist
> Ticket cache: FILE:/tmp/krb5cc_1001
> Default principal: sparkUser@CERT.LOCAL
>
> Valid starting      Expires           Service principal
> 13/01/2016 12:07  14/01/2016 12:07  krbtgt/CERT.LOCAL@CERT.LOCAL
>
> Also, I confirmed that only reading from HBase causes this problem,
> because I can read a simple file in HDFS and I am able to create the RDD as
> required.
> After digging through some content on the net, I found a JIRA ticket that
> is similar to what I am experiencing:
> https://issues.apache.org/jira/browse/SPARK-12279
> 

Re: Spark + Sentry + Kerberos don't add up?

2016-01-18 Thread Ruslan Dautkhanov
Hi Romain,

Thank you for your response.

Adding Kerberos support might be as simple as
https://issues.cloudera.org/browse/LIVY-44 ? I.e. add Livy --principal and
--keytab parameters to be passed to spark-submit.

As a workaround I just did kinit (using hue's keytab) and then launched
Livy Server. It will probably work as long as the kerberos ticket doesn't
expire. That's why it would be great to have support for the --principal and
--keytab parameters for spark-submit, as explained in
http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cm_sg_yarn_long_jobs.html
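
Once such support exists, the invocation would presumably look something like
the following (--principal and --keytab are the documented spark-submit options
for long-running YARN jobs; the principal, keytab path, class, and jar are
placeholders):

spark-submit --master yarn-cluster \
  --principal livy/host.example.com@EXAMPLE.COM \
  --keytab /etc/security/keytabs/livy.keytab \
  --class com.example.LivyLaunchedApp app.jar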


The only problem I have currently is the above error stack in my previous
email:

The Spark session could not be created in the cluster:
> at org.apache.hadoop.security.*UserGroupInformation.doAs*(
> UserGroupInformation.java:1671)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(
> SparkSubmit.scala:160)



>> AFAIK Hive impersonation should be turned off when using Sentry

Yep, exactly. That's what I did. It is disabled now. But it looks like, on
the other hand, Spark or Spark Notebook wants that enabled?
It tries to do org.apache.hadoop.security.UserGroupInformation.doAs(), hence
the error.

So Sentry isn't compatible with Spark in kerberized clusters? Is there any
workaround for this problem?


-- 
Ruslan Dautkhanov

On Mon, Jan 18, 2016 at 3:52 PM, Romain Rigaux  wrote:

> Livy does not support any Kerberos yet
> https://issues.cloudera.org/browse/LIVY-3
>
> Are you asking instead about HS2 + Kerberos with Sentry?
>
> AFAIK Hive impersonation should be turned off when using Sentry:
> http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/sg_sentry_service_config.html
>
> On Sun, Jan 17, 2016 at 10:04 PM, Ruslan Dautkhanov 
> wrote:
>
>> Getting the following error stack:
>>
>> The Spark session could not be created in the cluster:
>>> at org.apache.hadoop.security.UserGroupInformation.doAs
>>> (UserGroupInformation.java:1671)
>>> at
>>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:160)
>>> at
>>> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient
>>> .open(HiveMetaStoreClient.java:466)
>>> at
>>> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:234)
>>> at
>>> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
>>> ... 35 more
>>
>>
>> My understanding is that hive.server2.enable.impersonation and
>> hive.server2.enable.doAs should be enabled to make
>> UserGroupInformation.doAs() work?
>>
>> When I try to enable these parameters, Cloudera Manager shows an error:
>>
>> Hive Impersonation is enabled for Hive Server2 role 'HiveServer2
>>> (hostname)'.
>>> Hive Impersonation should be disabled to enable Hive authorization using
>>> Sentry
>>
>>
>> So Spark-Hive conflicts with Sentry!?
>>
>> Environment: Hue 3.9 Spark Notebooks + Livy Server (built from master).
>> CDH 5.5.
>>
>> This is a kerberized cluster with Sentry.
>>
>> I was using hue's keytab, as the hue user is normally (by default in CDH)
>> allowed to impersonate other users.
>> So it's very convenient for Spark Notebooks.
>>
>> Any information to help solve this will be highly appreciated.
>>
>>
>> --
>> Ruslan Dautkhanov
>>
>>
>
>


Spark + Sentry + Kerberos don't add up?

2016-01-17 Thread Ruslan Dautkhanov
Getting the following error stack:

The Spark session could not be created in the cluster:
> at org.apache.hadoop.security.UserGroupInformation.doAs
> (UserGroupInformation.java:1671)
> at
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:160)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient
> .open(HiveMetaStoreClient.java:466)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:234)
> at
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
> ... 35 more


My understanding is that hive.server2.enable.impersonation and
hive.server2.enable.doAs should be enabled to make
UserGroupInformation.doAs() work?

When I try to enable these parameters, Cloudera Manager shows an error:

Hive Impersonation is enabled for Hive Server2 role 'HiveServer2
> (hostname)'.
> Hive Impersonation should be disabled to enable Hive authorization using
> Sentry


So Spark-Hive conflicts with Sentry!?

Environment: Hue 3.9 Spark Notebooks + Livy Server (built from master). CDH
5.5.

This is a kerberized cluster with Sentry.

I was using hue's keytab, as the hue user is normally (by default in CDH)
allowed to impersonate other users.
So it's very convenient for Spark Notebooks.

Any information to help solve this will be highly appreciated.


-- 
Ruslan Dautkhanov


Error in Spark Executors when trying to read HBase table from Spark with Kerberos enabled

2016-01-13 Thread Vinay Kashyap
Hi all,

I am using Spark 1.5.1 in YARN cluster mode on CDH 5.5.
I am trying to create an RDD by reading an HBase table with Kerberos enabled.
I am able to launch the spark job to read the HBase table, but I notice
that the executors launched for the job cannot proceed due to an issue with
Kerberos, and they are stuck indefinitely.

Below is my code to read an HBase table.


Configuration configuration = HBaseConfiguration.create();
configuration.set(TableInputFormat.INPUT_TABLE,
    frameStorage.getHbaseStorage().getTableId());
String hbaseKerberosUser = "sparkUser";
String hbaseKerberosKeytab = "";
if (!hbaseKerberosUser.trim().isEmpty() &&
    !hbaseKerberosKeytab.trim().isEmpty()) {
  configuration.set("hadoop.security.authentication", "kerberos");
  configuration.set("hbase.security.authentication", "kerberos");
  configuration.set("hbase.security.authorization", "true");
  configuration.set("hbase.rpc.protection", "authentication");
  configuration.set("hbase.master.kerberos.principal", "hbase/_HOST@CERT.LOCAL");
  configuration.set("hbase.regionserver.kerberos.principal", "hbase/_HOST@CERT.LOCAL");
  configuration.set("hbase.rest.kerberos.principal", "hbase/_HOST@CERT.LOCAL");
  configuration.set("hbase.thrift.kerberos.principal", "hbase/_HOST@CERT.LOCAL");
  configuration.set("hbase.master.keytab.file", hbaseKerberosKeytab);
  configuration.set("hbase.regionserver.keytab.file", hbaseKerberosKeytab);
  configuration.set("hbase.rest.authentication.kerberos.keytab", hbaseKerberosKeytab);
  configuration.set("hbase.thrift.keytab.file", hbaseKerberosKeytab);
  UserGroupInformation.setConfiguration(configuration);
  if (UserGroupInformation.isSecurityEnabled()) {
    UserGroupInformation ugi = UserGroupInformation
        .loginUserFromKeytabAndReturnUGI(hbaseKerberosUser, hbaseKerberosKeytab);
    TokenUtil.obtainAndCacheToken(configuration, ugi);
  }
}

System.out.println("loading HBase Table RDD ...");
// <ImmutableBytesWritable, Result> restored from the newAPIHadoopRDD
// arguments; the mail archive stripped the angle brackets.
JavaPairRDD<ImmutableBytesWritable, Result> hbaseTableRDD =
    this.sparkContext.newAPIHadoopRDD(
        configuration, TableInputFormat.class,
        ImmutableBytesWritable.class, Result.class);
JavaRDD tableRDD = getTableRDD(hbaseTableRDD, dataFrameModel);
System.out.println("Count :: " + tableRDD.count());
The following is the error I can see in the container logs:

16/01/13 10:01:42 WARN security.UserGroupInformation:
PriviledgedActionException as:sparkUser (auth:SIMPLE)
cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Failed to
find any Kerberos tgt)]
16/01/13 10:01:42 WARN ipc.RpcClient: Exception encountered while
connecting to the server : javax.security.sasl.SaslException: GSS initiate
failed [Caused by GSSException: No valid credentials provided (Mechanism
level: Failed to find any Kerberos tgt)]
16/01/13 10:01:42 ERROR ipc.RpcClient: SASL authentication failed. The
most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Failed to
find any Kerberos tgt)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:770)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$600(RpcClient.java:357)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:891)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:888)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)

I have a valid Kerberos token, as can be seen below:

sparkUser@infra:/ebs1/agent$ klist
Ticket cache: FILE:/tmp/krb5cc_1001
Default principal: sparkUser@CERT.LOCAL

Valid starting      Expires           Service principal
13/01/2016 12:07  14/01/2016 12:07  krbtgt/CERT.LOCAL@CERT.LOCAL

Also, I confirmed that only reading from HBase causes this problem,
because I can read a simple file in HDFS and I am able to create the RDD as
required.
After digging through some content on the net, I found a JIRA ticket that is
similar to what I am experiencing:
https://issues.apache.org/jira/browse/SPARK-12279

I wanted to know if that is the same issue I am facing, and whether there is
any workaround so that I can proceed with reading from the HBase table.
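
Since Spark of that era does not obtain or ship HBase delegation tokens to
executors by itself (the subject of SPARK-12279), one hedged workaround sketch
(not from this thread) is to obtain the token on the driver and hand it to each
task explicitly. TokenUtil.obtainToken and Token.encodeToUrlString /
decodeFromUrlString are the assumed APIs here, and the table name and row keys
are placeholders:

import java.util.Arrays;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier;
import org.apache.hadoop.hbase.security.token.TokenUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;
import org.apache.spark.api.java.JavaRDD;

// Driver side: obtain the HBase delegation token and serialize it so it can
// travel inside the task closure.
Token<AuthenticationTokenIdentifier> token = TokenUtil.obtainToken(configuration);
final String tokenStr = token.encodeToUrlString();

// Executor side: restore the token into the task's UGI before touching HBase.
JavaRDD<String> rowKeys = this.sparkContext.parallelize(Arrays.asList("row1", "row2"));
rowKeys.foreachPartition(keys -> {
    Token<AuthenticationTokenIdentifier> t = new Token<>();
    t.decodeFromUrlString(tokenStr);
    UserGroupInformation.getCurrentUser().addToken(t);
    try (HTable table = new HTable(HBaseConfiguration.create(), "my_table")) {
        while (keys.hasNext()) {
            Result r = table.get(new Get(Bytes.toBytes(keys.next())));
            System.out.println("row: " + r);
        }
    }
});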

-- 
*Thanks and regards*
*Vinay Kashyap*


Re: Spark + HBase + Kerberos

2015-03-18 Thread Eric Walk
Hi Ted,

The spark executors and hbase regions/masters are all collocated. This is a
2-node test environment.

Best,
Eric

Eric Walk, Sr. Technical Consultant
p: 617.855.9255 | NASDAQ: PRFT | Perficient.com (http://www.perficient.com/)

From: Ted Yu yuzhih...@gmail.com
Sent: Mar 18, 2015 2:46 PM
To: Eric Walk
Cc: user@spark.apache.org; Bill Busch
Subject: Re: Spark + HBase + Kerberos

Are hbase config / keytab files deployed on executor machines ?

Consider adding -Dsun.security.krb5.debug=true for debugging purposes.

Cheers

On Wed, Mar 18, 2015 at 11:39 AM, Eric Walk eric.w...@perficient.com wrote:
Having an issue connecting to HBase from a Spark container in a Secure Cluster.
We haven't been able to get past this issue; any thoughts would be appreciated.

We’re able to perform some operations like “CreateTable” in the driver thread 
successfully. Read requests (always in the executor threads) are always failing 
with:
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]

Logs and scala are attached; the names of the innocent have been masked for
their protection (in a consistent manner).

Executing the following spark job (using HDP 2.2, Spark 1.2.0, HBase 0.98.4, 
Kerberos on AD):
export 
SPARK_CLASSPATH=/usr/hdp/2.2.0.0-2041/hbase/lib/hbase-server.jar:/usr/hdp/2.2.0.0-2041/hbase/lib/hbase-protocol.jar:/usr/hdp/2.2.0.0-2041/hbase/lib/hbase-hadoop2-compat.jar:/usr/hdp/2.2.0.0-2041/hbase/lib/hbase-client.jar:/usr/hdp/2.2.0.0-2041/hbase/lib/hbase-common.jar:/usr/hdp/2.2.0.0-2041/hbase/lib/htrace-core-3.0.4.jar:/usr/hdp/2.2.0.0-2041/hbase/lib/guava-12.0.1.jar:/usr/hdp/2.2.0.0-2041/hbase/conf

/usr/hdp/2.2.0.0-2041/spark/bin/spark-submit --class HBaseTest --driver-memory 
2g --executor-memory 1g --executor-cores 1 --num-executors 1 --master 
yarn-client ~/spark-test_2.10-1.0.jar

We see this error in the executor processes (attached as yarn log.txt):
2015-03-18 17:34:15,121 DEBUG [Executor task launch worker-0] 
security.HBaseSaslRpcClient: Creating SASL GSSAPI client. Server's Kerberos 
principal name is hbase/ldevawshdp0002.dc1.pvc@dc1.PVC
2015-03-18 17:34:15,128 WARN  [Executor task launch worker-0] ipc.RpcClient: 
Exception encountered while connecting to the server : 
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]
2015-03-18 17:34:15,129 ERROR [Executor task launch worker-0] ipc.RpcClient: 
SASL authentication failed. The most likely cause is missing or invalid 
credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]

The HBase Master Logs show success:
2015-03-18 17:34:12,861 DEBUG [RpcServer.listener,port=6] ipc.RpcServer: 
RpcServer.listener,port=6: connection from
10.4.0.6:46636; # active connections: 3
2015-03-18 17:34:12,872 DEBUG [RpcServer.reader=3,port=6] ipc.RpcServer: 
Kerberos principal name is hbase/ldevawshdp0001.dc1.pvc@DC1.PVC
2015-03-18 17:34:12,875 DEBUG [RpcServer.reader=3,port=6] ipc.RpcServer: 
Created SASL server with mechanism = GSSAPI
2015-03-18 17:34:12,875 DEBUG [RpcServer.reader=3,port=6] ipc.RpcServer: 
Have read input token of size 1501 for processing by 
saslServer.evaluateResponse()
2015-03-18 17:34:12,876 DEBUG [RpcServer.reader=3,port=6] ipc.RpcServer: 
Will send token of size 108 from saslServer.
2015-03-18 17:34:12,877 DEBUG [RpcServer.reader=3,port=6] ipc.RpcServer: 
Have read input token of size 0 for processing by saslServer.evaluateResponse()
2015-03-18 17:34:12,878 DEBUG [RpcServer.reader=3,port=6] ipc.RpcServer: 
Will send token of size 32 from saslServer.
2015-03-18 17:34:12,878 DEBUG [RpcServer.reader=3,port=6] ipc.RpcServer: 
Have read input token of size 32 for processing by saslServer.evaluateResponse()
2015-03-18 17:34:12,879 DEBUG [RpcServer.reader=3,port=6] 
security.HBaseSaslRpcServer: SASL server GSSAPI callback: setting canonicalized 
client ID: user1@DC1.PVC
2015-03-18 17:34:12,895 DEBUG [RpcServer.reader=3,port=6] ipc.RpcServer: 
SASL server context established. Authenticated client: user1@DC1.PVC 
(auth:SIMPLE). Negotiated QoP is auth
2015-03-18 17:34:29,313 DEBUG [RpcServer.reader=3,port=6] ipc.RpcServer: 
RpcServer.listener,port=6: DISCONNECTING client
10.4.0.6:46636 because read count=-1. Number of active
connections: 3
2015-03-18 17:34:37,102 DEBUG [RpcServer.listener,port=6] ipc.RpcServer: 
RpcServer.listener,port=6: connection from
10.4.0.6:46733; # active connections: 3
2015-03-18 17:34:37,102 DEBUG [RpcServer.reader=4,port=6] ipc.RpcServer: 
RpcServer.listener,port=6: DISCONNECTING client
10.4.0.6:46733 because read count=-1. Number of active
connections: 3

The Spark Driver

Re: Spark + HBase + Kerberos

2015-03-18 Thread Ted Yu
Are hbase config / keytab files deployed on executor machines ?

Consider adding -Dsun.security.krb5.debug=true for debugging purposes.
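
If it helps, in YARN mode that flag can reach the executor JVMs through the
standard spark.executor.extraJavaOptions property; a hedged example reusing
the job from this thread:

/usr/hdp/2.2.0.0-2041/spark/bin/spark-submit --class HBaseTest \
  --conf "spark.driver.extraJavaOptions=-Dsun.security.krb5.debug=true" \
  --conf "spark.executor.extraJavaOptions=-Dsun.security.krb5.debug=true" \
  --master yarn-client ~/spark-test_2.10-1.0.jar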

Cheers

On Wed, Mar 18, 2015 at 11:39 AM, Eric Walk eric.w...@perficient.com
wrote:

  Having an issue connecting to HBase from a Spark container in a Secure
 Cluster. We haven't been able to get past this issue; any thoughts would be
 appreciated.



 We’re able to perform some operations like “CreateTable” in the driver
 thread successfully. Read requests (always in the executor threads) are
 always failing with:

 No valid credentials provided (Mechanism level: Failed to find any
 Kerberos tgt)]



 Logs and scala are attached; the names of the innocent have been masked for
 their protection (in a consistent manner).



 Executing the following spark job (using HDP 2.2, Spark 1.2.0, HBase
 0.98.4, Kerberos on AD):

 export
 SPARK_CLASSPATH=/usr/hdp/2.2.0.0-2041/hbase/lib/hbase-server.jar:/usr/hdp/2.2.0.0-2041/hbase/lib/hbase-protocol.jar:/usr/hdp/2.2.0.0-2041/hbase/lib/hbase-hadoop2-compat.jar:/usr/hdp/2.2.0.0-2041/hbase/lib/hbase-client.jar:/usr/hdp/2.2.0.0-2041/hbase/lib/hbase-common.jar:/usr/hdp/2.2.0.0-2041/hbase/lib/htrace-core-3.0.4.jar:/usr/hdp/2.2.0.0-2041/hbase/lib/guava-12.0.1.jar:/usr/hdp/2.2.0.0-2041/hbase/conf



 /usr/hdp/2.2.0.0-2041/spark/bin/spark-submit --class HBaseTest
 --driver-memory 2g --executor-memory 1g --executor-cores 1 --num-executors
 1 --master yarn-client ~/spark-test_2.10-1.0.jar



 We see this error in the executor processes (attached as yarn log.txt):

 2015-03-18 17:34:15,121 DEBUG [Executor task launch worker-0]
 security.HBaseSaslRpcClient: Creating SASL GSSAPI client. Server's Kerberos
 principal name is hbase/ldevawshdp0002.dc1.pvc@dc1.PVC

 2015-03-18 17:34:15,128 WARN  [Executor task launch worker-0]
 ipc.RpcClient: Exception encountered while connecting to the server :
 javax.security.sasl.SaslException: GSS initiate failed [Caused by
 GSSException: No valid credentials provided (Mechanism level: Failed to
 find any Kerberos tgt)]

 2015-03-18 17:34:15,129 ERROR [Executor task launch worker-0]
 ipc.RpcClient: SASL authentication failed. The most likely cause is missing
 or invalid credentials. Consider 'kinit'.

 javax.security.sasl.SaslException: GSS initiate failed [Caused by
 GSSException: No valid credentials provided (Mechanism level: Failed to
 find any Kerberos tgt)]



 The HBase Master Logs show success:

 2015-03-18 17:34:12,861 DEBUG [RpcServer.listener,port=6]
 ipc.RpcServer: RpcServer.listener,port=6: connection from
 10.4.0.6:46636; # active connections: 3

 2015-03-18 17:34:12,872 DEBUG [RpcServer.reader=3,port=6]
 ipc.RpcServer: Kerberos principal name is hbase/ldevawshdp0001.dc1.pvc@
 DC1.PVC

 2015-03-18 17:34:12,875 DEBUG [RpcServer.reader=3,port=6]
 ipc.RpcServer: Created SASL server with mechanism = GSSAPI

 2015-03-18 17:34:12,875 DEBUG [RpcServer.reader=3,port=6]
 ipc.RpcServer: Have read input token of size 1501 for processing by
 saslServer.evaluateResponse()

 2015-03-18 17:34:12,876 DEBUG [RpcServer.reader=3,port=6]
 ipc.RpcServer: Will send token of size 108 from saslServer.

 2015-03-18 17:34:12,877 DEBUG [RpcServer.reader=3,port=6]
 ipc.RpcServer: Have read input token of size 0 for processing by
 saslServer.evaluateResponse()

 2015-03-18 17:34:12,878 DEBUG [RpcServer.reader=3,port=6]
 ipc.RpcServer: Will send token of size 32 from saslServer.

 2015-03-18 17:34:12,878 DEBUG [RpcServer.reader=3,port=6]
 ipc.RpcServer: Have read input token of size 32 for processing by
 saslServer.evaluateResponse()

 2015-03-18 17:34:12,879 DEBUG [RpcServer.reader=3,port=6]
 security.HBaseSaslRpcServer: SASL server GSSAPI callback: setting
 canonicalized client ID: user1@DC1.PVC

 2015-03-18 17:34:12,895 DEBUG [RpcServer.reader=3,port=6]
 ipc.RpcServer: SASL server context established. Authenticated client:
 user1@DC1.PVC (auth:SIMPLE). Negotiated QoP is auth

 2015-03-18 17:34:29,313 DEBUG [RpcServer.reader=3,port=6]
 ipc.RpcServer: RpcServer.listener,port=6: DISCONNECTING client
 10.4.0.6:46636 because read count=-1. Number of active connections: 3

 2015-03-18 17:34:37,102 DEBUG [RpcServer.listener,port=6]
 ipc.RpcServer: RpcServer.listener,port=6: connection from
 10.4.0.6:46733; # active connections: 3

 2015-03-18 17:34:37,102 DEBUG [RpcServer.reader=4,port=6]
 ipc.RpcServer: RpcServer.listener,port=6: DISCONNECTING client
 10.4.0.6:46733 because read count=-1. Number of active connections: 3



 The Spark Driver Console Output hangs at this point:

 2015-03-18 17:34:13,337 INFO  [main] spark.DefaultExecutionContext:
 Starting job: count at HBaseTest.scala:63

 2015-03-18 17:34:13,349 INFO
 [sparkDriver-akka.actor.default-dispatcher-4] scheduler.DAGScheduler: Got
 job 0 (count at HBaseTest.scala:63) with 1 output partitions
 (allowLocal=false)

 2015-03-18 17:34:13,350 INFO
 [sparkDriver-akka.actor.default-dispatcher-4] scheduler.DAGScheduler: Final
 

Re: Whether standalone spark support kerberos?

2015-02-05 Thread Kostas Sakellis
Standalone mode does not support talking to a kerberized HDFS. If you want
to talk to a kerberized (secure) HDFS cluster, I suggest you use Spark on
YARN.
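
A minimal sketch of that suggestion (assuming the submitting user already has
a ticket from kinit; the principal, class, jar, and path are placeholders):

kinit user@EXAMPLE.COM
spark-submit --master yarn-client \
  --class com.example.ReadSecureHdfs my-app.jar hdfs://namenode:8020/data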

On Wed, Feb 4, 2015 at 2:29 AM, Jander g jande...@gmail.com wrote:

 Hope someone helps me. Thanks.

 On Wed, Feb 4, 2015 at 6:14 PM, Jander g jande...@gmail.com wrote:

 We have a standalone spark cluster for a kerberos test.

 But when reading from HDFS, I get the error output: Can't get Master Kerberos
 principal for use as renewer.

 So does standalone spark support kerberos? Can anyone confirm it, or tell me
 what I missed?

 Thanks in advance.

 --
 Thanks,
 Jander




 --
 Thanks,
 Jander



Re: Whether standalone spark support kerberos?

2015-02-04 Thread Jander g
Hope someone helps me. Thanks.

On Wed, Feb 4, 2015 at 6:14 PM, Jander g jande...@gmail.com wrote:

 We have a standalone spark cluster for a kerberos test.

 But when reading from HDFS, I get the error output: Can't get Master Kerberos
 principal for use as renewer.

 So does standalone spark support kerberos? Can anyone confirm it, or tell me
 what I missed?

 Thanks in advance.

 --
 Thanks,
 Jander




-- 
Thanks,
Jander


Whether standalone spark support kerberos?

2015-02-04 Thread Jander g
We have a standalone spark cluster for a kerberos test.

But when reading from HDFS, I get the error output: Can't get Master Kerberos
principal for use as renewer.

So does standalone spark support kerberos? Can anyone confirm it, or tell me
what I missed?

Thanks in advance.

-- 
Thanks,
Jander


Re: [SPARK SQL] kerberos error when creating database from beeline/ThriftServer2

2014-10-28 Thread Cheng Lian
Which version of Spark and Hadoop are you using? Could you please provide
the full stack trace of the exception?

On Tue, Oct 28, 2014 at 5:48 AM, Du Li l...@yahoo-inc.com.invalid wrote:

   Hi,

  I was trying to set up Spark SQL on a private cluster. I configured a
 hive-site.xml under spark/conf that uses a local metastore, with the warehouse
 and default FS name set to HDFS on one of my corporate clusters. Then I
 started the spark master, worker and thrift server. However, when creating a
 database on beeline, I got the following error:

  org.apache.hive.service.cli.HiveSQLException:
 org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution
 Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
 MetaException(message:Got exception: java.io.IOException Failed on local
 exception: java.io.IOException:
 org.apache.hadoop.security.AccessControlException: Client cannot
 authenticate via:[TOKEN, KERBEROS]; Host Details : local host is:
 "spark-master-host"; destination host is: "HDFS-namenode:port"; )

  It occurred when spark was trying to create an HDFS directory under the
 warehouse in order to create the database. All processes (spark master,
 worker, thrift server, beeline) were run as a user with the right access
 permissions. My spark classpaths have /home/y/conf/hadoop at the front. I
 was able to read and write files from the hadoop fs command line under the same
 directory and also from the spark-shell without any issue.

  Any hints regarding the right way of configuration would be appreciated.

  Thanks,
 Du



Re: [SPARK SQL] kerberos error when creating database from beeline/ThriftServer2

2014-10-28 Thread Du Li
)

at 
org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:360)

at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)

at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:103)

at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:98)

at 
org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager$$anon$1.run(SparkSQLOperationManager.scala:172)

at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:193)

at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:175)

at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:150)

at 
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:207)

at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133)

at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118)

at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)

at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)

at 
org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)

at 
org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:415)

at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)

at 
org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:526)

at 
org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:55)

at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:722)

Caused by: java.io.IOException: 
org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
via:[TOKEN, KERBEROS]

at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:657)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:415)

at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)

at 
org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:621)

at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)

at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:368)

at org.apache.hadoop.ipc.Client.getConnection(Client.java:1423)

at org.apache.hadoop.ipc.Client.call(Client.java:1342)

... 71 more

Caused by: org.apache.hadoop.security.AccessControlException: Client cannot 
authenticate via:[TOKEN, KERBEROS]

at 
org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:171)

at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:388)

at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:702)

at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:698)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:415)

at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)

at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:697)

... 74 more




From: Cheng Lian lian.cs@gmail.com
Date: Tuesday, October 28, 2014 at 2:50 AM
To: Du Li l...@yahoo-inc.com.invalid
Cc: user@spark.apache.org
Subject: Re: [SPARK SQL] kerberos error when creating database from 
beeline/ThriftServer2

Which version of Spark and Hadoop are you using? Could you please provide the 
full stack trace of the exception?

On Tue, Oct 28, 2014 at 5:48 AM, Du Li l...@yahoo-inc.com.invalid wrote:
Hi,

I was trying to set up Spark SQL on a private cluster. I configured a
hive-site.xml under spark/conf that uses a local metastore, with the warehouse and
default FS name set to HDFS on one of my corporate clusters. Then I started the
spark master, worker and thrift server. However, when creating a database on
beeline, I got the following error:

org.apache.hive.service.cli.HiveSQLException:
org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution
Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
MetaException(message:Got exception: java.io.IOException Failed on local
exception: java.io.IOException:
org.apache.hadoop.security.AccessControlException: Client cannot authenticate
via:[TOKEN, KERBEROS]; Host Details : local host is: "spark-master-host";
destination host is: "HDFS-namenode:port"; )

Re: [SPARK SQL] kerberos error when creating database from beeline/ThriftServer2

2014-10-28 Thread Du Li
If I put all the jar files from my local hive at the front of the spark
classpath, a different error is reported, as follows:


14/10/28 18:29:40 ERROR transport.TSaslTransport: SASL negotiation failure

javax.security.sasl.SaslException: PLAIN auth failed: null

at 
org.apache.hadoop.security.SaslPlainServer.evaluateResponse(SaslPlainServer.java:108)

at 
org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:528)

at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:272)

at 
org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)

at 
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)

at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:190)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:722)

14/10/28 18:29:40 ERROR server.TThreadPoolServer: Error occurred during 
processing of message.

java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: 
PLAIN auth failed: null

at 
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)

at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:190)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:722)

Caused by: org.apache.thrift.transport.TTransportException: PLAIN auth failed: 
null

at 
org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:221)

at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:305)

at 
org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)

at 
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)

... 4 more




From: Cheng Lian lian.cs@gmail.com
Date: Tuesday, October 28, 2014 at 2:50 AM
To: Du Li l...@yahoo-inc.com.invalid
Cc: user@spark.apache.org
Subject: Re: [SPARK SQL] kerberos error when creating database from 
beeline/ThriftServer2

Which version of Spark and Hadoop are you using? Could you please provide the 
full stack trace of the exception?

On Tue, Oct 28, 2014 at 5:48 AM, Du Li l...@yahoo-inc.com.invalid wrote:
Hi,

I was trying to set up Spark SQL on a private cluster. I configured a
hive-site.xml under spark/conf that uses a local metastore, with the warehouse and
default FS name set to HDFS on one of my corporate clusters. Then I started the
spark master, worker and thrift server. However, when creating a database on
beeline, I got the following error:

org.apache.hive.service.cli.HiveSQLException:
org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution
Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
MetaException(message:Got exception: java.io.IOException Failed on local
exception: java.io.IOException:
org.apache.hadoop.security.AccessControlException: Client cannot authenticate
via:[TOKEN, KERBEROS]; Host Details : local host is: "spark-master-host";
destination host is: "HDFS-namenode:port"; )

It occurred when spark was trying to create an HDFS directory under the
warehouse in order to create the database. All processes (spark master, worker,
thrift server, beeline) were run as a user with the right access permissions.
My spark classpaths have /home/y/conf/hadoop at the front. I was able to read
and write files from the hadoop fs command line under the same directory and also
from the spark-shell without any issue.

Any hints regarding the right way of configuration would be appreciated.

Thanks,
Du



Re: [SPARK SQL] kerberos error when creating database from beeline/ThriftServer2

2014-10-28 Thread Du Li
To clarify, this error was thrown from the thrift server when beeline was 
started to establish the connection, as follows:
$ beeline -u jdbc:hive2://`hostname`:4080 -n username
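
For a kerberized HiveServer2/thrift endpoint, the JDBC URL normally has to
name the server principal as well; a hedged example (host, port, and principal
are placeholders):

$ beeline -u "jdbc:hive2://host.example.com:4080/default;principal=hive/host.example.com@EXAMPLE.COM"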

From: Du Li l...@yahoo-inc.com.INVALID
Date: Tuesday, October 28, 2014 at 11:35 AM
To: Cheng Lian lian.cs@gmail.com
Cc: user@spark.apache.org
Subject: Re: [SPARK SQL] kerberos error when creating database from 
beeline/ThriftServer2

If I put all the jar files from my local hive at the front of the spark
classpath, a different error is reported, as follows:


14/10/28 18:29:40 ERROR transport.TSaslTransport: SASL negotiation failure

javax.security.sasl.SaslException: PLAIN auth failed: null

at 
org.apache.hadoop.security.SaslPlainServer.evaluateResponse(SaslPlainServer.java:108)

at 
org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:528)

at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:272)

at 
org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)

at 
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)

at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:190)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:722)

14/10/28 18:29:40 ERROR server.TThreadPoolServer: Error occurred during 
processing of message.

java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: 
PLAIN auth failed: null

at 
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)

at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:190)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:722)

Caused by: org.apache.thrift.transport.TTransportException: PLAIN auth failed: 
null

at 
org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:221)

at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:305)

at 
org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)

at 
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)

... 4 more




From: Cheng Lian lian.cs@gmail.com
Date: Tuesday, October 28, 2014 at 2:50 AM
To: Du Li l...@yahoo-inc.com.invalid
Cc: user@spark.apache.org
Subject: Re: [SPARK SQL] kerberos error when creating database from 
beeline/ThriftServer2

Which version of Spark and Hadoop are you using? Could you please provide the 
full stack trace of the exception?

On Tue, Oct 28, 2014 at 5:48 AM, Du Li l...@yahoo-inc.com.invalid wrote:
Hi,

I was trying to set up Spark SQL on a private cluster. I configured a
hive-site.xml under spark/conf that uses a local metastore, with the warehouse and
default FS name set to HDFS on one of my corporate clusters. Then I started the
spark master, worker and thrift server. However, when creating a database on
beeline, I got the following error:

org.apache.hive.service.cli.HiveSQLException:
org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution
Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
MetaException(message:Got exception: java.io.IOException Failed on local
exception: java.io.IOException:
org.apache.hadoop.security.AccessControlException: Client cannot authenticate
via:[TOKEN, KERBEROS]; Host Details : local host is: "spark-master-host";
destination host is: "HDFS-namenode:port"; )

It occurred when spark was trying to create an HDFS directory under the
warehouse in order to create the database. All processes (spark master, worker,
thrift server, beeline) were run as a user with the right access permissions.
My spark classpaths have /home/y/conf/hadoop at the front. I was able to read
and write files from the hadoop fs command line under the same directory and also
from the spark-shell without any issue.

Any hints regarding the right way of configuration would be appreciated.

Thanks,
Du



[SPARK SQL] kerberos error when creating database from beeline/ThriftServer2

2014-10-27 Thread Du Li
Hi,

I was trying to set up Spark SQL on a private cluster. I configured a
hive-site.xml under spark/conf that uses a local metastore, with the warehouse and
default FS name set to HDFS on one of my corporate clusters. Then I started the
spark master, worker and thrift server. However, when creating a database on
beeline, I got the following error:

org.apache.hive.service.cli.HiveSQLException:
org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution
Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
MetaException(message:Got exception: java.io.IOException Failed on local
exception: java.io.IOException:
org.apache.hadoop.security.AccessControlException: Client cannot authenticate
via:[TOKEN, KERBEROS]; Host Details : local host is: "spark-master-host";
destination host is: "HDFS-namenode:port"; )

It occurred when spark was trying to create an HDFS directory under the
warehouse in order to create the database. All processes (spark master, worker,
thrift server, beeline) were run as a user with the right access permissions.
My spark classpaths have /home/y/conf/hadoop at the front. I was able to read
and write files from the hadoop fs command line under the same directory and also
from the spark-shell without any issue.

Any hints regarding the right way of configuration would be appreciated.

Thanks,
Du