Hi Sam. In short, no, it's a traditional install, as we plan to use spot instances and didn't want price spikes to kill off HDFS.

We're actually doing a bit of a hybrid: spot instances for the Mesos slaves, on-demand for the Mesos masters. So for the time being we're putting HDFS on the masters (we'll probably move to multiple slave instance types to avoid losing too many when the spot price spikes, but for now this is acceptable). The masters are running CDH5.

Using hdfs://current-hdfs-master:8020 works fine; however, using hdfs://nameservice1 fails in the rather odd way described (well, the odd part is more that the workaround actually works!). I think there's some underlying bug being exposed here.
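As an aside, it's worth spelling out why the workaround URL is "broken": in file://etc/passwd the "//" opens an authority component, so "etc" is parsed as a host and only "/passwd" remains as the path - hence the "Wrong FS" complaint. And in hdfs://nameservice1/tmp/issue the authority slot holds the logical nameservice, which has to be resolved via hdfs-site.xml rather than DNS - presumably why a client that hasn't picked up the config throws UnknownHostException: nameservice1. A quick illustration with Python's urlparse (this shows generic URL syntax only, not Hadoop's actual parser):

```python
from urllib.parse import urlparse

# "file://etc/passwd": the "//" opens an authority section, so "etc"
# is parsed as the host and only "/passwd" survives as the path.
broken = urlparse("file://etc/passwd")
print(repr(broken.netloc), repr(broken.path))   # 'etc' '/passwd'

# "file:///etc/passwd": empty authority, the whole path survives.
ok = urlparse("file:///etc/passwd")
print(repr(ok.netloc), repr(ok.path))           # '' '/etc/passwd'

# "hdfs://nameservice1/tmp/issue": the logical nameservice sits in the
# authority slot; it's defined in hdfs-site.xml, not in DNS, so a client
# without that config can only fail to resolve it as a hostname.
ha = urlparse("hdfs://nameservice1/tmp/issue")
print(repr(ha.netloc), repr(ha.path))           # 'nameservice1' '/tmp/issue'
```

Which is consistent with both failure messages we see below, even if it doesn't explain why the order of requests matters.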


On 14/09/2015 22:27, Sam Bessalah wrote:
I don't know about the broken URL, but are you running HDFS as a Mesos framework? If so, is it using mesos-dns? Then you should resolve the namenode via hdfs://<activenamenode:8020>/ ....

On Mon, Sep 14, 2015 at 3:55 PM, Adrian Bridgett <adr...@opensignal.com> wrote:

    I'm hitting an odd issue with running Spark on Mesos together with
    HA-HDFS, with an even odder workaround.

    In particular, I get an error that it can't find the HDFS
    nameservice unless I put in a _broken_ URL (I discovered that
    workaround by mistake!).  core-site.xml and hdfs-site.xml are
    distributed to the slave node - and those files are read, since if
    I deliberately break them I get an error as you'd expect.

    NB: This is a bit different to
    
http://mail-archives.us.apache.org/mod_mbox/spark-user/201402.mbox/%3c1392442185079-1549.p...@n3.nabble.com%3E


    Spark 1.5.0:

    t=sc.textFile("hdfs://nameservice1/tmp/issue")
    t.count()
    (fails)

    t=sc.textFile("file://etc/passwd")
    t.count()
    (errors about a bad URL - it should have an extra /, of course)

    t=sc.textFile("hdfs://nameservice1/tmp/issue")
    t.count()
    (then it works!!!)

    I should say that using file:///etc/passwd or hdfs:///tmp/issue
    both fail as well, unless preceded by a broken URL.  I've tried
    setting spark.hadoop.cloneConf to true; no change.

    Sample (broken) run:
    15/09/14 13:00:14 DEBUG HadoopRDD: Creating new JobConf and
    caching it for later re-use
    15/09/14 13:00:14 DEBUG : address: ip-10-1-200-165/10.1.200.165 isLoopbackAddress: false, with host 10.1.200.165 ip-10-1-200-165
    15/09/14 13:00:14 DEBUG BlockReaderLocal:
    dfs.client.use.legacy.blockreader.local = false
    15/09/14 13:00:14 DEBUG BlockReaderLocal:
    dfs.client.read.shortcircuit = false
    15/09/14 13:00:14 DEBUG BlockReaderLocal:
    dfs.client.domain.socket.data.traffic = false
    15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.domain.socket.path =
    /var/run/hdfs-sockets/dn
    15/09/14 13:00:14 DEBUG HAUtil: No HA service delegation token
    found for logical URI hdfs://nameservice1
    15/09/14 13:00:14 DEBUG BlockReaderLocal:
    dfs.client.use.legacy.blockreader.local = false
    15/09/14 13:00:14 DEBUG BlockReaderLocal:
    dfs.client.read.shortcircuit = false
    15/09/14 13:00:14 DEBUG BlockReaderLocal:
    dfs.client.domain.socket.data.traffic = false
    15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.domain.socket.path =
    /var/run/hdfs-sockets/dn
    15/09/14 13:00:14 DEBUG RetryUtils: multipleLinearRandomRetry = null
    15/09/14 13:00:14 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
    rpcRequestWrapperClass=class
    org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
    
rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@6245f50b
    15/09/14 13:00:14 DEBUG Client: getting client out of cache:
    org.apache.hadoop.ipc.Client@267f0fd3
    15/09/14 13:00:14 DEBUG NativeCodeLoader: Trying to load the
    custom-built native-hadoop library...
    15/09/14 13:00:14 DEBUG NativeCodeLoader: Loaded the native-hadoop
    library
    ...
    15/09/14 13:00:14 DEBUG Client: Connecting to mesos-1.example.com/10.1.200.165:8020
    15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: starting, having connections 1
    15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #0
    15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #0
    15/09/14 13:00:14 DEBUG ProtobufRpcEngine: Call: getFileInfo took 36ms
    15/09/14 13:00:14 DEBUG FileInputFormat: Time taken to get
    FileStatuses: 69
    15/09/14 13:00:14 INFO FileInputFormat: Total input paths to
    process : 1
    15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #1
    15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #1
    15/09/14 13:00:14 DEBUG ProtobufRpcEngine: Call: getBlockLocations
    took 1ms
    15/09/14 13:00:14 DEBUG FileInputFormat: Total # of splits
    generated by getSplits: 2, TimeTaken: 104
    ...
    15/09/14 13:00:24 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: closed
    15/09/14 13:00:24 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: stopped, remaining connections 0
    15/09/14 13:00:24 DEBUG
    AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor]
    received message
    AkkaMessage(ExecutorRemoved(20150826-133446-3217621258-5050-4064-S1),true)
    from Actor[akka://sparkDriver/temp/$g]
    15/09/14 13:00:24 DEBUG
    AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC
    message:
    AkkaMessage(ExecutorRemoved(20150826-133446-3217621258-5050-4064-S1),true)
    15/09/14 13:00:24 DEBUG
    AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor]
    handled message (0.513851 ms)
    AkkaMessage(ExecutorRemoved(20150826-133446-3217621258-5050-4064-S1),true)
    from Actor[akka://sparkDriver/temp/$g]
    15/09/14 13:00:25 WARN TaskSetManager: Lost task 0.0 in stage 0.0
    (TID 0, 10.1.200.245): java.lang.IllegalArgumentException:
    java.net.UnknownHostException: nameservice1
        at
    
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
        at
    
org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
        at
    org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
        at
    
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
        at
    org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
        at
    org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
        at
    org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
        at
    
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
        at
    
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
        at
    
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
        at
    
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
        at
    
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
        at
    
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
        at scala.Option.map(Option.scala:145)
        at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
        at
    org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at
    org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at
    
org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:249)
        at
    org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
        at
    org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
    Caused by: java.net.UnknownHostException: nameservice1
        ... 32 more


    Sample working run:
    15/09/14 13:00:43 DEBUG HadoopRDD: Creating new JobConf and
    caching it for later re-use
    15/09/14 13:00:43 DEBUG : address: ip-10-1-200-165/10.1.200.165 isLoopbackAddress: false, with host 10.1.200.165 ip-10-1-200-165
    15/09/14 13:00:43 DEBUG BlockReaderLocal:
    dfs.client.use.legacy.blockreader.local = false
    15/09/14 13:00:43 DEBUG BlockReaderLocal:
    dfs.client.read.shortcircuit = false
    15/09/14 13:00:43 DEBUG BlockReaderLocal:
    dfs.client.domain.socket.data.traffic = false
    15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.domain.socket.path =
    /var/run/hdfs-sockets/dn
    15/09/14 13:00:43 DEBUG HAUtil: No HA service delegation token
    found for logical URI hdfs://nameservice1
    15/09/14 13:00:43 DEBUG BlockReaderLocal:
    dfs.client.use.legacy.blockreader.local = false
    15/09/14 13:00:43 DEBUG BlockReaderLocal:
    dfs.client.read.shortcircuit = false
    15/09/14 13:00:43 DEBUG BlockReaderLocal:
    dfs.client.domain.socket.data.traffic = false
    15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.domain.socket.path =
    /var/run/hdfs-sockets/dn
    15/09/14 13:00:43 DEBUG RetryUtils: multipleLinearRandomRetry = null
    15/09/14 13:00:43 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
    rpcRequestWrapperClass=class
    org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
    
rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@114b3357
    15/09/14 13:00:43 DEBUG Client: getting client out of cache:
    org.apache.hadoop.ipc.Client@28a248cd
    15/09/14 13:00:44 DEBUG NativeCodeLoader: Trying to load the
    custom-built native-hadoop library...
    15/09/14 13:00:44 DEBUG NativeCodeLoader: Loaded the native-hadoop
    library
    15/09/14 13:00:44 DEBUG DomainSocketWatcher:
    org.apache.hadoop.net.unix.DomainSocketWatcher$2@3962387d:
    starting with interruptCheckPeriodMs = 60000
    15/09/14 13:00:44 DEBUG PerformanceAdvisory: Both short-circuit
    local reads and UNIX domain socket are disabled.
    15/09/14 13:00:44 DEBUG DataTransferSaslUtil: DataTransferProtocol
    not using SaslPropertiesResolver, no QOP found in configuration
    for dfs.data.transfer.protection
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 1006, in
    count
        return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
      File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 997, in sum
        return self.mapPartitions(lambda x: [sum(x)]).fold(0,
    operator.add)
      File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 871, in fold
        vals = self.mapPartitions(func).collect()
      File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 773, in
    collect
        port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
      File
    "/home/ubuntu/spark15/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
    line 538, in __call__
      File "/home/ubuntu/spark15/python/pyspark/sql/utils.py", line
    42, in deco
        raise IllegalArgumentException(s.split(': ', 1)[1])
    pyspark.sql.utils.IllegalArgumentException: Wrong FS:
    file://etc/passwd, expected: file:///
    ...
    15/09/14 13:00:51 DEBUG HadoopRDD: Creating new JobConf and
    caching it for later re-use
    15/09/14 13:00:51 DEBUG Client: The ping interval is 60000 ms.
    15/09/14 13:00:51 DEBUG Client: Connecting to mesos-1.example.com/10.1.200.165:8020
    15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: starting, having connections 1
    15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #0
    15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #0
    15/09/14 13:00:51 DEBUG ProtobufRpcEngine: Call: getFileInfo took 32ms
    15/09/14 13:00:51 DEBUG FileInputFormat: Time taken to get
    FileStatuses: 64
    15/09/14 13:00:51 INFO FileInputFormat: Total input paths to
    process : 1
    15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #1
    15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #1
    15/09/14 13:00:51 DEBUG ProtobufRpcEngine: Call: getBlockLocations
    took 2ms
    15/09/14 13:00:51 DEBUG FileInputFormat: Total # of splits
    generated by getSplits: 2, TimeTaken: 95
    2
    (the answer!)


    The Mesos logs are very slightly different (apologies - this was
    from a different run).  Notice that dfs.domain.socket.path is blank
    (or cut off by the exception?) in the broken run.

    Broken:
    15/09/14 13:48:30 DEBUG HadoopRDD: Cloning Hadoop Configuration
    15/09/14 13:48:30 DEBUG : address: ip-10-1-200-245/10.1.200.245 isLoopbackAddress: false, with host 10.1.200.245 ip-10-1-200-245
    15/09/14 13:48:30 DEBUG BlockReaderLocal:
    dfs.client.use.legacy.blockreader.local = false
    15/09/14 13:48:30 DEBUG BlockReaderLocal:
    dfs.client.read.shortcircuit = false
    15/09/14 13:48:30 DEBUG BlockReaderLocal:
    dfs.client.domain.socket.data.traffic = false
    15/09/14 13:48:30 DEBUG BlockReaderLocal: dfs.domain.socket.path =
    15/09/14 13:48:30 ERROR PythonRDD: Python worker exited
    unexpectedly (crashed)
    org.apache.spark.api.python.PythonException: Traceback (most
    recent call last):
      File
    
"/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S2/frameworks/20150826-133446-3217621258-5050-4064-216556/executors/20150826-133446-3217621258-5050-4064-S2/runs/b31501ae-22d0-47dd-b4b6-2fb17717e1f8/spark15/python/lib/pyspark.zip/pyspark/worker.py",
    line 98, in main
        command = pickleSer._read_with_length(infile)
      File
    
"/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S2/frameworks/20150826-133446-3217621258-5050-4064-216556/executors/20150826-133446-3217621258-5050-4064-S2/runs/b31501ae-22d0-47dd-b4b6-2fb17717e1f8/spark15/python/lib/pyspark.zip/pyspark/serializers.py",
    line 156, in _read_with_length
        length = read_int(stream)
      File
    
"/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S2/frameworks/20150826-133446-3217621258-5050-4064-216556/executors/20150826-133446-3217621258-5050-4064-S2/runs/b31501ae-22d0-47dd-b4b6-2fb17717e1f8/spark15/python/lib/pyspark.zip/pyspark/serializers.py",
    line 544, in read_int
        raise EOFError
    EOFError

        at
    org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
        at
    org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
        at
    org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at
    org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at
    
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at
    
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.IllegalArgumentException:
    java.net.UnknownHostException: nameservice1
        at
    
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
        at
    
org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
        at
    org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
        at
    
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
        at
    org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
        at
    org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
        at
    org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
        at
    
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
        at
    
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
        at
    
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
        at
    
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
        at
    
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
        at
    
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
        at scala.Option.map(Option.scala:145)
        at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:157)
        at
    org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at
    org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at
    
org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:249)
        at
    org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
        at
    org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
    Caused by: java.net.UnknownHostException: nameservice1
        ... 32 more
    15/09/14 13:48:30 ERROR PythonRDD: This may have been caused by a
    prior exception:
    java.lang.IllegalArgumentException: java.net.UnknownHostException:
    nameservice1
        at
    
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
        at
    
org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
        at
    org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
        at
    
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
        at
    org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
        at
    org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
        at
    org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
        at
    
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
        at
    
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
        at
    
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
        at
    
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
        at
    
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
        at
    
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
        at scala.Option.map(Option.scala:145)
        at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:157)
        at
    org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at
    org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at
    
org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:249)
        at
    org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
        at
    org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
    Caused by: java.net.UnknownHostException: nameservice1
        ... 32 more

    Working:
    15/09/14 13:47:17 DEBUG HadoopRDD: Cloning Hadoop Configuration
    15/09/14 13:47:17 DEBUG : address: ip-10-1-200-245/10.1.200.245 isLoopbackAddress: false, with host 10.1.200.245 ip-10-1-200-245
    15/09/14 13:47:17 DEBUG BlockReaderLocal:
    dfs.client.use.legacy.blockreader.local = false
    15/09/14 13:47:17 DEBUG BlockReaderLocal:
    dfs.client.read.shortcircuit = false
    15/09/14 13:47:17 DEBUG BlockReaderLocal:
    dfs.client.domain.socket.data.traffic = false
    15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.domain.socket.path =
    /var/run/hdfs-sockets/dn
    15/09/14 13:47:17 DEBUG HAUtil: No HA service delegation token
    found for logical URI hdfs://nameservice1
    15/09/14 13:47:17 DEBUG BlockReaderLocal:
    dfs.client.use.legacy.blockreader.local = false
    15/09/14 13:47:17 DEBUG BlockReaderLocal:
    dfs.client.read.shortcircuit = false
    15/09/14 13:47:17 DEBUG BlockReaderLocal:
    dfs.client.domain.socket.data.traffic = false
    15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.domain.socket.path =
    /var/run/hdfs-sockets/dn
    15/09/14 13:47:17 DEBUG RetryUtils: multipleLinearRandomRetry = null
    15/09/14 13:47:17 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
    rpcRequestWrapperClass=class
    org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
    
rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@30b68416
    15/09/14 13:47:17 DEBUG Client: getting client out of cache:
    org.apache.hadoop.ipc.Client@4599b420
    15/09/14 13:47:18 DEBUG NativeCodeLoader: Trying to load the
    custom-built native-hadoop library...
    15/09/14 13:47:18 DEBUG NativeCodeLoader: Loaded the native-hadoop
    library
    15/09/14 13:47:18 DEBUG DomainSocketWatcher:
    org.apache.hadoop.net.unix.DomainSocketWatcher$2@4ed189cf:
    starting with interruptCheckPeriodMs = 60000
    15/09/14 13:47:18 DEBUG PerformanceAdvisory: Both short-circuit
    local reads and UNIX domain socket are disabled.
    15/09/14 13:47:18 DEBUG DataTransferSaslUtil: DataTransferProtocol
    not using SaslPropertiesResolver, no QOP found in configuration
    for dfs.data.transfer.protection
    15/09/14 13:47:18 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
    15/09/14 13:47:18 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
    15/09/14 13:47:18 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
    15/09/14 13:47:18 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
    15/09/14 13:47:18 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
    15/09/14 13:47:18 DEBUG Client: The ping interval is 60000 ms.
    15/09/14 13:47:18 DEBUG Client: Connecting to mesos-1.example.com/10.1.200.165:8020
    15/09/14 13:47:18 DEBUG Client: IPC Client (2055067800) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: starting, having connections 1
    15/09/14 13:47:18 DEBUG Client: IPC Client (2055067800) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #0
    15/09/14 13:47:18 DEBUG Client: IPC Client (2055067800) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #0
    15/09/14 13:47:18 DEBUG ProtobufRpcEngine: Call: getBlockLocations
    took 28ms
    15/09/14 13:47:18 DEBUG DFSClient: newInfo = LocatedBlocks{


--
*Adrian Bridgett* | Sysadmin Engineer, OpenSignal <http://www.opensignal.com>
_____________________________________________________
Office: First Floor, Scriptor Court, 155-157 Farringdon Road, Clerkenwell, London, EC1R 3AD
Phone #: +44 777-377-8251
Skype: abridgett | @adrianbridgett <http://twitter.com/adrianbridgett> | LinkedIn <https://uk.linkedin.com/in/abridgett>
_____________________________________________________


