I don't know about the broken-url workaround, but are you running HDFS as a Mesos framework? If so, is it using mesos-dns? In that case you should be able to resolve the namenode directly via hdfs://<active-namenode>:8020/ ....
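Something along these lines - a rough, untested sketch; "namenode1.hdfs.mesos" is just a placeholder for whatever name your mesos-dns zone actually serves for the active namenode:

    from pyspark import SparkContext

    sc = SparkContext(appName="hdfs-uri-check")
    # Bypass the logical nameservice and address the active namenode
    # directly (note: no automatic failover this way).
    t = sc.textFile("hdfs://namenode1.hdfs.mesos:8020/tmp/issue")
    print(t.count())

If that works while hdfs://nameservice1/ doesn't, DNS isn't your problem - the executors just aren't picking up the HA client settings.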
On Mon, Sep 14, 2015 at 3:55 PM, Adrian Bridgett <adr...@opensignal.com> wrote:
> I'm hitting an odd issue with running spark on mesos together with
> HA-HDFS, with an even odder workaround.
>
> In particular I get an error that it can't find the HDFS nameservice
> unless I put in a _broken_ url (discovered that workaround by mistake!).
> core-site.xml and hdfs-site.xml are distributed to the slave node - and
> those files are read, since if I deliberately break them I get an error
> as you'd expect.
>
> NB: This is a bit different to
> http://mail-archives.us.apache.org/mod_mbox/spark-user/201402.mbox/%3c1392442185079-1549.p...@n3.nabble.com%3E
>
> Spark 1.5.0:
>
> t=sc.textFile("hdfs://nameservice1/tmp/issue")
> t.count()
> (fails)
>
> t=sc.textFile("file://etc/passwd")
> t.count()
> (errors about bad url - should have an extra / of course)
> t=sc.textFile("hdfs://nameservice1/tmp/issue")
> t.count()
> then it works!!!
>
> I should say that using file:///etc/passwd or hdfs:///tmp/issue both fail
> as well, unless preceded by a broken url. I've tried setting
> spark.hadoop.cloneConf to true, no change.
>
> Sample (broken) run:
> 15/09/14 13:00:14 DEBUG HadoopRDD: Creating new JobConf and caching it for later re-use
> 15/09/14 13:00:14 DEBUG : address: ip-10-1-200-165/10.1.200.165 isLoopbackAddress: false, with host 10.1.200.165 ip-10-1-200-165
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.domain.socket.path = /var/run/hdfs-sockets/dn
> 15/09/14 13:00:14 DEBUG HAUtil: No HA service delegation token found for logical URI hdfs://nameservice1
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.domain.socket.path = /var/run/hdfs-sockets/dn
> 15/09/14 13:00:14 DEBUG RetryUtils: multipleLinearRandomRetry = null
> 15/09/14 13:00:14 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@6245f50b
> 15/09/14 13:00:14 DEBUG Client: getting client out of cache: org.apache.hadoop.ipc.Client@267f0fd3
> 15/09/14 13:00:14 DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...
> 15/09/14 13:00:14 DEBUG NativeCodeLoader: Loaded the native-hadoop library
> ...
> 15/09/14 13:00:14 DEBUG Client: Connecting to mesos-1.example.com/10.1.200.165:8020
> 15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: starting, having connections 1
> 15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #0
> 15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #0
> 15/09/14 13:00:14 DEBUG ProtobufRpcEngine: Call: getFileInfo took 36ms
> 15/09/14 13:00:14 DEBUG FileInputFormat: Time taken to get FileStatuses: 69
> 15/09/14 13:00:14 INFO FileInputFormat: Total input paths to process : 1
> 15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #1
> 15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #1
> 15/09/14 13:00:14 DEBUG ProtobufRpcEngine: Call: getBlockLocations took 1ms
> 15/09/14 13:00:14 DEBUG FileInputFormat: Total # of splits generated by getSplits: 2, TimeTaken: 104
> ...
> 15/09/14 13:00:24 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: closed
> 15/09/14 13:00:24 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: stopped, remaining connections 0
> 15/09/14 13:00:24 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(ExecutorRemoved(20150826-133446-3217621258-5050-4064-S1),true) from Actor[akka://sparkDriver/temp/$g]
> 15/09/14 13:00:24 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(ExecutorRemoved(20150826-133446-3217621258-5050-4064-S1),true)
> 15/09/14 13:00:24 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (0.513851 ms) AkkaMessage(ExecutorRemoved(20150826-133446-3217621258-5050-4064-S1),true) from Actor[akka://sparkDriver/temp/$g]
> 15/09/14 13:00:25 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 10.1.200.245): java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1
> at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
> at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
> at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
> at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
> at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
> at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
> at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
> at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
> at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
> at scala.Option.map(Option.scala:145)
> at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:249)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
> Caused by: java.net.UnknownHostException: nameservice1
> ... 32 more
>
> Sample working run:
> 15/09/14 13:00:43 DEBUG HadoopRDD: Creating new JobConf and caching it for later re-use
> 15/09/14 13:00:43 DEBUG : address: ip-10-1-200-165/10.1.200.165 isLoopbackAddress: false, with host 10.1.200.165 ip-10-1-200-165
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.domain.socket.path = /var/run/hdfs-sockets/dn
> 15/09/14 13:00:43 DEBUG HAUtil: No HA service delegation token found for logical URI hdfs://nameservice1
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.domain.socket.path = /var/run/hdfs-sockets/dn
> 15/09/14 13:00:43 DEBUG RetryUtils: multipleLinearRandomRetry = null
> 15/09/14 13:00:43 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@114b3357
> 15/09/14 13:00:43 DEBUG Client: getting client out of cache: org.apache.hadoop.ipc.Client@28a248cd
> 15/09/14 13:00:44 DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...
> 15/09/14 13:00:44 DEBUG NativeCodeLoader: Loaded the native-hadoop library
> 15/09/14 13:00:44 DEBUG DomainSocketWatcher: org.apache.hadoop.net.unix.DomainSocketWatcher$2@3962387d: starting with interruptCheckPeriodMs = 60000
> 15/09/14 13:00:44 DEBUG PerformanceAdvisory: Both short-circuit local reads and UNIX domain socket are disabled.
> 15/09/14 13:00:44 DEBUG DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 1006, in count
>     return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
>   File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 997, in sum
>     return self.mapPartitions(lambda x: [sum(x)]).fold(0, operator.add)
>   File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 871, in fold
>     vals = self.mapPartitions(func).collect()
>   File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 773, in collect
>     port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
>   File "/home/ubuntu/spark15/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/home/ubuntu/spark15/python/pyspark/sql/utils.py", line 42, in deco
>     raise IllegalArgumentException(s.split(': ', 1)[1])
> pyspark.sql.utils.IllegalArgumentException: Wrong FS: file://etc/passwd, expected: file:///
> ...
> 15/09/14 13:00:51 DEBUG HadoopRDD: Creating new JobConf and caching it for later re-use
> 15/09/14 13:00:51 DEBUG Client: The ping interval is 60000 ms.
> 15/09/14 13:00:51 DEBUG Client: Connecting to mesos-1.example.com/10.1.200.165:8020
> 15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: starting, having connections 1
> 15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #0
> 15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #0
> 15/09/14 13:00:51 DEBUG ProtobufRpcEngine: Call: getFileInfo took 32ms
> 15/09/14 13:00:51 DEBUG FileInputFormat: Time taken to get FileStatuses: 64
> 15/09/14 13:00:51 INFO FileInputFormat: Total input paths to process : 1
> 15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #1
> 15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #1
> 15/09/14 13:00:51 DEBUG ProtobufRpcEngine: Call: getBlockLocations took 2ms
> 15/09/14 13:00:51 DEBUG FileInputFormat: Total # of splits generated by getSplits: 2, TimeTaken: 95
> 2
> (the answer!)
>
> The mesos logs are very slightly different (apologies - this was for a
> different run). Notice that dfs.domain.socket.path is blank (or cut-off
> by the exception?) in the broken run.
>
> Broken:
> 15/09/14 13:48:30 DEBUG HadoopRDD: Cloning Hadoop Configuration
> 15/09/14 13:48:30 DEBUG : address: ip-10-1-200-245/10.1.200.245 isLoopbackAddress: false, with host 10.1.200.245 ip-10-1-200-245
> 15/09/14 13:48:30 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:48:30 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
> 15/09/14 13:48:30 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:48:30 DEBUG BlockReaderLocal: dfs.domain.socket.path =
> 15/09/14 13:48:30 ERROR PythonRDD: Python worker exited unexpectedly (crashed)
> org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>   File "/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S2/frameworks/20150826-133446-3217621258-5050-4064-216556/executors/20150826-133446-3217621258-5050-4064-S2/runs/b31501ae-22d0-47dd-b4b6-2fb17717e1f8/spark15/python/lib/pyspark.zip/pyspark/worker.py", line 98, in main
>     command = pickleSer._read_with_length(infile)
>   File "/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S2/frameworks/20150826-133446-3217621258-5050-4064-216556/executors/20150826-133446-3217621258-5050-4064-S2/runs/b31501ae-22d0-47dd-b4b6-2fb17717e1f8/spark15/python/lib/pyspark.zip/pyspark/serializers.py", line 156, in _read_with_length
>     length = read_int(stream)
>   File "/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S2/frameworks/20150826-133446-3217621258-5050-4064-216556/executors/20150826-133446-3217621258-5050-4064-S2/runs/b31501ae-22d0-47dd-b4b6-2fb17717e1f8/spark15/python/lib/pyspark.zip/pyspark/serializers.py", line 544, in read_int
>     raise EOFError
> EOFError
>
> at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
> at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1
> at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
> at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
> at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
> at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
> at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
> at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
> at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
> at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
> at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
> at scala.Option.map(Option.scala:145)
> at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:157)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:249)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
> Caused by: java.net.UnknownHostException: nameservice1
> ... 32 more
> 15/09/14 13:48:30 ERROR PythonRDD: This may have been caused by a prior exception:
> java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1
> at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
> at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
> at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
> at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
> at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
> at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
> at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
> at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
> at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
> at scala.Option.map(Option.scala:145)
> at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:157)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:249)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
> Caused by: java.net.UnknownHostException: nameservice1
> ... 32 more
>
> Working:
> 15/09/14 13:47:17 DEBUG HadoopRDD: Cloning Hadoop Configuration
> 15/09/14 13:47:17 DEBUG : address: ip-10-1-200-245/10.1.200.245 isLoopbackAddress: false, with host 10.1.200.245 ip-10-1-200-245
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.domain.socket.path = /var/run/hdfs-sockets/dn
> 15/09/14 13:47:17 DEBUG HAUtil: No HA service delegation token found for logical URI hdfs://nameservice1
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.domain.socket.path = /var/run/hdfs-sockets/dn
> 15/09/14 13:47:17 DEBUG RetryUtils: multipleLinearRandomRetry = null
> 15/09/14 13:47:17 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@30b68416
> 15/09/14 13:47:17 DEBUG Client: getting client out of cache: org.apache.hadoop.ipc.Client@4599b420
> 15/09/14 13:47:18 DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...
> 15/09/14 13:47:18 DEBUG NativeCodeLoader: Loaded the native-hadoop library
> 15/09/14 13:47:18 DEBUG DomainSocketWatcher: org.apache.hadoop.net.unix.DomainSocketWatcher$2@4ed189cf: starting with interruptCheckPeriodMs = 60000
> 15/09/14 13:47:18 DEBUG PerformanceAdvisory: Both short-circuit local reads and UNIX domain socket are disabled.
> 15/09/14 13:47:18 DEBUG DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
> 15/09/14 13:47:18 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
> 15/09/14 13:47:18 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
> 15/09/14 13:47:18 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
> 15/09/14 13:47:18 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
> 15/09/14 13:47:18 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
> 15/09/14 13:47:18 DEBUG Client: The ping interval is 60000 ms.
> 15/09/14 13:47:18 DEBUG Client: Connecting to mesos-1.example.com/10.1.200.165:8020
> 15/09/14 13:47:18 DEBUG Client: IPC Client (2055067800) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: starting, having connections 1
> 15/09/14 13:47:18 DEBUG Client: IPC Client (2055067800) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #0
> 15/09/14 13:47:18 DEBUG Client: IPC Client (2055067800) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #0
> 15/09/14 13:47:18 DEBUG ProtobufRpcEngine: Call: getBlockLocations took 28ms
> 15/09/14 13:47:18 DEBUG DFSClient: newInfo = LocatedBlocks{
>
> --
> Adrian Bridgett | Sysadmin Engineer, OpenSignal
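PS: if you want the logical nameservice1 URI to work on the executors without relying on hdfs-site.xml being picked up, you could also try pushing the HA client settings through spark.hadoop.* properties (Spark copies those into the Hadoop Configuration it hands to HadoopRDD). Rough, untested sketch - the nn1/nn2 hostnames are placeholders for your real namenodes:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf().setAppName("hdfs-ha-check")
        # Standard Hadoop HA client settings for a logical nameservice,
        # passed via spark.hadoop.* so they reach the executors too.
        .set("spark.hadoop.dfs.nameservices", "nameservice1")
        .set("spark.hadoop.dfs.ha.namenodes.nameservice1", "nn1,nn2")
        .set("spark.hadoop.dfs.namenode.rpc-address.nameservice1.nn1",
             "namenode1.hdfs.mesos:8020")
        .set("spark.hadoop.dfs.namenode.rpc-address.nameservice1.nn2",
             "namenode2.hdfs.mesos:8020")
        .set("spark.hadoop.dfs.client.failover.proxy.provider.nameservice1",
             "org.apache.hadoop.hdfs.server.namenode.ha."
             "ConfiguredFailoverProxyProvider"))
    sc = SparkContext(conf=conf)
    print(sc.textFile("hdfs://nameservice1/tmp/issue").count())

If that fixes it, the underlying issue is that the executors' JobConf is missing the nameservice1 entries, which would match the UnknownHostException in your traces.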