I've seen similar traces, but couldn't track down the failure completely.
You are using Kerberos for your HDFS cluster, right? AFAIK Kerberos isn't
supported in Mesos deployments.

Can you resolve that host name (nameservice1) from the driver machine (ping
nameservice1)? Can it be resolved from the other machines in the cluster?
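
For example, something along these lines (an untested sketch, assuming a live
SparkContext `sc` in pyspark) would check resolution from both the driver and
the executors:

import socket

# On the driver; raises socket.gaierror if the name doesn't resolve:
print(socket.gethostbyname("nameservice1"))

# On the executors (one lookup per partition):
sc.parallelize(range(4), 4).map(
    lambda _: socket.gethostbyname("nameservice1")).collect()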

Does it help if you read using `newAPIHadoopFile` instead of `textFile`?
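
In pyspark that would be something like this (untested; the input format and
key/value classes below are the stock ones for plain text, so you get
(offset, line) pairs rather than bare lines):

t = sc.newAPIHadoopFile(
    "hdfs://nameservice1/tmp/issue",
    "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
    "org.apache.hadoop.io.LongWritable",
    "org.apache.hadoop.io.Text")
t.count()

If that works, it would suggest the problem is in the old-API JobConf path
(your trace fails in HadoopRDD.getJobConf) rather than in how the HDFS
config reaches the slaves.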

On Mon, Sep 14, 2015 at 3:55 PM, Adrian Bridgett <adr...@opensignal.com>
wrote:

> I'm hitting an odd issue running Spark on Mesos together with HA-HDFS,
> with an even odder workaround.
>
> In particular, I get an error that it can't find the HDFS nameservice
> unless I first put in a _broken_ URL (I discovered that workaround by
> mistake!). core-site.xml and hdfs-site.xml are distributed to the slave
> node, and those files are being read: if I deliberately break one of them,
> I get an error as you'd expect.
>
> NB: This is a bit different to
> http://mail-archives.us.apache.org/mod_mbox/spark-user/201402.mbox/%3c1392442185079-1549.p...@n3.nabble.com%3E
>
>
> Spark 1.5.0:
>
> t=sc.textFile("hdfs://nameservice1/tmp/issue")
> t.count()
> (fails)
>
> t=sc.textFile("file://etc/passwd")
> t.count()
> (errors about a bad URL - it should have an extra /, of course)
> t=sc.textFile("hdfs://nameservice1/tmp/issue")
> t.count()
> then it works!!!
>
> I should say that using file:///etc/passwd or hdfs:///tmp/issue both fail
> as well, unless preceded by a broken URL. I've tried setting
> spark.hadoop.cloneConf to true; no change.
>
> Sample (broken) run:
> 15/09/14 13:00:14 DEBUG HadoopRDD: Creating new JobConf and caching it for
> later re-use
> 15/09/14 13:00:14 DEBUG : address: ip-10-1-200-165/10.1.200.165
> isLoopbackAddress: false, with host 10.1.200.165 ip-10-1-200-165
> 15/09/14 13:00:14 DEBUG BlockReaderLocal:
> dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit =
> false
> 15/09/14 13:00:14 DEBUG BlockReaderLocal:
> dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.domain.socket.path =
> /var/run/hdfs-sockets/dn
> 15/09/14 13:00:14 DEBUG HAUtil: No HA service delegation token found for
> logical URI hdfs://nameservice1
> 15/09/14 13:00:14 DEBUG BlockReaderLocal:
> dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit =
> false
> 15/09/14 13:00:14 DEBUG BlockReaderLocal:
> dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.domain.socket.path =
> /var/run/hdfs-sockets/dn
> 15/09/14 13:00:14 DEBUG RetryUtils: multipleLinearRandomRetry = null
> 15/09/14 13:00:14 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
> rpcRequestWrapperClass=class
> org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
> rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@6245f50b
> 15/09/14 13:00:14 DEBUG Client: getting client out of cache:
> org.apache.hadoop.ipc.Client@267f0fd3
> 15/09/14 13:00:14 DEBUG NativeCodeLoader: Trying to load the custom-built
> native-hadoop library...
> 15/09/14 13:00:14 DEBUG NativeCodeLoader: Loaded the native-hadoop library
> ...
> 15/09/14 13:00:14 DEBUG Client: Connecting to
> mesos-1.example.com/10.1.200.165:8020
> 15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to
> mesos-1.example.com/10.1.200.165:8020 from ubuntu: starting, having
> connections 1
> 15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to
> mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #0
> 15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to
> mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #0
> 15/09/14 13:00:14 DEBUG ProtobufRpcEngine: Call: getFileInfo took 36ms
> 15/09/14 13:00:14 DEBUG FileInputFormat: Time taken to get FileStatuses: 69
> 15/09/14 13:00:14 INFO FileInputFormat: Total input paths to process : 1
> 15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to
> mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #1
> 15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to
> mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #1
> 15/09/14 13:00:14 DEBUG ProtobufRpcEngine: Call: getBlockLocations took 1ms
> 15/09/14 13:00:14 DEBUG FileInputFormat: Total # of splits generated by
> getSplits: 2, TimeTaken: 104
> ...
> 15/09/14 13:00:24 DEBUG Client: IPC Client (1739425103) connection to
> mesos-1.example.com/10.1.200.165:8020 from ubuntu: closed
> 15/09/14 13:00:24 DEBUG Client: IPC Client (1739425103) connection to
> mesos-1.example.com/10.1.200.165:8020 from ubuntu: stopped, remaining
> connections 0
> 15/09/14 13:00:24 DEBUG
> AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received
> message
> AkkaMessage(ExecutorRemoved(20150826-133446-3217621258-5050-4064-S1),true)
> from Actor[akka://sparkDriver/temp/$g]
> 15/09/14 13:00:24 DEBUG
> AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message:
> AkkaMessage(ExecutorRemoved(20150826-133446-3217621258-5050-4064-S1),true)
> 15/09/14 13:00:24 DEBUG
> AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled
> message (0.513851 ms)
> AkkaMessage(ExecutorRemoved(20150826-133446-3217621258-5050-4064-S1),true)
> from Actor[akka://sparkDriver/temp/$g]
> 15/09/14 13:00:25 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,
> 10.1.200.245): java.lang.IllegalArgumentException:
> java.net.UnknownHostException: nameservice1
>     at
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
>     at
> org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
>     at
> org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
>     at
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
>     at
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
>     at
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
>     at
> org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
>     at
> org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
>     at
> org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
>     at
> org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
>     at
> org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
>     at
> org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
>     at
> org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
>     at scala.Option.map(Option.scala:145)
>     at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
>     at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
>     at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
>     at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>     at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>     at
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:249)
>     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>     at
> org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
> Caused by: java.net.UnknownHostException: nameservice1
>     ... 32 more
>
>
> Sample working run:
> 15/09/14 13:00:43 DEBUG HadoopRDD: Creating new JobConf and caching it for
> later re-use
> 15/09/14 13:00:43 DEBUG : address: ip-10-1-200-165/10.1.200.165
> isLoopbackAddress: false, with host 10.1.200.165 ip-10-1-200-165
> 15/09/14 13:00:43 DEBUG BlockReaderLocal:
> dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit =
> false
> 15/09/14 13:00:43 DEBUG BlockReaderLocal:
> dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.domain.socket.path =
> /var/run/hdfs-sockets/dn
> 15/09/14 13:00:43 DEBUG HAUtil: No HA service delegation token found for
> logical URI hdfs://nameservice1
> 15/09/14 13:00:43 DEBUG BlockReaderLocal:
> dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit =
> false
> 15/09/14 13:00:43 DEBUG BlockReaderLocal:
> dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.domain.socket.path =
> /var/run/hdfs-sockets/dn
> 15/09/14 13:00:43 DEBUG RetryUtils: multipleLinearRandomRetry = null
> 15/09/14 13:00:43 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
> rpcRequestWrapperClass=class
> org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
> rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@114b3357
> 15/09/14 13:00:43 DEBUG Client: getting client out of cache:
> org.apache.hadoop.ipc.Client@28a248cd
> 15/09/14 13:00:44 DEBUG NativeCodeLoader: Trying to load the custom-built
> native-hadoop library...
> 15/09/14 13:00:44 DEBUG NativeCodeLoader: Loaded the native-hadoop library
> 15/09/14 13:00:44 DEBUG DomainSocketWatcher:
> org.apache.hadoop.net.unix.DomainSocketWatcher$2@3962387d: starting with
> interruptCheckPeriodMs = 60000
> 15/09/14 13:00:44 DEBUG PerformanceAdvisory: Both short-circuit local
> reads and UNIX domain socket are disabled.
> 15/09/14 13:00:44 DEBUG DataTransferSaslUtil: DataTransferProtocol not
> using SaslPropertiesResolver, no QOP found in configuration for
> dfs.data.transfer.protection
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 1006, in count
>     return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
>   File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 997, in sum
>     return self.mapPartitions(lambda x: [sum(x)]).fold(0, operator.add)
>   File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 871, in fold
>     vals = self.mapPartitions(func).collect()
>   File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 773, in collect
>     port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
>   File
> "/home/ubuntu/spark15/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
> line 538, in __call__
>   File "/home/ubuntu/spark15/python/pyspark/sql/utils.py", line 42, in deco
>     raise IllegalArgumentException(s.split(': ', 1)[1])
> pyspark.sql.utils.IllegalArgumentException: Wrong FS: file://etc/passwd,
> expected: file:///
> ...
> 15/09/14 13:00:51 DEBUG HadoopRDD: Creating new JobConf and caching it for
> later re-use
> 15/09/14 13:00:51 DEBUG Client: The ping interval is 60000 ms.
> 15/09/14 13:00:51 DEBUG Client: Connecting to
> mesos-1.example.com/10.1.200.165:8020
> 15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to
> mesos-1.example.com/10.1.200.165:8020 from ubuntu: starting, having
> connections 1
> 15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to
> mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #0
> 15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to
> mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #0
> 15/09/14 13:00:51 DEBUG ProtobufRpcEngine: Call: getFileInfo took 32ms
> 15/09/14 13:00:51 DEBUG FileInputFormat: Time taken to get FileStatuses: 64
> 15/09/14 13:00:51 INFO FileInputFormat: Total input paths to process : 1
> 15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to
> mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #1
> 15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to
> mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #1
> 15/09/14 13:00:51 DEBUG ProtobufRpcEngine: Call: getBlockLocations took 2ms
> 15/09/14 13:00:51 DEBUG FileInputFormat: Total # of splits generated by
> getSplits: 2, TimeTaken: 95
> 2
> (the answer!)
>
>
> The Mesos logs are very slightly different (apologies - these were from a
> different run). Notice that dfs.domain.socket.path is blank (or cut off by
> the exception?) in the broken run.
>
> Broken:
> 15/09/14 13:48:30 DEBUG HadoopRDD: Cloning Hadoop Configuration
> 15/09/14 13:48:30 DEBUG : address: ip-10-1-200-245/10.1.200.245
> isLoopbackAddress: false, with host 10.1.200.245 ip-10-1-200-245
> 15/09/14 13:48:30 DEBUG BlockReaderLocal:
> dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:48:30 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit =
> false
> 15/09/14 13:48:30 DEBUG BlockReaderLocal:
> dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:48:30 DEBUG BlockReaderLocal: dfs.domain.socket.path =
> 15/09/14 13:48:30 ERROR PythonRDD: Python worker exited unexpectedly
> (crashed)
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
>   File
> "/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S2/frameworks/20150826-133446-3217621258-5050-4064-216556/executors/20150826-133446-3217621258-5050-4064-S2/runs/b31501ae-22d0-47dd-b4b6-2fb17717e1f8/spark15/python/lib/pyspark.zip/pyspark/worker.py",
> line 98, in main
>     command = pickleSer._read_with_length(infile)
>   File
> "/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S2/frameworks/20150826-133446-3217621258-5050-4064-216556/executors/20150826-133446-3217621258-5050-4064-S2/runs/b31501ae-22d0-47dd-b4b6-2fb17717e1f8/spark15/python/lib/pyspark.zip/pyspark/serializers.py",
> line 156, in _read_with_length
>     length = read_int(stream)
>   File
> "/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S2/frameworks/20150826-133446-3217621258-5050-4064-216556/executors/20150826-133446-3217621258-5050-4064-S2/runs/b31501ae-22d0-47dd-b4b6-2fb17717e1f8/spark15/python/lib/pyspark.zip/pyspark/serializers.py",
> line 544, in read_int
>     raise EOFError
> EOFError
>
>     at
> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
>     at
> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
>     at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:88)
>     at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException:
> java.net.UnknownHostException: nameservice1
>     at
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
>     at
> org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
>     at
> org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
>     at
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
>     at
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
>     at
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
>     at
> org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
>     at
> org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
>     at
> org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
>     at
> org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
>     at
> org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
>     at
> org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
>     at
> org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
>     at scala.Option.map(Option.scala:145)
>     at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:157)
>     at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
>     at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
>     at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>     at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>     at
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:249)
>     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>     at
> org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
> Caused by: java.net.UnknownHostException: nameservice1
>     ... 32 more
> 15/09/14 13:48:30 ERROR PythonRDD: This may have been caused by a prior
> exception:
> java.lang.IllegalArgumentException: java.net.UnknownHostException:
> nameservice1
>     at
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
>     at
> org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
>     at
> org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
>     at
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
>     at
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
>     at
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
>     at
> org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
>     at
> org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
>     at
> org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
>     at
> org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
>     at
> org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
>     at
> org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
>     at
> org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
>     at scala.Option.map(Option.scala:145)
>     at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:157)
>     at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
>     at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
>     at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>     at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>     at
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:249)
>     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>     at
> org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
> Caused by: java.net.UnknownHostException: nameservice1
>     ... 32 more
>
> Working:
> 15/09/14 13:47:17 DEBUG HadoopRDD: Cloning Hadoop Configuration
> 15/09/14 13:47:17 DEBUG : address: ip-10-1-200-245/10.1.200.245
> isLoopbackAddress: false, with host 10.1.200.245 ip-10-1-200-245
> 15/09/14 13:47:17 DEBUG BlockReaderLocal:
> dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit =
> false
> 15/09/14 13:47:17 DEBUG BlockReaderLocal:
> dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.domain.socket.path =
> /var/run/hdfs-sockets/dn
> 15/09/14 13:47:17 DEBUG HAUtil: No HA service delegation token found for
> logical URI hdfs://nameservice1
> 15/09/14 13:47:17 DEBUG BlockReaderLocal:
> dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit =
> false
> 15/09/14 13:47:17 DEBUG BlockReaderLocal:
> dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.domain.socket.path =
> /var/run/hdfs-sockets/dn
> 15/09/14 13:47:17 DEBUG RetryUtils: multipleLinearRandomRetry = null
> 15/09/14 13:47:17 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
> rpcRequestWrapperClass=class
> org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
> rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@30b68416
> 15/09/14 13:47:17 DEBUG Client: getting client out of cache:
> org.apache.hadoop.ipc.Client@4599b420
> 15/09/14 13:47:18 DEBUG NativeCodeLoader: Trying to load the custom-built
> native-hadoop library...
> 15/09/14 13:47:18 DEBUG NativeCodeLoader: Loaded the native-hadoop library
> 15/09/14 13:47:18 DEBUG DomainSocketWatcher:
> org.apache.hadoop.net.unix.DomainSocketWatcher$2@4ed189cf: starting with
> interruptCheckPeriodMs = 60000
> 15/09/14 13:47:18 DEBUG PerformanceAdvisory: Both short-circuit local
> reads and UNIX domain socket are disabled.
> 15/09/14 13:47:18 DEBUG DataTransferSaslUtil: DataTransferProtocol not
> using SaslPropertiesResolver, no QOP found in configuration for
> dfs.data.transfer.protection
> 15/09/14 13:47:18 INFO deprecation: mapred.tip.id is deprecated. Instead,
> use mapreduce.task.id
> 15/09/14 13:47:18 INFO deprecation: mapred.task.id is deprecated.
> Instead, use mapreduce.task.attempt.id
> 15/09/14 13:47:18 INFO deprecation: mapred.task.is.map is deprecated.
> Instead, use mapreduce.task.ismap
> 15/09/14 13:47:18 INFO deprecation: mapred.task.partition is deprecated.
> Instead, use mapreduce.task.partition
> 15/09/14 13:47:18 INFO deprecation: mapred.job.id is deprecated. Instead,
> use mapreduce.job.id
> 15/09/14 13:47:18 DEBUG Client: The ping interval is 60000 ms.
> 15/09/14 13:47:18 DEBUG Client: Connecting to
> mesos-1.example.com/10.1.200.165:8020
> 15/09/14 13:47:18 DEBUG Client: IPC Client (2055067800) connection to
> mesos-1.example.com/10.1.200.165:8020 from ubuntu: starting, having
> connections 1
> 15/09/14 13:47:18 DEBUG Client: IPC Client (2055067800) connection to
> mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #0
> 15/09/14 13:47:18 DEBUG Client: IPC Client (2055067800) connection to
> mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #0
> 15/09/14 13:47:18 DEBUG ProtobufRpcEngine: Call: getBlockLocations took
> 28ms
> 15/09/14 13:47:18 DEBUG DFSClient: newInfo = LocatedBlocks{
>
>
> --
> *Adrian Bridgett* |  Sysadmin Engineer, OpenSignal
> <http://www.opensignal.com>
>



--
Iulian Dragos
Reactive Apps on the JVM
www.typesafe.com
