I don't know about the broken-url workaround, but are you running HDFS as a Mesos framework? If so, is it using mesos-dns? In that case you should be able to resolve the namenode directly via hdfs://<active-namenode>:8020/ ....
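Something along these lines - a rough, untested sketch; "namenode1.hdfs.mesos" is just a placeholder for whatever name your mesos-dns zone actually serves for the active namenode:

    from pyspark import SparkContext

    sc = SparkContext(appName="hdfs-uri-check")
    # Bypass the logical nameservice and address the active namenode
    # directly (note: no automatic failover this way).
    t = sc.textFile("hdfs://namenode1.hdfs.mesos:8020/tmp/issue")
    print(t.count())

If that works while hdfs://nameservice1/ doesn't, DNS isn't your problem - the executors just aren't picking up the HA client settings.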
On Mon, Sep 14, 2015 at 3:55 PM, Adrian Bridgett <adr...@opensignal.com> wrote:
> I'm hitting an odd issue with running spark on mesos together with
> HA-HDFS, with an even odder workaround.
>
> In particular I get an error that it can't find the HDFS nameservice
> unless I put in a _broken_ url (discovered that workaround by mistake!).
> core-site.xml and hdfs-site.xml are distributed to the slave node - and
> those files are read, since if I deliberately break them I get an error
> as you'd expect.
>
> NB: This is a bit different to
> http://mail-archives.us.apache.org/mod_mbox/spark-user/201402.mbox/%3c1392442185079-1549.p...@n3.nabble.com%3E
>
> Spark 1.5.0:
>
> t=sc.textFile("hdfs://nameservice1/tmp/issue")
> t.count()
> (fails)
>
> t=sc.textFile("file://etc/passwd")
> t.count()
> (errors about bad url - should have an extra / of course)
> t=sc.textFile("hdfs://nameservice1/tmp/issue")
> t.count()
> then it works!!!
>
> I should say that using file:///etc/passwd or hdfs:///tmp/issue both fail
> as well, unless preceded by a broken url. I've tried setting
> spark.hadoop.cloneConf to true, no change.
>
> Sample (broken) run:
> 15/09/14 13:00:14 DEBUG HadoopRDD: Creating new JobConf and caching it for later re-use
> 15/09/14 13:00:14 DEBUG : address: ip-10-1-200-165/10.1.200.165 isLoopbackAddress: false, with host 10.1.200.165 ip-10-1-200-165
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.domain.socket.path = /var/run/hdfs-sockets/dn
> 15/09/14 13:00:14 DEBUG HAUtil: No HA service delegation token found for logical URI hdfs://nameservice1
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.domain.socket.path = /var/run/hdfs-sockets/dn
> 15/09/14 13:00:14 DEBUG RetryUtils: multipleLinearRandomRetry = null
> 15/09/14 13:00:14 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@6245f50b
> 15/09/14 13:00:14 DEBUG Client: getting client out of cache: org.apache.hadoop.ipc.Client@267f0fd3
> 15/09/14 13:00:14 DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...
> 15/09/14 13:00:14 DEBUG NativeCodeLoader: Loaded the native-hadoop library
> ...
> 15/09/14 13:00:14 DEBUG Client: Connecting to mesos-1.example.com/10.1.200.165:8020
> 15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: starting, having connections 1
> 15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #0
> 15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #0
> 15/09/14 13:00:14 DEBUG ProtobufRpcEngine: Call: getFileInfo took 36ms
> 15/09/14 13:00:14 DEBUG FileInputFormat: Time taken to get FileStatuses: 69
> 15/09/14 13:00:14 INFO FileInputFormat: Total input paths to process : 1
> 15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #1
> 15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #1
> 15/09/14 13:00:14 DEBUG ProtobufRpcEngine: Call: getBlockLocations took 1ms
> 15/09/14 13:00:14 DEBUG FileInputFormat: Total # of splits generated by getSplits: 2, TimeTaken: 104
> ...
> 15/09/14 13:00:24 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: closed
> 15/09/14 13:00:24 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: stopped, remaining connections 0
> 15/09/14 13:00:24 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received message AkkaMessage(ExecutorRemoved(20150826-133446-3217621258-5050-4064-S1),true) from Actor[akka://sparkDriver/temp/$g]
> 15/09/14 13:00:24 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC message: AkkaMessage(ExecutorRemoved(20150826-133446-3217621258-5050-4064-S1),true)
> 15/09/14 13:00:24 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] handled message (0.513851 ms) AkkaMessage(ExecutorRemoved(20150826-133446-3217621258-5050-4064-S1),true) from Actor[akka://sparkDriver/temp/$g]
> 15/09/14 13:00:25 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 10.1.200.245): java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1
> at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
> at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
> at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
> at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
> at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
> at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
> at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
> at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
> at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
> at scala.Option.map(Option.scala:145)
> at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:249)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
> Caused by: java.net.UnknownHostException: nameservice1
> ... 32 more
>
> Sample working run:
> 15/09/14 13:00:43 DEBUG HadoopRDD: Creating new JobConf and caching it for later re-use
> 15/09/14 13:00:43 DEBUG : address: ip-10-1-200-165/10.1.200.165 isLoopbackAddress: false, with host 10.1.200.165 ip-10-1-200-165
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.domain.socket.path = /var/run/hdfs-sockets/dn
> 15/09/14 13:00:43 DEBUG HAUtil: No HA service delegation token found for logical URI hdfs://nameservice1
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.domain.socket.path = /var/run/hdfs-sockets/dn
> 15/09/14 13:00:43 DEBUG RetryUtils: multipleLinearRandomRetry = null
> 15/09/14 13:00:43 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@114b3357
> 15/09/14 13:00:43 DEBUG Client: getting client out of cache: org.apache.hadoop.ipc.Client@28a248cd
> 15/09/14 13:00:44 DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...
> 15/09/14 13:00:44 DEBUG NativeCodeLoader: Loaded the native-hadoop library
> 15/09/14 13:00:44 DEBUG DomainSocketWatcher: org.apache.hadoop.net.unix.DomainSocketWatcher$2@3962387d: starting with interruptCheckPeriodMs = 60000
> 15/09/14 13:00:44 DEBUG PerformanceAdvisory: Both short-circuit local reads and UNIX domain socket are disabled.
> 15/09/14 13:00:44 DEBUG DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 1006, in count
>     return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
>   File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 997, in sum
>     return self.mapPartitions(lambda x: [sum(x)]).fold(0, operator.add)
>   File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 871, in fold
>     vals = self.mapPartitions(func).collect()
>   File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 773, in collect
>     port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
>   File "/home/ubuntu/spark15/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/home/ubuntu/spark15/python/pyspark/sql/utils.py", line 42, in deco
>     raise IllegalArgumentException(s.split(': ', 1)[1])
> pyspark.sql.utils.IllegalArgumentException: Wrong FS: file://etc/passwd, expected: file:///
> ...
> 15/09/14 13:00:51 DEBUG HadoopRDD: Creating new JobConf and caching it for later re-use
> 15/09/14 13:00:51 DEBUG Client: The ping interval is 60000 ms.
> 15/09/14 13:00:51 DEBUG Client: Connecting to mesos-1.example.com/10.1.200.165:8020
> 15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: starting, having connections 1
> 15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #0
> 15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #0
> 15/09/14 13:00:51 DEBUG ProtobufRpcEngine: Call: getFileInfo took 32ms
> 15/09/14 13:00:51 DEBUG FileInputFormat: Time taken to get FileStatuses: 64
> 15/09/14 13:00:51 INFO FileInputFormat: Total input paths to process : 1
> 15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #1
> 15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #1
> 15/09/14 13:00:51 DEBUG ProtobufRpcEngine: Call: getBlockLocations took 2ms
> 15/09/14 13:00:51 DEBUG FileInputFormat: Total # of splits generated by getSplits: 2, TimeTaken: 95
> 2
> (the answer!)
>
> The mesos logs are very slightly different (apologies - this was for a
> different run). Notice that dfs.domain.socket.path is blank (or cut-off
> by the exception?) in the broken run.
>
> Broken:
> 15/09/14 13:48:30 DEBUG HadoopRDD: Cloning Hadoop Configuration
> 15/09/14 13:48:30 DEBUG : address: ip-10-1-200-245/10.1.200.245 isLoopbackAddress: false, with host 10.1.200.245 ip-10-1-200-245
> 15/09/14 13:48:30 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:48:30 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
> 15/09/14 13:48:30 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:48:30 DEBUG BlockReaderLocal: dfs.domain.socket.path =
> 15/09/14 13:48:30 ERROR PythonRDD: Python worker exited unexpectedly (crashed)
> org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>   File "/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S2/frameworks/20150826-133446-3217621258-5050-4064-216556/executors/20150826-133446-3217621258-5050-4064-S2/runs/b31501ae-22d0-47dd-b4b6-2fb17717e1f8/spark15/python/lib/pyspark.zip/pyspark/worker.py", line 98, in main
>     command = pickleSer._read_with_length(infile)
>   File "/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S2/frameworks/20150826-133446-3217621258-5050-4064-216556/executors/20150826-133446-3217621258-5050-4064-S2/runs/b31501ae-22d0-47dd-b4b6-2fb17717e1f8/spark15/python/lib/pyspark.zip/pyspark/serializers.py", line 156, in _read_with_length
>     length = read_int(stream)
>   File "/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S2/frameworks/20150826-133446-3217621258-5050-4064-216556/executors/20150826-133446-3217621258-5050-4064-S2/runs/b31501ae-22d0-47dd-b4b6-2fb17717e1f8/spark15/python/lib/pyspark.zip/pyspark/serializers.py", line 544, in read_int
>     raise EOFError
> EOFError
>
> at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
> at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1
> at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
> at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
> at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
> at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
> at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
> at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
> at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
> at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
> at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
> at scala.Option.map(Option.scala:145)
> at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:157)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:249)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
> Caused by: java.net.UnknownHostException: nameservice1
> ... 32 more
> 15/09/14 13:48:30 ERROR PythonRDD: This may have been caused by a prior exception:
> java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1
> at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
> at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
> at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
> at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
> at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
> at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
> at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
> at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
> at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
> at scala.Option.map(Option.scala:145)
> at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:157)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:249)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
> Caused by: java.net.UnknownHostException: nameservice1
> ... 32 more
>
> Working:
> 15/09/14 13:47:17 DEBUG HadoopRDD: Cloning Hadoop Configuration
> 15/09/14 13:47:17 DEBUG : address: ip-10-1-200-245/10.1.200.245 isLoopbackAddress: false, with host 10.1.200.245 ip-10-1-200-245
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.domain.socket.path = /var/run/hdfs-sockets/dn
> 15/09/14 13:47:17 DEBUG HAUtil: No HA service delegation token found for logical URI hdfs://nameservice1
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = false
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
> 15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.domain.socket.path = /var/run/hdfs-sockets/dn
> 15/09/14 13:47:17 DEBUG RetryUtils: multipleLinearRandomRetry = null
> 15/09/14 13:47:17 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@30b68416
> 15/09/14 13:47:17 DEBUG Client: getting client out of cache: org.apache.hadoop.ipc.Client@4599b420
> 15/09/14 13:47:18 DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...
> 15/09/14 13:47:18 DEBUG NativeCodeLoader: Loaded the native-hadoop library
> 15/09/14 13:47:18 DEBUG DomainSocketWatcher: org.apache.hadoop.net.unix.DomainSocketWatcher$2@4ed189cf: starting with interruptCheckPeriodMs = 60000
> 15/09/14 13:47:18 DEBUG PerformanceAdvisory: Both short-circuit local reads and UNIX domain socket are disabled.
> 15/09/14 13:47:18 DEBUG DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
> 15/09/14 13:47:18 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
> 15/09/14 13:47:18 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
> 15/09/14 13:47:18 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
> 15/09/14 13:47:18 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
> 15/09/14 13:47:18 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
> 15/09/14 13:47:18 DEBUG Client: The ping interval is 60000 ms.
> 15/09/14 13:47:18 DEBUG Client: Connecting to mesos-1.example.com/10.1.200.165:8020
> 15/09/14 13:47:18 DEBUG Client: IPC Client (2055067800) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: starting, having connections 1
> 15/09/14 13:47:18 DEBUG Client: IPC Client (2055067800) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #0
> 15/09/14 13:47:18 DEBUG Client: IPC Client (2055067800) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #0
> 15/09/14 13:47:18 DEBUG ProtobufRpcEngine: Call: getBlockLocations took 28ms
> 15/09/14 13:47:18 DEBUG DFSClient: newInfo = LocatedBlocks{
>
> --
> Adrian Bridgett | Sysadmin Engineer, OpenSignal
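PS: if you want the logical nameservice1 URI to work on the executors without relying on hdfs-site.xml being picked up, you could also try pushing the HA client settings through spark.hadoop.* properties (Spark copies those into the Hadoop Configuration it hands to HadoopRDD). Rough, untested sketch - the nn1/nn2 hostnames are placeholders for your real namenodes:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf().setAppName("hdfs-ha-check")
        # Standard Hadoop HA client settings for a logical nameservice,
        # passed via spark.hadoop.* so they reach the executors too.
        .set("spark.hadoop.dfs.nameservices", "nameservice1")
        .set("spark.hadoop.dfs.ha.namenodes.nameservice1", "nn1,nn2")
        .set("spark.hadoop.dfs.namenode.rpc-address.nameservice1.nn1",
             "namenode1.hdfs.mesos:8020")
        .set("spark.hadoop.dfs.namenode.rpc-address.nameservice1.nn2",
             "namenode2.hdfs.mesos:8020")
        .set("spark.hadoop.dfs.client.failover.proxy.provider.nameservice1",
             "org.apache.hadoop.hdfs.server.namenode.ha."
             "ConfiguredFailoverProxyProvider"))
    sc = SparkContext(conf=conf)
    print(sc.textFile("hdfs://nameservice1/tmp/issue").count())

If that fixes it, the underlying issue is that the executors' JobConf is missing the nameservice1 entries, which would match the UnknownHostException in your traces.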