Hi Sam, in short: no, it's a traditional install, as we plan to use spot
instances and didn't want price spikes to kill off HDFS.
We're actually doing a bit of a hybrid: spot instances for the Mesos
slaves, on-demand for the Mesos masters. So for the time being we're
putting HDFS on the masters (we'll probably move to multiple slave
instance types to avoid losing too many nodes when the spot price spikes,
but for now this is acceptable). The masters are running CDH5.
Using hdfs://current-hdfs-master:8020 works fine; however, using
hdfs://nameservice1 fails in the rather odd way described below (well,
what's odder is that the workaround actually works!). I think there's
some underlying bug being exposed here.
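A quick way to sanity-check what the client actually loaded is to read
the HA keys back from the live Hadoop configuration. A minimal sketch in
the pyspark shell (assuming the standard HDFS HA client property names
for a nameservice called nameservice1; sc is the active SparkContext):

# Sketch: dump the HA-related keys from the driver-side Hadoop conf.
# These are the standard HDFS HA client property names; values come
# from whichever core-site.xml/hdfs-site.xml the driver picked up.
hc = sc._jsc.hadoopConfiguration()
for key in ("fs.defaultFS",
            "dfs.nameservices",
            "dfs.ha.namenodes.nameservice1",
            "dfs.client.failover.proxy.provider.nameservice1"):
    print("%s = %s" % (key, hc.get(key)))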
On 14/09/2015 22:27, Sam Bessalah wrote:
I don't know about the broken URL. But are you running HDFS as a Mesos
framework? If so, is it using mesos-dns?
Then you should resolve the namenode via hdfs://<activenamenode>:8020/
....
On Mon, Sep 14, 2015 at 3:55 PM, Adrian Bridgett
<adr...@opensignal.com> wrote:
I'm hitting an odd issue with running Spark on Mesos together with
HA-HDFS, with an even odder workaround.
In particular, I get an error that it can't find the HDFS
nameservice unless I first put in a _broken_ URL (I discovered that
workaround by mistake!). core-site.xml and hdfs-site.xml are
distributed to the slave node, and those files are definitely being
read: if I deliberately break them, I get an error as you'd expect.
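For reference, the HA client section of hdfs-site.xml that makes
hdfs://nameservice1 resolvable is shaped roughly like the sketch below.
These are the standard Hadoop 2/CDH5 property names, not our exact file;
the namenode hostnames are placeholders:

<!-- Sketch of an HA client config for nameservice1 (hosts are placeholders). -->
<property>
  <name>dfs.nameservices</name>
  <value>nameservice1</value>
</property>
<property>
  <name>dfs.ha.namenodes.nameservice1</name>
  <value>namenode1,namenode2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode1</name>
  <value>mesos-1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode2</name>
  <value>mesos-2.example.com:8020</value>
</property>
<property>
  <!-- Without this proxy provider the client can't resolve the logical URI. -->
  <name>dfs.client.failover.proxy.provider.nameservice1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>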
NB: This is a bit different to
http://mail-archives.us.apache.org/mod_mbox/spark-user/201402.mbox/%3c1392442185079-1549.p...@n3.nabble.com%3E
Spark 1.5.0:
t=sc.textFile("hdfs://nameservice1/tmp/issue")
t.count()
(fails)
t=sc.textFile("file://etc/passwd")
t.count()
(errors about bad url - should have an extra / of course)
t=sc.textFile("hdfs://nameservice1/tmp/issue")
t.count()
then it works!!!
I should say that file:///etc/passwd and hdfs:///tmp/issue both
fail as well, unless preceded by a broken URL. I've tried setting
spark.hadoop.cloneConf to true; no change.
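(In case anyone wants to reproduce that last test: spark.hadoop.cloneConf
can be passed as --conf spark.hadoop.cloneConf=true to pyspark or
spark-submit, or set programmatically. A minimal sketch of the latter,
with an arbitrary app name:)

# Sketch: enabling spark.hadoop.cloneConf from a standalone script,
# equivalent to --conf spark.hadoop.cloneConf=true on the command line.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("nameservice1-test")          # arbitrary name
        .set("spark.hadoop.cloneConf", "true"))   # clone JobConf per RDD
sc = SparkContext(conf=conf)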
Sample (broken) run:
15/09/14 13:00:14 DEBUG HadoopRDD: Creating new JobConf and
caching it for later re-use
15/09/14 13:00:14 DEBUG : address: ip-10-1-200-165/10.1.200.165 isLoopbackAddress: false, with host 10.1.200.165 ip-10-1-200-165
15/09/14 13:00:14 DEBUG BlockReaderLocal:
dfs.client.use.legacy.blockreader.local = false
15/09/14 13:00:14 DEBUG BlockReaderLocal:
dfs.client.read.shortcircuit = false
15/09/14 13:00:14 DEBUG BlockReaderLocal:
dfs.client.domain.socket.data.traffic = false
15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.domain.socket.path =
/var/run/hdfs-sockets/dn
15/09/14 13:00:14 DEBUG HAUtil: No HA service delegation token
found for logical URI hdfs://nameservice1
15/09/14 13:00:14 DEBUG BlockReaderLocal:
dfs.client.use.legacy.blockreader.local = false
15/09/14 13:00:14 DEBUG BlockReaderLocal:
dfs.client.read.shortcircuit = false
15/09/14 13:00:14 DEBUG BlockReaderLocal:
dfs.client.domain.socket.data.traffic = false
15/09/14 13:00:14 DEBUG BlockReaderLocal: dfs.domain.socket.path =
/var/run/hdfs-sockets/dn
15/09/14 13:00:14 DEBUG RetryUtils: multipleLinearRandomRetry = null
15/09/14 13:00:14 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
rpcRequestWrapperClass=class
org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@6245f50b
15/09/14 13:00:14 DEBUG Client: getting client out of cache:
org.apache.hadoop.ipc.Client@267f0fd3
15/09/14 13:00:14 DEBUG NativeCodeLoader: Trying to load the
custom-built native-hadoop library...
15/09/14 13:00:14 DEBUG NativeCodeLoader: Loaded the native-hadoop
library
...
15/09/14 13:00:14 DEBUG Client: Connecting to mesos-1.example.com/10.1.200.165:8020
15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: starting, having connections 1
15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #0
15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #0
15/09/14 13:00:14 DEBUG ProtobufRpcEngine: Call: getFileInfo took 36ms
15/09/14 13:00:14 DEBUG FileInputFormat: Time taken to get
FileStatuses: 69
15/09/14 13:00:14 INFO FileInputFormat: Total input paths to
process : 1
15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #1
15/09/14 13:00:14 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #1
15/09/14 13:00:14 DEBUG ProtobufRpcEngine: Call: getBlockLocations
took 1ms
15/09/14 13:00:14 DEBUG FileInputFormat: Total # of splits
generated by getSplits: 2, TimeTaken: 104
...
15/09/14 13:00:24 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: closed
15/09/14 13:00:24 DEBUG Client: IPC Client (1739425103) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: stopped, remaining connections 0
15/09/14 13:00:24 DEBUG
AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor]
received message
AkkaMessage(ExecutorRemoved(20150826-133446-3217621258-5050-4064-S1),true)
from Actor[akka://sparkDriver/temp/$g]
15/09/14 13:00:24 DEBUG
AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: Received RPC
message:
AkkaMessage(ExecutorRemoved(20150826-133446-3217621258-5050-4064-S1),true)
15/09/14 13:00:24 DEBUG
AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor]
handled message (0.513851 ms)
AkkaMessage(ExecutorRemoved(20150826-133446-3217621258-5050-4064-S1),true)
from Actor[akka://sparkDriver/temp/$g]
15/09/14 13:00:25 WARN TaskSetManager: Lost task 0.0 in stage 0.0
(TID 0, 10.1.200.245): java.lang.IllegalArgumentException:
java.net.UnknownHostException: nameservice1
at
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
at
org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
at
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
at
org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
at
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
at
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
at
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
at
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
at
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
at
org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at
org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:249)
at
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
at
org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
Caused by: java.net.UnknownHostException: nameservice1
... 32 more
Sample working run:
15/09/14 13:00:43 DEBUG HadoopRDD: Creating new JobConf and
caching it for later re-use
15/09/14 13:00:43 DEBUG : address: ip-10-1-200-165/10.1.200.165 isLoopbackAddress: false, with host 10.1.200.165 ip-10-1-200-165
15/09/14 13:00:43 DEBUG BlockReaderLocal:
dfs.client.use.legacy.blockreader.local = false
15/09/14 13:00:43 DEBUG BlockReaderLocal:
dfs.client.read.shortcircuit = false
15/09/14 13:00:43 DEBUG BlockReaderLocal:
dfs.client.domain.socket.data.traffic = false
15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.domain.socket.path =
/var/run/hdfs-sockets/dn
15/09/14 13:00:43 DEBUG HAUtil: No HA service delegation token
found for logical URI hdfs://nameservice1
15/09/14 13:00:43 DEBUG BlockReaderLocal:
dfs.client.use.legacy.blockreader.local = false
15/09/14 13:00:43 DEBUG BlockReaderLocal:
dfs.client.read.shortcircuit = false
15/09/14 13:00:43 DEBUG BlockReaderLocal:
dfs.client.domain.socket.data.traffic = false
15/09/14 13:00:43 DEBUG BlockReaderLocal: dfs.domain.socket.path =
/var/run/hdfs-sockets/dn
15/09/14 13:00:43 DEBUG RetryUtils: multipleLinearRandomRetry = null
15/09/14 13:00:43 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
rpcRequestWrapperClass=class
org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@114b3357
15/09/14 13:00:43 DEBUG Client: getting client out of cache:
org.apache.hadoop.ipc.Client@28a248cd
15/09/14 13:00:44 DEBUG NativeCodeLoader: Trying to load the
custom-built native-hadoop library...
15/09/14 13:00:44 DEBUG NativeCodeLoader: Loaded the native-hadoop
library
15/09/14 13:00:44 DEBUG DomainSocketWatcher:
org.apache.hadoop.net.unix.DomainSocketWatcher$2@3962387d:
starting with interruptCheckPeriodMs = 60000
15/09/14 13:00:44 DEBUG PerformanceAdvisory: Both short-circuit
local reads and UNIX domain socket are disabled.
15/09/14 13:00:44 DEBUG DataTransferSaslUtil: DataTransferProtocol
not using SaslPropertiesResolver, no QOP found in configuration
for dfs.data.transfer.protection
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 1006, in
count
return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 997, in sum
return self.mapPartitions(lambda x: [sum(x)]).fold(0,
operator.add)
File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 871, in fold
vals = self.mapPartitions(func).collect()
File "/home/ubuntu/spark15/python/pyspark/rdd.py", line 773, in
collect
port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
File
"/home/ubuntu/spark15/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
line 538, in __call__
File "/home/ubuntu/spark15/python/pyspark/sql/utils.py", line
42, in deco
raise IllegalArgumentException(s.split(': ', 1)[1])
pyspark.sql.utils.IllegalArgumentException: Wrong FS:
file://etc/passwd, expected: file:///
...
15/09/14 13:00:51 DEBUG HadoopRDD: Creating new JobConf and
caching it for later re-use
15/09/14 13:00:51 DEBUG Client: The ping interval is 60000 ms.
15/09/14 13:00:51 DEBUG Client: Connecting to mesos-1.example.com/10.1.200.165:8020
15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: starting, having connections 1
15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #0
15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #0
15/09/14 13:00:51 DEBUG ProtobufRpcEngine: Call: getFileInfo took 32ms
15/09/14 13:00:51 DEBUG FileInputFormat: Time taken to get
FileStatuses: 64
15/09/14 13:00:51 INFO FileInputFormat: Total input paths to
process : 1
15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #1
15/09/14 13:00:51 DEBUG Client: IPC Client (24266793) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #1
15/09/14 13:00:51 DEBUG ProtobufRpcEngine: Call: getBlockLocations
took 2ms
15/09/14 13:00:51 DEBUG FileInputFormat: Total # of splits
generated by getSplits: 2, TimeTaken: 95
2
(the answer!)
The Mesos logs are very slightly different (apologies - these were
from a different run). Notice that dfs.domain.socket.path is blank
(or cut off by the exception?) in the broken run.
Broken:
15/09/14 13:48:30 DEBUG HadoopRDD: Cloning Hadoop Configuration
15/09/14 13:48:30 DEBUG : address: ip-10-1-200-245/10.1.200.245 isLoopbackAddress: false, with host 10.1.200.245 ip-10-1-200-245
15/09/14 13:48:30 DEBUG BlockReaderLocal:
dfs.client.use.legacy.blockreader.local = false
15/09/14 13:48:30 DEBUG BlockReaderLocal:
dfs.client.read.shortcircuit = false
15/09/14 13:48:30 DEBUG BlockReaderLocal:
dfs.client.domain.socket.data.traffic = false
15/09/14 13:48:30 DEBUG BlockReaderLocal: dfs.domain.socket.path =
15/09/14 13:48:30 ERROR PythonRDD: Python worker exited
unexpectedly (crashed)
org.apache.spark.api.python.PythonException: Traceback (most
recent call last):
File
"/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S2/frameworks/20150826-133446-3217621258-5050-4064-216556/executors/20150826-133446-3217621258-5050-4064-S2/runs/b31501ae-22d0-47dd-b4b6-2fb17717e1f8/spark15/python/lib/pyspark.zip/pyspark/worker.py",
line 98, in main
command = pickleSer._read_with_length(infile)
File
"/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S2/frameworks/20150826-133446-3217621258-5050-4064-216556/executors/20150826-133446-3217621258-5050-4064-S2/runs/b31501ae-22d0-47dd-b4b6-2fb17717e1f8/spark15/python/lib/pyspark.zip/pyspark/serializers.py",
line 156, in _read_with_length
length = read_int(stream)
File
"/tmp/mesos/slaves/20150826-133446-3217621258-5050-4064-S2/frameworks/20150826-133446-3217621258-5050-4064-216556/executors/20150826-133446-3217621258-5050-4064-S2/runs/b31501ae-22d0-47dd-b4b6-2fb17717e1f8/spark15/python/lib/pyspark.zip/pyspark/serializers.py",
line 544, in read_int
raise EOFError
EOFError
at
org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:138)
at
org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:179)
at
org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:97)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException:
java.net.UnknownHostException: nameservice1
at
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
at
org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
at
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
at
org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
at
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
at
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
at
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
at
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
at
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
at
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:157)
at
org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at
org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:249)
at
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
at
org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
Caused by: java.net.UnknownHostException: nameservice1
... 32 more
15/09/14 13:48:30 ERROR PythonRDD: This may have been caused by a
prior exception:
java.lang.IllegalArgumentException: java.net.UnknownHostException:
nameservice1
at
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
at
org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
at
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
at
org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
at
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
at
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
at
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
at
org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
at
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
at
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$2.apply(HadoopRDD.scala:157)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:157)
at
org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at
org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:249)
at
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
at
org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
Caused by: java.net.UnknownHostException: nameservice1
... 32 more
Working:
15/09/14 13:47:17 DEBUG HadoopRDD: Cloning Hadoop Configuration
15/09/14 13:47:17 DEBUG : address: ip-10-1-200-245/10.1.200.245 isLoopbackAddress: false, with host 10.1.200.245 ip-10-1-200-245
15/09/14 13:47:17 DEBUG BlockReaderLocal:
dfs.client.use.legacy.blockreader.local = false
15/09/14 13:47:17 DEBUG BlockReaderLocal:
dfs.client.read.shortcircuit = false
15/09/14 13:47:17 DEBUG BlockReaderLocal:
dfs.client.domain.socket.data.traffic = false
15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.domain.socket.path =
/var/run/hdfs-sockets/dn
15/09/14 13:47:17 DEBUG HAUtil: No HA service delegation token
found for logical URI hdfs://nameservice1
15/09/14 13:47:17 DEBUG BlockReaderLocal:
dfs.client.use.legacy.blockreader.local = false
15/09/14 13:47:17 DEBUG BlockReaderLocal:
dfs.client.read.shortcircuit = false
15/09/14 13:47:17 DEBUG BlockReaderLocal:
dfs.client.domain.socket.data.traffic = false
15/09/14 13:47:17 DEBUG BlockReaderLocal: dfs.domain.socket.path =
/var/run/hdfs-sockets/dn
15/09/14 13:47:17 DEBUG RetryUtils: multipleLinearRandomRetry = null
15/09/14 13:47:17 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
rpcRequestWrapperClass=class
org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@30b68416
15/09/14 13:47:17 DEBUG Client: getting client out of cache:
org.apache.hadoop.ipc.Client@4599b420
15/09/14 13:47:18 DEBUG NativeCodeLoader: Trying to load the
custom-built native-hadoop library...
15/09/14 13:47:18 DEBUG NativeCodeLoader: Loaded the native-hadoop
library
15/09/14 13:47:18 DEBUG DomainSocketWatcher:
org.apache.hadoop.net.unix.DomainSocketWatcher$2@4ed189cf:
starting with interruptCheckPeriodMs = 60000
15/09/14 13:47:18 DEBUG PerformanceAdvisory: Both short-circuit
local reads and UNIX domain socket are disabled.
15/09/14 13:47:18 DEBUG DataTransferSaslUtil: DataTransferProtocol
not using SaslPropertiesResolver, no QOP found in configuration
for dfs.data.transfer.protection
15/09/14 13:47:18 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/09/14 13:47:18 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/09/14 13:47:18 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/09/14 13:47:18 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/09/14 13:47:18 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/09/14 13:47:18 DEBUG Client: The ping interval is 60000 ms.
15/09/14 13:47:18 DEBUG Client: Connecting to mesos-1.example.com/10.1.200.165:8020
15/09/14 13:47:18 DEBUG Client: IPC Client (2055067800) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu: starting, having connections 1
15/09/14 13:47:18 DEBUG Client: IPC Client (2055067800) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #0
15/09/14 13:47:18 DEBUG Client: IPC Client (2055067800) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #0
15/09/14 13:47:18 DEBUG ProtobufRpcEngine: Call: getBlockLocations
took 28ms
15/09/14 13:47:18 DEBUG DFSClient: newInfo = LocatedBlocks{
--
*Adrian Bridgett* | Sysadmin Engineer, OpenSignal <http://www.opensignal.com>