Hi David,

We run a good mix of NVMf and RDMA storage tiers with different storage classes, e.g. 3 storage classes where one group of NVMf datanodes is class 0, another group of NVMf servers is class 1, and the RDMA datanodes are storage class 2. So your setup should work. I understand that it can be a bit tricky to get right in the beginning.

From your logs I see that you do not use the same configuration file for all containers. The configuration files have to be identical everywhere; in particular, the order of the storage types must be the same in every file. To assign a datanode to a storage class, append "-c 1" (for storage class 1) when starting the datanode. You can find the details of how exactly this works here: https://incubator-crail.readthedocs.io/en/latest/run.html
The last example in "Starting Crail manually" covers this.
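To make this concrete, here is a sketch of what such a 3-class setup could look like (the classnames are the ones from your logs; the exact flag syntax, including the "--" separator before tier-specific arguments, is as described on the run.html page above):

```shell
# Every container (namenode, datanodes, clients) must use an identical
# $CRAIL_HOME/conf/crail-site.conf. In particular the order of the entries
# in crail.storage.types must match everywhere, e.g.:
#
#   crail.storage.types    org.apache.crail.storage.nvmf.NvmfStorageTier,org.apache.crail.storage.rdma.RdmaStorageTier
#   crail.storage.classes  3

# First group of NVMf datanodes in storage class 0 (the default class):
$CRAIL_HOME/bin/crail datanode -t org.apache.crail.storage.nvmf.NvmfStorageTier

# Second group of NVMf datanodes in storage class 1 -- tier-specific
# arguments such as -c go after the "--" separator:
$CRAIL_HOME/bin/crail datanode -t org.apache.crail.storage.nvmf.NvmfStorageTier -- -c 1

# RDMA datanodes in storage class 2:
$CRAIL_HOME/bin/crail datanode -t org.apache.crail.storage.rdma.RdmaStorageTier -- -c 2
```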

Regarding the patched version, I have to take another look. Please use the Apache Crail master for now (it will hang with Spark at the end of your job, but the job itself should run through).

Regards,
Jonas

 On Tue, 2 Jul 2019 00:27:33 +0000
 David Crespi <[email protected]> wrote:
Jonas,

Just wanted to be sure I’m doing things correctly. It runs okay without adding in the NVMf datanode (i.e. it completes teragen). When I add the NVMf node in, even without using it on the run, it hangs during the terasort, with nothing being written to the datanode – only the metadata is created (i.e. /spark).


My config is:

1 namenode container
1 RDMA datanode (storage class 1) container
1 NVMf datanode (storage class 1) container

The namenode is showing that both datanodes are starting up as type 0 in storage class 0… is that correct?


NameNode log at startup:

19/07/01 17:18:16 INFO crail: initalizing namenode
19/07/01 17:18:16 INFO crail: crail.version 3101
19/07/01 17:18:16 INFO crail: crail.directorydepth 16
19/07/01 17:18:16 INFO crail: crail.tokenexpiration 10
19/07/01 17:18:16 INFO crail: crail.blocksize 1048576
19/07/01 17:18:16 INFO crail: crail.cachelimit 0
19/07/01 17:18:16 INFO crail: crail.cachepath /dev/hugepages/cache
19/07/01 17:18:16 INFO crail: crail.user crail
19/07/01 17:18:16 INFO crail: crail.shadowreplication 1
19/07/01 17:18:16 INFO crail: crail.debug true
19/07/01 17:18:16 INFO crail: crail.statistics false
19/07/01 17:18:16 INFO crail: crail.rpctimeout 1000
19/07/01 17:18:16 INFO crail: crail.datatimeout 1000
19/07/01 17:18:16 INFO crail: crail.buffersize 1048576
19/07/01 17:18:16 INFO crail: crail.slicesize 65536
19/07/01 17:18:16 INFO crail: crail.singleton true
19/07/01 17:18:16 INFO crail: crail.regionsize 1073741824
19/07/01 17:18:16 INFO crail: crail.directoryrecord 512
19/07/01 17:18:16 INFO crail: crail.directoryrandomize true
19/07/01 17:18:16 INFO crail: crail.cacheimpl org.apache.crail.memory.MappedBufferCache
19/07/01 17:18:16 INFO crail: crail.locationmap
19/07/01 17:18:16 INFO crail: crail.namenode.address crail://minnie:9060?id=0&size=1
19/07/01 17:18:16 INFO crail: crail.namenode.blockselection roundrobin
19/07/01 17:18:16 INFO crail: crail.namenode.fileblocks 16
19/07/01 17:18:16 INFO crail: crail.namenode.rpctype org.apache.crail.namenode.rpc.tcp.TcpNameNode
19/07/01 17:18:16 INFO crail: crail.namenode.log
19/07/01 17:18:16 INFO crail: crail.storage.types org.apache.crail.storage.nvmf.NvmfStorageTier,org.apache.crail.storage.rdma.RdmaStorageTier
19/07/01 17:18:16 INFO crail: crail.storage.classes 2
19/07/01 17:18:16 INFO crail: crail.storage.rootclass 1
19/07/01 17:18:16 INFO crail: crail.storage.keepalive 2
19/07/01 17:18:16 INFO crail: round robin block selection
19/07/01 17:18:16 INFO crail: round robin block selection
19/07/01 17:18:16 INFO narpc: new NaRPC server group v1.0, queueDepth 32, messageSize 512, nodealy true, cores 2
19/07/01 17:18:16 INFO crail: crail.namenode.tcp.queueDepth 32
19/07/01 17:18:16 INFO crail: crail.namenode.tcp.messageSize 512
19/07/01 17:18:16 INFO crail: crail.namenode.tcp.cores 2
19/07/01 17:18:17 INFO crail: new connection from /192.168.1.164:39260
19/07/01 17:18:17 INFO narpc: adding new channel to selector, from /192.168.1.164:39260
19/07/01 17:18:17 INFO crail: adding datanode /192.168.3.100:4420 of type 0 to storage class 0
19/07/01 17:18:17 INFO crail: new connection from /192.168.1.164:39262
19/07/01 17:18:17 INFO narpc: adding new channel to selector, from /192.168.1.164:39262
19/07/01 17:18:18 INFO crail: adding datanode /192.168.3.100:50020 of type 0 to storage class 0


The RDMA datanode – it is set to have 4x1GB hugepages:

19/07/01 17:18:17 INFO crail: crail.version 3101
19/07/01 17:18:17 INFO crail: crail.directorydepth 16
19/07/01 17:18:17 INFO crail: crail.tokenexpiration 10
19/07/01 17:18:17 INFO crail: crail.blocksize 1048576
19/07/01 17:18:17 INFO crail: crail.cachelimit 0
19/07/01 17:18:17 INFO crail: crail.cachepath /dev/hugepages/cache
19/07/01 17:18:17 INFO crail: crail.user crail
19/07/01 17:18:17 INFO crail: crail.shadowreplication 1
19/07/01 17:18:17 INFO crail: crail.debug true
19/07/01 17:18:17 INFO crail: crail.statistics false
19/07/01 17:18:17 INFO crail: crail.rpctimeout 1000
19/07/01 17:18:17 INFO crail: crail.datatimeout 1000
19/07/01 17:18:17 INFO crail: crail.buffersize 1048576
19/07/01 17:18:17 INFO crail: crail.slicesize 65536
19/07/01 17:18:17 INFO crail: crail.singleton true
19/07/01 17:18:17 INFO crail: crail.regionsize 1073741824
19/07/01 17:18:17 INFO crail: crail.directoryrecord 512
19/07/01 17:18:17 INFO crail: crail.directoryrandomize true
19/07/01 17:18:17 INFO crail: crail.cacheimpl org.apache.crail.memory.MappedBufferCache
19/07/01 17:18:17 INFO crail: crail.locationmap
19/07/01 17:18:17 INFO crail: crail.namenode.address crail://minnie:9060
19/07/01 17:18:17 INFO crail: crail.namenode.blockselection roundrobin
19/07/01 17:18:17 INFO crail: crail.namenode.fileblocks 16
19/07/01 17:18:17 INFO crail: crail.namenode.rpctype org.apache.crail.namenode.rpc.tcp.TcpNameNode
19/07/01 17:18:17 INFO crail: crail.namenode.log
19/07/01 17:18:17 INFO crail: crail.storage.types org.apache.crail.storage.rdma.RdmaStorageTier
19/07/01 17:18:17 INFO crail: crail.storage.classes 1
19/07/01 17:18:17 INFO crail: crail.storage.rootclass 1
19/07/01 17:18:17 INFO crail: crail.storage.keepalive 2
19/07/01 17:18:17 INFO disni: creating  RdmaProvider of type 'nat'
19/07/01 17:18:17 INFO disni: jverbs jni version 32
19/07/01 17:18:17 INFO disni: sock_addr_in size mismatch, jverbs size 28, native size 16
19/07/01 17:18:17 INFO disni: IbvRecvWR size match, jverbs size 32, native size 32
19/07/01 17:18:17 INFO disni: IbvSendWR size mismatch, jverbs size 72, native size 128
19/07/01 17:18:17 INFO disni: IbvWC size match, jverbs size 48, native size 48
19/07/01 17:18:17 INFO disni: IbvSge size match, jverbs size 16, native size 16
19/07/01 17:18:17 INFO disni: Remote addr offset match, jverbs size 40, native size 40
19/07/01 17:18:17 INFO disni: Rkey offset match, jverbs size 48, native size 48
19/07/01 17:18:17 INFO disni: createEventChannel, objId 140349068383088
19/07/01 17:18:17 INFO disni: passive endpoint group, maxWR 32, maxSge 4, cqSize 3200
19/07/01 17:18:17 INFO disni: createId, id 140349068429968
19/07/01 17:18:17 INFO disni: new server endpoint, id 0
19/07/01 17:18:17 INFO disni: launching cm processor, cmChannel 0
19/07/01 17:18:17 INFO disni: bindAddr, address /192.168.3.100:50020
19/07/01 17:18:17 INFO disni: listen, id 0
19/07/01 17:18:17 INFO disni: allocPd, objId 140349068679808
19/07/01 17:18:17 INFO disni: setting up protection domain, context 100, pd 1
19/07/01 17:18:17 INFO disni: PD value 1
19/07/01 17:18:17 INFO crail: crail.storage.rdma.interface enp94s0f1
19/07/01 17:18:17 INFO crail: crail.storage.rdma.port 50020
19/07/01 17:18:17 INFO crail: crail.storage.rdma.storagelimit 4294967296
19/07/01 17:18:17 INFO crail: crail.storage.rdma.allocationsize 1073741824
19/07/01 17:18:17 INFO crail: crail.storage.rdma.datapath /dev/hugepages/rdma
19/07/01 17:18:17 INFO crail: crail.storage.rdma.localmap true
19/07/01 17:18:17 INFO crail: crail.storage.rdma.queuesize 32
19/07/01 17:18:17 INFO crail: crail.storage.rdma.type passive
19/07/01 17:18:17 INFO crail: crail.storage.rdma.backlog 100
19/07/01 17:18:17 INFO crail: crail.storage.rdma.connecttimeout 1000
19/07/01 17:18:17 INFO narpc: new NaRPC server group v1.0, queueDepth 32, messageSize 512, nodealy true
19/07/01 17:18:17 INFO crail: crail.namenode.tcp.queueDepth 32
19/07/01 17:18:17 INFO crail: crail.namenode.tcp.messageSize 512
19/07/01 17:18:17 INFO crail: crail.namenode.tcp.cores 2
19/07/01 17:18:17 INFO crail: rdma storage server started, address /192.168.3.100:50020, persistent false, maxWR 32, maxSge 4, cqSize 3200
19/07/01 17:18:17 INFO disni: starting accept
19/07/01 17:18:18 INFO crail: connected to namenode(s) minnie/192.168.1.164:9060
19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 1024
19/07/01 17:18:18 INFO crail: datanode statistics, freeBlocks 2048
19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 3072
19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096
19/07/01 17:18:19 INFO crail: datanode statistics, freeBlocks 4096


NVMf datanode is showing 1TB.

19/07/01 17:23:57 INFO crail: datanode statistics, freeBlocks 1048576


Regards,


          David


________________________________
From: David Crespi <[email protected]>
Sent: Monday, July 1, 2019 3:57:42 PM
To: Jonas Pfefferle; [email protected]
Subject: RE: Setting up storage class 1 and 2

A standard pull from the repo, one that didn’t have the patches from your private repo. I can put the patches back in both the client and server containers if you really think it would make a difference.

Are you guys running multiple types together? I’m running an RDMA storage class 1, an NVMf storage class 1 and an NVMf storage class 2 together. I get errors when the RDMA is introduced into the mix. I have a small amount of memory (4GB) assigned to the RDMA tier, and I’m looking for writes to fall through into the NVMf class 1 tier. It appears to want to do that, but gets screwed up… it looks like it’s trying to create another set of QPs for an RDMA connection. It even blew up SPDK trying to accomplish that.

Do you guys have some documentation that shows what’s been tested (mixes/variations) so far?


Regards,


          David


________________________________
From: Jonas Pfefferle <[email protected]>
Sent: Monday, July 1, 2019 12:51:09 AM
To: [email protected]; David Crespi
Subject: Re: Setting up storage class 1 and 2

Hi David,


Can you clarify which unpatched version you are talking about? Are you talking about the NVMf thread fix, where I sent you a link to a branch in my repository, or the fix we provided earlier for the Spark hang in the Crail master?

Generally, if you update, update everything: clients as well as the datanodes and namenode.

Regards,
Jonas

 On Fri, 28 Jun 2019 17:59:32 +0000
 David Crespi <[email protected]> wrote:
Jonas,
FYI - I went back to using the unpatched version of Crail on the clients, and it appears to work okay now with the shuffle and RDMA, with only the RDMA containers running on the server.

Regards,

          David


________________________________
From: David Crespi
Sent: Friday, June 28, 2019 7:49:51 AM
To: Jonas Pfefferle; [email protected]
Subject: RE: Setting up storage class 1 and 2


Oh, and while I’m thinking about it Jonas: when I added the patches you provided the other day, I only added them to the Spark containers (clients), not to my Crail containers running on my storage server. Should the patches have been added to all of the containers?


Regards,


          David


________________________________
From: Jonas Pfefferle <[email protected]>
Sent: Friday, June 28, 2019 12:54:27 AM
To: [email protected]; David Crespi
Subject: Re: Setting up storage class 1 and 2

Hi David,


At the moment, it is possible to add a NVMf datanode even if only the RDMA storage type is specified in the config. As you have seen, this will go wrong as soon as a client tries to connect to the datanode. Make sure to start the RDMA datanode with the appropriate classname, see:
https://incubator-crail.readthedocs.io/en/latest/run.html
The correct classname is org.apache.crail.storage.rdma.RdmaStorageTier.
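For reference, starting the datanode with that tier explicitly selected looks roughly like this (a sketch; the exact invocation is the one documented on the run.html page above):

```shell
# Explicitly select the RDMA storage tier via -t, so the datanode
# registers with the namenode as an RDMA node rather than whatever
# tier the launcher would otherwise default to:
$CRAIL_HOME/bin/crail datanode -t org.apache.crail.storage.rdma.RdmaStorageTier
```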

Regards,
Jonas

 On Thu, 27 Jun 2019 23:09:26 +0000
 David Crespi <[email protected]> wrote:
Hi,
I’m trying to integrate the storage classes and I’m hitting another
issue when running terasort and just
using the crail-shuffle with HDFS as the tmp storage.  The program
just sits, after the following
message:
19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
to NameNode-1/192.168.3.7:54310 from hduser: closed
19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
connections 0

During this run, I’ve removed the two crail nvmf (class 1 and 2)
containers from the server, and I’m only running
the namenode and a rdma storage class 1 datanode.  My spark
configuration is also now only looking at
the rdma class.  It looks as though it’s picking up the NVMf IP and
port in the INFO messages seen below.
I must be configuring something wrong, but I’ve not been able to
track it down.  Any thoughts?


************************************
        TeraSort
************************************
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/crail/jars/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/crail/jars/jnvmf-1.6-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/crail/jars/disni-2.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/usr/spark-2.4.2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/06/27 15:59:07 WARN NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
19/06/27 15:59:07 INFO SparkContext: Running Spark version 2.4.2
19/06/27 15:59:07 INFO SparkContext: Submitted application: TeraSort
19/06/27 15:59:07 INFO SecurityManager: Changing view acls to:
hduser
19/06/27 15:59:07 INFO SecurityManager: Changing modify acls to:
hduser
19/06/27 15:59:07 INFO SecurityManager: Changing view acls groups
to:
19/06/27 15:59:07 INFO SecurityManager: Changing modify acls groups
to:
19/06/27 15:59:07 INFO SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users  with view
permissions: Set(hduser); groups with view permissions: Set(); users
with modify permissions: Set(hduser); groups with modify
permissions: Set()
19/06/27 15:59:08 DEBUG InternalLoggerFactory: Using SLF4J as the
default logging framework
19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
-Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
-Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
19/06/27 15:59:08 DEBUG MultithreadEventLoopGroup:
-Dio.netty.eventLoopThreads: 112
19/06/27 15:59:08 DEBUG PlatformDependent0: -Dio.netty.noUnsafe:
false
19/06/27 15:59:08 DEBUG PlatformDependent0: Java version: 8
19/06/27 15:59:08 DEBUG PlatformDependent0:
sun.misc.Unsafe.theUnsafe: available
19/06/27 15:59:08 DEBUG PlatformDependent0:
sun.misc.Unsafe.copyMemory: available
19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Buffer.address:
available
19/06/27 15:59:08 DEBUG PlatformDependent0: direct buffer
constructor: available
19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Bits.unaligned:
available, true
19/06/27 15:59:08 DEBUG PlatformDependent0:
jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable
prior to Java9
19/06/27 15:59:08 DEBUG PlatformDependent0:
java.nio.DirectByteBuffer.<init>(long, int): available
19/06/27 15:59:08 DEBUG PlatformDependent: sun.misc.Unsafe:
available
19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.tmpdir: /tmp
(java.io.tmpdir)
19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.bitMode: 64
(sun.arch.data.model)
19/06/27 15:59:08 DEBUG PlatformDependent:
-Dio.netty.noPreferDirect: false
19/06/27 15:59:08 DEBUG PlatformDependent:
-Dio.netty.maxDirectMemory: 1029177344 bytes
19/06/27 15:59:08 DEBUG PlatformDependent:
-Dio.netty.uninitializedArrayAllocationThreshold: -1
19/06/27 15:59:08 DEBUG CleanerJava6: java.nio.ByteBuffer.cleaner():
available
19/06/27 15:59:08 DEBUG NioEventLoop:
-Dio.netty.noKeySetOptimization: false
19/06/27 15:59:08 DEBUG NioEventLoop:
-Dio.netty.selectorAutoRebuildThreshold: 512
19/06/27 15:59:08 DEBUG PlatformDependent:
org.jctools-core.MpscChunkedArrayQueue: available
19/06/27 15:59:08 DEBUG ResourceLeakDetector:
-Dio.netty.leakDetection.level: simple
19/06/27 15:59:08 DEBUG ResourceLeakDetector:
-Dio.netty.leakDetection.targetRecords: 4
19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
-Dio.netty.allocator.numHeapArenas: 9
19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
-Dio.netty.allocator.numDirectArenas: 10
19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
-Dio.netty.allocator.pageSize: 8192
19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
-Dio.netty.allocator.maxOrder: 11
19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
-Dio.netty.allocator.chunkSize: 16777216
19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
-Dio.netty.allocator.tinyCacheSize: 512
19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
-Dio.netty.allocator.smallCacheSize: 256
19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
-Dio.netty.allocator.normalCacheSize: 64
19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
-Dio.netty.allocator.maxCachedBufferCapacity: 32768
19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
-Dio.netty.allocator.cacheTrimInterval: 8192
19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
-Dio.netty.allocator.useCacheForAllThreads: true
19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.processId: 2236
(auto-detected)
19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv4Stack: false
19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv6Addresses:
false
19/06/27 15:59:08 DEBUG NetUtil: Loopback interface: lo (lo,
127.0.0.1)
19/06/27 15:59:08 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.machineId:
02:42:ac:ff:fe:1b:00:02 (auto-detected)
19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.allocator.type:
pooled
19/06/27 15:59:08 DEBUG ByteBufUtil:
-Dio.netty.threadLocalDirectBufferSize: 65536
19/06/27 15:59:08 DEBUG ByteBufUtil:
-Dio.netty.maxThreadLocalCharBufferSize: 16384
19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
port: 36915
19/06/27 15:59:08 INFO Utils: Successfully started service
'sparkDriver' on port 36915.
19/06/27 15:59:08 DEBUG SparkEnv: Using serializer: class
org.apache.spark.serializer.KryoSerializer
19/06/27 15:59:08 INFO SparkEnv: Registering MapOutputTracker
19/06/27 15:59:08 DEBUG MapOutputTrackerMasterEndpoint: init
19/06/27 15:59:08 INFO CrailShuffleManager: crail shuffle started
19/06/27 15:59:08 INFO SparkEnv: Registering BlockManagerMaster
19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology
information
19/06/27 15:59:08 INFO BlockManagerMasterEndpoint:
BlockManagerMasterEndpoint up
19/06/27 15:59:08 INFO DiskBlockManager: Created local directory at
/tmp/blockmgr-15237510-f459-40e3-8390-10f4742930a5
19/06/27 15:59:08 DEBUG DiskBlockManager: Adding shutdown hook
19/06/27 15:59:08 INFO MemoryStore: MemoryStore started with
capacity 366.3 MB
19/06/27 15:59:08 INFO SparkEnv: Registering OutputCommitCoordinator
19/06/27 15:59:08 DEBUG
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: init
19/06/27 15:59:08 DEBUG SecurityManager: Created SSL options for ui:
SSLOptions{enabled=false, port=None, keyStore=None,
keyStorePassword=None, trustStore=None, trustStorePassword=None,
protocol=None, enabledAlgorithms=Set()}
19/06/27 15:59:08 INFO Utils: Successfully started service 'SparkUI'
on port 4040.
19/06/27 15:59:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and
started at http://192.168.1.161:4040
19/06/27 15:59:08 INFO SparkContext: Added JAR
file:/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
at
spark://master:36915/jars/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
with timestamp 1561676348562
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint:
Connecting to master spark://master:7077...
19/06/27 15:59:08 DEBUG TransportClientFactory: Creating new
connection to master/192.168.3.13:7077
19/06/27 15:59:08 DEBUG AbstractByteBuf:
-Dio.netty.buffer.bytebuf.checkAccessible: true
19/06/27 15:59:08 DEBUG ResourceLeakDetectorFactory: Loaded default
ResourceLeakDetector: io.netty.util.ResourceLeakDetector@5b1bb5d2
19/06/27 15:59:08 DEBUG TransportClientFactory: Connection to
master/192.168.3.13:7077 successful, running bootstraps...
19/06/27 15:59:08 INFO TransportClientFactory: Successfully created
connection to master/192.168.3.13:7077 after 41 ms (0 ms spent in
bootstraps)
19/06/27 15:59:08 DEBUG Recycler:
-Dio.netty.recycler.maxCapacityPerThread: 32768
19/06/27 15:59:08 DEBUG Recycler:
-Dio.netty.recycler.maxSharedCapacityFactor: 2
19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.linkCapacity:
16
19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Connected to
Spark cluster with app ID app-20190627155908-0005
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
added: app-20190627155908-0005/0 on
worker-20190627152154-192.168.3.11-8882 (192.168.3.11:8882) with 2
core(s)
19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
ID app-20190627155908-0005/0 on hostPort 192.168.3.11:8882 with 2
core(s), 1024.0 MB RAM
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
added: app-20190627155908-0005/1 on
worker-20190627152150-192.168.3.12-8881 (192.168.3.12:8881) with 2
core(s)
19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
ID app-20190627155908-0005/1 on hostPort 192.168.3.12:8881 with 2
core(s), 1024.0 MB RAM
19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
port: 39189
19/06/27 15:59:08 INFO Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port
39189.
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
added: app-20190627155908-0005/2 on
worker-20190627152203-192.168.3.9-8884 (192.168.3.9:8884) with 2
core(s)
19/06/27 15:59:08 INFO NettyBlockTransferService: Server created on
master:39189
19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
ID app-20190627155908-0005/2 on hostPort 192.168.3.9:8884 with 2
core(s), 1024.0 MB RAM
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
added: app-20190627155908-0005/3 on
worker-20190627152158-192.168.3.10-8883 (192.168.3.10:8883) with 2
core(s)
19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
ID app-20190627155908-0005/3 on hostPort 192.168.3.10:8883 with 2
core(s), 1024.0 MB RAM
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
added: app-20190627155908-0005/4 on
worker-20190627152207-192.168.3.8-8885 (192.168.3.8:8885) with 2
core(s)
19/06/27 15:59:08 INFO BlockManager: Using
org.apache.spark.storage.RandomBlockReplicationPolicy for block
replication policy
19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
ID app-20190627155908-0005/4 on hostPort 192.168.3.8:8885 with 2
core(s), 1024.0 MB RAM
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
updated: app-20190627155908-0005/0 is now RUNNING
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
updated: app-20190627155908-0005/3 is now RUNNING
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
updated: app-20190627155908-0005/4 is now RUNNING
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
updated: app-20190627155908-0005/1 is now RUNNING
19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
updated: app-20190627155908-0005/2 is now RUNNING
19/06/27 15:59:08 INFO BlockManagerMaster: Registering BlockManager
BlockManagerId(driver, master, 39189, None)
19/06/27 15:59:08 DEBUG DefaultTopologyMapper: Got a request for
master
19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Registering block
manager master:39189 with 366.3 MB RAM, BlockManagerId(driver,
master, 39189, None)
19/06/27 15:59:08 INFO BlockManagerMaster: Registered BlockManager
BlockManagerId(driver, master, 39189, None)
19/06/27 15:59:08 INFO BlockManager: Initialized BlockManager:
BlockManagerId(driver, master, 39189, None)
19/06/27 15:59:09 INFO StandaloneSchedulerBackend: SchedulerBackend
is ready for scheduling beginning after reached
minRegisteredResourcesRatio: 0.0
19/06/27 15:59:09 DEBUG SparkContext: Adding shutdown hook
19/06/27 15:59:09 DEBUG BlockReaderLocal:
dfs.client.use.legacy.blockreader.local = false
19/06/27 15:59:09 DEBUG BlockReaderLocal:
dfs.client.read.shortcircuit = false
19/06/27 15:59:09 DEBUG BlockReaderLocal:
dfs.client.domain.socket.data.traffic = false
19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.domain.socket.path =
19/06/27 15:59:09 DEBUG RetryUtils: multipleLinearRandomRetry = null
19/06/27 15:59:09 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
rpcRequestWrapperClass=class
org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@23f3dbf0
19/06/27 15:59:09 DEBUG Client: getting client out of cache:
org.apache.hadoop.ipc.Client@3ed03652
19/06/27 15:59:09 DEBUG PerformanceAdvisory: Both short-circuit
local reads and UNIX domain socket are disabled.
19/06/27 15:59:09 DEBUG DataTransferSaslUtil: DataTransferProtocol
not using SaslPropertiesResolver, no QOP found in configuration for
dfs.data.transfer.protection
19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0 stored as
values in memory (estimated size 288.9 KB, free 366.0 MB)
19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0 locally
took  115 ms
19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0
without replication took  117 ms
19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0_piece0 stored
as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
19/06/27 15:59:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in
memory on master:39189 (size: 23.8 KB, free: 366.3 MB)
19/06/27 15:59:10 DEBUG BlockManagerMaster: Updated info of block
broadcast_0_piece0
19/06/27 15:59:10 DEBUG BlockManager: Told master about block
broadcast_0_piece0
19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0_piece0
locally took  6 ms
19/06/27 15:59:10 DEBUG BlockManager: Putting block
broadcast_0_piece0 without replication took  6 ms
19/06/27 15:59:10 INFO SparkContext: Created broadcast 0 from
newAPIHadoopFile at TeraSort.scala:60
19/06/27 15:59:10 DEBUG Client: The ping interval is 60000 ms.
19/06/27 15:59:10 DEBUG Client: Connecting to
NameNode-1/192.168.3.7:54310
19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
to NameNode-1/192.168.3.7:54310 from hduser: starting, having
connections 1
19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
to NameNode-1/192.168.3.7:54310 from hduser sending #0
19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
to NameNode-1/192.168.3.7:54310 from hduser got value #0
19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took
31ms
19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
to NameNode-1/192.168.3.7:54310 from hduser sending #1
19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
to NameNode-1/192.168.3.7:54310 from hduser got value #1
19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getListing took 5ms
19/06/27 15:59:10 DEBUG FileInputFormat: Time taken to get
FileStatuses: 134
19/06/27 15:59:10 INFO FileInputFormat: Total input paths to process
: 2
19/06/27 15:59:10 DEBUG FileInputFormat: Total # of splits generated
by getSplits: 2, TimeTaken: 139
19/06/27 15:59:10 DEBUG FileCommitProtocol: Creating committer
org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1;
output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
19/06/27 15:59:10 DEBUG FileCommitProtocol: Using (String, String,
Boolean) constructor
19/06/27 15:59:10 INFO FileOutputCommitter: File Output Committer
Algorithm version is 1
19/06/27 15:59:10 DEBUG DFSClient: /tmp/data_sort/_temporary/0:
masked=rwxr-xr-x
19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
to NameNode-1/192.168.3.7:54310 from hduser sending #2
19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
to NameNode-1/192.168.3.7:54310 from hduser got value #2
19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
19/06/27 15:59:10 DEBUG ClosureCleaner: Cleaning lambda:
$anonfun$write$1
19/06/27 15:59:10 DEBUG ClosureCleaner:  +++ Lambda closure
($anonfun$write$1) is now cleaned +++
19/06/27 15:59:10 INFO SparkContext: Starting job: runJob at
SparkHadoopWriter.scala:78
19/06/27 15:59:10 INFO CrailDispatcher: CrailStore starting version
400
19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteonclose
false
19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteOnStart
true
19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.preallocate 0
19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.writeAhead 0
19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.debug false
19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.serializer
org.apache.spark.serializer.CrailSparkSerializer
19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.affinity
true
19/06/27 15:59:10 INFO CrailDispatcher:
spark.crail.shuffle.outstanding 1
19/06/27 15:59:10 INFO CrailDispatcher:
spark.crail.shuffle.storageclass 0
19/06/27 15:59:10 INFO CrailDispatcher:
spark.crail.broadcast.storageclass 0
19/06/27 15:59:10 INFO crail: creating singleton crail file system
19/06/27 15:59:10 INFO crail: crail.version 3101
19/06/27 15:59:10 INFO crail: crail.directorydepth 16
19/06/27 15:59:10 INFO crail: crail.tokenexpiration 10
19/06/27 15:59:10 INFO crail: crail.blocksize 1048576
19/06/27 15:59:10 INFO crail: crail.cachelimit 0
19/06/27 15:59:10 INFO crail: crail.cachepath /dev/hugepages/cache
19/06/27 15:59:10 INFO crail: crail.user crail
19/06/27 15:59:10 INFO crail: crail.shadowreplication 1
19/06/27 15:59:10 INFO crail: crail.debug true
19/06/27 15:59:10 INFO crail: crail.statistics true
19/06/27 15:59:10 INFO crail: crail.rpctimeout 1000
19/06/27 15:59:10 INFO crail: crail.datatimeout 1000
19/06/27 15:59:10 INFO crail: crail.buffersize 1048576
19/06/27 15:59:10 INFO crail: crail.slicesize 65536
19/06/27 15:59:10 INFO crail: crail.singleton true
19/06/27 15:59:10 INFO crail: crail.regionsize 1073741824
19/06/27 15:59:10 INFO crail: crail.directoryrecord 512
19/06/27 15:59:10 INFO crail: crail.directoryrandomize true
19/06/27 15:59:10 INFO crail: crail.cacheimpl
org.apache.crail.memory.MappedBufferCache
19/06/27 15:59:10 INFO crail: crail.locationmap
19/06/27 15:59:10 INFO crail: crail.namenode.address
crail://192.168.1.164:9060
19/06/27 15:59:10 INFO crail: crail.namenode.blockselection
roundrobin
19/06/27 15:59:10 INFO crail: crail.namenode.fileblocks 16
19/06/27 15:59:10 INFO crail: crail.namenode.rpctype
org.apache.crail.namenode.rpc.tcp.TcpNameNode
19/06/27 15:59:10 INFO crail: crail.namenode.log
19/06/27 15:59:10 INFO crail: crail.storage.types
org.apache.crail.storage.rdma.RdmaStorageTier
19/06/27 15:59:10 INFO crail: crail.storage.classes 1
19/06/27 15:59:10 INFO crail: crail.storage.rootclass 0
19/06/27 15:59:10 INFO crail: crail.storage.keepalive 2
19/06/27 15:59:10 INFO crail: buffer cache, allocationCount 0,
bufferCount 1024
19/06/27 15:59:10 INFO crail: crail.storage.rdma.interface eth0
19/06/27 15:59:10 INFO crail: crail.storage.rdma.port 50020
19/06/27 15:59:10 INFO crail: crail.storage.rdma.storagelimit
4294967296
19/06/27 15:59:10 INFO crail: crail.storage.rdma.allocationsize
1073741824
19/06/27 15:59:10 INFO crail: crail.storage.rdma.datapath
/dev/hugepages/rdma
19/06/27 15:59:10 INFO crail: crail.storage.rdma.localmap true
19/06/27 15:59:10 INFO crail: crail.storage.rdma.queuesize 32
19/06/27 15:59:10 INFO crail: crail.storage.rdma.type passive
19/06/27 15:59:10 INFO crail: crail.storage.rdma.backlog 100
19/06/27 15:59:10 INFO crail: crail.storage.rdma.connecttimeout 1000
19/06/27 15:59:10 INFO narpc: new NaRPC server group v1.0,
queueDepth 32, messageSize 512, nodealy true
19/06/27 15:59:10 INFO crail: crail.namenode.tcp.queueDepth 32
19/06/27 15:59:10 INFO crail: crail.namenode.tcp.messageSize 512
19/06/27 15:59:10 INFO crail: crail.namenode.tcp.cores 1
19/06/27 15:59:10 INFO crail: connected to namenode(s)
/192.168.1.164:9060
19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
19/06/27 15:59:10 INFO crail: lookupDirectory: path /spark
19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
19/06/27 15:59:10 INFO crail: createNode: name /spark, type
DIRECTORY, storageAffinity 0, locationAffinity 0
19/06/27 15:59:10 INFO crail: CoreOutputStream, open, path /, fd 0,
streamId 1, isDir true, writeHint 0
19/06/27 15:59:10 INFO crail: passive data client
19/06/27 15:59:10 INFO disni: creating  RdmaProvider of type 'nat'
19/06/27 15:59:10 INFO disni: jverbs jni version 32
19/06/27 15:59:10 INFO disni: sock_addr_in size mismatch, jverbs
size 28, native size 16
19/06/27 15:59:10 INFO disni: IbvRecvWR size match, jverbs size 32,
native size 32
19/06/27 15:59:10 INFO disni: IbvSendWR size mismatch, jverbs size
72, native size 128
19/06/27 15:59:10 INFO disni: IbvWC size match, jverbs size 48,
native size 48
19/06/27 15:59:10 INFO disni: IbvSge size match, jverbs size 16,
native size 16
19/06/27 15:59:10 INFO disni: Remote addr offset match, jverbs size
40, native size 40
19/06/27 15:59:10 INFO disni: Rkey offset match, jverbs size 48,
native size 48
19/06/27 15:59:10 INFO disni: createEventChannel, objId
139811924587312
19/06/27 15:59:10 INFO disni: passive endpoint group, maxWR 32,
maxSge 4, cqSize 64
19/06/27 15:59:10 INFO disni: launching cm processor, cmChannel 0
19/06/27 15:59:10 INFO disni: createId, id 139811924676432
19/06/27 15:59:10 INFO disni: new client endpoint, id 0, idPriv 0
19/06/27 15:59:10 INFO disni: resolveAddr, addres
/192.168.3.100:4420
19/06/27 15:59:10 INFO disni: resolveRoute, id 0
19/06/27 15:59:10 INFO disni: allocPd, objId 139811924679808
19/06/27 15:59:10 INFO disni: setting up protection domain, context
467, pd 1
19/06/27 15:59:10 INFO disni: setting up cq processor
19/06/27 15:59:10 INFO disni: new endpoint CQ processor
19/06/27 15:59:10 INFO disni: createCompChannel, context
139810647883744
19/06/27 15:59:10 INFO disni: createCQ, objId 139811924680688, ncqe
64
19/06/27 15:59:10 INFO disni: createQP, objId 139811924691192,
send_wr size 32, recv_wr_size 32
19/06/27 15:59:10 INFO disni: connect, id 0
19/06/27 15:59:10 INFO disni: got event type + UNKNOWN, srcAddress
/192.168.3.13:43273, dstAddress /192.168.3.100:4420
19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
Registered executor NettyRpcEndpointRef(spark-client://Executor)
(192.168.3.11:35854) with ID 0
19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
Registered executor NettyRpcEndpointRef(spark-client://Executor)
(192.168.3.12:44312) with ID 1
19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
Registered executor NettyRpcEndpointRef(spark-client://Executor)
(192.168.3.8:34774) with ID 4
19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
Registered executor NettyRpcEndpointRef(spark-client://Executor)
(192.168.3.9:58808) with ID 2
19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
192.168.3.11
19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
manager 192.168.3.11:41919 with 366.3 MB RAM, BlockManagerId(0,
192.168.3.11, 41919, None)
19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
192.168.3.12
19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
manager 192.168.3.12:46697 with 366.3 MB RAM, BlockManagerId(1,
192.168.3.12, 46697, None)
19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
192.168.3.8
19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
manager 192.168.3.8:37281 with 366.3 MB RAM, BlockManagerId(4,
192.168.3.8, 37281, None)
19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
192.168.3.9
19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
manager 192.168.3.9:43857 with 366.3 MB RAM, BlockManagerId(2,
192.168.3.9, 43857, None)
19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
Registered executor NettyRpcEndpointRef(spark-client://Executor)
(192.168.3.10:40100) with ID 3
19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
192.168.3.10
19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
manager 192.168.3.10:38527 with 366.3 MB RAM, BlockManagerId(3,
192.168.3.10, 38527, None)
19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
to NameNode-1/192.168.3.7:54310 from hduser: closed
19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
connections 0
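
Side note on my config: the log above shows crail.storage.types listing only the RDMA tier and crail.storage.classes 1. If I'm reading the run docs right, a two-tier setup would need something roughly like the following, identical on the namenode and *every* datanode container (the NVMf class name and the exact datanode flags are my guess from the docs, not tested):

```
# crail-site.conf -- must be identical on all containers;
# the order of the types list defines the tier indices
crail.storage.types    org.apache.crail.storage.rdma.RdmaStorageTier,org.apache.crail.storage.nvmf.NvmfStorageTier
crail.storage.classes  2
crail.storage.rootclass 0

# then start each datanode with its tier and storage class, e.g.:
#   $CRAIL_HOME/bin/crail datanode -t org.apache.crail.storage.nvmf.NvmfStorageTier -c 1
```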


Regards,

          David