leezu commented on issue #17596: Fix transformer.cu interleaved matmul for cuda arch < 5
URL: https://github.com/apache/incubator-mxnet/pull/17596#issuecomment-586539281

Verified this patch by finetuning BERT on a P2 instance. Verification was initially blocked / delayed by https://github.com/apache/incubator-mxnet/pull/17576 ...
```
% python finetune_classifier.py --task_name RTE --batch_size 32 --epochs 3 --gpu 0 --lr 2e-5
INFO:root:01:21:10 Namespace(accumulate=None, batch_size=32, bert_dataset='book_corpus_wiki_en_uncased', bert_model='bert_12_768_12', calib_mode='customize', deploy=False, dev_batch_size=8, dtype='float32', early_stop=None, epochs=3, epsilon=1e-06, gpu=0, log_interval=10, lr=2e-05, max_len=128, model_parameters=None, model_prefix=None, num_calib_batches=5, only_calibration=False, only_inference=False, optimizer='bertadam', output_dir='./output_dir', pretrained_bert_parameters=None, quantized_dtype='auto', round_to=None, seed=2, task_name='RTE', training_steps=None, warmup_ratio=0.1)
[01:21:12] ../src/base.cc:84: Upgrade advisory: this mxnet has been built against cuDNN lib version 7501, which is older than the oldest version tested by CI (7600). Set MXNET_CUDNN_LIB_CHECKING=0 to quiet this warning.
INFO:root:01:21:26 processing dataset...
INFO:root:01:21:35 Now we are doing BERT classification training on gpu(0)!
INFO:root:01:21:35 training steps=233
INFO:root:01:21:45 [Epoch 1 Batch 10/82] loss=0.7479, lr=0.0000078, metrics:accuracy:0.5507
INFO:root:01:21:54 [Epoch 1 Batch 20/82] loss=0.7263, lr=0.0000165, metrics:accuracy:0.5235
INFO:root:01:22:02 [Epoch 1 Batch 30/82] loss=0.6821, lr=0.0000194, metrics:accuracy:0.5306
INFO:root:01:22:12 [Epoch 1 Batch 40/82] loss=0.6718, lr=0.0000185, metrics:accuracy:0.5370
INFO:root:01:22:21 [Epoch 1 Batch 50/82] loss=0.6743, lr=0.0000175, metrics:accuracy:0.5518
INFO:root:01:22:31 [Epoch 1 Batch 60/82] loss=0.6894, lr=0.0000166, metrics:accuracy:0.5551
INFO:root:01:22:39 [Epoch 1 Batch 70/82] loss=0.6872, lr=0.0000156, metrics:accuracy:0.5587
INFO:root:01:22:48 [Epoch 1 Batch 80/82] loss=0.6626, lr=0.0000147, metrics:accuracy:0.5693
INFO:root:01:22:50 Now we are doing evaluation on dev with gpu(0).
INFO:root:01:22:51 [Batch 10/35] loss=0.6449, metrics:accuracy:0.6750
INFO:root:01:22:52 [Batch 20/35] loss=0.6266, metrics:accuracy:0.6813
INFO:root:01:22:54 [Batch 30/35] loss=0.6930, metrics:accuracy:0.6625
INFO:root:01:22:54 validation metrics:accuracy:0.6715
INFO:root:01:22:54 Time cost=4.00s, throughput=69.97 samples/s
INFO:root:01:22:55 params saved in: ./output_dir/model_bert_RTE_0.params
INFO:root:01:22:55 Time cost=79.30s
INFO:root:01:23:03 [Epoch 2 Batch 10/82] loss=0.5310, lr=0.0000135, metrics:accuracy:0.7719
INFO:root:01:23:12 [Epoch 2 Batch 20/82] loss=0.5022, lr=0.0000126, metrics:accuracy:0.7650
INFO:root:01:23:22 [Epoch 2 Batch 30/82] loss=0.4835, lr=0.0000116, metrics:accuracy:0.7733
INFO:root:01:23:31 [Epoch 2 Batch 40/82] loss=0.4762, lr=0.0000107, metrics:accuracy:0.7754
INFO:root:01:23:40 [Epoch 2 Batch 50/82] loss=0.4412, lr=0.0000097, metrics:accuracy:0.7728
INFO:root:01:23:48 [Epoch 2 Batch 60/82] loss=0.4915, lr=0.0000088, metrics:accuracy:0.7741
INFO:root:01:23:57 [Epoch 2 Batch 70/82] loss=0.4512, lr=0.0000078, metrics:accuracy:0.7767
INFO:root:01:24:05 [Epoch 2 Batch 80/82] loss=0.3897, lr=0.0000069, metrics:accuracy:0.7832
INFO:root:01:24:06 Now we are doing evaluation on dev with gpu(0).
INFO:root:01:24:08 [Batch 10/35] loss=0.6482, metrics:accuracy:0.7125
INFO:root:01:24:09 [Batch 20/35] loss=0.6311, metrics:accuracy:0.7125
INFO:root:01:24:10 [Batch 30/35] loss=0.7034, metrics:accuracy:0.7042
INFO:root:01:24:10 validation metrics:accuracy:0.7076
INFO:root:01:24:10 Time cost=4.00s, throughput=70.06 samples/s
INFO:root:01:24:11 params saved in: ./output_dir/model_bert_RTE_1.params
INFO:root:01:24:11 Time cost=76.11s
INFO:root:01:24:21 [Epoch 3 Batch 10/82] loss=0.2911, lr=0.0000057, metrics:accuracy:0.9125
INFO:root:01:24:30 [Epoch 3 Batch 20/82] loss=0.2762, lr=0.0000048, metrics:accuracy:0.9092
INFO:root:01:24:39 [Epoch 3 Batch 30/82] loss=0.2438, lr=0.0000038, metrics:accuracy:0.9121
INFO:root:01:24:47 [Epoch 3 Batch 40/82] loss=0.2719, lr=0.0000029, metrics:accuracy:0.9077
INFO:root:01:24:56 [Epoch 3 Batch 50/82] loss=0.2787, lr=0.0000019, metrics:accuracy:0.9054
INFO:root:01:25:05 [Epoch 3 Batch 60/82] loss=0.3279, lr=0.0000010, metrics:accuracy:0.9049
INFO:root:01:25:12 Finish training step: 233
INFO:root:01:25:12 Now we are doing evaluation on dev with gpu(0).
INFO:root:01:25:14 [Batch 10/35] loss=0.7463, metrics:accuracy:0.7125
INFO:root:01:25:15 [Batch 20/35] loss=0.6660, metrics:accuracy:0.7250
INFO:root:01:25:16 [Batch 30/35] loss=0.7802, metrics:accuracy:0.7125
INFO:root:01:25:16 validation metrics:accuracy:0.7112
INFO:root:01:25:16 Time cost=3.97s, throughput=70.60 samples/s
INFO:root:01:25:17 params saved in: ./output_dir/model_bert_RTE_2.params
INFO:root:01:25:17 Time cost=65.91s
INFO:root:01:25:17 Best model at epoch 2. Validation metrics:accuracy:0.7112
INFO:root:01:25:17 Now we are doing testing on test with gpu(0).
INFO:root:01:25:54 Time cost=36.38s, throughput=82.47 samples/s
```
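As a side note on reading the log: the `lr` column appears to follow a linear warmup followed by linear decay, which is the usual schedule for the `bertadam` optimizer with `warmup_ratio=0.1` over the 233 reported training steps. The sketch below is a reconstruction from the logged values, not code taken from this PR; the function name and the assumption that warmup steps are computed as `int(warmup_ratio * training_steps)` with 0-indexed steps are mine.

```python
# Reconstructed linear warmup + linear decay schedule (assumption: this is
# what produced the lr column above; not copied from the PR or the script).
def bert_finetune_lr(step, base_lr=2e-5, num_train_steps=233, warmup_ratio=0.1):
    """Learning rate at a given 0-indexed optimizer step."""
    num_warmup_steps = int(num_train_steps * warmup_ratio)  # 23 for this run
    if step < num_warmup_steps:
        return base_lr * step / num_warmup_steps            # linear warmup
    offset = (step - num_warmup_steps) / (num_train_steps - num_warmup_steps)
    return base_lr * (1 - offset)                           # linear decay to 0

# Reproduce a few logged values (global step = cumulative batches - 1):
for step in (9, 29, 91):
    print(f"step {step}: lr={bert_finetune_lr(step):.7f}")
# step 9  -> 0.0000078  (Epoch 1 Batch 10)
# step 29 -> 0.0000194  (Epoch 1 Batch 30)
# step 91 -> 0.0000135  (Epoch 2 Batch 10)
```

Under these assumptions the reconstruction matches the warmup peak near batch 30 of epoch 1 and the subsequent decay to lr=0.0000010 by batch 60 of epoch 3.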
----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services