majnemer added inline comments.

================
Comment at: llvm/include/llvm/ADT/APFloat.h:190
+    // greater throughput than single precision (32-bit) formats.
+    S_FloatTF32,
 
----------------
Hmm, this says improved precision over half, but the semantics you gave say 11 
bits? Does NVIDIA document how many bits we should expect?
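For comparison, a minimal sketch of how this could be checked against APFloat's
semantics queries. It assumes the patch exposes a `FloatTF32()` accessor (the
name is inferred from the `S_FloatTF32` enumerator and may differ in the final
API):

```cpp
#include "llvm/ADT/APFloat.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main() {
  // semanticsPrecision() counts the implicit integer bit, so IEEE half
  // reports 11 bits -- the same as the proposed tf32 semantics.
  outs() << "half:   " << APFloat::semanticsPrecision(APFloat::IEEEhalf()) << "\n";
  outs() << "tf32:   " << APFloat::semanticsPrecision(APFloat::FloatTF32()) << "\n";
  outs() << "single: " << APFloat::semanticsPrecision(APFloat::IEEEsingle()) << "\n";
  return 0;
}
```

If that prints 11 for both half and tf32, the "improved precision over half"
wording in the comment looks off.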


================
Comment at: llvm/lib/Support/APFloat.cpp:141
     4, -10, 4, 8, fltNonfiniteBehavior::NanOnly, fltNanEncoding::NegativeZero};
+static constexpr fltSemantics semFloatTF32 = {127, -126, 11, 19};
 static constexpr fltSemantics semX87DoubleExtended = {16383, -16382, 64, 80};
----------------
NVIDIA's 
[docs](https://docs.nvidia.com/cuda/parallel-thread-execution/#alternate-floating-point-data-formats)
 say:
> This data format is a special 32-bit floating point format supported by the 
> matrix multiply-and-accumulate instructions, with the same range as .f32 and 
> reduced precision (>=10 bits). The internal layout of tf32 format is 
> implementation defined. PTX facilitates conversion from single precision .f32 
> type to tf32 format. A register variable containing tf32 data must be 
> declared with .b32 type.

As written, it's at least 11 bits, but that can change over time. Will we need 
corresponding flavors of this for future architectures?
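To illustrate the `.f32` -> tf32 conversion the PTX doc mentions, here is a
hedged sketch against the proposed `{127, -126, 11, 19}` semantics; again,
`FloatTF32()` is an assumed accessor name based on the enumerator in this patch:

```cpp
#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/SmallString.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main() {
  // 1 + 2^-11 needs 12 significand bits: exact in .f32, not in an
  // 11-bit-precision format.
  APFloat F(APFloat::IEEEsingle(), "1.00048828125");
  bool LosesInfo = false;
  APFloat::opStatus St =
      F.convert(APFloat::FloatTF32(), APFloat::rmNearestTiesToEven, &LosesInfo);
  SmallString<16> Str;
  F.toString(Str);
  // Ties-to-even rounds the halfway case down to 1 and reports the loss.
  outs() << Str << " losesInfo=" << (LosesInfo ? "true" : "false")
         << " inexact=" << (St == APFloat::opInexact ? "true" : "false") << "\n";
  return 0;
}
```

If a future architecture widens the tf32 significand, the observable rounding
in a snippet like this would change, which is what makes the "flavors per
architecture" question relevant.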


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151923/new/

https://reviews.llvm.org/D151923
