Glad to help!

On Thu, Jun 20, 2019 at 4:26 PM Eric Klein <[email protected]> wrote:
> Thank you both very much for the help with this. It's much appreciated.
> ---
> Eric Klein
> [email protected]
>
> On Thu, Jun 20, 2019 at 3:46 PM Rasmus Munk Larsen <[email protected]> wrote:
>
>> PR is here:
>> https://bitbucket.org/eigen/eigen/pull-requests/659/fix-cuda-build-on-mac/diff
>>
>> On Thu, Jun 20, 2019 at 3:12 PM Eric Klein <[email protected]> wrote:
>>
>>> Thank you both. I appreciate the help with this.
>>> ---
>>> Eric Klein
>>> [email protected]
>>>
>>> On Thu, Jun 20, 2019 at 3:11 PM Artem Belevich <[email protected]> wrote:
>>>
>>>> The changes look reasonable to me. Thank you for helping to sort this out.
>>>>
>>>> On Thu, Jun 20, 2019 at 3:05 PM Eric Klein <[email protected]> wrote:
>>>>
>>>>> Ok. Looks like the warnings are there with and without my hack(s), and the minimal set of edits needed to get Eigen to build on Mac with nvcc consists of:
>>>>>
>>>>> Half.h. Change this:
>>>>> #if !defined(EIGEN_HAS_NATIVE_FP16) || EIGEN_COMP_CLANG // Emulate support for half floats
>>>>> to this:
>>>>> #if !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) // Emulate support for half floats
>>>>>
>>>>> And in PacketMath.h, change this:
>>>>> #if defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIP_DEVICE_COMPILE) || (defined(EIGEN_CUDACC) && EIGEN_COMP_CLANG)
>>>>> to this:
>>>>> #if defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIP_DEVICE_COMPILE) || (defined(EIGEN_CUDACC) && EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC)
>>>>>
>>>>> Obviously that's excluding any enlightening comments about why that's being done.
>>>>>
>>>>> Would you like me to prepare a patch file, or is this something that would be better handled by one of the regular contributors?
>>>>>
>>>>> Thank you!
>>>>> ---
>>>>> Eric Klein
>>>>> [email protected]
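For reference, the guard logic behind those two edits can be sketched as a tiny standalone program. This is not the Eigen source: the COMP_* macros are stand-ins, and the assumption that EIGEN_COMP_NVCC is ultimately derived from __NVCC__ follows from the Macros.h workaround described later in this thread.

    // guard_sketch.cpp (also builds as a .cu file) -- illustrative only, not Eigen code.
    #include <cstdio>

    #if defined(__NVCC__)
    #  define COMP_NVCC 1   // nvcc is the compiler driver
    #else
    #  define COMP_NVCC 0
    #endif

    #if defined(__clang__)
    #  define COMP_CLANG 1  // clang is (pre)processing this TU; on Mac this is true even under nvcc
    #else
    #  define COMP_CLANG 0
    #endif

    int main() {
    // Old Half.h condition: emulate half support whenever clang is seen.
    // New condition: only take the clang-specific path when nvcc is NOT driving,
    // so an nvcc build that merely uses clang as the host compiler keeps the
    // native FP16 operators and avoids the duplicate-definition errors.
    #if COMP_CLANG && !COMP_NVCC
      std::puts("clang-as-CUDA-compiler path: emulated half operators");
    #else
      std::puts("nvcc (or non-clang) path: native half operators when available");
    #endif
      return 0;
    }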
>>>>> On Thu, Jun 20, 2019 at 12:01 AM Eric Klein <[email protected]> wrote:
>>>>>
>>>>>> That appears to work, although there are 2-3 other places that need similar modifications in order to work. I'll try to get you a more complete list tomorrow.
>>>>>>
>>>>>> I'm paying more attention tonight to warnings coming from Eigen than I had been previously, and both with my old Macros.h-based hack and the newer modifications I'm seeing some of these: "warning: calling a __host__ function from a __host__ __device__ function is not allowed". A representative one is:
>>>>>>
>>>>>> external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h(735): warning: calling a __host__ function from a __host__ __device__ function is not allowed
>>>>>>           detected during:
>>>>>>             instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorReductionOp<Op, Dims, ArgType, MakePointer_>, Device>::evalSubExprsIfNeeded(MakePointer_<Eigen::TensorEvaluator<const Eigen::TensorReductionOp<Op, Dims, ArgType, MakePointer_>, Device>::CoeffReturnType>::Type) [with Op=Eigen::internal::AvgPoolMeanReducer<double>, Dims=const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, ArgType=const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, MakePointer_=Eigen::MakePointer, Device=Eigen::GpuDevice]"
>>>>>>             external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorMorphing.h(172): here
>>>>>>             instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorReshapingOp<NewDimensions, ArgType>, Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorReshapingOp<NewDimensions, ArgType>, Device>::CoeffReturnType *) [with NewDimensions=const Eigen::DSizes<Eigen::DenseIndex, 4>, ArgType=const Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::MakePointer>, Device=Eigen::GpuDevice]"
>>>>>>             external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h(146): here
>>>>>>             instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::Scalar *) [with LeftArgType=Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, RightArgType=const Eigen::TensorReshapingOp<const Eigen::DSizes<Eigen::DenseIndex, 4>, const Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::MakePointer>>, Device=Eigen::GpuDevice]"
>>>>>>             external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h(422): here
>>>>>>             instantiation of "void Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, Vectorizable, Tileable>::run(const Expression &, const Eigen::GpuDevice &) [with Expression=const Eigen::TensorAssignOp<Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, const Eigen::TensorReshapingOp<const Eigen::DSizes<Eigen::DenseIndex, 4>, const Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::MakePointer>>>, Vectorizable=false, Tileable=false]"
>>>>>>             external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h(35): here
>>>>>>             instantiation of "Eigen::TensorDevice<ExpressionType, DeviceType> &Eigen::TensorDevice<ExpressionType, DeviceType>::operator=(const OtherDerived &) [with ExpressionType=Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, DeviceType=tensorflow::GPUDevice, OtherDerived=Eigen::TensorReshapingOp<const Eigen::DSizes<Eigen::DenseIndex, 4>, const Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::MakePointer>>]"
>>>>>>             ./tensorflow/core/kernels/avgpooling_op.h(42): here
>>>>>>             instantiation of "void tensorflow::functor::SpatialAvgPooling<Device, T>::operator()(const Device &, tensorflow::TTypes<T, 4, Eigen::DenseIndex>::Tensor, tensorflow::TTypes<T, 4, Eigen::DenseIndex>::ConstTensor, int, int, int, int, const Eigen::PaddingType &) [with Device=tensorflow::GPUDevice, T=double]"
>>>>>>             tensorflow/core/kernels/avgpooling_op_gpu.cu.cc(38): here
>>>>>>
>>>>>> I'm not sure how concerned I should be about these. The build will succeed, but... I wouldn't be at all surprised to get weird results eventually.
>>>>>>
>>>>>> In this particular case, it looks like it's complaining because Eigen::GpuDevice::allocate_temp appears to be __host__ rather than __host__ __device__ (i.e. missing EIGEN_DEVICE_FUNC). I fully admit that I could be misinterpreting that or otherwise misunderstanding something basic.
>>>>>>
>>>>>> Should I be concerned about these?
>>>>>>
>>>>>> Thanks!
>>>>>> ---
>>>>>> Eric Klein
>>>>>> [email protected]
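For readers less familiar with this nvcc diagnostic, a minimal toy .cu file along the following lines (not Eigen code, and unrelated to the real allocate_temp implementation) reproduces the same class of warning:

    // toy_warning.cu -- illustrative only; compile with: nvcc -c toy_warning.cu
    #include <cstdlib>

    // Host-only helper (no __device__ annotation), standing in for something
    // like GpuDevice::allocate_temp as described in the message above.
    void* host_only_allocate(std::size_t n) { return std::malloc(n); }

    // Because this function is __host__ __device__, nvcc also compiles a
    // device-side version of it, and the call below is the kind of thing it
    // flags with "calling a __host__ function from a __host__ __device__
    // function is not allowed".
    __host__ __device__ void* allocate_temp_like(std::size_t n) {
      return host_only_allocate(n);
    }

    int main() {
      void* p = allocate_temp_like(16);  // harmless as long as it is only ever called from host code
      std::free(p);
      return 0;
    }

Annotating the callee as __host__ __device__ as well, which is what EIGEN_DEVICE_FUNC expands to in Eigen's CUDA builds, is the usual way to silence it, provided the function body is actually valid device code.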
>>>>>> On Wed, Jun 19, 2019 at 5:21 PM Rasmus Munk Larsen <[email protected]> wrote:
>>>>>>
>>>>>>> Eric, does Artem's suggestion work for you?
>>>>>>>
>>>>>>> On Wed, Jun 19, 2019 at 2:52 PM Artem Belevich <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Wed, Jun 19, 2019 at 1:47 PM Rasmus Munk Larsen <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> It looks like we broke the Eigen CUDA build on Mac. What do you think about his workaround?
>>>>>>>>>
>>>>>>>>> ---------- Forwarded message ---------
>>>>>>>>> From: Eric Klein <[email protected]>
>>>>>>>>> Date: Wed, Jun 19, 2019 at 1:39 PM
>>>>>>>>> Subject: [eigen] Mac CUDA build failure question
>>>>>>>>> To: <[email protected]>
>>>>>>>>>
>>>>>>>>> Hello all,
>>>>>>>>>
>>>>>>>>> I posted a question on the forums several days back, but I suspect that might not be the right place to be asking what I'm asking, so I'm trying the mailing list as well.
>>>>>>>>> I'll just repost here what I put in the forums, but the link to that is here: https://forum.kde.org/viewtopic.php?f=74&t=161199
>>>>>>>>>
>>>>>>>>> I'm trying to build Eigen on Mac for CUDA (using the nvcc compiler), and getting build errors. I understand the errors, and I have a change that lets me dodge the build failures, but I suspect it's not the right change for checkin, so I'm looking for feedback.
>>>>>>>>>
>>>>>>>>> The issue I have is in Half.h: I wind up getting errors about a bunch of operators already being defined. The core issue is that on Mac, nvcc (the CUDA compiler) is using gcc as the host compiler, but gcc on Mac is built on top of clang. Eigen seems to be implicitly assuming that the presence of clang implies the absence of CUDA (or at least the absence of nvcc CUDA support).
>>>>>>>>>
>>>>>>>>> In my build I'm hitting this block:
>>>>>>>>>
>>>>>>>>> #if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && \
>>>>>>>>>      EIGEN_CUDA_ARCH >= 530) || \
>>>>>>>>>     (defined(EIGEN_HAS_HIP_FP16) && defined(HIP_DEVICE_COMPILE))
>>>>>>>>> #define EIGEN_HAS_NATIVE_FP16
>>>>>>>>> #endif
>>>>>>>>>
>>>>>>>>> which results in EIGEN_HAS_NATIVE_FP16 being set, and so we wind up compiling in all the operators from Half.h:253-313. That's fine so far.
>>>>>>>>
>>>>>>>> This assumes device-side compilation.
>>>>>>>>
>>>>>>>>> What happens next is that we hit this line:
>>>>>>>>>
>>>>>>>>> #if !defined(EIGEN_HAS_NATIVE_FP16) || EIGEN_COMP_CLANG // Emulate support for half floats
>>>>>>>>>
>>>>>>>>> which is followed shortly after by (roughly) the same operator functions (but... emulated), and I get errors because those operator functions were already defined above.
>>>>>>>>
>>>>>>>> If Clang were the CUDA compiler, that would not be a problem. This implies that it's a CUDA compilation with NVCC. What puzzles me is how we ended up with EIGEN_COMP_CLANG defined for the *device* side of the compilation. I suspect it's a side effect of NVCC doing device-side preprocessing with clang but actually compiling with cicc, which is obviously not clang.
>>>>>>>>
>>>>>>>> I guess what we need to do here is something like this:
>>>>>>>> #if !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC)
>>>>>>>>
>>>>>>>> That, and a comment explaining what's going on.
>>>>>>>>
>>>>>>>> If that does not help, it would be great to compile with '-keep -verbose' and check which compilation phase is failing and what exactly it is trying to compile.
>>>>>>>>
>>>>>>>> --Artem
>>>>>>>>
>>>>>>>>> So. My hack to work around this is to ensure that EIGEN_COMP_CLANG gets set to 0 in Macros.h if __NVCC__ is defined. That works for me locally and gets Eigen building fine (and thus unblocks me on getting TensorFlow building for Mac, or at least unblocks this issue).
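For concreteness, the Macros.h workaround described in that last paragraph presumably amounts to something like the sketch below. It is not the actual patch, and Eigen's real definition of EIGEN_COMP_CLANG is paraphrased here (the assumption is that it is derived from __clang__ and that 0 means "not clang"):

    // macros_hack_sketch.cpp -- illustrative only, not the proposed fix and not
    // the real Macros.h diff. The hack forces EIGEN_COMP_CLANG to 0 whenever
    // __NVCC__ is defined, even though clang does the host-side preprocessing on Mac.
    #include <cstdio>

    #if defined(__clang__) && !defined(__NVCC__)
    #  define EIGEN_COMP_CLANG 1   // genuine clang build (possibly clang-as-CUDA-compiler)
    #else
    #  define EIGEN_COMP_CLANG 0   // gcc/MSVC, or nvcc merely using clang as its host compiler
    #endif

    int main() {
      std::printf("EIGEN_COMP_CLANG = %d\n", EIGEN_COMP_CLANG);
      return 0;
    }

Compared with the targeted Half.h and PacketMath.h edits earlier in the thread, this flips EIGEN_COMP_CLANG off globally under nvcc, so every clang-specific code path in Eigen is affected rather than just the half-float emulation.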
>>>>>>>>> I'm willing to bet, however, that this is the wrong thing to do in general. I don't understand enough of what this second code block is doing to really understand why clang is being treated differently than nvcc here (and specifically why half support needs to be emulated in the presence of clang). I believe there is a version of clang that supports CUDA (at least on some platforms?). Presumably this is for that, but I don't know enough about how that differs from nvcc to fully grok this.
>>>>>>>>>
>>>>>>>>> Can anyone help enlighten me about the best way to fix this?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> ---
>>>>>>>>> Eric Klein
>>>>>>>>> [email protected]
>>>>>>>>
>>>>>>>> --
>>>>>>>> --Artem Belevich
>>>>
>>>> --
>>>> --Artem Belevich
