Re: [eigen] Mac CUDA build failure question

Rasmus Munk Larsen Thu, 20 Jun 2019 15:12:48 -0700

Thanks for testing. I'll send a PR and add you as reviewer.

On Thu, Jun 20, 2019 at 3:05 PM Eric Klein <[email protected]> wrote:


> Ok. Looks like the warnings are there with and without my hack(s), and the
> minimal set of edits needed to get Eigen to build on Mac with nvcc consists
> of:
>
> Half.h. Change this:
> #if !defined(EIGEN_HAS_NATIVE_FP16) || EIGEN_COMP_CLANG // Emulate support
> for half floats
> to this:
> #if !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG &&
> !EIGEN_COMP_NVCC) // Emulate support for half floats
>
> And in PacketMath.h, change this:
> #if defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIP_DEVICE_COMPILE) ||
> (defined(EIGEN_CUDACC) && EIGEN_COMP_CLANG)
> to this:
> #if defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIP_DEVICE_COMPILE) ||
> (defined(EIGEN_CUDACC) && EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC)
>
> Obviously that's excluding any enlightening comments about why that's
> being done.
>
> Would you like me to prepare a patch file, or is this something that would
> be better handled by one of the regular contributors?
>
> Thank you!
> ---
> Eric Klein
> [email protected]
>
>
> On Thu, Jun 20, 2019 at 12:01 AM Eric Klein <[email protected]> wrote:
>
>> That appears to work, although there are 2-3 other places that need
>> similar modifications in order to work. I'll try to get you a more complete
>> list tomorrow.
>>
>> I'm paying more attention tonight to warnings coming from Eigen than I
>> had been previously ignoring, and both with my old Macros.h based hack and
>> the newer modifications, I'm seeing some of these: "warning: calling a
>> __host__ function from a __host__ __device__ function is not allowed". A
>> representative one is:
>>
>> external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h(735):
>> warning: calling a __host__ function from a __host__ __device__ function is
>> not allowed
>>           detected during:
>>             instantiation of "__nv_bool Eigen::TensorEvaluator<const
>> Eigen::TensorReductionOp<Op, Dims, ArgType, MakePointer_>,
>> Device>::evalSubExprsIfNeeded(MakePointer_<Eigen::TensorEvaluator<const
>> Eigen::TensorReductionOp<Op, Dims, ArgType, MakePointer_>,
>> Device>::CoeffReturnType>::Type) [with
>> Op=Eigen::internal::AvgPoolMeanReducer<double>, Dims=const
>> Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>,
>> ArgType=const Eigen::TensorImagePatchOp<-1L, -1L, const
>> Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const
>> double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>,
>> MakePointer_=Eigen::MakePointer, Device=Eigen::GpuDevice]"
>> external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorMorphing.h(172):
>> here
>>             instantiation of "__nv_bool Eigen::TensorEvaluator<const
>> Eigen::TensorReshapingOp<NewDimensions, ArgType>,
>> Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const
>> Eigen::TensorReshapingOp<NewDimensions, ArgType>, Device>::CoeffReturnType
>> *) [with NewDimensions=const Eigen::DSizes<Eigen::DenseIndex, 4>,
>> ArgType=const
>> Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const
>> Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const
>> Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const
>> Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16,
>> Eigen::MakePointer>>>, Eigen::MakePointer>, Device=Eigen::GpuDevice]"
>> external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h(146):
>> here
>>             instantiation of "__nv_bool Eigen::TensorEvaluator<const
>> Eigen::TensorAssignOp<LeftArgType, RightArgType>,
>> Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const
>> Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::Scalar *) [with
>> LeftArgType=Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double,
>> 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, RightArgType=const
>> Eigen::TensorReshapingOp<const Eigen::DSizes<Eigen::DenseIndex, 4>, const
>> Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const
>> Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const
>> Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const
>> Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16,
>> Eigen::MakePointer>>>, Eigen::MakePointer>>, Device=Eigen::GpuDevice]"
>> external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h(422):
>> here
>>             instantiation of "void
>> Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, Vectorizable,
>> Tileable>::run(const Expression &, const Eigen::GpuDevice &) [with
>> Expression=const
>> Eigen::TensorAssignOp<Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double,
>> 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, const
>> Eigen::TensorReshapingOp<const Eigen::DSizes<Eigen::DenseIndex, 4>, const
>> Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const
>> Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const
>> Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const
>> Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16,
>> Eigen::MakePointer>>>, Eigen::MakePointer>>>, Vectorizable=false,
>> Tileable=false]"
>> external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h(35):
>> here
>>             instantiation of "Eigen::TensorDevice<ExpressionType,
>> DeviceType> &Eigen::TensorDevice<ExpressionType,
>> DeviceType>::operator=(const OtherDerived &) [with
>> ExpressionType=Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double,
>> 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>,
>> DeviceType=tensorflow::GPUDevice,
>> OtherDerived=Eigen::TensorReshapingOp<const
>> Eigen::DSizes<Eigen::DenseIndex, 4>, const
>> Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const
>> Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const
>> Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const
>> Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16,
>> Eigen::MakePointer>>>, Eigen::MakePointer>>]"
>> ./tensorflow/core/kernels/avgpooling_op.h(42): here
>>             instantiation of "void
>> tensorflow::functor::SpatialAvgPooling<Device, T>::operator()(const Device
>> &, tensorflow::TTypes<T, 4, Eigen::DenseIndex>::Tensor,
>> tensorflow::TTypes<T, 4, Eigen::DenseIndex>::ConstTensor, int, int, int,
>> int, const Eigen::PaddingType &) [with Device=tensorflow::GPUDevice,
>> T=double]"
>> tensorflow/core/kernels/avgpooling_op_gpu.cu.cc(38): here
>>
>>
>> I'm not sure how concerned I should be about these. The build will
>> succeed, but... I wouldn't be at all surprised to get weird results
>> eventually.
>>
>> In this particular case, it looks like it's complaining because
>> Eigen::GpuDevice::allocate_temp appears to be __host__ rather than __host__
>> __device__ (i.e. missing EIGEN_DEVICE_FUNC). I fully admit that I could be
>> misinterpreting that or otherwise misunderstanding something basic.
>>
>> Should I be concerned about these?
>>
>> Thanks!
>> ---
>> Eric Klein
>> [email protected]
>>
>>
>> On Wed, Jun 19, 2019 at 5:21 PM Rasmus Munk Larsen <[email protected]>
>> wrote:
>>
>>> Erik, does Artem's suggestion work for you?
>>>
>>> On Wed, Jun 19, 2019 at 2:52 PM Artem Belevich <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Jun 19, 2019 at 1:47 PM Rasmus Munk Larsen <[email protected]>
>>>> wrote:
>>>>
>>>>> It looks like we broke the Eigen Cuda build on Mac. What do you think
>>>>> about his workaround?
>>>>>
>>>>> ---------- Forwarded message ---------
>>>>> From: Eric Klein <[email protected]>
>>>>> Date: Wed, Jun 19, 2019 at 1:39 PM
>>>>> Subject: [eigen] Mac CUDA build failure question
>>>>> To: <[email protected]>
>>>>>
>>>>>
>>>>> Hello all,
>>>>>
>>>>> I posted a question on the forums several days back, but suspect that
>>>>> might not be the right place to be asking what I'm asking, so I'm trying
>>>>> the mailing list as well.
>>>>>
>>>>> I'll just repost here what I put in the forums, but the link to that
>>>>> is here: https://forum.kde.org/viewtopic.php?f=74&t=161199
>>>>>
>>>>> I'm trying to build Eigen on Mac for CUDA (using the nvcc compiler),
>>>>> and getting build errors. I understand the errors, and I have a change 
>>>>> that
>>>>> lets me dodge the build failures, but I suspect it's not the right change
>>>>> for checkin, and so I'm looking for feedback.
>>>>>
>>>>> So the issue I have is in Half.h. I wind up getting errors about a
>>>>> bunch of operators being already defined. The core issue is that on Mac,
>>>>> nvcc (the CUDA compliler) is using gcc as the host compiler, but gcc on 
>>>>> Mac
>>>>> is built on top of clang. Eigen seems to be implicitly assuming that the
>>>>> presence of clang implies that absence of CUDA (or at least the absence of
>>>>> nvcc CUDA support).
>>>>>
>>>>> In my build I'm hitting this block:
>>>>>
>>>>> #if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && \
>>>>>      EIGEN_CUDA_ARCH >= 530) ||                                  \
>>>>>     (defined(EIGEN_HAS_HIP_FP16) && defined(HIP_DEVICE_COMPILE))
>>>>> #define EIGEN_HAS_NATIVE_FP16
>>>>> #endif
>>>>>
>>>>> which results in EIGEN_HAS_NATIVE_FP16 being set, and so we wind up
>>>>> compiling in all the operators from Half.h:253-313. That's fine so far.
>>>>>
>>>>
>>>> This assumes device-side compilation.
>>>>
>>>>
>>>>>
>>>>> What happens next is we hit this line:
>>>>>
>>>>> #if !defined(EIGEN_HAS_NATIVE_FP16) || EIGEN_COMP_CLANG // Emulate
>>>>> support for half floats
>>>>>
>>>>> which is followed shortly after by (roughly) the same operator
>>>>> functions (but... emulated), and I get errors because those operator
>>>>> functions were defined above.
>>>>>
>>>>
>>>> If Clang were CUDA compiler, that would not be a problem. This implies
>>>> that it's a CUDA compilation with NVCC. What puzzles me is how did we end
>>>> up with EIGEN_COMP_CLANG defined for the *device* side of the
>>>> compilation. I suspect it's the side effect of NVCC doing device-side
>>>> preprocessing with clang, but actually compiling with cicc, which is
>>>> obviously not clang.
>>>>
>>>> I guess what we need to do here is something like this:
>>>> #if !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG && !
>>>> EIGEN_COMP_NVCC)
>>>>
>>>> That, and a comment explaining what's going on.
>>>>
>>>> If that does not help, it would be great to compile with '-keep
>>>> -verbose' and check which compilation phase is failing and what exactly is
>>>> it trying to compile.
>>>>
>>>> --Artem
>>>>
>>>>
>>>>> So. My hack to work around this is to ensure that EIGEN_COMP_CLANG
>>>>> gets set to 0 in Macros.h if __NVCC__ is defined. That works fine for me
>>>>> locally, and gets Eigen building fine (and thus unblocks me on getting
>>>>> TensorFlow building for Mac, or at least unblocks this issue).
>>>>>
>>>>> I'm willing to bet however that this is the wrong thing to do in
>>>>> general. I don't understand enough of what this second code block is doing
>>>>> to really understand why clang is being treated differently than nvcc here
>>>>> (and specifically why half support needs to be emulated in the presence of
>>>>> clang). I believe there is a version of clang that supports CUDA (at least
>>>>> on some platforms?). Presumably this is for that, but I don't know enough
>>>>> about how that differs from nvcc to fully grok this.
>>>>>
>>>>> Can anyone help enlighten me about the best way to fix this?
>>>>>
>>>>> Thanks!
>>>>> ---
>>>>> Eric Klein
>>>>> [email protected]
>>>>>
>>>>
>>>>
>>>> --
>>>> --Artem Belevich
>>>>
>>>

Re: [eigen] Mac CUDA build failure question

Reply via email to