PR is here: https://bitbucket.org/eigen/eigen/pull-requests/659/fix-cuda-build-on-mac/diff

On Thu, Jun 20, 2019 at 3:12 PM Eric Klein <[email protected]> wrote:

Thank you both. I appreciate the help with this.
---
Eric Klein
[email protected]

On Thu, Jun 20, 2019 at 3:11 PM Artem Belevich <[email protected]> wrote:

The changes look reasonable to me. Thank you for helping to sort this out.

On Thu, Jun 20, 2019 at 3:05 PM Eric Klein <[email protected]> wrote:

OK. It looks like the warnings are there both with and without my hack(s), and the minimal set of edits needed to get Eigen to build on Mac with nvcc consists of:

In Half.h, change this:

    #if !defined(EIGEN_HAS_NATIVE_FP16) || EIGEN_COMP_CLANG // Emulate support for half floats

to this:

    #if !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) // Emulate support for half floats

And in PacketMath.h, change this:

    #if defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIP_DEVICE_COMPILE) || (defined(EIGEN_CUDACC) && EIGEN_COMP_CLANG)

to this:

    #if defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIP_DEVICE_COMPILE) || (defined(EIGEN_CUDACC) && EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC)

Obviously that excludes any enlightening comments about why it's being done.

Would you like me to prepare a patch file, or is this something that would be better handled by one of the regular contributors?

Thank you!
---
Eric Klein
[email protected]
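
As for the "enlightening comment" Eric mentions, here is a sketch of how the Half.h guard might be annotated. The comment wording is illustrative only, not necessarily what landed in the PR linked above:

    // nvcc preprocesses device-side code with the host compiler (clang on
    // Mac), so EIGEN_COMP_CLANG can be nonzero even though cicc, not clang,
    // compiles the device code. Only take the clang emulation path when
    // clang really is the CUDA compiler, i.e. when we are not under nvcc.
    #if !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) // Emulate support for half floats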
On Thu, Jun 20, 2019 at 12:01 AM Eric Klein <[email protected]> wrote:

That appears to work, although there are two or three other places that need similar modifications in order to build. I'll try to get you a more complete list tomorrow.

Tonight I'm paying closer attention to warnings coming from Eigen that I had previously been ignoring, and both with my old Macros.h-based hack and with the newer modifications I'm seeing some of these: "warning: calling a __host__ function from a __host__ __device__ function is not allowed". A representative one is:

    external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h(735): warning: calling a __host__ function from a __host__ __device__ function is not allowed
        detected during:
        instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorReductionOp<Op, Dims, ArgType, MakePointer_>, Device>::evalSubExprsIfNeeded(MakePointer_<Eigen::TensorEvaluator<const Eigen::TensorReductionOp<Op, Dims, ArgType, MakePointer_>, Device>::CoeffReturnType>::Type) [with Op=Eigen::internal::AvgPoolMeanReducer<double>, Dims=const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, ArgType=const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, MakePointer_=Eigen::MakePointer, Device=Eigen::GpuDevice]"
        external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorMorphing.h(172): here
        instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorReshapingOp<NewDimensions, ArgType>, Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorReshapingOp<NewDimensions, ArgType>, Device>::CoeffReturnType *) [with NewDimensions=const Eigen::DSizes<Eigen::DenseIndex, 4>, ArgType=const Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::MakePointer>, Device=Eigen::GpuDevice]"
        external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h(146): here
        instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::Scalar *) [with LeftArgType=Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, RightArgType=const Eigen::TensorReshapingOp<const Eigen::DSizes<Eigen::DenseIndex, 4>, const Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::MakePointer>>, Device=Eigen::GpuDevice]"
        external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h(422): here
        instantiation of "void Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, Vectorizable, Tileable>::run(const Expression &, const Eigen::GpuDevice &) [with Expression=const Eigen::TensorAssignOp<Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, const Eigen::TensorReshapingOp<const Eigen::DSizes<Eigen::DenseIndex, 4>, const Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::MakePointer>>>, Vectorizable=false, Tileable=false]"
        external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h(35): here
        instantiation of "Eigen::TensorDevice<ExpressionType, DeviceType> &Eigen::TensorDevice<ExpressionType, DeviceType>::operator=(const OtherDerived &) [with ExpressionType=Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, DeviceType=tensorflow::GPUDevice, OtherDerived=Eigen::TensorReshapingOp<const Eigen::DSizes<Eigen::DenseIndex, 4>, const Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::MakePointer>>]"
        ./tensorflow/core/kernels/avgpooling_op.h(42): here
        instantiation of "void tensorflow::functor::SpatialAvgPooling<Device, T>::operator()(const Device &, tensorflow::TTypes<T, 4, Eigen::DenseIndex>::Tensor, tensorflow::TTypes<T, 4, Eigen::DenseIndex>::ConstTensor, int, int, int, int, const Eigen::PaddingType &) [with Device=tensorflow::GPUDevice, T=double]"
        tensorflow/core/kernels/avgpooling_op_gpu.cu.cc(38): here

I'm not sure how concerned I should be about these. The build succeeds, but I wouldn't be at all surprised to eventually get weird results.

In this particular case, it looks like it's complaining because Eigen::GpuDevice::allocate_temp appears to be __host__ rather than __host__ __device__ (i.e., it's missing EIGEN_DEVICE_FUNC). I fully admit that I could be misinterpreting that or otherwise misunderstanding something basic.

Should I be concerned about these?

Thanks!
---
Eric Klein
[email protected]
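
To make the diagnostic above concrete, here is a minimal sketch (illustrative, not from the thread) of the pattern nvcc is complaining about: a __host__ __device__ template calling a host-only function, analogous to a __host__ __device__ evalSubExprsIfNeeded calling a host-only allocate_temp. Inside a template, nvcc typically demotes this to the warning seen above (rather than an error), since the device-side instantiation may never actually be used:

    // repro.cu -- hypothetical minimal example; compile with: nvcc -c repro.cu
    void host_only() {}                 // implicitly __host__; no device version

    template <typename T>
    __host__ __device__ void hd_func(T) {
        host_only();                    // nvcc: "warning: calling a __host__
                                        // function from a __host__ __device__
                                        // function is not allowed"
    }

    void instantiate() { hd_func(0); }  // instantiation triggers the diagnostic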
On Wed, Jun 19, 2019 at 5:21 PM Rasmus Munk Larsen <[email protected]> wrote:

Eric, does Artem's suggestion work for you?

On Wed, Jun 19, 2019 at 2:52 PM Artem Belevich <[email protected]> wrote, replying inline (marked [Artem Belevich]) to the message below:

On Wed, Jun 19, 2019 at 1:47 PM Rasmus Munk Larsen <[email protected]> wrote:

It looks like we broke the Eigen CUDA build on Mac. What do you think about his workaround?

---------- Forwarded message ---------
From: Eric Klein <[email protected]>
Date: Wed, Jun 19, 2019 at 1:39 PM
Subject: [eigen] Mac CUDA build failure question
To: <[email protected]>

Hello all,

I posted a question on the forums several days back, but I suspect that might not be the right place for what I'm asking, so I'm trying the mailing list as well.

I'll just repost here what I put in the forums; the link to that thread is https://forum.kde.org/viewtopic.php?f=74&t=161199

I'm trying to build Eigen on Mac for CUDA (using the nvcc compiler) and getting build errors. I understand the errors, and I have a change that lets me dodge the build failures, but I suspect it's not the right change to check in, so I'm looking for feedback.

The issue I have is in Half.h: I wind up with errors about a bunch of operators already being defined. The core issue is that on Mac, nvcc (the CUDA compiler) uses gcc as the host compiler, but gcc on Mac is built on top of clang. Eigen seems to implicitly assume that the presence of clang implies the absence of CUDA (or at least the absence of nvcc CUDA support).

In my build I'm hitting this block:

    #if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && \
        EIGEN_CUDA_ARCH >= 530) || \
        (defined(EIGEN_HAS_HIP_FP16) && defined(HIP_DEVICE_COMPILE))
    #define EIGEN_HAS_NATIVE_FP16
    #endif

which results in EIGEN_HAS_NATIVE_FP16 being set, so all the operators from Half.h:253-313 get compiled in. That's fine so far.

[Artem Belevich] This assumes device-side compilation.

What happens next is that we hit this line:

    #if !defined(EIGEN_HAS_NATIVE_FP16) || EIGEN_COMP_CLANG // Emulate support for half floats

which is followed shortly after by (roughly) the same operator functions, but emulated, and I get errors because those operators were already defined above.

[Artem Belevich] If clang were the CUDA compiler, that would not be a problem, so this implies a CUDA compilation with nvcc. What puzzles me is how we ended up with EIGEN_COMP_CLANG defined for the *device* side of the compilation. I suspect it's a side effect of nvcc doing device-side preprocessing with clang but actually compiling with cicc, which is obviously not clang.

I guess what we need to do here is something like this:

    #if !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC)

That, and a comment explaining what's going on.

If that does not help, it would be great to compile with '-keep -verbose' and check which compilation phase is failing and what exactly it is trying to compile.

--Artem

So, my hack to work around this is to ensure that EIGEN_COMP_CLANG gets set to 0 in Macros.h when __NVCC__ is defined. That works fine for me locally and gets Eigen building (and thus unblocks me on getting TensorFlow building for Mac, or at least unblocks this particular issue).

I'm willing to bet, however, that this is the wrong thing to do in general. I don't understand enough of what this second code block is doing to really see why clang is treated differently from nvcc here (and specifically why half support needs to be emulated in the presence of clang). I believe there is a version of clang that supports CUDA (at least on some platforms?); presumably this path is for that, but I don't know enough about how clang's CUDA support differs from nvcc's to fully grok this.

Can anyone help enlighten me about the best way to fix this?

Thanks!
---
Eric Klein
[email protected]

--
--Artem Belevich
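
Eric never shows the Macros.h hack itself; here is a sketch of what it plausibly looks like, assuming Eigen's usual clang-detection pattern in Macros.h (the exact definition may differ by Eigen version):

    // Macros.h sketch: report "not clang" whenever nvcc drives the build,
    // even though nvcc uses clang for host-side preprocessing on Mac.
    #if defined(__clang__) && !defined(__NVCC__)
      #define EIGEN_COMP_CLANG (__clang_major__ * 100 + __clang_minor__)
    #else
      #define EIGEN_COMP_CLANG 0
    #endif

Artem's (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) suggestion achieves the same effect at each affected use site without redefining what EIGEN_COMP_CLANG means globally, which is why it reads as the safer change and is what the PR at the top of this thread adopts.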
