Glad to help!

On Thu, Jun 20, 2019 at 4:26 PM Eric Klein <[email protected]> wrote:
> Thank you both very much for the help with this. It's much appreciated.
> ---
> Eric Klein
> [email protected]
>
> On Thu, Jun 20, 2019 at 3:46 PM Rasmus Munk Larsen <[email protected]> wrote:
>
>> PR is here:
>> https://bitbucket.org/eigen/eigen/pull-requests/659/fix-cuda-build-on-mac/diff
>>
>> On Thu, Jun 20, 2019 at 3:12 PM Eric Klein <[email protected]> wrote:
>>
>>> Thank you both. I appreciate the help with this.
>>> ---
>>> Eric Klein
>>> [email protected]
>>>
>>> On Thu, Jun 20, 2019 at 3:11 PM Artem Belevich <[email protected]> wrote:
>>>
>>>> The changes look reasonable to me. Thank you for helping to sort this out.
>>>>
>>>> On Thu, Jun 20, 2019 at 3:05 PM Eric Klein <[email protected]> wrote:
>>>>
>>>>> Ok. Looks like the warnings are there with and without my hack(s), and the minimal set of edits needed to get Eigen to build on Mac with nvcc consists of:
>>>>>
>>>>> Half.h. Change this:
>>>>> #if !defined(EIGEN_HAS_NATIVE_FP16) || EIGEN_COMP_CLANG // Emulate support for half floats
>>>>> to this:
>>>>> #if !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) // Emulate support for half floats
>>>>>
>>>>> And in PacketMath.h, change this:
>>>>> #if defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIP_DEVICE_COMPILE) || (defined(EIGEN_CUDACC) && EIGEN_COMP_CLANG)
>>>>> to this:
>>>>> #if defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIP_DEVICE_COMPILE) || (defined(EIGEN_CUDACC) && EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC)
>>>>>
>>>>> Obviously that's excluding any enlightening comments about why that's being done.
>>>>>
>>>>> Would you like me to prepare a patch file, or is this something that would be better handled by one of the regular contributors?
>>>>>
>>>>> Thank you!
>>>>> ---
>>>>> Eric Klein
>>>>> [email protected]
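For reference, the guard logic behind those two edits can be sketched as a tiny standalone program. This is not the Eigen source: the COMP_* macros are stand-ins, and the assumption that EIGEN_COMP_NVCC is ultimately derived from __NVCC__ follows from the Macros.h workaround described later in this thread.

    // guard_sketch.cpp (also builds as a .cu file) -- illustrative only, not Eigen code.
    #include <cstdio>

    #if defined(__NVCC__)
    #  define COMP_NVCC 1   // nvcc is the compiler driver
    #else
    #  define COMP_NVCC 0
    #endif

    #if defined(__clang__)
    #  define COMP_CLANG 1  // clang is (pre)processing this TU; on Mac this is true even under nvcc
    #else
    #  define COMP_CLANG 0
    #endif

    int main() {
    // Old Half.h condition: emulate half support whenever clang is seen.
    // New condition: only take the clang-specific path when nvcc is NOT driving,
    // so an nvcc build that merely uses clang as the host compiler keeps the
    // native FP16 operators and avoids the duplicate-definition errors.
    #if COMP_CLANG && !COMP_NVCC
      std::puts("clang-as-CUDA-compiler path: emulated half operators");
    #else
      std::puts("nvcc (or non-clang) path: native half operators when available");
    #endif
      return 0;
    }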
>>>>> On Thu, Jun 20, 2019 at 12:01 AM Eric Klein <[email protected]> wrote:
>>>>>
>>>>>> That appears to work, although there are 2-3 other places that need similar modifications in order to work. I'll try to get you a more complete list tomorrow.
>>>>>>
>>>>>> I'm paying more attention tonight to warnings coming from Eigen than I had been previously, and both with my old Macros.h-based hack and the newer modifications I'm seeing some of these: "warning: calling a __host__ function from a __host__ __device__ function is not allowed". A representative one is:
>>>>>>
>>>>>> external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h(735): warning: calling a __host__ function from a __host__ __device__ function is not allowed
>>>>>>           detected during:
>>>>>>             instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorReductionOp<Op, Dims, ArgType, MakePointer_>, Device>::evalSubExprsIfNeeded(MakePointer_<Eigen::TensorEvaluator<const Eigen::TensorReductionOp<Op, Dims, ArgType, MakePointer_>, Device>::CoeffReturnType>::Type) [with Op=Eigen::internal::AvgPoolMeanReducer<double>, Dims=const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, ArgType=const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, MakePointer_=Eigen::MakePointer, Device=Eigen::GpuDevice]"
>>>>>>             external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorMorphing.h(172): here
>>>>>>             instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorReshapingOp<NewDimensions, ArgType>, Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorReshapingOp<NewDimensions, ArgType>, Device>::CoeffReturnType *) [with NewDimensions=const Eigen::DSizes<Eigen::DenseIndex, 4>, ArgType=const Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::MakePointer>, Device=Eigen::GpuDevice]"
>>>>>>             external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h(146): here
>>>>>>             instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::Scalar *) [with LeftArgType=Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, RightArgType=const Eigen::TensorReshapingOp<const Eigen::DSizes<Eigen::DenseIndex, 4>, const Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::MakePointer>>, Device=Eigen::GpuDevice]"
>>>>>>             external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h(422): here
>>>>>>             instantiation of "void Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, Vectorizable, Tileable>::run(const Expression &, const Eigen::GpuDevice &) [with Expression=const Eigen::TensorAssignOp<Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, const Eigen::TensorReshapingOp<const Eigen::DSizes<Eigen::DenseIndex, 4>, const Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::MakePointer>>>, Vectorizable=false, Tileable=false]"
>>>>>>             external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h(35): here
>>>>>>             instantiation of "Eigen::TensorDevice<ExpressionType, DeviceType> &Eigen::TensorDevice<ExpressionType, DeviceType>::operator=(const OtherDerived &) [with ExpressionType=Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, DeviceType=tensorflow::GPUDevice, OtherDerived=Eigen::TensorReshapingOp<const Eigen::DSizes<Eigen::DenseIndex, 4>, const Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::MakePointer>>]"
>>>>>>             ./tensorflow/core/kernels/avgpooling_op.h(42): here
>>>>>>             instantiation of "void tensorflow::functor::SpatialAvgPooling<Device, T>::operator()(const Device &, tensorflow::TTypes<T, 4, Eigen::DenseIndex>::Tensor, tensorflow::TTypes<T, 4, Eigen::DenseIndex>::ConstTensor, int, int, int, int, const Eigen::PaddingType &) [with Device=tensorflow::GPUDevice, T=double]"
>>>>>>             tensorflow/core/kernels/avgpooling_op_gpu.cu.cc(38): here
>>>>>>
>>>>>> I'm not sure how concerned I should be about these. The build will succeed, but... I wouldn't be at all surprised to get weird results eventually.
>>>>>>
>>>>>> In this particular case, it looks like it's complaining because Eigen::GpuDevice::allocate_temp appears to be __host__ rather than __host__ __device__ (i.e. missing EIGEN_DEVICE_FUNC). I fully admit that I could be misinterpreting that or otherwise misunderstanding something basic.
>>>>>>
>>>>>> Should I be concerned about these?
>>>>>>
>>>>>> Thanks!
>>>>>> ---
>>>>>> Eric Klein
>>>>>> [email protected]
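For readers less familiar with this nvcc diagnostic, a minimal toy .cu file along the following lines (not Eigen code, and unrelated to the real allocate_temp implementation) reproduces the same class of warning:

    // toy_warning.cu -- illustrative only; compile with: nvcc -c toy_warning.cu
    #include <cstdlib>

    // Host-only helper (no __device__ annotation), standing in for something
    // like GpuDevice::allocate_temp as described in the message above.
    void* host_only_allocate(std::size_t n) { return std::malloc(n); }

    // Because this function is __host__ __device__, nvcc also compiles a
    // device-side version of it, and the call below is the kind of thing it
    // flags with "calling a __host__ function from a __host__ __device__
    // function is not allowed".
    __host__ __device__ void* allocate_temp_like(std::size_t n) {
      return host_only_allocate(n);
    }

    int main() {
      void* p = allocate_temp_like(16);  // harmless as long as it is only ever called from host code
      std::free(p);
      return 0;
    }

Annotating the callee as __host__ __device__ as well, which is what EIGEN_DEVICE_FUNC expands to in Eigen's CUDA builds, is the usual way to silence it, provided the function body is actually valid device code.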
>>>>>> On Wed, Jun 19, 2019 at 5:21 PM Rasmus Munk Larsen <[email protected]> wrote:
>>>>>>
>>>>>>> Eric, does Artem's suggestion work for you?
>>>>>>>
>>>>>>> On Wed, Jun 19, 2019 at 2:52 PM Artem Belevich <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Wed, Jun 19, 2019 at 1:47 PM Rasmus Munk Larsen <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> It looks like we broke the Eigen CUDA build on Mac. What do you think about his workaround?
>>>>>>>>>
>>>>>>>>> ---------- Forwarded message ---------
>>>>>>>>> From: Eric Klein <[email protected]>
>>>>>>>>> Date: Wed, Jun 19, 2019 at 1:39 PM
>>>>>>>>> Subject: [eigen] Mac CUDA build failure question
>>>>>>>>> To: <[email protected]>
>>>>>>>>>
>>>>>>>>> Hello all,
>>>>>>>>>
>>>>>>>>> I posted a question on the forums several days back, but I suspect that might not be the right place to be asking what I'm asking, so I'm trying the mailing list as well.
>>>>>>>>> I'll just repost here what I put in the forums, but the link to that is here: https://forum.kde.org/viewtopic.php?f=74&t=161199
>>>>>>>>>
>>>>>>>>> I'm trying to build Eigen on Mac for CUDA (using the nvcc compiler), and getting build errors. I understand the errors, and I have a change that lets me dodge the build failures, but I suspect it's not the right change for checkin, so I'm looking for feedback.
>>>>>>>>>
>>>>>>>>> The issue I have is in Half.h: I wind up getting errors about a bunch of operators already being defined. The core issue is that on Mac, nvcc (the CUDA compiler) is using gcc as the host compiler, but gcc on Mac is built on top of clang. Eigen seems to be implicitly assuming that the presence of clang implies the absence of CUDA (or at least the absence of nvcc CUDA support).
>>>>>>>>>
>>>>>>>>> In my build I'm hitting this block:
>>>>>>>>>
>>>>>>>>> #if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && \
>>>>>>>>>      EIGEN_CUDA_ARCH >= 530) || \
>>>>>>>>>     (defined(EIGEN_HAS_HIP_FP16) && defined(HIP_DEVICE_COMPILE))
>>>>>>>>> #define EIGEN_HAS_NATIVE_FP16
>>>>>>>>> #endif
>>>>>>>>>
>>>>>>>>> which results in EIGEN_HAS_NATIVE_FP16 being set, and so we wind up compiling in all the operators from Half.h:253-313. That's fine so far.
>>>>>>>>
>>>>>>>> This assumes device-side compilation.
>>>>>>>>
>>>>>>>>> What happens next is that we hit this line:
>>>>>>>>>
>>>>>>>>> #if !defined(EIGEN_HAS_NATIVE_FP16) || EIGEN_COMP_CLANG // Emulate support for half floats
>>>>>>>>>
>>>>>>>>> which is followed shortly after by (roughly) the same operator functions (but... emulated), and I get errors because those operator functions were already defined above.
>>>>>>>>
>>>>>>>> If Clang were the CUDA compiler, that would not be a problem. This implies that it's a CUDA compilation with NVCC. What puzzles me is how we ended up with EIGEN_COMP_CLANG defined for the *device* side of the compilation. I suspect it's a side effect of NVCC doing device-side preprocessing with clang but actually compiling with cicc, which is obviously not clang.
>>>>>>>>
>>>>>>>> I guess what we need to do here is something like this:
>>>>>>>> #if !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC)
>>>>>>>>
>>>>>>>> That, and a comment explaining what's going on.
>>>>>>>>
>>>>>>>> If that does not help, it would be great to compile with '-keep -verbose' and check which compilation phase is failing and what exactly it is trying to compile.
>>>>>>>>
>>>>>>>> --Artem
>>>>>>>>
>>>>>>>>> So. My hack to work around this is to ensure that EIGEN_COMP_CLANG gets set to 0 in Macros.h if __NVCC__ is defined. That works for me locally and gets Eigen building fine (and thus unblocks me on getting TensorFlow building for Mac, or at least unblocks this issue).
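For concreteness, the Macros.h workaround described in that last paragraph presumably amounts to something like the sketch below. It is not the actual patch, and Eigen's real definition of EIGEN_COMP_CLANG is paraphrased here (the assumption is that it is derived from __clang__ and that 0 means "not clang"):

    // macros_hack_sketch.cpp -- illustrative only, not the proposed fix and not
    // the real Macros.h diff. The hack forces EIGEN_COMP_CLANG to 0 whenever
    // __NVCC__ is defined, even though clang does the host-side preprocessing on Mac.
    #include <cstdio>

    #if defined(__clang__) && !defined(__NVCC__)
    #  define EIGEN_COMP_CLANG 1   // genuine clang build (possibly clang-as-CUDA-compiler)
    #else
    #  define EIGEN_COMP_CLANG 0   // gcc/MSVC, or nvcc merely using clang as its host compiler
    #endif

    int main() {
      std::printf("EIGEN_COMP_CLANG = %d\n", EIGEN_COMP_CLANG);
      return 0;
    }

Compared with the targeted Half.h and PacketMath.h edits earlier in the thread, this flips EIGEN_COMP_CLANG off globally under nvcc, so every clang-specific code path in Eigen is affected rather than just the half-float emulation.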
>>>>>>>>> I'm willing to bet, however, that this is the wrong thing to do in general. I don't understand enough of what this second code block is doing to really understand why clang is being treated differently than nvcc here (and specifically why half support needs to be emulated in the presence of clang). I believe there is a version of clang that supports CUDA (at least on some platforms?). Presumably this is for that, but I don't know enough about how that differs from nvcc to fully grok this.
>>>>>>>>>
>>>>>>>>> Can anyone help enlighten me about the best way to fix this?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> ---
>>>>>>>>> Eric Klein
>>>>>>>>> [email protected]
>>>>>>>>
>>>>>>>> --
>>>>>>>> --Artem Belevich
>>>>
>>>> --
>>>> --Artem Belevich
