https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118794
--- Comment #7 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
With the following, the compiler is satisfied:

// Normalize v
T norm=0;
// T norm=fabs(gpu_dot_product_w(v,v));
T normc= sqrt(norm);
// const T normc=norm;
#pragma omp parallel for
for (size_t i = 0; i < pext0; ++i)
{
    v(i,pstrv0)= v(i,pstrv0)/normc;
}

However, gpu_dot_product_w is called earlier as

T dot_pr=gpu_dot_product_w(u,v);

and that call does not produce any nonlocal goto errors. It is defined like this:

template <typename T>
inline T gpu_dot_product_w( const datastruct<T>& vec1, const datastruct<T> &vec2)
{
    const size_t n=vec1.pextents[0];
    const size_t strv1=vec1.pstrides[0];
    const size_t strv2=vec2.pstrides[0];
    T result=0;
    #pragma omp parallel for reduction(+:result)
    for (size_t i = 0; i < n; ++i)
    {
        result += vec1(i,strv1) * vec2(i,strv2);
    }
    return result;
}

and the operators are these:

#pragma omp begin declare target
template<typename T>
inline T& datastruct<T>::operator()(const size_t row, const size_t stride)
{
    return pdata[row * stride];
}
#pragma omp end declare target

None of this has anything to do with gotos. pstrides are pointers to non-STL arrays. I do not know what is going on here.
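For anyone trying to reduce this, the pieces quoted above can be put together into a small host-only sketch. It is not claimed to reproduce the reported error. The datastruct layout is an assumption built only from the members mentioned in the comment (pdata, pextents, pstrides); the const overload of operator(), the normalize_w wrapper and the normalized_norm_demo helper are hypothetical names added for illustration, and the declare target region is left out so that the file also builds without offloading support.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the reporter's datastruct<T>. Only the members
// named in the comment are modelled; the real class is not shown there.
template <typename T>
struct datastruct {
    T* pdata;               // element storage
    std::size_t* pextents;  // plain (non-STL) array of extents
    std::size_t* pstrides;  // plain (non-STL) array of strides

    T& operator()(std::size_t row, std::size_t stride) { return pdata[row * stride]; }
    // Assumed const overload, needed because the dot product takes const refs.
    const T& operator()(std::size_t row, std::size_t stride) const { return pdata[row * stride]; }
};

// Dot product as quoted in the comment, with an OpenMP sum reduction.
template <typename T>
inline T gpu_dot_product_w(const datastruct<T>& vec1, const datastruct<T>& vec2) {
    const std::size_t n = vec1.pextents[0];
    const std::size_t strv1 = vec1.pstrides[0];
    const std::size_t strv2 = vec2.pstrides[0];
    T result = 0;
    #pragma omp parallel for reduction(+:result)
    for (std::size_t i = 0; i < n; ++i) {
        result += vec1(i, strv1) * vec2(i, strv2);
    }
    return result;
}

// Normalization step using the line that is commented out in the workaround,
// i.e. the norm is computed through gpu_dot_product_w(v, v).
template <typename T>
inline void normalize_w(datastruct<T>& v) {
    const std::size_t pext0 = v.pextents[0];
    const std::size_t pstrv0 = v.pstrides[0];
    const T norm = std::fabs(gpu_dot_product_w(v, v));
    const T normc = std::sqrt(norm);
    assert(normc > T(0));  // a zero vector cannot be normalized
    #pragma omp parallel for
    for (std::size_t i = 0; i < pext0; ++i) {
        v(i, pstrv0) = v(i, pstrv0) / normc;
    }
}

// Hypothetical helper: normalizes (3, 4) and returns the squared norm afterwards.
inline double normalized_norm_demo() {
    std::vector<double> data{3.0, 4.0};
    std::size_t extents[1] = {2};
    std::size_t strides[1] = {1};
    datastruct<double> v{data.data(), extents, strides};
    normalize_w(v);
    return gpu_dot_product_w(v, v);
}
```

Built with or without -fopenmp, normalized_norm_demo() should return a value within rounding error of 1. Wrapping the operators in a declare target region again and enabling offloading would be the next step towards the configuration described in the comment.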