-fopt-info handling

2016-06-27 Thread Ulrich Drepper
The manual says about -fopt-info:

   If OPTIONS is omitted, it defaults to 'all-all', which means
dump all available optimization info from all the passes.

The current implementation (at least in the recent gcc 6.1) doesn't follow
that, though.  It just ignores the option in that case.

How about the attached patch?  It is simple and doesn't duplicate the
information about what "all-all" means; instead it lets the option parser
do the hard work.


d-gcc-opt-info
Description: Binary data


Re: -fopt-info handling

2016-07-04 Thread Ulrich Drepper
Anyone?

On Mon, Jun 27, 2016 at 1:31 PM, Ulrich Drepper  wrote:
> The manual says about -fopt-info:
>
>    If OPTIONS is omitted, it defaults to 'all-all', which means
> dump all available optimization info from all the passes.
>
> The current implementation (at least in the recent gcc 6.1) doesn't follow
> that, though.  It just ignores the option in that case.
>
> How about the attached patch?  It is simple and doesn't duplicate the
> information about what "all-all" means; instead it lets the option parser
> do the hard work.


Re: -fopt-info handling

2016-07-06 Thread Ulrich Drepper
On Tue, Jul 5, 2016 at 6:06 AM, Richard Biener
 wrote:
> I don't think all-all is a useful default.  "optimized" may be though.

I relied on old documentation installed on one of my systems.
Apparently the default changed to optimized-optall.  So no change to
the documentation is needed if the general opinion is that this is a sane
default; the following patch actually installs that default behavior.


d-gcc-opt-info2
Description: Binary data


Re: skip Cholesky decomposition in is>>n_mv_dist

2019-08-09 Thread Ulrich Drepper
On Fri, Aug 9, 2019 at 9:50 AM Alexandre Oliva  wrote:

> normal_mv_distribution maintains the variance-covariance matrix param
> in Cholesky-decomposed form.  Existing param_type constructors, when
> taking a full or lower-triangle varcov matrix, perform Cholesky
> decomposition to convert it to the internal representation.  This
> internal representation is visible both in the varcov() result, and in
> the streamed-out representation of a normal_mv_distribution object.
>
> […]
>


> Tested on x86_64-linux-gnu.  Ok to install?
>

Yes.  Thanks.


Re: [PATCH V4 02/11] opt-functions.awk: fix comparison of limit, begin and end

2019-08-27 Thread Ulrich Drepper
On Wed, Aug 28, 2019 at 1:47 AM Jose E. Marchesi 
wrote:

>  function integer_range_info(range_option, init, option)
>  {
>  if (range_option != "") {
> -   start = nth_arg(0, range_option);
> -   end = nth_arg(1, range_option);
> +   init = init + 0;
> +   start = nth_arg(0, range_option) + 0;
> +   end = nth_arg(1, range_option) + 0;
> if (init != "" && init != "-1" && (init < start || init > end))


In this case the test for init != "" is at least unnecessary.

Maybe something else has to be used.  I didn't trace the uses but if init
is deliberately set to "" then the test would have to be replaced with init
!= 0.


Re: PR83750: CSE erf/erfc pair

2018-11-02 Thread Ulrich Drepper
On Fri, Nov 2, 2018 at 10:36 AM Prathamesh Kulkarni
 wrote:
> So, the patch adds another transform erf(x) > 1 -> 0
> which resolves the regression.

Why don't you match for any constant with absolute value >= 1.0
instead of just 1.0?
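
To illustrate the suggestion: erf never returns a value above 1.0, so a
comparison erf(x) > c is false for every x as soon as c >= 1.0, not only
for c == 1.0.  The following is a minimal sketch of that reasoning, not
the pattern from the patch; both helper names are made up for
illustration, and the negative end of the range (c <= -1.0) would need
separate care because erf can return exactly -1.0.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: the comparison the transform would fold away.
inline bool example_erf_greater_than(double x, double c)
{ return std::erf(x) > c; }

// Hypothetical predicate: erf(x) <= 1.0 for all x, so "erf(x) > c" can
// be replaced by false whenever the constant c is at least 1.0.
inline bool example_folds_to_false(double c)
{ return c >= 1.0; }
```

For example, example_erf_greater_than(1e300, 1.0) and
example_erf_greater_than(0.5, 2.5) are both false.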


deprecations in OpenMP 5.0

2019-01-02 Thread Ulrich Drepper
Should we mark the symbols that are deprecated in OpenMP 5.0 as such in
the header?  Yes, this will break code that uses the symbols and -Werror,
but this is what the standard's writers intend, right?  It's easy enough to
work around for the time being.

Aside from the header changes the files implementing the
omp_[gs]et_nested functions had to be changed.  I just use the pragma to
disable the warning temporarily instead of a more global option like
using -Wno-deprecated-declarations in the Makefile.
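
The mechanism described above can be sketched in isolation as follows.
This is only an illustration under assumptions: EXAMPLE_DEPRECATED,
example_get_nested and example_wrapper are hypothetical names that stand
in for __GOMP_DEPRECATED and the libgomp functions; they are not part of
the patch.

```cpp
#include <cassert>

// Hypothetical macro modeled on __GOMP_DEPRECATED.
#ifdef __GNUC__
# define EXAMPLE_DEPRECATED __attribute__((__deprecated__))
#else
# define EXAMPLE_DEPRECATED
#endif

// Stand-in for a deprecated API function such as omp_get_nested.
EXAMPLE_DEPRECATED inline int example_get_nested() { return 0; }

// Inside the push/pop region the use of the deprecated function does
// not trigger -Wdeprecated-declarations; outside of it, it would.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
inline int example_wrapper() { return example_get_nested(); }
#pragma GCC diagnostic pop
```

The push/pop pair keeps the suppression local to the wrapped code, which
is the point of preferring the pragma over a Makefile-wide option.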

What do people think about this?


2019-01-02  Ulrich Drepper  

   Newly deprecated symbols in OpenMP 5.0.
   * omp.h.in (__GOMP_DEPRECATED): Define.
   Mark omp_lock_hint_* enum values, omp_lock_hint_t, omp_set_nested,
   and omp_get_nested with __GOMP_DEPRECATED.
   * fortran.c: Wrap uses of omp_set_nested and omp_get_nested with
   pragmas to ignore -Wdeprecated-declarations warnings.
   * icv.c: Likewise.
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index 4d544be1c99..24d361157f0 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -47,10 +47,13 @@ ialias_redirect (omp_test_lock)
 ialias_redirect (omp_test_nest_lock)
 # endif
 ialias_redirect (omp_set_dynamic)
-ialias_redirect (omp_set_nested)
-ialias_redirect (omp_set_num_threads)
 ialias_redirect (omp_get_dynamic)
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
+ialias_redirect (omp_set_nested)
 ialias_redirect (omp_get_nested)
+#pragma GCC diagnostic pop
+ialias_redirect (omp_set_num_threads)
 ialias_redirect (omp_in_parallel)
 ialias_redirect (omp_get_max_threads)
 ialias_redirect (omp_get_num_procs)
@@ -276,6 +279,8 @@ omp_set_dynamic_8_ (const int64_t *set)
   omp_set_dynamic (!!*set);
 }
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
 void
 omp_set_nested_ (const int32_t *set)
 {
@@ -287,6 +292,7 @@ omp_set_nested_8_ (const int64_t *set)
 {
   omp_set_nested (!!*set);
 }
+#pragma GCC diagnostic pop
 
 void
 omp_set_num_threads_ (const int32_t *set)
@@ -306,11 +312,14 @@ omp_get_dynamic_ (void)
   return omp_get_dynamic ();
 }
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
 int32_t
 omp_get_nested_ (void)
 {
   return omp_get_nested ();
 }
+#pragma GCC diagnostic pop
 
 int32_t
 omp_in_parallel_ (void)
diff --git a/libgomp/icv.c b/libgomp/icv.c
index 095d57a93b1..af0f4c0596e 100644
--- a/libgomp/icv.c
+++ b/libgomp/icv.c
@@ -51,6 +51,8 @@ omp_get_dynamic (void)
   return icv->dyn_var;
 }
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
 void
 omp_set_nested (int val)
 {
@@ -64,6 +66,7 @@ omp_get_nested (void)
   struct gomp_task_icv *icv = gomp_icv (false);
   return icv->nest_var;
 }
+#pragma GCC diagnostic pop
 
 void
 omp_set_schedule (omp_sched_t kind, int chunk_size)
@@ -198,10 +201,13 @@ omp_get_partition_place_nums (int *place_nums)
 }
 
 ialias (omp_set_dynamic)
-ialias (omp_set_nested)
-ialias (omp_set_num_threads)
 ialias (omp_get_dynamic)
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
+ialias (omp_set_nested)
 ialias (omp_get_nested)
+#pragma GCC diagnostic pop
+ialias (omp_set_num_threads)
 ialias (omp_set_schedule)
 ialias (omp_get_schedule)
 ialias (omp_get_max_threads)
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index d7ac71400ad..060ee374829 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -26,6 +26,13 @@
 #ifndef _OMP_H
 #define _OMP_H 1
 
+
+#ifdef __GNUC__
+# define __GOMP_DEPRECATED __attribute__((__deprecated__))
+#else
+# define __GOMP_DEPRECATED
+#endif
+
 #ifndef _LIBGOMP_OMP_LOCK_DEFINED
 #define _LIBGOMP_OMP_LOCK_DEFINED 1
 /* These two structures get edited by the libgomp build process to 
@@ -66,18 +73,18 @@ typedef enum omp_proc_bind_t
 typedef enum omp_sync_hint_t
 {
   omp_sync_hint_none = 0,
-  omp_lock_hint_none = omp_sync_hint_none,
+  omp_lock_hint_none __GOMP_DEPRECATED = omp_sync_hint_none,
   omp_sync_hint_uncontended = 1,
-  omp_lock_hint_uncontended = omp_sync_hint_uncontended,
+  omp_lock_hint_uncontended __GOMP_DEPRECATED = omp_sync_hint_uncontended,
   omp_sync_hint_contended = 2,
-  omp_lock_hint_contended = omp_sync_hint_contended,
+  omp_lock_hint_contended __GOMP_DEPRECATED = omp_sync_hint_contended,
   omp_sync_hint_nonspeculative = 4,
-  omp_lock_hint_nonspeculative = omp_sync_hint_nonspeculative,
+  omp_lock_hint_nonspeculative __GOMP_DEPRECATED = omp_sync_hint_nonspeculative,
   omp_sync_hint_speculative = 8,
-  omp_lock_hint_speculative = omp_sync_hint_speculative
+  omp_lock_hint_speculative __GOMP_DEPRECATED = omp_sync_hint_speculative
 } omp_sync_hint_t;
 
-typedef omp_sync_hint_t omp_lock_hint_t;
+typedef __GOMP_DEPRECATED omp_sync_hint_t omp_lock_hint_t;
 
 typedef struct __attribute__((__aligned__ (sizeof (void *)))) omp_depend_t
 {
@@ -108,8 +115,8 @@ extern int omp_in_para

Re: deprecations in OpenMP 5.0

2019-01-02 Thread Ulrich Drepper
On 1/2/19 6:21 PM, Jakub Jelinek wrote:
> As we aren't implementing OpenMP 5.0 fully yet and especially because
> we aren't implementing the new nesting ICV semantics, we shouldn't do it now,
> they are valid in OpenMP 4.5.

OK, that applies to omp_[gs]et_nested.

How about the lock symbols?  All the sync symbols are already defined as
aliases (or more correctly, the other way around).  The sooner people
change, the better, and at least those parts of OpenMP5 are "implemented".






warnings about unused shared_ptr/unique_ptr comparisons

2019-01-14 Thread Ulrich Drepper
This is a conservative implementation of a patch to make
shared/unique_ptrs behave more like plain old pointers.  More about this
in bug #88738

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88738

The summary is

- using clang, which enables a warning for unused results of all
comparison operations, found a real bug

- a library implementation is limited in scope and tedious to add
everywhere. At this stage of gcc 9 it was the only acceptable solution,
though

- longer term there should be a warning for comparison operators.
Possibly on by default with the possibility to disable it with an
attribute (see the discussion in the bug).


The patch proposed here only changes the code for C++17 and up to use
the [[nodiscard]] attribute.  For gcc 10 we can either widen this or
implement a better way with the help of the compiler.
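
As an illustration of the approach (not the libstdc++ code itself), the
following minimal sketch applies a conditionally defined attribute macro
to a comparison operator.  EXAMPLE_NODISCARD and example_ptr are
hypothetical names standing in for _GLIBCXX_NODISCARD and the smart
pointer classes.

```cpp
#include <cassert>

// Hypothetical macro modeled on _GLIBCXX_NODISCARD: the attribute is
// only used for C++17 and up.
#if __cplusplus >= 201703L
# define EXAMPLE_NODISCARD [[nodiscard]]
#else
# define EXAMPLE_NODISCARD
#endif

// Minimal stand-in for a smart pointer; not the libstdc++ class.
struct example_ptr
{
  int* raw;
};

// Discarding the result of this comparison is diagnosed in C++17 mode.
EXAMPLE_NODISCARD inline bool
operator==(const example_ptr& a, const example_ptr& b) noexcept
{ return a.raw == b.raw; }
```

With -std=c++17 a statement such as "p == q;", where "p = q;" was
probably intended, should then draw an unused-result warning.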

I ran the regression test suite and didn't see any additional failures.

OK?

libstdc++-v3/
2019-02-14  Ulrich Drepper  

PR libstdc++/88738
Warn about unused comparisons of shared_ptr/unique_ptr
* include/bits/c++config [_GLIBCXX_NODISCARD]: Define.
* include/bits/shared_ptr.h: Use it for operator ==, !=,
<, <=, >, >= for shared_ptr.
* include/bits/unique_ptr.h: Likewise for unique_ptr.

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 9b2fabd7d76..97bb6db70b1 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -99,6 +99,14 @@
 # define _GLIBCXX_ABI_TAG_CXX11 __attribute ((__abi_tag__ ("cxx11")))
 #endif
 
+// Macro to warn about unused results.
+#if __cplusplus >= 201703L
+# define _GLIBCXX_NODISCARD [[__nodiscard__]]
+#else
+# define _GLIBCXX_NODISCARD
+#endif
+
+
 
 #if __cplusplus
 
diff --git a/libstdc++-v3/include/bits/shared_ptr.h b/libstdc++-v3/include/bits/shared_ptr.h
index 99009ab4f99..d504627d1a0 100644
--- a/libstdc++-v3/include/bits/shared_ptr.h
+++ b/libstdc++-v3/include/bits/shared_ptr.h
@@ -380,37 +380,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // 20.7.2.2.7 shared_ptr comparisons
   template<typename _Tp, typename _Up>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator==(const shared_ptr<_Tp>& __a, const shared_ptr<_Up>& __b) noexcept
 { return __a.get() == __b.get(); }
 
   template<typename _Tp>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator==(const shared_ptr<_Tp>& __a, nullptr_t) noexcept
 { return !__a; }
 
   template<typename _Tp>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator==(nullptr_t, const shared_ptr<_Tp>& __a) noexcept
 { return !__a; }
 
   template<typename _Tp, typename _Up>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator!=(const shared_ptr<_Tp>& __a, const shared_ptr<_Up>& __b) noexcept
 { return __a.get() != __b.get(); }
 
   template<typename _Tp>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator!=(const shared_ptr<_Tp>& __a, nullptr_t) noexcept
 { return (bool)__a; }
 
   template<typename _Tp>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator!=(nullptr_t, const shared_ptr<_Tp>& __a) noexcept
 { return (bool)__a; }
 
   template<typename _Tp, typename _Up>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator<(const shared_ptr<_Tp>& __a, const shared_ptr<_Up>& __b) noexcept
 {
   using _Tp_elt = typename shared_ptr<_Tp>::element_type;
@@ -420,7 +420,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   template<typename _Tp>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator<(const shared_ptr<_Tp>& __a, nullptr_t) noexcept
 {
   using _Tp_elt = typename shared_ptr<_Tp>::element_type;
@@ -428,7 +428,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   template<typename _Tp>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator<(nullptr_t, const shared_ptr<_Tp>& __a) noexcept
 {
   using _Tp_elt = typename shared_ptr<_Tp>::element_type;
@@ -436,47 +436,47 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   template<typename _Tp, typename _Up>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator<=(const shared_ptr<_Tp>& __a, const shared_ptr<_Up>& __b) noexcept
 { return !(__b < __a); }
 
   template<typename _Tp>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator<=(const shared_ptr<_Tp>& __a, nullptr_t) noexcept
 { return !(nullptr < __a); }
 
   template<typename _Tp>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator<=(nullptr_t, const shared_ptr<_Tp>& __a) noexcept
 { return !(__a < nullptr); }
 
   template<typename _Tp, typename _Up>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator>(const shared_ptr<_Tp>& __a, const shared_ptr<_Up>& __b) noexcept
 { return (__b < __a); }
 
   template<typename _Tp>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator>(const shared_ptr<_Tp>& __a, nullptr_t) noexcept
 { return nullptr < 

Implement C++20 feature P0600r1: nodiscard in the library

2019-01-20 Thread Ulrich Drepper
Since a previous patch introduced the _GLIBCXX_NODISCARD macro it is now
simple to implement the rest of P0600.  The parts specific to C++20 were
already added, this patch adds the attribute to the other functions.

Even though the feature specifies the nodiscard attribute only for C++20
it makes no sense to restrict its use.  Code which provokes the warning
but uses C++17 also has a problem.  I do not propose to go beyond C++17
yet.  Just as for the shared_ptr comparison patch, let's first see how
things go and then we can extend the attribute to previous revisions.

If a code base is compiled with multiple compilers, chances are that no
problem will be flagged.  Other library implementations are much more
aggressive with the use of this attribute.  In the gcc code base only one
test case was flagged, and that code sequence obviously should never appear
in real code (allocate and discard the result).

The test suite shows no additional errors and given that only the
functions mentioned in the feature (allocate, empty, new, async) are
changed I think this is a patch which can be applied to the trunk.  It
can catch real mistakes.
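
A typical mistake of the kind this can catch is calling empty() where
clear() was meant.  The sketch below shows the attribute on a toy
container; example_stack and EXAMPLE_NODISCARD are hypothetical and only
illustrate where the attribute goes, they are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro; the attribute is only used for C++17 and up.
#if __cplusplus >= 201703L
# define EXAMPLE_NODISCARD [[nodiscard]]
#else
# define EXAMPLE_NODISCARD
#endif

// Toy fixed-capacity container used only for illustration.
class example_stack
{
  int data_[8];
  std::size_t size_ = 0;

public:
  // Silently ignores pushes beyond the capacity to keep the sketch short.
  void push(int v) { if (size_ < 8) data_[size_++] = v; }

  void clear() noexcept { size_ = 0; }

  // A bare "s.empty();" statement is diagnosed in C++17 mode.
  EXAMPLE_NODISCARD bool empty() const noexcept { return size_ == 0; }
};
```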

OK?

gcc/testsuite/ChangeLog
2019-02-20  Ulrich Drepper  

Fix after P0600.
* g++.dg/init/new39.C: Don't just ignore result of new.

libstdc++/ChangeLog
2019-02-20  Ulrich Drepper  

Implement C++20 P0600r1.
* include/backward/hash_map: Add nodiscard attribute to empty.
* include/backward/hash_set: Likewise.
* include/backward/hashtable.h: Likewise.
* include/bits/basic_string.h: Likewise.
* include/bits/forward_list.h: Likewise.
* include/bits/hashtable.h: Likewise.
* include/bits/regex.h: Likewise.
* include/bits/stl_deque.h: Likewise.
* include/bits/stl_list.h: Likewise.
* include/bits/stl_map.h: Likewise.
* include/bits/stl_multimap.h: Likewise.
* include/bits/stl_multiset.h: Likewise.
* include/bits/stl_queue.h: Likewise.
* include/bits/stl_set.h: Likewise.
* include/bits/stl_stack.h: Likewise.
* include/bits/stl_tree.h: Likewise.
* include/bits/stl_vector.h: Likewise.
* include/bits/unordered_map.h: Likewise.
* include/bits/unordered_set.h: Likewise.
* include/debug/array: Likewise.
* include/experimental/any: Likewise.
* include/experimental/bits/fs_path.h: Likewise.
* include/experimental/internet: Likewise.
* include/experimental/string_view: Likewise.
* include/ext/pb_ds/detail/bin_search_tree_/info_fn_imps.hpp:
Likewise.
* include/ext/pb_ds/detail/binary_heap_/binary_heap_.hpp:
Likewise.
* include/ext/pb_ds/detail/binary_heap_/info_fn_imps.hpp:
Likewise.
* include/ext/pb_ds/detail/cc_hash_table_map_/cc_ht_map_.hpp:
Likewise.
* include/ext/pb_ds/detail/cc_hash_table_map_/info_fn_imps.hpp:
Likewise.
* include/ext/pb_ds/detail/cc_hash_table_map_/size_fn_imps.hpp:
Likewise.
* include/ext/pb_ds/detail/gp_hash_table_map_/gp_ht_map_.hpp:
Likewise.
* include/ext/pb_ds/detail/gp_hash_table_map_/info_fn_imps.hpp:
Likewise.
* include/ext/pb_ds/detail/left_child_next_sibling_heap_/info_fn_imps.hpp:
Likewise.
* include/ext/pb_ds/detail/left_child_next_sibling_heap_/left_child_next_sibling_heap_.hpp:
Likewise.
* include/ext/pb_ds/detail/list_update_map_/info_fn_imps.hpp:
Likewise.
* include/ext/pb_ds/detail/list_update_map_/lu_map_.hpp:
Likewise.
* include/ext/pb_ds/detail/ov_tree_map_/info_fn_imps.hpp:
Likewise.
* include/ext/pb_ds/detail/ov_tree_map_/ov_tree_map_.hpp:
Likewise.
* include/ext/pb_ds/detail/pat_trie_/info_fn_imps.hpp:
Likewise.
* include/ext/pb_ds/detail/pat_trie_/pat_trie_.hpp:
Likewise.
* include/ext/pb_ds/detail/rc_binomial_heap_/rc.hpp:
Likewise.
* include/ext/pb_ds/detail/tree_trace_base.hpp: Likewise.
* include/ext/pb_ds/trie_policy.hpp: Likewise.
* include/ext/rope: Likewise.
* include/ext/slist: Likewise.
* include/ext/vstring.h: Likewise.
* include/profile/array: Likewise.
* include/std/array: Likewise.
* include/tr1/array: Likewise.
* include/tr1/hashtable.h: Likewise.
* include/tr1/regex: Likewise.
* include/tr2/dynamic_bitset: Likewise.
* include/bits/alloc_traits.h: Add nodiscard attribute to
allocate.
* include/experimental/memory_resource: Likewise.
* include/ext/alloc_traits.h: Likewise.
* include/ext/array_allocator.h: Likewise.
* include/ext/bitmap_allocator.h: Likewise.
* include/ext/debug_allocator.h: Likewise.
* include/ext/extptr_allocator.h: Likewise.
* include/ext/mt_allocator.h: Likewise.
* i

incorrect parsing of -fopt-info

2019-01-21 Thread Ulrich Drepper
There is a problem with parsing the second part of the -fopt-info
command line parameter in case there is an equal sign followed by a
filename with a dash:

$ g++ -c -O -fopt-info-all=some-file u.cc
cc1plus: warning: unknown option ‘all=some’ in ‘-fopt-info-all=some-file’
cc1plus: error: unrecognized command line option ‘-fopt-info-all=some-file’

The code looks for a '-' and a '=' at the same time but does not ignore the
'-' if it is part of the filename specified after the '='.  The patch
below fixes this.  I also changed the second 'if' into 'else if'.  The
second test can only matter when the first one fails, but the current code
makes that unnecessarily cumbersome to see.
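
The difference can be reproduced outside the compiler with a small
sketch.  The two functions below are hypothetical helpers that copy only
the delimiter search before and after the change; they are not the GCC
function opt_info_switch_p_1, and they assume the string passed in starts
at the first option name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Delimiter search as in the code before the change: a '-' inside the
// file name is mistaken for the end of the option name.
inline std::size_t example_token_length_old(const char* ptr)
{
  const char* end_ptr = std::strchr(ptr, '-');
  const char* eq_ptr = std::strchr(ptr, '=');

  if (eq_ptr && !end_ptr)
    end_ptr = eq_ptr;
  if (!end_ptr)
    end_ptr = ptr + std::strlen(ptr);
  return static_cast<std::size_t>(end_ptr - ptr);
}

// Delimiter search with the fix: a '=' that comes before the first '-'
// ends the option name, so dashes in the file name are ignored.
inline std::size_t example_token_length_new(const char* ptr)
{
  const char* end_ptr = std::strchr(ptr, '-');
  const char* eq_ptr = std::strchr(ptr, '=');

  if (eq_ptr && (!end_ptr || eq_ptr < end_ptr))
    end_ptr = eq_ptr;
  else if (!end_ptr)
    end_ptr = ptr + std::strlen(ptr);
  return static_cast<std::size_t>(end_ptr - ptr);
}
```

For "all=some-file" the old search yields a token of length 8
("all=some", as in the warning above), while the new one yields 3 ("all").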

This is a highly annoying bug in the right circumstances.  I have file
names generated from the source file name, and in some situations those
include dashes.

OK for trunk?

gcc/ChangeLog
2019-01-21  Ulrich Drepper  

* dumpfile.c (opt_info_switch_p_1): Ignore '-' if it appears
after the '='.


diff --git a/gcc/dumpfile.c b/gcc/dumpfile.c
index c92bba8efd1..14b6dfea75e 100644
--- a/gcc/dumpfile.c
+++ b/gcc/dumpfile.c
@@ -1915,10 +1915,9 @@ opt_info_switch_p_1 (const char *arg, dump_flags_t *flags,
   end_ptr = strchr (ptr, '-');
   eq_ptr = strchr (ptr, '=');
 
-  if (eq_ptr && !end_ptr)
+  if (eq_ptr && (!end_ptr || eq_ptr < end_ptr))
 end_ptr = eq_ptr;
-
-  if (!end_ptr)
+  else if (!end_ptr)
end_ptr = ptr + strlen (ptr);
   length = end_ptr - ptr;
 




JIT breakage after last builtin-types change

2015-09-30 Thread Ulrich Drepper
After some recent additions to builtin-types.def the jit user of the
definitions hasn't been updated.  OK to apply?
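
For readers unfamiliar with the pattern: a definitions file like this is
expanded several times, and every user has to define each macro the file
uses before expanding it.  The sketch below shows the idea with a macro
list instead of a separate .def file; all EXAMPLE_ names are hypothetical
and not taken from GCC.

```cpp
#include <cassert>

// Hypothetical stand-in for a .def file.  Each entry uses a different
// macro, and every consumer must define all of them before expanding.
#define EXAMPLE_TYPES_DEF \
  EXAMPLE_TYPE_VAR_5(EXAMPLE_BT_FN_A) \
  EXAMPLE_TYPE_VAR_6(EXAMPLE_BT_FN_B) \
  EXAMPLE_TYPE_VAR_7(EXAMPLE_BT_FN_C)

enum example_builtin_type
{
  // Leaving out the _VAR_6 definition here would leave an unexpanded
  // call-like token in the enumerator list and break the build.
#define EXAMPLE_TYPE_VAR_5(NAME) NAME,
#define EXAMPLE_TYPE_VAR_6(NAME) NAME,
#define EXAMPLE_TYPE_VAR_7(NAME) NAME,
  EXAMPLE_TYPES_DEF
#undef EXAMPLE_TYPE_VAR_5
#undef EXAMPLE_TYPE_VAR_6
#undef EXAMPLE_TYPE_VAR_7
  EXAMPLE_BT_LAST
};
```

Adding a new kind of entry to the list therefore requires touching every
consumer, which is what went missing for the jit.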


2015-09-30  Ulrich Drepper  

* jit-builtins.c: Provide definition of DEF_FUNCTION_TYPE_VAR_6.
* jit-builtins.h: Likewise.


 jit-builtins.c |5 +
 jit-builtins.h |3 +++
 2 files changed, 8 insertions(+)

diff --git a/gcc/jit/jit-builtins.c b/gcc/jit/jit-builtins.c
index a29f446..8a89915 100644
--- a/gcc/jit/jit-builtins.c
+++ b/gcc/jit/jit-builtins.c
@@ -320,6 +320,10 @@ builtins_manager::make_type (enum jit_builtin_type type_id)
 #define DEF_FUNCTION_TYPE_VAR_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
   case ENUM: return make_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, \
  ARG4, ARG5);
+#define DEF_FUNCTION_TYPE_VAR_6(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+   ARG6) \
+  case ENUM: return make_fn_type (ENUM, RETURN, 1, 6, ARG1, ARG2, ARG3, \
+ ARG4, ARG5, ARG6);
 #define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
ARG6, ARG7) \
   case ENUM: return make_fn_type (ENUM, RETURN, 1, 7, ARG1, ARG2, ARG3, \
@@ -350,6 +354,7 @@ builtins_manager::make_type (enum jit_builtin_type type_id)
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
 #undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
diff --git a/gcc/jit/jit-builtins.h b/gcc/jit/jit-builtins.h
index fdf1323..8854326 100644
--- a/gcc/jit/jit-builtins.h
+++ b/gcc/jit/jit-builtins.h
@@ -50,6 +50,8 @@ enum jit_builtin_type
 #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
 #define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
NAME,
+#define DEF_FUNCTION_TYPE_VAR_6(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+   ARG6) NAME,
 #define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
ARG6, ARG7) NAME,
 #define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
@@ -72,6 +74,7 @@ enum jit_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
 #undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE


Re: JIT breakage after last builtin-types change

2015-09-30 Thread Ulrich Drepper
Except that this is missing:

diff --git a/gcc/jit/jit-builtins.h b/gcc/jit/jit-builtins.h
index 0b6f974..3d76247 100644
--- a/gcc/jit/jit-builtins.h
+++ b/gcc/jit/jit-builtins.h
@@ -72,6 +72,7 @@ enum jit_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
 #undef DEF_POINTER_TYPE
   BT_LAST


On Wed, Sep 30, 2015 at 9:09 AM, Jakub Jelinek  wrote:
> On Wed, Sep 30, 2015 at 09:05:45AM -0400, Ulrich Drepper wrote:
>> After some recent additions to builtin-types.def the jit user of the
>> definitions hasn't been updated.  OK to apply?
>>
>>
>> 2015-09-30  Ulrich Drepper  
>>
>>   * jit-builtins.c: Provide definition of DEF_FUNCTION_TYPE_VAR_6.
>>   * jit-builtins.h: Likewise.
>
> https://gcc.gnu.org/viewcvs?rev=228289&root=gcc&view=rev should fix this
> already.
>
> Jakub


[PATCH] fix URL

2015-03-02 Thread Ulrich Drepper
A trivial patch to fix the URL of my paper referenced in the
documentation.  The RH server isn't the canonical address even though it
will automatically redirect to the correct address.  At least for now,
who knows.

OK?


gcc/ChangeLog

2015-03-02  Ulrich Drepper  

* doc/invoke.texi (Options for Code Generation Conventions):
Fix URL of DSO paper.


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a87376e..6e7cc82 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -23644,7 +23644,7 @@ GCC@.
 A good explanation of the benefits offered by ensuring ELF
 symbols have the correct visibility is given by ``How To Write
 Shared Libraries'' by Ulrich Drepper (which can be found at
-@w{@uref{http://people.redhat.com/~drepper/}})---however a superior
+@w{@uref{http://www.akkadia.org/drepper/}})---however a superior
 solution made possible by this option to marking things hidden when
 the default is public is to make the default hidden and mark things
 public.  This is the norm with DLLs on Windows and with @option{-fvisibility=hidden}


Re: Thinking about libgccjit SONAME bump for gcc 5.2 (was Re: Four jit backports to gcc 5 branch)

2015-06-29 Thread Ulrich Drepper
On Mon, Jun 29, 2015 at 5:26 PM, David Malcolm  wrote:
> I'm looking at ways to manage libgccjit API/ABI as per this thread:
> https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01982.html
> by transitioning to using symbol versioning, so that the linker can tag
> subsets of libgccjit symbols in both libgccjit and in client binaries.

You don't have to bump the SONAME to introduce symbol versioning.
glibc in the beginning didn't have symbol versioning and we wrote the
linker and dynamic linker support so that no SONAME change was
necessary.  The idea is that unversioned symbols are satisfied by the
oldest symbol version.


Re: [i386] Replace builtins with vector extensions

2014-06-28 Thread Ulrich Drepper
On Sat, Jun 28, 2014 at 6:42 AM, Marc Glisse  wrote:
> Ping,
>
> nobody has an opinion on this? Or some explanation why I am mistaken to
> believe that #pragma target makes it safer now?
>
> It would enable a number of optimizations, like constant propagation, FMA
> contraction, etc. It would also allow us to remove several builtins.

I see no problem with using the array-type access to the registers.

As for replacing the builtins with arithmetic operators: I appreciate
the possibility for optimization.  But is there any chance the calls
could not end up being implemented with a vector instruction?  I think
that would be bad.  The intrinsics should be a way to guarantee that
the programmer can create vector instructions.  Otherwise we might
just not support them.


Re: [i386] Replace builtins with vector extensions

2014-06-29 Thread Ulrich Drepper
On Sat, Jun 28, 2014 at 6:53 PM, Marc Glisse  wrote:
> There is always a risk, but then even with builtins I think there was a
> small risk that an RTL optimization would mess things up. It is indeed
> higher if we expose the operation to the optimizers earlier, but it would be
> a bug if an "optimization" replaced a vector operation by something worse.
> Also, I am only proposing to handle the most trivial operations this way,
> not more complicated ones (like v[0]+=s) where we would be likely to fail
> generating the right instruction. And the pragma should ensure that the
> function will always be compiled in a mode where the vector instruction is
> available.
>
> ARM did the same and I don't think I have seen a bug reporting a regression
> about it (I haven't really looked though).

I think the Arm definitions come from a different angle.  They are new,
and there are no assumed semantics.  For the x86 intrinsics Intel defines
that _mm_xxx() generates one of a given set of opcodes if there is a match.
If I want to generate a specific code sequence I use the intrinsics.
Otherwise I could already today use the vector type semantics myself.
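
For reference, "using the vector type semantics" directly looks roughly
like the sketch below.  It relies on the GCC/Clang vector_size extension;
example_v4sf and example_add_ps are hypothetical names and not the
definitions from the intrinsics headers.  Nothing in it guarantees which
instruction, if any, the compiler emits.

```cpp
#include <cassert>

// GCC/Clang vector extension: four floats in one 16-byte vector.
typedef float example_v4sf __attribute__((vector_size(16)));

// Element-wise addition written as plain arithmetic.  The optimizers
// see an ordinary '+' here instead of an opaque builtin call.
inline example_v4sf example_add_ps(example_v4sf a, example_v4sf b)
{ return a + b; }
```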

Don't get me wrong, I like the idea of having the intrinsics optimized.
But perhaps not unconditionally, or at least not without a way to
prevent it.

I know this will look ugly, but how about a macro
__GCC_X86_HONOR_INTRINSICS to enable the current code and have by
default your proposed use of the vector arithmetic in place?  This
wouldn't allow removing support for the built-ins but it would also
open the door to some more risky optimizations to be enabled by
default.


Re: [PATCH, libstdc++] Add the logistic distribution as an extension

2014-07-10 Thread Ulrich Drepper
On Thu, Jul 10, 2014 at 4:07 AM, Ed Smith-Rowland <3dw...@verizon.net> wrote:
> The title says it all.
>
> I've been bootstrapping and testing with this on x86_64-linux for a month.
>
> OK?

Looks good to me.


[PATCH] libstdc++: add uniform on sphere distribution

2014-07-12 Thread Ulrich Drepper
Ed's submission of the logistic distribution caused problems for me
because, like Ed, I have had changes to the <ext/random> header in my
tree for a long time.  Time to submit them.

This first one is a new distribution.  It generates coordinates for
random points on a unit sphere in arbitrarily many dimensions.  This
distribution by itself is useful but if I get some other code fully
implemented it will also form the basis for yet another, more
sophisticated distribution.
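
The idea can be sketched independently of the patch.  The function below
assumes the usual approach of drawing one standard normal variate per
dimension and scaling the vector to length one; example_point_on_sphere
is a hypothetical free function, not the class added by the patch.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Returns a random point on the unit sphere in Dim dimensions by
// normalizing a vector of independent standard normal variates.
template<std::size_t Dim, typename Real = double, typename Urng>
std::array<Real, Dim>
example_point_on_sphere(Urng& urng)
{
  static_assert(Dim != 0, "dimension is zero");

  std::normal_distribution<Real> normal(Real(0), Real(1));
  std::array<Real, Dim> point;
  Real sum_sq;

  // Retry in the (practically impossible) case of an all-zero vector,
  // which cannot be normalized.
  do
    {
      sum_sq = Real(0);
      for (auto& coord : point)
        {
          coord = normal(urng);
          sum_sq += coord * coord;
        }
    }
  while (sum_sq == Real(0));

  const Real norm = std::sqrt(sum_sq);
  for (auto& coord : point)
    coord /= norm;
  return point;
}
```

A call such as example_point_on_sphere<3>(gen) with a std::mt19937 gen
returns three coordinates whose squares sum to one, up to rounding.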

The patch is tested against the current tree without causing additional
problems.

OK?


2014-07-12  Ulrich Drepper  

* include/ext/random: Add uniform_on_sphere_distribution definition.
* include/ext/random.tcc: Add out-of-band member function definitions
for uniform_on_sphere_distribution.
* libstdc++-v3/testsuite/ext/random/uniform_on_sphere_distribution/
cons/default.cc: New file.
* libstdc++-v3/testsuite/ext/random/uniform_on_sphere_distribution/
operators/equal.cc: New file.
* libstdc++-v3/testsuite/ext/random/uniform_on_sphere_distribution/
operators/inequal.cc: New file.
* libstdc++-v3/testsuite/ext/random/uniform_on_sphere_distribution/
operators/serialize.cc: New file.
* libstdc++-v3/testsuite/ext/random/uniform_on_sphere_distribution/
operators/values.cc: New file.


diff --git a/libstdc++-v3/include/ext/random b/libstdc++-v3/include/ext/random
--- a/libstdc++-v3/include/ext/random
+++ b/libstdc++-v3/include/ext/random
@@ -36,6 +36,7 @@
 #else
 
 #include 
+#include 
 #include 
 #include 
 #ifdef __SSE2__
@@ -3293,6 +3294,196 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   const logistic_distribution<_RealType1>& __d2)
 { return !(__d1 == __d2); }
 
+
+  /**
+   * @brief A distribution for random coordinates on a unit sphere.
+   *
+   * The method used in the generation function is attributed by Donald Knuth
+   * to G. W. Brown, Modern Mathematics for the Engineer (1956).
+   */
+  template
+class uniform_on_sphere_distribution
+{
+  static_assert(std::is_floating_point<_RealType>::value,
+   "template argument not a floating point type");
+  static_assert(_Dimen != 0, "dimension is zero");
+
+public:
+  /** The type of the range of the distribution. */
+  typedef std::array<_RealType, _Dimen> result_type;
+  /** Parameter type. */
+  struct param_type
+  {
+   explicit
+   param_type()
+   { }
+
+   friend bool
+   operator==(const param_type& __p1, const param_type& __p2)
+   { return true; }
+  };
+
+  /**
+   * @brief Constructs a uniform on sphere distribution.
+   */
+  explicit
+  uniform_on_sphere_distribution()
+  : _M_param(), _M_n(_RealType(0), _RealType(1))
+  { }
+
+  explicit
+  uniform_on_sphere_distribution(const param_type& __p)
+  : _M_param(__p), _M_n(_RealType(0), _RealType(1))
+  { }
+
+  /**
+   * @brief Resets the distribution state.
+   */
+  void
+  reset()
+  { }
+
+  /**
+   * @brief Returns the parameter set of the distribution.
+   */
+  param_type
+  param() const
+  { return _M_param; }
+
+  /**
+   * @brief Sets the parameter set of the distribution.
+   * @param __param The new parameter set of the distribution.
+   */
+  void
+  param(const param_type& __param)
+  { _M_param = __param; }
+
+  /**
+   * @brief Returns the greatest lower bound value of the distribution.
+   * This function makes no sense for this distribution.
+   */
+  result_type
+  min() const
+  {
+   result_type __res;
+   __res.fill(0);
+   return __res;
+  }
+
+  /**
+   * @brief Returns the least upper bound value of the distribution.
+   * This function makes no sense for this distribution.
+   */
+  result_type
+  max() const
+  {
+   result_type __res;
+   __res.fill(0);
+   return __res;
+  }
+
+  /**
+   * @brief Generating functions.
+   */
+  template<typename _UniformRandomNumberGenerator>
+   result_type
+   operator()(_UniformRandomNumberGenerator& __urng)
+   { return this->operator()(__urng, _M_param); }
+
+  template<typename _UniformRandomNumberGenerator>
+   result_type
+   operator()(_UniformRandomNumberGenerator& __urng,
+  const param_type& __p);
+
+  template<typename _ForwardIterator, typename _UniformRandomNumberGenerator>
+   void
+   __generate(_ForwardIterator __f, _ForwardIterator __t,
+  _UniformRandomNumberGenerator& __urng)
+   { this->__generate(__f, __t, __urng, this->param()); }
+
+  template<typename _ForwardIterator, typename _UniformRandomNumberGenerator>
+   void
+   __generate(_ForwardIterator __f, _ForwardIterator __t,
+  _UniformRandomNumberGenerator& __urng,
+  const param_type& __p)
+   { this->__generate_impl(__f, __t, _

Re: [PATCH] libstdc++: add uniform on sphere distribution

2014-07-12 Thread Ulrich Drepper
On Sat, Jul 12, 2014 at 10:00 PM, Ulrich Drepper  wrote:
> The patch is tested against the current tree without causing additional
> problems.
>
> OK?

Ignore the values.cc test case, it's a left-over from copying existing
tests and modifying them for a new distribution.  This test does not
apply to the new distribution and therefore the entire file is gone.


Re: [PATCH] libstdc++: add uniform on sphere distribution

2014-07-13 Thread Ulrich Drepper
On Sun, Jul 13, 2014 at 5:24 AM, Paolo Carlini  wrote:
> are these dummy implementations intended?

Yes.  There is no state.  The only parameter is the dimensionality
which is a template parameter.


Re: [PATCH] libstdc++: add uniform on sphere distribution

2014-07-13 Thread Ulrich Drepper
On Sun, Jul 13, 2014 at 9:55 AM, Ed Smith-Rowland <3dw...@verizon.net> wrote:
> So I would just serialize _M_n here.

It has fixed parameters. This would mean unnecessary work.  When you
try to use the parameter of the sphere distribution the normal
distribution will be reset.  So there really is no need here.

The only problem would be if code couldn't handle the operators not
writing/reading anything.  But I haven't seen anything like that.


Re: [PATCH] libstdc++: add uniform on sphere distribution

2014-07-13 Thread Ulrich Drepper
On Sun, Jul 13, 2014 at 11:43 AM, Paolo Carlini
 wrote:
> and I think: the normal distributions in x and y do have a non-trivial state
> (_M_saved, _M_saved_available) which, at any given moment, is different in x
> and y. Then the trivial inserter of x is called and the trivial extractor of
> y is called, nothing changes in y. I don't see how the following invocations
> of y(g) can produce the same sequence of numbers that would be produced by
> invocations of x(g).

Remember: we are talking about distributions, not RNGs.

The distribution has no parameters so given the same input (i.e.,
random byte sequences) it will create the same output all the time.


Re: [PATCH] libstdc++: add uniform on sphere distribution

2014-07-13 Thread Ulrich Drepper
On Sun, Jul 13, 2014 at 12:07 PM, Paolo Carlini
 wrote:
> Sorry, I still don't get it. When operator() of x and y, two
> uniform_on_sphere_distribution, call _M_n(__urng) and those _M_n have a
> different state, the numbers produced are in general different.

Correct.  But in the case of this distribution once you have the
random numbers the remainder of the work is done by a fixed formula:

   v = (N(0,1), ..., N(0,1))

   result = v / ||v||_2

That's it, nothing else to be done.

If you have two calls of operator() of two different uniform_on_sphere
objects and you pass to each an RNG object in exactly the same state
you will get the same result.
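
The formula above can be sketched in a few lines.  This is only an
illustration, not the libstdc++ implementation: the helper name
sample_on_sphere and the redraw of an all-zero vector are assumptions made
for the sketch.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Hypothetical helper: draw one point on the unit sphere in Dim dimensions
// by normalizing a vector of N(0,1) draws.
template<std::size_t Dim, typename Urng>
std::array<double, Dim>
sample_on_sphere(std::normal_distribution<double>& normal, Urng& urng)
{
  static_assert(Dim > 0, "need at least one dimension");
  // The formula assumes a standard normal distribution.
  assert(normal.mean() == 0.0 && normal.stddev() == 1.0);

  std::array<double, Dim> v{};
  double sum_sq = 0.0;
  do
    {
      sum_sq = 0.0;
      for (double& x : v)
        {
          x = normal(urng);        // v = (N(0,1), ..., N(0,1))
          sum_sq += x * x;
        }
    }
  while (sum_sq == 0.0);           // redraw the very unlikely zero vector

  const double norm = std::sqrt(sum_sq);
  for (double& x : v)
    x /= norm;                     // result = v / ||v||_2
  return v;
}
```

Two freshly constructed normal distributions fed by two engines in the same
state give the same point, which is the case described above.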


Re: [PATCH] libstdc++: add uniform on sphere distribution

2014-07-13 Thread Ulrich Drepper
On Sun, Jul 13, 2014 at 12:25 PM, Paolo Carlini
 wrote:
> I don't think so. It depends on the past of the two different
> uniform_on_sphere: each time each uniform_on_sphere calls _M_n(__urng) the
> state of *its own* _M_n changes, evolves from the initial state.

I indeed should use the normal_distribution operator<< and operator>>
but I think for a different reason than you think.  The way the
normal_distribution is implemented produces two values at a time and
saves the second for a later call.  So, yes, that implicit state has
to be preserved and I should have followed what Ed said.

But your 4th and 7th call example by itself is not a reason.  Again,
the input is exclusively determined by the random numbers.  Here, of
course, the 4th and 7th use will produce different results.  But this
is not what the state of the distribution is supposed to capture.  For
that you'll have to save the state of the RNG as well.


I've checked in a patch to save the _M_n state.

Thanks.
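
A small sketch of why the implicit state matters, using only the standard
stream operators of std::normal_distribution.  The helper name
clone_via_stream is made up for this example; it is not part of the patch.

```cpp
#include <cassert>
#include <random>
#include <sstream>

// Copy a distribution's full state (parameters plus any cached value)
// through its operator<< and operator>>.
template<typename Dist>
Dist clone_via_stream(const Dist& src)
{
  std::stringstream ss;
  ss << src;            // write parameters and internal state
  Dist dst;
  ss >> dst;            // restore them into a fresh object
  assert(!ss.fail());
  return dst;
}
```

If x has already produced a value and may hold a saved second one, a copy
made this way compares equal to x and continues with the same output when
both are driven by engines in the same state.  A containing distribution
that skips this step for its normal_distribution member loses that
guarantee.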


Re: [PATCH] libstdc++: add uniform on sphere distribution

2014-07-16 Thread Ulrich Drepper
On Wed, Jul 16, 2014 at 11:01 AM, Paolo Carlini
 wrote:
> Right. And reset too. I'm going to test and apply the below.

Sorry for not reacting to this, I was away from the machines I could
have done that work on.


Re: [PATCH] libstdc++: add uniform on sphere distribution

2014-07-24 Thread Ulrich Drepper
On Wed, Jul 23, 2014 at 6:29 AM, Jonathan Wakely  wrote:
> As an aside, we already have divide-by-zero bugs in , it
> would be nice if someone could look at that.

I'll take a look at this soon.


various _mm512_set* intrinsics

2014-03-27 Thread Ulrich Drepper
Here are more intrinsics that are missing.  I know that gcc currently
generates horrible code for most of them but I think it's more important
to have the API in place, albeit non-optimal.  Maybe this entices
someone to add the necessary optimizations.

The code is self-contained and shouldn't interfere with any correct
code.  Should this also go into 4.9?
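
To make the element order of the set4/setr4 variants concrete, here is a
plain-array model that runs without AVX-512 hardware.  The names lanes16,
model_set4_epi32 and model_setr4_epi32 are invented for this sketch; it
only mirrors the initializer order used in the patch and is not the
intrinsic itself.

```cpp
#include <array>
#include <cstddef>

// Model of a 512-bit vector of sixteen 32-bit lanes; lane 0 is the lowest
// element.
using lanes16 = std::array<int, 16>;

// Models _mm512_set4_epi32: the arguments run from the highest lane of each
// group of four down to the lowest, so d lands in lane 0.
inline lanes16 model_set4_epi32(int a, int b, int c, int d)
{
  const int group[4] = { d, c, b, a };
  lanes16 v{};
  for (std::size_t i = 0; i < v.size(); ++i)
    v[i] = group[i % 4];
  return v;
}

// Models _mm512_setr4_epi32: same pattern with the arguments reversed, so
// e0 lands in lane 0.
inline lanes16 model_setr4_epi32(int e0, int e1, int e2, int e3)
{
  return model_set4_epi32(e3, e2, e1, e0);
}
```

With this model, model_set4_epi32 (10, 20, 30, 40) has 40 in lane 0 and 10
in lane 3, and the pattern repeats every four lanes.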

2014-03-27  Ulrich Drepper  

* config/i386/avx512fintrin.h (__v32hi): Define type.
(__v64qi): Likewise.
(_mm512_set1_epi8): Define.
(_mm512_set1_epi16): Define.
(_mm512_set4_epi32): Define.
(_mm512_set4_epi64): Define.
(_mm512_set4_pd): Define.
(_mm512_set4_ps): Define.
(_mm512_setr4_epi64): Define.
(_mm512_setr4_epi32): Define.
(_mm512_setr4_pd): Define.
(_mm512_setr4_ps): Define.
(_mm512_setzero_epi32): Define.

diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 9602866..314895a 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -39,6 +39,8 @@ typedef double __v8df __attribute__ ((__vector_size__ (64)));
 typedef float __v16sf __attribute__ ((__vector_size__ (64)));
 typedef long long __v8di __attribute__ ((__vector_size__ (64)));
 typedef int __v16si __attribute__ ((__vector_size__ (64)));
+typedef short __v32hi __attribute__ ((__vector_size__ (64)));
+typedef char __v64qi __attribute__ ((__vector_size__ (64)));
 
 /* The Intel API is flexible enough that we must allow aliasing with other
vector types, and their scalar components.  */
@@ -130,6 +132,32 @@ _mm512_undefined_si512 (void)
   return __Y;
 }
 
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_epi8 (char __A)
+{
+  return __extension__ (__m512i)(__v64qi)
+{ __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A };
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_epi16 (short __A)
+{
+  return __extension__ (__m512i)(__v32hi)
+{ __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A,
+  __A, __A, __A, __A, __A, __A, __A, __A };
+}
+
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_set1_pd (double __A)
@@ -152,6 +180,54 @@ _mm512_set1_ps (float __A)
 (__mmask16) -1);
 }
 
+/* Create the vector [A B C D A B C D A B C D A B C D].  */
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set4_epi32 (int __A, int __B, int __C, int __D)
+{
+  return __extension__ (__m512i)(__v16si)
+{ __D, __C, __B, __A, __D, __C, __B, __A,
+  __D, __C, __B, __A, __D, __C, __B, __A };
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set4_epi64 (long long __A, long long __B, long long __C,
+  long long __D)
+{
+  return __extension__ (__m512i) (__v8di)
+{ __D, __C, __B, __A, __D, __C, __B, __A };
+}
+
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set4_pd (double __A, double __B, double __C, double __D)
+{
+  return __extension__ (__m512d)
+{ __D, __C, __B, __A, __D, __C, __B, __A };
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set4_ps (float __A, float __B, float __C, float __D)
+{
+  return __extension__ (__m512)
+{ __D, __C, __B, __A, __D, __C, __B, __A,
+  __D, __C, __B, __A, __D, __C, __B, __A };
+}
+
+#define _mm512_setr4_epi64(e0,e1,e2,e3) \
+  _mm512_set4_epi64(e3,e2,e1,e0)
+
+#define _mm512_setr4_epi32(e0,e1,e2,e3) \
+  _mm512_set4_epi32(e3,e2,e1,e0)
+
+#define _mm512_setr4_pd(e0,e1,e2,e3) \
+  _mm512_set4_pd(e3,e2,e1,e0)
+
+#define _mm512_setr4_ps(e0,e1,e2,e3) \
+  _mm512_set4_ps(e3,e2,e1,e0)
+
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_setzero_ps (void)
@@ -169,6 +245,13 @@ _mm512_setzero_pd (void)
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_setzero_epi32 (void)
+{
+  return __extension__ (__m512i)(__v8di){ 0, 0, 0, 0, 0, 0, 0, 0 };
+}
+
+extern __inline __m512i
+__attribute__ ((__gn

re-build problem with ln -s

2014-12-01 Thread Ulrich Drepper
I think the jit patches introduced a problem when you rebuild within a
directory that contains an old build (i.e., no brand new build
directory).  The gcc/Makefile creates a symlink for xgcc with the full
driver name without first removing the symlink.  The right procedure
(gathered from other Makefiles) seems to be to just use rm first.  This
is what the patch below does.

In addition I took the liberty of changing the Makefile to use $(LN_S)
instead of $(LN) -s.

OK?
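
The failure mode can be reproduced outside of make.  The following sketch
uses std::filesystem (C++17) instead of the shell commands; the function
names are made up for the illustration.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Mirrors "ln -s target link" alone: fails if link is left over from an
// earlier build.
inline bool link_only(const fs::path& target, const fs::path& link)
{
  std::error_code ec;
  fs::create_symlink(target, link, ec);
  return !ec;
}

// Mirrors "rm -f link" followed by "ln -s target link": safe to repeat.
inline bool remove_then_link(const fs::path& target, const fs::path& link)
{
  std::error_code ec;
  fs::remove(link, ec);            // a missing link is not an error
  if (ec)
    return false;
  fs::create_symlink(target, link, ec);
  return !ec;
}
```

Calling link_only twice on the same name fails the second time, while
remove_then_link can be repeated as often as the rule is re-run.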


diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index c42c2e4..eaf3ee8 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2014-12-01  Ulrich Drepper  
+
+   * Makefile.in: Use LN_S instead of ln -s and remove file first
+   if it exists.
+
 2014-12-01  Segher Boessenkool  
 
* combine.c (try_combine): Use is_parallel_of_n_reg_sets some more.
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 204bd85..60cfa54 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1545,7 +1545,8 @@ MOSTLYCLEANFILES = insn-flags.h insn-config.h insn-codes.h \
 # from within the *build* directory, for use when running the JIT library
 # from there (e.g. when running its testsuite).
 $(FULL_DRIVER_NAME): ./xgcc
-   $(LN) -s $< $@
+   rm -f $@
+   $(LN_S) $< $@
 
 #
 # Language makefile fragments.


[PATCH] allow passing argument to the JIT linker

2014-12-04 Thread Ulrich Drepper
If you generate code with the JIT which references outside symbols there
is currently no way to have a self-contained DSO created.  The command
line to invoke the linker is fixed.

The patch below would change that.  It builds upon the existing
framework to specify options for the compiler.  The linker optimization
flag fits fully into the existing functionality.  For additional files
to link with I had to extend the mechanism a bit since it is not just
one string that needs to be remembered.

I've also added the set_str_option member function to the C++ interface
of the library.  That must have been an oversight.


What do you think?


gcc/ChangeLog:

2014-12-05  Ulrich Drepper  

* jit/libgccjit++.h (context): Add missing set_str_option
member function.

* jit/libgccjit.h (gcc_jit_int_option): Add
GCC_JIT_INT_OPTION_LINK_OPTIMIZATION_LEVEL.
(gcc_jit_str_option): Add GCC_JIT_STR_OPTION_LINKFILE.
* jit/jit-playback.c (convert_to_dso): Use auto_vec instead
of fixed-sized array for arguments.  Define ADD_ARG macro
to add to it.  Adjust existing code.  Additionally add
optimization level and additional link files to the list.
* jit/jit-playback.h (context::get_linkfiles): New member
function.
* jit/jit-recording.c (recording::context::set_str_option):
Handle GCC_JIT_STR_OPTION_LINKFILE.
* jit/jit-recording.h (recording::context::set_str_option):
Add get_linkfiles member function.

diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
index ecdae80..9c4e45f 100644
--- a/gcc/jit/jit-playback.c
+++ b/gcc/jit/jit-playback.c
@@ -1726,18 +1726,19 @@ convert_to_dso (const char *ctxt_progname)
  TV_ASSEMBLE.  */
   auto_timevar assemble_timevar (TV_ASSEMBLE);
   const char *errmsg;
-  const char *argv[7];
+  auto_vec <const char *> argvec;
+#define ADD_ARG(arg) argvec.safe_push (arg)
   int exit_status = 0;
   int err = 0;
   const char *gcc_driver_name = GCC_DRIVER_NAME;
 
-  argv[0] = gcc_driver_name;
-  argv[1] = "-shared";
+  ADD_ARG (gcc_driver_name);
+  ADD_ARG ("-shared");
   /* The input: assembler.  */
-  argv[2] = m_path_s_file;
+  ADD_ARG (m_path_s_file);
   /* The output: shared library.  */
-  argv[3] = "-o";
-  argv[4] = m_path_so_file;
+  ADD_ARG ("-o");
+  ADD_ARG (m_path_so_file);
 
   /* Don't use the linker plugin.
  If running with just a "make" and not a "make install", then we'd
@@ -1746,17 +1747,39 @@ convert_to_dso (const char *ctxt_progname)
  libto_plugin is a .la at build time, with it becoming installed with
  ".so" suffix: i.e. it doesn't exist with a .so suffix until install
  time.  */
-  argv[5] = "-fno-use-linker-plugin";
+  ADD_ARG ("-fno-use-linker-plugin");
+
+  /* Linker int options.  */
+  switch (get_int_option (GCC_JIT_INT_OPTION_LINK_OPTIMIZATION_LEVEL))
+{
+default:
+  add_error (NULL,
+"unrecognized linker optimization level: %i",
+get_int_option (GCC_JIT_INT_OPTION_LINK_OPTIMIZATION_LEVEL));
+  return;
+
+case 0:
+  break;
+
+case 1:
+  ADD_ARG ("-Wl,-O");
+  break;
+}
+
+  const char *elt;
+  const auto_vec<const char *>& linkfiles = get_linkfiles();
+  for (unsigned ix = 0; linkfiles.iterate(ix, &elt); ++ix)
+ADD_ARG (elt);
 
   /* pex argv arrays are NULL-terminated.  */
-  argv[6] = NULL;
+  ADD_ARG (NULL);
 
   /* pex_one's error-handling requires pname to be non-NULL.  */
   gcc_assert (ctxt_progname);
 
   errmsg = pex_one (PEX_SEARCH, /* int flags, */
gcc_driver_name,
-   const_cast <char *const *> (argv),
+   const_cast <char *const *> (argvec.address ()),
ctxt_progname, /* const char *pname */
NULL, /* const char *outname */
NULL, /* const char *errname */
@@ -1783,6 +1806,7 @@ convert_to_dso (const char *ctxt_progname)
 getenv ("PATH"));
   return;
 }
+#undef ADD_ARG
 }
 
 /* Top-level hook for playing back a recording context.
diff --git a/gcc/jit/jit-playback.h b/gcc/jit/jit-playback.h
index 02f08ba..2726347 100644
--- a/gcc/jit/jit-playback.h
+++ b/gcc/jit/jit-playback.h
@@ -175,6 +175,12 @@ public:
 return m_recording_ctxt->get_bool_option (opt);
   }
 
+  const auto_vec<const char *> &
+  get_linkfiles () const
+  {
+return m_recording_ctxt->get_linkfiles ();
+  }
+
   builtins_manager *get_builtins_manager () const
   {
 return m_recording_ctxt->get_builtins_manager ();
diff --git a/gcc/jit/jit-recording.c b/gcc/jit/jit-recording.c
index 82ec399..a6d64f9 100644
--- a/gcc/jit/jit-recording.c
+++ b/gcc/jit/jit-recording.c
@@ -827,7 +827,17 @@ recording::context::set_str_option (enum gcc_jit_str_option opt,
 "unrecognized (enum gcc_jit_str_option) value: %i", opt);
   ret

Re: [PATCH] allow passing argument to the JIT linker

2014-12-05 Thread Ulrich Drepper
On Fri, Dec 5, 2014 at 2:24 PM, David Malcolm  wrote:
> What's the use-case here?  Sorry if I'm not getting at what this is for.

The use case is that a program wants to use library functions,
something common, not everything is self-contained and linked in
automatically (like libc).  Currently you would have to rely on the
fact that a DSO can be created with dangling references which are
expected to be somehow fulfilled at runtime. There are multiple
problems with this:

First, even if the application using the JIT itself is linked against
a library which the JIT-generated code wants to use it is a problem if
the definitions are accidentally found.  If the library with the
desired function in question uses symbol versioning the JIT-created
DSO would have just an ordinary UNDEF entry for the symbol with no
symbol version available.  This then means that at runtime the
/oldest/ version is picked.  That's not what you want in this case.

Second, if you implement some form of extension language where the
language allows referencing functions in other DSOs, you'd have to
either use dlopen(RTLD_GLOBAL) in the main app (evil, never use
RTLD_GLOBAL) or you'd have to implicitly have the generated code use
dlopen() and then dlsym().  That's cumbersome at best and also slow.


On the other hand, with an option as proposed the code generator could
simply record the dependency and have the DSO automatically used at
link-time and runtime, creating the correct references etc.
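
For comparison, this is roughly what the dlopen()/dlsym() route looks like
for a single function.  It is a sketch only: the library name libm.so.6 is
a Linux/glibc assumption and the helper name is invented.

```cpp
#include <dlfcn.h>
#include <optional>

// Look up and call cos() by hand at run time instead of having the DSO
// carry a proper link-time dependency.
inline std::optional<double> cos_via_dlsym(double x)
{
  void* handle = dlopen("libm.so.6", RTLD_LAZY | RTLD_LOCAL);
  if (handle == nullptr)
    return std::nullopt;                 // library not available

  using cos_fn = double (*)(double);
  auto fn = reinterpret_cast<cos_fn>(dlsym(handle, "cos"));

  std::optional<double> result;
  if (fn != nullptr)
    result = fn(x);                      // call through the looked-up pointer

  dlclose(handle);
  return result;
}
```

Every imported function needs this kind of lookup and error handling, which
is the overhead a recorded link-time dependency avoids.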


> Is the "self-containedness of the DSO" in your patch aimed at ensuring
> that libpng.so.N gets unloaded when fake.so is unloaded?

The unloading part is a nice additional benefit.  It's mostly about
making it easy and quick to call any function from any available DSO
without having to know which DSOs are needed at the time the
application using the JIT is linked.


> One issue here is the lifetime of str options; currently str options
> simply record the const char *, without taking a copy of the underlying
> buffer.  We might need to change this to make it take a strdup of the
> option, to avoid nasty surprises if someone calls set_str_option with a
> std::string and has it auto-coerced to a const char * from under them.

I'm fine with that, I just followed what you did so far.  If you want
it done this way I'll add this to the patch.
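
The lifetime issue can be shown with a stand-in for the option storage.
Everything named here (str_option, option_store) is hypothetical and only
models the idea of copying the string; the sketch also maps a null pointer
to an empty string, which is a simplification.

```cpp
#include <array>
#include <string>

// Hypothetical stand-in for the context's string options.
enum str_option { OPT_PROGNAME, NUM_STR_OPTIONS };

class option_store
{
public:
  // Copies the characters, so the caller's buffer may go away afterwards.
  void set_str_option(str_option opt, const char* value)
  {
    m_values[opt] = value ? value : "";
  }

  const std::string& get_str_option(str_option opt) const
  {
    return m_values[opt];
  }

private:
  std::array<std::string, NUM_STR_OPTIONS> m_values;
};
```

With a copy taken inside set_str_option, passing the c_str() of a temporary
std::string is harmless; storing only the pointer would leave it dangling
once the temporary is destroyed.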


> New options should be documented in:
>   gcc/jit/docs/topics/contexts.rst in the "Options" section.
> and these ones should probably be mentioned in the subsection on
> GCC_JIT_FUNCTION_IMPORTED in functions.rst.

I was more concerned with the code first... ;-)


> Do you have a sense of what impact setting the option would have on the
> time taken by gcc_jit_context_compile?

It's really not much.  The linker just tries different sizes for a
hash table and picks the size with the least number of conflicts and
therefore hopefully best performance at runtime.  With today's
machines this isn't really noticeable.  Jakub (if you read this), when
did we implement this?  It still might not be a good idea to enable it
by default and, as written, there might be other optimizations which
are implemented.


> This doesn't support nested contexts; presumably this should walk up
> through any parent contexts, adding any linkfiles requested by them?

Nested contexts?  Do you deal with gcc_jit_context structures
recursively?  I must have missed that.  This is just a way to add more
strings (free-form parameters) to the linker command line.  I'm using

   ctxt.set_str_option(GCC_JIT_STR_OPTION_LINKFILE, "-lsomelibrary");

to have fake.so linked against libsomelibrary.so.


> Here's another place where nested contexts may need to be supported: a
> playback context's m_recording_ctxt may have ancestors, and they might
> have linkfiles specified.

This isn't the playback context structure, it's the toplevel
(gccjit::context) one.  As far as I can see there is no hierarchy and
this makes sense.


> I notice that this string option works differently from the others, in
> that it appends to a list, rather than overwriting a value; that would
> need spelling out in the documentation.

Yes, sure, documentation is nothing I've concerned myself with at this point.


> I wondered if this should take a std::string instead of a const char *,
> but a const char * is probably more flexible, given that you can go
> trivially from a std::string to a const char *, but going the other way
> may cost some cycles.

If we want to make copies anyway I think it doesn't matter.  I think
using const char* is easier to use for the reasons you spelled out.


> This descriptive comment needs fleshing out.  For example, are these
> filenames, or SONAMEs?  How does this relate to what a user would pass
> to the linker command line if they were writing a Makefile rather than
> code that's calling into a JIT API?

The strings are supposed to be exactly what you would add to the
linker command line.  No magic.  In fact, the same mechanism ca

[PATCH] libgccjit cleanups

2014-12-06 Thread Ulrich Drepper
This patch is broken out of one I sent earlier with some extensions.  It
contains only little cleanups to the libgccjit code.

When creating the linker command line the code now uses an auto_vec
instead of the fixed size array.

The second change adds the missing context::set_str_option member
function to the C++ interface.

The third change is to the string option handling.  Instead of just
using the pointer passed to the function the code now makes a copy
of the string.
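
The first change follows a common pattern: build the argument list in a
growable container and terminate it with NULL before handing it to an
exec-style call.  In the sketch below std::vector stands in for GCC's
auto_vec, and the function name and the extra_args parameter are
assumptions made for the illustration.

```cpp
#include <string>
#include <vector>

// Build a NULL-terminated argument vector of any length.
inline std::vector<const char*>
build_link_argv(const char* driver, const char* s_file, const char* so_file,
                const std::vector<std::string>& extra_args)
{
  std::vector<const char*> argv;
  argv.push_back(driver);
  argv.push_back("-shared");
  argv.push_back(s_file);                 // the input: assembler
  argv.push_back("-o");
  argv.push_back(so_file);                // the output: shared library
  argv.push_back("-fno-use-linker-plugin");
  for (const std::string& arg : extra_args)
    argv.push_back(arg.c_str());          // valid while extra_args lives
  argv.push_back(nullptr);                // exec-style arrays end with NULL
  return argv;
}
```

Unlike a fixed-size array, nothing has to be renumbered when another
argument is added in the middle.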


OK?


gcc/ChangeLog:

2014-12-06  Ulrich Drepper  

* jit/jit-playback.c (convert_to_dso): Use auto_vec instead
of automatic array to build up command line.
* jit/jit-recording.c (recording::context::set_str_option):
Make copy of the string.
(recording::context::~context): Free string options.
* jit/jit-recording.h (recording::context): Adjust type
of m_str_options member.
* jit/libgccjit.h: Adjust comment about
gcc_jit_context_set_str_option parameter being used after
the call.
* jit/libgccjit++.h (gccjit::context): Add set_str_option
member function.


diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
index ecdae80..6d1eb8a 100644
--- a/gcc/jit/jit-playback.c
+++ b/gcc/jit/jit-playback.c
@@ -1726,18 +1726,19 @@ convert_to_dso (const char *ctxt_progname)
  TV_ASSEMBLE.  */
   auto_timevar assemble_timevar (TV_ASSEMBLE);
   const char *errmsg;
-  const char *argv[7];
+  auto_vec <const char *> argvec;
+#define ADD_ARG(arg) argvec.safe_push (arg)
   int exit_status = 0;
   int err = 0;
   const char *gcc_driver_name = GCC_DRIVER_NAME;
 
-  argv[0] = gcc_driver_name;
-  argv[1] = "-shared";
+  ADD_ARG (gcc_driver_name);
+  ADD_ARG ("-shared");
   /* The input: assembler.  */
-  argv[2] = m_path_s_file;
+  ADD_ARG (m_path_s_file);
   /* The output: shared library.  */
-  argv[3] = "-o";
-  argv[4] = m_path_so_file;
+  ADD_ARG ("-o");
+  ADD_ARG (m_path_so_file);
 
   /* Don't use the linker plugin.
  If running with just a "make" and not a "make install", then we'd
@@ -1746,17 +1747,17 @@ convert_to_dso (const char *ctxt_progname)
  libto_plugin is a .la at build time, with it becoming installed with
  ".so" suffix: i.e. it doesn't exist with a .so suffix until install
  time.  */
-  argv[5] = "-fno-use-linker-plugin";
+  ADD_ARG ("-fno-use-linker-plugin");
 
   /* pex argv arrays are NULL-terminated.  */
-  argv[6] = NULL;
+  ADD_ARG (NULL);
 
   /* pex_one's error-handling requires pname to be non-NULL.  */
   gcc_assert (ctxt_progname);
 
   errmsg = pex_one (PEX_SEARCH, /* int flags, */
gcc_driver_name,
-   const_cast <char *const *> (argv),
+   const_cast <char *const *> (argvec.address ()),
ctxt_progname, /* const char *pname */
NULL, /* const char *outname */
NULL, /* const char *errname */
@@ -1783,6 +1784,7 @@ convert_to_dso (const char *ctxt_progname)
 getenv ("PATH"));
   return;
 }
+#undef ADD_ARG
 }
 
 /* Top-level hook for playing back a recording context.
diff --git a/gcc/jit/jit-recording.c b/gcc/jit/jit-recording.c
--- a/gcc/jit/jit-recording.c
+++ b/gcc/jit/jit-recording.c
@@ -215,6 +215,9 @@ recording::context::~context ()
   delete m;
 }
 
+  for (i = 0; i < GCC_JIT_NUM_STR_OPTIONS; ++i)
+free (m_str_options[i]);
+
   if (m_builtins_manager)
 delete m_builtins_manager;
 
@@ -827,7 +830,7 @@ recording::context::set_str_option (enum gcc_jit_str_option opt,
 "unrecognized (enum gcc_jit_str_option) value: %i", opt);
   return;
 }
-  m_str_options[opt] = value;
+  m_str_options[opt] = xstrdup (value);
 }
 
 /* Set the given integer option for this context, or add an error if
diff --git a/gcc/jit/jit-recording.h b/gcc/jit/jit-recording.h
--- a/gcc/jit/jit-recording.h
+++ b/gcc/jit/jit-recording.h
@@ -246,7 +246,7 @@ private:
   char *m_first_error_str;
   bool m_owns_first_error_str;
 
-  const char *m_str_options[GCC_JIT_NUM_STR_OPTIONS];
+  char *m_str_options[GCC_JIT_NUM_STR_OPTIONS];
   int m_int_options[GCC_JIT_NUM_INT_OPTIONS];
   bool m_bool_options[GCC_JIT_NUM_BOOL_OPTIONS];
 
diff --git a/gcc/jit/libgccjit++.h b/gcc/jit/libgccjit++.h
--- a/gcc/jit/libgccjit++.h
+++ b/gcc/jit/libgccjit++.h
@@ -99,6 +99,9 @@ namespace gccjit
 void dump_to_file (const std::string &path,
   bool update_locations);
 
+void set_str_option (enum gcc_jit_str_option opt,
+const char *value);
+
 void set_int_option (enum gcc_jit_int_option opt,
 int value);
 
@@ -535,6 +538,14 @@ context::dump_to_file (const std::string &path,
 }
 
 inline void
+context::set_str_option (enum gcc_jit_str_option opt,
+const char *value)
+{
+  gcc_jit_context_set_str_opt

[PATCH] influence JIT linker command line

2014-12-06 Thread Ulrich Drepper
This patch supersedes the patch I sent earlier this week to add
dependencies to the linker command line.  The implementation is
different.

First, based on Dave's comment that he wants to keep the interface
simple, to enable the linker optimizations no new interface is added.
Instead optimizations are enabled in the linker whenever the compiler
optimizes, too.  I don't think this will create problems at all since
the time it takes nowadays is really low; it's only really measurable
for extremely large files.

The way to add dependencies is changed.  Instead of allowing an
unstructured string parameter to be added to the command line, the new
proposed interface allows introducing a dependency, possibly with
information about the path where the dependency is found.  This should be
useful and implementable if at some point the explicit linker invocation
is replaced by a library implementation.  The path argument of the new
interface is used differently depending on the name.  If the name is of
the form -l* then the -L option is used and a runpath is added.
Otherwise the path is used to locate the file.
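
The rule for the path argument can be written down as a small function.
This is a sketch of the described behavior, not the code in the patch: the
helper name dependency_args is made up, and it spells the runpath option as
-Wl,-rpath, which is an assumption of this example.

```cpp
#include <optional>
#include <string>
#include <vector>

// Turn one (name, path) pair into linker driver arguments.  Returns
// nothing when the combination is rejected.
inline std::optional<std::vector<std::string>>
dependency_args(const std::string& name,
                const std::optional<std::string>& path)
{
  const bool named_library = name.rfind("-l", 0) == 0;  // starts with "-l"
  const bool has_slash = name.find('/') != std::string::npos;

  // A path only makes sense together with a simple name.
  if (has_slash && (named_library || path))
    return std::nullopt;

  std::vector<std::string> args;
  if (named_library)
    {
      args.push_back(name);
      if (path)
        {
          args.push_back("-L" + *path);           // found at link time
          args.push_back("-Wl,-rpath," + *path);  // found at run time
        }
    }
  else if (path)
    args.push_back(*path + "/" + name);           // locate the file
  else
    args.push_back(name);                         // file name used as given
  return args;
}
```

For example, the pair ("-lfoo", "/opt/lib") yields the library name plus a
search path and a runpath, while ("libfoo.so", "/opt/lib") yields the single
file name /opt/lib/libfoo.so.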


Comments?

gcc/ChangeLog:

2014-12-06  Ulrich Drepper  

* jit/jit-recording.c (recording::context::add_dependency):
New function.
(recording::context::~context): Free newly added lists.
* jit/jit-recording.h (recording::context): Add new
member functions.
* jit/libgccjit++.h (context): Add add_dependency member
function.
* jit/libgccjit.h: Declare gcc_jit_context_add_dependency.
* jit/libgccjit.c: Define gcc_jit_context_add_dependency.
* jit/libgccjit.map: Add gcc_jit_context_add_dependency.
* jit/jit-playback.c (convert_to_dso): Add dependencies
and library path arguments to the command line.
* docs/topics/contexts.rst: Document new interface.


diff -u b/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
--- b/gcc/jit/jit-playback.c
+++ b/gcc/jit/jit-playback.c
@@ -1749,6 +1749,18 @@
  time.  */
   ADD_ARG ("-fno-use-linker-plugin");
 
+  /* Linker optimization.  We always tell the linker to optimize if the
+ compiler is optimizing, too.  */
+  if (get_int_option (GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL) > 0)
+ADD_ARG ("-Wl,-O");
+
+  const char *s;
+  for (unsigned i = 0; (s = get_dependency (i)); ++i)
+ADD_ARG (s);
+
+  for (unsigned i = 0; (s = get_library_path (i)); ++i)
+ADD_ARG (s);
+
   /* pex argv arrays are NULL-terminated.  */
   ADD_ARG (NULL);
 
diff -u b/gcc/jit/jit-recording.c b/gcc/jit/jit-recording.c
--- b/gcc/jit/jit-recording.c
+++ b/gcc/jit/jit-recording.c
@@ -218,6 +218,17 @@
   for (i = 0; i < GCC_JIT_NUM_STR_OPTIONS; ++i)
 free (m_str_options[i]);
 
+  char *s;
+  FOR_EACH_VEC_ELT (m_dependencies, i, s)
+{
+  free (s);
+}
+
+  FOR_EACH_VEC_ELT (m_library_path, i, s)
+{
+  free (s);
+}
+
   if (m_builtins_manager)
 delete m_builtins_manager;
 
@@ -871,7 +882,66 @@
   m_bool_options[opt] = value ? true : false;
 }
 
-/* This mutex guards gcc::jit::recording::context::compile, so that only
+/* Add the given library to the set of dependencies, or add an error
+   if it's not recognized.
+
+   Implements the post-error-checking part of
+   gcc_jit_context_add_dependency.  */
+void
+recording::context::add_dependency (const char *name, int flags,
+   const char *path)
+{
+  if (name == NULL)
+{
+  add_error (NULL, "NULL library name");
+  return;
+}
+  /* So far no flags are defined.  */
+  if (flags != 0)
+{
+  add_error (NULL,
+"unrecognized flags value: %i", flags);
+  return;
+}
+
+  bool named_library = strncmp (name, "-l", 2);
+
+  if (strchr (name, '/') != NULL && (named_library || path != NULL))
+{
+  add_error (NULL,
+"path must be NULL unless simple file name is used");
+  return;
+}
+  if (named_library == 0 || path == NULL)
+{
+  m_dependencies.safe_push (xstrdup (name));
+
+  if (named_library)
+   {
+ char *v;
+ asprintf (&v, "-Wl,-R,%s -L %s", path, path);
+ if (v == NULL)
+   {
+ add_error (NULL, "cannot allocate memory");
+ return;
+   }
+ m_library_path.safe_push (v);
+   }
+}
+  else
+{
+  char *v;
+  asprintf (&v, "%s/%s", path, name);
+  if (v == NULL)
+   {
+ add_error (NULL, "cannot allocate memory");
+ return;
+   }
+  m_dependencies.safe_push (v);
+}
+}
+
+  /* This mutex guards gcc::jit::recording::context::compile, so that only
one thread can be accessing the bulk of GCC's state at once.  */
 
 static pthread_mutex_t jit_mutex = PTHREAD_MUTEX_INITIALIZER;
diff -u b/gcc/jit/jit-recording.h b/gcc/jit/jit-recordin

Re: PATCH to add -std=c++14

2014-03-09 Thread Ulrich Drepper
On Sat, Mar 8, 2014 at 8:30 PM, Mike Stump  wrote:
> Are they any plans to change the default language for C++?

Probably problematic for compatibility reasons.  But how about adding
c++11, g++11, c++14, and g++14 wrappers similar to c99?


Re: [PATCH] x86: Define _mm*_undefined_*

2014-03-17 Thread Ulrich Drepper
On Mon, Mar 17, 2014 at 7:39 AM, Ilya Tocar  wrote:
> Do you know of any cases where xor is
> generated (except for destination in gather/scatter)

I don't have any code exhibiting this handy right now.  I'll keep an eye out.


>  but it also clobbers
> flags. Maybe just define it to setzero for now?

What do you mean by "clobbers flags"?  Do you have an example?


Re: [PATCH] x86: Define _mm*_undefined_*

2014-03-18 Thread Ulrich Drepper
On Tue, Mar 18, 2014 at 7:13 AM, Richard Biener
 wrote:
> extern __inline __m512
> __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> _mm512_undefined_ps (void)
> {
>   __m512 __Y = __Y;
>   return __Y;
> }


This provokes no warnings (as you wrote) and it doesn't clobber flags,
but it doesn't avoid loading.  The code below creates a pxor for the
parameter.  That's what I think compiler support should help to get
rid of.  If the compiler has some magic to recognize -1 masks then
this will help in some situations but it seems to be a specific
implementation for the intrinsics while I've been looking at a generic
solution.


typedef double __m128d __attribute__ ((__vector_size__ (16), __may_alias__));

void g(__m128d);

extern __inline __m128d
__attribute__((__gnu_inline__, __always_inline__, __artificial__, const))
_mm_undefined_pd(void) {
  __m128d v = v;
  return v;
}

void
f()
{
  g(_mm_undefined_pd());
}


Re: [PATCH] x86: Define _mm*_undefined_*

2014-03-18 Thread Ulrich Drepper
On Tue, Mar 18, 2014 at 11:11 AM, Richard Biener
 wrote:
> Btw, without this zeroing (where zero is also "undefined") this may
> be an information leak and thus possibly a security issue?  That is,
> how is _mm_undefined_pd () specified?

People aren't accidentally using the _mm*_undefined_*() functions.  I
don't think that information leak should be a concern here.  Changing
the default behavior of

   TYPE VAR = VAR;

might be a problem.  There might be code which depends on the current behavior.


[PATCH] x86: _mm512_set1_p[sd]

2014-03-19 Thread Ulrich Drepper
Another set of functions missing are those to set all elements of a
512-bit vector to the same float or double value.  I think the patch
below uses the optimal code sequence for that.  The patch requires the
previous patch introducing _mm*_undefined_*.


2014-03-19  Ulrich Drepper  

* config/i386/avx512fintrin.h: Define _mm512_set1_ps and
_mm512_set1_pd.


diff -u b/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
--- b/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -130,6 +130,28 @@
   return __Y;
 }
 
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_pd (double __A)
+{
+  return (__m512d) __builtin_ia32_broadcastsd512 (__extension__
+ (__v2df) { __A, },
+ (__v8df)
+ _mm512_undefined_pd (),
+ (__mmask8) -1);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_ps (float __A)
+{
+  return (__m512) __builtin_ia32_broadcastss512 (__extension__
+(__v4sf) { __A, },
+(__v16sf)
+_mm512_undefined_ps (),
+(__mmask16) -1);
+}
+
 extern __inline __m512
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_setzero_ps (void)


Re: [PATCH] x86: _mm512_set1_p[sd]

2014-03-24 Thread Ulrich Drepper
On Mon, Mar 24, 2014 at 1:50 AM, Kirill Yukhin  wrote:
> Your patch is correct IMHO, but maybe it is worth adding all missing
> `mm512_set1*' stuff?
>
> According to trunk and [1] we're still missing (beside mentioned by you)
> _mm512_set1_epi16 and  _mm512_set1_epi8 broadcasts.

Yes, more are missing, but I think those will need new builtins.  The
_ps and _pd don't require additional instructions.

_mm512_set1_epi16 might have to map to vpbroadcastw. _mm512_set1_epi8
might have to map to vpbroadcastb.  I haven't seen a way to generate
those instructions if needed and so this work was out of scope for now
due to time constraints.  I agree, they should be added as quickly as
possible to avoid releasing headers with incomplete APIs.

What is the verdict on checking these changes in?  Too late for the
next release?


Re: [PATCH] x86: _mm*_undefined_* (for real)

2014-03-24 Thread Ulrich Drepper
On Mon, Mar 24, 2014 at 2:31 AM, Kirill Yukhin  wrote:
> If list of missing intrinsics is big - maybe you could share it? I can
> help you implementing it.

So far only the set1 intrinsics.  I'll see whether I can spot more.


> In general, I think _undefined idea is correct and the patch is doing most
> important thing - it localizes undef semantics in couple of built-ins.
> However I don't know which code is optimal to model undef behaviour.

Indeed, that's my main objective for now.  Then someone with more
knowledge of the gcc internals could experiment with the code
generation and only have to change a few places.  I could make one
more change, if wanted, and reduce the number of affected locations to
just three.  We would only need a builtin to create, say,
_mm_undefined_si128.  The other two 128-bit values could be created
using a cast which gcc allows just fine.  Should I do that?  Looks a
bit less clean but helps with maintainability IMO.

In general, as for the other patch, too late for the next release?


Re: [PATCH] x86: _mm512_set1_p[sd]

2014-03-25 Thread Ulrich Drepper
On Mon, Mar 24, 2014 at 9:09 AM, Jakub Jelinek  wrote:
> The following is recognized well:
>
> typedef char v32qi __attribute__((vector_size (32)));
> v32qi foo (char a)
> {
>   return (v32qi) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, 
> a, a, a, a, a, a, a, a, a, a, a, a, a };
> }


Perhaps "well" but not optimal.  The created code is

vmovd         %edi, %xmm0
vpbroadcastb  %xmm0, %xmm0
vinserti128   $1, %xmm0, %ymm0, %ymm0

It should generate for AVX2

vmovd         %edi, %xmm0
vpbroadcastb  %xmm0, %ymm0


Re: [PATCH] libgccjit cleanups

2014-12-10 Thread Ulrich Drepper
On Mon, Dec 8, 2014 at 11:36 AM, David Malcolm  wrote:
> Thanks.  Overall this is good, a few nitpicks inline below:

I've made the changes and checked in the patch.


Re: GCC 5 Status Report (2015-01-19), Trunk in Stage 4

2015-01-19 Thread Ulrich Drepper
On Mon, Jan 19, 2015 at 12:32 PM, Jonathan Wakely  wrote:
> I would like to commit these two patches which complete the C++11
> library implementation:

I would definitely be in favor.


> https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01694.html

Just a nit.  Why wouldn't you check the value of the variable after
the assignment in the test case?


Re: [PATCH] libstdc++: add uniform on sphere distribution

2014-08-08 Thread Ulrich Drepper
Jonathan Wakely  writes:

> On 23/07/14 11:58 +0200, Marc Glisse wrote:
> As an aside, we already have divide-by-zero bugs in , it
> would be nice if someone could look at that.
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60037

Sorry, it took a while to get back to this and now the referenced bug is
already fixed.  Good.  Now also for a fix of the sphere distribution.
Unless someone objects I'll check in the patch below.


2014-08-08  Ulrich Drepper  

* include/ext/random.tcc
(uniform_on_sphere_distribution::__generate_impl): Reject
vectors with norm zero.


diff --git a/libstdc++-v3/include/ext/random.tcc b/libstdc++-v3/include/ext/random.tcc
index 05361d8..d1f0b9c 100644
--- a/libstdc++-v3/include/ext/random.tcc
+++ b/libstdc++-v3/include/ext/random.tcc
@@ -1548,13 +1548,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 const param_type& __p)
   {
result_type __ret;
-   _RealType __sum = _RealType(0);
+   _RealType __norm;
+
+   do
+ {
+   _RealType __sum = _RealType(0);
+
+   std::generate(__ret.begin(), __ret.end(),
+ [&__urng, &__sum, this](){
+   _RealType __t = _M_nd(__urng);
+   __sum += __t * __t;
+   return __t; });
+   __norm = std::sqrt(__sum);
+ }
+   while (__norm == _RealType(0));
 
-   std::generate(__ret.begin(), __ret.end(),
- [&__urng, &__sum, this](){ _RealType __t = _M_nd(__urng);
-__sum += __t * __t;
-return __t; });
-   auto __norm = std::sqrt(__sum);
std::transform(__ret.begin(), __ret.end(), __ret.begin(),
   [__norm](_RealType __val){ return __val / __norm; });
 


Re: [PATCH] libstdc++: add uniform on sphere distribution

2014-08-09 Thread Ulrich Drepper
On Sat, Aug 9, 2014 at 3:15 AM, Marc Glisse  wrote:
> While there, do we want to also reject infinite norms?
> I would have done: while (__sum < small || __sum > large)
> but testing exactly for 0 and infinity seems good enough.

I guess the squaring can theoretically overflow and produce infinity.
It will never happen with the way we generate normally distributed
numbers, though.  These values are always so unlikely that it is OK
that the algorithms cannot return them.  If you insist I'll add a test
for infinity.

The other change (which would eliminate the necessity for this test in
a special case) is to use hypot for _Dimen==2.  This might be a case
common enough to warrant that little bit of extra text.  I'll prepare
a patch.
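
The rejection loop discussed in this thread can be sketched in portable C++ roughly as follows.  This is only an illustration under stated assumptions, not the libstdc++ code: point_on_sphere is a made-up name, and the sketch uses std::normal_distribution directly instead of the distribution's internal _M_nd member.  It draws normally distributed coordinates and retries whenever the norm is zero or not finite.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <random>

// Hypothetical helper (not part of libstdc++): returns a point on the
// unit sphere in Dim dimensions.
template <std::size_t Dim, typename Real = double, typename Urng>
std::array<Real, Dim> point_on_sphere(Urng& urng)
{
  std::normal_distribution<Real> nd;
  std::array<Real, Dim> ret;
  Real norm;

  do
    {
      Real sum = Real(0);
      for (auto& x : ret)
        {
          x = nd(urng);
          sum += x * x;
        }
      norm = std::sqrt(sum);
    }
  // Retry on a zero norm (division by zero) or an overflowed sum.
  while (norm == Real(0) || !std::isfinite(norm));

  for (auto& x : ret)
    x /= norm;
  return ret;
}
```

A call such as point_on_sphere<3>(gen) with a std::mt19937 gen yields a vector whose squared components sum to 1 up to rounding.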


Re: [PATCH] libstdc++: add uniform on sphere distribution

2014-08-09 Thread Ulrich Drepper
On Sat, Aug 9, 2014 at 8:34 AM, Marc Glisse  wrote:
> Oh, a comment saying exactly what you just said would be fine with me (or
> even nothing).

We might at some point use a different method than Box-Muller sampling
so I'm OK with the test.


> If you are going to specialize for dim 2, I imagine you won't be computing
> normal distributions, you will only generate a point uniformly in a square
> and reject it if it is not in the ball? (interestingly enough this is used
> as a subroutine by the implementation of normal_distribution)

We need to be *on* the circle, not inside.  We'll still have to follow
the algorithm unless I miss something.  With reasonable probability we
cannot generate those numbers directly from a uniform source. What is
optimized is just the norm computation.


Re: [PATCH] libstdc++: add uniform on sphere distribution

2014-08-09 Thread Ulrich Drepper
Marc Glisse  writes:

> On Sat, 9 Aug 2014, Ulrich Drepper wrote:
> Yes, you still need the normalization step (divide by the norm).

I guess we can do this.

How about the patch below?  Instead of specializing the entire class for
_Dimen==2 I've added a class at the implementation level.

I've also improved existing tests and added some new ones.


2014-08-09  Ulrich Drepper  

* include/ext/random.tcc (uniform_on_sphere_helper): Define.
(uniform_on_sphere_distribution::operator()): Use the new helper
class for the implementation.

* testsuite/ext/random/uniform_on_sphere_distribution/operators/
equal.cc: Remove bogus part of comment.
* testsuite/ext/random/uniform_on_sphere_distribution/operators/
inequal.cc: Likewise.
* testsuite/ext/random/uniform_on_sphere_distribution/operators/
serialize.cc: Add check to verify result of serialization and
deserialization.
* testsuite/ext/random/uniform_on_sphere_distribution/operators/
generate.cc: New file.


diff --git a/libstdc++-v3/include/ext/random.tcc b/libstdc++-v3/include/ext/random.tcc
index 05361d8..d536ecb 100644
--- a/libstdc++-v3/include/ext/random.tcc
+++ b/libstdc++-v3/include/ext/random.tcc
@@ -1540,6 +1540,83 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
 
+  namespace {
+
+// Helper class for the uniform_on_sphere_distribution generation
+// function.
+template
+  class uniform_on_sphere_helper
+  {
+   typedef typename uniform_on_sphere_distribution<_Dimen, 
_RealType>::result_type result_type;
+
+  public:
+   template
+   result_type operator()(_NormalDistribution& __nd,
+  _UniformRandomNumberGenerator& __urng)
+{
+ result_type __ret;
+ typename result_type::value_type __norm;
+
+ do
+   {
+ auto __sum = _RealType(0);
+
+ std::generate(__ret.begin(), __ret.end(),
+   [&__nd, &__urng, &__sum](){
+ _RealType __t = __nd(__urng);
+ __sum += __t * __t;
+ return __t; });
+ __norm = std::sqrt(__sum);
+   }
+ while (__norm == _RealType(0) || ! std::isfinite(__norm));
+
+ std::transform(__ret.begin(), __ret.end(), __ret.begin(),
+[__norm](_RealType __val){ return __val / __norm; });
+
+ return __ret;
+}
+  };
+
+
+template
+  class uniform_on_sphere_helper<2, _RealType>
+  {
+   typedef typename uniform_on_sphere_distribution<2, _RealType>::
+ result_type result_type;
+
+  public:
+   template
+   result_type operator()(_NormalDistribution&,
+  _UniformRandomNumberGenerator& __urng)
+{
+ result_type __ret;
+ _RealType __sq;
+ std::__detail::_Adaptor<_UniformRandomNumberGenerator,
+ _RealType> __aurng(__urng);
+
+ do
+   {
+ __ret[0] = __aurng();
+ __ret[1] = __aurng();
+
+ __sq = __ret[0] * __ret[0] + __ret[1] * __ret[1];
+   }
+ while (__sq == _RealType(0) || __sq > _RealType(1));
+
+ // Yes, we do not just use sqrt(__sq) because hypot() is more
+ // accurate.
+ auto __norm = std::hypot(__ret[0], __ret[1]);
+ __ret[0] /= __norm;
+ __ret[1] /= __norm;
+
+ return __ret;
+}
+  };
+
+  }
+
+
   template
 template
   typename uniform_on_sphere_distribution<_Dimen, _RealType>::result_type
@@ -1547,18 +1624,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   operator()(_UniformRandomNumberGenerator& __urng,
 const param_type& __p)
   {
-   result_type __ret;
-   _RealType __sum = _RealType(0);
-
-   std::generate(__ret.begin(), __ret.end(),
- [&__urng, &__sum, this](){ _RealType __t = _M_nd(__urng);
-__sum += __t * __t;
-return __t; });
-   auto __norm = std::sqrt(__sum);
-   std::transform(__ret.begin(), __ret.end(), __ret.begin(),
-  [__norm](_RealType __val){ return __val / __norm; });
-
-   return __ret;
+uniform_on_sphere_helper<_Dimen, _RealType> __helper;
+return __helper(_M_nd, __urng);
   }
 
   template
diff --git a/libstdc++-v3/testsuite/ext/random/uniform_on_sphere_distribution/operators/equal.cc b/libstdc++-v3/testsuite/ext/random/uniform_on_sphere_distribution/operators/equal.cc
index 35a024e..f5b8d17 100644
--- a/libstdc++-v3/testsuite/ext/random/uniform_on_sphere_distribution/operators/equal.cc
+++ b/libstdc++-v3/testsuite/ext/random/uniform_on_sphere_distribution/operators/equal.cc

Re: [PATCH] libstdc++: add uniform on sphere distribution

2014-08-09 Thread Ulrich Drepper
On Sat, Aug 9, 2014 at 1:40 PM, Marc Glisse  wrote:
> __x = result_type(2.0) * __aurng() - 1.0;

You're right, we of course need the negatives as well.

> Assuming the 2 coordinates are obtained through a rescaling x->2*x-1, if
> __sq is not exactly 0, it must be between 2^-103 and 1 (for ieee
> double), so I am not sure hypot gains that much (at least in my mind
> hypot was mostly a gain close to 0 or infinity, but maybe it has more
> advantages). It can only hurt speed though, so not a big issue.

Depending on how similar in size the two values are, not using hypot()
can drop quite a few bits.  Especially with the scaling through
division this error can be noticeable.  Better be sure.  Maybe at some
point I have time to investigate the worst case scenario for the
numbers in question but until this shows hypot isn't needed it's best
to leave it in.

I've committed the patch.
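
For the two-dimensional case discussed here, a minimal sketch might look like the code below.  It is an illustration only: point_on_circle is a hypothetical name, and std::uniform_real_distribution over [-1, 1) stands in for the rescaled internal adaptor.  A point is drawn in the square, rejected if it is the origin or lies outside the unit disc, and then normalized with std::hypot.

```cpp
#include <array>
#include <cmath>
#include <random>

// Hypothetical helper (not the libstdc++ implementation).
template <typename Urng>
std::array<double, 2> point_on_circle(Urng& urng)
{
  // Both signs are needed, hence the range [-1, 1).
  std::uniform_real_distribution<double> coord(-1.0, 1.0);
  double x, y, sq;

  do
    {
      x = coord(urng);
      y = coord(urng);
      sq = x * x + y * y;
    }
  while (sq == 0.0 || sq > 1.0);

  // hypot() computes the norm without forming x*x + y*y explicitly.
  const double norm = std::hypot(x, y);
  return {x / norm, y / norm};
}
```

Whether hypot() is measurably more accurate than sqrt(sq) for inputs in this range is exactly the open question raised in the message; the sketch simply keeps hypot().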


Re: SH atomic asms in glibc and the stack pointer

2011-12-02 Thread Ulrich Drepper
On Tue, Nov 29, 2011 at 17:44, Kaz Kojima  wrote:
> Uli, could you please approve the libc patch?

Has the gcc patch been committed?


Re: [PATCH][Aarch64] Add vectorized mersenne twister

2017-06-06 Thread Ulrich Drepper
On Tue, Jun 6, 2017 at 12:07 PM, James Greenhalgh
 wrote:
> We're a good number of years late to do that without causing some pain.

Well, it's pain for those who deserve it.  Who thought it to be a
smart idea to pollute the global namespace?

It's a one-time deal.


> So we have a few solutions to choose from, each of which invokes a trade-off:
>
>   1 Use the current names and pollute the namespace.

IMO unacceptable.


>   2 Use the GCC internal __builtin_aarch64* names and tie libstdc++ to GCC
> internals.

Maybe.


>   3 Define a new set of namespace-clean names and complicate the Neon
> intrinsic interface while we migrate old users to new names.

See Jonathan's proposal.  I never suggested that those who don't care
about namespace pollution would have to change their code.  Add
appropriate aliases.

There is perhaps number 4:

- use the x86-64 intrinsics which you map to aarch64 intrinsics.
Isn't this compatibility layer planned anyway?  I don't know whether
everything maps 1-to-1 and you don't lose performance but you could
this way use the arch-specific code I wrote a long time ago for
x86-64.


Re: [fortran, RFC] Getting rid of unneeded functions in libgfortran

2017-07-10 Thread Ulrich Drepper
On Mon, Jul 10, 2017 at 8:43 PM, Thomas Koenig  wrote:
> with the bump in the libfortran version that is needed with
> Paul's patch,

Isn't it time to start thinking about ABI compatibility just as for
the other libraries?


Re: [PATCH v2][Aarch64] Add vectorized mersenne twister

2017-07-18 Thread Ulrich Drepper
On Tue, Jul 18, 2017 at 7:57 AM, Michael Collison
 wrote:
> This is the second version of a patch for Aarch64 to add a vectorized mersenne
> twister to libstdc++. The first version used intrinsics and included
> "arm_neon.h". After feedback from the community this version uses only GCC
> vector extensions and Aarch64 simd data types.

Looks OK.  Just stylistically, why do you have

+#ifdef __ARM_NEON
+#ifdef __aarch64__

(in more than one place) instead of one preprocessor line?
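
The single-line form asked about would presumably look like the sketch below.  The names kUseNeonPath and use_neon_path are invented for illustration; only the two predefined macros come from the patch under review.

```cpp
// One preprocessor condition instead of two nested #ifdef lines.
#if defined(__ARM_NEON) && defined(__aarch64__)
constexpr bool kUseNeonPath = true;   // AArch64 with Advanced SIMD
#else
constexpr bool kUseNeonPath = false;  // every other target
#endif

// Hypothetical query function so callers do not repeat the condition.
constexpr bool use_neon_path() { return kUseNeonPath; }
```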


[PATCH] trivial cleanup in dwarf2out.c

2017-07-21 Thread Ulrich Drepper
While looking through dwarf2out.c I came across this if expression where,
supposedly, the LEB128 encoding is used if a DWARF version before 5 is in use.
This of course cannot be the case.  There isn't really a deeper problem
since the entire block is guarded by a test for at least DWARF 5.

I propose the following patch.

[gcc/ChangeLog]

2017-07-21  Ulrich Drepper  

* dwarf2out.c (output_file_names): Avoid double testing for
dwarf_version >= 5.

--- gcc/dwarf2out.c 2017-07-21 06:15:26.993826963 +0200
+++ gcc/dwarf2out.c-new 2017-07-21 10:29:03.382742797 +0200
@@ -11697,7 +11697,7 @@ output_file_names (void)
   output_line_string (str_form, filename0, "File Entry", 0);

   /* Include directory index.  */
-  if (dwarf_version >= 5 && idx_form != DW_FORM_udata)
+  if (idx_form != DW_FORM_udata)
dw2_asm_output_data (idx_form == DW_FORM_data1 ? 1 : 2,
 0, NULL);
   else


Re: [PATCH] Optimize BB sorting in domwalk

2017-07-24 Thread Ulrich Drepper
Not commenting on the correctness... but

On Mon, Jul 24, 2017 at 1:29 PM, Alexander Monakov  wrote:
> +  basic_block bb0 = bbs[0], bb1 = bbs[1];
> +  if (bb_postorder[bb0->index] < bb_postorder[bb1->index])
> +   bbs[0] = bb1, bbs[1] = bb0;
> +}
> +  else if (__builtin_expect (n == 3, true))
> +{
> +  basic_block t, bb0 = bbs[0], bb1 = bbs[1], bb2 = bbs[2];
> +  if (bb_postorder[bb0->index] < bb_postorder[bb1->index])
> +   t = bb0, bb0 = bb1, bb1 = t;
> +  if (bb_postorder[bb1->index] < bb_postorder[bb2->index])
> +   {
> + t = bb1, bb1 = bb2, bb2 = t;
> + if (bb_postorder[bb0->index] < bb_postorder[bb1->index])
> +   t = bb0, bb0 = bb1, bb1 = t;
> +   }
> +  bbs[0] = bb0, bbs[1] = bb1, bbs[2] = bb2;

... maybe use std::swap() in all four cases?
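
As a sketch of what the std::swap() variant could look like for the three-element case: the Block type and sort3_descending below are stand-ins invented for illustration (the real code works on basic_block and the bb_postorder array), and the sketch swaps in place rather than through local copies.

```cpp
#include <utility>

// Hypothetical stand-in for a basic block: only the sort key.
struct Block { int postorder; };

// Sort three pointers by descending key using the same
// compare-and-swap sequence as the quoted patch.
inline void sort3_descending(Block* (&b)[3])
{
  if (b[0]->postorder < b[1]->postorder)
    std::swap(b[0], b[1]);
  if (b[1]->postorder < b[2]->postorder)
    {
      std::swap(b[1], b[2]);
      if (b[0]->postorder < b[1]->postorder)
        std::swap(b[0], b[1]);
    }
}
```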


handle VIEW_CONVERT_EXPR in debug_node

2019-11-12 Thread Ulrich Drepper
I am using debug_node() to emit the tree of functions for later
processing.  For this I need all the information to be present.  So far
I came across one expression type that isn't handled correctly.  For
VIEW_CONVERT_EXPR only the type value is printed, not the first tree
operand.  The following patch fixes this.

OK?

2019-11-12  Ulrich Drepper  

* tree-dump.c (dequeue_and_dump): Print first tree operand
for VIEW_CONVERT_EXPR.

diff --git a/gcc/tree-dump.c b/gcc/tree-dump.c
index 51c0965861f..83eb29b7e2b 100644
--- a/gcc/tree-dump.c
+++ b/gcc/tree-dump.c
@@ -561,6 +561,7 @@ dequeue_and_dump (dump_info_p di)
 case ADDR_EXPR:
 case INDIRECT_REF:
 case CLEANUP_POINT_EXPR:
+case VIEW_CONVERT_EXPR:
 case SAVE_EXPR:
 case REALPART_EXPR:
 case IMAGPART_EXPR:



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672]

2024-04-04 Thread Ulrich Drepper
On Thu, Apr 4, 2024 at 5:29 PM Jonathan Wakely  wrote:
> I would appreciate more eyes on this to confirm my conclusions about
> negative int_type values, and the proposed fix, make sense.

The way something like this is handled in glibc's ctype functions is
that both branches are considered.  For isXXX(c) whether c is -v or
256-v the same value is returned (except for EOF which is -1).  This
caused the least number of bad surprises.

You could here also perform similar actions.


Re: [PATCH v3] libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318]

2024-01-15 Thread Ulrich Drepper
On Mon, Jan 15, 2024 at 9:45 PM Jonathan Wakely  wrote:
> I think I'm happy with this now. It has tests for all the new functions,
> and the performance of the charset alias match algorithm is improved by
> reusing part of .
>
> Tested x86_64-linux.

Looks good to me.  Good work, Jon.


Re: [PATCH] libstdc++: Implement C++26 features (P2546R5)

2024-06-01 Thread Ulrich Drepper
Hey Jon,

I think we should give debuggers a chance to announce themselves by
providing an entry point they can call (in the inferior) which sets a
flag.  A set flag plus a tracer PID would then be a sufficient
indicator.  The remaining code should also stay but some additional
code can be added:

On Sat, Jun 1, 2024 at 12:22 PM Jonathan Wakely  wrote:
> +_GLIBCXX_WEAK_DEFINITION
> +bool
> +std::is_debugger_present() noexcept
> +{
> +#if _GLIBCXX_HOSTED
> +# if _GLIBCXX_USE_PROC_SELF_STATUS
> +  const string_view prefix = "TracerPid:\t";
> +  ifstream in("/proc/self/status");
> +  string line;
> +  while (std::getline(in, line))
> +{
> +  if (!line.starts_with(prefix))
> +   continue;
> +
> +  string_view tracer = line;
> +  tracer.remove_prefix(prefix.size());
> +  if (tracer.size() == 1 && tracer[0] == '0') [[likely]]
> +   return false; // Not being traced.
> +

Here add something like:

   if (debugger_announced)
 return true;


> +  in.close();
> +  string_view cmd;
> +  string proc_dir = "/proc/" + string(tracer) + '/';
> +  in.open(proc_dir + "comm"); // since Linux 2.6.33
> +  if (std::getline(in, line)) [[likely]]
> +   cmd = line;
> +  else
> +   {
> + in.close();
> + in.open(proc_dir + "cmdline");
> + if (std::getline(in, line))
> +   cmd = line.c_str(); // Only up to first '\0'
> + else
> +   return false;
> +   }
> +
> +  for (auto i : {"gdb", "lldb"}) // known debuggers
> +   if (cmd.ends_with(i))
> + return true;
> +
> +  // We found the TracerPid line, no need to do any more work.
> +  return false;
> +}
> +# endif

And then add

namespace {
  bool debugger_announced = false;
}
auto debugger_attached()
{
  debugger_announced = true;
  return std::breakpoint;
}

I suggest also returning the breakpoint function to allow debuggers to
do something clever with it (e.g., set a breakpoint on the entry
instead of having to catch the fallout of the instruction that is
issued in the function; there might be a difference).


With the function any debugger not covered by the existing test can
make itself known.


Re: [PATCH] libstdc++: Implement C++26 features (P2546R5)

2024-06-03 Thread Ulrich Drepper
On Mon, Jun 3, 2024 at 12:20 PM Florian Weimer  wrote:

> Would it make sense to have a special function symbol for this, on which
> the debugger sets the breakpoint?  […]


Jon and I discussed more details off-list.  Hopefully a more complete
version is coming soon-ish.


Re: [committed v4] libstdc++: Fix std::ranges::iota is not included in numeric [PR108760]

2024-06-08 Thread Ulrich Drepper
On Sat, Jun 8, 2024 at 5:03 PM Jonathan Wakely  wrote:
> I'm in two minds about backporting this one. It would be good to fix the
> non-conformance problem for the release branches, but it also
> potentially breaks some code that uses ranges::iota without including
> <numeric>.

I say add the change as soon as possible so that there is as little
code as possible relying on the non-standard header.


Re: [PATCH] A steadier steady_clock

2012-10-21 Thread Ulrich Drepper
On Sun, Oct 21, 2012 at 12:11 PM, Paolo Carlini
 wrote:
>> @@ -70,7 +70,11 @@
>> {
>>   timespec tp;
>>   // -EINVAL, -EFAULT
>> +#ifdef CLOCK_MONOTONIC_RAW
>> +  clock_gettime(CLOCK_MONOTONIC_RAW, &tp);
>> +#else
>>   clock_gettime(CLOCK_MONOTONIC, &tp);
>> +#endif
>>   return time_point(duration(chrono::seconds(tp.tv_sec)
>>  + chrono::nanoseconds(tp.tv_nsec)));
>> }

That'll have to be something like

#ifdef CLOCK_MONOTONIC_RAW
  if (clock_gettime(CLOCK_MONOTONIC_RAW, &tp) != 0)
#endif
clock_gettime(CLOCK_MONOTONIC, &tp);

Only way out of this is when you introduce a check for a minimum
kernel ABI somewhere, just like glibc does.
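
Written out as a complete function, the runtime fallback could look like the sketch below.  It assumes a POSIX system providing clock_gettime; monotonic_now is an invented name, and error handling for the final call is omitted just as in the snippet above.

```cpp
#include <time.h>
#include <chrono>

// Hypothetical helper: prefer CLOCK_MONOTONIC_RAW, but fall back at run
// time if the running kernel rejects it.
inline std::chrono::nanoseconds monotonic_now()
{
  timespec tp{};
#ifdef CLOCK_MONOTONIC_RAW
  // The macro only proves the headers know the clock, not the kernel.
  if (clock_gettime(CLOCK_MONOTONIC_RAW, &tp) != 0)
#endif
    clock_gettime(CLOCK_MONOTONIC, &tp);
  return std::chrono::seconds(tp.tv_sec)
	 + std::chrono::nanoseconds(tp.tv_nsec);
}
```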


[PATCH] more efficient mersenne_twister_engine::discard

2012-08-22 Thread Ulrich Drepper
The discard member function of the mersenne_twister_engine class is
unnecessarily inefficient.  It currently discards elements one-by-one.
It is possible to discard with higher granularity by discarding the
entire internal buffer.  The attached patch implements this.  To avoid
duplication a new internal member function is introduced which
generates a new set of bits for the internal buffer.  The operator()
is changed to use it.
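
The idea can be illustrated with a toy buffered engine.  BlockEngine below is not the Mersenne Twister and not the attached patch; its refill recurrence is an arbitrary placeholder.  The sketch only shows how discard() can skip the rest of the buffer in one step and regenerate whole buffers, while still ending in the same state as repeated operator() calls.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy engine with an internal buffer (illustration only).
class BlockEngine
{
  static constexpr std::size_t kN = 8;
  std::array<std::uint64_t, kN> buf_{};
  std::size_t pos_ = kN;  // kN means "buffer exhausted"

  // Regenerate the whole buffer (placeholder recurrence).
  void refill()
  {
    for (std::size_t i = 0; i < kN; ++i)
      buf_[i] = buf_[i] * 6364136223846793005ULL
		+ 1442695040888963407ULL + i;
    pos_ = 0;
  }

public:
  explicit BlockEngine(std::uint64_t seed)
  {
    for (std::size_t i = 0; i < kN; ++i)
      buf_[i] = seed + i;
  }

  std::uint64_t operator()()
  {
    if (pos_ >= kN)
      refill();
    return buf_[pos_++];
  }

  // Skip z values: drop the remainder of the buffer at once and
  // regenerate, instead of stepping element by element.
  void discard(unsigned long long z)
  {
    while (z > kN - pos_)
      {
	z -= kN - pos_;
	refill();
      }
    pos_ += static_cast<std::size_t>(z);
  }
};
```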

2012-08-22  Ulrich Drepper  

* include/bits/random.h (mersenne_twister_engine): Don't inline
discard here.  New member function _M_gen_rand.
* include/bits/random.tcc (mersenne_twister_engine<>::_M_gen_rand):
New function.  Extracted from operator().
(mersenne_twister_engine<>::discard): New implementation which
skips in large steps.
(mersenne_twister_engine<>::operator()): Use _M_gen_rand.


d-mersenne-discard
Description: Binary data


random numbers in bulk

2012-08-25 Thread Ulrich Drepper
The current <random> interface as defined in the standard is not well
suited for heavy users such as simulations.  The only way to get a
number is using the operator() one-by-one.  This can lead to
significant overhead and, perhaps more importantly, prevents
optimizations from being applied.  For instance, there are ways to
implement the various distribution functions faster, but the overhead
of setting them up is too high if it has to be done for every call.
Also, in some of the existing distribution implementations there are
tests which the compiler cannot hoist out of the loop and therefore
has to execute every time.

I propose to add a member function fill to all distribution classes.
It has the same interface as the operator() except that it has two new
parameters which specify the target buffer.  This means the return
value can be void, too.
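
To make the shape of such an interface concrete, here is a sketch using a toy distribution.  scaled_uniform is invented for illustration and is not one of the standard distributions; the point is only that fill() takes the target buffer as two extra parameters, returns void, and can compute loop-invariant values once.

```cpp
#include <random>

// Hypothetical toy distribution: uniform values in [lo, hi).
class scaled_uniform
{
  double lo_, hi_;

public:
  using result_type = double;

  scaled_uniform(double lo, double hi) : lo_(lo), hi_(hi) { }

  // One value per call, as in the standard interface.
  template <typename Urng>
  result_type operator()(Urng& urng)
  { return lo_ + (hi_ - lo_) * std::generate_canonical<double, 53>(urng); }

  // Bulk variant: same sequence of values, but the range is
  // computed once outside the loop.
  template <typename Urng>
  void fill(result_type* first, result_type* last, Urng& urng)
  {
    const double range = hi_ - lo_;
    for (; first != last; ++first)
      *first = lo_ + range * std::generate_canonical<double, 53>(urng);
  }
};
```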

I know this is not standardized but I hope the standard committee will
recognize it is necessary.

The question is: how to add this?  What is the practice to add new
member functions?  #ifdef something?

As for the result, some distributions already show with just the
current changes significant improvements.

http://www.akkadia.org/drepper/fill.html

The cases where there seems to be a slowdown are artifacts of the micro
benchmark.  There is no reason that any case should be slower.

I have more patches coming.  We can have specialized fill functions.

[ Note: not much testing can be done without modifying the header
without fixing http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54376 ]


d-random-fill
Description: Binary data


Re: random numbers in bulk

2012-08-25 Thread Ulrich Drepper
On Sat, Aug 25, 2012 at 5:42 PM, Paolo Carlini  wrote:
> Personally, assuming the name itself is already reserved / used elsewhere,

That was my thinking as well.  There shouldn't be any further namespace problem.


> .. another preliminary comment of mine: why not using iterators to specify
> those ranges, instead of plain pointers? Aren't the forward iterators
> generally Ok, like for std::fill itself?

Does it really add anything worthwhile?  I used pointers because one of the
other extensions I'll submit really needs pointers because more than
one value is stored at once.

Also, one uses these interfaces to achieve performance.  In no
situation would you store the numbers in a non-sequential way since
this means cache misses.


Re: random numbers in bulk

2012-08-25 Thread Ulrich Drepper
On Sat, Aug 25, 2012 at 7:37 PM, Jonathan Wakely  wrote:
> But iterators don't have to imply non-sequential storage. Using
> iterators instead of pointers would allow you to store them in a
> std::deque, for example, or in a std::vector using
> std::back_insert_iterator.

Yes, and this is already trivial to do with the operator() interface.

The fill() interface is needed for performance, everything else is
taken care by the operator() interface.


Re: random numbers in bulk

2012-08-25 Thread Ulrich Drepper
On Sat, Aug 25, 2012 at 8:29 PM, Paolo Carlini  wrote:
> Understood, but you do *not* lose performance by having those fill functions
> templates,

Let's see.  The prototypes will then be something like this:

  template<typename _RealType = double>
    class normal_distribution
    {

      template<typename _OutputIterator,
	       typename _UniformRandomNumberGenerator>
	void
	fill(_OutputIterator __f, _OutputIterator __t,
	     _UniformRandomNumberGenerator& __urng,
	     const param_type& __p);
    };

Now I want to define a specialized function which works for a double
iterator and all RNGs.  That's not possible because it means partial
specialization.

Therefore I'd have to add another member function to the class.  If
this is what is wanted I can do this (in fact, I have the code ready).
 It just looks worse because the special functions are in some cases
architecture-specific.  This means the code will be littered with
arch-specific code.


Re: random numbers in bulk

2012-08-26 Thread Ulrich Drepper
On Sun, Aug 26, 2012 at 4:52 AM, Marc Glisse  wrote:
> The std::generate(_n) function seems closer than std::fill. Not sure if
> overloading that function (std::generate_n) would make sense, it avoids
> changing the interface.

I'm not wedded to fill.  generate_n is fine as well.


> If the goal is to avoid listing several overloads in the class, it is still
> possible to dispatch in the (out-of-class) definition of fill. Or is the
> goal to make it extensible, in the sense that a user can still add
> "specializations" (whatever the technical means used, which don't have to be
> what C++ calls specialization)?

As a first step, for instance, I want to add a function which
optimizes the normal_distribution to use SSE on x86 machines.  This
is only useful if I know the type underlying the iterator.  It will
have to match the generated value of the distribution.  I.e., I want
to define

template<>
  template<typename _UniformRandomNumberGenerator>
    void
    normal_distribution<double>::
    fill(double* __f, double* __t,
	 _UniformRandomNumberGenerator& __urng, const param_type& __param)
    {
      ...
    }

As said before, this isn't possible because it's a partially
specialized function.

Defining this as a non-member function isn't practical because each
such function would have to be declared a friend.

What I could imagine working is that the iterator fill/generate_n
functions are defined and in addition the special versions which use
pointers of the result_type of the distribution.  There is only one
such type as specified by the template parameter of the distribution
class.  This makes it possible to specialize it and at the same time
it is trivial to specify a default.  In a standardization effort it'd
be possible to exclusively concentrate on the iterator version.
Implementations could provide the pointer versions as transparent
venues for possible optimizations.
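
The combination described in the last paragraph, a generic iterator version plus a pointer version for result_type, can be sketched with ordinary overloading.  toy_distribution and its last_path member are made up for illustration; the member exists only so the chosen overload can be observed.

```cpp
#include <random>

// Hypothetical distribution showing the two-overload layout.
class toy_distribution
{
public:
  using result_type = double;
  enum class path { generic, pointer };
  path last_path = path::generic;  // records which overload ran

  template <typename Urng>
  result_type operator()(Urng& urng)
  { return std::generate_canonical<double, 53>(urng); }

  // Generic version: works for any iterator range.
  template <typename Iter, typename Urng>
  void generate(Iter first, Iter last, Urng& urng)
  {
    last_path = path::generic;
    for (; first != last; ++first)
      *first = (*this)(urng);
  }

  // Pointer version: more specialized, so overload resolution prefers
  // it for result_type* arguments.  An optimized (for example
  // vectorized) body could live here.
  template <typename Urng>
  void generate(result_type* first, result_type* last, Urng& urng)
  {
    last_path = path::pointer;
    for (; first != last; ++first)
      *first = (*this)(urng);
  }
};
```

Because this is overloading rather than specialization, no partial specialization of a member function template is needed.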


Re: random numbers in bulk

2012-08-26 Thread Ulrich Drepper
On Sun, Aug 26, 2012 at 10:04 AM, Daniel Krügler
 wrote:
> The typedef "pointer" should be removed, because it is not used at all.

That 'pointer' type is needed for the __normal_iterator use.  Unfortunately.


Re: random numbers in bulk

2012-08-27 Thread Ulrich Drepper
On Mon, Aug 27, 2012 at 5:46 AM, Paolo Carlini  wrote:
> One last observation from me: I think we are being a little inconsistent in
> terms of inlining. I see some __generate_impl with a non-trivial body inline
> whereas other, with a tad smaller body, in random.tcc.

For the __generate functions there is not much of an upside to having
the functions inline.  Unlike for the operator() definitions, the
values are not immediately used which would enable folding in
additional arithmetic operations with the computations for the
distribution.  I've moved all definitions of __generate_impl into
random.tcc.

Will run some more checks and check in the change after success.


out-of-line and arch-specific random_device

2012-08-27 Thread Ulrich Drepper
Especially after Carlo's comment from earlier that attention is paid
to not inlining unnecessarily it is surprising that the random_device
code is inlined even though it is not a template class.

How about not doing this and moving the definition into the library?

This is done in the attached patch.  It's rather ugly because of the
business with the TR1 support.  Is this really still needed?  Can't we
remove that?  It really makes not much sense for a random_device to be
predictable.

Also, very important, with the current definition the size and layout
of the random_device objects depends on the _GLIBCXX_USE_RANDOM_TR1
macro.  The patch fixes this.  The data elements must be in a union or
all available all the time.  Still, I'd prefer to just remove the code
that uses mt19937...  This would also mean the object size won't
change.

Anyway, another change in the patch is support for a less expensive
implementation on Ivy Bridge processors.  That processor has the
rdrand instruction.  The code uses it if the instruction is usable.
Has been tested on real hardware.  This is not the type of
arch-specific code I meant earlier.  Will get to that tomorrow.

There are other architectures with similar instructions.  Adding
support for those will be simple.


PP1
Description: Binary data


Re: out-of-line and arch-specific random_device

2012-08-28 Thread Ulrich Drepper
On Tue, Aug 28, 2012 at 4:44 AM, Paolo Carlini  wrote:
> Again, without context, I think this is not the point: random_device is meant 
> to be just a simple high level wrapper
> around things like dev/random, inspired by facilities like dev/random on 
> unix-like OSes. The brutal "fall back" we have
> now in place wouldn't be useful anyway for the uses Marc is talking about, 
> because there is no way to provide a seed.
> That said, I can't check right now C++11 about random_device, I suppose Uli 
> has already ;)

I did read it.  random_device is all about non-determinism.  Of course
I know that RNGs in some situations have to be repeatable.  That's
what all the engines are about.  random_device isn't.  You use
random_device to seed an engine etc.
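
The division of labor described here can be sketched as follows; make_engine and make_entropy_seeded_engine are invented names.  The engine is what gives repeatability, and random_device is used only to obtain a seed.

```cpp
#include <random>

// Repeatable: the same seed always gives the same sequence.
inline std::mt19937 make_engine(std::mt19937::result_type seed)
{
  return std::mt19937(seed);
}

// Non-deterministic start: random_device supplies only the seed.
inline std::mt19937 make_entropy_seeded_engine()
{
  std::random_device rd;
  return std::mt19937(rd());
}
```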

The spec says that if there is no way to create non-deterministic data
the implementation may use a random number engine.  "may" being the
key.


I perhaps didn't make myself clear as to what the big problem is.
Depending on whether or not you define _GLIBCXX_USE_RANDOM_TR1 you get
an object definition for 'random_device' which has the same name and
mangling but has a different size.  This means binary
incompatibilities.  Memory corruptions.
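
A sketch of the layout problem, not the actual libstdc++ members: the types
file_state and engine_state below are hypothetical stand-ins, and a bool
template parameter plays the role of the configuration macro so that both
variants can be seen in one translation unit.

```cpp
#include <cstddef>

// Hypothetical stand-in for the state needed to read from a device file.
struct file_state { void* handle; };
// Hypothetical stand-in for the state of a large engine such as mt19937.
struct engine_state { unsigned long words[624]; std::size_t pos; };

// Problematic pattern: which member exists depends on a configuration
// switch, so the same type name can have two different sizes.
template<bool UseDevice> struct device_conditional;
template<> struct device_conditional<true>  { file_state file; };
template<> struct device_conditional<false> { engine_state engine; };

// Stable pattern: both alternatives live in a union, so size and layout
// are the same no matter which one is used at run time.
struct device_stable
{
  union { file_state file; engine_state engine; };
};
```

With the union the object is as large as the biggest alternative in every
configuration, which is the price for a stable ABI.  Removing the engine
fallback altogether avoids that cost as well.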

Combine that problem with the fact that there is no need to fall back
to pRNGs (and I'd argue it's really a bad idea because people expect
real entropy and won't get any with pRNGs) and I think that just
removing the pre-TR1 support will solve the issue nicely.


Re: out-of-line and arch-specific random_device

2012-08-28 Thread Ulrich Drepper
On Tue, Aug 28, 2012 at 3:47 AM, Marc Glisse  wrote:
> I assume they are different enough that they can't all be abstracted
> behind a nice common builtin (with default implementation in libgcc
> and/or a macro advertising fast implementations of it) :-(

What is different is the way to interact with the CPU facility to get
the data.  That's exactly the part I abstracted out in its own
function.  I named it __x86_rdrand() but that's more an historic
accident.  I should have named it __get_random_word.  Then all that
needs to be done for another architecture is to provide a definition
of this function.  The rest can be shared.  But this function needs to
be arch-specific since the details really differ sufficiently.
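
Roughly what I mean, as a sketch only.  The names get_random_word and
next_word are hypothetical and this is not the code from the patch.  The x86
branch assumes the compiler defines __RDRND__ (e.g. with -mrdrnd) and uses the
_rdrand32_step intrinsic; the retry count of 100 is an arbitrary choice for
the example.

```cpp
#if defined(__RDRND__)
# include <immintrin.h>
#endif

// Hypothetical arch-specific hook: store one random word in *out and
// return true, or return false if no hardware source produced one.
bool get_random_word(unsigned int* out)
{
#if defined(__RDRND__)
  // rdrand may transiently fail when the hardware runs out of entropy,
  // so retry a bounded number of times.
  for (int retries = 100; retries > 0; --retries)
    if (_rdrand32_step(out))
      return true;
  return false;
#else
  // No hardware source known for this architecture in this sketch.
  (void) out;
  return false;
#endif
}

// Shared, architecture-independent part: try the hook first, otherwise
// use whatever fallback source the caller supplies.
template<typename Fallback>
unsigned int next_word(Fallback fallback)
{
  unsigned int w;
  return get_random_word(&w) ? w : fallback();
}
```

Another architecture would only replace the body of get_random_word; the
caller does not change.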


Re: out-of-line and arch-specific random_device

2012-08-28 Thread Ulrich Drepper
On Tue, Aug 28, 2012 at 8:42 AM, Marc Glisse  wrote:
> Thank you for your answers. My main concern was whether it was best to
> implement __get_random_word in libstdc++, or __builtin_random in gcc. But it
> looks like your solution of doing it in libstdc++ makes more sense (at least
> for now).

There are enough subtle differences between the way different
architectures implement this functionality that forcing a common
builtin implementation is problematic at best.  For instance, look at
the repeat counter I use.  That's specific to Intel's code.  For an
implementation like Via's the best choice to handle lack of entropy
might be to (temporarily) reduce the quality of the number.

Anyway, the biggest question for now is whether to keep the
problematic code which uses mt19937 as a replacement.


Re: out-of-line and arch-specific random_device

2012-08-28 Thread Ulrich Drepper
On Tue, Aug 28, 2012 at 9:14 AM, Andi Kleen  wrote:
> RDRAND is more for cryptographic purposes (key generation etc.), it's not
> supposed to replace pseudo random generators for simulations.

And that's exactly what random_device is for.  It's not a random
number engine like the rest.  It's supposed to be non-deterministic.


Re: out-of-line and arch-specific random_device

2012-08-29 Thread Ulrich Drepper
On Wed, Aug 29, 2012 at 9:48 AM, Paolo Carlini  wrote:
> Minor nit: are you sure we need to
> open a new minor version for the new symbol? Because it seemed to me that
> 4.7.x was behind by one.

I have 4.7 installed and that version already defines the symbols
defined in version 3.4.17.  This is a new symbol and requires a new
version to prevent startup of an app in case of a too old runtime
library.


Re: faster random number engine

2012-08-29 Thread Ulrich Drepper
On Wed, Aug 29, 2012 at 11:43 AM, Paolo Carlini  wrote:
> The substance isn't of course. But normally we don't have __gnu_cxx things
> in the same std header. Can't we have a new ext/random and put it in there?
> If we can separate the new code to it, I think people would not even object
> to the target dependency, etc. In ext/ we are quite free to do extension /
> experimental work.

OK, I moved the definition to ext.  Will check in the result.


Re: out-of-line and arch-specific random_device

2012-08-30 Thread Ulrich Drepper
On Thu, Aug 30, 2012 at 11:52 AM, Hans-Peter Nilsson
 wrote:
>> From: Ulrich Drepper 
>> Date: Tue, 28 Aug 2012 05:57:08 +0200
>
> This patch (commit r190787) broke build for non-_GLIBCXX_USE_RANDOM_TR1
> targets.  (See libstdc++-v3/configure.ac and its crossconfig.m4 for a
> list.)

Should be fixed now.


Re: faster random number engine

2012-08-31 Thread Ulrich Drepper
On Fri, Aug 31, 2012 at 3:59 AM, Miles Bader  wrote:
> Can this replace the current mersenne twister implementation in
> std:: once the endianness issue, etc, have been worked out?

No, it produces different numbers.


beta distribution

2012-09-03 Thread Ulrich Drepper
Another distribution missing is beta, related to the gamma
distribution.  Instead of the complex formula I've used an iterative
process, similar to the one used for the normal distribution.  There
is no real surprise here, there are two scalar parameters.  Unless
someone thinks this distribution for some reason does not belong in
the library I'll check in the patch.
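
For readers who want to see the relation to the gamma distribution: a sketch
using the textbook identity that X / (X + Y) is Beta(alpha, beta) when
X ~ Gamma(alpha, 1) and Y ~ Gamma(beta, 1).  This is not necessarily what the
attached patch does, and the function names are made up for the example.

```cpp
#include <random>

// Sample Beta(alpha, beta) as X / (X + Y) with X ~ Gamma(alpha, 1) and
// Y ~ Gamma(beta, 1).  Both shape parameters must be positive.
template<typename Engine>
double sample_beta(Engine& eng, double alpha, double beta)
{
  std::gamma_distribution<double> gx(alpha, 1.0);
  std::gamma_distribution<double> gy(beta, 1.0);
  const double x = gx(eng);
  const double y = gy(eng);
  return x / (x + y);
}

// Average of n samples; should approach alpha / (alpha + beta).
double beta_sample_mean(unsigned seed, int n, double alpha, double beta)
{
  std::mt19937 eng(seed);
  double sum = 0.0;
  for (int i = 0; i < n; ++i)
    sum += sample_beta(eng, alpha, beta);
  return sum / n;
}
```

The two scalar parameters mentioned above are alpha and beta here.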


d-random-beta
Description: Binary data


Re: [patch] PR bootstrap/54453 (libstdc++ doesn't build)

2012-09-05 Thread Ulrich Drepper
On Wed, Sep 5, 2012 at 6:08 AM, Paolo Carlini  wrote:
> Uli, I'm not sure to understand why that commit of yours changed that
> specific regexp,

Completely unintended, I thought I mentioned this already.  The
problem is emacs's whitespace mode which made those changes.


Re: [PATCH] Fix PR bootstrap/54419

2012-09-06 Thread Ulrich Drepper
On Thu, Sep 6, 2012 at 2:40 PM, Jack Howarth  wrote:
> Okay for gcc trunk?

One typo:

> * configure.ac: Test for rdrnd support in assembler.

It's rdrand.  I wouldn't be pedantic if the opcode hadn't
changed from rdrnd to rdrand at some point; using the old name
could be confusing.


symbolic names for processor IDs

2012-09-07 Thread Ulrich Drepper
The x86 cpuid instruction returns a processor ID and the
__get_cpuid_max function even explicitly makes the %ebx value directly
available.  But users of that function have to use a cryptic constant.
 How about adding a few macros to make this more transparent?


Index: gcc/config/i386/cpuid.h
===
--- gcc/config/i386/cpuid.h (revision 191084)
+++ gcc/config/i386/cpuid.h (working copy)
@@ -75,6 +75,16 @@
 #define bit_RDSEED (1 << 18)
 #define bit_ADX (1 << 19)

+/* Signatures for different CPU implementations as returned in uses
+   of cpuid with level 0.  */
+#define signature_INTEL_ebx 0x756e6547
+#define signature_INTEL_ecx 0x6c65746e
+#define signature_INTEL_edx 0x49656e69
+
+#define signature_AMD_ebx 0x68747541
+#define signature_AMD_ecx 0x444d4163
+#define signature_AMD_edx 0x69746e65
+
 #if defined(__i386__) && defined(__PIC__)
 /* %ebx may be the PIC register.  */
 #if __GNUC__ >= 3
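
For reference, where the constants in such macros come from: cpuid level 0
returns the 12-character vendor string four bytes at a time in ebx, edx, ecx
(in that order), first character in the low byte.  The helper pack4 below is
only for illustration and is not part of cpuid.h.

```cpp
#include <cstdint>

// Pack four ASCII characters into a 32-bit value with the first character
// in the least significant byte, as the registers hold them after cpuid.
constexpr std::uint32_t pack4(const char* s)
{
  return  static_cast<std::uint32_t>(static_cast<unsigned char>(s[0]))
       | (static_cast<std::uint32_t>(static_cast<unsigned char>(s[1])) << 8)
       | (static_cast<std::uint32_t>(static_cast<unsigned char>(s[2])) << 16)
       | (static_cast<std::uint32_t>(static_cast<unsigned char>(s[3])) << 24);
}

// "GenuineIntel" is split as ebx="Genu", edx="ineI", ecx="ntel".
static_assert(pack4("Genu") == 0x756e6547, "Intel ebx");
static_assert(pack4("ineI") == 0x49656e69, "Intel edx");
static_assert(pack4("ntel") == 0x6c65746e, "Intel ecx");

// "AuthenticAMD" is split as ebx="Auth", edx="enti", ecx="cAMD".
static_assert(pack4("Auth") == 0x68747541, "AMD ebx");
static_assert(pack4("enti") == 0x69746e65, "AMD edx");
static_assert(pack4("cAMD") == 0x444d4163, "AMD ecx");
```

This is exactly the kind of detail that is easy to get wrong when everybody
defines the values by hand.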


Re: symbolic names for processor IDs

2012-09-08 Thread Ulrich Drepper
On Sat, Sep 8, 2012 at 7:17 AM, Uros Bizjak  wrote:
> There are some other cpuid vendor signatures than AMD and Intel, and
> probably there will be some more.

Sure, if there are still people using those they can easily add those.


> IMO, anybody using __get_cpuid_max
> call should define accepted signatures by itself.

According to that argument there should also be no bit_* definitions.
Of course everyone can define them.  It's easy enough but it's also
easy to make mistakes and to not document them.


Re: symbolic names for processor IDs

2012-09-08 Thread Ulrich Drepper
On Sat, Sep 8, 2012 at 7:17 AM, Uros Bizjak  wrote:
> There are some other cpuid vendor signatures than AMD and Intel,

How about a patch with this complete list?


d-gcc-cpuid-signature
Description: Binary data


Re: out-of-line and arch-specific random_device

2012-09-09 Thread Ulrich Drepper
On Sun, Sep 9, 2012 at 1:36 PM, Jonathan Wakely  wrote:
> Also, why does random.cc contain a non-member function called
> _M_strotoul,

Copy&paste.  Used to be in the class.  Should be changed now.


[PATCH] normal_distribution performance improvement with SSE

2012-09-26 Thread Ulrich Drepper
Here is a patch to accelerate the __generate function for the
normal_distribution class.  The speed-up is quite significant,
the amount depending on which random number engine is used.

mt19937        +20%
mt19937_64     +30%
sfmt19937      +30%
sfmt19937_64   +30%


This patch introduces a header with optimizations for <random>.  No
changes to existing code needed, this is a straight-forward
specialization.  Tested on x86_64-linux.  More optimizations follow,
there is still quite a bit of inefficiency in the existing interfaces.
 OK to commit?


2012-09-21  Ulrich Drepper  

Optimize bulk mode for normal_distribution for SSE3.
* configure.host: Define cpu_opt_bits_random.
* configure.ac: Substitute CPU_OPT_BITS_RANDOM.
* configure: Regenerated.
* include/Makefile.am (bits_headers): Add ${bits_host_headers}.
(bits_host_headers): Define.
* include/bits/random.tcc: Move __details::_Power_of_2 to...
* include/bits/random.h: ...here.
* include/std/random: Include <bits/opt_random.h>.
* config/cpu/i486/opt/bits/opt_random.h: New file.
* config/cpu/generic/opt/bits/opt_random.h: New file.


d-random-opt-normal-sse
Description: Binary data


Re: [PATCH] normal_distribution performance improvement with SSE

2012-09-26 Thread Ulrich Drepper
On Wed, Sep 26, 2012 at 7:32 AM, Jakub Jelinek  wrote:
> Have you considered also an __AVX__ version handling 4 elements at a time?
> Without __AVX2__ one would need to cast __m256i to __m256d for and/or, as
> AVX1 doesn't have _mm256_and_si256 or _mm256_or_si256, but _mm256_and_pd
> or _mm256_or_pd could be used instead.

One step to do first.  Currently the random number engine interface is
inefficient since it returns a single number.  What we need is an
additional interface to return vectors.  I'd love to use the gcc
vector extensions.  For engines like sfmt this is natural.  There are
a few issues with C++ support for the vector extensions.  Operations
available in C are not supported in C++ yet.


Re: [PATCH] normal_distribution performance improvement with SSE

2012-09-26 Thread Ulrich Drepper
On Wed, Sep 26, 2012 at 12:14 PM, Marc Glisse  wrote:
>> Currently the random number engine interface is
>> inefficient since it returns a single number.  What we need is an
>> additional interface to return vectors.
>
>
> Isn't the __generate interface good enough?

__generate is for the distributions.  I'm talking about the engines.
Bulk access isn't that easy there.  The stream is deterministic and if
I request 32 bytes of entropy although I only need 16 bytes I cannot
push back the remaining 16 bytes.  Without that the next use of the
engine would produce different bits.
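
A small sketch of the effect, using only std::mt19937; the function name is
made up for the example.  One engine draws exactly what it needs, the other
fetches one extra value "in bulk" and throws it away.  From then on the two
streams no longer line up.

```cpp
#include <random>

// Returns true if an engine that consumed one extra, unused value no longer
// produces the same next value as a reference engine with the same seed.
bool extra_draw_shifts_stream(unsigned seed)
{
  std::mt19937 reference(seed);
  std::mt19937 bulk(seed);

  const auto r0 = reference();  // the consumer needs one value
  const auto b0 = bulk();       // same value here...
  (void) bulk();                // ...plus one fetched in bulk and discarded

  const auto r1 = reference();  // second value of the stream
  const auto b1 = bulk();       // third value of the stream
  return r0 == b0 && r1 != b1;
}
```

A vector interface for the engines therefore has to define exactly how many
values are consumed, or provide a way to give the unused ones back.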


[PATCH] C++ math constants

2013-02-21 Thread Ulrich Drepper
How about the attached file as a start for .  I used the
constexpr approach (instead of function calls) and replicated the
constants that are available in <math.h> in Unix.

What other constants to add?


math
Description: Binary data


Re: [PATCH] C++ math constants

2013-02-28 Thread Ulrich Drepper
On Thu, Feb 21, 2013 at 12:38 PM, Benjamin De Kosnik  wrote:
> Not seeing it.
>
> Say for:
>
> #include <iostream>
>
>   // A class for math constants.
>   template<typename _RealType>
> struct __math_constants
> {
>   // Constant @f$ \pi @f$.
>   static constexpr _RealType __pie =
>   3.1415926535897932384626433832795029L; };
>
> template<typename T>
> void print(const T& t) { std::cout << t; }
>
> int main()
> {
>   print(__math_constants<double>::__pie);
>   return 0;
> }
>
> I'm not getting any definition, even at -O0.

Even more so: how would an explicit instantiation even work?

Try this simplified code:

template<typename T>
struct a {
  static constexpr T m = T(1);
};

If you try

  template<> constexpr int a<int>::m;

nothing gets emitted into the object file (this is even with the trunk
gcc).  If I use

  template<> constexpr int a<int>::m = 1;

I get a definition but I have to remove the initialization in the
class definition itself.  If I use

  template struct a<int>;

there is no output in the file as well.


All this makes perfect sense with gcc.  Since the constant will never
be referenced as a variable there is no need for the compiler to emit
a definition.  If the argument is that there has to be one, how would
it be done?
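
For comparison, a sketch of the usual way to provide a definition for a
constexpr static data member of a class template: a generic out-of-class
definition without an initializer.  The names here are made up.  Before C++17
this definition is what makes odr-uses (such as binding a reference) link;
from C++17 on the member is implicitly inline and the extra definition is
redundant but still accepted.

```cpp
template<typename T>
struct constants
{
  // In-class declaration with initializer.
  static constexpr T one = T(1);
};

// Out-of-class definition, no initializer.  Instantiated on demand for each
// T whose member is odr-used.
template<typename T>
constexpr T constants<T>::one;

// Uses only the value: the compiler can fold the constant, no storage needed.
int by_value() { return constants<int>::one + 1; }

// Binds a reference to the member: this is an odr-use and needs a definition.
const int& by_reference() { return constants<int>::one; }
```

This differs from the explicit specializations tried above in that the
initializer stays in the class and nothing is emitted unless some T is
actually odr-used.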


Re: [PATCH] C++ math constants

2013-03-01 Thread Ulrich Drepper
How about this patch then?  As I said, I have code in need of
constants lined up and Edward likely also wants to take advantage of
them in some of his code.


Index: include/Makefile.am
===
--- include/Makefile.am (revision 196362)
+++ include/Makefile.am (working copy)
@@ -499,6 +499,7 @@
  ${ext_srcdir}/array_allocator.h \
  ${ext_srcdir}/bitmap_allocator.h \
  ${ext_srcdir}/cast.h \
+ ${ext_srcdir}/cmath \
  ${ext_srcdir}/codecvt_specializations.h \
  ${ext_srcdir}/concurrence.h \
  ${ext_srcdir}/debug_allocator.h \
--- /dev/null 2013-02-06 19:11:05.441448320 -0500
+++ include/ext/cmath 2013-03-01 09:28:36.448535383 -0500
@@ -0,0 +1,152 @@
+// Math extensions -*- C++ -*-
+
+// Copyright (C) 2013 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+/** @file ext/cmath
+ *  This file is a GNU extension to the Standard C++ Library.
+ */
+
+#ifndef _EXT_CMATH
+#define _EXT_CMATH 1
+
+#pragma GCC system_header
+
+#if __cplusplus < 201103L
+# include <bits/c++0x_warning.h>
+#else
+
+#include <cmath>
+#include <type_traits>
+
+namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  // A class for math constants.
+  template<typename _RealType>
+struct __math_constants
+{
+  static_assert(std::is_floating_point<_RealType>::value,
+"template argument not a floating point type");
+
+  // Constant @f$ \pi @f$.
+  static constexpr _RealType __pi = 3.1415926535897932384626433832795029L;
+  // Constant @f$ \pi / 2 @f$.
+  static constexpr _RealType __pi_half = 1.5707963267948966192313216916397514L;
+  // Constant @f$ \pi / 3 @f$.
+  static constexpr _RealType __pi_third = 1.0471975511965977461542144610931676L;
+  // Constant @f$ \pi / 4 @f$.
+  static constexpr _RealType __pi_quarter = 0.7853981633974483096156608458198757L;
+  // Constant @f$ \sqrt(\pi / 2) @f$.
+  static constexpr _RealType __root_pi_div_2 = 1.2533141373155002512078826424055226L;
+  // Constant @f$ 1 / \pi @f$.
+  static constexpr _RealType __one_div_pi = 0.3183098861837906715377675267450287L;
+  // Constant @f$ 2 / \pi @f$.
+  static constexpr _RealType __two_div_pi = 0.6366197723675813430755350534900574L;
+  // Constant @f$ 2 / \sqrt(\pi) @f$.
+  static constexpr _RealType __two_div_root_pi = 1.1283791670955125738961589031215452L;
+
+  // Constant Euler's number @f$ e @f$.
+  static constexpr _RealType __e = 2.7182818284590452353602874713526625L;
+  // Constant @f$ 1 / e @f$.
+  static constexpr _RealType __one_div_e = 0.36787944117144232159552377016146087L;
+  // Constant @f$ \log_2(e) @f$.
+  static constexpr _RealType __log2_e = 1.4426950408889634073599246810018921L;
+  // Constant @f$ \log_{10}(e) @f$.
+  static constexpr _RealType __log10_e = 0.4342944819032518276511289189166051L;
+  // Constant @f$ \ln(2) @f$.
+  static constexpr _RealType __ln_2 = 0.6931471805599453094172321214581766L;
+  // Constant @f$ \ln(3) @f$.
+  static constexpr _RealType __ln_3 = 1.0986122886681096913952452369225257L;
+  // Constant @f$ \ln(10) @f$.
+  static constexpr _RealType __ln_10 = 2.3025850929940456840179914546843642L;
+
+  // Constant Euler-Mascheroni @f$ \gamma_E @f$.
+  static constexpr _RealType __gamma_e = 0.5772156649015328606065120900824024L;
+  // Constant Golden Ratio @f$ \phi @f$.
+  static constexpr _RealType __phi = 1.6180339887498948482045868343656381L;
+
+  // Constant @f$ \sqrt(2) @f$.
+  static constexpr _RealType __root_2 = 1.4142135623730950488016887242096981L;
+  // Constant @f$ \sqrt(3) @f$.
+  static constexpr _RealType __root_3 = 1.7320508075688772935274463415058724L;
+  // Constant @f$ \sqrt(5) @f$.
+  static constexpr _RealType __root_5 = 2.2360679774997896964091736687312762L;
+  // Constant @f$ \sqrt(7) @f$.
+  static constexpr _RealType __root_7 = 2.6457513110645905905016157536392604L;
+  // Constant @f$ 1 / \sqrt(2) @f$.
+  static constexpr _Real

more distributions

2013-03-01 Thread Ulrich Drepper
I have a few more distributions to be added.  The triangle
distribution is the result of combining two uniform distributions and
therefore quite frequently used.  The von Mises distribution (the
simple, 2D version) would be the first circular distribution.
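
To show the "two uniform distributions" relation: a sketch of the symmetric
special case only, where the sum of two independent uniform variables gives a
triangular density peaking in the middle.  The function name is made up and
this is not the implementation in the patch, which is more general.

```cpp
#include <random>

// Symmetric triangular sample on [a, b) with the peak at (a + b) / 2,
// built as the sum of two independent uniforms on [a/2, b/2).
// Requires a < b.
template<typename Engine>
double sample_symmetric_triangle(Engine& eng, double a, double b)
{
  std::uniform_real_distribution<double> half(a / 2.0, b / 2.0);
  const double u1 = half(eng);
  const double u2 = half(eng);
  return u1 + u2;
}
```

An asymmetric triangle needs a different construction; the above is only
meant to show where the shape comes from.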

The patch depends on the __math_constants patch.


d-gcc-tri-mises
Description: Binary data


Re: more distributions

2013-03-01 Thread Ulrich Drepper
On Fri, Mar 1, 2013 at 5:55 PM,  <3dw...@verizon.net> wrote:
> I was looking at a paper: Modeling Data using Directional Distributions
> by Inderjit S. Dhillon and Suvrit Sra that looks like it would be very
> similar to your multi-variate normal distribution.  These generalize
> von Mises to higher dimension.  Is this your next target?

von Mises-Fisher?  I might indeed look at it.  The relationship to the
MV Normal distribution is only marginal, though.  MV Normal is not a
circular distribution.

I have a whole bunch of optimization patches which aren't quite done
yet.  I might pick those up first again.


Re: more distributions

2013-03-02 Thread Ulrich Drepper
On Sat, Mar 2, 2013 at 5:21 AM, Paolo Carlini  wrote:
> Exceptionally, I think we can go ahead with this one too.

Shall I check in the two patches?  I added the work-around for
copysign which is the only function used other than log, cos, acos,
sqrt.


Re: more distributions

2013-03-02 Thread Ulrich Drepper
On Sat, Mar 2, 2013 at 3:43 PM, Paolo Carlini  wrote:
> Yes. Personally, I'm also eager to see your further performance
> improvements, but I'm afraid will have to wait for 4.9.0.

I checked in the code.  The performance improvements need some
discussions.  I need to do some more experimentation and then will
post some results.


von Mises distribution improvement

2013-03-03 Thread Ulrich Drepper
I'd like to check in this patch which would improve the performance of
the distribution quite a bit by pulling constant computations into the
constructor.  This patch will change the memory layout which can be
done easily only now.  It also fixes one small bug in operator== and
in a comment.

OK?
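
The idea in isolation, as a sketch with a hypothetical struct rather than the
real param_type: the value r depends only on kappa, so it can be computed once
when the parameters are set instead of on every call of the generator.  The
formulas are the ones from the patch below; kappa > 0 is assumed here to keep
the division well defined.

```cpp
#include <cmath>

// Hypothetical parameter holder for a von Mises distribution.
struct von_mises_params
{
  double mu;
  double kappa;
  double r;     // derived from kappa, cached at construction

  von_mises_params(double mu_, double kappa_)
  : mu(mu_), kappa(kappa_)
  {
    const double tau = std::sqrt(4.0 * kappa * kappa + 1.0) + 1.0;
    const double rho = (tau - std::sqrt(2.0 * tau)) / (2.0 * kappa);
    r = (1.0 + rho * rho) / (2.0 * rho);
  }
};

// Equality has to look at both user-visible parameters; r follows from kappa.
inline bool operator==(const von_mises_params& a, const von_mises_params& b)
{ return a.mu == b.mu && a.kappa == b.kappa; }
```

Algebraically r equals tau / (2 * kappa), so for kappa = 1 it comes out as
(1 + sqrt(5)) / 2.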


Index: libstdc++-v3/include/ext/random
===
--- libstdc++-v3/include/ext/random (revision 196416)
+++ libstdc++-v3/include/ext/random (working copy)
@@ -2621,6 +2621,12 @@
   const _RealType __pi = __gnu_cxx::__math_constants<_RealType>::__pi;
   _GLIBCXX_DEBUG_ASSERT(_M_mu >= -__pi && _M_mu <= __pi);
   _GLIBCXX_DEBUG_ASSERT(_M_kappa >= _RealType(0));
+
+  auto __tau = std::sqrt(_RealType(4) * _M_kappa * _M_kappa
+ + _RealType(1)) + _RealType(1);
+  auto __rho = ((__tau - std::sqrt(_RealType(2) * __tau))
+ / (_RealType(2) * _M_kappa));
+  _M_r = (_RealType(1) + __rho * __rho) / (_RealType(2) * __rho);
  }

  _RealType
@@ -2633,16 +2639,17 @@

  friend bool
  operator==(const param_type& __p1, const param_type& __p2)
- { return __p1._M_kappa == __p2._M_kappa; }
+ { return (__p1._M_mu == __p2._M_mu
+  && __p1._M_kappa == __p2._M_kappa); }

   private:
-
  _RealType _M_mu;
  _RealType _M_kappa;
+ _RealType _M_r;
   };

   /**
-   * @brief Constructs a beta distribution with parameters
+   * @brief Constructs a von Mises distribution with parameters
* @f$\mu@f$ and @f$\kappa@f$.
*/
   explicit
@@ -2727,20 +2734,13 @@
  = __gnu_cxx::__math_constants<result_type>::__pi;
   std::__detail::_Adaptor<_UniformRandomNumberGenerator, result_type>
 __aurng(__urng);
-  result_type __tau = (std::sqrt(result_type(4) * this->kappa()
- * this->kappa() + result_type(1))
-   + result_type(1));
-  result_type __rho = ((__tau - std::sqrt(result_type(2) * __tau))
-   / (result_type(2) * this->kappa()));
-  result_type __r = ((result_type(1) + __rho * __rho)
- / (result_type(2) * __rho));

   result_type __f;
   while (1)
 {
   result_type __rnd = std::cos(__pi * __aurng());
-  __f = (result_type(1) + __r * __rnd) / (__r + __rnd);
-  result_type __c = this->kappa() * (__r - __f);
+  __f = (result_type(1) + __p._M_r * __rnd) / (__p._M_r + __rnd);
+  result_type __c = __p._M_kappa * (__p._M_r - __f);

   result_type __rnd2 = __aurng();
   if (__c * (result_type(2) - __c) > __rnd2)
@@ -2756,7 +2756,7 @@
   if (__aurng() < result_type(0.5))
 __res = -__res;
 #endif
-  __res += this->mu();
+  __res += __p._M_mu;
   if (__res > __pi)
 __res -= result_type(2) * __pi;
   else if (__res < -__pi)


std::atomic_flag::test

2020-05-08 Thread Ulrich Drepper via Gcc-patches
This is not yet implemented.  Here is a patch.

2020-05-08  Ulrich Drepper  

* include/bits/atomic_base.h (atomic_flag): Implement test
member function.
* include/std/version: Define __cpp_lib_atomic_flag_test.
* testsuite/29_atomics/atomic_flag/test/explicit.cc: New file.
* testsuite/29_atomics/atomic_flag/test/implicit.cc: New file.



libatomic does not have a function 'test' so I implemented it with
__atomic_load (which takes care of memory ordering) and then compare
with the set-value.

The code generated at least for x86-64 looks good, it's a
straight-forward load, nothing else.
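
The approach in isolation, as a sketch outside of libstdc++.  The struct
simple_flag is hypothetical; it assumes a GCC-compatible compiler providing
the __atomic builtins, and it uses plain bool for the flag, which assumes the
"set" value is 1 (the patch instead compares against
__GCC_ATOMIC_TEST_AND_SET_TRUEVAL).

```cpp
// Minimal flag built directly on the GCC __atomic builtins.
struct simple_flag
{
  bool value = false;

  // Set the flag and return its previous state.
  bool test_and_set() noexcept
  { return __atomic_test_and_set(&value, __ATOMIC_SEQ_CST); }

  // Read the flag without modifying it: a plain atomic load.
  bool test() const noexcept
  {
    bool v;
    __atomic_load(&value, &v, __ATOMIC_SEQ_CST);
    return v;
  }

  // Reset the flag.
  void clear() noexcept
  { __atomic_clear(&value, __ATOMIC_SEQ_CST); }
};
```

The point of test() is that it does not write, unlike test_and_set(), so a
waiting thread can poll the flag without bouncing the cache line.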

diff --git libstdc++-v3/include/bits/atomic_base.h libstdc++-v3/include/bits/atomic_base.h
index 87fe0bd6000..3b66b040976 100644
--- libstdc++-v3/include/bits/atomic_base.h
+++ libstdc++-v3/include/bits/atomic_base.h
@@ -208,6 +208,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __atomic_test_and_set (&_M_i, int(__m));
 }
 
+#if __cplusplus > 201703L
+#define __cpp_lib_atomic_flag_test 201907L
+
+_GLIBCXX_ALWAYS_INLINE bool
+test(memory_order __m = memory_order_seq_cst) noexcept
+{
+  __atomic_flag_data_type __v;
+  __atomic_load(&_M_i, &__v, int(__m));
+  return __v == __GCC_ATOMIC_TEST_AND_SET_TRUEVAL;
+}
+
+_GLIBCXX_ALWAYS_INLINE bool
+test(memory_order __m = memory_order_seq_cst) volatile noexcept
+{
+  __atomic_flag_data_type __v;
+  __atomic_load(&_M_i, &__v, int(__m));
+  return __v == __GCC_ATOMIC_TEST_AND_SET_TRUEVAL;
+}
+#endif // C++20
+
 _GLIBCXX_ALWAYS_INLINE void
 clear(memory_order __m = memory_order_seq_cst) noexcept
 {
diff --git libstdc++-v3/include/std/version libstdc++-v3/include/std/version
index c3a5bd26e63..c6bde2cfbda 100644
--- libstdc++-v3/include/std/version
+++ libstdc++-v3/include/std/version
@@ -164,6 +164,7 @@
 
 #if __cplusplus > 201703L
 // c++2a
+#define __cpp_lib_atomic_flag_test 201907L
 #define __cpp_lib_atomic_float 201711L
 #define __cpp_lib_atomic_ref 201806L
 #define __cpp_lib_atomic_value_initialization 201911L
--- /dev/null   2020-05-07 16:14:59.793169510 +0200
+++ libstdc++-v3/testsuite/29_atomics/atomic_flag/test/explicit.cc  2020-05-08 12:53:14.134152671 +0200
@@ -0,0 +1,32 @@
+// { dg-do run { target c++2a } }
+// { dg-require-thread-fence "" }
+
+// Copyright (C) 2008-2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include <atomic>
+
+int main()
+{
+  using namespace std;
+  atomic_flag af = ATOMIC_FLAG_INIT;
+
+  if (af.test(memory_order_acquire))
+af.clear(memory_order_release);
+
+  return 0;
+}
--- /dev/null   2020-05-07 16:14:59.793169510 +0200
+++ libstdc++-v3/testsuite/29_atomics/atomic_flag/test/implicit.cc  2020-05-08 12:54:48.608014474 +0200
@@ -0,0 +1,32 @@
+// { dg-do run { target c++2a } }
+// { dg-require-thread-fence "" }
+
+// Copyright (C) 2008-2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include <atomic>
+
+int main()
+{
+  using namespace std;
+  atomic_flag af = ATOMIC_FLAG_INIT;
+
+  if (af.test())
+af.clear();
+
+  return 0;
+}




Re: [PATCH] i386: Add peephole2 for __atomic_sub_fetch (x, y, z) == 0 [PR98737]

2021-01-27 Thread Ulrich Drepper via Gcc-patches
On 1/27/21 11:37 AM, Jakub Jelinek wrote:
> Would equality comparison against 0 handle the most common cases?
> 
> The user can write it as
> __atomic_sub_fetch (x, y, z) == 0
> or
> __atomic_fetch_sub (x, y, z) - y == 0
> though, so the expansion code would need to be able to cope with both.

Please also keep !=0, <0, <=0, >0, and >=0 in mind.  They all can be
useful and can be handled with the flags.
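
Two typical users of such comparisons, written with std::atomic as a sketch;
the function names are made up.  The first is the == 0 case (dropping the
last reference), the second a < 0 case (a counter going negative).  Both are
written in the "fetch_sub (x, y) - y" form mentioned above.

```cpp
#include <atomic>

// Drop one reference; returns true when the count reached zero.
// fetch_sub returns the old value, so "old - 1" is the new value.
inline bool release(std::atomic<int>& refs)
{
  return refs.fetch_sub(1, std::memory_order_acq_rel) - 1 == 0;
}

// Take n units from a counter; returns true if the counter went negative,
// i.e. more was requested than was available.
inline bool went_negative(std::atomic<int>& counter, int n)
{
  return counter.fetch_sub(n, std::memory_order_acq_rel) - n < 0;
}
```

On x86 both results are available in the flags after a lock sub (zero flag
and sign flag respectively), which is why handling more than == 0 looks
attractive.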





Re: [PATCH] C++ API database

2022-09-28 Thread Ulrich Drepper via Gcc-patches
Ping.  Anyone having problems with this?  And the governance of the file?

On Mon, Sep 12, 2022 at 1:51 PM Ulrich Drepper  wrote:

> After my prior inquiry into the use of python as a build tool for
> maintainers didn't produce any negative comments and several active and
> even enthusiastic support messages, I'm going forward with submitting the
> patch.
>
> To repeat the detail, for the generation of the upcoming C++ standard
> library module and the hints for missing definitions/declarations in the
> std:: namespace we need a list of standard C++ APIs.  The information
> needed for the two use cases is different but the actual APIs overlap
> almost completely and therefore it would be a bad idea to have the data
> separated.
>
> We could opt for a file format that is easy to read in awk and write the
> appropriate scripts to transform the data into the appropriate output
> format but this looks ugly, is hard to understand, and a nightmare to
> maintain.  On the other hand, writing the code in Python is simple and
> clean.
>
>
> Therefore, Jonathan and I worked on a CSV file which contains the
> necessary information and a Python script to create the gperf input file to
> generate std-name-hint.h and also, in future, the complete source of the
> export interface description for the standard library module.  This mode is
> not yet used because the module support isn't ready yet.  The output file
> corresponds to the hand-coded version of the export code Jonathan uses
> right now.
>
> Note that in both of these cases the generated files are static, they
> don't depend on the local configuration and therefore are checked into the
> source code repository.  The script only has to run if the generated files
> are explicitly removed or, in maintainer mode, if the CSV file has
> changed.  For normal compilation from a healthy source code tree the tool
> is not needed.
>
>
> One remaining issue is the responsibility for the CSV file.  The file
> needs to live in the directory of the frontend and therefore nominally
> changes need to be approved by the frontend maintainers.  The content
> entirely consists of information from the standard library, though.  Any
> change that doesn't break the build on one machine (i.e., the Python script
> doesn't fail) will not cause any problem because the output format of the
> script is correct.  Therefore we have been wondering whether the CSV file
> should at least have shared ownership between the frontend maintainers and
> the libstdc++ maintainers.
>
> The CSV file contains more hint information than the old hand-coded .gperf
> file.  So, an additional effect of this patch is the extension of the hints
> that are provided but given that the lookup is now fast this shouldn't have
> any negative impact.  The file is not complete, though, this will come over
> time and definitely before the module support is done.
>
> I built my complete set of compilers with this patch without problems.
>
> Any comments?
>
> 2022-09-12  Jonathan Wakely  
> Ulrich Drepper  
>
> contrib/ChangeLog
> * gcc_update: Add rule to generate gcc/cp/std-name-hint.gperf.
>
> gcc/cp/ChangeLog
> * Make-lang.in: Add rule to generate gcc/cp/std-name-hint.gperf.
> Adjust rule to generate $(srcdir)/cp/std-name-hint.h.
> Add explicit rule to depend cp/name-lookup.o on
> $(srcdir)/cp/std-name-hint.h.
> * cxxapi-data.csv: New file.  Database of C++ APIs.
> * gen-cxxapi-file.py: New file.  Script to generate source code for
> C++ standard library exports and to generate C++ std:: namespace
> fix hints.
> * std-name-hint.gperf: Regenerated.
> * std-name-hint.h: Regenerated.
>


add more C++ name hints

2022-08-05 Thread Ulrich Drepper via Gcc-patches
How about adding a few more names from the std namespace to get appropriate
hints?  This patch compiles and the appropriate messages are printed.  Is
there a problem with just adding more or even at some point all the symbols
of the standard library?

gcc/ChangeLog:

* cp/name-lookup.cc (get_std_name_hint): Add more symbols from the
, ,  and  headers.


d-g++-std-io-syms-hints
Description: Binary data

