Script 'mail_helper' called by obssrc
Hello community,

here is the log from the commit of package ugrep for openSUSE:Factory checked in at 2023-08-08 15:54:58
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/ugrep (Old)
 and /work/SRC/openSUSE:Factory/.ugrep.new.22712 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Package is "ugrep"

Tue Aug 8 15:54:58 2023 rev:47 rq:1102773 version:3.12.6

Changes:
--------
--- /work/SRC/openSUSE:Factory/ugrep/ugrep.changes 2023-08-06 16:29:53.075818957 +0200
+++ /work/SRC/openSUSE:Factory/.ugrep.new.22712/ugrep.changes 2023-08-08 15:55:08.177069080 +0200
@@ -1,0 +2,12 @@
+Mon Aug 7 05:23:38 UTC 2023 - Andreas Stieger <[email protected]>
+
+- update to 3.12.6:
+  * New option -S (--dereference-files) to follow symbolic links
+    only to files, not to directories, when using option -r for
+    recursive search.
+  * Updated default recursive search to strictly perform -r without
+    following any symbolic links.
+  * New option --index for fast index-based search with the new
+    ugrep-indexer tool.
+
+-------------------------------------------------------------------

Old:
----
  ugrep-3.12.5.tar.gz

New:
----
  ugrep-3.12.6.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ ugrep.spec ++++++
--- /var/tmp/diff_new_pack.EcX0QD/_old 2023-08-08 15:55:10.381082873 +0200
+++ /var/tmp/diff_new_pack.EcX0QD/_new 2023-08-08 15:55:10.385082899 +0200
@@ -17,7 +17,7 @@
 Name: ugrep
-Version: 3.12.5
+Version: 3.12.6
 Release: 0
 Summary: Universal grep: a feature-rich grep implementation with focus on speed
 License: BSD-3-Clause

++++++ ugrep-3.12.5.tar.gz -> ugrep-3.12.6.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.12.5/CONTRIBUTING.md new/ugrep-3.12.6/CONTRIBUTING.md
--- old/ugrep-3.12.5/CONTRIBUTING.md 2023-08-04 19:19:01.000000000 +0200
+++ new/ugrep-3.12.6/CONTRIBUTING.md 2023-08-06 22:29:36.000000000 +0200
@@ -2,7 +2,7 @@
 =================
 Thank you for taking the time to contribute. Let's keep moving forward
-together to make ugrep the best Universal grep utility on the planet.
+together to make ugrep the best grep utility on the planet.
 The following is a set of guidelines for contributing to ugrep.
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.12.5/README.md new/ugrep-3.12.6/README.md --- old/ugrep-3.12.5/README.md 2023-08-04 19:19:01.000000000 +0200 +++ new/ugrep-3.12.6/README.md 2023-08-06 22:29:36.000000000 +0200 @@ -1350,7 +1350,7 @@ <a name="recursion"/> -### Recursively list matching files with -l, -R, -r, --depth, -g, -O, and -t +### Recursively list matching files with -l, -R, -r, -S, --depth, -g, -O, and -t -L, --files-without-match Only the names of files not containing selected lines are written @@ -1367,13 +1367,15 @@ specified, outputs directories in a tree-like format. -R, --dereference-recursive Recursively read all files under each directory. Follow all - symbolic links to directories, unlike -r. See also option --sort. + symbolic links to files and directories, unlike -r. -r, --recursive Recursively read all files under each directory, following symbolic - links to files but not to directories. Note that when no FILE + links only if they are on the command line. Note that when no FILE arguments are specified and input is read from a terminal, - recursive searches are performed as if -r is specified. See also - option --sort. + recursive searches are performed as if -r is specified. + -S, --dereference-files + When -r is specified, symbolic links to files are followed, but not + to directories. The default is not to follow symbolic links. --depth=[MIN,][MAX], -1, -2, -3, ... -9, --10, --11, --12, ... Restrict recursive searches from MIN to MAX directory levels deep, where -1 (--depth=1) searches the specified path without recursing @@ -1413,8 +1415,7 @@ searches are performed as if `-r` is specified. To force reading from standard input, specify `-` as the FILE argument. 
-To recursively list all non-empty files in the working directory, following -symbolic links: +To recursively list all non-empty files in the working directory: ug -r -l '' @@ -2340,10 +2341,10 @@ ensure that fuzzy matches do not extend the pattern match beyond the number of lines specified by the regex pattern. -Option `-U` (`--binary`) restricts fuzzy matches to ASCII and binary only with -edit distances measured in bytes. Otherwise, fuzzy pattern matching is -performed with Unicode patterns and edit distances are measured in Unicode -characters. +Option `-U` (`--ascii` or `--binary`) restricts fuzzy matches to ASCII and +binary only with edit distances measured in bytes. Otherwise, fuzzy pattern +matching is performed with Unicode patterns and edit distances are measured in +Unicode characters. Option `--sort=best` orders files by best match. Files with at least one exact match anywhere in the file are shown first, followed by files with approximate @@ -2597,7 +2598,7 @@ ### Searching and displaying binary files with -U, -W, and -X - -U, --binary + -U, --ascii, --binary Disables Unicode matching for binary file matching, forcing PATTERN to match bytes, not Unicode characters. For example, -U '\xa3' matches byte A3 (hex) instead of the Unicode code point U+00A3 @@ -4397,7 +4398,13 @@ --index Perform indexing-based search on files indexed with ugrep-indexer. - Note: a beta release feature. + Recursive searches are performed by skipping non-matching files. + Binary files are skipped with option -I. Note that the start-up + time to search is increased, which may be significant when complex + search patterns are specified that contain large Unicode character + classes with `*' or `+' repeats, which should be avoided. Option + -U (--ascii) improves performance. Option --stats=vm displays a + detailed indexing-based search report. This is a beta feature. -J NUM, --jobs=NUM Specifies the number of threads spawned to search files. 
By @@ -4537,8 +4544,8 @@ pattern matching. -p, --no-dereference - If -R or -r is specified, no symbolic links are followed, even - when they are specified on the command line. + If -R or -r is specified, do not follow symbolic links, even when + symbolic links are specified on the command line. --pager[=COMMAND] When output is sent to the terminal, uses COMMAND to page through @@ -4573,15 +4580,14 @@ has been found. -R, --dereference-recursive - Recursively read all files under each directory. Follow all - symbolic links to directories, unlike -r. See also option --sort. + Recursively read all files under each directory. Follow symbolic + links to files and directories, unlike -r. -r, --recursive Recursively read all files under each directory, following - symbolic links to files but not to directories. Note that when no - FILE arguments are specified and input is read from a terminal, - recursive searches are performed as if -r is specified. See also - option --sort. + symbolic links only if they are on the command line. Note that + when no FILE arguments are specified and input is read from a + terminal, recursive searches are performed as if -r is specified. --replace=FORMAT Replace matching patterns in the output by the specified FORMAT @@ -4590,9 +4596,9 @@ outputs `%' and `%~' outputs a newline. See option --format, `ugrep --help format' and `man ugrep' section FORMAT for details. - -S, --dereference - If -r is specified, all symbolic links are followed, like -R. The - default is not to follow symbolic links to directories. + -S, --dereference-files + When -r is specified, follow symbolic links to files, but not to + directories. The default is not to follow symbolic links. -s, --no-messages Silent mode: nonexistent and unreadable files are ignored, i.e. @@ -4668,11 +4674,11 @@ options -c, -l or -L are used. This option is enabled by --pretty when the output is sent to a terminal. 
- -U, --binary - Disables Unicode matching for binary file matching, forcing - PATTERN to match bytes, not Unicode characters. For example, -U - '\xa3' matches byte A3 (hex) instead of the Unicode code point - U+00A3 represented by the UTF-8 sequence C2 A3. See also option + -U, --ascii, --binary + Disables Unicode matching for ASCII and binary matching. PATTERN + matches bytes, not Unicode characters. For example, -U '\xa3' + matches byte A3 (hex) instead of the Unicode code point U+00A3 + represented by the UTF-8 sequence C2 A3. See also option --dotall. -u, --ungroup @@ -5339,7 +5345,7 @@ - ugrep 3.12.5 August 4, 2023 UGREP(1) + ugrep 3.12.6 August 6, 2023 UGREP(1) 🔝 [Back to table of contents](#toc) Binary files old/ugrep-3.12.5/bin/win32/ugrep.exe and new/ugrep-3.12.6/bin/win32/ugrep.exe differ Binary files old/ugrep-3.12.5/bin/win64/ugrep.exe and new/ugrep-3.12.6/bin/win64/ugrep.exe differ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.12.5/configure new/ugrep-3.12.6/configure --- old/ugrep-3.12.5/configure 2023-08-04 19:19:01.000000000 +0200 +++ new/ugrep-3.12.6/configure 2023-08-06 22:29:36.000000000 +0200 @@ -1,6 +1,6 @@ #! /bin/sh # Guess values for system-dependent variables and create Makefiles. -# Generated by GNU Autoconf 2.71 for ugrep 3.6. +# Generated by GNU Autoconf 2.71 for ugrep 3.12. # # # Copyright (C) 1992-1996, 1998-2017, 2020-2021 Free Software Foundation, @@ -609,8 +609,8 @@ # Identity of this package. PACKAGE_NAME='ugrep' PACKAGE_TARNAME='ugrep' -PACKAGE_VERSION='3.6' -PACKAGE_STRING='ugrep 3.6' +PACKAGE_VERSION='3.12' +PACKAGE_STRING='ugrep 3.12' PACKAGE_BUGREPORT='' PACKAGE_URL='' @@ -1349,7 +1349,7 @@ # Omit some internal or obsolete options to make the list less imposing. # This message is too long to be a string in the A/UX 3.1 sh. cat <<_ACEOF -\`configure' configures ugrep 3.6 to adapt to many kinds of systems. 
+\`configure' configures ugrep 3.12 to adapt to many kinds of systems. Usage: $0 [OPTION]... [VAR=VALUE]... @@ -1420,7 +1420,7 @@ if test -n "$ac_init_help"; then case $ac_init_help in - short | recursive ) echo "Configuration of ugrep 3.6:";; + short | recursive ) echo "Configuration of ugrep 3.12:";; esac cat <<\_ACEOF @@ -1562,7 +1562,7 @@ test -n "$ac_init_help" && exit $ac_status if $ac_init_version; then cat <<\_ACEOF -ugrep configure 3.6 +ugrep configure 3.12 generated by GNU Autoconf 2.71 Copyright (C) 2021 Free Software Foundation, Inc. @@ -2099,7 +2099,7 @@ This file contains any messages produced by compilers while running configure, to aid debugging if configure makes a mistake. -It was created by ugrep $as_me 3.6, which was +It was created by ugrep $as_me 3.12, which was generated by GNU Autoconf 2.71. Invocation command line was $ $0$ac_configure_args_raw @@ -3586,7 +3586,7 @@ # Define the identity of the package. PACKAGE='ugrep' - VERSION='3.6' + VERSION='3.12' printf "%s\n" "#define PACKAGE \"$PACKAGE\"" >>confdefs.h @@ -9358,7 +9358,7 @@ # report actual input values of CONFIG_FILES etc. instead of their # values after options handling. ac_log=" -This file was extended by ugrep $as_me 3.6, which was +This file was extended by ugrep $as_me 3.12, which was generated by GNU Autoconf 2.71. 
Invocation command line was CONFIG_FILES = $CONFIG_FILES @@ -9426,7 +9426,7 @@ cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1 ac_cs_config='$ac_cs_config_escaped' ac_cs_version="\\ -ugrep config.status 3.6 +ugrep config.status 3.12 configured by $0, generated by GNU Autoconf 2.71, with options \\"\$ac_cs_config\\" diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.12.5/configure.ac new/ugrep-3.12.6/configure.ac --- old/ugrep-3.12.5/configure.ac 2023-08-04 19:19:01.000000000 +0200 +++ new/ugrep-3.12.6/configure.ac 2023-08-06 22:29:36.000000000 +0200 @@ -1,4 +1,4 @@ -AC_INIT([ugrep],[3.6]) +AC_INIT([ugrep],[3.12]) AM_INIT_AUTOMAKE([foreign]) AC_CONFIG_HEADERS([config.h]) AC_COPYRIGHT([Copyright (C) 2019-2022 Robert van Engelen, Genivia Inc.]) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.12.5/include/reflex/pattern.h new/ugrep-3.12.6/include/reflex/pattern.h --- old/ugrep-3.12.5/include/reflex/pattern.h 2023-08-04 19:19:01.000000000 +0200 +++ new/ugrep-3.12.6/include/reflex/pattern.h 2023-08-06 22:29:36.000000000 +0200 @@ -50,6 +50,7 @@ #include <list> #include <map> #include <set> +#include <array> #include <bitset> #include <vector> @@ -782,12 +783,12 @@ }; /// Indexing hash finite state automaton for indexed file search. 
struct HFA { - static const size_t MAX_DEPTH = 16; // max hashed pattern length must be between 3 and 16, long is accurate - static const size_t MAX_CHAIN = 8; // max length of hashed chars chain must be between 2 and 8 (8 is optimal) - static const size_t MAX_STATES = 1024; // max number of states must be 256 or greater - static const size_t MAX_RANGES = 262144; // max number of hashes ranges on an edge to the next state + static const size_t MAX_DEPTH = 16; ///< max hashed pattern length must be between 3 and 16, long is accurate + static const size_t MAX_CHAIN = 8; ///< max length of hashed chars chain must be between 2 and 8 (8 is optimal) + static const size_t MAX_STATES = 1024; ///< max number of states must be 256 or greater + static const size_t MAX_RANGES = 262144; ///< max number of hashes ranges on an edge to the next state typedef ORanges<Hash> HashRange; - typedef HashRange HashRanges[MAX_DEPTH]; + typedef std::array<HashRange,MAX_DEPTH> HashRanges; typedef std::map<DFA::State*,HashRanges> StateHashes; typedef uint16_t State; typedef std::map<State,HashRanges> Hashes; diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.12.5/man/ugrep.1 new/ugrep-3.12.6/man/ugrep.1 --- old/ugrep-3.12.5/man/ugrep.1 2023-08-04 19:19:01.000000000 +0200 +++ new/ugrep-3.12.6/man/ugrep.1 2023-08-06 22:29:36.000000000 +0200 @@ -1,4 +1,4 @@ -.TH UGREP "1" "August 04, 2023" "ugrep 3.12.5" "User Commands" +.TH UGREP "1" "August 06, 2023" "ugrep 3.12.6" "User Commands" .SH NAME \fBugrep\fR, \fBug\fR -- file pattern searcher .SH SYNOPSIS @@ -519,7 +519,13 @@ .TP \fB\-\-index\fR Perform indexing\-based search on files indexed with ugrep\-indexer. -Note: a beta release feature. +Recursive searches are performed by skipping non\-matching files. +Binary files are skipped with option \fB\-I\fR. 
Note that the start\-up +time to search is increased, which may be significant when complex +search patterns are specified that contain large Unicode character +classes with `*' or `+' repeats, which should be avoided. Option +\fB\-U\fR (\fB\-\-ascii\fR) improves performance. Option \fB\-\-stats\fR=vm displays a +detailed indexing\-based search report. This is a beta feature. .TP \fB\-J\fR \fINUM\fR, \fB\-\-jobs\fR=\fINUM\fR Specifies the number of threads spawned to search files. By @@ -658,8 +664,8 @@ pattern matching. .TP \fB\-p\fR, \fB\-\-no\-dereference\fR -If \fB\-R\fR or \fB\-r\fR is specified, no symbolic links are followed, even when -they are specified on the command line. +If \fB\-R\fR or \fB\-r\fR is specified, do not follow symbolic links, even when +symbolic links are specified on the command line. .TP \fB\-\-pager\fR[=\fICOMMAND\fR] When output is sent to the terminal, uses COMMAND to page through @@ -693,15 +699,14 @@ has been found. .TP \fB\-R\fR, \fB\-\-dereference\-recursive\fR -Recursively read all files under each directory. Follow all -symbolic links to directories, unlike \fB\-r\fR. See also option \fB\-\-sort\fR. +Recursively read all files under each directory. Follow symbolic +links to files and directories, unlike \fB\-r\fR. .TP \fB\-r\fR, \fB\-\-recursive\fR Recursively read all files under each directory, following symbolic -links to files but not to directories. Note that when no FILE +links only if they are on the command line. Note that when no FILE arguments are specified and input is read from a terminal, -recursive searches are performed as if \fB\-r\fR is specified. See also -option \fB\-\-sort\fR. +recursive searches are performed as if \fB\-r\fR is specified. .TP \fB\-\-replace\fR=\fIFORMAT\fR Replace matching patterns in the output by the specified FORMAT @@ -710,9 +715,9 @@ outputs `%' and `%~' outputs a newline. See option \fB\-\-format\fR, `ugrep \fB\-\-help\fR format' and `man ugrep' section FORMAT for details. 
.TP -\fB\-S\fR, \fB\-\-dereference\fR -If \fB\-r\fR is specified, all symbolic links are followed, like \fB\-R\fR. The -default is not to follow symbolic links to directories. +\fB\-S\fR, \fB\-\-dereference\-files\fR +When \fB\-r\fR is specified, follow symbolic links to files, but not to +directories. The default is not to follow symbolic links. .TP \fB\-s\fR, \fB\-\-no\-messages\fR Silent mode: nonexistent and unreadable files are ignored, i.e. @@ -788,9 +793,9 @@ options \fB\-c\fR, \fB\-l\fR or \fB\-L\fR are used. This option is enabled by \fB\-\-pretty\fR when the output is sent to a terminal. .TP -\fB\-U\fR, \fB\-\-binary\fR -Disables Unicode matching for binary file matching, forcing PATTERN -to match bytes, not Unicode characters. For example, \fB\-U\fR '\\xa3' +\fB\-U\fR, \fB\-\-ascii\fR, \fB\-\-binary\fR +Disables Unicode matching for ASCII and binary matching. PATTERN +matches bytes, not Unicode characters. For example, \fB\-U\fR '\\xa3' matches byte A3 (hex) instead of the Unicode code point U+00A3 represented by the UTF\-8 sequence C2 A3. See also option \fB\-\-dotall\fR. 
.TP diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.12.5/src/flag.hpp new/ugrep-3.12.6/src/flag.hpp --- old/ugrep-3.12.5/src/flag.hpp 2023-08-04 19:19:01.000000000 +0200 +++ new/ugrep-3.12.6/src/flag.hpp 2023-08-06 22:29:36.000000000 +0200 @@ -81,6 +81,7 @@ extern bool flag_csv; extern bool flag_decompress; extern bool flag_dereference; +extern bool flag_dereference_files; extern bool flag_files; extern bool flag_files_with_matches; extern bool flag_files_without_match; diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.12.5/src/stats.cpp new/ugrep-3.12.6/src/stats.cpp --- old/ugrep-3.12.5/src/stats.cpp 2023-08-04 19:19:01.000000000 +0200 +++ new/ugrep-3.12.6/src/stats.cpp 2023-08-06 22:29:36.000000000 +0200 @@ -76,13 +76,13 @@ { fprintf(output, "Detected outdated or missing index files, run ugrep-indexer to re-index:\n"); if (changed > 1) - fprintf(output, " %zu files were changed after indexing and searched\n", changed); + fprintf(output, " searched %zu changed files\n", changed); else if (changed == 1) - fprintf(output, " 1 file was changed after indexing and searched\n"); + fprintf(output, " searched 1 changed file\n"); if (added > 1) - fprintf(output, " %zu new files are not indexed and searched\n", added); + fprintf(output, " searched %zu new files\n", added); else if (added == 1) - fprintf(output, " 1 new file is not indexed and searched\n"); + fprintf(output, " searched 1 new file\n"); } } diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.12.5/src/ugrep.cpp new/ugrep-3.12.6/src/ugrep.cpp --- old/ugrep-3.12.5/src/ugrep.cpp 2023-08-04 19:19:01.000000000 +0200 +++ new/ugrep-3.12.6/src/ugrep.cpp 2023-08-06 22:29:36.000000000 +0200 @@ -319,6 +319,7 @@ bool flag_csv = false; bool flag_decompress = false; bool flag_dereference = false; +bool flag_dereference_files = false; bool flag_files = false; bool 
flag_files_with_matches = false; bool flag_files_without_match = false; @@ -1865,7 +1866,221 @@ size_t slot; }; +#ifdef WITH_LOCK_FREE_JOB_QUEUE + + // a lock-free job queue for one producer and one consumer with a bounded circular buffer + struct JobQueue { + + JobQueue() + : + head(ring), + tail(ring), + todo(0) + { } + + bool empty() const + { + return head.load() == tail.load(); + } + + // add a sentinel NONE job to the queue + void enqueue() + { + enqueue("", Entry::UNDEFINED_COST, Job::NONE); + } + + // add a job to the queue + void enqueue(const char *pathname, uint16_t cost, size_t slot) + { + Job *job = tail.load(); + Job *next = job + 1; + if (next == &ring[MAX_JOB_QUEUE_SIZE]) + next = ring; + + while (next == head.load()) + { + // we must lock and wait until the buffer is not full + std::unique_lock<std::mutex> lock(queue_mutex); + queue_full.wait(lock); + } + + job->pathname.assign(pathname); + job->cost = cost; + job->slot = slot; + tail.store(next); + ++todo; + queue_data.notify_one(); + } + + // try to add a job to the queue if the queue is not too large + bool try_enqueue(const char *pathname, uint16_t cost, size_t slot) + { + Job *job = tail.load(); + Job *next = job + 1; + if (next == &ring[MAX_JOB_QUEUE_SIZE]) + next = ring; + + if (next == head.load()) + return false; + + job->pathname.assign(pathname); + job->cost = cost; + job->slot = slot; + tail.store(next); + ++todo; + queue_data.notify_one(); + + return true; + } + + // pop a job + void dequeue(Job& job) + { + while (empty()) + { + // we must lock and wait until the buffer is not empty + std::unique_lock<std::mutex> lock(queue_mutex); + queue_data.wait(lock); + } + + Job *next = head.load() + 1; + if (next == &ring[MAX_JOB_QUEUE_SIZE]) + next = ring; + + job = *head.load(); + head.store(next); + --todo; + queue_full.notify_one(); + } + + Job ring[MAX_JOB_QUEUE_SIZE]; + std::atomic<Job*> head; + std::atomic<Job*> tail; + std::mutex queue_mutex; // job queue mutex used when queue is empty 
or full + std::condition_variable queue_data; // cv to control the job queue + std::condition_variable queue_full; // cv to control the job queue + std::atomic_size_t todo; // number of jobs in the queue + }; + +#else + + // a job queue + struct JobQueue : public std::deque<Job> { + + JobQueue() + : + todo(0) + { } + + // add a sentinel NONE job to the queue + void enqueue() + { + std::unique_lock<std::mutex> lock(queue_mutex); + + emplace_back(); + ++todo; + + queue_work.notify_one(); + } + + // add a job to the queue + void enqueue(const char *pathname, uint16_t cost, size_t slot) + { + std::unique_lock<std::mutex> lock(queue_mutex); + + emplace_back(pathname, cost, slot); + ++todo; + + queue_work.notify_one(); + } + + // try to add a job to the queue if the queue is not too large + bool try_enqueue(const char *pathname, uint16_t cost, size_t slot) + { + if (todo >= MAX_JOB_QUEUE_SIZE) + return false; + + enqueue(pathname, cost, slot); + + return true; + } + + // pop a job + void dequeue(Job& job) + { + std::unique_lock<std::mutex> lock(queue_mutex); + + while (empty()) + queue_work.wait(lock); + + job = front(); + pop_front(); + --todo; + + // if we popped a Job::NONE sentinel but the queue has some jobs, then move the sentinel to the back of the queue + if (job.none() && !empty()) + { + emplace_back(); + job = front(); + pop_front(); + } + } + + // steal a job from this worker, if at least --min-steal jobs to do, returns true if successful + bool steal_job(Job& job) + { + std::unique_lock<std::mutex> lock(queue_mutex); + + if (empty()) + return false; + + job = front(); + + // we cannot steal a Job::NONE sentinel + if (job.none()) + return false; + + pop_front(); + --todo; + + return true; + } + + // move a stolen job to this worker, maintaining job slot order + void move_job(Job& job) + { + std::unique_lock<std::mutex> lock(queue_mutex); + + bool inserted = false; + + // insert job in the queue to maintain job order + for (auto j = begin(); j != end(); ++j) + 
{ + if (j->slot > job.slot) + { + insert(j, std::move(job)); + inserted = true; + break; + } + } + + if (!inserted) + emplace_back(std::move(job)); + + ++todo; + + queue_work.notify_one(); + } + + std::mutex queue_mutex; // job queue mutex + std::condition_variable queue_work; // cv to control the job queue + std::atomic_size_t todo; // number of jobs in the queue, atomic for job stealing + }; + +#endif + #ifndef OS_WIN + // extend the reflex::Input::Handler to handle stdin from a TTY or from a slow pipe struct StdInHandler : public reflex::Input::Handler { @@ -1906,6 +2121,7 @@ return 1; } }; + #endif // extend the reflex::AbstractMatcher::Handler with a grep object reference and references to some of the grep::search locals @@ -3536,7 +3752,7 @@ struct GrepWorker; -// master submits jobs to workers and implements operations to support lock-free job stealing +// master submits jobs to workers and implements operations to support job stealing struct GrepMaster : public Grep { GrepMaster(FILE *file, reflex::AbstractMatcher *matcher, Matchers *matchers) @@ -3608,7 +3824,7 @@ // submit a job with a pathname to a worker, workers are visited round-robin void submit(const char *pathname, uint16_t cost); - // lock-free job stealing on behalf of a worker from a co-worker with at least --min-steal jobs still to do + // job stealing on behalf of a worker from a co-worker with at least --min-steal jobs still to do bool steal(GrepWorker *worker); std::list<GrepWorker> workers; // workers running threads @@ -3623,8 +3839,7 @@ GrepWorker(FILE *file, GrepMaster *master) : Grep(file, master->matcher_clone(), master->matchers_clone()), - master(master), - todo(0) + master(master) { // all workers synchronize their output on the master's sync object out.sync_on(&master->sync); @@ -3649,101 +3864,25 @@ // submit Job::NONE sentinel to this worker void submit_job() { - while (todo >= MAX_JOB_QUEUE_SIZE && !out.eof && !out.cancelled()) - 
std::this_thread::sleep_for(std::chrono::milliseconds(100)); // give the worker threads some slack - - std::unique_lock<std::mutex> lock(queue_mutex); - - jobs.emplace_back(); - ++todo; - - queue_work.notify_one(); + jobs.enqueue(); } // submit a job to this worker void submit_job(const char *pathname, uint16_t cost, size_t slot) { - while (todo >= MAX_JOB_QUEUE_SIZE && !out.eof && !out.cancelled()) - std::this_thread::sleep_for(std::chrono::milliseconds(100)); // give the worker threads some slack - - std::unique_lock<std::mutex> lock(queue_mutex); - - jobs.emplace_back(pathname, cost, slot); - ++todo; - - queue_work.notify_one(); + jobs.enqueue(pathname, cost, slot); } - // move a stolen job to this worker, maintaining job slot order - void move_job(Job& job) + // submit a job to this worker + bool try_submit_job(const char *pathname, uint16_t cost, size_t slot) { - std::unique_lock<std::mutex> lock(queue_mutex); - - bool inserted = false; - - // insert job in the queue to maintain job order - for (auto j = jobs.begin(); j != jobs.end(); ++j) - { - if (j->slot > job.slot) - { - jobs.insert(j, std::move(job)); - inserted = true; - break; - } - } - - if (!inserted) - jobs.emplace_back(std::move(job)); - - ++todo; - - queue_work.notify_one(); + return jobs.try_enqueue(pathname, cost, slot); } // receive a job for this worker, wait until one arrives void next_job(Job& job) { - std::unique_lock<std::mutex> lock(queue_mutex); - - while (jobs.empty()) - queue_work.wait(lock); - - job = jobs.front(); - - jobs.pop_front(); - --todo; - - // if we popped a Job::NONE sentinel but the queue has some jobs, then move the sentinel to the back of the queue - if (job.none() && !jobs.empty()) - { - jobs.emplace_back(); - job = jobs.front(); - jobs.pop_front(); - } - } - - // steal a job from this worker, if at least --min-steal jobs to do, returns true if successful - bool steal_job(Job& job) - { - // not enough jobs in the queue to steal from - if (todo < flag_min_steal) - return 
false; - - std::unique_lock<std::mutex> lock(queue_mutex); - - if (jobs.empty()) - return false; - - job = jobs.front(); - - // we cannot steal a Job::NONE sentinel - if (job.none()) - return false; - - jobs.pop_front(); - --todo; - - return true; + jobs.dequeue(job); } // submit Job::NONE sentinel to stop this worker @@ -3754,11 +3893,7 @@ std::thread thread; // thread of this worker, spawns GrepWorker::execute() GrepMaster *master; // the master of this worker - std::mutex queue_mutex; // job queue mutex - std::condition_variable queue_work; // cv to control the job queue - std::deque<Job> jobs; // queue of pending jobs submitted to this worker - std::atomic_size_t todo; // number of jobs in the queue, atomic for lock-free job stealing - + JobQueue jobs; // queue of pending jobs submitted to this worker }; // start worker threads @@ -3798,7 +3933,34 @@ // submit a job with a pathname to a worker, workers are visited round-robin void GrepMaster::submit(const char *pathname, uint16_t cost) { - iworker->submit_job(pathname, cost, sync.next++); + size_t min_todo = iworker->jobs.todo; + + // if this worker has some jobs that can't be stolen, then find a worker with the minimum number of jobs + if (min_todo > flag_min_steal) + { + auto min_worker = iworker; + + for (size_t num = 0; num < threads; ++num) + { + if (iworker->jobs.todo < min_todo) + { + min_todo = iworker->jobs.todo; + min_worker = iworker; + } + + ++iworker; + if (iworker == workers.end()) + iworker = workers.begin(); + } + + iworker = min_worker; + } + + // give the worker threads some slack + while (!iworker->try_submit_job(pathname, cost, sync.next) && !out.eof && !out.cancelled()) + std::this_thread::sleep_for(std::chrono::milliseconds(10)); + + ++sync.next; // around we go ++iworker; @@ -3806,48 +3968,52 @@ iworker = workers.begin(); } -// lock-free job stealing on behalf of a worker from a co-worker with at least --min-steal jobs still to do +#ifndef WITH_LOCK_FREE_JOB_QUEUE + +// job stealing on 
behalf of a worker from a co-worker with at least --min-steal jobs still to do bool GrepMaster::steal(GrepWorker *worker) { - // pick a random co-worker using thread-safe std::chrono::high_resolution_clock as a simple RNG - size_t n = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::high_resolution_clock::now().time_since_epoch()).count() % threads; - auto iworker = workers.begin(); - - while (n > 0) - { - ++iworker; - --n; - } - - // try to steal a job from the random co-worker or the next co-workers + // try to steal a job from a co-worker with the most jobs + auto coworker = workers.begin(); + auto max_worker = coworker; + size_t max_todo = 0; + for (size_t i = 0; i < threads; ++i) { + if (&*coworker != worker && coworker->jobs.todo > max_todo) + { + max_todo = coworker->jobs.todo; + max_worker = coworker; + } + // around we go - if (iworker == workers.end()) - iworker = workers.begin(); + ++coworker; + if (coworker == workers.end()) + coworker = workers.begin(); + } - // if co-worker isn't this worker (no self-stealing!) 
- if (&*iworker != worker) - { - Job job; + // not enough jobs in the co-worker's queue to steal from + if (max_todo < flag_min_steal) + return false; - // if co-worker has at least --min-steal jobs then steal one for this worker - if (iworker->steal_job(job)) - { - worker->move_job(job); + coworker = max_worker; - return true; - } - } + Job job; - // try next co-worker - ++iworker; + // steal a job for this worker + if (coworker->jobs.steal_job(job)) + { + worker->jobs.move_job(job); + + return true; } // couldn't steal any job return false; } +#endif + // execute worker thread void GrepWorker::execute() { @@ -3871,9 +4037,11 @@ // end output in ORDERED mode (--sort) for this job slot out.end(); +#ifndef WITH_LOCK_FREE_JOB_QUEUE // if only one job is left to do or nothing to do, then try stealing another job from a co-worker - if (todo <= 1) + if (jobs.todo <= 1) master->steal(this); +#endif } } @@ -4342,6 +4510,8 @@ fprintf(file, "# Maximum decompression and de-archiving nesting levels, default: zmax=1\nzmax=%zu\n\n", flag_zmax); if (flag_dereference) fprintf(file, "# Dereference symlinks, default: no-dereference\ndereference\n\n"); + else if (flag_dereference_files) + fprintf(file, "# Dereference symlinks to files, not directories, default: no-dereference-files\ndereference-files\n\n"); if (flag_devices != NULL) fprintf(file, "# Search devices, default: devices=skip\ndevices=%s\n\n", flag_devices); if (flag_max_depth > 0) @@ -4435,10 +4605,12 @@ option_andnot(pattern_args, arg + 7); else if (strcmp(arg, "any-line") == 0) flag_any_line = true; + else if (strcmp(arg, "ascii") == 0) + flag_binary = true; else if (strcmp(arg, "after-context") == 0) usage("missing argument for --", arg); else - usage("invalid option --", arg, "--after-context, --and, --andnot or --any-line"); + usage("invalid option --", arg, "--after-context, --and, --andnot, --any-line or --ascii"); break; case 'b': @@ -4504,6 +4676,8 @@ strtopos2(arg + 6, flag_min_depth, flag_max_depth, "invalid 
argument --depth="); else if (strcmp(arg, "dereference") == 0) flag_dereference = true; + else if (strcmp(arg, "dereference-files") == 0) + flag_dereference_files = true; else if (strcmp(arg, "dereference-recursive") == 0) flag_directories = "dereference-recurse"; else if (strncmp(arg, "devices=", 8) == 0) @@ -4517,7 +4691,7 @@ strcmp(arg, "directories") == 0) usage("missing argument for --", arg); else - usage("invalid option --", arg, "--decompress, --depth, --dereference, --dereference-recursive, --devices, --directories or --dotall"); + usage("invalid option --", arg, "--decompress, --depth, --dereference, --dereference-files, --dereference-recursive, --devices, --directories or --dotall"); break; case 'e': @@ -4762,6 +4936,8 @@ flag_decompress = false; else if (strcmp(arg, "no-dereference") == 0) flag_no_dereference = true; + else if (strcmp(arg, "no-dereference-files") == 0) + flag_dereference_files = false; else if (strcmp(arg, "no-dotall") == 0) flag_dotall = false; else if (strcmp(arg, "no-empty") == 0) @@ -4821,7 +4997,7 @@ else if (strcmp(arg, "neg-regexp") == 0) usage("missing argument for --", arg); else - usage("invalid option --", arg, "--neg-regexp, --not, --no-any-line, --no-binary, --no-bool, --no-break, --no-byte-offset, --no-color, --no-confirm, --no-decompress, --no-dereference, --no-dotall, --no-empty, --no-filename, --no-filter, --glob-no-ignore-case, --no-group-separator, --no-heading, --no-hidden, --no-hyperlink, --no-ignore-binary, --no-ignore-case, --no-ignore-files --no-initial-tab, --no-invert-match, --no-line-number, --no-only-line-number, --no-only-matching, --no-messages, --no-mmap, --no-pager, --no-pretty, --no-smart-case, --no-sort, --no-stats, --no-tree, --no-ungroup, --no-view or --null"); + usage("invalid option --", arg, "--neg-regexp, --not, --no-any-line, --no-binary, --no-bool, --no-break, --no-byte-offset, --no-color, --no-confirm, --no-decompress, --no-dereference, --no-dereference-files, --no-dotall, --no-empty, 
--no-filename, --no-filter, --glob-no-ignore-case, --no-group-separator, --no-heading, --no-hidden, --no-hyperlink, --no-ignore-binary, --no-ignore-case, --no-ignore-files --no-initial-tab, --no-invert-match, --no-line-number, --no-only-line-number, --no-only-matching, --no-messages, --no-mmap, --no-pager, --no-pretty, --no-smart-case, --no-sort, --no-stats, --no-tree, --no-ungroup, --no-view or --null"); break; case 'o': @@ -5226,7 +5402,7 @@ break; case 'S': - flag_dereference = true; + flag_dereference_files = true; break; case 's': @@ -5718,7 +5894,7 @@ if (encoding_table[i].format == NULL) { - std::string msg = "invalid argument --encoding=ENCODING, valid arguments are"; + std::string msg("invalid argument --encoding=ENCODING, valid arguments are"); for (int i = 0; encoding_table[i].format != NULL; ++i) msg.append(" '").append(encoding_table[i].format).append("',"); @@ -6247,7 +6423,7 @@ if (type_table[i].type == NULL) { - std::string msg = "invalid argument -t TYPES, valid arguments are"; + std::string msg("invalid argument -t TYPES, valid arguments are"); for (int i = 0; type_table[i].type != NULL; ++i) msg.append(" '").append(type_table[i].type).append("',"); @@ -7317,7 +7493,7 @@ // -p (--no-dereference) and -S (--dereference): -p takes priority over -S and -R if (flag_no_dereference) - flag_dereference = false; + flag_dereference = flag_dereference_files = false; // display file name if more than one input file is specified or options -R, -r, and option -h --no-filename is not specified if (!flag_no_filename && (flag_all_threads || flag_directories_action == Action::RECURSE || arg_files.size() > 1 || (flag_stdin && !arg_files.empty()))) @@ -7371,8 +7547,11 @@ { unsigned int cores = std::thread::hardware_concurrency(); unsigned int concurrency = cores > 2 ? 
cores : 2; - // reduce concurrency by one for 8+ core CPUs - concurrency -= concurrency / 9; + // reduce concurrency by a few for 9+ core CPUs + if (concurrency >= 10) + concurrency -= concurrency / 5; + else + concurrency -= concurrency / 9; flag_jobs = std::min(concurrency, MAX_JOBS); } @@ -8081,8 +8260,8 @@ // check if directory if (type == DIRENT_TYPE_DIR || ((type == DIRENT_TYPE_UNKNOWN || type == DIRENT_TYPE_LNK) && S_ISDIR(buf.st_mode))) { - // if symlinked directory, then follow into directory? - if (follow || !symlink) + // if symlinked directory, then follow only if -R is specified or if FILE is a command line argument + if (!symlink || follow) { if (flag_directories_action == Action::READ) { @@ -8162,8 +8341,8 @@ } else if (type == DIRENT_TYPE_REG ? !is_output(inode) : (type == DIRENT_TYPE_UNKNOWN || type == DIRENT_TYPE_LNK) && S_ISREG(buf.st_mode) ? !is_output(buf.st_ino) : flag_devices_action == Action::READ) { - // if not -p or if follow or if not symlinked then search file - if (!flag_no_dereference || follow || !symlink) + // if symlinked files, then follow only if -R or -S is specified or if FILE is a command line argument + if (!symlink || follow || flag_dereference_files) { // --depth: recursion level not deep enough? if (flag_min_depth > 0 && level <= flag_min_depth) @@ -8351,7 +8530,8 @@ #endif // --ignore-files: check if one or more are present to read and extend the file and dir exclusions - size_t saved_all_exclude_size, saved_all_exclude_dir_size; + size_t saved_all_exclude_size = 0; + size_t saved_all_exclude_dir_size = 0; bool saved = false; if (!flag_ignore_files.empty()) @@ -12488,7 +12668,13 @@ "\ --index\n\ Perform indexing-based search on files indexed with ugrep-indexer.\n\ - Note: a beta release feature.\n\ + Recursive searches are performed by skipping non-matching files.\n\ + Binary files are skipped with option -I. 
Note that the start-up\n\ + time to search is increased, which may be significant when complex\n\ + search patterns are specified that contain large Unicode character\n\ + classes with `*' or `+' repeats, which should be avoided. Option\n\ + -U (--ascii) improves performance. Option --stats=vm displays a\n\ + detailed indexing-based search report. This is a beta feature.\n\ -J NUM, --jobs=NUM\n\ Specifies the number of threads spawned to search files. By\n\ default an optimum number of threads is spawned to search files\n\ @@ -12611,8 +12797,8 @@ Note that Perl pattern matching differs from the default grep POSIX\n\ pattern matching.\n\ -p, --no-dereference\n\ - If -R or -r is specified, no symbolic links are followed, even when\n\ - they are specified on the command line.\n\ + If -R or -r is specified, do not follow symbolic links, even when\n\ + symbolic links are specified on the command line.\n\ --pager[=COMMAND]\n\ When output is sent to the terminal, uses COMMAND to page through\n\ the output. The default COMMAND is `" DEFAULT_PAGER_COMMAND "'. Enables --heading\n\ @@ -12641,23 +12827,22 @@ Quiet mode: suppress all output. Only search a file until a match\n\ has been found.\n\ -R, --dereference-recursive\n\ - Recursively read all files under each directory. Follow all\n\ - symbolic links to directories, unlike -r. See also option --sort.\n\ + Recursively read all files under each directory. Follow symbolic\n\ + links to files and directories, unlike -r.\n\ -r, --recursive\n\ Recursively read all files under each directory, following symbolic\n\ - links to files but not to directories. Note that when no FILE\n\ + links only if they are on the command line. Note that when no FILE\n\ arguments are specified and input is read from a terminal,\n\ - recursive searches are performed as if -r is specified. 
See also\n\ - option --sort.\n\ + recursive searches are performed as if -r is specified.\n\ --replace=FORMAT\n\ Replace matching patterns in the output by the specified FORMAT\n\ with `%' fields. If -P is specified, FORMAT may include `%1' to\n\ `%9', `%[NUM]#' and `%[NAME]#' to output group captures. A `%%'\n\ outputs `%' and `%~' outputs a newline. See option --format,\n\ `ugrep --help format' and `man ugrep' section FORMAT for details.\n\ - -S, --dereference\n\ - If -r is specified, all symbolic links are followed, like -R. The\n\ - default is not to follow symbolic links to directories.\n\ + -S, --dereference-files\n\ + When -r is specified, follow symbolic links to files, but not to\n\ + directories. The default is not to follow symbolic links.\n\ -s, --no-messages\n\ Silent mode: nonexistent and unreadable files are ignored, i.e.\n\ their error messages and warnings are suppressed.\n\ @@ -12710,9 +12895,9 @@ Output directories with matching files in a tree-like format when\n\ options -c, -l or -L are used. This option is enabled by --pretty\n\ when the output is sent to a terminal.\n\ - -U, --binary\n\ - Disables Unicode matching for binary file matching, forcing PATTERN\n\ - to match bytes, not Unicode characters. For example, -U '\\xa3'\n\ + -U, --ascii, --binary\n\ + Disables Unicode matching for ASCII and binary matching. PATTERN\n\ + matches bytes, not Unicode characters. For example, -U '\\xa3'\n\ matches byte A3 (hex) instead of the Unicode code point U+00A3\n\ represented by the UTF-8 sequence C2 A3. 
See also option --dotall.\n\ -u, --ungroup\n\ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.12.5/src/ugrep.hpp new/ugrep-3.12.6/src/ugrep.hpp --- old/ugrep-3.12.5/src/ugrep.hpp 2023-08-04 19:19:01.000000000 +0200 +++ new/ugrep-3.12.6/src/ugrep.hpp 2023-08-06 22:29:36.000000000 +0200 @@ -38,7 +38,7 @@ #define UGREP_HPP // ugrep version -#define UGREP_VERSION "3.12.5" +#define UGREP_VERSION "3.12.6" // disable mmap because mmap is almost always slower than the file reading speed improvements since 3.0.0 #define WITH_NO_MMAP @@ -46,6 +46,9 @@ // use a task-parallel thread to decompress the stream into a pipe to search, handles archives and increases decompression speed for larger files #define WITH_DECOMPRESSION_THREAD +// use a lock-free job queue, which is appears to be SLOWER than a standard simple lock-based queue for each worker +// #define WITH_LOCK_FREE_JOB_QUEUE + // drain stdin until eof to prevent broken pipe signal // #define WITH_STDIN_DRAIN diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.12.5/tests/out/dir-1.out new/ugrep-3.12.6/tests/out/dir-1.out --- old/ugrep-3.12.5/tests/out/dir-1.out 2023-08-04 19:19:01.000000000 +0200 +++ new/ugrep-3.12.6/tests/out/dir-1.out 2023-08-06 22:29:36.000000000 +0200 @@ -1,4 +1,3 @@ [1;35mdir1/Hello.bat[m -[1;35mdir1/Hello.java[m [1;35mdir1/Hello.sh[m [1;35mdir1/makefile[m diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.12.5/tests/out/dir-2.out new/ugrep-3.12.6/tests/out/dir-2.out --- old/ugrep-3.12.5/tests/out/dir-2.out 2023-08-04 19:19:01.000000000 +0200 +++ new/ugrep-3.12.6/tests/out/dir-2.out 2023-08-06 22:29:36.000000000 +0200 @@ -1,4 +1,3 @@ [1;35mdir1/Hello.bat[m -[1;35mdir1/Hello.java[m [1;35mdir1/Hello.sh[m [1;35mdir1/makefile[m diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/ugrep-3.12.5/tests/out/dir.out 
new/ugrep-3.12.6/tests/out/dir.out --- old/ugrep-3.12.5/tests/out/dir.out 2023-08-04 19:19:01.000000000 +0200 +++ new/ugrep-3.12.6/tests/out/dir.out 2023-08-06 22:29:36.000000000 +0200 @@ -1,4 +1,3 @@ [1;35mdir1/Hello.bat[m -[1;35mdir1/Hello.java[m [1;35mdir1/Hello.sh[m [1;35mdir1/makefile[m
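Note on the symlink changes in the diff above: the two rewritten conditions in the directory walk (`!symlink || follow` for directories, `!symlink || follow || flag_dereference_files` for regular files) can be condensed into a single predicate. The following is only a minimal sketch for illustration, not code from ugrep: the `Entry` struct and the `should_visit` function are hypothetical names, and `follow` is assumed to be true when -R is given or when the path is a command-line argument, as the new comments in the diff describe.

```cpp
// Hypothetical condensation of the two symlink checks in the 3.12.6 diff.
// Entry and should_visit are illustrative names, not part of ugrep.
struct Entry
{
  bool is_dir;     // entry resolves to a directory
  bool is_symlink; // entry itself is a symbolic link
};

// follow:            assumed true with -R or when the path was given on the command line
// dereference_files: assumed true with -S (--dereference-files), cleared again by -p
bool should_visit(const Entry& e, bool follow, bool dereference_files)
{
  // entries that are not symlinks are always visited
  if (!e.is_symlink)
    return true;

  // -R or a command-line argument: follow symlinks to files and directories
  if (follow)
    return true;

  // -S: follow symlinks to files only, never to directories
  return !e.is_dir && dereference_files;
}
```

Under these assumptions, a plain -r search (follow and dereference_files both false) skips every symlink it meets during recursion. That is consistent with the updated tests/out/dir*.out files no longer listing dir1/Hello.java, if that entry is a symlink in the test tree.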
