[PATCH] rebase: make resolve message clearer for inexperienced users

2017-07-16 Thread William Duclot
The git UI can be improved by addressing the error messages to those
they help: inexperienced and casual git users. To this intent, it is
helpful to make sure the terms used in those messages can be understood
by this segment of users, and that they guide them to resolve the
problem.

In particular, failure to apply a patch during a git rebase is a common
problem that can be very destabilizing for the inexperienced user. It is
important to lead them toward the resolution of the conflict (which is a
3-steps process, thus complex) and reassure them that they can escape a
situation they can't handle with "--abort". This commit answer those two
points by detailling the resolution process and by avoiding cryptic git
linguo.

Signed-off-by: William Duclot <william.duc...@gmail.com>
---
Those new messages have been written thanks to Junio feedbacks and users
tests with a few inexperienced git users.

 git-rebase.sh   | 7 ---
 t/t5520-pull.sh | 4 ++--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/git-rebase.sh b/git-rebase.sh
index 2cf73b88e..ef3409679 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -55,9 +55,10 @@ LF='
 '
 ok_to_skip_pre_rebase=
 resolvemsg="
-$(gettext 'When you have resolved this problem, run "git rebase --continue".
-If you prefer to skip this patch, run "git rebase --skip" instead.
-To check out the original branch and stop rebasing, run "git rebase --abort".')
+$(gettext 'Resolve all conflicts manually, mark them as resolved with
+"git add/rm ", then run "git rebase --continue".
+You can instead skip this commit: run "git rebase --skip".
+To abort and get back to the state before "git rebase -i", run "git rebase 
--abort".')
 "
 unset onto
 unset restrict_revision
diff --git a/t/t5520-pull.sh b/t/t5520-pull.sh
index f15f7a332..59c4b778d 100755
--- a/t/t5520-pull.sh
+++ b/t/t5520-pull.sh
@@ -305,7 +305,7 @@ test_expect_success '--rebase with conflicts shows advice' '
test_tick &&
git commit -m "Create conflict" seq.txt &&
test_must_fail git pull --rebase . seq 2>err >out &&
-   test_i18ngrep "When you have resolved this problem" out
+   test_i18ngrep "Resolve all conflicts manually" out
 '
 
 test_expect_success 'failed --rebase shows advice' '
@@ -319,7 +319,7 @@ test_expect_success 'failed --rebase shows advice' '
git checkout -f -b fails-to-rebase HEAD^ &&
test_commit v2-without-cr file "2" file2-lf &&
test_must_fail git pull --rebase . diverging 2>err >out &&
-   test_i18ngrep "When you have resolved this problem" out
+   test_i18ngrep "Resolve all conflicts manually" out
 '
 
 test_expect_success '--rebase fails with multiple branches' '
-- 
2.13.0



Re: [PATCH/RFC] rebase: make resolve message clearer for inexperienced users

2017-07-10 Thread William Duclot
Junio C Hamano writes:
> William Duclot <william.duc...@gmail.com> writes:
> 
> > diff --git a/git-rebase.sh b/git-rebase.sh
> > index 2cf73b88e..50457f687 100755
> > --- a/git-rebase.sh
> > +++ b/git-rebase.sh
> > @@ -55,9 +55,10 @@ LF='
> >  '
> >  ok_to_skip_pre_rebase=
> >  resolvemsg="
> > -$(gettext 'When you have resolved this problem, run "git rebase 
> > --continue".
> > -If you prefer to skip this patch, run "git rebase --skip" instead.
> > -To check out the original branch and stop rebasing, run "git rebase 
> > --abort".')
> > +$(gettext 'Resolve this conflict manually, mark it as resolved with "git 
> > add ",
> > +then run "git rebase --continue".
> > +You can instead skip this commit: run "git rebase --skip".
> > +To stop the whole rebasing and get back to your pre-rebase state, run "git 
> > rebase --abort".')
> >  "
> 
> I find the updated one easier to follow in general.
> Disecting the phrases in the above:
> 
>  - The original said "When you have resolved this problem", without
>giving a guidance how to resolve, and without saying what the
>problem is.  The updated one says "conflict" to clarify the
>"problem", and suggests "git add" as the tool to use after a
>manual resolition.  
> 
>Modulo that there are cases where "git rm" is the right tool, the
>updated one is strict improvement.

I also wrote "" when there could be several. Maybe
'mark it as resolved with "git add/rm"' would be a better (and shorter)
formulation?

>  - The original said "to check out the original branch and stop
>rebasing", and the updated one says "to stop and get back to",
>which is in a more logical order.  
> 
>"the whole rebasing" used as a noun feels something is missing
>there, though.  I wonder if "To get back to the state before you
>started 'rebase -i', run 'git rebase --abort'" is sufficient,
>without saying anything further about abandoning the rebase in
>progress (i.e. "and stop rebasing" or "stop the whole rebasing").

Definitely seems clearer to me: straight to the point.

> Thanks.

Happy to see this patch seems interesting to you. I feel like a lot of
git messages could be improved this way to offer a UI more welcoming to
inexperienced user (which is a *broad* segment of users). But I am not
aware of the cost of translation of this kind of patch: would several
patches like this one be welcomed?


[PATCH/RFC] rebase: make resolve message clearer for inexperienced users

2017-07-09 Thread William Duclot
The git UI can be improved by addressing the error messages to those
they help: inexperienced and casual git users. To this intent, it is
helpful to make sure the terms used in those messages can be understood
by this segment of users, and that they guide them to resolve the
problem.

In particular, failure to apply a patch during a git rebase is a common
problem that can be very destabilizing for the inexperienced user. It is
important to lead them toward the resolution of the conflict (which is a
3-steps process, thus complex) and reassure them that they can escape a
situation they can't handle with "--abort". This commit answer those two
points by detailling the resolution process and by avoiding cryptic git
linguo.

Signed-off-by: William Duclot <william.duc...@gmail.com>
---
While I do not expect that this V1 wording will be to the liking of
everyone, I think (know?) that the heart of this patch isn't something
that I'm the only one bothered with :) I'd very much like to hear your
opinions about it

 git-rebase.sh | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/git-rebase.sh b/git-rebase.sh
index 2cf73b88e..50457f687 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -55,9 +55,10 @@ LF='
 '
 ok_to_skip_pre_rebase=
 resolvemsg="
-$(gettext 'When you have resolved this problem, run "git rebase --continue".
-If you prefer to skip this patch, run "git rebase --skip" instead.
-To check out the original branch and stop rebasing, run "git rebase --abort".')
+$(gettext 'Resolve this conflict manually, mark it as resolved with "git add 
",
+then run "git rebase --continue".
+You can instead skip this commit: run "git rebase --skip".
+To stop the whole rebasing and get back to your pre-rebase state, run "git 
rebase --abort".')
 "
 unset onto
 unset restrict_revision
-- 
2.13.0



Re: [PATCH V2 3/3] strbuf: allow to use preallocated memory

2016-06-07 Thread William Duclot
On Mon, Jun 06, 2016 at 04:24:53PM -0700, Junio C Hamano wrote:
> Jeff King  writes:
> 
>>> I think that call should reset line.buf to the original buffer on
>>> the stack, instead of saying "Ok, I'll ignore the original memory
>>> not owned by us and instead keep pointing at the allocated memory",
>>> as the allocation was done as a fallback measure.
>>
>> I am not sure I agree. Do we think accessing the stack buffer is somehow
>> cheaper than the heap buffer (perhaps because of cache effects)? If so,
>> how much cheaper?
> 
> This is not about stack vs heap or even "cheaper" (whatever your
> definition of cheap is).  The principle applies equally if the
> original buffer came from BSS.
> 
> Perhaps I made it clearer by using a more exaggerated example e.g. a
> typical expected buffer size of 128 bytes, but the third line of
> 1000 line input file was an anomaly that is 200k bytes long.  I do
> not want to keep that 200k bytes after finishing to process that
> third line while using its initial 80 bytes for the remaining 977
> lines.

"its initial 128 bytes", rather than "its initial 80 bytes", no?
Or else I'm lost :)
 
> By the way, William seemed to be unhappy with die(), but I actually
> think having a die() in the API may not be a bad thing if the check
> were about something more sensible.  For example, even if a strbuf
> that can grow dynamically, capping the maximum size and say "Hey
> this is a one-lne-at-a-timve text interface; if we need to grow the
> buffer to 10MB, there is something wrong and a producer of such an
> input does not even deserve a nice error message" could be an
> entirely sensible attitude.  But that does not mean an initial
> allocation should be 10MB.  If the expected typical workload fits
> within a lot lower bound, starting from there and allowing it to
> grow up to that maximum would be the more natural thing to do.
> 
> And the problem I have with the proposed "fixed" is that it does not
> allow us to do that.

The "fixed" feature was aimed to allow the users to use strbuf with
strings that they doesn't own themselves (a function parameter for
example). From Michael example in the original mail:

void f(char *path_buf, size_t path_buf_len)
{
struct strbuf path;
strbuf_wrap_fixed(, path_buf,
strlen(path_buf),
path_buf_len);
...
/*
 * no strbuf_release() required here, but if called it
 * is a NOOP
 */
}

I don't have enough knowledge of the codebase to judge if this is
useful, you seem to think it's not.

About this capping, I have troubles to understand if this is something
you'd like to see in this patch (assuming I include your changes)? Or is
this theoretical?


To sum up:
* Rename "wrap" to "attach"
* Forget about this "fixed" feature
* Make the strbuf reuse the preallocated buffer after a reset() (and a
detach() probably?)
* Introduce a more practical macro STRBUF_INIT_ON_STACK() (maybe the
name is too technical?)
* A few code corrections
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] userdiff: add built-in pattern for CSS

2016-06-06 Thread William Duclot
On Mon, Jun 06, 2016 at 11:00:38AM -0700, Junio C Hamano wrote:
> William Duclot <william.duc...@ensimag.grenoble-inp.fr> writes:
> 
> > On Fri, Jun 03, 2016 at 08:50:50AM -0700, Junio C Hamano wrote:
> >> William Duclot <william.duc...@ensimag.grenoble-inp.fr> writes:
> >> 
> >> > Here I have to disagree (with you and Junio): the IPATTERN is
> >> > case-insensitive only on the "pattern" regex, not the "word_regex"
> >> > regex.
> >> 
> >> Ahh, OK.  Obviously both of us overlooked that.  Thanks for pushing
> >> back.
> >> 
> >> > On the identifier line, I have "A-F" instead of "A-Z" though
> >> 
> >> Yeah, that does need updating.
> >
> > Note that I sent a V4 :)
> 
> Yup, thanks.  Isn't that what I queued as 0719f3ee (userdiff: add
> built-in pattern for CSS, 2016-06-03)?

It is, my bad
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 3/3] strbuf: allow to use preallocated memory

2016-06-06 Thread William Duclot
On Mon, Jun 06, 2016 at 10:19:07AM -0700, Junio C Hamano wrote:
> William Duclot <william.duc...@ensimag.grenoble-inp.fr> writes:
> 
>> +#define MAX_ALLOC(a, b) (((a)>(b))?(a):(b))
> 
> I do not see why this macro is called MAX_ALLOC(); is there anything
> "alloc" specific to what this does?  You may happen to use it only
> for "alloc" related things, but that is not a good reason for such a
> name (if the implementation can only be used for "alloc" specific
> purpose for some reason, then the name is appropriate).

Absolutely right. This macro won't be needed anyway, as you pointed out
in the end of this review
 
>> @@ -20,16 +23,47 @@ char strbuf_slopbuf[1];
>>  
>>  void strbuf_init(struct strbuf *sb, size_t hint)
>>  {
>> +sb->owns_memory = 0;
>> +sb->fixed_memory = 0;
>>  sb->alloc = sb->len = 0;
>>  sb->buf = strbuf_slopbuf;
>>  if (hint)
>>  strbuf_grow(sb, hint);
>>  }
> 
> I'll complain about _grow() again later, but if the implementation
> is updated to address that complaint, slopbuf-using strbuf could
> start it as a special case of "starts as fixed_memory".
> 
>> +static void strbuf_wrap_internal(struct strbuf *sb, char *buf,
>> + size_t buf_len, size_t alloc_len)
>> +{
>> +if (!buf)
>> +die("the buffer of a strbuf cannot be NULL");
>> +
>> +strbuf_release(sb);
>> +sb->len = buf_len;
>> +sb->alloc = alloc_len;
>> +sb->buf = buf;
>> +}
>> +
>> +void strbuf_wrap(struct strbuf *sb, char *buf,
>> + size_t buf_len, size_t alloc_len)
>> +{
>> +strbuf_wrap_internal(sb, buf, buf_len, alloc_len);
>> +sb->owns_memory = 0;
>> +sb->fixed_memory = 0;
>> +}
>>
>> +void strbuf_wrap_fixed(struct strbuf *sb, char *buf,
>> +   size_t buf_len, size_t alloc_len)
>> +{
>> +strbuf_wrap_internal(sb, buf, buf_len, alloc_len);
>> +sb->owns_memory = 0;
>> +sb->fixed_memory = 1;
>> +}
> 
> After seeing 2/3, I would say that the pretty side deserves the use
> of word "wrap" more.  What the above two does is an enhancement of
> what strbuf API has called "attach".  You are introducing a few new
> and different ways to attach a piece of memory to a strbuf.

The naming is still debatable indeed. If there is a consensus on using
"attach", fine with me.
That being said, I'm still hesitant on being able to initialize a strbuf
with an "attach" function, as the user shouldn't have to use
`strbuf_init()` or `= STRBUF_INIT` beforhand. The users can't use
`strbuf_attach()` with a non-initialized strbuf, is that a good idea to
allow them to use `strbuf_attach_preallocated()` on a non-initialized
buffer?

>> @@ -37,8 +71,13 @@ void strbuf_release(struct strbuf *sb)
>>  char *strbuf_detach(struct strbuf *sb, size_t *sz)
>>  {
>>  char *res;
>> +
>>  strbuf_grow(sb, 0);
>> -res = sb->buf;
>> +if (sb->owns_memory)
>> +res = sb->buf;
>> +else
>> +res = xmemdupz(sb->buf, sb->len);
>> +
>>  if (sz)
>>  *sz = sb->len;
>>  strbuf_init(sb, 0);
> 
> Note that the user of the API does not have to care if the memory is
> held by the strbuf or if the strbuf is merely borrowing it from the
> initial allocation made from elsewhere (e.g. via strbuf_wrap_*) when
> finalizing its use of the strbuf API to build the contents and
> utilize the resulting buffer.  Theoretically, if the caller knows it
> let the piece memory _it_ owns (i.e. !owns_memory) and the piece of
> memory it originally gave the strbuf hasn't been moved (i.e. your
> fixed_memory, which I will complain again soon), the caller
> shouldn't have to get a copy via xmemdupz(), which would both save
> an allocation here and free() in the caller once it is done.
> 
> But that is not done, and as the API it is a good thing.  It frees
> the caller from having to worry about _where_ the memory originally
> came from.
> 
>>  void strbuf_grow(struct strbuf *sb, size_t extra)
>>  {
>> -int new_buf = !sb->alloc;
>>  if (unsigned_add_overflows(extra, 1) ||
>>  unsigned_add_overflows(sb->len, extra + 1))
>>  die("you want to use way too much memory");
>> -if (new_buf)
>> -sb->buf = NULL;
>> -ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
>> -if (new_buf)
>> - 

[PATCH V2 3/3] strbuf: allow to use preallocated memory

2016-06-06 Thread William Duclot
When working with strbufs (usually for dates or paths), the
malloc()/free() overhead could be easily avoided: as a sensible initial
buffer size is already known, it could be allocated on the stack. This
could avoid workarounds such as

void f()
{
static struct strbuf path;
strbuf_add(, ...);
...
strbuf_setlen(, 0);
}

which, by using a static variable, avoid most of the malloc()/free()
overhead, but makes the function unsafe to use recursively or from
multiple threads.

THE IDEA


The idea here is to enhance strbuf to allow it to use memory that it
doesn't own (for example, stack-allocated memory), while (optionally)
allowing it to switch over to using allocated memory if the string grows
past the size of the pre-allocated buffer.

API ENHANCEMENT
---

All functions of the API can still be reliably called without
knowledge of the initialization (normal/preallocated/fixed) with the
exception that strbuf_grow() may die() if the string tries to overflow a
fixed buffer.

The API contract is still respected:

- The API users may peek strbuf.buf in-place until they perform an
  operation that makes its size change (at which point the .buf pointer
  may point at a new piece of memory).

- The API users may strbuf_detach() to obtain a piece of memory that
  belongs to them (at which point the strbuf becomes empty), hence
  needs to be freed by the callers.

- The API users letting a strbuf go out of scope without taking
  ownership of the string with strbuf_detach() _must_ release the
  resource by calling strbuf_release().

Full credit to Michael Haggerty for the idea and some of the wording of
this commit (http://mid.gmane.org/53512db6.1070...@alum.mit.edu). The
implementation and bugs are all mine.

Signed-off by: William Duclot <william.duc...@ensimag.grenoble-inp.fr>
Signed-off by: Simon Rabourg <simon.rabo...@ensimag.grenoble-inp.fr>
Signed-off by: Matthieu Moy <matthieu@grenoble-inp.fr>
---
Changes since V1:
* Refactoring: use a `strbuf_wrap_internal()` static function
* Use bit fields for flags
* Completing documentation & commit message
* Rename `strbuf_wrap_preallocated()` to `strbuf_wrap()`
* Introduce two initialization macros for wrapping

 strbuf.c   |  82 ++-
 strbuf.h   |  44 -
 t/helper/test-strbuf.c | 130 -
 t/t0082-strbuf.sh  |  29 +++
 4 files changed, 270 insertions(+), 15 deletions(-)

diff --git a/strbuf.c b/strbuf.c
index 1ba600b..689469e 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -1,6 +1,9 @@
 #include "cache.h"
 #include "refs.h"
 #include "utf8.h"

+#define MAX_ALLOC(a, b) (((a)>(b))?(a):(b))
 
 int starts_with(const char *str, const char *prefix)
 {
@@ -20,16 +23,47 @@ char strbuf_slopbuf[1];
 
 void strbuf_init(struct strbuf *sb, size_t hint)
 {
+   sb->owns_memory = 0;
+   sb->fixed_memory = 0;
sb->alloc = sb->len = 0;
sb->buf = strbuf_slopbuf;
if (hint)
strbuf_grow(sb, hint);
 }
 
+static void strbuf_wrap_internal(struct strbuf *sb, char *buf,
+size_t buf_len, size_t alloc_len)
+{
+   if (!buf)
+   die("the buffer of a strbuf cannot be NULL");
+
+   strbuf_release(sb);
+   sb->len = buf_len;
+   sb->alloc = alloc_len;
+   sb->buf = buf;
+}
+
+void strbuf_wrap(struct strbuf *sb, char *buf,
+size_t buf_len, size_t alloc_len)
+{
+   strbuf_wrap_internal(sb, buf, buf_len, alloc_len);
+   sb->owns_memory = 0;
+   sb->fixed_memory = 0;
+}
+
+void strbuf_wrap_fixed(struct strbuf *sb, char *buf,
+  size_t buf_len, size_t alloc_len)
+{
+   strbuf_wrap_internal(sb, buf, buf_len, alloc_len);
+   sb->owns_memory = 0;
+   sb->fixed_memory = 1;
+}
+
 void strbuf_release(struct strbuf *sb)
 {
if (sb->alloc) {
-   free(sb->buf);
+   if (sb->owns_memory)
+   free(sb->buf);
strbuf_init(sb, 0);
}
 }
@@ -37,8 +71,13 @@ void strbuf_release(struct strbuf *sb)
 char *strbuf_detach(struct strbuf *sb, size_t *sz)
 {
char *res;
+
strbuf_grow(sb, 0);
-   res = sb->buf;
+   if (sb->owns_memory)
+   res = sb->buf;
+   else
+   res = xmemdupz(sb->buf, sb->len);
+
if (sz)
*sz = sb->len;
strbuf_init(sb, 0);
@@ -47,25 +86,44 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz)
 
 void strbuf_attach(struct strbuf *sb, void *buf, size_t len, size_t alloc)
 {
-   strbuf_release(sb);
-   sb->buf   = buf;
-   sb->len   = len;
-   sb->alloc = alloc;
+   strbuf_wrap_internal(sb, buf, len, alloc);
+   sb->owns_memory =

[PATCH V2 0/3] strbuf: improve API

2016-06-06 Thread William Duclot
This patch series implements an improvment of the strbuf API, allowing
strbuf to use preallocated memory. This makes strbuf fit to be used
in performance-critical operations.

* The first patch is simply a preparatory work, adding tests for
existing strbuf implementation.
* The second patch is also preparatory: rename a function for the third
patch.
* Most of the work is made in the third patch: handle pre-allocated
memory, extend the API, document it and test it.

As said in the third commit message, the idea comes from Michael Haggerty.

William Duclot (3):
  strbuf: add tests
  pretty.c: rename strbuf_wrap() function
  strbuf: allow to use preallocated memory

Makefile   |   1 +
pretty.c   |   7 ---
strbuf.c   |  82 
+++---
strbuf.h   |  44 ++--
t/helper/test-strbuf.c | 229 
+++
t/t0082-strbuf.sh  |  48 
6 files changed, 394 insertions(+), 17 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V2 1/3] strbuf: add tests

2016-06-06 Thread William Duclot
Test the strbuf API. Being used throughout all Git the API could be
considered tested, but adding specific tests makes it easier to improve
and extend the API.

Signed-off-by: William Duclot <william.duc...@ensimag.grenoble-inp.fr>
Signed-off-by: Simon Rabourg <simon.rabo...@ensimag.grenoble-inp.fr>
Signed-off-by: Matthieu Moy <matthieu@grenoble-inp.fr>
---
Changes since V1:
* Use the parser API
* Coding style fix
* New test for string size

 Makefile   |   1 +
 t/helper/test-strbuf.c | 101 +
 t/t0082-strbuf.sh  |  19 ++
 3 files changed, 121 insertions(+)
 create mode 100644 t/helper/test-strbuf.c
 create mode 100755 t/t0082-strbuf.sh

diff --git a/Makefile b/Makefile
index 3f03366..dc84f43 100644
--- a/Makefile
+++ b/Makefile
@@ -613,6 +613,7 @@ TEST_PROGRAMS_NEED_X += test-scrap-cache-tree
 TEST_PROGRAMS_NEED_X += test-sha1
 TEST_PROGRAMS_NEED_X += test-sha1-array
 TEST_PROGRAMS_NEED_X += test-sigchain
+TEST_PROGRAMS_NEED_X += test-strbuf
 TEST_PROGRAMS_NEED_X += test-string-list
 TEST_PROGRAMS_NEED_X += test-submodule-config
 TEST_PROGRAMS_NEED_X += test-subprocess
diff --git a/t/helper/test-strbuf.c b/t/helper/test-strbuf.c
new file mode 100644
index 000..271592e
--- /dev/null
+++ b/t/helper/test-strbuf.c
@@ -0,0 +1,101 @@
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "parse-options.h"
+#include "builtin.h"
+
+/*
+ * Check behavior on usual use cases
+ */
+static int strbuf_check_behavior(struct strbuf *sb)
+{
+   char *str_test = xstrdup("test"), *res, *old_buf;
+   size_t size, old_alloc;
+
+   strbuf_grow(sb, 1);
+   old_alloc = sb->alloc;
+   strbuf_grow(sb, sb->alloc - sb->len + 1000);
+   if (old_alloc == sb->alloc)
+   die("strbuf_grow does not realloc the buffer as expected");
+   old_buf = sb->buf;
+   res = strbuf_detach(sb, );
+   if (res != old_buf)
+   die("strbuf_detach does not return the expected buffer");
+   free(res);
+
+   str_test = xstrdup("test");
+   strbuf_attach(sb, str_test, strlen(str_test), strlen(str_test) + 1);
+   res = strbuf_detach(sb, );
+   if (size != strlen(str_test)) 
+   die ("size is not as expected");
+
+   if (res != str_test)
+   die("strbuf_detach does not return the expected buffer");
+   free(res);
+   strbuf_release(sb);
+
+   return 0;
+}
+
+static int basic_grow(struct strbuf *sb) 
+{
+   strbuf_init(sb, 0);
+   strbuf_grow(sb, 0);
+   if (sb->buf == strbuf_slopbuf)
+   die("strbuf_grow failed to alloc memory");
+   strbuf_release(sb);
+   if (sb->buf != strbuf_slopbuf)
+   die("strbuf_release does not reinitialize the strbuf");
+
+   return 0;
+}
+
+static void grow_overflow(struct strbuf *sb)
+{
+   strbuf_init(sb, 1000);
+   strbuf_grow(sb, maximum_unsigned_value_of_type((size_t)1));
+}
+
+int main(int argc, const char *argv[])
+{
+   struct strbuf sb;
+   enum {
+   MODE_UNSPECIFIED = 0,
+   MODE_BASIC_GROW ,
+   MODE_STRBUF_CHECK,
+   MODE_GROW_OVERFLOW
+   } cmdmode = MODE_UNSPECIFIED;
+   struct option options[] = {
+   OPT_CMDMODE(0, "basic_grow", ,
+   N_(" basic grow test"),
+   MODE_BASIC_GROW),
+   OPT_CMDMODE(0, "strbuf_check_behavior", ,
+   N_("check the strbuf's behavior"),
+   MODE_STRBUF_CHECK),
+   OPT_CMDMODE(0, "grow_overflow", ,
+   N_("test grow expecting overflow"),
+   MODE_GROW_OVERFLOW),
+   OPT_END()
+   };
+
+   argc = parse_options(argc, argv, NULL, options, NULL, 0);
+
+   if (cmdmode == MODE_BASIC_GROW) {
+   /*
+* Check if strbuf_grow(0) allocate a new NUL-terminated buffer
+*/
+   return basic_grow();
+   } else if (cmdmode == MODE_STRBUF_CHECK) {
+   strbuf_init(, 0);
+   return strbuf_check_behavior();
+   } else if (cmdmode == MODE_GROW_OVERFLOW) {
+   /*
+* size_t overflow: should die().
+* If this does not die(), fall through to returning success.
+*/
+   grow_overflow();
+   } else {
+   usage("test-strbuf mode");
+   }
+
+   return 0;
+}
diff --git a/t/t0082-strbuf.sh b/t/t0082-strbuf.sh
new file mode 100755
index 000..6a579a3
--- /dev/null
+++ b/t/t0082-strbuf.sh
@@ -0,0 +1,19 @@
+#!/bin/sh
+
+test_description='Test the strbuf API.'
+
+. ./test-lib.sh

[PATCH V2 2/3] pretty.c: rename strbuf_wrap() function

2016-06-06 Thread William Duclot
The function strbuf_wrap() is not part of the strbuf API, yet prevent to
extend the API to include wrapping functions. Renaming it to something
more specific allow to use "strbuf_wrap" for the strbut API.

Signed-off-by: William Duclot <william.duc...@ensimag.grenoble-inp.fr>
Signed-off-by: Simon Rabourg <simon.rabo...@ensimag.grenoble-inp.fr>
Signed-off-by: Matthieu Moy <matthieu@grenoble-inp.fr>
---
 pretty.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/pretty.c b/pretty.c
index 87c4497..2b9e89a 100644
--- a/pretty.c
+++ b/pretty.c
@@ -904,8 +904,8 @@ static void parse_commit_message(struct 
format_commit_context *c)
c->commit_message_parsed = 1;
 }
 
-static void strbuf_wrap(struct strbuf *sb, size_t pos,
-   size_t width, size_t indent1, size_t indent2)
+static void strbuf_wrap_message(struct strbuf *sb, size_t pos,
+   size_t width, size_t indent1, size_t indent2)
 {
struct strbuf tmp = STRBUF_INIT;
 
@@ -926,7 +926,8 @@ static void rewrap_message_tail(struct strbuf *sb,
c->indent2 == new_indent2)
return;
if (c->wrap_start < sb->len)
-   strbuf_wrap(sb, c->wrap_start, c->width, c->indent1, 
c->indent2);
+   strbuf_wrap_message(sb, c->wrap_start, c->width, c->indent1,
+   c->indent2);
c->wrap_start = sb->len;
c->width = new_width;
c->indent1 = new_indent1;
-- 
2.9.0.rc1.1.geac644e

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] userdiff: add built-in pattern for CSS

2016-06-06 Thread William Duclot
On Fri, Jun 03, 2016 at 08:50:50AM -0700, Junio C Hamano wrote:
> William Duclot <william.duc...@ensimag.grenoble-inp.fr> writes:
> 
> > Here I have to disagree (with you and Junio): the IPATTERN is
> > case-insensitive only on the "pattern" regex, not the "word_regex"
> > regex.
> 
> Ahh, OK.  Obviously both of us overlooked that.  Thanks for pushing
> back.
> 
> > On the identifier line, I have "A-F" instead of "A-Z" though
> 
> Yeah, that does need updating.

Note that I sent a V4 :)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] strbuf: allow to use preallocated memory

2016-06-03 Thread William Duclot
On Tue, May 31, 2016 at 06:05:45AM +0200, Michael Haggerty wrote:
> When reading this patch series, I found I had trouble remembering
> whether "preallocated" meant "preallocated and movable" or "preallocated
> and immovable". So maybe we should brainstorm alternatives to
> "preallocated" and "fixed". For example,
> 
> * "growable"/"fixed"? Seems OK, though all strbufs are growable at least
> to the size of their initial allocation, so maybe "growable" is misleading.
> 
> * "movable"/"fixed"? This maybe better captures the essence of the
> distinction. I'll use those names below for concreteness, without
> claiming that they are the best.
> 
> [...]
> 
> The functions might be
> named more like `strbuf_attach()` to emphasize their similarity to that
> existing function. Maybe
> 
> strbuf_attach_fixed(struct strbuf *sb, void *s, size_t len, size_t
> alloc);
> strbuf_attach_movable(struct strbuf *sb, void *s, size_t len, size_t
> alloc);

Now that I am looking in detail into it, I am not so convinced by those
names. Using "attach" suggests the same behavior as strbuf_attach(),
which is _not_ the case to me:
- The aim of the attach() function is to give ownership of a
  buffer allocated by the caller to the strbuf.
- The aim of the wrap() functions is to give the right to use a
  buffer allocated by the caller to the strbuf, while keeping
  ownership.
- For completion: the aim of the init() function is to let the
  strbuf manage its own buffer.

I think that it is important to distinct those 3 use cases for the API user
to be able to understand. And to describe this API extension, "wrap"
seems clear to me.
Another point that would makes me skeptical about using
`strbuf_attach_preallocated()` is that the real difference with
`strbuf_attach()` isn't in the allocation of the buffer parameter, it is
in the ownership. Both takes a buffer allocated by the user as
parameter (so a preallocated buffer), even thought
`strbuf_attach_preallocated()` may use a stack-allocated one.

So I come to the conclusion that even using the word "preallocated" may
not be such a good idea, as even strbuf_attach() use a preallocated
buffer. What I think would be the clearer would be:

- strbuf_attach()   (unchanged)
- strbuf_wrap() (no need for the "preallocated")
- strbuf_wrap_fixed()
- STRBUF_WRAP   
- STRBUF_WRAP_FIXED

The two last ones would be macros, equivalent to the functions except
that they don't release the strbuf before initializing it.

What do you think about this?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] userdiff: add built-in pattern for CSS

2016-06-03 Thread William Duclot
CSS is widely used, motivating it being included as a built-in pattern.

It must be noted that the word_regex for CSS (i.e. the regex defining
what is a word in the language) does not consider '.' and '#' characters
(in CSS selectors) to be part of the word. This behavior is documented
by the test t/t4018/css-rule.
The logic behind this behavior is the following: identifiers in CSS
selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#'
character are not part of the identifier, but an indicator of the nature
of the identifier in HTML/XML (class or id). Diffing ".class1" and
".class2" must show that the class name is changed, but we still are
selecting a class.

Logic behind the "pattern" regex is:
1. reject lines ending with a colon/semicolon (properties)
2. if a line begins with a name in column 1, pick the whole line

Credits to Johannes Sixt (j...@kdbg.org) for the pattern regex and most
of the tests.

Signed-off-by: William Duclot <william.duc...@ensimag.grenoble-inp.fr>
Signed-off-by: Matthieu Moy <matthieu@grenoble-inp.fr>
---
Changes since V3:
- Add a few tests
- Remove a redondant test
- Handle trailing spaces
- Reword in doc
- Improvement of the pattern regex

 Documentation/gitattributes.txt |  2 ++
 t/t4018-diff-funcname.sh|  1 +
 t/t4018/css-brace-in-col-1  |  5 +
 t/t4018/css-colon-eol   |  4 
 t/t4018/css-colon-selector  |  5 +
 t/t4018/css-common  |  4 
 t/t4018/css-long-selector-list  |  6 ++
 t/t4018/css-prop-sans-indent|  5 +
 t/t4018/css-short-selector-list |  4 
 t/t4018/css-trailing-space  |  5 +
 t/t4034-diff-words.sh   |  1 +
 t/t4034/css/expect  | 16 
 t/t4034/css/post| 10 ++
 t/t4034/css/pre | 10 ++
 userdiff.c  | 12 
 15 files changed, 90 insertions(+)
 create mode 100644 t/t4018/css-brace-in-col-1
 create mode 100644 t/t4018/css-colon-eol
 create mode 100644 t/t4018/css-colon-selector
 create mode 100644 t/t4018/css-common
 create mode 100644 t/t4018/css-long-selector-list
 create mode 100644 t/t4018/css-prop-sans-indent
 create mode 100644 t/t4018/css-short-selector-list
 create mode 100644 t/t4018/css-trailing-space
 create mode 100644 t/t4034/css/expect
 create mode 100644 t/t4034/css/post
 create mode 100644 t/t4034/css/pre

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..8882a3e 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -525,6 +525,8 @@ patterns are available:
 
 - `csharp` suitable for source code in the C# language.
 
+- `css` suitable for cascading style sheets.
+
 - `fortran` suitable for source code in the Fortran language.
 
 - `fountain` suitable for Fountain documents.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 67373dc..1795ffc 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -30,6 +30,7 @@ diffpatterns="
bibtex
cpp
csharp
+   css
fortran
fountain
html
diff --git a/t/t4018/css-brace-in-col-1 b/t/t4018/css-brace-in-col-1
new file mode 100644
index 000..7831577
--- /dev/null
+++ b/t/t4018/css-brace-in-col-1
@@ -0,0 +1,5 @@
+RIGHT label.control-label
+{
+margin-top: 10px!important;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-colon-eol b/t/t4018/css-colon-eol
new file mode 100644
index 000..5a30553
--- /dev/null
+++ b/t/t4018/css-colon-eol
@@ -0,0 +1,4 @@
+RIGHT h1 {
+color:
+ChangeMe;
+}
diff --git a/t/t4018/css-colon-selector b/t/t4018/css-colon-selector
new file mode 100644
index 000..c6d71fb
--- /dev/null
+++ b/t/t4018/css-colon-selector
@@ -0,0 +1,5 @@
+RIGHT a:hover {
+margin-top:
+10px!important;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-common b/t/t4018/css-common
new file mode 100644
index 000..84ed754
--- /dev/null
+++ b/t/t4018/css-common
@@ -0,0 +1,4 @@
+RIGHT label.control-label {
+margin-top: 10px!important;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-long-selector-list b/t/t4018/css-long-selector-list
new file mode 100644
index 000..7ccd25d
--- /dev/null
+++ b/t/t4018/css-long-selector-list
@@ -0,0 +1,6 @@
+p.header,
+label.control-label,
+div ul#RIGHT {
+margin-top: 10px!important;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-prop-sans-indent b/t/t4018/css-prop-sans-indent
new file mode 100644
index 000..a9e3c86
--- /dev/null
+++ b/t/t4018/css-prop-sans-indent
@@ -0,0 +1,5 @@
+RIGHT, label.control-label {
+margin-top: 10px!important;
+padding: 0;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-short-selector-list b/t/t4018/css-short-selector-list
new file mode 100644
index 000..6a0bdee
--- /dev/null
+++ b/t/t4018/css-short-selector-li

Re: [PATCH] userdiff: add built-in pattern for CSS

2016-06-03 Thread William Duclot
On Fri, Jun 03, 2016 at 07:52:45AM +0200, Johannes Sixt wrote:
> Am 03.06.2016 um 00:48 schrieb William Duclot:
>>Logic behind the "pattern" regex is:
> 
> The name of the macro parameter is "pattern", but the actual meaning
> is "function name" regex.
> 
>>diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
>>index e3b1de8..81f60ad 100644
>>--- a/Documentation/gitattributes.txt
>>+++ b/Documentation/gitattributes.txt
>>@@ -525,6 +525,8 @@ patterns are available:
>>
>>  - `csharp` suitable for source code in the C# language.
>>
>>+- `css` suitable for source code in the CSS language.
> 
> CSS is not so much source code. How about "suitable for cascaded
> style sheets"?

Technically correct yes
 
>>diff --git a/t/t4018/css-common b/t/t4018/css-common
>>new file mode 100644
>>index 000..84ed754
>>--- /dev/null
>>+++ b/t/t4018/css-common
>>@@ -0,0 +1,4 @@
>>+RIGHT label.control-label {
>>+margin-top: 10px!important;
>>+border : 10px ChangeMe #C6C6C6;
>>+}
> 
>>diff --git a/t/t4018/css-rule b/t/t4018/css-rule
>>new file mode 100644
>>index 000..84ed754
>>--- /dev/null
>>+++ b/t/t4018/css-rule
>>@@ -0,0 +1,4 @@
>>+RIGHT label.control-label {
>>+margin-top: 10px!important;
>>+border : 10px ChangeMe #C6C6C6;
>>+}
> 
> These two are the same. Please pick only one. I propose the name
> "common" because it is how CSS rules are written most commonly.

Right, I was distracted

>>+IPATTERN("css",
>>+  "!^.*;\n"
> 
> Is there a difference between this and "!;\n"? Is it necessary to
> anchor the pattern at the beginning of the line?
> 
> In the commit message you talk about colon (':'), but you actually
> use a semicolon (';'). Thinking a bit more about it, rejecting lines
> with either one would be even better. Consider this case (without
> the indentation):
> 
>h1 {
>color:
>red;
>}
> 
> (New test case, hint, hint!) Therefore, it could be: "![:;]\n".

You're right! (plus the potential trailing spaces)

>>+  "^[_a-z0-9].*$",
>>+  /* -- */
>>+  /*
>>+   * This regex comes from W3C CSS specs. Should theoretically also
>>+   * allow ISO 10646 characters U+00A0 and higher,
>>+   * but they are not handled in this regex.
>>+   */
>>+  "-?[_a-zA-F][-_a-zA-F0-9]*" /* identifiers */
> 
> Drop A-F.
> 
>>+  "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
> 
> Here, too: it is an IPATTERN.

Here I have to disagree (with you and Junio): the IPATTERN is
case-insensitive only on the "pattern" regex, not the "word_regex"
regex. It can be seen in the macro definition, and a quick test confirm
that (and we can see that the fortran word_regex, for example, bother
with uppercase and lowercase even thought it use IPATTERN).
On the identifier line, I have "A-F" instead of "A-Z" though
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] userdiff: add built-in pattern for CSS

2016-06-02 Thread William Duclot
CSS is widely used, motivating it being included as a built-in pattern.

It must be noted that the word_regex for CSS (i.e. the regex defining
what is a word in the language) does not consider '.' and '#' characters
(in CSS selectors) to be part of the word. This behavior is documented
by the test t/t4018/css-rule.
The logic behind this behavior is the following: identifiers in CSS
selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#'
character are not part of the identifier, but an indicator of the nature
of the identifier in HTML/XML (class or id). Diffing ".class1" and
".class2" must show that the class name is changed, but we still are
selecting a class.

Logic behind the "pattern" regex is:
1. reject lines containing a colon (properties)
2. if a line begins with a name in column 1, pick the whole line

Credits to Johannes Sixt (j...@kdbg.org) for the pattern regex and most
of the tests.

Signed-off-by: William Duclot <william.duc...@ensimag.grenoble-inp.fr>
Signed-off-by: Matthieu Moy <matthieu@grenoble-inp.fr>
---
Changes since V2:
- pattern regex has changed
- more tests

 Documentation/gitattributes.txt |  2 ++
 t/t4018-diff-funcname.sh|  1 +
 t/t4018/css-brace-in-col-1  |  5 +
 t/t4018/css-common  |  4 
 t/t4018/css-long-selector-list  |  6 ++
 t/t4018/css-prop-sans-indent|  5 +
 t/t4018/css-rule|  4 
 t/t4018/css-short-selector-list |  4 
 t/t4034-diff-words.sh   |  1 +
 t/t4034/css/expect  | 16 
 t/t4034/css/post| 10 ++
 t/t4034/css/pre | 10 ++
 userdiff.c  | 12 
 13 files changed, 80 insertions(+)
 create mode 100644 t/t4018/css-brace-in-col-1
 create mode 100644 t/t4018/css-common
 create mode 100644 t/t4018/css-long-selector-list
 create mode 100644 t/t4018/css-prop-sans-indent
 create mode 100644 t/t4018/css-rule
 create mode 100644 t/t4018/css-short-selector-list
 create mode 100644 t/t4034/css/expect
 create mode 100644 t/t4034/css/post
 create mode 100644 t/t4034/css/pre

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..81f60ad 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -525,6 +525,8 @@ patterns are available:
 
 - `csharp` suitable for source code in the C# language.
 
+- `css` suitable for source code in the CSS language.
+
 - `fortran` suitable for source code in the Fortran language.
 
 - `fountain` suitable for Fountain documents.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 67373dc..1795ffc 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -30,6 +30,7 @@ diffpatterns="
bibtex
cpp
csharp
+   css
fortran
fountain
html
diff --git a/t/t4018/css-brace-in-col-1 b/t/t4018/css-brace-in-col-1
new file mode 100644
index 000..7831577
--- /dev/null
+++ b/t/t4018/css-brace-in-col-1
@@ -0,0 +1,5 @@
+RIGHT label.control-label
+{
+margin-top: 10px!important;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-common b/t/t4018/css-common
new file mode 100644
index 000..84ed754
--- /dev/null
+++ b/t/t4018/css-common
@@ -0,0 +1,4 @@
+RIGHT label.control-label {
+margin-top: 10px!important;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-long-selector-list b/t/t4018/css-long-selector-list
new file mode 100644
index 000..7ccd25d
--- /dev/null
+++ b/t/t4018/css-long-selector-list
@@ -0,0 +1,6 @@
+p.header,
+label.control-label,
+div ul#RIGHT {
+margin-top: 10px!important;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-prop-sans-indent b/t/t4018/css-prop-sans-indent
new file mode 100644
index 000..a9e3c86
--- /dev/null
+++ b/t/t4018/css-prop-sans-indent
@@ -0,0 +1,5 @@
+RIGHT, label.control-label {
+margin-top: 10px!important;
+padding: 0;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-rule b/t/t4018/css-rule
new file mode 100644
index 000..84ed754
--- /dev/null
+++ b/t/t4018/css-rule
@@ -0,0 +1,4 @@
+RIGHT label.control-label {
+margin-top: 10px!important;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4018/css-short-selector-list b/t/t4018/css-short-selector-list
new file mode 100644
index 000..6a0bdee
--- /dev/null
+++ b/t/t4018/css-short-selector-list
@@ -0,0 +1,4 @@
+label.control, div ul#RIGHT {
+margin-top: 10px!important;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index f2f55fc..912df91 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -302,6 +302,7 @@ test_language_driver ada
 test_language_driver bibtex
 test_language_driver cpp
 test_language_driver csharp
+test_language_driver css
 test_language_driver fortran
 test_language_driver html
 test_language_driver

Re: [PATCH 0/2] strbuf: improve API

2016-06-02 Thread William Duclot
On Thu, Jun 02, 2016 at 02:58:22PM +0200, Matthieu Moy wrote:
> Michael Haggerty  writes:
> 
>> 1. The amount of added code complexity is small and quite
>>encapsulated.
> 
> Actually, STRBUF_OWNS_MEMORY can even be seen as a simplification if
> done properly: we already have the case where the strbuf does not own
> the memory with strbuf_slopbuf. I already pointed places in
> strbuf_grow() which could be simplified after the patch. Re-reading the
> code it seems at lesat the call to strbuf_grow(sb, 0); in strbuf_detach
> becomes useless. The same in strbuf_attach() probably is, too.
> 
> So, the final strbuf.[ch] code might not be "worse" that the previous.
> 
> I'm unsure about the complexity of the future code using the new API. I
> don't forsee cases where using the new API would lead to a high
> maintenance cost, but I don't claim I considered all possible uses.
> 
>> 2. The ability to use strbufs without having to allocate memory might
>>make enough *psychological* difference that it encourages some
>>devs to use strbufs where they would otherwise have done manual
>>memory management. I think this would be a *big* win in terms of
>>potential bugs and security vulnerabilities avoided.
> 
> Note that this can also be seen as a counter-argument, since it
> may psychologically encourage people to micro-optimize code and use
> contributors/reviewers neurons to spend time on "shall this be on-stack
> or malloced?".
> 
> I think we already have a tendency to micro-optimize non-critical code
> too much in Git's codebase, so it's not necessarily a step in the right
> direction.
> 
> In conclusion, I don't have a conclusion, sorry ;-).

Thank you all for your input and your tests, those are very valuable!
Me and Simon have to take a decision, as this contribution is part of a
school project that comes to an end. We won't have the time to create
tests that are representative of the use of strbuf in the Git codebase
(partly because we lack knowledge of the codebase) to settle on whether
this API improvement is useful or not.

Jeff made very good points, and the tests we ran ourselves seem to
agree with yours. That being said, the tests made by Michael, more
detailed, suggest that there may be room for an improvement of the
strbuf API (even thought that's to confront to the reality of the
codebase).

Having little time, we will refactor the patch and send it as a V2: this
way, if it appears one day that improving the API with stack-allocated
memory is indeed useful, the patch will be ready to merge :)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] strbuf: allow to use preallocated memory

2016-05-31 Thread William Duclot
On Tue, May 31, 2016 at 05:54:59PM +0200, Matthieu Moy wrote:
> William  writes:
> 
> > On Mon, May 30, 2016 at 11:34:42PM -0700, Junio C Hamano wrote:
> >
> >> As long as your "on stack strbuf" allows lengthening the string
> >> beyond the initial allocation (i.e. .alloc, not .len), the user of
> >> the API (i.e. the one that placed the strbuf on its stack) would not
> >> know when the implementation (i.e. the code in this patch) decides
> >> to switch to allocated memory, so it must call strbuf_release()
> >> before it leaves.  Which in turn means that your implementation of
> >> strbuf_release() must be prepared to be take a strbuf that still has
> >> its string on the stack.
> >
> > Well, my implementation does handle a strbuf that still has its
> > string on the stack: the buffer won't be freed in this case (only a
> > reset to STRBUF_INIT).
> > Unless I misunderstood you?
> 
> I think Junio meant:
> 
> void f()
> {
>   struct strbuf sb;
>   char buf[N];
>   strbuf_wrap_preallocated(, buf, ...);
>   strbuf_add(, ...);
> 
>   // is it safe to just let sb reach the end of its scope?
> }
> 
> To answer the last question, the user would need to know too much about
> the allocation policy, so in this case the user should call
> strbuf_release(), and let it chose whether to call "free()"
> (OWNS_MEMORY) or not. This is OK with your implementation, but the doc
> needs to reflect this.

Okay, I understand. The idea was that the only change in the API is the
initialization: the API user can then use the strbuf like they could
before this patch, with the same responsabilities (having to release the
strbuf when they don't need it anymore).
Maybe a clearer documentation about the life and death of a strbuf
(initialization, use and release) would be useful, yes
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] strbuf: allow to use preallocated memory

2016-05-31 Thread William Duclot
On Tue, May 31, 2016 at 06:05:45AM +0200, Michael Haggerty wrote:
> On 05/30/2016 02:52 PM, Matthieu Moy wrote:
> > [...]
> 
> I feel bad bikeshedding about names, especially since you took some of
> the original names from my RFC. But names are very important, so I think
> it's worth considering whether the current names could be improved upon.
> 
> When reading this patch series, I found I had trouble remembering
> whether "preallocated" meant "preallocated and movable" or "preallocated
> and immovable". So maybe we should brainstorm alternatives to
> "preallocated" and "fixed". For example,
> 
> * "growable"/"fixed"? Seems OK, though all strbufs are growable at least
> to the size of their initial allocation, so maybe "growable" is misleading.
> 
> * "movable"/"fixed"? This maybe better captures the essence of the
> distinction. I'll use those names below for concreteness, without
> claiming that they are the best.

Yes, the names are debatable

> > * strbuf_attach() calls strbuf_release(), which allows reusing an
> >   existing strbuf. strbuf_wrap_preallocated() calls strbuf_init which
> >   would override silently any previous content. I think strbuf_attach()
> >   does the right thing here.
> 
> Hmmm
> 
> I think the best way to answer these questions is to think about use
> cases and make them as easy/consistent as possible.
> 
> I expect that a very common use of strbuf_wrap_fixed() will be to wrap a
> stack-allocated string, like
> 
> char pathbuf[PATH_MAX];
> struct strbuf path;
> 
> strbuf_wrap_fixed(, pathbuf, 0, sizeof(pathbuf));
> 
> In this use case, it would be a shame if `path` had to be initialized to
> STRBUF_INIT just because `strbuf_wrap_fixed()` calls `strbuf_release()`
> internally.
> 
> But maybe we could make this use case easier still. If there were a macro
> 
> #define STRBUF_FIXED_WRAPPER(sb, buf, len) { STRBUF_FIXED_MEMORY,
> sizeof(buf), (len), (buf) }
> 
> then we could write
> 
> char pathbuf[PATH_MAX];
> struct strbuf path = STRBUF_FIXED_WRAPPER(pathbuf, 0);
> 
> I think that would be pretty usable. One would have to be careful only
> to wrap arrays and not `char *` pointers, because `sizeof` wouldn't work
> on the latter. The BARF_UNLESS_AN_ARRAY macro could be used here to add
> a little safety.

That sounds like a good idea to me

> [...]
> If you provide macro forms like these for initializing strbufs, then I
> agree with Matthieu that the analogous functional forms should probably
> call strbuf_release() before wrapping the array. The functions might be
> named more like `strbuf_attach()` to emphasize their similarity to that
> existing function. Maybe
> 
> strbuf_attach_fixed(struct strbuf *sb, void *s, size_t len, size_t
> alloc);
> strbuf_attach_movable(struct strbuf *sb, void *s, size_t len, size_t
> alloc);

I'm not a big fan of the idea that the macro and the function don't have
the same behavior. Using "attach" instead of "wrap" or "init" may
justify that
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] strbuf: allow to use preallocated memory

2016-05-30 Thread William Duclot
Mike Hommey  writes:
>>  struct strbuf {
>> +unsigned int flags;
>>  size_t alloc;
>>  size_t len;
>>  char *buf;
>>  };
> 
> Depending whether the size of strbuf matters, it /might/ be worth
> considering some packing here. malloc() usually returns buffers that can
> contain more data than what is requested. Which means allocation sizes
> could be rounded and that wouldn't change the amount of allocated
> memory. On glibc malloc_usable_size(malloc(1)) apparently returns 24.
> On jemalloc, it's 4 or 8. It's in the same ballbark with many
> allocators.
> 
> So, it would be possible to round alloc such that it's always a multiple
> of, say, 4, and stick flags in the low, unused bits.

If I'm not mistaken, the memory allocated is not necessarily linear with
the size asked, depending on the algorithm used by the allocator and/or
the kernel. The system for exemple use powers of two, if the user asks
for exactly 2^x bytes, adding the space for the flags would lead to an
allocation of 2^(x+1) bytes. Way worse than storing an unsigned.
If the allocator use a fibonnaci system, we can't even rely on multiples
of 4 (or 2).
I'm not sure the fibonnaci system is actually used by any allocator, but
my point is that I'm not sure it is a good thing to rely on such 
low-level implementations.

> Whether it's worth doing is another question.

I'd say it is not, we generally lack time more than space, storing an
unsigned seems very reasonable compared to operations involved in using
malloc() leftovers :)
Not even talking about growing buffers, so realloc(), so a whole new set
of problems...
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] strbuf: allow to use preallocated memory

2016-05-30 Thread William Duclot
Matthieu Moy <matthieu@grenoble-inp.fr> writes:
> William Duclot <william.duc...@ensimag.grenoble-inp.fr> writes:
> 
>> Matthieu Moy <matthieu@grenoble-inp.fr> writes:
>>>> +/**
>>>> + * Allow the caller to give a pre-allocated piece of memory for the
>>>> strbuf
>>>> + * to use and indicate that the strbuf must use exclusively this buffer,
>>>> + * never realloc() it or allocate a new one. It means that the string
>>>> can
>>>> + * be manipulated but cannot overflow the pre-allocated buffer. The
>>>> + * pre-allocated buffer will never be freed.
>>>> + */
>>> 
>>> Perhaps say explicitly that although the allocated buffer has a fixed
>>> size, the string itself can grow as long as it does not overflow the
>>> buffer?
>>
>> That's what I meant by "the string can be manipulated but cannot overflow
>> the pre-allocated buffer". I'll try to reformulate
> 
> Maybe "the string can grow, but cannot overflow"?

Seems OK

>>>> @@ -91,6 +116,8 @@ extern void strbuf_release(struct strbuf *);
>>>>   * Detach the string from the strbuf and returns it; you now own the
>>>>   * storage the string occupies and it is your responsibility from then
>>>>   on
>>>>   * to release it with `free(3)` when you are done with it.
>>>> + * Must allocate a copy of the buffer in case of a preallocated/fixed
>>>> buffer.
>>>> + * Performance-critical operations have to be aware of this.
>>> 
>>> Better than just warn about performance, you can give the alternative.
>>
>> I'm not sure what you mean, I don't think there really is an alternative
>> for
>> detaching a string?
> 
> So, is the comment above saying "You're doomed, there's no way you can
> get good performance anyway"?
> 
> The alternative is just that you don't have to call strbuf_release since
> the caller can access the .buf field and is already the one responsible
> for freeing it when needed, and it's safe to just call strbuf_init() if
> one needs to re-initialize the stbuf structure.

Actually, if the user _needs_ to detach(), then yes he's doomed. Or more
realistically, he have to refactor so he'll don't need detach() (by being
sure the strbuf is pre-allocated by himself for example).
The strbuf_release() function is not the problem here, it does nothing
problematic for performances. The problem is strbuf_wrap_preallocated(),
but you're right I can add a comment so the user know when he don't have
to call it.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] strbuf: allow to use preallocated memory

2016-05-30 Thread William Duclot
Matthieu Moy <matthieu@grenoble-inp.fr> writes:
> William Duclot <william.duc...@ensimag.grenoble-inp.fr> writes:
> 
>> @@ -20,16 +28,37 @@ char strbuf_slopbuf[1];
>>  
>>  void strbuf_init(struct strbuf *sb, size_t hint)
>>  {
>> +sb->flags = 0;
>>  sb->alloc = sb->len = 0;
>>  sb->buf = strbuf_slopbuf;
>>  if (hint)
>>  strbuf_grow(sb, hint);
>>  }
> 
> If you set flags = 0 here, existing callers will have all flags off,
> including OWNS_MEMORY.
> 
> I *think* this is OK, as sb->buf is currently pointing to
> strbuf_slopbuf, which the the strbuf doesn't own. But that is too subtle
> to go without an explanatory comment IMHO.

Right

> 
> Also, doesn't this make the "new_buf" case useless in strbuf_grow?
> 
> With your patch, the code looks like:
> 
> void strbuf_grow(struct strbuf *sb, size_t extra)
> {
>   int new_buf = !sb->alloc;
> ...
>   if (sb->flags & STRBUF_OWNS_MEMORY) {
>   if (new_buf) // < (1)
>   sb->buf = NULL;
>   ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
>   } else {
>   /*
>* The strbuf doesn't own the buffer: to avoid to realloc it,
>* the strbuf needs to use a new buffer without freeing the old
>*/
>   if (sb->len + extra + 1 > sb->alloc) {
>   size_t new_alloc = MAX(sb->len + extra + 1, 
> alloc_nr(sb->alloc));
>   char *buf = xmalloc(new_alloc);
>   memcpy(buf, sb->buf, sb->alloc);
>   sb->buf = buf;
>   sb->alloc = new_alloc;
>   sb->flags |= STRBUF_OWNS_MEMORY;
>   }
>   }
> 
>   if (new_buf) // < (2)
>   sb->buf[0] = '\0';
> }
> 
> I think (1) is now dead code, since sb->alloc == 0 implies that
> STRBUF_OWNS_MEMORY is set. (2) seems redundant since you've just
> memcpy-ed the existing '\0' into the buffer.

You're right for (1), I hadn't noticed that.
For (2), we'll still have to set sb->buf[new_alloc-1]='\0' after the memcpy, if 
we
have sb->alloc==0 then the memcpy won't copy it.

>> +void strbuf_wrap_preallocated(struct strbuf *sb, char *path_buf,
>> +  size_t path_buf_len, size_t alloc_len)
>> +{
>> +if (!path_buf)
>> +die("you try to use a NULL buffer to initialize a strbuf");
>> +
>> +strbuf_init(sb, 0);
>> +strbuf_attach(sb, path_buf, path_buf_len, alloc_len);
>> +sb->flags &= ~STRBUF_OWNS_MEMORY;
>> +sb->flags &= ~STRBUF_FIXED_MEMORY;
>> +}
> 
> strbuf_wrap_preallocated seem very close to strbuf_attach. I'd rather
> see a symmetric code sharing like
> 
> void strbuf_attach_internal(struct strbuf *sb, ..., unsigned int flags)
> 
> and then strbuf_attach() and strbuf_wrap_preallocated() become
> straightforward wrappers around it.
> 
> This would avoid setting and then unsetting STRBUF_OWNS_MEMORY (the
> performance impact is probably small, but it looks weird to set the flag
> and then unset it right away).

We'll refactor the code with Johannes' remarks and yours in mind

> After your patch, there are differences between
> strbuf_wrap_preallocated() which I think are inconsistencies:
> 
> * strbuf_attach() does not check for NULL buffer, but it doesn't accept
>   them either if I read correctly. It would make sense to add the check
>   to strbuf_attach(), but it's weird to have the performance-critical
>   oriented function do the expensive stuff that the
>   non-performance-critical one doesn't.

I agree that strbuf_attach should do the check (it seems strange that it
doesn't already do it, as the "buffer never NULL" invariant is not new).
I don't understand your "but" part, what "expensive stuff" are you talking
about?

> * strbuf_attach() calls strbuf_release(), which allows reusing an
>   existing strbuf. strbuf_wrap_preallocated() calls strbuf_init which
>   would override silently any previous content. I think strbuf_attach()
>   does the right thing here.
>
>   In any case, you probably want to include calls to strbuf_attach() and
>   strbuf_wrap_*() functions on existing non-empty strbufs.
>
>> +void strbuf_wrap_fixed(struct strbuf *sb, char *path_buf,
>> +   size_t path_buf_len, size_t alloc_len)
>> +{
>> +strbuf_wrap_preallocated(sb, path_buf, path_buf_len, alloc_len);
&

Re: [PATCH 2/2] strbuf: allow to use preallocated memory

2016-05-30 Thread William Duclot
Johannes Schindelin <johannes.schinde...@gmx.de> writes:
> On Mon, 30 May 2016, William Duclot wrote:
> 
>> It is unfortunate that it is currently impossible to use a strbuf
>> without doing a memory allocation. So code like
>> 
>> void f()
>> {
>> char path[PATH_MAX];
>> ...
>> }
>> 
>> typically gets turned into either
>> 
>> void f()
>> {
>> struct strbuf path;
>> strbuf_add(, ...); <-- does a malloc
>> ...
>> strbuf_release();  <-- does a free
>> }
>> 
>> which costs extra memory allocations, or
>> 
>> void f()
>> {
>> static struct strbuf path;
>> strbuf_add(, ...);
>> ...
>> strbuf_setlen(, 0);
>> }
>> 
>> which, by using a static variable, avoids most of the malloc/free
>> overhead, but makes the function unsafe to use recursively or from
>> multiple threads. Those limitations prevent strbuf to be used in
>> performance-critical operations.
> 
> This description is nice and verbose, but maybe something like this would
> introduce the subject in a quicker manner?
> 
>   When working e.g. with file paths or with dates, strbuf's
>   malloc()/free() dance of strbufs can be easily avoided: as
>   a sensible initial buffer size is already known, it can be
>   allocated on the heap.

strbuf already allow to indicate a sensible initial buffer size thanks
to strbuf_init() second parameter. The main perk of pre-allocation
is to use stack-allocated memory, and not heap-allocated :)
Unless I misunderstood your message?

>> diff --git a/strbuf.c b/strbuf.c
>> index 1ba600b..527b986 100644
>> --- a/strbuf.c
>> +++ b/strbuf.c
>> @@ -1,6 +1,14 @@
>>  #include "cache.h"
>>  #include "refs.h"
>>  #include "utf8.h"
>> +#include 
> 
> Why?

For the MAX macro. It may be a teeny tiny overkill

>> +/**
>> + * Flags
>> + * --
>> + */
>> +#define STRBUF_OWNS_MEMORY 1
>> +#define STRBUF_FIXED_MEMORY (1 << 1)
> 
> From reading the commit message, I expected STRBUF_OWNS_MEMORY.
> STRBUF_FIXED_MEMORY still needs to be explained.

Yes, that seems right

>> @@ -20,16 +28,37 @@ char strbuf_slopbuf[1];
>>  
>>  void strbuf_init(struct strbuf *sb, size_t hint)
>>  {
>> +sb->flags = 0;
>>  sb->alloc = sb->len = 0;
>>  sb->buf = strbuf_slopbuf;
>>  if (hint)
>>  strbuf_grow(sb, hint);
>>  }
>>  
>> +void strbuf_wrap_preallocated(struct strbuf *sb, char *path_buf,
>> +  size_t path_buf_len, size_t alloc_len)
>> +{
>> +if (!path_buf)
>> +die("you try to use a NULL buffer to initialize a strbuf");
>> +
>> +strbuf_init(sb, 0);
>> +strbuf_attach(sb, path_buf, path_buf_len, alloc_len);
>> +sb->flags &= ~STRBUF_OWNS_MEMORY;
>> +sb->flags &= ~STRBUF_FIXED_MEMORY;
> 
> Shorter: sb->flags &= ~(STRBUF_OWNS_MEMORY | STRBUF_FIXED_MEMORY);

Okay with me

>> +}
>> +
>> +void strbuf_wrap_fixed(struct strbuf *sb, char *path_buf,
>> +   size_t path_buf_len, size_t alloc_len)
>> +{
>> +strbuf_wrap_preallocated(sb, path_buf, path_buf_len, alloc_len);
>> +sb->flags |= STRBUF_FIXED_MEMORY;
>> +}
> 
> Rather than letting strbuf_wrap_preallocated() set sb->flags &=
> ~FIXED_MEMORY only to revert that decision right away, a static function
> could be called by both strbuf_wrap_preallocated() and
> strbuf_wrap_fixed().

Makes sense

>>  void strbuf_release(struct strbuf *sb)
>>  {
>>  if (sb->alloc) {
>> -free(sb->buf);
>> +if (sb->flags & STRBUF_OWNS_MEMORY)
>> +free(sb->buf);
>>  strbuf_init(sb, 0);
>>  }
> 
> Should we not reset the flags here, too?

Well, strbuf_init() reset the flags. The only way to have !sb->alloc
is that strbuf has been initialized and never used (even alloc_grow(0)
set sb->alloc=1), so sb==STRBUF_INIT, so the flags don't have to be reset 

>> @@ -38,7 +67,11 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz)
>>  {
>>  char *res;
>>  strbuf_grow(sb, 0);
>> -res = sb->buf;
>> +if (sb->flags & STRBUF_OWNS_MEMORY)
>> +res = sb->buf;
>> +else
>> +res = xmemdupz(sb->buf, sb->alloc - 1);
> 
> This looks like a usage to be avoided: if we plan to detach the buffer,
> anyway, there is no 

[PATCH 1/2] strbuf: add tests

2016-05-30 Thread William Duclot
Test the strbuf API. Being used throughout all Git the API could be
considered tested, but adding specific tests makes it easier to improve
and extend the API.
---
 Makefile   |  1 +
 t/helper/test-strbuf.c | 69 ++
 t/t0082-strbuf.sh  | 19 ++
 3 files changed, 89 insertions(+)
 create mode 100644 t/helper/test-strbuf.c
 create mode 100755 t/t0082-strbuf.sh

diff --git a/Makefile b/Makefile
index 3f03366..dc84f43 100644
--- a/Makefile
+++ b/Makefile
@@ -613,6 +613,7 @@ TEST_PROGRAMS_NEED_X += test-scrap-cache-tree
 TEST_PROGRAMS_NEED_X += test-sha1
 TEST_PROGRAMS_NEED_X += test-sha1-array
 TEST_PROGRAMS_NEED_X += test-sigchain
+TEST_PROGRAMS_NEED_X += test-strbuf
 TEST_PROGRAMS_NEED_X += test-string-list
 TEST_PROGRAMS_NEED_X += test-submodule-config
 TEST_PROGRAMS_NEED_X += test-subprocess
diff --git a/t/helper/test-strbuf.c b/t/helper/test-strbuf.c
new file mode 100644
index 000..622f627
--- /dev/null
+++ b/t/helper/test-strbuf.c
@@ -0,0 +1,69 @@
+#include "git-compat-util.h"
+#include "strbuf.h"
+
+/*
+ * Check behavior on usual use cases
+ */
+int test_usual(struct strbuf *sb)
+{
+   size_t size, old_alloc;
+   char *res, *old_buf, *str_test = malloc(5*sizeof(char));
+   strbuf_grow(sb, 1);
+   strcpy(str_test, "test");
+   old_alloc = sb->alloc;
+   strbuf_grow(sb, 1000);
+   if (old_alloc == sb->alloc)
+   die("strbuf_grow does not realloc the buffer as expected");
+   old_buf = sb->buf;
+   res = strbuf_detach(sb, );
+   if (res != old_buf)
+   die("strbuf_detach does not return the expected buffer");
+   free(res);
+
+   strcpy(str_test, "test");
+   strbuf_attach(sb, (void *)str_test, strlen(str_test), sizeof(str_test));
+   res = strbuf_detach(sb, );
+   if (res != str_test)
+   die("strbuf_detach does not return the expected buffer");
+   free(res);
+   strbuf_release(sb);
+
+   return 0;
+}
+
+int main(int argc, char *argv[])
+{
+   size_t size = 1;
+   struct strbuf sb;
+   char str_test[5] = "test";
+   char str_foo[7] = "foo";
+
+   if (argc != 2)
+   usage("test-strbuf mode");
+
+   if (!strcmp(argv[1], "basic_grow")) {
+   /*
+* Check if strbuf_grow(0) allocate a new NUL-terminated buffer
+*/
+   strbuf_init(, 0);
+   strbuf_grow(, 0);
+   if (sb.buf == strbuf_slopbuf)
+   die("strbuf_grow failed to alloc memory");
+   strbuf_release();
+   if (sb.buf != strbuf_slopbuf)
+   die("strbuf_release does not reinitialize the strbuf");
+   } else if (!strcmp(argv[1], "strbuf_check_behavior")) {
+   strbuf_init(, 0);
+   return test_usual();
+   } else if (!strcmp(argv[1], "grow_overflow")) {
+   /*
+* size_t overflow: should die()
+*/
+   strbuf_init(, 1000);
+   strbuf_grow(, maximum_unsigned_value_of_type((size_t)1));
+   } else {
+   usage("test-strbuf mode");
+   }
+
+   return 0;
+}
diff --git a/t/t0082-strbuf.sh b/t/t0082-strbuf.sh
new file mode 100755
index 000..0800d26
--- /dev/null
+++ b/t/t0082-strbuf.sh
@@ -0,0 +1,19 @@
+#!/bin/sh
+
+test_description="Test the strbuf API.
+"
+. ./test-lib.sh
+
+test_expect_success 'basic test on strbuf_grow()' '
+   test-strbuf basic_grow
+'
+
+test_expect_success 'check strbuf behavior in usual use cases ' '
+   test-strbuf strbuf_check_behavior
+'
+
+test_expect_success 'overflow while calling strbuf_grow' '
+   test_must_fail test-strbuf grow_overflow
+'
+
+test_done
-- 
2.8.2.403.ge2646ba.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] strbuf: allow to use preallocated memory

2016-05-30 Thread William Duclot
It is unfortunate that it is currently impossible to use a strbuf
without doing a memory allocation. So code like

void f()
{
char path[PATH_MAX];
...
}

typically gets turned into either

void f()
{
struct strbuf path;
strbuf_add(, ...); <-- does a malloc
...
strbuf_release();  <-- does a free
}

which costs extra memory allocations, or

void f()
{
static struct strbuf path;
strbuf_add(, ...);
...
strbuf_setlen(, 0);
}

which, by using a static variable, avoids most of the malloc/free
overhead, but makes the function unsafe to use recursively or from
multiple threads. Those limitations prevent strbuf to be used in
performance-critical operations.

THE IDEA


The idea here is to enhance strbuf to allow it to use memory that it
doesn't own (for example, stack-allocated memory), while (optionally)
allowing it to switch over to using allocated memory if the string grows
past the size of the pre-allocated buffer.

API ENHANCEMENT
---

All functions of the API can still be reliably called without
knowledge of the initialization (normal/preallocated/fixed) with the
exception that strbuf_grow() may die() if the string try to overflow a
fixed buffer.

The API contract is still respected:

- The API users may peek strbuf.buf in-place until they perform an
  operation that makes it longer (at which point the .buf pointer
  may point at a new piece of memory).

- The API users may strbuf_detach() to obtain a piece of memory that
  belongs to them (at which point the strbuf becomes empty), hence
  needs to be freed by the callers.

Full credit to Michael Haggerty for the idea and most of the wording of
this commit (http://mid.gmane.org/53512db6.1070...@alum.mit.edu). The
implementation and bugs are all mine.

Signed-off by: William Duclot <william.duc...@ensimag.grenoble-inp.fr>
Signed-off by: Simon Rabourg <simon.rabo...@ensimag.grenoble-inp.fr>
Signed-off by: Matthieu Moy <matthieu@grenoble-inp.fr>
---
 strbuf.c   | 68 ++
 strbuf.h   | 31 +--
 t/helper/test-strbuf.c | 42 +++
 t/t0082-strbuf.sh  | 28 +
 4 files changed, 162 insertions(+), 7 deletions(-)

diff --git a/strbuf.c b/strbuf.c
index 1ba600b..527b986 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -1,6 +1,14 @@
 #include "cache.h"
 #include "refs.h"
 #include "utf8.h"
+#include 
+
+/**
+ * Flags
+ * --
+ */
+#define STRBUF_OWNS_MEMORY 1
+#define STRBUF_FIXED_MEMORY (1 << 1)
 
 int starts_with(const char *str, const char *prefix)
 {
@@ -20,16 +28,37 @@ char strbuf_slopbuf[1];
 
 void strbuf_init(struct strbuf *sb, size_t hint)
 {
+   sb->flags = 0;
sb->alloc = sb->len = 0;
sb->buf = strbuf_slopbuf;
if (hint)
strbuf_grow(sb, hint);
 }
 
+void strbuf_wrap_preallocated(struct strbuf *sb, char *path_buf,
+ size_t path_buf_len, size_t alloc_len)
+{
+   if (!path_buf)
+   die("you try to use a NULL buffer to initialize a strbuf");
+
+   strbuf_init(sb, 0);
+   strbuf_attach(sb, path_buf, path_buf_len, alloc_len);
+   sb->flags &= ~STRBUF_OWNS_MEMORY;
+   sb->flags &= ~STRBUF_FIXED_MEMORY;
+}
+
+void strbuf_wrap_fixed(struct strbuf *sb, char *path_buf,
+  size_t path_buf_len, size_t alloc_len)
+{
+   strbuf_wrap_preallocated(sb, path_buf, path_buf_len, alloc_len);
+   sb->flags |= STRBUF_FIXED_MEMORY;
+}
+
 void strbuf_release(struct strbuf *sb)
 {
if (sb->alloc) {
-   free(sb->buf);
+   if (sb->flags & STRBUF_OWNS_MEMORY)
+   free(sb->buf);
strbuf_init(sb, 0);
}
 }
@@ -38,7 +67,11 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz)
 {
char *res;
strbuf_grow(sb, 0);
-   res = sb->buf;
+   if (sb->flags & STRBUF_OWNS_MEMORY)
+   res = sb->buf;
+   else
+   res = xmemdupz(sb->buf, sb->alloc - 1);
+
if (sz)
*sz = sb->len;
strbuf_init(sb, 0);
@@ -51,6 +84,8 @@ void strbuf_attach(struct strbuf *sb, void *buf, size_t len, 
size_t alloc)
sb->buf   = buf;
sb->len   = len;
sb->alloc = alloc;
+   sb->flags |= STRBUF_OWNS_MEMORY;
+   sb->flags &= ~STRBUF_FIXED_MEMORY;
strbuf_grow(sb, 0);
sb->buf[sb->len] = '\0';
 }
@@ -61,9 +96,32 @@ void strbuf_grow(struct strbuf *sb, size_t extra)
if (unsigned_add_overflows(extra, 1) ||
unsigned_add_overflows(sb->len, extra + 1))
die("you want to use way too much memory");
-   if (new_buf)
-   sb->buf = NULL;
-   ALLOC_GROW(sb->buf, sb->len + extra 

[PATCH 0/2] strbuf: improve API

2016-05-30 Thread William Duclot
This patch series implements an improvment of the strbuf API, allowing
strbuf to use preallocated memory. This makes strbuf fit to be used
in performance-critical operations.

The first patch is simply a preparatory work, adding tests for
existing strbuf implementation.
Most of the work is made in the second patch: handle pre-allocated
memory, extend the API, document it and test it.

As said in the second commit, the idea comes from Michael Haggerty.

William Duclot (2):
strbuf: add tests
strbuf: allow to use preallocated memory

Makefile   |   1 +
strbuf.c   |  68 
+++-
strbuf.h   |  31 +--
t/helper/test-strbuf.c | 111 
+++
t/t0082-strbuf.sh  |  47 +++
5 files changed, 251 insertions(+), 7 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] userdiff: add built-in pattern for CSS

2016-05-27 Thread William Duclot
Junio C Hamano <gits...@pobox.com> writes:
> William Duclot <william.duc...@ensimag.grenoble-inp.fr> writes:
>
>> As the CSS pattern
>> does not deal with letters at all it seemed sensible to me to follow
>> the example of the HTML pattern, which use PATTERNS().
> 
> Did you notice that HTML pattern has to do an [Hh] that would be
> unnecessary if it chose to use IPATTTERN()?
> 
> You do not have to ask a person, but instead ask the history.
> IPATTERN() was added at 909a5494 (userdiff.c: add builtin fortran
> regex patterns, 2010-09-10) when adding fortran support.  Anything
> that existed before, including HTML, did [A-Za-z] when they could
> have done [a-z] if IPATTERN() existed back then.

I hadn't noticed that the HTML pattern was older, indeed

>>>  - In our codebase, we format multi-line comments in a particular
>>>way, namely
>>> 
>>>/*
>>>  * A multi-line comment begins with slash asterisk
>>>  * on its own line, and its closing asterisk slash
>>>  * also is on its own line.
>>>  */
>>
>> I take good note of that. I took example on the fortran pattern
>> comment, should I fix it too while I'm at it?
> 
> Not "while you are at it".
> 
> Making existing things better is welcome but such a change shouldn't
> be mixed with addition of new things.  You can do it as a separate
> patch, probably as a preliminary clean-up before your change, if you
> want to.

OK !


Johannes Sixt <j...@kdbg.org> writes:
> Am 24.05.2016 um 16:25 schrieb William Duclot:
>> +PATTERNS("css",
>> + "^([^,{}]+)((,[^}]*\\{)|([ \t]*\\{))$",
> 
> This hunk header pattern is a bit too restrictive for my taste. Find
> below a few more test cases that you should squash in. One case fails
> because only the first CSS selector is picked up, for which I do not
> see a reason.
> 
> Another case fails because the opening brace is not on the line with
> the CSS selectors.

Yes, it seems you're right !

> I think what the hunk header pattern should do is:
> 
> 1. reject lines containing a colon (because that are properties)
> 2. if a line begins with a name in column 1, pick the whole line
> 
> See the cpp patterns: a pattern beginning with ! is a "reject" pattern.

That may be a good idea, I will look into that

> diff --git a/t/t4018/css-brace-in-col-1 b/t/t4018/css-brace-in-col-1
> new file mode 100644
> index 000..7831577
> --- /dev/null
> +++ b/t/t4018/css-brace-in-col-1
> @@ -0,0 +1,5 @@
> +RIGHT label.control-label
> +{
> +margin-top: 10px!important;
> +border : 10px ChangeMe #C6C6C6;
> +}
> diff --git a/t/t4018/css-rule b/t/t4018/css-common
> similarity index 100%
> rename from t/t4018/css-rule
> rename to t/t4018/css-common
> diff --git a/t/t4018/css-long-selector-list b/t/t4018/css-long-selector-list
> new file mode 100644
> index 000..7ccd25d
> --- /dev/null
> +++ b/t/t4018/css-long-selector-list
> @@ -0,0 +1,6 @@
> +p.header,
> +label.control-label,
> +div ul#RIGHT {
> +margin-top: 10px!important;
> +border : 10px ChangeMe #C6C6C6;
> +}
> diff --git a/t/t4018/css-prop-sans-indent b/t/t4018/css-prop-sans-indent
> new file mode 100644
> index 000..a9e3c86
> --- /dev/null
> +++ b/t/t4018/css-prop-sans-indent
> @@ -0,0 +1,5 @@
> +RIGHT, label.control-label {
> +margin-top: 10px!important;
> +padding: 0;
> +border : 10px ChangeMe #C6C6C6;
> +}
> diff --git a/t/t4018/css-short-selector-list
> b/t/t4018/css-short-selector-list
> new file mode 100644
> index 000..6a0bdee
> --- /dev/null
> +++ b/t/t4018/css-short-selector-list
> @@ -0,0 +1,4 @@
> +label.control, div ul#RIGHT {
> +margin-top: 10px!important;
> +border : 10px ChangeMe #C6C6C6;
> +}
> --
> 2.9.0.rc0.40.gb3c1388

Thanks for the test cases, I'll look into that as soon as I have time
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] userdiff: add built-in pattern for CSS

2016-05-24 Thread William Duclot
> It is not a big deal for a small single-patch topic like this, but
> it often is hard to reviewers if you do not respond to comments you
> received and instead just send a new version of the patch with
> "changes since..." comment.  Please make it a habit to do both.

Apologies, I am not quite used to work with a mailing list yet, I
will make sure to respect this in the future!
 
>  - Have you considered using IPATTERN()?  PATTERNS() that defaults
>case sensitive match is suitable for real languages with fixed
>case keywords, but the pattern you are defining for CSS does not
>do anything special for any set of fixed-case built-in keywords,
>and appears to be better served by IPATTERN().

I did have considered IPATTERN(), but assumed that case-sensitive was
default and case-insensitive was the exception. As the CSS pattern
does not deal with letters at all it seemed sensible to me to follow
the example of the HTML pattern, which use PATTERNS().

>  - In our codebase, we format multi-line comments in a particular
>way, namely
> 
>   /*
>  * A multi-line comment begins with slash asterisk
>  * on its own line, and its closing asterisk slash
>  * also is on its own line.
>  */

I take good note of that. I took example on the fortran pattern
comment, should I fix it too while I'm at it?
 
>  - Try not to write overlong lines.  If your expression needs to
>become long and there is no good place to fold lines, that is one
>thing, but an overlong comment is unexcuable, as you can fold
>lines anywhere between words.

Again, I take good note of that.

Thank you for your time!
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] userdiff: add built-in pattern for CSS

2016-05-24 Thread William Duclot
CSS is widely used, motivating it being included as a built-in pattern.

It must be noted that the word_regex for CSS (i.e. the regex defining
what is a word in the language) does not consider '.' and '#' characters
(in CSS selectors) to be part of the word. This behavior is documented
by the test t/t4018/css-rule.
The logic behind this behavior is the following: identifiers in CSS
selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#'
character are not part of the identifier, but an indicator of the nature
of the identifier in HTML/XML (class or id). Diffing ".class1" and
".class2" must show that the class name is changed, but we still are
selecting a class.

Signed-off-by: William Duclot <william.duc...@ensimag.grenoble-inp.fr>
Signed-off-by: Matthieu Moy <matthieu@grenoble-inp.fr>
---
changes since v1:
Fix a typo in the word_regex ("A-F" => "A-Z").
Clearer comment about ISO 10646 characters.

 Documentation/gitattributes.txt |  2 ++
 t/t4018-diff-funcname.sh|  1 +
 t/t4018/css-rule|  4 
 t/t4034-diff-words.sh   |  1 +
 t/t4034/css/expect  | 16 
 t/t4034/css/post| 10 ++
 t/t4034/css/pre | 10 ++
 userdiff.c  |  8 
 8 files changed, 52 insertions(+)
 create mode 100644 t/t4018/css-rule
 create mode 100644 t/t4034/css/expect
 create mode 100644 t/t4034/css/post
 create mode 100644 t/t4034/css/pre

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..81f60ad 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -525,6 +525,8 @@ patterns are available:
 
 - `csharp` suitable for source code in the C# language.
 
+- `css` suitable for source code in the CSS language.
+
 - `fortran` suitable for source code in the Fortran language.
 
 - `fountain` suitable for Fountain documents.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 67373dc..1795ffc 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -30,6 +30,7 @@ diffpatterns="
bibtex
cpp
csharp
+   css
fortran
fountain
html
diff --git a/t/t4018/css-rule b/t/t4018/css-rule
new file mode 100644
index 000..84ed754
--- /dev/null
+++ b/t/t4018/css-rule
@@ -0,0 +1,4 @@
+RIGHT label.control-label {
+margin-top: 10px!important;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index f2f55fc..912df91 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -302,6 +302,7 @@ test_language_driver ada
 test_language_driver bibtex
 test_language_driver cpp
 test_language_driver csharp
+test_language_driver css
 test_language_driver fortran
 test_language_driver html
 test_language_driver java
diff --git a/t/t4034/css/expect b/t/t4034/css/expect
new file mode 100644
index 000..ed10393
--- /dev/null
+++ b/t/t4034/css/expect
@@ -0,0 +1,16 @@
+diff --git a/pre b/post
+index b8ae0bb..fe500b7 100644
+--- a/pre
 b/post
+@@ -1,10 +1,10 @@
+.class-formother-form label.control-label {
+margin-top: 1015px!important;
+border : 10px dasheddotted #C6C6C6;
+}
+#CC#CB
+10em
+padding-bottommargin-left
+150pxem
+10px
+!important
+divli.class#id
diff --git a/t/t4034/css/post b/t/t4034/css/post
new file mode 100644
index 000..fe500b7
--- /dev/null
+++ b/t/t4034/css/post
@@ -0,0 +1,10 @@
+.other-form label.control-label {
+margin-top: 15px!important;
+border : 10px dotted #C6C6C6;
+}
+#CB
+10em
+margin-left
+150em
+10px
+li.class#id
diff --git a/t/t4034/css/pre b/t/t4034/css/pre
new file mode 100644
index 000..b8ae0bb
--- /dev/null
+++ b/t/t4034/css/pre
@@ -0,0 +1,10 @@
+.class-form label.control-label {
+margin-top: 10px!important;
+border : 10px dashed #C6C6C6;
+}
+#CC
+10em
+padding-bottom
+150px
+10px!important
+div.class#id
diff --git a/userdiff.c b/userdiff.c
index 6bf2505..9273969 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -148,6 +148,14 @@ PATTERNS("csharp",
 "[a-zA-Z_][a-zA-Z0-9_]*"
 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
+PATTERNS("css",
+"^([^,{}]+)((,[^}]*\\{)|([ \t]*\\{))$",
+/* -- */
+/* This regex comes from W3C CSS specs. Should theoretically also 
allow ISO 10646 characters U+00A0 and higher,
+ * but they are not handled with this regex. */
+"-?[_a-zA-Z][-_a-zA-Z0-9]*" /* identifiers */
+"|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
+),
 { "default", NULL, -1, { NULL, 0 } },
 };
 #undef PATTERNS
-- 
2.9.0.rc0.1.g521d471.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: run-command: output owner picking strategy

2016-05-20 Thread William Duclot
> When running in parallel we already may be out of order
> (relative to serial processing). See the second example in the
> commit message to produce a different order.

Right, I could (should) have understood that by myself.
 
> Consider we scheduled tasks to be run in 3 parallel processes:
> (As we NEEDSWORK comment only addresses the ouput selection,
> let's assume this is a fixes schedule, which we cannot alter.
> Which is true if we only change the code you quoted. That picks
> the process to output.)
> 
> [...]
 
> The output is produced by the current algorithm:
> (1) Start with process 1 (A) whose output will be live
> (2) Once A is done, flush all other done things, (B)
> (3) live output will be round robin, so process 2 (D)
> (4) Once D is done, flush all other done things (C, F, E)
> in order of who finshed first
> 
> 
> (1) is uncontroversial. We have no information about tasks A,B,C,
> so pick a random candidate. We hardcoded process 1 for now.
> 
> (2) also uncontroversial IMHO. There is not much we can do different.

Agreed
 
> (3) is what this NEEDSWORK comment is about. Instead of outputting D
> we might have choosen C. (for $REASONS, e.g.: C is running longer than
> D already, so we expect it to finish sooner, by assuming
> any task takes the same expected time to finish. And as C
> is expected to finish earlier than D, we may have smoother
> output. "Less buffered bursts")
> 
> [...]
> 
> This seems to be better than the current behavior as we have more
> different tasks with "live" output, i.e. you see stuff moving.
> I made up the data to make the point though. We would need to use
> live data and experiment with different strategies to find a
> good/better solution.

We should probably settle on what is the behavior we want to obtain, 
before trying to find a strategy to implement (or approximate) it:
- Do we want to be as close as possible to a serial processing output? 
- Do we want to see as much live output as possible?

I do not think that being close to serial processing is a relevant 
behavior: we applied an arbitrary order to tasks when naming them for
explanations (A, B, C...), but the tasks aren't really sorted in any
way (and that's why the parallelization is relevant).Neither the user
nor git have any interest in getting these ouputs in a specific order.

Therefore, a "as much live output as possible" behavior would be more
sensible. But I wonder: is there a worthy benefit in optimizing the
output owner strategy? I'm not used to working with submodules, but I
don't think that having a great number of submodules is a common thing.
Basically: we could solve a problem, but is there a problem?
I'm not trying to bury this NEEDSWORK, I'd be happy to look into it if
need be!
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] userdiff: add built-in pattern for CSS

2016-05-20 Thread William Duclot
CSS is widely used, motivating it being included as a built-in pattern.

It must be noted that the word_regex for CSS (i.e. the regex defining
what is a word in the language) does not consider '.' and '#' characters
(in CSS selectors) to be part of the word. This behavior is documented
by the test t/t4018/css-rule.
The logic behind this behavior is the following: identifiers in CSS
selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#'
character are not part of the identifier, but an indicator of the nature
of the identifier in HTML/XML (class or id). Diffing ".class1" and
".class2" must show that the class name is changed, but we still are
selecting a class.

Signed-off-by: William Duclot <william.duc...@ensimag.grenoble-inp.fr>
Signed-off-by: Matthieu Moy <matthieu@grenoble-inp.fr>
---
 Documentation/gitattributes.txt |  2 ++
 t/t4018-diff-funcname.sh|  1 +
 t/t4018/css-rule|  4 
 t/t4034-diff-words.sh   |  1 +
 t/t4034/css/expect  | 16 
 t/t4034/css/post|  9 +
 t/t4034/css/pre |  9 +
 userdiff.c  |  8 
 8 files changed, 50 insertions(+)
 create mode 100644 t/t4018/css-rule
 create mode 100644 t/t4034/css/expect
 create mode 100644 t/t4034/css/post
 create mode 100644 t/t4034/css/pre

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..81f60ad 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -525,6 +525,8 @@ patterns are available:
 
 - `csharp` suitable for source code in the C# language.
 
+- `css` suitable for source code in the CSS language.
+
 - `fortran` suitable for source code in the Fortran language.
 
 - `fountain` suitable for Fountain documents.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 67373dc..1795ffc 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -30,6 +30,7 @@ diffpatterns="
bibtex
cpp
csharp
+   css
fortran
fountain
html
diff --git a/t/t4018/css-rule b/t/t4018/css-rule
new file mode 100644
index 000..84ed754
--- /dev/null
+++ b/t/t4018/css-rule
@@ -0,0 +1,4 @@
+RIGHT label.control-label {
+margin-top: 10px!important;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index f2f55fc..912df91 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -302,6 +302,7 @@ test_language_driver ada
 test_language_driver bibtex
 test_language_driver cpp
 test_language_driver csharp
+test_language_driver css
 test_language_driver fortran
 test_language_driver html
 test_language_driver java
diff --git a/t/t4034/css/expect b/t/t4034/css/expect
new file mode 100644
index 000..b025d88
--- /dev/null
+++ b/t/t4034/css/expect
@@ -0,0 +1,16 @@
+diff --git a/pre b/post
+index 735f301..bdf6a90 100644
+--- a/pre
 b/post
+@@ -1,9 +1,9 @@
+.class-formother-form label.control-label {
+margin-top: 1015px!important;
+border : 10px dasheddotted #C6C6C6;
+}
+#CC
+padding-bottom#CB
+margin-left
+150pxem
+10px
+!important
+divli.class#id
diff --git a/t/t4034/css/post b/t/t4034/css/post
new file mode 100644
index 000..bdf6a90
--- /dev/null
+++ b/t/t4034/css/post
@@ -0,0 +1,9 @@
+.other-form label.control-label {
+margin-top: 15px!important;
+border : 10px dotted #C6C6C6;
+}
+#CB
+margin-left
+150em
+10px
+li.class#id
diff --git a/t/t4034/css/pre b/t/t4034/css/pre
new file mode 100644
index 000..735f301
--- /dev/null
+++ b/t/t4034/css/pre
@@ -0,0 +1,9 @@
+.class-form label.control-label {
+margin-top: 10px!important;
+border : 10px dashed #C6C6C6;
+}
+#CC
+padding-bottom
+150px
+10px!important
+div.class#id
diff --git a/userdiff.c b/userdiff.c
index 6bf2505..0f9cfbe 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -148,6 +148,14 @@ PATTERNS("csharp",
 "[a-zA-Z_][a-zA-Z0-9_]*"
 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
+PATTERNS("css",
+"^([^,{}]+)((,[^}]*\\{)|([ \t]*\\{))$",
+/* -- */
+/* This regex comes from W3C CSS specs. Should theoretically also 
allow ISO 10646 characters U+00A0 and higher,
+ * this not handled in this regex. */
+"-?[_a-zA-F][-_a-zA-F0-9]*" /* identifiers */
+"|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
+),
 { "default", NULL, -1, { NULL, 0 } },
 };
 #undef PATTERNS
-- 
2.8.2.403.gdf0b511.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


run-command: output owner picking strategy

2016-05-20 Thread William Duclot
Hi,
I stumbled upon this piece of code (run-command.c:pp_collect_finish()), picking 
the owner of the output amongst parallel processes (introduced by Stephan 
Beller in commit c553c72eed64b5f7316ce227f6d5d783eae6f2ed)

/*
 * Pick next process to output live.
 * NEEDSWORK:
 * For now we pick it randomly by doing a round
 * robin. Later we may want to pick the one with
 * the most output or the longest or shortest
 * running process time.
 */
for (i = 0; i < n; i++)
   if (pp->children[(pp->output_owner + i) % n].state == GIT_CP_WORKING)
  break;
pp->output_owner = (pp->output_owner + i) % n;


Would it be useful to improve this round-robin into something smarter (as 
stated by the NEEDSWORK)? It seems to be only used for submodules fetch/clone.

The options would be (as said in the comment):
1 - pick the process with the longest running process time
2 - pick the process with the shortest running process time
3 - pick the process for which the output buffer is the longest

But with one of those strategies, wouldn't we lose the advantage of having the 
same output order as a non-parallelized version? Cf the commit message.

What do you think ? 
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] Add userdiff built-in pattern for CSS code

2016-05-19 Thread William Duclot
> On Thu, May 19, 2016 at 10:45 AM, Matthieu Moy
>  wrote:
> >> Add the info in documentation that CSS is now built-in.
> >
> > This doesn't add much to the patch (we can already see that from the patch
> > itself). I'd remove it.
> 
> I think you meant to say this "doesn't add much to the *commit
> message*", and that the sentence should be removed from the commit
> message, while retaining the actual documentation update in the patch.
> 
That's of course how I understood it, but thanks for pointing it out !

Beside that, all of Matthieu Moy comments seem correct to me and will be taken 
into account.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH/RFC] Add userdiff built-in pattern for CSS code

2016-05-19 Thread William Duclot
CSS is widely used, motivating it being included as a built-in pattern.
It must be noted that the word_regex for CSS (i.e. the regex defining what is a 
word in the language) does not consider '.' and '#' characters (in CSS 
selectors) to be part of the word. This behavior is documented by the test 
t/t4018/css-rule.

Add the info in documentation that CSS is now built-in.
---
 Documentation/gitattributes.txt |  2 ++
 t/t4018-diff-funcname.sh|  1 +
 t/t4018/css-rule|  4 
 t/t4034-diff-words.sh   |  1 +
 t/t4034/css/expect  | 16 
 t/t4034/css/post|  9 +
 t/t4034/css/pre |  9 +
 userdiff.c  |  8 
 8 files changed, 50 insertions(+)
 create mode 100644 t/t4018/css-rule
 create mode 100644 t/t4034/css/expect
 create mode 100644 t/t4034/css/post
 create mode 100644 t/t4034/css/pre

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..81f60ad 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -525,6 +525,8 @@ patterns are available:
 
 - `csharp` suitable for source code in the C# language.
 
+- `css` suitable for source code in the CSS language.
+
 - `fortran` suitable for source code in the Fortran language.
 
 - `fountain` suitable for Fountain documents.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 67373dc..1795ffc 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -30,6 +30,7 @@ diffpatterns="
bibtex
cpp
csharp
+   css
fortran
fountain
html
diff --git a/t/t4018/css-rule b/t/t4018/css-rule
new file mode 100644
index 000..84ed754
--- /dev/null
+++ b/t/t4018/css-rule
@@ -0,0 +1,4 @@
+RIGHT label.control-label {
+margin-top: 10px!important;
+border : 10px ChangeMe #C6C6C6;
+}
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index f2f55fc..912df91 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -302,6 +302,7 @@ test_language_driver ada
 test_language_driver bibtex
 test_language_driver cpp
 test_language_driver csharp
+test_language_driver css
 test_language_driver fortran
 test_language_driver html
 test_language_driver java
diff --git a/t/t4034/css/expect b/t/t4034/css/expect
new file mode 100644
index 000..b025d88
--- /dev/null
+++ b/t/t4034/css/expect
@@ -0,0 +1,16 @@
+diff --git a/pre b/post
+index 735f301..bdf6a90 100644
+--- a/pre
 b/post
+@@ -1,9 +1,9 @@
+.class-formother-form label.control-label {
+margin-top: 1015px!important;
+border : 10px dasheddotted #C6C6C6;
+}
+#CC
+padding-bottom#CB
+margin-left
+150pxem
+10px
+!important
+divli.class#id
diff --git a/t/t4034/css/post b/t/t4034/css/post
new file mode 100644
index 000..bdf6a90
--- /dev/null
+++ b/t/t4034/css/post
@@ -0,0 +1,9 @@
+.other-form label.control-label {
+margin-top: 15px!important;
+border : 10px dotted #C6C6C6;
+}
+#CB
+margin-left
+150em
+10px
+li.class#id
diff --git a/t/t4034/css/pre b/t/t4034/css/pre
new file mode 100644
index 000..735f301
--- /dev/null
+++ b/t/t4034/css/pre
@@ -0,0 +1,9 @@
+.class-form label.control-label {
+margin-top: 10px!important;
+border : 10px dashed #C6C6C6;
+}
+#CC
+padding-bottom
+150px
+10px!important
+div.class#id
diff --git a/userdiff.c b/userdiff.c
index 6bf2505..715a1fd 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -148,6 +148,14 @@ PATTERNS("csharp",
 "[a-zA-Z_][a-zA-Z0-9_]*"
 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
+PATTERNS("css",
+"^([^,{}]+)((,[^}]*\\{)|([ \t]*\\{))$",
+/* -- */
+/* This regex comes from W3C CSS specs. Should theorically also allow 
ISO 10646 characters U+00A0 and higher,
+ * this not handled in this regex. */
+"-?[_a-zA-F][-_a-zA-F0-9]*" /* identifiers */
+"|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
+),
 { "default", NULL, -1, { NULL, 0 } },
 };
 #undef PATTERNS
-- 
2.8.2.403.gdc9c9d0.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html