[PATCH v3 1/1] Use size_t instead of 'unsigned long' for data in memory
From: Torsten Bögershausen

Currently the length of data held in memory is stored in "unsigned long"
in many places in the code base. This is OK when both "unsigned long" and
size_t are 32 bits (and is OK when both are 64 bits).
On a 64-bit Windows system an "unsigned long" is 32 bits, which may be too
short to measure the size of objects in memory; size_t is the natural
choice.

Improve the code base in "small steps", as small as possible.
The smallest step seems to be much bigger than expected.
See this code snippet from convert.c:

    const char *ret;
    unsigned long sz;
    void *data = read_blob_data_from_index(istate, path, &sz);
    ret = gather_convert_stats_ascii(data, sz);

The corrected version looks like this:

    const char *ret;
    size_t sz;
    void *data = read_blob_data_from_index(istate, path, &sz);
    ret = gather_convert_stats_ascii(data, sz);

However, when the Git code base is compiled with a compiler that complains
that "unsigned long" is different from size_t, we end up with this huge
patch before the code base compiles cleanly.

And: there is more to be done in the zlib interface.

Reviewed-by: Johannes Schindelin
Signed-off-by: Torsten Bögershausen
---

This is the 3rd version of the patch.
- Dscho contributed a code review (converted 2 more unsigned long)
- Thomas Braun and Philip Oakley have done more work:
  https://github.com/tboegi/git/pull/1
  Those changes are not part of my patch series
- Applying this patch on 'pu' gives 3 conflicts (blame.c, packfile.[ch])

 apply.c                  | 78
 archive-tar.c            | 18 +-
 archive-zip.c            |  2 +-
 archive.c                |  2 +-
 archive.h                |  2 +-
 bisect.c                 |  2 +-
 blame.c                  |  6 ++--
 blame.h                  |  2 +-
 builtin/cat-file.c       | 10 +++---
 builtin/difftool.c       |  2 +-
 builtin/fast-export.c    |  6 ++--
 builtin/fmt-merge-msg.c  |  4 +--
 builtin/fsck.c           |  6 ++--
 builtin/grep.c           |  8 ++---
 builtin/index-pack.c     | 26 +++---
 builtin/log.c            |  4 +--
 builtin/ls-tree.c        |  2 +-
 builtin/merge-tree.c     |  6 ++--
 builtin/mktag.c          |  4 +--
 builtin/notes.c          |  6 ++--
 builtin/pack-objects.c   | 70 ++--
 builtin/reflog.c         |  2 +-
 builtin/replace.c        |  2 +-
 builtin/tag.c            |  4 +--
 builtin/unpack-file.c    |  2 +-
 builtin/unpack-objects.c | 34 +-
 builtin/verify-commit.c  |  4 +--
 bundle.c                 |  2 +-
 cache.h                  | 11 +++---
 combine-diff.c           | 11 +++---
 commit.c                 | 22 ++--
 commit.h                 | 10 +++---
 config.c                 |  2 +-
 convert.c                | 16 -
 delta.h                  | 20 +--
 diff-delta.c             |  4 +--
 diff.c                   | 30
 diff.h                   |  2 +-
 diffcore-pickaxe.c       |  4 +--
 diffcore.h               |  2 +-
 dir.c                    |  6 ++--
 dir.h                    |  2 +-
 entry.c                  |  4 +--
 fast-import.c            | 26 +++---
 fsck.c                   | 12 +++
 fsck.h                   |  2 +-
 fuzz-pack-headers.c      |  4 +--
 grep.h                   |  2 +-
 http-push.c              |  2 +-
 list-objects-filter.c    |  2 +-
 mailmap.c                |  2 +-
 match-trees.c            |  4 +--
 merge-blobs.c            |  6 ++--
 merge-blobs.h            |  2 +-
 merge-recursive.c        |  4 +--
 notes-cache.c            |  2 +-
 notes-merge.c            |  4 +--
 notes.c                  |  6 ++--
 object-store.h           | 22 ++--
 object.c                 |  4 +--
 object.h                 |  2 +-
 pack-check.c             |  2 +-
 pack-objects.h           | 14
 pack.h                   |  2 +-
 packfile.c               | 44 +++
 packfile.h               | 10 +++---
 patch-delta.c            |  8 ++---
 range-diff.c             |  2 +-
 read-cache.c             | 48 -
 ref-filter.c             | 16 -
 remote-testsvn.c         |  4 +--
 rerere.c                 |  2 +-
 sha1-file.c              | 66 +-
 sha1dc_git.c             |  2 +-
 sha1dc_git.h             |  2 +-
 streaming.c              | 12 +++
 streaming.h              |  2 +-
 submodule-config.c       |  2 +-
 t/helper/test-delta.c    |  2 +-
 tag.c                    |  6 ++--
 tag.h                    |  2 +-
 tree-walk.c              | 14
 tree.c                   |  2 +-
 xdiff-interface.c        |  4 +--
 xdiff-interface.h        |  4 +--
 85 files changed, 422 insertions(+), 420 deletions(-)

diff --git a/apply.c b/apply.c
index f15afa9f6a..7594859ce
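
A minimal sketch of the truncation this patch guards against. It assumes
an LLP64 platform (such as 64-bit Windows) where "unsigned long" is 32 bits.
Because that cannot be reproduced portably, the hypothetical type ulong32
(a uint32_t) stands in for it; store_len_ulong32() and fits_in_ulong32()
are illustrative helpers, not functions from the Git code base.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit "unsigned long" (as on LLP64). */
typedef uint32_t ulong32;

/* Illustrative helper: keeps a length the way pre-patch code would. */
static ulong32 store_len_ulong32(uint64_t len)
{
	return (ulong32)len; /* silently drops the upper 32 bits */
}

/* Returns 1 if len survives the round trip through ulong32 unchanged. */
static int fits_in_ulong32(uint64_t len)
{
	return (uint64_t)store_len_ulong32(len) == len;
}
```

With these helpers a length of 1000 round-trips unchanged, while a length
just above 4 GiB, e.g. ((uint64_t)1 << 32) + 5, comes back as 5. A size_t
on the same platform is 64 bits wide and would keep the full value.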
[PATCH v2 1/1] trace2: NULL is not allowed for va_list
From: Torsten Bögershausen

Some compilers don't allow NULL to be passed for a va_list,
and e.g. "gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516"
errors out like this:

    trace2/tr2_tgt_event.c:193:18: error: invalid operands to binary &&
    (have ‘int’ and ‘va_list {aka __va_list}’)

      if (fmt && *fmt && ap) {
                      ^^

I couldn't find any hints that va_list and pointers can be mixed,
and no hints that they can't either.

Morten Welinder comments:
"C99, Section 7.15, simply says that va_list "is an object type suitable for
holding information needed by the macros va_start, va_end, and va_copy".
So clearly not guaranteed to be mixable with pointers..."

The portable solution is to use "va_list" everywhere in the call chain.
As a consequence, both trace2_region_enter_fl() and trace2_region_leave_fl()
now take a variable argument list.

Signed-off-by: Torsten Bögershausen
---
 trace2.c                | 15 +++
 trace2.h                |  4 ++--
 trace2/tr2_tgt_event.c  |  2 +-
 trace2/tr2_tgt_normal.c |  2 +-
 trace2/tr2_tgt_perf.c   |  2 +-
 5 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/trace2.c b/trace2.c
index d4ef09..8bbad56887 100644
--- a/trace2.c
+++ b/trace2.c
@@ -548,10 +548,14 @@ void trace2_region_enter_printf_va_fl(const char *file, int line,
 }
 
 void trace2_region_enter_fl(const char *file, int line, const char *category,
-			    const char *label, const struct repository *repo)
+			    const char *label, const struct repository *repo, ...)
 {
+	va_list ap;
+	va_start(ap, repo);
 	trace2_region_enter_printf_va_fl(file, line, category, label, repo,
-					 NULL, NULL);
+					 NULL, ap);
+	va_end(ap);
+
 }
 
 void trace2_region_enter_printf_fl(const char *file, int line,
@@ -621,10 +625,13 @@ void trace2_region_leave_printf_va_fl(const char *file, int line,
 }
 
 void trace2_region_leave_fl(const char *file, int line, const char *category,
-			    const char *label, const struct repository *repo)
+			    const char *label, const struct repository *repo, ...)
 {
+	va_list ap;
+	va_start(ap, repo);
 	trace2_region_leave_printf_va_fl(file, line, category, label, repo,
-					 NULL, NULL);
+					 NULL, ap);
+	va_end(ap);
 }
 
 void trace2_region_leave_printf_fl(const char *file, int line,
diff --git a/trace2.h b/trace2.h
index ae5020d0e6..b330a54a89 100644
--- a/trace2.h
+++ b/trace2.h
@@ -238,7 +238,7 @@ void trace2_def_repo_fl(const char *file, int line, struct repository *repo);
  * on this thread.
  */
 void trace2_region_enter_fl(const char *file, int line, const char *category,
-			    const char *label, const struct repository *repo);
+			    const char *label, const struct repository *repo, ...);
 
 #define trace2_region_enter(category, label, repo) \
 	trace2_region_enter_fl(__FILE__, __LINE__, (category), (label), (repo))
@@ -278,7 +278,7 @@ void trace2_region_enter_printf(const char *category, const char *label,
  * in this nesting level.
  */
 void trace2_region_leave_fl(const char *file, int line, const char *category,
-			    const char *label, const struct repository *repo);
+			    const char *label, const struct repository *repo, ...);
 
 #define trace2_region_leave(category, label, repo) \
 	trace2_region_leave_fl(__FILE__, __LINE__, (category), (label), (repo))
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 107cb5317d..1cf4f62441 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -190,7 +190,7 @@ static void fn_atexit(uint64_t us_elapsed_absolute, int code)
 static void maybe_add_string_va(struct json_writer *jw, const char *field_name,
 				const char *fmt, va_list ap)
 {
-	if (fmt && *fmt && ap) {
+	if (fmt && *fmt) {
 		va_list copy_ap;
 		struct strbuf buf = STRBUF_INIT;
 
diff --git a/trace2/tr2_tgt_normal.c b/trace2/tr2_tgt_normal.c
index 547183d5b6..1a07d70abd 100644
--- a/trace2/tr2_tgt_normal.c
+++ b/trace2/tr2_tgt_normal.c
@@ -126,7 +126,7 @@ static void fn_atexit(uint64_t us_elapsed_absolute, int code)
 static void maybe_append_string_va(struct strbuf *buf, const char *fmt,
 				   va_list ap)
 {
-	if (fmt && *fmt && ap) {
+	if (fmt && *fmt) {
 		va_list copy_ap;
 
 		va_copy(copy_ap, ap);
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index f0746fcf86..2a866d701b 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -211,7 +211,7 @@ static void fn_atexit(uint64_t us_elapsed_absolute, int code)
 static void maybe_append_string_va(struct strbuf *buf, const char *fmt,
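
For illustration, the pattern the patch moves to can be sketched in
isolation: the variadic front end always creates a real va_list, and the
worker decides whether to format by looking at "fmt" only, never by
comparing "ap" against NULL. The names format_va() and format_msg() are
made up for this sketch and do not exist in trace2.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * va_list-taking worker, in the spirit of maybe_append_string_va():
 * only "fmt" is tested; "ap" is never compared against NULL.
 */
static int format_va(char *out, size_t outlen, const char *fmt, va_list ap)
{
	int n = 0;

	if (outlen)
		out[0] = '\0';
	if (fmt && *fmt) {
		va_list copy_ap;

		va_copy(copy_ap, ap); /* leave the caller's ap untouched */
		n = vsnprintf(out, outlen, fmt, copy_ap);
		va_end(copy_ap);
	}
	return n;
}

/* Variadic front end: always hands a real va_list down the call chain. */
static int format_msg(char *out, size_t outlen, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = format_va(out, outlen, fmt, ap);
	va_end(ap);
	return n;
}
```

Calling format_msg(buf, sizeof(buf), NULL) is the counterpart of the
"no message" case: the va_list exists but is never read, so nothing
depends on va_list being a pointer type.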
[PATCH v1 1/1] trace2: NULL is not allowed for va_list
From: Torsten Bögershausen

Some compilers don't allow NULL to be passed for a va_list.
Use va_list instead.

Signed-off-by: Torsten Bögershausen
---
 trace2.c                | 15 +++
 trace2.h                |  4 ++--
 trace2/tr2_tgt_event.c  |  2 +-
 trace2/tr2_tgt_normal.c |  2 +-
 trace2/tr2_tgt_perf.c   |  2 +-
 5 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/trace2.c b/trace2.c
index d4ef09..8bbad56887 100644
--- a/trace2.c
+++ b/trace2.c
@@ -548,10 +548,14 @@ void trace2_region_enter_printf_va_fl(const char *file, int line,
 }
 
 void trace2_region_enter_fl(const char *file, int line, const char *category,
-			    const char *label, const struct repository *repo)
+			    const char *label, const struct repository *repo, ...)
 {
+	va_list ap;
+	va_start(ap, repo);
 	trace2_region_enter_printf_va_fl(file, line, category, label, repo,
-					 NULL, NULL);
+					 NULL, ap);
+	va_end(ap);
+
 }
 
 void trace2_region_enter_printf_fl(const char *file, int line,
@@ -621,10 +625,13 @@ void trace2_region_leave_printf_va_fl(const char *file, int line,
 }
 
 void trace2_region_leave_fl(const char *file, int line, const char *category,
-			    const char *label, const struct repository *repo)
+			    const char *label, const struct repository *repo, ...)
 {
+	va_list ap;
+	va_start(ap, repo);
 	trace2_region_leave_printf_va_fl(file, line, category, label, repo,
-					 NULL, NULL);
+					 NULL, ap);
+	va_end(ap);
 }
 
 void trace2_region_leave_printf_fl(const char *file, int line,
diff --git a/trace2.h b/trace2.h
index ae5020d0e6..b330a54a89 100644
--- a/trace2.h
+++ b/trace2.h
@@ -238,7 +238,7 @@ void trace2_def_repo_fl(const char *file, int line, struct repository *repo);
  * on this thread.
  */
 void trace2_region_enter_fl(const char *file, int line, const char *category,
-			    const char *label, const struct repository *repo);
+			    const char *label, const struct repository *repo, ...);
 
 #define trace2_region_enter(category, label, repo) \
 	trace2_region_enter_fl(__FILE__, __LINE__, (category), (label), (repo))
@@ -278,7 +278,7 @@ void trace2_region_enter_printf(const char *category, const char *label,
  * in this nesting level.
  */
 void trace2_region_leave_fl(const char *file, int line, const char *category,
-			    const char *label, const struct repository *repo);
+			    const char *label, const struct repository *repo, ...);
 
 #define trace2_region_leave(category, label, repo) \
 	trace2_region_leave_fl(__FILE__, __LINE__, (category), (label), (repo))
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 107cb5317d..1cf4f62441 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -190,7 +190,7 @@ static void fn_atexit(uint64_t us_elapsed_absolute, int code)
 static void maybe_add_string_va(struct json_writer *jw, const char *field_name,
 				const char *fmt, va_list ap)
 {
-	if (fmt && *fmt && ap) {
+	if (fmt && *fmt) {
 		va_list copy_ap;
 		struct strbuf buf = STRBUF_INIT;
 
diff --git a/trace2/tr2_tgt_normal.c b/trace2/tr2_tgt_normal.c
index 547183d5b6..1a07d70abd 100644
--- a/trace2/tr2_tgt_normal.c
+++ b/trace2/tr2_tgt_normal.c
@@ -126,7 +126,7 @@ static void fn_atexit(uint64_t us_elapsed_absolute, int code)
 static void maybe_append_string_va(struct strbuf *buf, const char *fmt,
 				   va_list ap)
 {
-	if (fmt && *fmt && ap) {
+	if (fmt && *fmt) {
 		va_list copy_ap;
 
 		va_copy(copy_ap, ap);
diff --git a/trace2/tr2_tgt_perf.c b/trace2/tr2_tgt_perf.c
index f0746fcf86..2a866d701b 100644
--- a/trace2/tr2_tgt_perf.c
+++ b/trace2/tr2_tgt_perf.c
@@ -211,7 +211,7 @@ static void fn_atexit(uint64_t us_elapsed_absolute, int code)
 static void maybe_append_string_va(struct strbuf *buf, const char *fmt,
 				   va_list ap)
 {
-	if (fmt && *fmt && ap) {
+	if (fmt && *fmt) {
 		va_list copy_ap;
 
 		va_copy(copy_ap, ap);
-- 
2.21.0.135.g6e0cc67761
[PATCH/RFC v1 1/1] convert.c: Escape sequences only for a tty in trace_encoding()
From: Torsten Bögershausen

The content of a buffer can be dumped using trace_encoding() before and
after the encoding is converted.
The current function trace_encoding() in convert.c tries to make the output
easier to read: The byte position and the character itself are dimmed,
allowing the eye to focus on the hex values in the byte stream.

ANSI escape sequences are used to "dim" the display temporarily, and
to restore the normal brightness.
When stdout is redirected into a file, those sequences do not work
as expected (they show up literally in the editor), which is disturbing
rather than helpful.

Disable them if stdout is not a tty.

Signed-off-by: Torsten Bögershausen
---

I am tempted to remove the "dim" functionality altogether,
or to remove the printout of the values which are now dimmed.
What do others think?

 convert.c | 20
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/convert.c b/convert.c
index 5d0307fc10..70e58f1413 100644
--- a/convert.c
+++ b/convert.c
@@ -42,6 +42,9 @@ struct text_stat {
 	unsigned printable, nonprintable;
 };
 
+static const char *terminal_half_bright;
+static const char *terminal_reset_normal;
+
 static void gather_stats(const char *buf, unsigned long size, struct text_stat *stats)
 {
 	unsigned long i;
@@ -330,14 +333,23 @@ static void trace_encoding(const char *context, const char *path,
 	static struct trace_key coe = TRACE_KEY_INIT(WORKING_TREE_ENCODING);
 	struct strbuf trace = STRBUF_INIT;
 	int i;
-
+	if (!terminal_half_bright || !terminal_reset_normal) {
+		if (isatty(1)) {
+			terminal_half_bright = "\033[2m";
+			terminal_reset_normal = "\033[0m";
+		} else {
+			terminal_half_bright = "";
+			terminal_reset_normal = "";
+		}
+	}
 	strbuf_addf(&trace, "%s (%s, considered %s):\n", context, path, encoding);
 	for (i = 0; i < len && buf; ++i) {
+		char c = buf[i] > 32 && buf[i] < 127 ? buf[i] : ' ';
 		strbuf_addf(
-			&trace, "| \033[2m%2i:\033[0m %2x \033[2m%c\033[0m%c",
-			i,
+			&trace, "| %s%2i:%s %2x %s%c%s%c",
+			terminal_half_bright, i, terminal_reset_normal,
 			(unsigned char) buf[i],
-			(buf[i] > 32 && buf[i] < 127 ? buf[i] : ' '),
+			terminal_half_bright, c, terminal_reset_normal,
 			((i+1) % 8 && (i+1) < len ? ' ' : '\n')
 		);
 	}
-- 
2.21.0.135.g6e0cc67761
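
The idea of the patch above can be tried out without the rest of
convert.c. The following is only a sketch: struct dim_codes,
pick_dim_codes() and format_cell() are hypothetical names, and plain
snprintf() replaces Git's strbuf_addf(). The tty decision is passed in
as a flag so both variants can be compared side by side.

```c
#include <stdio.h>

struct dim_codes {
	const char *half_bright;
	const char *reset_normal;
};

/* Pick escape sequences only when the output is a terminal. */
static struct dim_codes pick_dim_codes(int is_tty)
{
	struct dim_codes c;

	c.half_bright = is_tty ? "\033[2m" : "";
	c.reset_normal = is_tty ? "\033[0m" : "";
	return c;
}

/* Format one dump cell: dimmed index, plain hex value, dimmed character. */
static int format_cell(char *out, size_t outlen, int i, unsigned char byte,
		       struct dim_codes c)
{
	char shown = (byte > 32 && byte < 127) ? (char)byte : ' ';

	return snprintf(out, outlen, "| %s%2i:%s %2x %s%c%s",
			c.half_bright, i, c.reset_normal, (unsigned int)byte,
			c.half_bright, shown, c.reset_normal);
}
```

A caller would typically obtain the flag once, e.g. with
pick_dim_codes(isatty(1)) as the patch does. With the flag cleared the
cell for byte 'g' at position 0 contains no escape bytes at all, so a
redirected trace stays readable in an editor.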
[PATCH v1 1/1] gitattributes.txt: fix typo
From: Yash Bhatambare

`UTF-16-LE-BOM` to `UTF-16LE-BOM`.

this closes https://github.com/git-for-windows/git/issues/2095

Signed-off-by: Yash Bhatambare
Signed-off-by: Torsten Bögershausen
---

This patch already made it into Git for Windows,
so I send it upstream "as is".

 Documentation/gitattributes.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 9b41f81c06..bdd11a2ddd 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -346,7 +346,7 @@ automatic line ending conversion based on your platform.
 
 Use the following attributes if your '*.ps1' files are UTF-16 little
 endian encoded without BOM and you want Git to use Windows line endings
-in the working directory (use `UTF-16-LE-BOM` instead of `UTF-16LE` if
+in the working directory (use `UTF-16LE-BOM` instead of `UTF-16LE` if
 you want UTF-16 little endian with BOM).
 Please note, it is highly recommended to
 explicitly define the line endings with `eol` if the `working-tree-encoding`
-- 
2.19.1.593.gc670b1f876
[PATCH v3 1/1] Support working-tree-encoding "UTF-16LE-BOM"
From: Torsten Bögershausen

Users who want UTF-16 files in the working tree set the .gitattributes
like this:
test.txt working-tree-encoding=UTF-16

The Unicode standard itself defines 3 allowed ways to encode UTF-16.
The following 3 versions all convert back to 'g' 'i' 't' in UTF-8:

a) UTF-16, without BOM, big endian:
    $ printf "\000g\000i\000t" |
      iconv -f UTF-16 -t UTF-8 | od -c
    0000000 g i t

b) UTF-16, with BOM, little endian:
    $ printf "\377\376g\000i\000t\000" |
      iconv -f UTF-16 -t UTF-8 | od -c
    0000000 g i t

c) UTF-16, with BOM, big endian:
    $ printf "\376\377\000g\000i\000t" |
      iconv -f UTF-16 -t UTF-8 | od -c
    0000000 g i t

Git uses libiconv to convert from UTF-8 in the index into UTF-16 in the
working tree.
After a checkout, the resulting file has a BOM and is encoded in "UTF-16",
in the version (c) above.
This is what iconv generates, more details follow below.

iconv (and libiconv) can generate UTF-16, UTF-16LE or UTF-16BE:

d) UTF-16
    $ printf 'git' | iconv -f UTF-8 -t UTF-16 | od -c
    0000000 376 377 \0 g \0 i \0 t

e) UTF-16LE
    $ printf 'git' | iconv -f UTF-8 -t UTF-16LE | od -c
    0000000 g \0 i \0 t \0

f) UTF-16BE
    $ printf 'git' | iconv -f UTF-8 -t UTF-16BE | od -c
    0000000 \0 g \0 i \0 t

There is no way to generate version (b) from above in a Git working tree,
but that is what some applications need.
(All fully Unicode-aware applications should be able to read all 3
variants, but in practice we are not there yet).

When producing UTF-16 as an output, iconv generates the big endian version
with a BOM. (big endian is probably chosen for historical reasons).

iconv can produce UTF-16 files with little endianness by using "UTF-16LE"
as encoding, and that file does not have a BOM.

Not all users (especially under Windows) are happy with this.
Some tools are not fully Unicode-aware and can only handle version (b).

Today there is no way to produce version (b) with iconv (or libiconv).
Looking into the history of iconv, it seems as if version (c) will
be used in all future iconv versions (for compatibility reasons).

Solve this dilemma and introduce a Git-specific "UTF-16LE-BOM".
libiconv cannot handle the encoding, so Git picks it up, handles the BOM
and uses libiconv to convert the rest of the stream.
(UTF-16BE-BOM is added for consistency)

Reported-by: Adrián Gimeno Balaguer
Signed-off-by: Torsten Bögershausen
---

Changes since v2:
  Update the commit message (s/possible/allowed/)
  Update the documentation, as suggested by Junio:
  ...wonder if the following, instead of the above hunk, would work better..
  Yes, it does.

 Documentation/gitattributes.txt  |  4 ++-
 compat/precompose_utf8.c         |  2 +-
 t/t0028-working-tree-encoding.sh | 12 -
 utf8.c                           | 42
 utf8.h                           |  2 +-
 5 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index b8392fc330..a2310fb920 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -344,7 +344,9 @@ automatic line ending conversion based on your platform.
 
 Use the following attributes if your '*.ps1' files are UTF-16 little
 endian encoded without BOM and you want Git to use Windows line endings
-in the working directory. Please note, it is highly recommended to
+in the working directory (use `UTF-16-LE-BOM` instead of `UTF-16LE` if
+you want UTF-16 little endian with BOM).
+Please note, it is highly recommended to
 explicitly define the line endings with `eol` if the `working-tree-encoding`
 attribute is used to avoid ambiguity.
 
diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
index de61c15d34..136250fbf6 100644
--- a/compat/precompose_utf8.c
+++ b/compat/precompose_utf8.c
@@ -79,7 +79,7 @@ void precompose_argv(int argc, const char **argv)
 		size_t namelen;
 		oldarg = argv[i];
 		if (has_non_ascii(oldarg, (size_t)-1, &namelen)) {
-			newarg = reencode_string_iconv(oldarg, namelen, ic_precompose, NULL);
+			newarg = reencode_string_iconv(oldarg, namelen, ic_precompose, 0, NULL);
 			if (newarg)
 				argv[i] = newarg;
 		}
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index 7e87b5a200..e58ecbfc44 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -11,9 +11,12 @@ test_expect_success 'setup test files' '
 
 	text="hallo there!\ncan you read me?" &&
 	echo "*.utf16 text working-tree-encoding=utf-16" >.gitattributes &&
+	echo "*.utf16lebom text working-tree-encoding=UTF-16LE-BOM" >>.gitattributes &&
 	printf "$text" >test.utf8.raw &&
 	printf "$text" | iconv -f UTF-8 -t UTF-16 >test.utf16.raw &&
 	printf "$text" | iconv -f UTF-8 -t UTF-32 >test.utf32.raw
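
The split of work described in the commit message (Git handles the BOM,
libiconv converts the rest of the stream as plain "UTF-16LE") can be
sketched as follows. This is not the code from utf8.c: the helper names
has_utf16le_bom(), add_utf16le_bom() and skip_utf16le_bom() are
assumptions made for this example, and the iconv conversion itself is
left out, so the functions only operate on data that is already
UTF-16LE.

```c
#include <stdlib.h>
#include <string.h>

static const char utf16_le_bom[] = {'\xFF', '\xFE'};

/* Return 1 if buf starts with the UTF-16 little endian BOM. */
static int has_utf16le_bom(const char *buf, size_t len)
{
	return len >= sizeof(utf16_le_bom) &&
	       !memcmp(buf, utf16_le_bom, sizeof(utf16_le_bom));
}

/*
 * Checkout direction: prepend the BOM to BOM-less UTF-16LE data
 * (the part a converter would produce for "UTF-16LE").
 * Returns a malloc()ed buffer the caller must free(), or NULL.
 */
static char *add_utf16le_bom(const char *body, size_t len, size_t *outlen)
{
	char *out = malloc(len + sizeof(utf16_le_bom));

	if (!out)
		return NULL;
	memcpy(out, utf16_le_bom, sizeof(utf16_le_bom));
	memcpy(out + sizeof(utf16_le_bom), body, len);
	*outlen = len + sizeof(utf16_le_bom);
	return out;
}

/*
 * Commit direction: step over the BOM so that only the payload is
 * handed to the converter as plain "UTF-16LE".
 */
static const char *skip_utf16le_bom(const char *buf, size_t *len)
{
	if (has_utf16le_bom(buf, *len)) {
		buf += sizeof(utf16_le_bom);
		*len -= sizeof(utf16_le_bom);
	}
	return buf;
}
```

Feeding the six bytes of variant (e) from the commit message through
add_utf16le_bom() yields the eight bytes of variant (b); running
skip_utf16le_bom() on the result gives back the original six.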
[PATCH v2 1/1] Support working-tree-encoding "UTF-16LE-BOM"
From: Torsten Bögershausen

Users who want UTF-16 files in the working tree set the .gitattributes
like this:
test.txt working-tree-encoding=UTF-16

The Unicode standard itself defines 3 possible ways to encode UTF-16.
The following 3 versions all convert back to 'g' 'i' 't' in UTF-8:

a) UTF-16, without BOM, big endian:
    $ printf "\000g\000i\000t" |
      iconv -f UTF-16 -t UTF-8 | od -c
    0000000 g i t

b) UTF-16, with BOM, little endian:
    $ printf "\377\376g\000i\000t\000" |
      iconv -f UTF-16 -t UTF-8 | od -c
    0000000 g i t

c) UTF-16, with BOM, big endian:
    $ printf "\376\377\000g\000i\000t" |
      iconv -f UTF-16 -t UTF-8 | od -c
    0000000 g i t

Git uses libiconv to convert from UTF-8 in the index into UTF-16 in the
working tree.
After a checkout, the resulting file has a BOM and is encoded in "UTF-16",
in the version (c) above.
This is what iconv generates, more details follow below.

iconv (and libiconv) can generate UTF-16, UTF-16LE or UTF-16BE:

d) UTF-16
    $ printf 'git' | iconv -f UTF-8 -t UTF-16 | od -c
    0000000 376 377 \0 g \0 i \0 t

e) UTF-16LE
    $ printf 'git' | iconv -f UTF-8 -t UTF-16LE | od -c
    0000000 g \0 i \0 t \0

f) UTF-16BE
    $ printf 'git' | iconv -f UTF-8 -t UTF-16BE | od -c
    0000000 \0 g \0 i \0 t

There is no way to generate version (b) from above in a Git working tree,
but that is what some applications need.
(All fully Unicode-aware applications should be able to read all 3
variants, but in practice we are not there yet).

When producing UTF-16 as an output, iconv generates the big endian version
with a BOM. (big endian is probably chosen for historical reasons).

iconv can produce UTF-16 files with little endianness by using "UTF-16LE"
as encoding, and that file does not have a BOM.

Not all users (especially under Windows) are happy with this.
Some tools are not fully Unicode-aware and can only handle version (b).

Today there is no way to produce version (b) with iconv (or libiconv).
Looking into the history of iconv, it seems as if version (c) will
be used in all future iconv versions (for compatibility reasons).

Solve this dilemma and introduce a Git-specific "UTF-16LE-BOM".
libiconv cannot handle the encoding, so Git picks it up, handles the BOM
and uses libiconv to convert the rest of the stream.

Reported-by: Adrián Gimeno Balaguer
Signed-off-by: Torsten Bögershausen
---

I still think it makes sense to support UTF-16, little endian and with BOM,
in Git.
This V2 should make clearer which standards we follow, and why the
naming scheme of Unicode does not cover all use cases in the real world.

 Documentation/gitattributes.txt  |  4 +--
 compat/precompose_utf8.c         |  2 +-
 t/t0028-working-tree-encoding.sh | 12 -
 utf8.c                           | 42
 utf8.h                           |  2 +-
 5 files changed, 47 insertions(+), 15 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index b8392fc330..4a88ab8be7 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -343,13 +343,13 @@ automatic line ending conversion based on your platform.
 
 Use the following attributes if your '*.ps1' files are UTF-16 little
-endian encoded without BOM and you want Git to use Windows line endings
+endian encoded with BOM and you want Git to use Windows line endings
 in the working directory. Please note, it is highly recommended to
 explicitly define the line endings with `eol` if the `working-tree-encoding`
 attribute is used to avoid ambiguity.
 
-*.ps1 text working-tree-encoding=UTF-16LE eol=CRLF
+*.ps1 text working-tree-encoding=UTF-16LE-BOM eol=CRLF
 
 You can get a list of all available encodings on your platform with the
diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
index de61c15d34..136250fbf6 100644
--- a/compat/precompose_utf8.c
+++ b/compat/precompose_utf8.c
@@ -79,7 +79,7 @@ void precompose_argv(int argc, const char **argv)
 		size_t namelen;
 		oldarg = argv[i];
 		if (has_non_ascii(oldarg, (size_t)-1, &namelen)) {
-			newarg = reencode_string_iconv(oldarg, namelen, ic_precompose, NULL);
+			newarg = reencode_string_iconv(oldarg, namelen, ic_precompose, 0, NULL);
 			if (newarg)
 				argv[i] = newarg;
 		}
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index 7e87b5a200..e58ecbfc44 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -11,9 +11,12 @@ test_expect_success 'setup test files' '
 
 	text="hallo there!\ncan you read me?" &&
 	echo "*.utf16 text working-tree-encoding=utf-16" >.gitattributes &&
+	echo "*.utf16lebom text working-tree-encoding=UTF-16LE-BOM" >>.gitattributes &&
 	printf
[PATCH/RFC v2 1/1] test-lint: Only use sed [-n] [-e command] [-f command_file]
From: Torsten Bögershausen

From `man sed` (on a Mac OS X box):
The -E, -a and -i options are non-standard FreeBSD extensions and may not
be available on other operating systems.

From `man sed` on a Linux box:
REGULAR EXPRESSIONS
       POSIX.2 BREs should be supported, but they aren't completely because
       of performance problems. The \n sequence in a regular expression
       matches the newline character, and similarly for \a, \t, and other
       sequences.
       The -E option switches to using extended regular expressions instead;
       the -E option has been supported for years by GNU sed, and is now
       included in POSIX.

Well, there are still a lot of systems out there which don't support it.
Besides that, IEEE Std 1003.1-2017, see
http://pubs.opengroup.org/onlinepubs/9699919799/
does not mention -E either.

To be on the safe side, don't allow -E (or -r, which is GNU).
Change check-non-portable-shell.pl to only accept the portable options:
sed [-n] [-e command] [-f command_file]

Reported-by: SZEDER Gábor
Helped-by: Eric Sunshine
Helped-by: Ævar Arnfjörð Bjarmason
Signed-off-by: Torsten Bögershausen
---
 t/check-non-portable-shell.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
index b45bdac688..6c798608a9 100755
--- a/t/check-non-portable-shell.pl
+++ b/t/check-non-portable-shell.pl
@@ -35,7 +35,7 @@ sub err {
 		chomp;
 	}
 
-	/\bsed\s+-i/ and err 'sed -i is not portable';
+	/\bsed\s+-[^efn]\s+/ and err 'Not portable option with sed (use only [-n] [-e command] [-f command_file])';
 	/\becho\s+-[neE]/ and err 'echo with option is not portable (use printf)';
 	/^\s*declare\s+/ and err 'arrays/declare not portable';
 	/^\s*[^#]\s*which\s/ and err 'which is not portable (use type)';
-- 
2.20.1.2.gb21ebb671
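
To see which command lines the new check would flag, the pattern can be
exercised outside of the Perl script. The sketch below uses POSIX
regcomp()/regexec(); since POSIX extended regular expressions have no \b
or \s, the Perl pattern /\bsed\s+-[^efn]\s+/ is only approximated, and
the function name flags_nonportable_sed() is made up for this example.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if "line" would be flagged, 0 if not, -1 if the pattern
 * fails to compile. "(^|[^[:alnum:]_])" stands in for Perl's \b and
 * "[[:space:]]" for \s.
 */
static int flags_nonportable_sed(const char *line)
{
	static const char pattern[] =
		"(^|[^[:alnum:]_])sed[[:space:]]+-[^efn][[:space:]]+";
	regex_t re;
	int ret;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return -1;
	ret = regexec(&re, line, 0, NULL, 0) == 0;
	regfree(&re);
	return ret;
}
```

Lines such as "sed -i s/a/b/ file" and "sed -E 's/x/y/' file" are
flagged, "sed -n -e p file" is not. The trailing whitespace requirement
also keeps an argument like "--cc-cmd=./cccmd-sed --suppress-cc=self"
from being reported, because the character after "--" is not followed
by whitespace.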
[PATCH/RFC v1 1/1] test-lint: sed -E (or -a, -l) are not portable
From: Torsten Bögershausen

From `man sed` (on a Mac OS X box):
The -E, -a and -i options are non-standard FreeBSD extensions and may not
be available on other operating systems.

From `man sed` on a Linux box:
REGULAR EXPRESSIONS
       POSIX.2 BREs should be supported, but they aren't completely because
       of performance problems. The \n sequence in a regular expression
       matches the newline character, and similarly for \a, \t, and other
       sequences.
       The -E option switches to using extended regular expressions instead;
       the -E option has been supported for years by GNU sed, and is now
       included in POSIX.

Well, there are still a lot of systems out there which don't support it.
Besides that, IEEE Std 1003.1-2017, see
http://pubs.opengroup.org/onlinepubs/9699919799/
does not mention -E either.

To be on the safe side, don't allow it.

Reported-by: SZEDER Gábor
Signed-off-by: Torsten Bögershausen
---

I am somewhat unsure whether we should disable all options except -e -f -n
instead?
/\bsed\s+-[^efn]/ and err 'Not portable option with sed. Only -n -e -f are portable';

That would cause a false positive in t9001 here:
"--cc-cmd=./cccmd-sed --suppress-cc=self"

which could either be fixed by an anchor:
/^\s*sed\s+-[^efn]/

Or by allowing '--' like this:
/\bsed\s+-[^-efn]/

Any thoughts, please?

 t/check-non-portable-shell.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
index b45bdac688..96b6afdeb8 100755
--- a/t/check-non-portable-shell.pl
+++ b/t/check-non-portable-shell.pl
@@ -35,7 +35,7 @@ sub err {
 		chomp;
 	}
 
-	/\bsed\s+-i/ and err 'sed -i is not portable';
+	/\bsed\s+-[Eail]/ and err 'Not portable option with sed. Only -e -f -n are portable';
 	/\becho\s+-[neE]/ and err 'echo with option is not portable (use printf)';
 	/^\s*declare\s+/ and err 'arrays/declare not portable';
 	/^\s*[^#]\s*which\s/ and err 'which is not portable (use type)';
-- 
2.20.1.2.gb21ebb671
[PATCH/RFC v1 1/1] Support working-tree-encoding "UTF-16LE-BOM"
From: Torsten Bögershausen

Users who want UTF-16 files in the working tree set the .gitattributes
like this:
test.txt working-tree-encoding=UTF-16

After a checkout, the resulting file has a BOM and is encoded in "UTF-16".
The Unicode standard allows both little and big endianness (LE/BE) for
those files; the BOM tells which one is used inside the file.
iconv seems to prefer the BE version.

Not all users under Windows are happy with this, because some tools are
not fully Unicode-aware and don't digest the BE version at all.

Today there is no name for "UTF-16 with BOM, little endian please".
Introduce "UTF-16LE-BOM".

Reported-by: Adrián Gimeno Balaguer
Signed-off-by: Torsten Bögershausen
---

This feels like an RFC at the moment - please comment.
Using UTF-16 in the way "UTF-16LE-BOM" is used in this patch
could be an alternative - simply produce UTF-16 in the LE version under Git -
this could make people using Git happy as well.

 Documentation/gitattributes.txt  |  4 +--
 compat/precompose_utf8.c         |  2 +-
 t/t0028-working-tree-encoding.sh | 12 -
 utf8.c                           | 42
 utf8.h                           |  2 +-
 5 files changed, 47 insertions(+), 15 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index b8392fc330..4a88ab8be7 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -343,13 +343,13 @@ automatic line ending conversion based on your platform.
 
 Use the following attributes if your '*.ps1' files are UTF-16 little
-endian encoded without BOM and you want Git to use Windows line endings
+endian encoded with BOM and you want Git to use Windows line endings
 in the working directory. Please note, it is highly recommended to
 explicitly define the line endings with `eol` if the `working-tree-encoding`
 attribute is used to avoid ambiguity.
 
-*.ps1 text working-tree-encoding=UTF-16LE eol=CRLF
+*.ps1 text working-tree-encoding=UTF-16LE-BOM eol=CRLF
 
 You can get a list of all available encodings on your platform with the
diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
index de61c15d34..136250fbf6 100644
--- a/compat/precompose_utf8.c
+++ b/compat/precompose_utf8.c
@@ -79,7 +79,7 @@ void precompose_argv(int argc, const char **argv)
 		size_t namelen;
 		oldarg = argv[i];
 		if (has_non_ascii(oldarg, (size_t)-1, &namelen)) {
-			newarg = reencode_string_iconv(oldarg, namelen, ic_precompose, NULL);
+			newarg = reencode_string_iconv(oldarg, namelen, ic_precompose, 0, NULL);
 			if (newarg)
 				argv[i] = newarg;
 		}
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index 7e87b5a200..e58ecbfc44 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -11,9 +11,12 @@ test_expect_success 'setup test files' '
 
 	text="hallo there!\ncan you read me?" &&
 	echo "*.utf16 text working-tree-encoding=utf-16" >.gitattributes &&
+	echo "*.utf16lebom text working-tree-encoding=UTF-16LE-BOM" >>.gitattributes &&
 	printf "$text" >test.utf8.raw &&
 	printf "$text" | iconv -f UTF-8 -t UTF-16 >test.utf16.raw &&
 	printf "$text" | iconv -f UTF-8 -t UTF-32 >test.utf32.raw &&
+	printf "\377\376" >test.utf16lebom.raw &&
+	printf "$text" | iconv -f UTF-8 -t UTF-32LE >>test.utf16lebom.raw &&
 
 	# Line ending tests
 	printf "one\ntwo\nthree\n" >lf.utf8.raw &&
@@ -32,7 +35,8 @@ test_expect_success 'setup test files' '
 	# Add only UTF-16 file, we will add the UTF-32 file later
 	cp test.utf16.raw test.utf16 &&
 	cp test.utf32.raw test.utf32 &&
-	git add .gitattributes test.utf16 &&
+	cp test.utf16lebom.raw test.utf16lebom &&
+	git add .gitattributes test.utf16 test.utf16lebom &&
 	git commit -m initial
 '
 
@@ -51,6 +55,12 @@ test_expect_success 're-encode to UTF-16 on checkout' '
 	test_cmp_bin test.utf16.raw test.utf16
 '
 
+test_expect_success 're-encode to UTF-16-LE-BOM on checkout' '
+	rm test.utf16lebom &&
+	git checkout test.utf16lebom &&
+	test_cmp_bin test.utf16lebom.raw test.utf16lebom
+'
+
 test_expect_success 'check $GIT_DIR/info/attributes support' '
 	test_when_finished "rm -f test.utf32.git" &&
 	test_when_finished "git reset --hard HEAD" &&
diff --git a/utf8.c b/utf8.c
index eb78587504..83824dc2f4 100644
--- a/utf8.c
+++ b/utf8.c
@@ -4,6 +4,11 @@
 
 /* This code is originally from http://www.cl.cam.ac.uk/~mgk25/ucs/ */
 
+static const char utf16_be_bom[] = {'\xFE', '\xFF'};
+static const char utf16_le_bom[] = {'\xFF', '\xFE'};
+static const char utf32_be_bom[] = {'\0', '\0', '\xFE', '\xFF'};
+static const cha
[PATCH v4 1/1] git clone C:\cygwin\home\USER\repo' is working (again)
From: Torsten Bögershausen

A regression for cygwin users was introduced with commit 05b458c,
"real_path: resolve symlinks by hand".

In the commit message we read:
  The current implementation of real_path uses chdir() in order to resolve
  symlinks. Unfortunately this isn't thread-safe as chdir() affects a
  process as a whole...

The old (and non-thread-safe) OS calls chdir()/pwd() had been replaced by a
string operation.
The cygwin layer "knows" that "C:\cygwin" is an absolute path,
but the new string operation does not.

"git clone C:\cygwin\home\USER\repo" fails like this:
fatal: Invalid path '/home/USER/repo/C:\cygwin\home\USER\repo'

The solution is to implement has_dos_drive_prefix(), skip_dos_drive_prefix(),
is_dir_sep(), offset_1st_component() and convert_slashes() for cygwin
in the same way as it is done in 'Git for Windows' in compat/mingw.[ch]

Extract the needed code into compat/win32/path-utils.[ch] and use it
for cygwin as well.

Reported-by: Steven Penny
Helped-by: Johannes Schindelin
Signed-off-by: Torsten Bögershausen

Changes since v3:
- Rename e.g. mingw_skip_dos_drive_prefix() into
  win32_skip_dos_drive_prefix() as suggested by Dscho, thanks for that.
- Add a tweak in t5601 for cygwin. The test suite now passes on cygwin.
- The "Git for Windows" build was tested on gfw/master, with this commit
  cherry-picked on top.
--- compat/cygwin.c | 19 --- compat/cygwin.h | 2 -- compat/mingw.c| 29 + compat/mingw.h| 20 compat/win32/path-utils.c | 28 compat/win32/path-utils.h | 20 config.mak.uname | 3 ++- git-compat-util.h | 3 ++- t/t5601-clone.sh | 2 +- 9 files changed, 54 insertions(+), 72 deletions(-) delete mode 100644 compat/cygwin.c delete mode 100644 compat/cygwin.h create mode 100644 compat/win32/path-utils.c create mode 100644 compat/win32/path-utils.h diff --git a/compat/cygwin.c b/compat/cygwin.c deleted file mode 100644 index b9862d606d..00 --- a/compat/cygwin.c +++ /dev/null @@ -1,19 +0,0 @@ -#include "../git-compat-util.h" -#include "../cache.h" - -int cygwin_offset_1st_component(const char *path) -{ - const char *pos = path; - /* unc paths */ - if (is_dir_sep(pos[0]) && is_dir_sep(pos[1])) { - /* skip server name */ - pos = strchr(pos + 2, '/'); - if (!pos) - return 0; /* Error: malformed unc path */ - - do { - pos++; - } while (*pos && pos[0] != '/'); - } - return pos + is_dir_sep(*pos) - path; -} diff --git a/compat/cygwin.h b/compat/cygwin.h deleted file mode 100644 index 8e52de4644..00 --- a/compat/cygwin.h +++ /dev/null @@ -1,2 +0,0 @@ -int cygwin_offset_1st_component(const char *path); -#define offset_1st_component cygwin_offset_1st_component diff --git a/compat/mingw.c b/compat/mingw.c index 34b3880b29..b459e1a291 100644 --- a/compat/mingw.c +++ b/compat/mingw.c @@ -350,7 +350,7 @@ static inline int needs_hiding(const char *path) return 0; /* We cannot use basename(), as it would remove trailing slashes */ - mingw_skip_dos_drive_prefix((char **)&path); + win32_skip_dos_drive_prefix((char **)&path); if (!*path) return 0; @@ -2275,33 +2275,6 @@ pid_t waitpid(pid_t pid, int *status, int options) return -1; } -int mingw_skip_dos_drive_prefix(char **path) -{ - int ret = has_dos_drive_prefix(*path); - *path += ret; - return ret; -} - -int mingw_offset_1st_component(const char *path) -{ - char *pos = (char *)path; - - /* unc paths */ - if (!skip_dos_drive_prefix(&pos) && - 
is_dir_sep(pos[0]) && is_dir_sep(pos[1])) { - /* skip server name */ - pos = strpbrk(pos + 2, "\\/"); - if (!pos) - return 0; /* Error: malformed unc path */ - - do { - pos++; - } while (*pos && !is_dir_sep(*pos)); - } - - return pos + is_dir_sep(*pos) - path; -} - int xutftowcsn(wchar_t *wcs, const char *utfs, size_t wcslen, int utflen) { int upos = 0, wpos = 0; diff --git a/compat/mingw.h b/compat/mingw.h index 8c24ddaa3e..30d9fb3e36 100644 --- a/compat/mingw.h +++ b/compat/mingw.h @@ -443,32 +443,12 @@ HANDLE winansi_get_osfhandle(int fd); * git specific compatibility */ -#define has_dos_drive_prefix(path) \ - (isalpha(*(path)) && (path)[1] == ':' ? 2 : 0) -int mingw_skip_dos_drive_prefix(char **path); -#define skip_dos_drive_prefix mingw_skip_dos_drive_prefix -static inline int mingw_is_dir_sep(int c) -{ - return c == '/' || c == '\\'; -} -#define is_dir_sep mingw_is_dir_sep -static inline char *mingw_find_last_dir_sep(co
[PATCH v3 1/1] git clone C:\cygwin\home\USER\repo' is working (again)
From: Torsten Bögershausen

A regression for cygwin users was introduced with commit 05b458c,
"real_path: resolve symlinks by hand".

In the commit message we read:
  The current implementation of real_path uses chdir() in order to resolve
  symlinks. Unfortunately this isn't thread-safe as chdir() affects a
  process as a whole...

The old (and non-thread-safe) OS calls chdir()/pwd() had been replaced by a
string operation.
The cygwin layer "knows" that "C:\cygwin" is an absolute path,
but the new string operation does not.

"git clone C:\cygwin\home\USER\repo" fails like this:
fatal: Invalid path '/home/USER/repo/C:\cygwin\home\USER\repo'

The solution is to implement has_dos_drive_prefix(), skip_dos_drive_prefix(),
is_dir_sep(), offset_1st_component() and convert_slashes() for cygwin
in the same way as it is done in 'Git for Windows' in compat/mingw.[ch]

Extract the needed code into compat/win32/path-utils.[ch] and use it
for cygwin as well.

Reported-by: Steven Penny
Helped-by: Johannes Schindelin
Signed-off-by: Torsten Bögershausen
---
Changes since V2:
- Settled on a better name:
  The common code is in compat/win32/path-utils.c/h
- Skip the 2 patches which "only" do a cleanup (for the moment) and
  put those cleanups onto the "todo stack".
- The "DOS" moniker is still used for 2 reasons:
  Windows inherited the "drive letter" concept from DOS,
  and everybody (tm) familiar with the code and the path handling
  in Git is used to that wording.
  Even if there were a better name, it would need to be addressed
  in a patch series different from this one.
  Here I want to fix a reported regression.

And, before any cleanup is done, I would like to ask if anybody
can build the code with VS and confirm that it works, please?

Thanks for the reviews, testing and comments.

compat/cygwin.c | 19 --- compat/cygwin.h | 2 -- compat/mingw.c| 29 + compat/mingw.h| 20 compat/win32/path-utils.c | 28 compat/win32/path-utils.h | 20 config.mak.uname | 3 ++- git-compat-util.h | 3 ++- 8 files changed, 53 insertions(+), 71 deletions(-) delete mode 100644 compat/cygwin.c delete mode 100644 compat/cygwin.h create mode 100644 compat/win32/path-utils.c create mode 100644 compat/win32/path-utils.h diff --git a/compat/cygwin.c b/compat/cygwin.c deleted file mode 100644 index b9862d606d..00 --- a/compat/cygwin.c +++ /dev/null @@ -1,19 +0,0 @@ -#include "../git-compat-util.h" -#include "../cache.h" - -int cygwin_offset_1st_component(const char *path) -{ - const char *pos = path; - /* unc paths */ - if (is_dir_sep(pos[0]) && is_dir_sep(pos[1])) { - /* skip server name */ - pos = strchr(pos + 2, '/'); - if (!pos) - return 0; /* Error: malformed unc path */ - - do { - pos++; - } while (*pos && pos[0] != '/'); - } - return pos + is_dir_sep(*pos) - path; -} diff --git a/compat/cygwin.h b/compat/cygwin.h deleted file mode 100644 index 8e52de4644..00 --- a/compat/cygwin.h +++ /dev/null @@ -1,2 +0,0 @@ -int cygwin_offset_1st_component(const char *path); -#define offset_1st_component cygwin_offset_1st_component diff --git a/compat/mingw.c b/compat/mingw.c index 34b3880b29..27e397f268 100644 --- a/compat/mingw.c +++ b/compat/mingw.c @@ -350,7 +350,7 @@ static inline int needs_hiding(const char *path) return 0; /* We cannot use basename(), as it would remove trailing slashes */ - mingw_skip_dos_drive_prefix((char **)&path); + win_path_utils_skip_dos_drive_prefix((char **)&path); if (!*path) return 0; @@ -2275,33 +2275,6 @@ pid_t waitpid(pid_t pid, int *status, int options) return -1; } -int mingw_skip_dos_drive_prefix(char **path) -{ - int ret = has_dos_drive_prefix(*path); - *path += ret; - return ret; -} - -int mingw_offset_1st_component(const char *path) -{ - char *pos = (char *)path; - - /* unc paths */ - if (!skip_dos_drive_prefix(&pos) && - is_dir_sep(pos[0]) && 
is_dir_sep(pos[1])) { - /* skip server name */ - pos = strpbrk(pos + 2, "\\/"); - if (!pos) - return 0; /* Error: malformed unc path */ - - do { - pos++; - } while (*pos && !is_dir_sep(*pos)); - } - - return pos + is_dir_sep(*pos) - path; -} - int xutftowcsn(wchar_t *wcs, const char *utfs, size_t wcslen, int utflen) { int upos = 0, wpos = 0; diff --git a/compat/mingw.h b/compat/mingw.h index 8c24ddaa3e..30d9fb3e36 100644 --- a/compat/mingw.h +++ b/compat/mingw.h @@ -443,32 +443,12 @@ HANDLE winansi_get_osfhandle(int fd); * git specific compatibility */ -#
[PATCH v2 1/3] git clone C:\cygwin\home\USER\repo' is working (again)
From: Torsten Bögershausen

A regression for cygwin users was introduced with commit 05b458c,
"real_path: resolve symlinks by hand".

In the commit message we read:
  The current implementation of real_path uses chdir() in order to resolve
  symlinks. Unfortunately this isn't thread-safe as chdir() affects a
  process as a whole...

The old (and non-thread-safe) OS calls chdir()/pwd() had been replaced by a
string operation.
The cygwin layer "knows" that "C:\cygwin" is an absolute path,
but the new string operation does not.

"git clone C:\cygwin\home\USER\repo" fails like this:
fatal: Invalid path '/home/USER/repo/C:\cygwin\home\USER\repo'

The solution is to implement has_dos_drive_prefix(), skip_dos_drive_prefix(),
is_dir_sep(), offset_1st_component() and convert_slashes() for cygwin
in the same way as it is done in 'Git for Windows' in compat/mingw.[ch]

Instead of duplicating the code, it is extracted into
compat/mingw-cygwin.[ch]

Some need for refactoring and cleanup came up in the review;
it is addressed in a separate commit.

Reported-By: Steven Penny Signed-off-by: Torsten Bögershausen --- compat/cygwin.c | 19 --- compat/cygwin.h | 2 -- compat/mingw-cygwin.c | 28 compat/mingw-cygwin.h | 20 compat/mingw.c| 29 + compat/mingw.h| 20 config.mak.uname | 4 ++-- git-compat-util.h | 3 ++- 8 files changed, 53 insertions(+), 72 deletions(-) delete mode 100644 compat/cygwin.c delete mode 100644 compat/cygwin.h create mode 100644 compat/mingw-cygwin.c create mode 100644 compat/mingw-cygwin.h diff --git a/compat/cygwin.c b/compat/cygwin.c deleted file mode 100644 index b9862d606d..00 --- a/compat/cygwin.c +++ /dev/null @@ -1,19 +0,0 @@ -#include "../git-compat-util.h" -#include "../cache.h" - -int cygwin_offset_1st_component(const char *path) -{ - const char *pos = path; - /* unc paths */ - if (is_dir_sep(pos[0]) && is_dir_sep(pos[1])) { - /* skip server name */ - pos = strchr(pos + 2, '/'); - if (!pos) - return 0; /* Error: malformed unc path */ - - do { - pos++; - } while (*pos && pos[0] != '/'); - } - return pos + is_dir_sep(*pos) - path; -} diff --git a/compat/cygwin.h b/compat/cygwin.h deleted file mode 100644 index 8e52de4644..00 --- a/compat/cygwin.h +++ /dev/null @@ -1,2 +0,0 @@ -int cygwin_offset_1st_component(const char *path); -#define offset_1st_component cygwin_offset_1st_component diff --git a/compat/mingw-cygwin.c b/compat/mingw-cygwin.c new file mode 100644 index 00..c63d7acb9c --- /dev/null +++ b/compat/mingw-cygwin.c @@ -0,0 +1,28 @@ +#include "../git-compat-util.h" + +int mingw_cygwin_skip_dos_drive_prefix(char **path) +{ + int ret = has_dos_drive_prefix(*path); + *path += ret; + return ret; +} + +int mingw_cygwin_offset_1st_component(const char *path) +{ + char *pos = (char *)path; + + /* unc paths */ + if (!skip_dos_drive_prefix(&pos) && + is_dir_sep(pos[0]) && is_dir_sep(pos[1])) { + /* skip server name */ + pos = strpbrk(pos + 2, "\\/"); + if (!pos) + return 0; /* Error: malformed unc path */ + + do { + pos++; + } while (*pos && !is_dir_sep(*pos)); + } + + return pos + 
is_dir_sep(*pos) - path; +} diff --git a/compat/mingw-cygwin.h b/compat/mingw-cygwin.h new file mode 100644 index 00..66ccc909ae --- /dev/null +++ b/compat/mingw-cygwin.h @@ -0,0 +1,20 @@ +#define has_dos_drive_prefix(path) \ + (isalpha(*(path)) && (path)[1] == ':' ? 2 : 0) +int mingw_cygwin_skip_dos_drive_prefix(char **path); +#define skip_dos_drive_prefix mingw_cygwin_skip_dos_drive_prefix +static inline int mingw_cygwin_is_dir_sep(int c) +{ + return c == '/' || c == '\\'; +} +#define is_dir_sep mingw_cygwin_is_dir_sep +static inline char *mingw_cygwin_find_last_dir_sep(const char *path) +{ + char *ret = NULL; + for (; *path; ++path) + if (is_dir_sep(*path)) + ret = (char *)path; + return ret; +} +#define find_last_dir_sep mingw_cygwin_find_last_dir_sep +int mingw_cygwin_offset_1st_component(const char *path); +#define offset_1st_component mingw_cygwin_offset_1st_component diff --git a/compat/mingw.c b/compat/mingw.c index 34b3880b29..038e96af9d 100644 --- a/compat/mingw.c +++ b/compat/mingw.c @@ -350,7 +350,7 @@ static inline int needs_hiding(const char *path) return 0; /* We cannot use basename(), as it would remove trailing slashes */ - mingw_skip_dos_drive_prefix((char **)&path); + mingw_cygwin_skip_dos_drive_prefix((char **)&path); if (!*path) return 0;
[PATCH v2 3/3] Refactor mingw_cygwin_offset_1st_component()
From: Torsten Bögershausen

The Windows version of offset_1st_component() needs to handle 3 cases:
- The path is a UNC path, starting with "//" or "\\\\".
  Skip the server name and the name of the share.
- The path is a DOS drive, starting with e.g. "X:"
  The drive letter and the ':' must be skipped.
- The path is pointing to a subdirectory somewhere in the path and
  the directory separator needs to be skipped ('/' or '\\').

Refactor the code to make it easier to read.

Suggested-by: Johannes Schindelin
Signed-off-by: Torsten Bögershausen
---
 compat/mingw-cygwin.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/compat/mingw-cygwin.c b/compat/mingw-cygwin.c
index 5552c3ac20..c379a72775 100644
--- a/compat/mingw-cygwin.c
+++ b/compat/mingw-cygwin.c
@@ -10,10 +10,8 @@ size_t mingw_cygwin_skip_dos_drive_prefix(char **path)
 size_t mingw_cygwin_offset_1st_component(const char *path)
 {
 	char *pos = (char *)path;
-
-	/* unc paths */
-	if (!skip_dos_drive_prefix(&pos) &&
-			is_dir_sep(pos[0]) && is_dir_sep(pos[1])) {
+	if (is_dir_sep(pos[0]) && is_dir_sep(pos[1])) {
+		/* unc path */
 		/* skip server name */
 		pos = strpbrk(pos + 2, "\\/");
 		if (!pos)
@@ -22,7 +20,8 @@ size_t mingw_cygwin_offset_1st_component(const char *path)
 		do {
 			pos++;
 		} while (*pos && !is_dir_sep(*pos));
+	} else {
+		skip_dos_drive_prefix(&pos);
 	}
-
 	return pos + is_dir_sep(*pos) - path;
 }
-- 
2.19.0.271.gfe8321ec05
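
The three cases listed in the commit message can be sketched as a standalone function. This is only an illustration modelled on the refactored hunk above, not the code from the patch: every `sketch_` name is hypothetical, and the helpers are simplified stand-ins for is_dir_sep() and has_dos_drive_prefix().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for is_dir_sep(): '/' and '\\' both separate. */
static int sketch_is_dir_sep(int c)
{
	return c == '/' || c == '\\';
}

/* Length of a leading "X:" drive prefix, or 0 if there is none. */
static size_t sketch_has_dos_drive_prefix(const char *path)
{
	return (isalpha((unsigned char)path[0]) && path[1] == ':') ? 2 : 0;
}

/*
 * Offset of the first path component, following the three cases
 * from the commit message: UNC path, DOS drive, plain separator.
 */
static size_t sketch_offset_1st_component(const char *path)
{
	const char *pos = path;

	assert(path != NULL);
	if (sketch_is_dir_sep(pos[0]) && sketch_is_dir_sep(pos[1])) {
		/* UNC path: skip the server name ... */
		pos = strpbrk(pos + 2, "\\/");
		if (!pos)
			return 0; /* malformed UNC path */
		/* ... and the name of the share */
		do {
			pos++;
		} while (*pos && !sketch_is_dir_sep(*pos));
	} else {
		/* DOS drive: skip the letter and the ':' (if present) */
		pos += sketch_has_dos_drive_prefix(pos);
	}
	/* finally skip one directory separator, if there is one */
	return (size_t)(pos - path) + (sketch_is_dir_sep(*pos) ? 1 : 0);
}
```

Under these assumptions, "C:\cygwin\home" yields 3 (drive letter, colon, separator), "//server/share/dir" yields 15 (everything up to and including the separator after the share name), and "/usr/bin" yields 1.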
[PATCH v2 2/3] offset_1st_component(), dos_drive_prefix() return size_t
From: Torsten Bögershausen

Change the return value for offset_1st_component(),
has_dos_drive_prefix() and skip_dos_drive_prefix() from int into size_t,
which is the natural type for the length of data in memory.

While at it, remove possible "parameter not used" warnings
for the non-Windows builds in git-compat-util.h

Signed-off-by: Torsten Bögershausen
---
 abspath.c             | 2 +-
 compat/mingw-cygwin.c | 6 +++---
 compat/mingw-cygwin.h | 4 ++--
 git-compat-util.h     | 8 +++++---
 setup.c               | 4 ++--
 5 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/abspath.c b/abspath.c
index 9857985329..12055a1d8f 100644
--- a/abspath.c
+++ b/abspath.c
@@ -51,7 +51,7 @@ static void get_next_component(struct strbuf *next, struct strbuf *remaining)
 /* copies root part from remaining to resolved, canonicalizing it on the way */
 static void get_root_part(struct strbuf *resolved, struct strbuf *remaining)
 {
-	int offset = offset_1st_component(remaining->buf);
+	size_t offset = offset_1st_component(remaining->buf);
 
 	strbuf_reset(resolved);
 	strbuf_add(resolved, remaining->buf, offset);
diff --git a/compat/mingw-cygwin.c b/compat/mingw-cygwin.c
index c63d7acb9c..5552c3ac20 100644
--- a/compat/mingw-cygwin.c
+++ b/compat/mingw-cygwin.c
@@ -1,13 +1,13 @@
 #include "../git-compat-util.h"
 
-int mingw_cygwin_skip_dos_drive_prefix(char **path)
+size_t mingw_cygwin_skip_dos_drive_prefix(char **path)
 {
-	int ret = has_dos_drive_prefix(*path);
+	size_t ret = has_dos_drive_prefix(*path);
 	*path += ret;
 	return ret;
 }
 
-int mingw_cygwin_offset_1st_component(const char *path)
+size_t mingw_cygwin_offset_1st_component(const char *path)
 {
 	char *pos = (char *)path;
 
diff --git a/compat/mingw-cygwin.h b/compat/mingw-cygwin.h
index 66ccc909ae..0e8a0c9074 100644
--- a/compat/mingw-cygwin.h
+++ b/compat/mingw-cygwin.h
@@ -1,6 +1,6 @@
 #define has_dos_drive_prefix(path) \
 	(isalpha(*(path)) && (path)[1] == ':' ? 2 : 0)
-int mingw_cygwin_skip_dos_drive_prefix(char **path);
+size_t mingw_cygwin_skip_dos_drive_prefix(char **path);
 #define skip_dos_drive_prefix mingw_cygwin_skip_dos_drive_prefix
 static inline int mingw_cygwin_is_dir_sep(int c)
 {
@@ -16,5 +16,5 @@ static inline char *mingw_cygwin_find_last_dir_sep(const char *path)
 	return ret;
 }
 #define find_last_dir_sep mingw_cygwin_find_last_dir_sep
-int mingw_cygwin_offset_1st_component(const char *path);
+size_t mingw_cygwin_offset_1st_component(const char *path);
 #define offset_1st_component mingw_cygwin_offset_1st_component
diff --git a/git-compat-util.h b/git-compat-util.h
index 7ece969b22..65eaaf0d50 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -355,16 +355,18 @@ static inline int noop_core_config(const char *var, const char *value, void *cb)
 #endif
 
 #ifndef has_dos_drive_prefix
-static inline int git_has_dos_drive_prefix(const char *path)
+static inline size_t git_has_dos_drive_prefix(const char *path)
 {
+	(void)path;
 	return 0;
 }
 #define has_dos_drive_prefix git_has_dos_drive_prefix
 #endif
 
 #ifndef skip_dos_drive_prefix
-static inline int git_skip_dos_drive_prefix(char **path)
+static inline size_t git_skip_dos_drive_prefix(char **path)
 {
+	(void)path;
 	return 0;
 }
 #define skip_dos_drive_prefix git_skip_dos_drive_prefix
@@ -379,7 +381,7 @@ static inline int git_is_dir_sep(int c)
 #endif
 
 #ifndef offset_1st_component
-static inline int git_offset_1st_component(const char *path)
+static inline size_t git_offset_1st_component(const char *path)
 {
 	return is_dir_sep(path[0]);
 }
diff --git a/setup.c b/setup.c
index 1be5037f12..538bc1ff99 100644
--- a/setup.c
+++ b/setup.c
@@ -29,7 +29,7 @@ static int abspath_part_inside_repo(char *path)
 	size_t len;
 	size_t wtlen;
 	char *path0;
-	int off;
+	size_t off;
 	const char *work_tree = get_git_work_tree();
 
 	if (!work_tree)
@@ -800,7 +800,7 @@ static const char *setup_bare_git_dir(struct strbuf *cwd, int offset,
 					      struct repository_format *repo_fmt,
 					      int *nongit_ok)
 {
-	int root_len;
+	size_t root_len;
 
 	if (check_repository_format_gently(".", repo_fmt, nongit_ok))
 		return NULL;
-- 
2.19.0.271.gfe8321ec05
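
The two ideas in this patch, returning lengths as size_t and casting an unused parameter to void, can be shown in isolation. The following is a minimal sketch, assuming a non-Windows build where no drive prefix exists; all `sketch_` names are hypothetical and not part of Git.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical non-Windows fallback: there is never a drive prefix. */
static size_t sketch_has_dos_drive_prefix(const char *path)
{
	(void)path; /* deliberately unused; avoids "parameter not used" warnings */
	return 0;
}

/* Offset of the first component: optional prefix plus one leading '/'. */
static size_t sketch_offset_1st_component(const char *path)
{
	size_t off = sketch_has_dos_drive_prefix(path);

	return off + (path[off] == '/' ? 1 : 0);
}

/*
 * Copy the root part of `path` into `dst` (capacity `dstlen`) and
 * return its length. The offset stays a size_t from start to end,
 * so it reaches memcpy() without any int-to-size_t conversion.
 */
static size_t sketch_copy_root(char *dst, size_t dstlen, const char *path)
{
	size_t offset = sketch_offset_1st_component(path);

	assert(dst != NULL && dstlen > offset);
	memcpy(dst, path, offset);
	dst[offset] = '\0';
	return offset;
}
```

Keeping one type along the whole chain mirrors what the hunks in abspath.c and setup.c do for `offset`, `off` and `root_len`.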
[PATCH v1/RFC 1/1] 'git clone C:\cygwin\home\USER\repo' is working (again)
From: Torsten Bögershausen

A regression for cygwin users was introduced with commit 05b458c,
"real_path: resolve symlinks by hand".

In the commit message we read:
  The current implementation of real_path uses chdir() in order to resolve
  symlinks. Unfortunately this isn't thread-safe as chdir() affects a
  process as a whole...

The old (and non-thread-safe) OS calls chdir()/pwd() had been replaced by a
string operation.
The cygwin layer "knows" that "C:\cygwin" is an absolute path,
but the new string operation does not.

"git clone C:\cygwin\home\USER\repo" fails like this:
fatal: Invalid path '/home/USER/repo/C:\cygwin\home\USER\repo'

The solution is to implement has_dos_drive_prefix(), skip_dos_drive_prefix(),
is_dir_sep(), offset_1st_component() and convert_slashes() for cygwin
in the same way as it is done in 'Git for Windows' in compat/mingw.[ch]

Reported-By: Steven Penny
Signed-off-by: Torsten Bögershausen
---
This is the first version of a patch.
Is there a chance that you can test it?

abspath.c | 2 +- compat/cygwin.c | 18 ++ compat/cygwin.h | 32 3 files changed, 47 insertions(+), 5 deletions(-) diff --git a/abspath.c b/abspath.c index 9857985329..77a281f789 100644 --- a/abspath.c +++ b/abspath.c @@ -55,7 +55,7 @@ static void get_root_part(struct strbuf *resolved, struct strbuf *remaining) strbuf_reset(resolved); strbuf_add(resolved, remaining->buf, offset); -#ifdef GIT_WINDOWS_NATIVE +#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__) convert_slashes(resolved->buf); #endif strbuf_remove(remaining, 0, offset); diff --git a/compat/cygwin.c b/compat/cygwin.c index b9862d606d..c4a10cb5a1 100644 --- a/compat/cygwin.c +++ b/compat/cygwin.c @@ -1,19 +1,29 @@ #include "../git-compat-util.h" #include "../cache.h" +int cygwin_skip_dos_drive_prefix(char **path) +{ + int ret = has_dos_drive_prefix(*path); + *path += ret; + return ret; +} + int cygwin_offset_1st_component(const char *path) { - const char *pos = path; + char *pos = (char *)path; + /* unc paths */ - if (is_dir_sep(pos[0]) && is_dir_sep(pos[1])) { + if (!skip_dos_drive_prefix(&pos) && + is_dir_sep(pos[0]) && is_dir_sep(pos[1])) { /* skip server name */ - pos = strchr(pos + 2, '/'); + pos = strpbrk(pos + 2, "\\/"); if (!pos) return 0; /* Error: malformed unc path */ do { pos++; - } while (*pos && pos[0] != '/'); + } while (*pos && !is_dir_sep(*pos)); } + return pos + is_dir_sep(*pos) - path; } diff --git a/compat/cygwin.h b/compat/cygwin.h index 8e52de4644..46f29c0a90 100644 --- a/compat/cygwin.h +++ b/compat/cygwin.h @@ -1,2 +1,34 @@ +#define has_dos_drive_prefix(path) \ + (isalpha(*(path)) && (path)[1] == ':' ? 2 : 0) + + +int cygwin_offset_1st_component(const char *path); +#define offset_1st_component cygwin_offset_1st_component + + +#define has_dos_drive_prefix(path) \ + (isalpha(*(path)) && (path)[1] == ':' ? 
2 : 0) +int cygwin_skip_dos_drive_prefix(char **path); +#define skip_dos_drive_prefix cygwin_skip_dos_drive_prefix +static inline int cygwin_is_dir_sep(int c) +{ + return c == '/' || c == '\\'; +} +#define is_dir_sep cygwin_is_dir_sep +static inline char *cygwin_find_last_dir_sep(const char *path) +{ + char *ret = NULL; + for (; *path; ++path) + if (is_dir_sep(*path)) + ret = (char *)path; + return ret; +} +static inline void convert_slashes(char *path) +{ + for (; *path; path++) + if (*path == '\\') + *path = '/'; +} +#define find_last_dir_sep cygwin_find_last_dir_sep int cygwin_offset_1st_component(const char *path); #define offset_1st_component cygwin_offset_1st_component -- 2.19.0.271.gfe8321ec05
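
The failure described in the commit message comes down to how "absolute" is decided. The sketch below contrasts a POSIX-only check with a drive-aware one and adds the slash conversion; it is an illustration under the assumption that an absolute path is one that starts with a separator or a drive prefix, and the `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static int sketch_is_dir_sep(int c)
{
	return c == '/' || c == '\\';
}

/* Length of a leading "X:" drive prefix, or 0 if there is none. */
static size_t sketch_has_dos_drive_prefix(const char *path)
{
	return (isalpha((unsigned char)path[0]) && path[1] == ':') ? 2 : 0;
}

/* POSIX-only view: absolute means "starts with '/'". */
static int sketch_is_absolute_posix(const char *path)
{
	assert(path != NULL);
	return path[0] == '/';
}

/* Drive-aware view, in the spirit of compat/mingw.[ch]. */
static int sketch_is_absolute_win32(const char *path)
{
	assert(path != NULL);
	return sketch_is_dir_sep(path[0]) ||
	       sketch_has_dos_drive_prefix(path) != 0;
}

/* Rewrite backslashes to forward slashes in place. */
static void sketch_convert_slashes(char *path)
{
	for (; *path; path++)
		if (*path == '\\')
			*path = '/';
}
```

With the POSIX-only check, "C:\cygwin\home\USER\repo" looks relative and gets appended to the current directory, which matches the "Invalid path" message quoted above; the drive-aware check treats it as absolute.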
[PATCH v1 1/1] t5601-99: Enable colliding file detection for MINGW
From: Torsten Bögershausen

Commit b878579ae7 (clone: report duplicate entries on case-insensitive
filesystems - 2018-08-17) adds a warning to the user when cloning a repo
with case-sensitive file names on a case-insensitive file system.

This test has never been enabled for MINGW.
It has been working since day 1, but I forgot to report that to the
author.
Enable it after a re-test.

Signed-off-by: Torsten Bögershausen
---
The other day, I wanted to test Duy's patch under MINGW to see if the
problem is caught, but git am failed to apply. Not a big disaster,
because it is already in master.

Here is a follow-up, and we can end the match.

 t/t5601-clone.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index c28d51bd59..8bbc7068ac 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -628,7 +628,7 @@ test_expect_success 'clone on case-insensitive fs' '
 	)
 '
 
-test_expect_success !MINGW,CASE_INSENSITIVE_FS 'colliding file detection' '
+test_expect_success CASE_INSENSITIVE_FS 'colliding file detection' '
 	grep X icasefs/warning &&
 	grep x icasefs/warning &&
 	test_i18ngrep "the following paths have collided" icasefs/warning
-- 
2.19.0.271.gfe8321ec05
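
What "colliding" means for the test above can be pictured with a small check: two paths collide on a case-insensitive file system when they are equal once case is ignored. This is a rough sketch assuming ASCII-only case folding; it is not how Git detects collisions, and the `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* ASCII-only, case-insensitive comparison of two path names. */
static int sketch_paths_collide(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	for (; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
	return *a == *b; /* both strings must end at the same position */
}

/* Count colliding pairs among `n` path names (quadratic, fine for a sketch). */
static size_t sketch_count_collisions(const char *const *paths, size_t n)
{
	size_t i, j, count = 0;

	for (i = 0; i < n; i++)
		for (j = i + 1; j < n; j++)
			if (sketch_paths_collide(paths[i], paths[j]))
				count++;
	return count;
}
```

For the names the test greps for, "X" and "x", this reports one colliding pair.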
[PATCH v2 1/1] Use size_t instead of 'unsigned long' for data in memory
From: Torsten Bögershausen

Currently the length of data which is stored in memory is stored
in "unsigned long" at many places in the code base.
This is OK when both "unsigned long" and size_t are 32 bits
(and is OK when both are 64 bits).
On a 64 bit Windows system an "unsigned long" is 32 bit, and
that may be too short to measure the size of objects in memory;
a size_t is the natural choice.

Improve the code base in "small steps", as small as possible.
The smallest step seems to be much bigger than expected.
See this code-snippet from convert.c:

    const char *ret;
    unsigned long sz;
    void *data = read_blob_data_from_index(istate, path, &sz);
    ret = gather_convert_stats_ascii(data, sz);

The corrected version looks like this:

    const char *ret;
    size_t sz;
    void *data = read_blob_data_from_index(istate, path, &sz);
    ret = gather_convert_stats_ascii(data, sz);

However, when the Git code base is compiled with a compiler that
complains that "unsigned long" is different from size_t, we end up
in this huge patch, before the code base cleanly compiles.

Signed-off-by: Torsten Bögershausen
---
Thanks for all the comments on V1.

Changes since V1:
- Make the motivation somewhat clearer in the commit message
- Rebase to the November 19 pu

What we really need for this patch to fly are these branches:
  mk/use-size-t-in-zlib
  tb/print-size-t-with-uintmax-format

And then it is rebased on top of all cooking stuff, too many branches
to be mentioned here.

It may be useful to examine all "unsigned long" which are left after
this patch and turn them into (what? unsigned int? size_t? uint32_t?).
And once they are settled, re-do this patch with the help of a
coccinelle script.

I don't know. I will probably rebase it until Junio says stop or someone
else comes up with a better solution.

apply.c | 14 - archive-tar.c| 18 +-- archive-zip.c| 2 +- archive.c| 2 +- archive.h| 2 +- bisect.c | 2 +- blame.c | 6 ++-- blame.h | 2 +- builtin/cat-file.c | 10 +++--- builtin/difftool.c | 2 +- builtin/fast-export.c| 6 ++-- builtin/fmt-merge-msg.c | 3 +- builtin/fsck.c | 6 ++-- builtin/grep.c | 8 ++--- builtin/index-pack.c | 27 builtin/log.c| 4 +-- builtin/ls-tree.c| 2 +- builtin/merge-tree.c | 6 ++-- builtin/mktag.c | 4 +-- builtin/notes.c | 6 ++-- builtin/pack-objects.c | 56 +- builtin/reflog.c | 2 +- builtin/replace.c| 2 +- builtin/tag.c| 4 +-- builtin/unpack-file.c| 2 +- builtin/unpack-objects.c | 35 ++--- builtin/verify-commit.c | 4 +-- bundle.c | 2 +- cache.h | 10 +++--- combine-diff.c | 11 --- commit.c | 22 +++--- commit.h | 10 +++--- config.c | 2 +- convert.c| 18 +-- delta.h | 20 ++-- diff-delta.c | 4 +-- diff.c | 30 +- diff.h | 2 +- diffcore-pickaxe.c | 4 +-- diffcore.h | 2 +- dir.c| 6 ++-- dir.h| 2 +- entry.c | 4 +-- fast-import.c| 26 fsck.c | 12 fsck.h | 2 +- fuzz-pack-headers.c | 4 +-- grep.h | 2 +- http-push.c | 2 +- list-objects-filter.c| 2 +- mailmap.c| 2 +- match-trees.c| 4 +-- merge-blobs.c| 6 ++-- merge-blobs.h| 2 +- merge-recursive.c| 4 +-- notes-cache.c| 2 +- notes-merge.c| 4 +-- notes.c | 6 ++-- object-store.h | 20 ++-- object.c | 4 +-- object.h | 2 +- pack-check.c | 2 +- pack-objects.h | 14 - pack.h | 2 +- packfile.c | 40 packfile.h | 8 ++--- patch-delta.c| 8 ++--- range-diff.c | 2 +- read-cache.c | 48 ++--- ref-filter.c | 30 +- remote-testsvn.c | 4 +-- rerere.c | 2 +- sha1-file.c | 66 sha1dc_git.c | 2 +- sha1dc_git.h | 2 +- streaming.c | 12 streaming.h | 2 +- submodule-config.c | 2 +- t/helper/test-delta.c| 2 +- tag.c| 6 ++-- tag.h| 2 +- tree-walk.c | 14 - tree.c
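
Why "unsigned long" is too short on 64 bit Windows, and how a size_t can still be printed portably, can be seen in a few lines. This is a minimal sketch, not code from the patch: the `sketch_` names are hypothetical, and the printing follows the upcast-to-uintmax_t idea behind the tb/print-size-t-with-uintmax-format branch named above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Returns 1 if `len` survives a round trip through "unsigned long".
 * Where both types are 64 bit this is always true; where
 * "unsigned long" is 32 bit and size_t is 64 bit (64 bit Windows),
 * it fails for any length that does not fit into 32 bits.
 */
static int sketch_fits_unsigned_long(size_t len)
{
	return (size_t)(unsigned long)len == len;
}

/* Format a size portably: upcast to uintmax_t and print with PRIuMAX. */
static int sketch_format_size(char *buf, size_t buflen, size_t len)
{
	assert(buf != NULL && buflen > 0);
	return snprintf(buf, buflen, "%" PRIuMAX, (uintmax_t)len);
}
```

A caller that keeps sizes in size_t and only converts at the printing boundary never passes through the narrower type, which is the direction this series takes.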
[PATCH/RFC v2 1/1] Use size_t instead of 'unsigned long' for data in memory
From: Torsten Bögershausen

Currently the length of data which is stored in memory is stored
in "unsigned long" at many places in the code base.
This is OK when both "unsigned long" and size_t are 32 bits
(and is OK when both are 64 bits).
On a 64 bit Windows system an "unsigned long" is 32 bit, and
that may be too short to measure the size of objects in memory;
a size_t is the natural choice.

Improve the code base in "small steps", as small as possible.
The smallest step seems to be much bigger than expected.
See this code-snippet from convert.c:

    const char *ret;
    unsigned long sz;
    void *data = read_blob_data_from_index(istate, path, &sz);
    ret = gather_convert_stats_ascii(data, sz);

The corrected version looks like this:

    const char *ret;
    size_t sz;
    void *data = read_blob_data_from_index(istate, path, &sz);
    ret = gather_convert_stats_ascii(data, sz);

However, when the Git code base is compiled with a compiler that
complains that "unsigned long" is different from size_t, we end up
in this huge patch, before the code base cleanly compiles.

Signed-off-by: Torsten Bögershausen
---
Thanks for all the comments on V1.

Changes since V1:
- Make the motivation somewhat clearer in the commit message
- Rebase to the November 19 pu

What we really need for this patch to fly are these branches:
  mk/use-size-t-in-zlib
  tb/print-size-t-with-uintmax-format

And then it is rebased on top of all cooking stuff, too many branches
to be mentioned here.

It may be useful to examine all "unsigned long" which are left after
this patch and turn them into (what? unsigned int? size_t? uint32_t?).
And once they are settled, re-do this patch with the help of a
coccinelle script.

I don't know. I will probably rebase it until Junio says stop or someone
else comes up with a better solution.

apply.c | 14 - archive-tar.c| 18 +-- archive-zip.c| 2 +- archive.c| 2 +- archive.h| 2 +- bisect.c | 2 +- blame.c | 6 ++-- blame.h | 2 +- builtin/cat-file.c | 10 +++--- builtin/difftool.c | 2 +- builtin/fast-export.c| 6 ++-- builtin/fmt-merge-msg.c | 3 +- builtin/fsck.c | 6 ++-- builtin/grep.c | 8 ++--- builtin/index-pack.c | 27 builtin/log.c| 4 +-- builtin/ls-tree.c| 2 +- builtin/merge-tree.c | 6 ++-- builtin/mktag.c | 4 +-- builtin/notes.c | 6 ++-- builtin/pack-objects.c | 56 +- builtin/reflog.c | 2 +- builtin/replace.c| 2 +- builtin/tag.c| 4 +-- builtin/unpack-file.c| 2 +- builtin/unpack-objects.c | 35 ++--- builtin/verify-commit.c | 4 +-- bundle.c | 2 +- cache.h | 10 +++--- combine-diff.c | 11 --- commit.c | 22 +++--- commit.h | 10 +++--- config.c | 2 +- convert.c| 18 +-- delta.h | 20 ++-- diff-delta.c | 4 +-- diff.c | 30 +- diff.h | 2 +- diffcore-pickaxe.c | 4 +-- diffcore.h | 2 +- dir.c| 6 ++-- dir.h| 2 +- entry.c | 4 +-- fast-import.c| 26 fsck.c | 12 fsck.h | 2 +- fuzz-pack-headers.c | 4 +-- grep.h | 2 +- http-push.c | 2 +- list-objects-filter.c| 2 +- mailmap.c| 2 +- match-trees.c| 4 +-- merge-blobs.c| 6 ++-- merge-blobs.h| 2 +- merge-recursive.c| 4 +-- notes-cache.c| 2 +- notes-merge.c| 4 +-- notes.c | 6 ++-- object-store.h | 20 ++-- object.c | 4 +-- object.h | 2 +- pack-check.c | 2 +- pack-objects.h | 14 - pack.h | 2 +- packfile.c | 40 packfile.h | 8 ++--- patch-delta.c| 8 ++--- range-diff.c | 2 +- read-cache.c | 48 ++--- ref-filter.c | 30 +- remote-testsvn.c | 4 +-- rerere.c | 2 +- sha1-file.c | 66 sha1dc_git.c | 2 +- sha1dc_git.h | 2 +- streaming.c | 12 streaming.h | 2 +- submodule-config.c | 2 +- t/helper/test-delta.c| 2 +- tag.c| 6 ++-- tag.h| 2 +- tree-walk.c | 14 - tree.c
[PATCH/RFC v1 1/1] Use size_t instead of unsigned long
From: Torsten Bögershausen

Currently Git users cannot commit files > 4 GiB under 64 bit Windows,
where "long" is 32 bit but size_t is 64 bit.
Improve the code base in small steps, as small as possible.
What started with a small patch to replace "unsigned long" with size_t
in one file (convert.c) ended up with a change in many files.

Signed-off-by: Torsten Bögershausen
---
This needs to go on top of pu, to cover all the good stuff
cooking here.
I started this series on November 1st; since then 2 or 3 rebases
have been done to catch up, and now it is on pu from November 15.

I couldn't find a reason why changing "unsigned long"
into "size_t" may break anything, any thoughts, please?

Side question:
One thing I wondered about is why Git creates a conflict like this,
using git cherry-pick:

<<<<<<< HEAD
	unsigned long size;
	void *data = read_object_file(oid, &type, &size);
=======
	size_t size;
	void *data = repo_read_object_file(the_repository, oid, &type, &size);
>>>>>>> 3ee0abef4c... Use size_t instead of unsigned long

One commit changed "unsigned long size" into "size_t size",
the other commit swapped repo_read_object_file() with read_object_file().
Both changes are on different lines, but Git sees a conflict here.

apply.c | 14 - archive-tar.c| 18 +-- archive-zip.c| 2 +- archive.c| 2 +- archive.h| 2 +- bisect.c | 2 +- blame.c | 6 ++-- blame.h | 2 +- builtin/cat-file.c | 10 +++--- builtin/difftool.c | 3 +- builtin/fast-export.c| 6 ++-- builtin/fmt-merge-msg.c | 4 ++- builtin/fsck.c | 6 ++-- builtin/grep.c | 8 ++--- builtin/index-pack.c | 27 builtin/log.c| 4 +-- builtin/ls-tree.c| 2 +- builtin/merge-tree.c | 6 ++-- builtin/mktag.c | 5 +-- builtin/notes.c | 6 ++-- builtin/pack-objects.c | 56 +- builtin/reflog.c | 2 +- builtin/replace.c| 2 +- builtin/tag.c| 4 +-- builtin/unpack-file.c| 2 +- builtin/unpack-objects.c | 35 ++--- builtin/verify-commit.c | 4 +-- bundle.c | 2 +- cache.h | 10 +++--- combine-diff.c | 11 --- commit.c | 22 +++--- commit.h | 10 +++--- config.c | 2 +- convert.c| 18 +-- delta.h | 20 ++-- diff-delta.c | 4 +-- diff.c | 30 +- diff.h | 2 +- diffcore-pickaxe.c | 4 +-- diffcore.h | 2 +- dir.c| 6 ++-- dir.h| 2 +- entry.c | 4 +-- fast-import.c| 26 fsck.c | 12 fsck.h | 2 +- fuzz-pack-headers.c | 4 +-- grep.h | 2 +- http-push.c | 2 +- list-objects-filter.c| 2 +- mailmap.c| 2 +- match-trees.c| 4 +-- merge-blobs.c| 6 ++-- merge-blobs.h| 2 +- merge-recursive.c| 4 +-- notes-cache.c| 2 +- notes-merge.c| 4 +-- notes.c | 6 ++-- object-store.h | 20 ++-- object.c | 4 +-- object.h | 2 +- pack-check.c | 2 +- pack-objects.h | 14 - pack.h | 2 +- packfile.c | 40 packfile.h | 8 ++--- patch-delta.c| 8 ++--- range-diff.c | 2 +- read-cache.c | 48 ++--- ref-filter.c | 30 +- remote-testsvn.c | 4 +-- rerere.c | 2 +- sha1-file.c | 66 sha1dc_git.c | 2 +- sha1dc_git.h | 2 +- streaming.c | 12 streaming.h | 2 +- submodule-config.c | 2 +- t/helper/test-delta.c| 2 +- tag.c| 6 ++-- tag.h| 2 +- tree-walk.c | 14 - tree.c | 2 +- xdiff-interface.c| 4 +-- xdiff-interface.h| 4 +-- 85 files changed, 391 insertions(+), 384 deletions(-) diff --git a/apply.c b/apply.c index 3703bfc8d0..5e11b85d17 100644 --- a/apply.c +++ b/apply.c @@ -3096,7 +3096,7 @@ static int apply_binary_fragment(struct apply_state 
*state, struct patch *patch) { struct fragment *fragment = patch->fragments; - unsigned long len; + size_t len;
[PATCH v2 1/1] Upcast size_t variables to uintmax_t when printing
From: Torsten Bögershausen

When printing variables which contain a size, today "unsigned long" is used in many places. In order to be able to change the type from "unsigned long" into size_t some day in the future, we need to have a way to print 64 bit variables on a system that has "unsigned long" defined to be 32 bit, like Win64. Upcast all those variables into uintmax_t before they are printed. This is to prepare for a bigger change, when "unsigned long" will be converted into size_t for variables which may be > 4 GiB.

Signed-off-by: Torsten Bögershausen
---
Changes since V1:
- fixed typos in the commit message, thanks to Eric Sunshine for careful reading

Applying it on pu gives 1 conflict from the index/repo changes, which should be easy to fix.

archive-tar.c | 2 +- builtin/cat-file.c | 4 ++-- builtin/fast-export.c | 2 +- builtin/index-pack.c | 9 + builtin/ls-tree.c | 2 +- builtin/pack-objects.c | 12 ++-- diff.c | 2 +- fast-import.c | 4 ++-- http-push.c| 2 +- ref-filter.c | 2 +- sha1-file.c| 6 +++--- 11 files changed, 24 insertions(+), 23 deletions(-) diff --git a/archive-tar.c b/archive-tar.c index 7a535cba24..a58e1a8ebf 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -202,7 +202,7 @@ static void prepare_header(struct archiver_args *args, unsigned int mode, unsigned long size) { xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 0); - xsnprintf(header->size, sizeof(header->size), "%011lo", S_ISREG(mode) ? size : 0); + xsnprintf(header->size, sizeof(header->size), "%011"PRIoMAX , S_ISREG(mode) ? 
(uintmax_t)size : (uintmax_t)0); xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time); xsnprintf(header->uid, sizeof(header->uid), "%07o", 0); diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 8d97c84725..05decee33f 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -92,7 +92,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name, oi.sizep = &size; if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0) die("git cat-file: could not get object info"); - printf("%lu\n", size); + printf("%"PRIuMAX"\n", (uintmax_t)size); return 0; case 'e': @@ -238,7 +238,7 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len, if (data->mark_query) data->info.sizep = &data->size; else - strbuf_addf(sb, "%lu", data->size); + strbuf_addf(sb, "%"PRIuMAX , (uintmax_t)data->size); } else if (is_atom("objectsize:disk", atom, len)) { if (data->mark_query) data->info.disk_sizep = &data->disk_size; diff --git a/builtin/fast-export.c b/builtin/fast-export.c index 456797c12a..5790f0d554 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -253,7 +253,7 @@ static void export_blob(const struct object_id *oid) mark_next_object(object); - printf("blob\nmark :%"PRIu32"\ndata %lu\n", last_idnum, size); + printf("blob\nmark :%"PRIu32"\ndata %"PRIuMAX"\n", last_idnum, (uintmax_t)size); if (size && fwrite(buf, size, 1, stdout) != 1) die_errno("could not write blob '%s'", oid_to_hex(oid)); printf("\n"); diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 2004e25da2..2a8ada432b 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -450,7 +450,8 @@ static void *unpack_entry_data(off_t offset, unsigned long size, int hdrlen; if (!is_delta_type(type)) { - hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu", type_name(type), size) + 1; + hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX, + type_name(type),(uintmax_t)size) + 1; the_hash_algo->init_fn(&c); 
the_hash_algo->update_fn(&c, hdr, hdrlen); } else @@ -1628,10 +1629,10 @@ static void show_pack_info(int stat_only) chain_histogram[obj_stat[i].delta_depth - 1]++; if (stat_only) continue; - printf("%s %-6s %lu %lu %"PRIuMAX, + printf("%s %-6s %"PRIuMAX" %"PRIuMAX" %"PRIuMAX, oid_to_hex(&obj->idx.oid), - type_name(obj->real_type), obj->size, - (unsigned long)(obj[1].idx.offset - obj->idx.offset), + type_name(obj->real_type), (uintmax_t)obj->size, + (uintmax_t)(obj[1].idx.offset - obj->idx.offset), (uintmax_t)obj->idx.offset); if (is_delta_type(obj->type)) { struct object_entry *bobj
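The upcast pattern used throughout this patch can be tried in isolation. What follows is a minimal sketch under stated assumptions, not code from Git: format_size() is a hypothetical helper, and plain snprintf() stands in for Git's xsnprintf() and strbuf_addf().

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Git): format a size_t in decimal.
 * The value is upcast to uintmax_t and printed with PRIuMAX, so the
 * format string matches the argument whether "unsigned long" is
 * 32 bit (Win64) or 64 bit (64 bit Linux).
 */
static int format_size(char *buf, size_t buflen, size_t size)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, buflen, "%" PRIuMAX, (uintmax_t)size);
	/* n is the untruncated length; a caller should compare it to buflen */
	return n;
}
```

The same idea applies to the octal tar header field, where PRIoMAX takes the place of PRIuMAX.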
[PATCH v2 1/1] remote-curl.c: xcurl_off_t is not portable (on 32 bit platforms)
From: Torsten Bögershausen

When setting DEVELOPER = 1 DEVOPTS = extra-all, "gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516" errors out with "comparison is always false due to limited range of data type" "[-Werror=type-limits]"

It turns out that the function xcurl_off_t() has 2 problems:
- It gives a warning on 32 bit systems, like Linux
- It takes the signed ssize_t as a parameter, but the only caller is using a size_t (which is typically unsigned these days)

The original motivation of this function is to make sure that sizes > 2 GiB are handled correctly. The curl documentation says: "For any given platform/compiler curl_off_t must be typedef'ed to a 64-bit wide signed integral data type"

On a 32 bit system "size_t" can be promoted into a 64 bit signed value without loss of data, and therefore we may see the "comparison is always false" warning. On a 64 bit system it may happen, at least in theory, that a size_t value is >= 2^63, and then the conversion from an unsigned "size_t" into a signed "curl_off_t" may be a problem.

One solution to suppress a possible compiler warning could be to remove the function xcurl_off_t(). However, to be on the very safe side, we keep it and improve it:
- The len parameter is changed from ssize_t to size_t
- A temporary variable "size" is used, promoted into uintmax_t and then compared with "maximum_signed_value_of_type(curl_off_t)". Thanks to Junio C Hamano for this hint.

Signed-off-by: Torsten Bögershausen
---
This is a re-send; the original patch was part of a 2-patch series. This patch needed some rework, and this should be the polished version.
remote-curl.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/remote-curl.c b/remote-curl.c index 762a55a75f..1220dffcdc 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -617,10 +617,11 @@ static int probe_rpc(struct rpc_state *rpc, struct slot_results *results) return err; } -static curl_off_t xcurl_off_t(ssize_t len) { - if (len > maximum_signed_value_of_type(curl_off_t)) +static curl_off_t xcurl_off_t(size_t len) { + uintmax_t size = len; + if (size > maximum_signed_value_of_type(curl_off_t)) die("cannot handle pushes this big"); - return (curl_off_t) len; + return (curl_off_t)size; } static int post_rpc(struct rpc_state *rpc) -- 2.19.0.271.gfe8321ec05
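The check in this patch can be exercised without libcurl. The following is only a sketch under stated assumptions: sketch_off_t is a stand-in for curl_off_t (assumed to be a 64-bit signed type, as the curl documentation quoted above requires), and size_to_off() is a hypothetical variant that returns an error code instead of calling die().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for curl_off_t (assumption: 64-bit wide and signed). */
typedef int64_t sketch_off_t;

/*
 * Hypothetical variant of xcurl_off_t(): returns 0 and stores the
 * converted value on success, -1 if len does not fit.
 */
static int size_to_off(size_t len, sketch_off_t *out)
{
	/* widen first, so both sides of the comparison are 64-bit unsigned */
	uintmax_t size = len;

	assert(out != NULL);
	if (size > (uintmax_t)INT64_MAX)
		return -1;
	*out = (sketch_off_t)size;
	return 0;
}
```

Because the comparison is done in uintmax_t, it is meaningful on both 32 bit and 64 bit systems, which is what avoids the "comparison is always false" diagnostic described in the commit message.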
[PATCH v1 1/1] Upcast size_t variables to uintmax_t when printing
From: Torsten Bögershausen

When printing variables which contain a size, today "unsigned long" is used in many places. In order to be able to change the type from "unsigned long" into size_t some day in the future, we need to have a way to print 64 bit variables on a system that has "unsigned long" defined to be 32 bit, like Win64. Upcast all those variables into uintmax_t before they are printed. This is to prepare for a bigger change, when "unsigned long" will be converted into size_t for variables which may be > 4 GiB.

Signed-off-by: Torsten Bögershausen
---
archive-tar.c | 2 +- builtin/cat-file.c | 4 ++-- builtin/fast-export.c | 2 +- builtin/index-pack.c | 9 + builtin/ls-tree.c | 2 +- builtin/pack-objects.c | 12 ++-- diff.c | 2 +- fast-import.c | 4 ++-- http-push.c| 2 +- ref-filter.c | 2 +- sha1-file.c| 6 +++--- 11 files changed, 24 insertions(+), 23 deletions(-) diff --git a/archive-tar.c b/archive-tar.c index 7a535cba24..a58e1a8ebf 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -202,7 +202,7 @@ static void prepare_header(struct archiver_args *args, unsigned int mode, unsigned long size) { xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 0); - xsnprintf(header->size, sizeof(header->size), "%011lo", S_ISREG(mode) ? size : 0); + xsnprintf(header->size, sizeof(header->size), "%011"PRIoMAX , S_ISREG(mode) ? 
(uintmax_t)size : (uintmax_t)0); xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time); xsnprintf(header->uid, sizeof(header->uid), "%07o", 0); diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 8d97c84725..05decee33f 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -92,7 +92,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name, oi.sizep = &size; if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0) die("git cat-file: could not get object info"); - printf("%lu\n", size); + printf("%"PRIuMAX"\n", (uintmax_t)size); return 0; case 'e': @@ -238,7 +238,7 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len, if (data->mark_query) data->info.sizep = &data->size; else - strbuf_addf(sb, "%lu", data->size); + strbuf_addf(sb, "%"PRIuMAX , (uintmax_t)data->size); } else if (is_atom("objectsize:disk", atom, len)) { if (data->mark_query) data->info.disk_sizep = &data->disk_size; diff --git a/builtin/fast-export.c b/builtin/fast-export.c index 456797c12a..5790f0d554 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -253,7 +253,7 @@ static void export_blob(const struct object_id *oid) mark_next_object(object); - printf("blob\nmark :%"PRIu32"\ndata %lu\n", last_idnum, size); + printf("blob\nmark :%"PRIu32"\ndata %"PRIuMAX"\n", last_idnum, (uintmax_t)size); if (size && fwrite(buf, size, 1, stdout) != 1) die_errno("could not write blob '%s'", oid_to_hex(oid)); printf("\n"); diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 2004e25da2..2a8ada432b 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -450,7 +450,8 @@ static void *unpack_entry_data(off_t offset, unsigned long size, int hdrlen; if (!is_delta_type(type)) { - hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu", type_name(type), size) + 1; + hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX, + type_name(type),(uintmax_t)size) + 1; the_hash_algo->init_fn(&c); 
the_hash_algo->update_fn(&c, hdr, hdrlen); } else @@ -1628,10 +1629,10 @@ static void show_pack_info(int stat_only) chain_histogram[obj_stat[i].delta_depth - 1]++; if (stat_only) continue; - printf("%s %-6s %lu %lu %"PRIuMAX, + printf("%s %-6s %"PRIuMAX" %"PRIuMAX" %"PRIuMAX, oid_to_hex(&obj->idx.oid), - type_name(obj->real_type), obj->size, - (unsigned long)(obj[1].idx.offset - obj->idx.offset), + type_name(obj->real_type), (uintmax_t)obj->size, + (uintmax_t)(obj[1].idx.offset - obj->idx.offset), (uintmax_t)obj->idx.offset); if (is_delta_type(obj->type)) { struct object_entry *bobj = &objects[obj_stat[i].base_object_no]; diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c index fe3b952cb3..7d581d6463 100644 --- a/builtin/ls-tree.c +++ b/builtin/ls-tree.c @@ -100,7 +100,
[PATCH v2 1/1] remote-curl.c: xcurl_off_t is not portable (on 32 bit platforms)
From: Torsten Bögershausen

When setting DEVELOPER = 1 DEVOPTS = extra-all, "gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516" errors out with "comparison is always false due to limited range of data type" "[-Werror=type-limits]"

It turns out that the function xcurl_off_t() has 2 problems:
- It gives a warning on 32 bit systems, like Linux
- It takes the signed ssize_t as a parameter, but the only caller is using a size_t (which is typically unsigned these days)

The original motivation of this function is to make sure that sizes > 2 GiB are handled correctly. The curl documentation says: "For any given platform/compiler curl_off_t must be typedef'ed to a 64-bit wide signed integral data type"

On a 32 bit system "size_t" can be promoted into a 64 bit signed value without loss of data, and therefore we may see the "comparison is always false" warning. On a 64 bit system it may happen, at least in theory, that a size_t value is >= 2^63, and then the conversion from an unsigned "size_t" into a signed "curl_off_t" may be a problem.

One solution to suppress a possible compiler warning could be to remove the function xcurl_off_t(). However, to be on the very safe side, we keep it and improve it:
- The len parameter is changed from ssize_t to size_t
- A temporary variable "size" is used, promoted into uintmax_t and then compared with "maximum_signed_value_of_type(curl_off_t)". Thanks to Junio C Hamano for this hint.

Signed-off-by: Torsten Bögershausen --- remote-curl.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/remote-curl.c b/remote-curl.c index 762a55a75f..1220dffcdc 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -617,10 +617,11 @@ static int probe_rpc(struct rpc_state *rpc, struct slot_results *results) return err; } -static curl_off_t xcurl_off_t(ssize_t len) { - if (len > maximum_signed_value_of_type(curl_off_t)) +static curl_off_t xcurl_off_t(size_t len) { + uintmax_t size = len; + if (size > maximum_signed_value_of_type(curl_off_t)) die("cannot handle pushes this big"); - return (curl_off_t) len; + return (curl_off_t)size; } static int post_rpc(struct rpc_state *rpc) -- 2.19.0.271.gfe8321ec05
[PATCH v1 1/2] path.c: char is not (always) signed
From: Torsten Bögershausen

Whether a "char" in C is signed or unsigned is not specified by the language; by tradition it is "implementation dependent". Therefore constructs like "if (name[i] < 0)" are not portable; use "if (name[i] & 0x80)" instead.

Detected by "gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516" when setting DEVELOPER = 1 DEVOPTS = extra-all

Signed-off-by: Torsten Bögershausen
---
path.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/path.c b/path.c index 34f0f98349..ba06ec5b2d 100644 --- a/path.c +++ b/path.c @@ -1369,7 +1369,7 @@ static int is_ntfs_dot_generic(const char *name, saw_tilde = 1; } else if (i >= 6) return 0; - else if (name[i] < 0) { + else if (name[i] & 0x80) { /* * We know our needles contain only ASCII, so we clamp * here to make the results of tolower() sane. -- 2.11.0
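The difference between the two tests can be shown with a small standalone function. This is a sketch only; has_non_ascii() is a hypothetical helper and does not exist in path.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if any byte of the string has its
 * high bit set (i.e. lies outside 7-bit ASCII), 0 otherwise.
 */
static int has_non_ascii(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; name[i]; i++) {
		/*
		 * "name[i] < 0" is never true where plain char is
		 * unsigned; masking the high bit gives the same answer
		 * for signed and unsigned char.
		 */
		if (name[i] & 0x80)
			return 1;
	}
	return 0;
}
```

On a platform where plain char is signed, a byte such as 0xC3 is promoted to a negative int, but its bit 7 is still set after promotion, so the mask test gives the same result either way.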
[PATCH v1 2/2] curl_off_t xcurl_off_t is not portable
From: Torsten Bögershausen Comparing signed and unsigned values is not always portable. When setting DEVELOPER = 1 DEVOPTS = extra-all "gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516" errors out with "comparison is always false due to limited range of data type" "[-Werror=type-limits]" Solution: Use a valid cast & compare, similar to xsize_t() Signed-off-by: Torsten Bögershausen --- remote-curl.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/remote-curl.c b/remote-curl.c index 762a55a75f..c89fd6d1c3 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -618,9 +618,10 @@ static int probe_rpc(struct rpc_state *rpc, struct slot_results *results) } static curl_off_t xcurl_off_t(ssize_t len) { - if (len > maximum_signed_value_of_type(curl_off_t)) + curl_off_t size = (curl_off_t) len; + if (len != (ssize_t) size) die("cannot handle pushes this big"); - return (curl_off_t) len; + return size; } static int post_rpc(struct rpc_state *rpc) -- 2.11.0
[PATCH/RFC v2 1/1] Use off_t instead of size_t for functions dealing with streamed checkin
From: Torsten Bögershausen

When streaming data from disk into a blob, it should be possible to commit a file with a file size > 4 GiB using the streaming functionality in Git. Because of the streaming there is no need to load the whole data into memory at once. Today this is not possible on e.g. a 32 bit Linux system. There is no good reason to limit the length of the file by using a size_t in the code, which is a 32 bit value on such a system. Loosen this restriction and use off_t instead of size_t in the call chain.

Signed-off-by: Torsten Bögershausen
---
This is a suggestion for V2, changing even sha1-file.c, so that the whole patch makes more sense. The initial commit of a > 4 GiB file was tested on a 32 bit system. I didn't remove the wrapper functions, as I don't know what their purpose is. And: the commit message may need some tweaking, though.

bulk-checkin.c | 6 +++--- bulk-checkin.h | 2 +- sha1-file.c| 5 ++--- 3 files changed, 6 insertions(+), 7 deletions(-) diff --git a/bulk-checkin.c b/bulk-checkin.c index 409ecb566b..34dbf5c4ea 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -96,7 +96,7 @@ static int already_written(struct bulk_checkin_state *state, struct object_id *o */ static int stream_to_pack(struct bulk_checkin_state *state, git_hash_ctx *ctx, off_t *already_hashed_to, - int fd, size_t size, enum object_type type, + int fd, off_t size, enum object_type type, const char *path, unsigned flags) { git_zstream s; @@ -189,7 +189,7 @@ static void prepare_to_stream(struct bulk_checkin_state *state, static int deflate_to_pack(struct bulk_checkin_state *state, struct object_id *result_oid, - int fd, size_t size, + int fd, off_t size, enum object_type type, const char *path, unsigned flags) { @@ -258,7 +258,7 @@ static int deflate_to_pack(struct bulk_checkin_state *state, } int index_bulk_checkin(struct object_id *oid, - int fd, size_t size, enum object_type type, + int fd, off_t size, enum object_type type, const char *path, unsigned flags) { int status = 
deflate_to_pack(&state, oid, fd, size, type, diff --git a/bulk-checkin.h b/bulk-checkin.h index f438f93811..09b2affdf3 100644 --- a/bulk-checkin.h +++ b/bulk-checkin.h @@ -7,7 +7,7 @@ #include "cache.h" extern int index_bulk_checkin(struct object_id *oid, - int fd, size_t size, enum object_type type, + int fd, off_t size, enum object_type type, const char *path, unsigned flags); extern void plug_bulk_checkin(void); diff --git a/sha1-file.c b/sha1-file.c index a4367b8f04..98d0f50ffa 100644 --- a/sha1-file.c +++ b/sha1-file.c @@ -1934,7 +1934,7 @@ static int index_core(struct object_id *oid, int fd, size_t size, * binary blobs, they generally do not want to get any conversion, and * callers should avoid this code path when filters are requested. */ -static int index_stream(struct object_id *oid, int fd, size_t size, +static int index_stream(struct object_id *oid, int fd, off_t size, enum object_type type, const char *path, unsigned flags) { @@ -1959,8 +1959,7 @@ int index_fd(struct object_id *oid, int fd, struct stat *st, ret = index_core(oid, fd, xsize_t(st->st_size), type, path, flags); else - ret = index_stream(oid, fd, xsize_t(st->st_size), type, path, - flags); + ret = index_stream(oid, fd, st->st_size, type, path, flags); close(fd); return ret; } -- 2.11.0
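The reason the non-streaming path keeps a range check while the streaming path can pass st_size through unchanged can be sketched as follows. This is an illustration in the spirit of xsize_t(), not Git's code: off_to_size() is a hypothetical helper that returns an error instead of dying, and it assumes a POSIX system that provides off_t in <sys/types.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper: convert a file length (off_t) into an
 * in-memory length (size_t). Returns 0 on success, -1 if the value
 * is negative or does not survive the round trip, e.g. a > 4 GiB
 * file on a system with a 32 bit size_t and a 64 bit off_t.
 */
static int off_to_size(off_t len, size_t *out)
{
	size_t size = (size_t)len;

	assert(out != NULL);
	if (len < 0 || (off_t)size != len)
		return -1;
	*out = size;
	return 0;
}
```

Data that is streamed never has to fit into memory in one piece, so only code that loads the whole file at once needs a conversion like this.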
[PATCH v1 1/1] index_bulk_checkin(): Take off_t, not size_t
From: Torsten Bögershausen

When streaming data from disk into a blob, use off_t instead of size_t; off_t is the better choice for a file length.

Signed-off-by: Torsten Bögershausen
---
This is based on an old patch from 2017, which never made it to the list. I think it makes sense to use off_t/size_t more consistently; reviews/comments are welcome.

bulk-checkin.c | 4 ++-- bulk-checkin.h | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/bulk-checkin.c b/bulk-checkin.c index 409ecb566b..2631e82d6c 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -189,7 +189,7 @@ static void prepare_to_stream(struct bulk_checkin_state *state, static int deflate_to_pack(struct bulk_checkin_state *state, struct object_id *result_oid, - int fd, size_t size, + int fd, off_t size, enum object_type type, const char *path, unsigned flags) { @@ -258,7 +258,7 @@ static int deflate_to_pack(struct bulk_checkin_state *state, } int index_bulk_checkin(struct object_id *oid, - int fd, size_t size, enum object_type type, + int fd, off_t size, enum object_type type, const char *path, unsigned flags) { int status = deflate_to_pack(&state, oid, fd, size, type, diff --git a/bulk-checkin.h b/bulk-checkin.h index f438f93811..09b2affdf3 100644 --- a/bulk-checkin.h +++ b/bulk-checkin.h @@ -7,7 +7,7 @@ #include "cache.h" extern int index_bulk_checkin(struct object_id *oid, - int fd, size_t size, enum object_type type, + int fd, off_t size, enum object_type type, const char *path, unsigned flags); extern void plug_bulk_checkin(void); -- 2.19.0.271.gfe8321ec05
[PATCH v2 1/1] zlib.c: use size_t for size
From: Martin Koegler Signed-off-by: Martin Koegler Signed-off-by: Junio C Hamano Signed-off-by: Torsten Bögershausen --- After doing a review, I decided to send the result as a patch. In general, the changes from off_t to size_t seem to be not really motivated. But if they are, they could and should go into an own patch. For the moment, change only "unsigned long" into size_t, thats all builtin/pack-objects.c | 8 cache.h| 10 +- pack-check.c | 4 ++-- packfile.h | 2 +- wrapper.c | 8 zlib.c | 8 6 files changed, 20 insertions(+), 20 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index e6316d294d..23c4cd8c77 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -269,12 +269,12 @@ static void copy_pack_data(struct hashfile *f, off_t len) { unsigned char *in; - unsigned long avail; + size_t avail; while (len) { in = use_pack(p, w_curs, offset, &avail); if (avail > len) - avail = (unsigned long)len; + avail = xsize_t(len); hashwrite(f, in, avail); offset += avail; len -= avail; @@ -1478,8 +1478,8 @@ static void check_object(struct object_entry *entry) struct pack_window *w_curs = NULL; const unsigned char *base_ref = NULL; struct object_entry *base_entry; - unsigned long used, used_0; - unsigned long avail; + size_t used, used_0; + size_t avail; off_t ofs; unsigned char *buf, c; enum object_type type; diff --git a/cache.h b/cache.h index d508f3d4f8..fce53fe620 100644 --- a/cache.h +++ b/cache.h @@ -20,10 +20,10 @@ #include typedef struct git_zstream { z_stream z; - unsigned long avail_in; - unsigned long avail_out; - unsigned long total_in; - unsigned long total_out; + size_t avail_in; + size_t avail_out; + size_t total_in; + size_t total_out; unsigned char *next_in; unsigned char *next_out; } git_zstream; @@ -40,7 +40,7 @@ void git_deflate_end(git_zstream *); int git_deflate_abort(git_zstream *); int git_deflate_end_gently(git_zstream *); int git_deflate(git_zstream *, int flush); -unsigned long git_deflate_bound(git_zstream *, 
unsigned long); +size_t git_deflate_bound(git_zstream *, size_t); /* The length in bytes and in hex digits of an object name (SHA-1 value). */ #define GIT_SHA1_RAWSZ 20 diff --git a/pack-check.c b/pack-check.c index fa5f0ff8fa..d1e7f554ae 100644 --- a/pack-check.c +++ b/pack-check.c @@ -33,7 +33,7 @@ int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, uint32_t data_crc = crc32(0, NULL, 0); do { - unsigned long avail; + size_t avail; void *data = use_pack(p, w_curs, offset, &avail); if (avail > len) avail = len; @@ -68,7 +68,7 @@ static int verify_packfile(struct packed_git *p, the_hash_algo->init_fn(&ctx); do { - unsigned long remaining; + size_t remaining; unsigned char *in = use_pack(p, w_curs, offset, &remaining); offset += remaining; if (!pack_sig_ofs) diff --git a/packfile.h b/packfile.h index 442625723d..e2daf63426 100644 --- a/packfile.h +++ b/packfile.h @@ -78,7 +78,7 @@ extern void close_pack_index(struct packed_git *); extern uint32_t get_pack_fanout(struct packed_git *p, uint32_t value); -extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned long *); +extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, size_t *); extern void close_pack_windows(struct packed_git *); extern void close_pack(struct packed_git *); extern void close_all_packs(struct raw_object_store *o); diff --git a/wrapper.c b/wrapper.c index e4fa9d84cd..1a510bd6fc 100644 --- a/wrapper.c +++ b/wrapper.c @@ -67,11 +67,11 @@ static void *do_xmalloc(size_t size, int gentle) ret = malloc(1); if (!ret) { if (!gentle) - die("Out of memory, malloc failed (tried to allocate %lu bytes)", - (unsigned long)size); + die("Out of memory, malloc failed (tried to allocate %" PRIuMAX " bytes)", + (uintmax_t)size); else { - error("Out of memory, malloc failed (tried to allocate %lu bytes)", - (unsigned long)size); + error("Out of memory, malloc failed (tried to allocate %" PRIuMA
[PATCH v1 1/1] Make git_check_attr() a void function
From: Torsten Bögershausen git_check_attr() returns always 0. Remove all the error handling code of the callers, which is never executed. Change git_check_attr() to be a void function. Signed-off-by: Torsten Bögershausen --- archive.c | 3 ++- attr.c | 8 +++- attr.h | 4 ++-- builtin/check-attr.c | 3 +-- builtin/pack-objects.c | 3 +-- convert.c | 42 ++-- ll-merge.c | 16 +++ userdiff.c | 3 +-- ws.c | 44 +++--- 9 files changed, 57 insertions(+), 69 deletions(-) diff --git a/archive.c b/archive.c index 0a07b140fe..c1870105eb 100644 --- a/archive.c +++ b/archive.c @@ -110,7 +110,8 @@ static const struct attr_check *get_archive_attrs(struct index_state *istate, static struct attr_check *check; if (!check) check = attr_check_initl("export-ignore", "export-subst", NULL); - return git_check_attr(istate, path, check) ? NULL : check; + git_check_attr(istate, path, check); + return check; } static int check_attr_export_ignore(const struct attr_check *check) diff --git a/attr.c b/attr.c index 98e4953f6e..60d284796d 100644 --- a/attr.c +++ b/attr.c @@ -1143,9 +1143,9 @@ static void collect_some_attrs(const struct index_state *istate, fill(path, pathlen, basename_offset, check->stack, check->all_attrs, rem); } -int git_check_attr(const struct index_state *istate, - const char *path, - struct attr_check *check) +void git_check_attr(const struct index_state *istate, + const char *path, + struct attr_check *check) { int i; @@ -1158,8 +1158,6 @@ int git_check_attr(const struct index_state *istate, value = ATTR__UNSET; check->items[i].value = value; } - - return 0; } void git_all_attrs(const struct index_state *istate, diff --git a/attr.h b/attr.h index 2be86db36e..b0378bfe5f 100644 --- a/attr.h +++ b/attr.h @@ -63,8 +63,8 @@ void attr_check_free(struct attr_check *check); */ const char *git_attr_name(const struct git_attr *); -int git_check_attr(const struct index_state *istate, - const char *path, struct attr_check *check); +void git_check_attr(const struct index_state *istate, + 
const char *path, struct attr_check *check); /* * Retrieve all attributes that apply to the specified path. diff --git a/builtin/check-attr.c b/builtin/check-attr.c index c05573ff9c..30a2f84274 100644 --- a/builtin/check-attr.c +++ b/builtin/check-attr.c @@ -65,8 +65,7 @@ static void check_attr(const char *prefix, if (collect_all) { git_all_attrs(&the_index, full_path, check); } else { - if (git_check_attr(&the_index, full_path, check)) - die("git_check_attr died"); + git_check_attr(&the_index, full_path, check); } output_attr(check, file); diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index d1144a8f7e..eb71dab5be 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -951,8 +951,7 @@ static int no_try_delta(const char *path) if (!check) check = attr_check_initl("delta", NULL); - if (git_check_attr(&the_index, path, check)) - return 0; + git_check_attr(&the_index, path, check); if (ATTR_FALSE(check->items[0].value)) return 1; return 0; diff --git a/convert.c b/convert.c index 6057f1f580..e0848226d2 100644 --- a/convert.c +++ b/convert.c @@ -1297,6 +1297,7 @@ static void convert_attrs(const struct index_state *istate, struct conv_attrs *ca, const char *path) { static struct attr_check *check; + struct attr_check_item *ccheck = NULL; if (!check) { check = attr_check_initl("crlf", "ident", "filter", @@ -1306,30 +1307,25 @@ static void convert_attrs(const struct index_state *istate, git_config(read_convert_config, NULL); } - if (!git_check_attr(istate, path, check)) { - struct attr_check_item *ccheck = check->items; - ca->crlf_action = git_path_check_crlf(ccheck + 4); - if (ca->crlf_action == CRLF_UNDEFINED) - ca->crlf_action = git_path_check_crlf(ccheck + 0); - ca->ident = git_path_check_ident(ccheck + 1); - ca->drv = git_path_check_convert(ccheck + 2); - if (ca->crlf_action != CRLF_BINARY) { - enum eol eol_attr = git_path_check_eol(ccheck + 3); - if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_LF) - ca->crlf_action = CRLF_AUTO_INPUT; - 
else if (ca->crlf_action == CRLF_AUTO && eol_attr ==
[PATCH v1 1/1] test: Correct detection of UTF8_NFD_TO_NFC for APFS
From: Torsten Bögershausen

On HFS (which is the default Mac filesystem prior to High Sierra), Unicode names are "decomposed" before recording. On APFS, which appears to be the new default filesystem in Mac OS High Sierra, filenames are recorded as specified by the user. APFS continues to allow the user to access a file via any name that normalizes to the same thing.

This difference causes t0050-filesystem.sh to fail two tests.

Improve the test for NFD/NFC in test-lib.sh: Test if the same file can be reached in precomposed and decomposed Unicode.

Reported-By: Elijah Newren
Signed-off-by: Torsten Bögershausen
---
t/test-lib.sh | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/t/test-lib.sh b/t/test-lib.sh index ea2bbaaa7a..e206250d1b 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -1106,12 +1106,7 @@ test_lazy_prereq UTF8_NFD_TO_NFC ' auml=$(printf "\303\244") aumlcdiar=$(printf "\141\314\210") >"$auml" && - case "$(echo *)" in - "$aumlcdiar") - true ;; - *) - false ;; - esac + test -r "$aumlcdiar" ' test_lazy_prereq AUTOIDENT ' -- 2.16.0.rc0.8.g5497051b43
[no subject]
From 9f7d43f29eaf6017b7b16261ce91d8ef182cf415 Mon Sep 17 00:00:00 2001
In-Reply-To: <20171218131249.gb4...@sigill.intra.peff.net>
References: <20171218131249.gb4...@sigill.intra.peff.net>
From: Torsten Bögershausen
Date: Fri, 23 Feb 2018 20:53:34 +0100
Subject: [PATCH 0/1] Auto diff of UTF-16 files in UTF-8
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Make it possible to show a user-readable diff for UTF-16 encoded files. This would replace the "binary files differ" with something useful, without breaking anything for existing users (?).

For future repos the w-t-e encoding can be used, which allows e.g. easier merging. People who stick to native UTF-16 because they need the compatibility with e.g. libgit2 can still get a readable diff.

Opinions?

Torsten Bögershausen (1): Auto diff of UTF-16 files in UTF-8

diff.c | 43 - diffcore.h | 3 ++ t/t4066-diff-encoding.sh | 98 utf8.h | 11 ++ 4 files changed, 153 insertions(+), 2 deletions(-) create mode 100755 t/t4066-diff-encoding.sh -- 2.16.1.194.gb2e45c695d
[PATCH/RFC 1/1] Auto diff of UTF-16 files in UTF-8
From: Torsten Bögershausen

When a UTF-16 file is committed and later changed, `git diff` shows "Binary files XX and YY differ". When the user wants a diff in UTF-8, a textconv needs to be specified in .gitattributes and the textconv must be configured.

A more user-friendly diff can be produced for UTF-16 if
- the user did not use `git diff --binary`
- the blob is identified as binary
- the blob has a UTF-16 BOM
- the blob can be converted into UTF-8

Enhance the diff machinery to auto-detect UTF-16 blobs and show them as UTF-8, unless the user specifies `git diff --binary`, which creates a binary diff.

Signed-off-by: Torsten Bögershausen
---
diff.c | 43 - diffcore.h | 3 ++ t/t4066-diff-encoding.sh | 98 utf8.h | 11 ++ 4 files changed, 153 insertions(+), 2 deletions(-) create mode 100755 t/t4066-diff-encoding.sh diff --git a/diff.c b/diff.c index fb22b19f09..51831ee94d 100644 --- a/diff.c +++ b/diff.c @@ -3192,6 +3192,10 @@ static void builtin_diff(const char *name_a, strbuf_reset(&header); } + if (one && one->reencoded_from_utf16) + strbuf_addf(&header, "a is converted to UTF-8 from UTF-16\n"); + if (two && two->reencoded_from_utf16) + strbuf_addf(&header, "b is converted to UTF-8 from UTF-16\n"); mf1.size = fill_textconv(textconv_one, one, &mf1.ptr); mf2.size = fill_textconv(textconv_two, two, &mf2.ptr); @@ -3611,8 +3615,25 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags) s->size = size; s->should_free = 1; } - } - else { + if (!s->binary && buffer_is_binary(s->data, s->size) && + buffer_has_utf16_bom(s->data, s->size)) { + int outsz = 0; + char *outbuf; + outbuf = reencode_string_len(s->data, (int)s->size, +"UTF-8", "UTF-16", &outsz); + if (outbuf) { + if (s->should_free) + free(s->data); + if (s->should_munmap) + munmap(s->data, s->size); + s->should_munmap = 0; + s->data = outbuf; + s->size = outsz; + s->reencoded_from_utf16 = 1; + s->should_free = 1; + } + } + } else { enum object_type type; if (size_only || (flags & CHECK_BINARY)) { 
type = sha1_object_info(s->oid.hash, &s->size); @@ -3629,6 +3650,19 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags) s->data = read_sha1_file(s->oid.hash, &type, &s->size); if (!s->data) die("unable to read %s", oid_to_hex(&s->oid)); + if (!s->binary && buffer_is_binary(s->data, s->size) && + buffer_has_utf16_bom(s->data, s->size)) { + int outsz = 0; + char *buf; + buf = reencode_string_len(s->data, (int)s->size, + "UTF-8", "UTF-16", &outsz); + if (buf) { + free(s->data); + s->data = buf; + s->size = outsz; + s->reencoded_from_utf16 = 1; + } + } s->should_free = 1; } return 0; @@ -5695,6 +5729,10 @@ static int diff_filespec_is_identical(struct diff_filespec *one, static int diff_filespec_check_stat_unmatch(struct diff_filepair *p) { + if (p->binary) { + p->one->binary = 1; + p->two->binary = 1; + } if (p->done_skip_stat_unmatch) return p->skip_stat_unmatch_result; @@ -5735,6 +5773,7 @@ static void diffcore_skip_stat_unmatch(struct diff_options *diffopt) for (i = 0; i < q->nr; i++) { struct diff_filepair *p = q->queue[i]; + p->binary = diffopt->flags.binary; if (diff_filespec_check_stat_unmatch(p)) diff_q(&outq, p); else { diff --git a/diffcore.h b/diffcore.h index a30da161da..3cd97bb93b 100644 --- a/diffcore.h +++ b/diffcore.h @@ -47,6 +47,8 @@ struct diff_filespec { unsigned has_more_entries : 1; /* only appear in combined diff */ /* data should be considered "binary"; -1 means "don't know yet" */ signed int is_binary : 2; +
[PATCH v5 1/7] strbuf: remove unnecessary NUL assignment in xstrdup_tolower()
From: Lars Schneider Since 3733e69464 (use xmallocz to avoid size arithmetic, 2016-02-22) we allocate the buffer for the lower case string with xmallocz(). This already ensures a NUL at the end of the allocated buffer. Remove the unnecessary assignment. Signed-off-by: Lars Schneider Signed-off-by: Torsten Bögershausen --- strbuf.c | 1 - 1 file changed, 1 deletion(-) diff --git a/strbuf.c b/strbuf.c index 8007be8fb..490f7850e 100644 --- a/strbuf.c +++ b/strbuf.c @@ -781,7 +781,6 @@ char *xstrdup_tolower(const char *string) result = xmallocz(len); for (i = 0; i < len; i++) result[i] = tolower(string[i]); - result[i] = '\0'; return result; } -- 2.16.0.rc0.2.g64d3e4d0cc.dirty
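A minimal sketch of why the removed assignment was redundant, assuming xmallocz(len) behaves like an allocation of len + 1 bytes whose final byte is already set to NUL. The names `sketch_mallocz()` and `sketch_strdup_tolower()` are hypothetical stand-ins written for illustration; they are not Git's implementations.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for xmallocz(): allocate size + 1 bytes and
 * store the terminating NUL up front. This sketch simply aborts on
 * overflow or allocation failure.
 */
void *sketch_mallocz(size_t size)
{
	char *ret;

	if (size == (size_t)-1)
		abort();
	ret = malloc(size + 1);
	if (!ret)
		abort();
	ret[size] = '\0';
	return ret;
}

/*
 * Lower-case copy of a string. No terminator is written after the
 * loop because sketch_mallocz() has already stored it.
 */
char *sketch_strdup_tolower(const char *string)
{
	size_t len = strlen(string), i;
	char *result = sketch_mallocz(len);

	for (i = 0; i < len; i++)
		result[i] = (char)tolower((unsigned char)string[i]);
	return result;
}
```

The loop only touches bytes 0 to len - 1, so the NUL written by the allocator at index len survives untouched.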
[PATCH v5 5/7] convert: add 'working-tree-encoding' attribute
From: Lars Schneider Git recognizes files encoded with ASCII or one of its supersets (e.g. UTF-8 or ISO-8859-1) as text files. All other encodings are usually interpreted as binary and consequently built-in Git text processing tools (e.g. 'git diff') as well as most Git web front ends do not visualize the content. Add an attribute to tell Git what encoding the user has defined for a given file. If the content is added to the index, then Git converts the content to a canonical UTF-8 representation. On checkout Git will reverse the conversion. Signed-off-by: Lars Schneider Signed-off-by: Torsten Bögershausen --- Documentation/gitattributes.txt | 60 convert.c| 190 - convert.h| 1 + sha1_file.c | 2 +- t/t0028-working-tree-encoding.sh | 196 +++ 5 files changed, 447 insertions(+), 2 deletions(-) create mode 100755 t/t0028-working-tree-encoding.sh diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 30687de81..a8dbf4be3 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -272,6 +272,66 @@ few exceptions. Even though... catch potential problems early, safety triggers. +`working-tree-encoding` +^^^ + +Git recognizes files encoded with ASCII or one of its supersets (e.g. +UTF-8 or ISO-8859-1) as text files. All other encodings are usually +interpreted as binary and consequently built-in Git text processing +tools (e.g. 'git diff') as well as most Git web front ends do not +visualize the content. + +In these cases you can tell Git the encoding of a file in the working +directory with the `working-tree-encoding` attribute. If a file with this +attributes is added to Git, then Git reencodes the content from the +specified encoding to UTF-8 and stores the result in its internal data +structure (called "the index"). On checkout the content is encoded +back to the specified encoding. 
+ +Please note that using the `working-tree-encoding` attribute may have a +number of pitfalls: + +- Git clients that do not support the `working-tree-encoding` attribute + will checkout the respective files UTF-8 encoded and not in the + expected encoding. Consequently, these files will appear different + which typically causes trouble. This is in particular the case for + older Git versions and alternative Git implementations such as JGit + or libgit2 (as of January 2018). + +- Reencoding content to non-UTF encodings (e.g. SHIFT-JIS) can cause + errors as the conversion might not be round trip safe. + +- Reencoding content requires resources that might slow down certain + Git operations (e.g 'git checkout' or 'git add'). + +Use the `working-tree-encoding` attribute only if you cannot store a file in +UTF-8 encoding and if you want Git to be able to process the content as +text. + +Use the following attributes if your '*.txt' files are UTF-16 encoded +with byte order mark (BOM) and you want Git to perform automatic line +ending conversion based on your platform. + + +*.txt text working-tree-encoding=UTF-16 + + +Use the following attributes if your '*.txt' files are UTF-16 little +endian encoded without BOM and you want Git to use Windows line endings +in the working directory. + + +*.txt working-tree-encoding=UTF-16LE text eol=CRLF + + +You can get a list of all available encodings on your platform with the +following command: + + +iconv --list + + + `ident` ^^^ diff --git a/convert.c b/convert.c index b976eb968..0c372069b 100644 --- a/convert.c +++ b/convert.c @@ -7,6 +7,7 @@ #include "sigchain.h" #include "pkt-line.h" #include "sub-process.h" +#include "utf8.h" /* * convert.c - convert a file when checking it out and checking it in. 
@@ -265,6 +266,147 @@ static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats, } +static struct encoding { + const char *name; + struct encoding *next; +} *encoding, **encoding_tail; +static const char *default_encoding = "UTF-8"; + +static int encode_to_git(const char *path, const char *src, size_t src_len, +struct strbuf *buf, struct encoding *enc, int conv_flags) +{ + char *dst; + int dst_len; + + /* +* No encoding is specified or there is nothing to encode. +* Tell the caller that the content was not modified. +*/ + if (!enc || (src && !src_len)) + return 0; + + /* +* Looks like we got called from "would_convert_to_git()". +* This means Git wants to know if it would encode (= modify!) +* the content. Let's answer with "yes", since an encoding was +* specified. +*/ + if (!buf && !src) +
[PATCH/RFC v5 7/7] Careful with CRLF when using e.g. UTF-16 for working-tree-encoding
From: Torsten Bögershausen UTF-16 encoded files are treated as "binary" by Git, and no CRLF conversion is done. When the UTF-16 encoded files are converted into UTF-8 using the new "working-tree-encoding", the CRLF are converted if core.autocrlf is true. This may lead to confusion: A tool writes a UTF-16 encoded file with CRLF. The file is committed with core.autocrlf=true, and the CRLF are converted into LF. The repo is pushed somewhere and cloned by a different user, who has decided to use core.autocrlf=false. That user runs the same tool, and now the file contains LF instead of the expected CRLF, making the file useless for the tool. Avoid this (possible) confusion by ignoring core.autocrlf for all files which have "working-tree-encoding" defined. The user can still use a .gitattributes file and specify the line endings like "text=auto", "text", or "text eol=crlf" and let that .gitattributes file travel together with push and clone. Change convert.c to be more careful, simplify the initialization when attributes are retrieved (and none are specified) and update the documentation. Signed-off-by: Torsten Bögershausen --- Documentation/gitattributes.txt | 9 ++--- convert.c | 15 --- 2 files changed, 18 insertions(+), 6 deletions(-) diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index a8dbf4be3..3665c4677 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -308,12 +308,15 @@ Use the `working-tree-encoding` attribute only if you cannot store a file in UTF-8 encoding and if you want Git to be able to process the content as text. +Note that when `working-tree-encoding` is defined, core.autocrlf is ignored. +Set the `text` attribute (or `text=auto`) to enable CRLF conversions. + Use the following attributes if your '*.txt' files are UTF-16 encoded -with byte order mark (BOM) and you want Git to perform automatic line -ending conversion based on your platform. 
+with byte order mark (BOM) and you want Git to perform line +ending conversion based on core.eol. -*.txt text working-tree-encoding=UTF-16 +*.txt working-tree-encoding=UTF-16 text Use the following attributes if your '*.txt' files are UTF-16 little diff --git a/convert.c b/convert.c index 13fad490c..e7f11d1db 100644 --- a/convert.c +++ b/convert.c @@ -1264,15 +1264,24 @@ static void convert_attrs(struct conv_attrs *ca, const char *path) } ca->checkout_encoding = git_path_check_encoding(ccheck + 5); } else { - ca->drv = NULL; - ca->crlf_action = CRLF_UNDEFINED; - ca->ident = 0; + memset(ca, 0, sizeof(*ca)); } /* Save attr and make a decision for action */ ca->attr_action = ca->crlf_action; if (ca->crlf_action == CRLF_TEXT) ca->crlf_action = text_eol_is_crlf() ? CRLF_TEXT_CRLF : CRLF_TEXT_INPUT; + /* +* Often UTF-16 encoded files are read and written by programs which +* really need CRLF, and it is important to keep the CRLF "as is" when +* files are committed with core.autocrlf=true and the repo is pushed. +* The CRLF would be converted into LF when the repo is cloned to +* a machine with core.autocrlf=false. +* Obey the "text" and "eol" attributes and be independent on the +* local core.autocrlf for all "encoded" files. +*/ + if ((ca->crlf_action == CRLF_UNDEFINED) && ca->checkout_encoding) + ca->crlf_action = CRLF_BINARY; if (ca->crlf_action == CRLF_UNDEFINED && auto_crlf == AUTO_CRLF_FALSE) ca->crlf_action = CRLF_BINARY; if (ca->crlf_action == CRLF_UNDEFINED && auto_crlf == AUTO_CRLF_TRUE) -- 2.16.0.rc0.2.g64d3e4d0cc.dirty
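The decision order added to convert_attrs() can be sketched as a small pure function. This is an illustration under stated assumptions, not the code from convert.c: the `sk_` enums and `sk_resolve_crlf()` are hypothetical names, and the two assignments for core.autocrlf=true and core.autocrlf=input are assumed, because the hunk above ends before them.

```c
/* Hypothetical, simplified mirror of the CRLF actions named in the patch. */
enum sk_crlf_action {
	SK_CRLF_UNDEFINED,
	SK_CRLF_BINARY,
	SK_CRLF_TEXT,
	SK_CRLF_TEXT_INPUT,
	SK_CRLF_TEXT_CRLF,
	SK_CRLF_AUTO_CRLF,
	SK_CRLF_AUTO_INPUT
};

enum sk_auto_crlf {
	SK_AUTO_CRLF_FALSE,
	SK_AUTO_CRLF_TRUE,
	SK_AUTO_CRLF_INPUT
};

/*
 * attr:         action derived from the "text"/"eol" attributes
 * has_encoding: nonzero if working-tree-encoding is set for the path
 * auto_crlf:    local core.autocrlf setting
 * eol_is_crlf:  nonzero if the platform/core.eol line ending is CRLF
 */
enum sk_crlf_action sk_resolve_crlf(enum sk_crlf_action attr, int has_encoding,
				    enum sk_auto_crlf auto_crlf, int eol_is_crlf)
{
	enum sk_crlf_action action = attr;

	if (action == SK_CRLF_TEXT)
		action = eol_is_crlf ? SK_CRLF_TEXT_CRLF : SK_CRLF_TEXT_INPUT;
	/* New rule: encoded files without attributes ignore core.autocrlf. */
	if (action == SK_CRLF_UNDEFINED && has_encoding)
		action = SK_CRLF_BINARY;
	if (action == SK_CRLF_UNDEFINED && auto_crlf == SK_AUTO_CRLF_FALSE)
		action = SK_CRLF_BINARY;
	/* Assumed results: the hunk is cut off before these assignments. */
	if (action == SK_CRLF_UNDEFINED && auto_crlf == SK_AUTO_CRLF_TRUE)
		action = SK_CRLF_AUTO_CRLF;
	if (action == SK_CRLF_UNDEFINED && auto_crlf == SK_AUTO_CRLF_INPUT)
		action = SK_CRLF_AUTO_INPUT;
	return action;
}
```

Because the new rule runs before the core.autocrlf checks, an encoded file with no `text` or `eol` attribute resolves to "binary" (no EOL conversion) on every machine, while an explicit `text` attribute still wins.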
[PATCH v5 2/7] strbuf: add xstrdup_toupper()
From: Lars Schneider Create a copy of an existing string and make all characters upper case. Similar to xstrdup_tolower(). This function is used in a subsequent commit. Signed-off-by: Lars Schneider Signed-off-by: Torsten Bögershausen --- strbuf.c | 12 strbuf.h | 1 + 2 files changed, 13 insertions(+) diff --git a/strbuf.c b/strbuf.c index 490f7850e..a20af696b 100644 --- a/strbuf.c +++ b/strbuf.c @@ -784,6 +784,18 @@ char *xstrdup_tolower(const char *string) return result; } +char *xstrdup_toupper(const char *string) +{ + char *result; + size_t len, i; + + len = strlen(string); + result = xmallocz(len); + for (i = 0; i < len; i++) + result[i] = toupper(string[i]); + return result; +} + char *xstrvfmt(const char *fmt, va_list ap) { struct strbuf buf = STRBUF_INIT; diff --git a/strbuf.h b/strbuf.h index 14c8c10d6..df7ced53e 100644 --- a/strbuf.h +++ b/strbuf.h @@ -607,6 +607,7 @@ __attribute__((format (printf,2,3))) extern int fprintf_ln(FILE *fp, const char *fmt, ...); char *xstrdup_tolower(const char *); +char *xstrdup_toupper(const char *); /** * Create a newly allocated string using printf format. You can do this easily -- 2.16.0.rc0.2.g64d3e4d0cc.dirty
[PATCH v5 6/7] convert: add tracing for 'working-tree-encoding' attribute
From: Lars Schneider Add the GIT_TRACE_CHECKOUT_ENCODING environment variable to enable tracing for content that is reencoded with the 'working-tree-encoding' attribute. This is useful to debug encoding issues. Signed-off-by: Lars Schneider Signed-off-by: Torsten Bögershausen --- convert.c| 28 t/t0028-working-tree-encoding.sh | 2 ++ 2 files changed, 30 insertions(+) diff --git a/convert.c b/convert.c index 0c372069b..13fad490c 100644 --- a/convert.c +++ b/convert.c @@ -266,6 +266,29 @@ static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats, } +static void trace_encoding(const char *context, const char *path, + const char *encoding, const char *buf, size_t len) +{ + static struct trace_key coe = TRACE_KEY_INIT(CHECKOUT_ENCODING); + struct strbuf trace = STRBUF_INIT; + int i; + + strbuf_addf(&trace, "%s (%s, considered %s):\n", context, path, encoding); + for (i = 0; i < len && buf; ++i) { + strbuf_addf( + &trace,"| \e[2m%2i:\e[0m %2x \e[2m%c\e[0m%c", + i, + (unsigned char) buf[i], + (buf[i] > 32 && buf[i] < 127 ? buf[i] : ' '), + ((i+1) % 8 && (i+1) < len ? ' ' : '\n') + ); + } + strbuf_addchars(&trace, '\n', 1); + + trace_strbuf(&coe, &trace); + strbuf_release(&trace); +} + static struct encoding { const char *name; struct encoding *next; @@ -325,6 +348,7 @@ static int encode_to_git(const char *path, const char *src, size_t src_len, error(error_msg, path, enc->name); } + trace_encoding("source", path, enc->name, src, src_len); dst = reencode_string_len(src, src_len, default_encoding, enc->name, &dst_len); if (!dst) { @@ -340,6 +364,7 @@ static int encode_to_git(const char *path, const char *src, size_t src_len, else error(msg, path, enc->name, default_encoding); } + trace_encoding("destination", path, default_encoding, dst, dst_len); /* * UTF supports lossless round tripping [1]. 
UTF to other encoding are @@ -365,6 +390,9 @@ static int encode_to_git(const char *path, const char *src, size_t src_len, enc->name, default_encoding, &re_src_len); + trace_encoding("reencoded source", path, enc->name, + re_src, re_src_len); + if (!re_src || src_len != re_src_len || memcmp(src, re_src, src_len)) { const char* msg = _("encoding '%s' from %s to %s and " diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh index 4d85b4277..0f36d4990 100755 --- a/t/t0028-working-tree-encoding.sh +++ b/t/t0028-working-tree-encoding.sh @@ -4,6 +4,8 @@ test_description='working-tree-encoding conversion via gitattributes' . ./test-lib.sh +GIT_TRACE_CHECKOUT_ENCODING=1 && export GIT_TRACE_CHECKOUT_ENCODING + test_expect_success 'setup test repo' ' git config core.eol lf && -- 2.16.0.rc0.2.g64d3e4d0cc.dirty
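The layout produced by trace_encoding() (index, hex value, printable character, eight cells per line) can be tried out with a stand-alone sketch. It assumes plain output without the ANSI dimming escapes and writes into a caller-supplied buffer instead of a strbuf; `sk_dump_bytes()` is a hypothetical helper, not part of the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format `len` bytes of `buf` as "| idx: hex chr" cells, eight cells per
 * line, into `out` (capacity `outsz`). Returns the number of characters
 * written, not counting the terminating NUL. Formatting stops early if
 * the next cell would not fit.
 */
size_t sk_dump_bytes(char *out, size_t outsz, const char *buf, size_t len)
{
	size_t used = 0, i;

	if (!outsz)
		return 0;
	out[0] = '\0';
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)buf[i];
		int n = snprintf(out + used, outsz - used, "| %2zu: %2x %c%c",
				 i, (unsigned int)c,
				 (c > 32 && c < 127) ? c : ' ',
				 ((i + 1) % 8 && (i + 1) < len) ? ' ' : '\n');

		if (n < 0 || (size_t)n >= outsz - used) {
			out[used] = '\0'; /* drop the partially written cell */
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```

For the two bytes of "hi" this yields a single line with two cells. In the patch itself the equivalent output goes to the trace key enabled by GIT_TRACE_CHECKOUT_ENCODING.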
[PATCH v5 3/7] utf8: add function to detect prohibited UTF-16/32 BOM
From: Lars Schneider Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used [1]. The function returns true if this is the case. This function is used in a subsequent commit. [1] http://unicode.org/faq/utf_bom.html#bom10 Signed-off-by: Lars Schneider Signed-off-by: Torsten Bögershausen --- utf8.c | 24 utf8.h | 9 + 2 files changed, 33 insertions(+) diff --git a/utf8.c b/utf8.c index 2c27ce013..914881cd1 100644 --- a/utf8.c +++ b/utf8.c @@ -538,6 +538,30 @@ char *reencode_string_len(const char *in, int insz, } #endif +static int has_bom_prefix(const char *data, size_t len, + const char *bom, size_t bom_len) +{ + return (len >= bom_len) && !memcmp(data, bom, bom_len); +} + +static const char utf16_be_bom[] = {0xFE, 0xFF}; +static const char utf16_le_bom[] = {0xFF, 0xFE}; +static const char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF}; +static const char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00}; + +int has_prohibited_utf_bom(const char *enc, const char *data, size_t len) +{ + return ( + (!strcmp(enc, "UTF-16BE") || !strcmp(enc, "UTF-16LE")) && + (has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) || + has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom))) + ) || ( + (!strcmp(enc, "UTF-32BE") || !strcmp(enc, "UTF-32LE")) && + (has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) || + has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom))) + ); +} + /* * Returns first character length in bytes for multi-byte `text` according to * `encoding`. diff --git a/utf8.h b/utf8.h index 6bbcf31a8..4711429af 100644 --- a/utf8.h +++ b/utf8.h @@ -70,4 +70,13 @@ typedef enum { void strbuf_utf8_align(struct strbuf *buf, align_type position, unsigned int width, const char *s); +/* + * Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE + * or UTF-32LE a BOM must not be used [1]. The function returns true if + * this is the case. 
+ * + * [1] http://unicode.org/faq/utf_bom.html#bom10 + */ +int has_prohibited_utf_bom(const char *enc, const char *data, size_t len); + #endif -- 2.16.0.rc0.2.g64d3e4d0cc.dirty
[PATCH v5 4/7] utf8: add function to detect a missing UTF-16/32 BOM
From: Lars Schneider If the endianness is not defined in the encoding name, then let's be strict and require a BOM to avoid any encoding confusion. The has_missing_utf_bom() function returns true if a required BOM is missing. The Unicode standard instructs to assume big-endian if there is no BOM for UTF-16/32 [1][2]. However, the W3C/WHATWG encoding standard used in HTML5 recommends to assume little-endian to "deal with deployed content" [3]. Strictly requiring a BOM seems to be the safest option for content in Git. This function is used in a subsequent commit. [1] http://unicode.org/faq/utf_bom.html#gen6 [2] http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf Section 3.10, D98, page 132 [3] https://encoding.spec.whatwg.org/#utf-16le Signed-off-by: Lars Schneider Signed-off-by: Torsten Bögershausen --- utf8.c | 13 + utf8.h | 16 2 files changed, 29 insertions(+) diff --git a/utf8.c b/utf8.c index 914881cd1..f033fec1c 100644 --- a/utf8.c +++ b/utf8.c @@ -562,6 +562,19 @@ int has_prohibited_utf_bom(const char *enc, const char *data, size_t len) ); } +int has_missing_utf_bom(const char *enc, const char *data, size_t len) +{ + return ( + !strcmp(enc, "UTF-16") && + !(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) || +has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom))) + ) || ( + !strcmp(enc, "UTF-32") && + !(has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) || +has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom))) + ); +} + /* * Returns first character length in bytes for multi-byte `text` according to * `encoding`. diff --git a/utf8.h b/utf8.h index 4711429af..26b5e9185 100644 --- a/utf8.h +++ b/utf8.h @@ -79,4 +79,20 @@ void strbuf_utf8_align(struct strbuf *buf, align_type position, unsigned int wid */ int has_prohibited_utf_bom(const char *enc, const char *data, size_t len); +/* + * If the endianness is not defined in the encoding name, then we + * require a BOM. The function returns true if a required BOM is missing. 
+ * + * The Unicode standard instructs to assume big-endian if there + * in no BOM for UTF-16/32 [1][2]. However, the W3C/WHATWG + * encoding standard used in HTML5 recommends to assume + * little-endian to "deal with deployed content" [3]. + * + * [1] http://unicode.org/faq/utf_bom.html#gen6 + * [2] http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf + * Section 3.10, D98, page 132 + * [3] https://encoding.spec.whatwg.org/#utf-16le + */ +int has_missing_utf_bom(const char *enc, const char *data, size_t len); + #endif -- 2.16.0.rc0.2.g64d3e4d0cc.dirty
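The two BOM rules in this series (a BOM is prohibited for UTF-16BE/LE and UTF-32BE/LE, and required for plain UTF-16 and UTF-32) can be combined into one stand-alone check. The sketch below is an illustration only: `sk_check_bom()` and the `sk_` names are hypothetical, and the BOM tables use `unsigned char` rather than the plain `char` arrays of the patch so that 0xFE and 0xFF are always in range.

```c
#include <stddef.h>
#include <string.h>

static const unsigned char sk_utf16_be_bom[] = {0xFE, 0xFF};
static const unsigned char sk_utf16_le_bom[] = {0xFF, 0xFE};
static const unsigned char sk_utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF};
static const unsigned char sk_utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00};

static int sk_has_bom(const char *data, size_t len,
		      const unsigned char *bom, size_t bom_len)
{
	return len >= bom_len && !memcmp(data, bom, bom_len);
}

/* Nonzero if data starts with the BE or LE BOM for 16- or 32-bit units. */
static int sk_has_any_bom(int width, const char *data, size_t len)
{
	if (width == 16)
		return sk_has_bom(data, len, sk_utf16_be_bom, sizeof(sk_utf16_be_bom)) ||
		       sk_has_bom(data, len, sk_utf16_le_bom, sizeof(sk_utf16_le_bom));
	return sk_has_bom(data, len, sk_utf32_be_bom, sizeof(sk_utf32_be_bom)) ||
	       sk_has_bom(data, len, sk_utf32_le_bom, sizeof(sk_utf32_le_bom));
}

enum sk_bom_status { SK_BOM_OK, SK_BOM_PROHIBITED, SK_BOM_MISSING };

/*
 * Encoding names are compared case-sensitively, as in the patch, so the
 * caller is expected to pass an upper-case name.
 */
enum sk_bom_status sk_check_bom(const char *enc, const char *data, size_t len)
{
	if (!strcmp(enc, "UTF-16BE") || !strcmp(enc, "UTF-16LE"))
		return sk_has_any_bom(16, data, len) ? SK_BOM_PROHIBITED : SK_BOM_OK;
	if (!strcmp(enc, "UTF-32BE") || !strcmp(enc, "UTF-32LE"))
		return sk_has_any_bom(32, data, len) ? SK_BOM_PROHIBITED : SK_BOM_OK;
	if (!strcmp(enc, "UTF-16"))
		return sk_has_any_bom(16, data, len) ? SK_BOM_OK : SK_BOM_MISSING;
	if (!strcmp(enc, "UTF-32"))
		return sk_has_any_bom(32, data, len) ? SK_BOM_OK : SK_BOM_MISSING;
	return SK_BOM_OK; /* other encodings: no BOM rule applies here */
}
```

The patch keeps the two predicates separate; merging them into one status value is only a convenience of this sketch, so a caller can report "prohibited" and "missing" with different messages from a single call.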
[PATCH v5 0/7] convert: add support for different encodings
From: Torsten Bögershausen Take V4 from Lars, manually integrated the V2 squash patch, so a review would be good. Add my "comments" as a patch, see 7/7 (and this is more like an RFC) This needs to go on top of tb/crlf-conv-flags Lars Schneider (6): strbuf: remove unnecessary NUL assignment in xstrdup_tolower() strbuf: add xstrdup_toupper() utf8: add function to detect prohibited UTF-16/32 BOM utf8: add function to detect a missing UTF-16/32 BOM convert: add 'working-tree-encoding' attribute convert: add tracing for 'working-tree-encoding' attribute Torsten Bögershausen (1): Careful with CRLF when using e.g. UTF-16 for working-tree-encoding Documentation/gitattributes.txt | 63 +++ convert.c| 233 ++- convert.h| 1 + sha1_file.c | 2 +- strbuf.c | 13 ++- strbuf.h | 1 + t/t0028-working-tree-encoding.sh | 198 + utf8.c | 37 +++ utf8.h | 25 + 9 files changed, 567 insertions(+), 6 deletions(-) create mode 100755 t/t0028-working-tree-encoding.sh -- 2.16.0.rc0.2.g64d3e4d0cc.dirty
[PATCH v1 1/1] convert_to_git(): safe_crlf/checksafe becomes int conv_flags
From: Torsten Bögershausen When calling convert_to_git(), the checksafe parameter defined what should happen if the EOL conversion (CRLF --> LF --> CRLF) does not roundtrip cleanly. In addition, it also defined if line endings should be renormalized (CRLF --> LF) or kept as they are. checksafe was a safe_crlf enum with these values: SAFE_CRLF_FALSE: do nothing in case of EOL roundtrip errors SAFE_CRLF_FAIL: die in case of EOL roundtrip errors SAFE_CRLF_WARN: print a warning in case of EOL roundtrip errors SAFE_CRLF_RENORMALIZE: change CRLF to LF SAFE_CRLF_KEEP_CRLF: keep all line endings as they are In some cases the integer value 0 was passed as checksafe parameter instead of the correct enum value SAFE_CRLF_FALSE. That was no problem because SAFE_CRLF_FALSE is defined as 0. FALSE/FAIL/WARN are different from RENORMALIZE and KEEP_CRLF. Therefore, an enum is not ideal. Let's use an integer bit pattern instead and rename the parameter to conv_flags to make it more generically usable. This allows us to extend the bit pattern in a subsequent commit. Reported-By: Randall S. Becker Helped-By: Lars Schneider Signed-off-by: Torsten Bögershausen Signed-off-by: Lars Schneider --- >I think this is being solved a bit differently with a1fbf854 >("convert_to_git(): safe_crlf/checksafe becomes int conv_flags", >2018-01-06), and 0 becomes the right value to pass at this caller to >say "I am passing none of the flag bit". >I am hoping that the series that ends at f3b11d54 ("convert: add >support for 'checkout-encoding' attribute", 2018-01-06) will be >rerolled and hit 'master' early in the next cycle. Thanks for the report & suggested patch. After reading it, I suggest breaking out the enum/int fix into its own "series". 
apply.c| 6 +++--- combine-diff.c | 2 +- config.c | 7 +-- convert.c | 38 +++--- convert.h | 17 +++-- diff.c | 8 environment.c | 2 +- sha1_file.c| 12 ++-- 8 files changed, 46 insertions(+), 46 deletions(-) diff --git a/apply.c b/apply.c index 321a9fa68..f8b67bfee 100644 --- a/apply.c +++ b/apply.c @@ -2263,8 +2263,8 @@ static void show_stats(struct apply_state *state, struct patch *patch) static int read_old_data(struct stat *st, struct patch *patch, const char *path, struct strbuf *buf) { - enum safe_crlf safe_crlf = patch->crlf_in_old ? - SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_RENORMALIZE; + int conv_flags = patch->crlf_in_old ? + CONV_EOL_KEEP_CRLF : CONV_EOL_RENORMALIZE; switch (st->st_mode & S_IFMT) { case S_IFLNK: if (strbuf_readlink(buf, path, st->st_size) < 0) @@ -2281,7 +2281,7 @@ static int read_old_data(struct stat *st, struct patch *patch, * should never look at the index when explicit crlf option * is given. */ - convert_to_git(NULL, path, buf->buf, buf->len, buf, safe_crlf); + convert_to_git(NULL, path, buf->buf, buf->len, buf, conv_flags); return 0; default: return -1; diff --git a/combine-diff.c b/combine-diff.c index 2505de119..19f30c335 100644 --- a/combine-diff.c +++ b/combine-diff.c @@ -1053,7 +1053,7 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent, if (is_file) { struct strbuf buf = STRBUF_INIT; - if (convert_to_git(&the_index, elem->path, result, len, &buf, safe_crlf)) { + if (convert_to_git(&the_index, elem->path, result, len, &buf, global_conv_flags_eol)) { free(result); result = strbuf_detach(&buf, &len); result_size = len; diff --git a/config.c b/config.c index e617c2018..1f003fbb9 100644 --- a/config.c +++ b/config.c @@ -1149,11 +1149,14 @@ static int git_default_core_config(const char *var, const char *value) } if (!strcmp(var, "core.safecrlf")) { + int eol_rndtrp_die; if (value && !strcasecmp(value, "warn")) { - safe_crlf = SAFE_CRLF_WARN; + global_conv_flags_eol = CONV_EOL_RNDTRP_WARN; return 0; } - safe_crlf = 
git_config_bool(var, value); + eol_rndtrp_die = git_config_bool(var, value); + global_conv_flags_eol = eol_rndtrp_die ? + CONV_EOL_RNDTRP_DIE : CONV_EOL_RNDTRP_WARN; return 0; } diff --git a/convert.c b/convert.c index 1a41a48e1..b976eb968 100644 --- a/convert.c +++ b/convert.c @@ -193,30 +193,30 @@ static enum eol output_eol(enum crlf_action crlf_action) return core_eol; } -static void check_safe_crlf(cons
[PATCH v3 1/1] convert_to_git(): checksafe becomes int conv_flags
From: Torsten Bögershausen When calling convert_to_git(), the checksafe parameter has been used to check if a commit would give a non-roundtrip conversion of EOL. When checksafe was introduced, 3 values had been in use: SAFE_CRLF_FALSE: no warning SAFE_CRLF_FAIL: reject the commit if EOL do not roundtrip SAFE_CRLF_WARN: warn the user if EOL do not roundtrip Already today the integer value 0 is passed as the parameter checksafe instead of the correct enum value SAFE_CRLF_FALSE. Turn the whole call chain to use an integer with single bits, which can be extended in the next commits: - The global configuration variable safe_crlf is now conv_flags_eol. - The parameter checksafe is renamed into conv_flags. Helped-By: Lars Schneider Signed-off-by: Torsten Bögershausen --- This is my suggestion. (1) The flag bits have been renamed. (2) The (theoretical ?) mix of WARN/FAIL is still there, I am not sure if this is a real problem. (3) There are 2 reasons that CONV_EOL_RENORMALIZE is set. Either in a renormalizing merge, or by running git add --renormalize . Therefore HASH_RENORMALIZE is not the same as CONV_EOL_RENORMALIZE. apply.c| 6 +++--- combine-diff.c | 2 +- config.c | 7 +-- convert.c | 38 +++--- convert.h | 17 +++-- diff.c | 8 environment.c | 2 +- sha1_file.c| 12 ++-- 8 files changed, 46 insertions(+), 46 deletions(-) diff --git a/apply.c b/apply.c index 321a9fa68d..f8b67bfee2 100644 --- a/apply.c +++ b/apply.c @@ -2263,8 +2263,8 @@ static void show_stats(struct apply_state *state, struct patch *patch) static int read_old_data(struct stat *st, struct patch *patch, const char *path, struct strbuf *buf) { - enum safe_crlf safe_crlf = patch->crlf_in_old ? - SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_RENORMALIZE; + int conv_flags = patch->crlf_in_old ? 
+ CONV_EOL_KEEP_CRLF : CONV_EOL_RENORMALIZE; switch (st->st_mode & S_IFMT) { case S_IFLNK: if (strbuf_readlink(buf, path, st->st_size) < 0) @@ -2281,7 +2281,7 @@ static int read_old_data(struct stat *st, struct patch *patch, * should never look at the index when explicit crlf option * is given. */ - convert_to_git(NULL, path, buf->buf, buf->len, buf, safe_crlf); + convert_to_git(NULL, path, buf->buf, buf->len, buf, conv_flags); return 0; default: return -1; diff --git a/combine-diff.c b/combine-diff.c index 2505de119a..dbc877d0fe 100644 --- a/combine-diff.c +++ b/combine-diff.c @@ -1053,7 +1053,7 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent, if (is_file) { struct strbuf buf = STRBUF_INIT; - if (convert_to_git(&the_index, elem->path, result, len, &buf, safe_crlf)) { + if (convert_to_git(&the_index, elem->path, result, len, &buf, conv_flags_eol)) { free(result); result = strbuf_detach(&buf, &len); result_size = len; diff --git a/config.c b/config.c index e617c2018d..bdc7ce2a7e 100644 --- a/config.c +++ b/config.c @@ -1149,11 +1149,14 @@ static int git_default_core_config(const char *var, const char *value) } if (!strcmp(var, "core.safecrlf")) { + int eol_rndtrp_die; if (value && !strcasecmp(value, "warn")) { - safe_crlf = SAFE_CRLF_WARN; + conv_flags_eol = CONV_EOL_RNDTRP_WARN; return 0; } - safe_crlf = git_config_bool(var, value); + eol_rndtrp_die = git_config_bool(var, value); + conv_flags_eol = eol_rndtrp_die ? 
+ CONV_EOL_RNDTRP_DIE : CONV_EOL_RNDTRP_WARN; return 0; } diff --git a/convert.c b/convert.c index 1a41a48e15..0207ddab24 100644 --- a/convert.c +++ b/convert.c @@ -193,30 +193,30 @@ static enum eol output_eol(enum crlf_action crlf_action) return core_eol; } -static void check_safe_crlf(const char *path, enum crlf_action crlf_action, +static void check_conv_flags_eol(const char *path, enum crlf_action crlf_action, struct text_stat *old_stats, struct text_stat *new_stats, - enum safe_crlf checksafe) + int conv_flags) { if (old_stats->crlf && !new_stats->crlf ) { /* * CRLFs would not be restored by checkout */ - if (checksafe == SAFE_CRLF_WARN) + if (conv_flags & CONV_EOL_RNDTRP_DIE) + die(_("CRLF would be replaced by LF in %s."), pa
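The move from an enum to "an integer with single bits" can be sketched as follows. The convert.h hunk that defines the flags is not part of this excerpt, so the bit positions below are assumptions chosen for illustration, and every `SK_`/`sk_` name is hypothetical. The WARN branch is likewise assumed, since the check_conv_flags_eol() hunk above is cut off right after the DIE branch.

```c
/* Assumed bit assignments; the real definitions live in convert.h. */
#define SK_CONV_EOL_RNDTRP_DIE  (1 << 0) /* die if EOL does not roundtrip */
#define SK_CONV_EOL_RNDTRP_WARN (1 << 1) /* warn if EOL does not roundtrip */
#define SK_CONV_EOL_RENORMALIZE (1 << 2) /* convert CRLF to LF */
#define SK_CONV_EOL_KEEP_CRLF   (1 << 3) /* keep all line endings as they are */

enum sk_outcome { SK_OUTCOME_OK, SK_OUTCOME_WARN, SK_OUTCOME_DIE };

/*
 * Simplified roundtrip check: the content had CRLF before conversion
 * (old_crlf) but would have none after a checkout (new_crlf). With bit
 * flags the test is a mask operation rather than an equality comparison,
 * so unrelated bits such as RENORMALIZE can be set at the same time.
 */
enum sk_outcome sk_check_crlf_roundtrip(unsigned old_crlf, unsigned new_crlf,
					int conv_flags)
{
	if (old_crlf && !new_crlf) {
		if (conv_flags & SK_CONV_EOL_RNDTRP_DIE)
			return SK_OUTCOME_DIE;
		if (conv_flags & SK_CONV_EOL_RNDTRP_WARN)
			return SK_OUTCOME_WARN;
	}
	return SK_OUTCOME_OK;
}
```

Passing 0 then naturally means "none of the flag bits", which is the property the quoted review comment in the v1 mail asks for.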
[PATCH 2/5] strbuf: add xstrdup_toupper()
From: Lars Schneider Create a copy of an existing string and make all characters upper case. Similar to xstrdup_tolower(). This function is used in a subsequent commit. Signed-off-by: Lars Schneider Signed-off-by: Torsten Bögershausen --- strbuf.c | 13 + strbuf.h | 1 + 2 files changed, 14 insertions(+) diff --git a/strbuf.c b/strbuf.c index 8007be8fba..ee05626dc1 100644 --- a/strbuf.c +++ b/strbuf.c @@ -785,6 +785,19 @@ char *xstrdup_tolower(const char *string) return result; } +char *xstrdup_toupper(const char *string) +{ + char *result; + size_t len, i; + + len = strlen(string); + result = xmallocz(len); + for (i = 0; i < len; i++) + result[i] = toupper(string[i]); + result[i] = '\0'; + return result; +} + char *xstrvfmt(const char *fmt, va_list ap) { struct strbuf buf = STRBUF_INIT; diff --git a/strbuf.h b/strbuf.h index 14c8c10d66..df7ced53ed 100644 --- a/strbuf.h +++ b/strbuf.h @@ -607,6 +607,7 @@ __attribute__((format (printf,2,3))) extern int fprintf_ln(FILE *fp, const char *fmt, ...); char *xstrdup_tolower(const char *); +char *xstrdup_toupper(const char *); /** * Create a newly allocated string using printf format. You can do this easily -- 2.16.0.rc0.4.ga4e00d4fa4
[PATCH 4/5] utf8: add function to detect a missing UTF-16/32 BOM
From: Lars Schneider If the endianness is not defined in the encoding name, then let's be strict and require a BOM to avoid any encoding confusion. The has_missing_utf_bom() function returns true if a required BOM is missing. The Unicode standard instructs to assume big-endian if there is no BOM for UTF-16/32 [1][2]. However, the W3C/WHATWG encoding standard used in HTML5 recommends to assume little-endian to "deal with deployed content" [3]. Strictly requiring a BOM seems to be the safest option for content in Git. This function is used in a subsequent commit. [1] http://unicode.org/faq/utf_bom.html#gen6 [2] http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf Section 3.10, D98, page 132 [3] https://encoding.spec.whatwg.org/#utf-16le Signed-off-by: Lars Schneider Signed-off-by: Torsten Bögershausen --- utf8.c | 13 + utf8.h | 16 2 files changed, 29 insertions(+) diff --git a/utf8.c b/utf8.c index 776660ee12..1978d6c42a 100644 --- a/utf8.c +++ b/utf8.c @@ -562,6 +562,19 @@ int has_prohibited_utf_bom(const char *enc, const char *data, size_t len) ); } +int has_missing_utf_bom(const char *enc, const char *data, size_t len) +{ + return ( + !strcmp(enc, "UTF-16") && + !(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) || +has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom))) + ) || ( + !strcmp(enc, "UTF-32") && + !(has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) || +has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom))) + ); +} + /* * Returns first character length in bytes for multi-byte `text` according to * `encoding`. diff --git a/utf8.h b/utf8.h index 4711429af9..26b5e91852 100644 --- a/utf8.h +++ b/utf8.h @@ -79,4 +79,20 @@ void strbuf_utf8_align(struct strbuf *buf, align_type position, unsigned int wid */ int has_prohibited_utf_bom(const char *enc, const char *data, size_t len); +/* + * If the endianness is not defined in the encoding name, then we + * require a BOM. 
The function returns true if a required BOM is missing. + * + * The Unicode standard instructs to assume big-endian if there + * in no BOM for UTF-16/32 [1][2]. However, the W3C/WHATWG + * encoding standard used in HTML5 recommends to assume + * little-endian to "deal with deployed content" [3]. + * + * [1] http://unicode.org/faq/utf_bom.html#gen6 + * [2] http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf + * Section 3.10, D98, page 132 + * [3] https://encoding.spec.whatwg.org/#utf-16le + */ +int has_missing_utf_bom(const char *enc, const char *data, size_t len); + #endif -- 2.16.0.rc0.4.ga4e00d4fa4
[PATCH 0/5] V2B: simplify convert.c/h
From: Torsten Bögershausen

Simplify the convert.h/convert.c logic and don't touch convert_to_git().
The rest is v2 from Lars.

Lars Schneider (4):
  strbuf: add xstrdup_toupper()
  utf8: add function to detect prohibited UTF-16/32 BOM
  utf8: add function to detect a missing UTF-16/32 BOM
  convert: add support for 'checkout-encoding' attribute

Torsten Bögershausen (1):
  convert_to_git(): checksafe becomes an integer

 Documentation/gitattributes.txt |  59 +++
 apply.c                         |   4 +-
 convert.c                       | 210 +---
 convert.h                       |  19 ++--
 diff.c                          |   4 +-
 environment.c                   |   2 +-
 sha1_file.c                     |   8 +-
 strbuf.c                        |  13 +++
 strbuf.h                        |   1 +
 t/t0028-checkout-encoding.sh    | 197 +
 utf8.c                          |  37 +++
 utf8.h                          |  25 +
 12 files changed, 549 insertions(+), 30 deletions(-)
 create mode 100755 t/t0028-checkout-encoding.sh

--
2.16.0.rc0.4.ga4e00d4fa4
[PATCH 3/5] utf8: add function to detect prohibited UTF-16/32 BOM
From: Lars Schneider Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used [1]. The function returns true if this is the case. This function is used in a subsequent commit. [1] http://unicode.org/faq/utf_bom.html#bom10 Signed-off-by: Lars Schneider Signed-off-by: Torsten Bögershausen --- utf8.c | 24 utf8.h | 9 + 2 files changed, 33 insertions(+) diff --git a/utf8.c b/utf8.c index 2c27ce0137..776660ee12 100644 --- a/utf8.c +++ b/utf8.c @@ -538,6 +538,30 @@ char *reencode_string_len(const char *in, int insz, } #endif +static int has_bom_prefix(const char *data, size_t len, + const char *bom, size_t bom_len) +{ + return (len >= bom_len) && !memcmp(data, bom, bom_len); +} + +const char utf16_be_bom[] = {0xFE, 0xFF}; +const char utf16_le_bom[] = {0xFF, 0xFE}; +const char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF}; +const char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00}; + +int has_prohibited_utf_bom(const char *enc, const char *data, size_t len) +{ + return ( + (!strcmp(enc, "UTF-16BE") || !strcmp(enc, "UTF-16LE")) && + (has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) || + has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom))) + ) || ( + (!strcmp(enc, "UTF-32BE") || !strcmp(enc, "UTF-32LE")) && + (has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) || + has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom))) + ); +} + /* * Returns first character length in bytes for multi-byte `text` according to * `encoding`. diff --git a/utf8.h b/utf8.h index 6bbcf31a83..4711429af9 100644 --- a/utf8.h +++ b/utf8.h @@ -70,4 +70,13 @@ typedef enum { void strbuf_utf8_align(struct strbuf *buf, align_type position, unsigned int width, const char *s); +/* + * Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE + * or UTF-32LE a BOM must not be used [1]. The function returns true if + * this is the case. 
+ * + * [1] http://unicode.org/faq/utf_bom.html#bom10 + */ +int has_prohibited_utf_bom(const char *enc, const char *data, size_t len); + #endif -- 2.16.0.rc0.4.ga4e00d4fa4
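
The opposite rule (a BOM must not appear when the encoding name already fixes the byte order) can be sketched the same way. This is an illustration under assumptions, not the code from the patch: starts_with_bytes() and prohibited_utf_bom() are hypothetical names, and the BOM tables are local unsigned char arrays.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does data begin with the n bytes at prefix? */
static int starts_with_bytes(const char *data, size_t len,
			     const void *prefix, size_t n)
{
	return len >= n && !memcmp(data, prefix, n);
}

/*
 * Hypothetical stand-in for has_prohibited_utf_bom(): returns 1 if
 * `enc` names an explicit byte order (UTF-16BE/LE, UTF-32BE/LE) but
 * `data` nevertheless starts with a BOM of that code unit width.
 */
static int prohibited_utf_bom(const char *enc, const char *data, size_t len)
{
	static const unsigned char be16[] = {0xFE, 0xFF};
	static const unsigned char le16[] = {0xFF, 0xFE};
	static const unsigned char be32[] = {0x00, 0x00, 0xFE, 0xFF};
	static const unsigned char le32[] = {0xFF, 0xFE, 0x00, 0x00};

	if (!strcmp(enc, "UTF-16BE") || !strcmp(enc, "UTF-16LE"))
		return starts_with_bytes(data, len, be16, sizeof(be16)) ||
		       starts_with_bytes(data, len, le16, sizeof(le16));
	if (!strcmp(enc, "UTF-32BE") || !strcmp(enc, "UTF-32LE"))
		return starts_with_bytes(data, len, be32, sizeof(be32)) ||
		       starts_with_bytes(data, len, le32, sizeof(le32));
	return 0;
}
```

As in the quoted function, either BOM of the matching width is rejected, so a little-endian BOM in content declared as UTF-16BE is also reported.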
[PATCH 5/5] convert: add support for 'checkout-encoding' attribute
From: Lars Schneider Git and its tools (e.g. git diff) expect all text files in UTF-8 encoding. Git will happily accept content in all other encodings, too, but it might not be able to process the text (e.g. viewing diffs or changing line endings). Add an attribute to tell Git what encoding the user has defined for a given file. If the content is added to the index, then Git converts the content to a canonical UTF-8 representation. On checkout Git will reverse the conversion. Signed-off-by: Lars Schneider Signed-off-by: Torsten Bögershausen --- Documentation/gitattributes.txt | 59 convert.c | 190 +- convert.h | 11 ++- sha1_file.c | 2 +- t/t0028-checkout-encoding.sh| 197 5 files changed, 452 insertions(+), 7 deletions(-) create mode 100755 t/t0028-checkout-encoding.sh diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 30687de81a..0039bd38c3 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -272,6 +272,65 @@ few exceptions. Even though... catch potential problems early, safety triggers. +`checkout-encoding` +^^^ + +Git recognizes files encoded with ASCII or one of its supersets (e.g. +UTF-8 or ISO-8859-1) as text files. All other encodings are usually +interpreted as binary and consequently built-in Git text processing +tools (e.g. 'git diff') as well as most Git web front ends do not +visualize the content. + +In these cases you can teach Git the encoding of a file in the working +directory with the `checkout-encoding` attribute. If a file with this +attributes is added to Git, then Git reencodes the content from the +specified encoding to UTF-8 and stores the result in its internal data +structure. On checkout the content is encoded back to the specified +encoding. + +Please note that using the `checkout-encoding` attribute has a number +of drawbacks: + +- Reencoding content to non-UTF encodings (e.g. SHIFT-JIS) can cause + errors as the conversion might not be round trip safe. 
+ +- Reencoding content requires resources that might slow down certain + Git operations (e.g 'git checkout' or 'git add'). + +- Git clients that do not support the `checkout-encoding` attribute or + the used encoding will checkout the respective files as UTF-8 encoded. + That means the content appears to be different which could cause + trouble. Affected clients are older Git versions and alternative Git + implementations such as JGit or libgit2 (as of January 2018). + +Use the `checkout-encoding` attribute only if you cannot store a file in +UTF-8 encoding and if you want Git to be able to process the content as +text. + +Use the following attributes if your '*.txt' files are UTF-16 encoded +with byte order mark (BOM) and you want Git to perform automatic line +ending conversion based on your platform. + + +*.txt text checkout-encoding=UTF-16 + + +Use the following attributes if your '*.txt' files are UTF-16 little +endian encoded without BOM and you want Git to use Windows line endings +in the working directory. + + +*.txt checkout-encoding=UTF-16LE text eol=CRLF + + +You can get a list of all available encodings on your platform with the +following command: + + +iconv --list + + + `ident` ^^^ diff --git a/convert.c b/convert.c index 5efcc3b73b..22c70d87e5 100644 --- a/convert.c +++ b/convert.c @@ -7,6 +7,7 @@ #include "sigchain.h" #include "pkt-line.h" #include "sub-process.h" +#include "utf8.h" /* * convert.c - convert a file when checking it out and checking it in. @@ -265,6 +266,147 @@ static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats, } +static struct encoding { + const char *name; + struct encoding *next; +} *encoding, **encoding_tail; +static const char *default_encoding = "UTF-8"; + +static int encode_to_git(const char *path, const char *src, size_t src_len, +struct strbuf *buf, struct encoding *enc, int die_on_failure) +{ + char *dst; + int dst_len; + + /* +* No encoding is specified or there is nothing to encode. 
+* Tell the caller that the content was not modified. +*/ + if (!enc || (src && !src_len)) + return 0; + + /* +* Looks like we got called from "would_convert_to_git()". +* This means Git wants to know if it would encode (= modify!) +* the content. Let's answer with "yes", since an encoding was +* specified. +*/ + if (!buf && !src) + return 1; + + if (has_prohibited_utf_bom(enc->name, src, src_len)) { + const char *error_msg = _( + "BOM
[PATCH 1/5] convert_to_git(): checksafe becomes an integer
From: Torsten Bögershausen When calling convert_to_git(), the checksafe parameter has been used to check if commit would give a non-roundtrip conversion of EOL. When checksafe was introduced, 3 values had been in use: SAFE_CRLF_FALSE: no warning SAFE_CRLF_FAIL: reject the commit if EOL do not roundtrip SAFE_CRLF_WARN: warn the user if EOL do not roundtrip Today a small flaw is found in the code base: An integer with the value 0 is passed as the parameter checksafe instead of the correct enum value SAFE_CRLF_FALSE. In the next commit there is a need to turn checksafe into a bitmap, which allows to tell convert_to_git() to obey the encoding attribute or not. Signed-off-by: Torsten Bögershausen --- apply.c | 4 ++-- convert.c | 20 ++-- convert.h | 18 -- diff.c| 4 ++-- environment.c | 2 +- sha1_file.c | 6 +++--- 6 files changed, 26 insertions(+), 28 deletions(-) diff --git a/apply.c b/apply.c index 321a9fa68d..a422516062 100644 --- a/apply.c +++ b/apply.c @@ -2263,7 +2263,7 @@ static void show_stats(struct apply_state *state, struct patch *patch) static int read_old_data(struct stat *st, struct patch *patch, const char *path, struct strbuf *buf) { - enum safe_crlf safe_crlf = patch->crlf_in_old ? + int checksafe = patch->crlf_in_old ? SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_RENORMALIZE; switch (st->st_mode & S_IFMT) { case S_IFLNK: @@ -2281,7 +2281,7 @@ static int read_old_data(struct stat *st, struct patch *patch, * should never look at the index when explicit crlf option * is given. 
*/ - convert_to_git(NULL, path, buf->buf, buf->len, buf, safe_crlf); + convert_to_git(NULL, path, buf->buf, buf->len, buf, checksafe); return 0; default: return -1; diff --git a/convert.c b/convert.c index 1a41a48e15..5efcc3b73b 100644 --- a/convert.c +++ b/convert.c @@ -195,13 +195,13 @@ static enum eol output_eol(enum crlf_action crlf_action) static void check_safe_crlf(const char *path, enum crlf_action crlf_action, struct text_stat *old_stats, struct text_stat *new_stats, - enum safe_crlf checksafe) + int checksafe) { if (old_stats->crlf && !new_stats->crlf ) { /* * CRLFs would not be restored by checkout */ - if (checksafe == SAFE_CRLF_WARN) + if (checksafe & SAFE_CRLF_WARN) warning(_("CRLF will be replaced by LF in %s.\n" "The file will have its original line" " endings in your working directory."), path); @@ -211,7 +211,7 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action, /* * CRLFs would be added by checkout */ - if (checksafe == SAFE_CRLF_WARN) + if (checksafe & SAFE_CRLF_WARN) warning(_("LF will be replaced by CRLF in %s.\n" "The file will have its original line" " endings in your working directory."), path); @@ -268,7 +268,7 @@ static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats, static int crlf_to_git(const struct index_state *istate, const char *path, const char *src, size_t len, struct strbuf *buf, - enum crlf_action crlf_action, enum safe_crlf checksafe) + enum crlf_action crlf_action, int checksafe) { struct text_stat stats; char *dst; @@ -298,12 +298,12 @@ static int crlf_to_git(const struct index_state *istate, * unless we want to renormalize in a merge or * cherry-pick. 
*/ - if ((checksafe != SAFE_CRLF_RENORMALIZE) && + if ((!(checksafe & SAFE_CRLF_RENORMALIZE)) && has_crlf_in_index(istate, path)) convert_crlf_into_lf = 0; } - if ((checksafe == SAFE_CRLF_WARN || - (checksafe == SAFE_CRLF_FAIL)) && len) { + if (((checksafe & SAFE_CRLF_WARN) || +((checksafe & SAFE_CRLF_FAIL) && len))) { struct text_stat new_stats; memcpy(&new_stats, &stats, sizeof(new_stats)); /* simulate "git add" */ @@ -1129,7 +1129,7 @@ const char *get_convert_attr_ascii(const char *path) int convert_to_git(const struct index_state *istate, const char *path, const char *src, size_t len, - struct strbuf *dst, enum safe_crlf checksafe) + struct strbuf *dst, int checksafe) { int ret = 0; struct conv_attrs ca; @@ -1144,7 +1144,7 @@ in
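
The reason the comparisons in the hunks above change from "==" to "&" can be shown with a small sketch. The SKETCH_* names and the two helper functions are made up for this illustration. The bit positions for WARN, RENORMALIZE and KEEP_CRLF follow the convert.h context quoted in the RFC 2/2 patch in this thread; bit 0 for FAIL is an assumption, since that line is not visible here.

```c
/* Illustrative flag values, not copied from convert.h. */
#define SKETCH_CRLF_FALSE        0
#define SKETCH_CRLF_FAIL         (1 << 0)	/* assumed bit position */
#define SKETCH_CRLF_WARN         (1 << 1)
#define SKETCH_CRLF_RENORMALIZE  (1 << 2)
#define SKETCH_CRLF_KEEP_CRLF    (1 << 3)

/* Enum-style test: only true if WARN is the one and only value. */
static int old_style_wants_warning(int checksafe)
{
	return checksafe == SKETCH_CRLF_WARN;
}

/* Bitmap-style test: true whenever the WARN bit is set. */
static int wants_warning(int checksafe)
{
	return (checksafe & SKETCH_CRLF_WARN) != 0;
}
```

Once a caller combines flags, e.g. SKETCH_CRLF_WARN | SKETCH_CRLF_RENORMALIZE, the equality test no longer matches while the mask test still does, which is why every comparison has to be converted when the parameter becomes a bitmap.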
[PATCH/RFC 0/2] git diff --UTF-8
From: Torsten Bögershausen

RFC patch: convert files from e.g. UTF-16 into UTF-8 while running
"git diff". The diff must be called with "git diff --UTF-8" and the
"encoding" attribute must be set for the file(s).

The commit messages may need some improvements, and a closer look at
diff.c, and at how command line options are forwarded, is appreciated.
It may even be possible to integrate t4066 somewhere...

Torsten Bögershausen (2):
  convert_to_git(): checksafe becomes an integer
  git diff: Allow to reencode into UTF-8

 Documentation/diff-options.txt  |  4 ++
 Documentation/gitattributes.txt |  9 +
 apply.c                         |  4 +-
 convert.c                       | 60 +++-
 convert.h                       | 20 +-
 diff.c                          | 40 +--
 diff.h                          |  1 +
 diffcore.h                      |  3 ++
 environment.c                   |  2 +-
 sha1_file.c                     |  6 +--
 t/t4066-diff-encoding.sh        | 86 +
 11 files changed, 205 insertions(+), 30 deletions(-)
 create mode 100755 t/t4066-diff-encoding.sh

--
2.15.1.271.g1a4e40aa5d
[PATCH/RFC 2/2] git diff: Allow to reencode into UTF-8
From: Torsten Bögershausen

When blobs are encoded in UTF-16, `git diff` will treat them as binary. Make it possible to show a user readable diff encoded in UTF-8. This allows to run git diff and feed the output into a web server.

Improve Git to look at the "encoding" attribute and to reencode the content into UTF-8 before running the diff itself.

Signed-off-by: Torsten Bögershausen
---
 Documentation/diff-options.txt  |  4 ++
 Documentation/gitattributes.txt |  9 +
 convert.c                       | 40 +++
 convert.h                       |  2 +
 diff.c                          | 38 --
 diff.h                          |  1 +
 diffcore.h                      |  3 ++
 t/t4066-diff-encoding.sh        | 86 +
 8 files changed, 180 insertions(+), 3 deletions(-)
 create mode 100755 t/t4066-diff-encoding.sh

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 9d1586b956..bf2f115f11 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -629,6 +629,10 @@ endif::git-format-patch[] linkgit:git-log[1], but not for linkgit:git-format-patch[1] or diff plumbing commands. +--UTF-8:: + Git converts the content into UTF-8 before running the diff when the + "encoding" attribute is defined. See linkgit:gitattributes[5] + --ignore-submodules[=]:: Ignore changes to submodules in the diff generation. can be either "none", "untracked", "dirty" or "all", which is the default. diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 30687de81a..753a7c39b7 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -881,6 +881,15 @@ advantages to choosing this method: 3. Caching. Textconv caching can speed up repeated diffs, such as those you might trigger by running `git log -p`. +Running diff on UTF-16 encoded files + + +Git can convert UTF-16 encoded into UTF-8 before they are feed +into the diff machinery: `diff --UTF-8 file.xxx`.
+ + +file.xxx encoding=UTF-16 + Marking files as binary ^^^ diff --git a/convert.c b/convert.c index 5efcc3b73b..45577ce504 100644 --- a/convert.c +++ b/convert.c @@ -7,6 +7,7 @@ #include "sigchain.h" #include "pkt-line.h" #include "sub-process.h" +#include "utf8.h" /* * convert.c - convert a file when checking it out and checking it in. @@ -734,6 +735,34 @@ static struct convert_driver { int required; } *user_convert, **user_convert_tail; +const char *get_encoding_attr(const char *path) +{ + static struct attr_check *check; + if (!check) + check = attr_check_initl("encoding", NULL); + if (!git_check_attr(path, check)) { + struct attr_check_item *ccheck = check->items; + const char *value; + value = ccheck->value; + if (ATTR_UNSET(value)) + return NULL; + return value; + } + return NULL; +} + +static int reencode_into_strbuf(const char *path, const char *src, size_t len, + struct strbuf *dst, const char *encoding) +{ + int outsz = 0; + char *buf; + buf = reencode_string_len(src, (int)len, "UTF-8", encoding, &outsz); + if (!buf) + return 0; + strbuf_attach(dst, buf, outsz, outsz); + return SAFE_CRLF_REENCODE; +} + static int apply_filter(const char *path, const char *src, size_t len, int fd, struct strbuf *dst, struct convert_driver *drv, const unsigned int wanted_capability, @@ -1136,6 +1165,17 @@ int convert_to_git(const struct index_state *istate, convert_attrs(&ca, path); + if (checksafe & SAFE_CRLF_REENCODE) { + const char *encoding = get_encoding_attr(path); + if (encoding) { + ret |= reencode_into_strbuf(path, src, len, dst, + encoding); + if (ret && dst) { + src = dst->buf; + len = dst->len; + } + } + } ret |= apply_filter(path, src, len, -1, dst, ca.drv, CAP_CLEAN, NULL); if (!ret && ca.drv && ca.drv->required) die("%s: clean filter '%s' failed", path, ca.drv->name); diff --git a/convert.h b/convert.h index 532af00423..0b093715c9 100644 --- a/convert.h +++ b/convert.h @@ -13,6 +13,7 @@ struct index_state; #define SAFE_CRLF_WARN(1<<1) #define 
SAFE_CRLF_RENORMALIZE (1<<2) #define SAFE_CRLF_KEEP_CRLF (1<<3) +#define SAFE_CRLF_REENCODE(1<<4) extern int safe_crlf; @@ -60,6 +61,7 @@ extern const char *get_cached_convert_stats_ascii(const struct index_state *ista
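
To make concrete what "reencode into UTF-8 before running the diff" means for the bytes involved, here is a hand-written converter sketch. It is not what the patch does: the patch calls reencode_string_len(), while the function below, utf16le_to_utf8(), is a hypothetical self-contained illustration that assumes little-endian input and does not treat a leading BOM specially.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Convert UTF-16LE bytes in src[0..len) to UTF-8 in dst[0..cap).
 * Returns the number of bytes written, or -1 on malformed input
 * (odd length, unpaired surrogate) or if dst is too small.
 */
static long utf16le_to_utf8(const unsigned char *src, size_t len,
			    unsigned char *dst, size_t cap)
{
	size_t i = 0, out = 0;

	if (len % 2)
		return -1;	/* truncated code unit */
	while (i < len) {
		uint32_t cp = src[i] | ((uint32_t)src[i + 1] << 8);
		i += 2;
		if (cp >= 0xD800 && cp <= 0xDBFF) {	/* high surrogate */
			uint32_t lo;
			if (i + 2 > len)
				return -1;
			lo = src[i] | ((uint32_t)src[i + 1] << 8);
			if (lo < 0xDC00 || lo > 0xDFFF)
				return -1;
			i += 2;
			cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
		} else if (cp >= 0xDC00 && cp <= 0xDFFF) {
			return -1;	/* low surrogate without a high one */
		}
		if (cp < 0x80) {
			if (out + 1 > cap)
				return -1;
			dst[out++] = (unsigned char)cp;
		} else if (cp < 0x800) {
			if (out + 2 > cap)
				return -1;
			dst[out++] = (unsigned char)(0xC0 | (cp >> 6));
			dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
		} else if (cp < 0x10000) {
			if (out + 3 > cap)
				return -1;
			dst[out++] = (unsigned char)(0xE0 | (cp >> 12));
			dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
			dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
		} else {
			if (out + 4 > cap)
				return -1;
			dst[out++] = (unsigned char)(0xF0 | (cp >> 18));
			dst[out++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
			dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
			dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
		}
	}
	return (long)out;
}
```

The NUL bytes that make UTF-16 text look binary disappear in the UTF-8 output for ASCII characters, which is what lets the diff machinery treat the result as text.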
[PATCH/RFC 1/2] convert_to_git(): checksafe becomes an integer
From: Torsten Bögershausen When calling convert_to_git(), the checksafe parameter has been used to check if commit would give a non-roundtrip conversion of EOL. When checksafe was introduced, 3 values had been in use: SAFE_CRLF_FALSE: no warning SAFE_CRLF_FAIL: reject the commit if EOL do not roundtrip SAFE_CRLF_WARN: warn the user if EOL do not roundtrip Today a small flaw is found in the code base: An integer with the value 0 is passed as the parameter checksafe instead of the correct enum value SAFE_CRLF_FALSE. In the next commit there is a need to turn checksafe into a bitmap, which allows to tell convert_to_git() to obey the encoding attribute or not. Signed-off-by: Torsten Bögershausen --- apply.c | 4 ++-- convert.c | 20 ++-- convert.h | 18 -- diff.c| 4 ++-- environment.c | 2 +- sha1_file.c | 6 +++--- 6 files changed, 26 insertions(+), 28 deletions(-) diff --git a/apply.c b/apply.c index 321a9fa68d..a422516062 100644 --- a/apply.c +++ b/apply.c @@ -2263,7 +2263,7 @@ static void show_stats(struct apply_state *state, struct patch *patch) static int read_old_data(struct stat *st, struct patch *patch, const char *path, struct strbuf *buf) { - enum safe_crlf safe_crlf = patch->crlf_in_old ? + int checksafe = patch->crlf_in_old ? SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_RENORMALIZE; switch (st->st_mode & S_IFMT) { case S_IFLNK: @@ -2281,7 +2281,7 @@ static int read_old_data(struct stat *st, struct patch *patch, * should never look at the index when explicit crlf option * is given. 
*/ - convert_to_git(NULL, path, buf->buf, buf->len, buf, safe_crlf); + convert_to_git(NULL, path, buf->buf, buf->len, buf, checksafe); return 0; default: return -1; diff --git a/convert.c b/convert.c index 1a41a48e15..5efcc3b73b 100644 --- a/convert.c +++ b/convert.c @@ -195,13 +195,13 @@ static enum eol output_eol(enum crlf_action crlf_action) static void check_safe_crlf(const char *path, enum crlf_action crlf_action, struct text_stat *old_stats, struct text_stat *new_stats, - enum safe_crlf checksafe) + int checksafe) { if (old_stats->crlf && !new_stats->crlf ) { /* * CRLFs would not be restored by checkout */ - if (checksafe == SAFE_CRLF_WARN) + if (checksafe & SAFE_CRLF_WARN) warning(_("CRLF will be replaced by LF in %s.\n" "The file will have its original line" " endings in your working directory."), path); @@ -211,7 +211,7 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action, /* * CRLFs would be added by checkout */ - if (checksafe == SAFE_CRLF_WARN) + if (checksafe & SAFE_CRLF_WARN) warning(_("LF will be replaced by CRLF in %s.\n" "The file will have its original line" " endings in your working directory."), path); @@ -268,7 +268,7 @@ static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats, static int crlf_to_git(const struct index_state *istate, const char *path, const char *src, size_t len, struct strbuf *buf, - enum crlf_action crlf_action, enum safe_crlf checksafe) + enum crlf_action crlf_action, int checksafe) { struct text_stat stats; char *dst; @@ -298,12 +298,12 @@ static int crlf_to_git(const struct index_state *istate, * unless we want to renormalize in a merge or * cherry-pick. 
*/ - if ((checksafe != SAFE_CRLF_RENORMALIZE) && + if ((!(checksafe & SAFE_CRLF_RENORMALIZE)) && has_crlf_in_index(istate, path)) convert_crlf_into_lf = 0; } - if ((checksafe == SAFE_CRLF_WARN || - (checksafe == SAFE_CRLF_FAIL)) && len) { + if (((checksafe & SAFE_CRLF_WARN) || +((checksafe & SAFE_CRLF_FAIL) && len))) { struct text_stat new_stats; memcpy(&new_stats, &stats, sizeof(new_stats)); /* simulate "git add" */ @@ -1129,7 +1129,7 @@ const char *get_convert_attr_ascii(const char *path) int convert_to_git(const struct index_state *istate, const char *path, const char *src, size_t len, - struct strbuf *dst, enum safe_crlf checksafe) + struct strbuf *dst, int checksafe) { int ret = 0; struct conv_attrs ca; @@ -1144,7 +1144,7 @@ in
[PATCH v3 1/1] check-non-portable-shell.pl: Quoted `wc -l` is not portable
From: Torsten Bögershausen

wc -l was used to count the number of lines in test scripts.
$ wc -l Makefile
gives a line like this:
105 Makefile
while Mac OS has 4 leading spaces:
    105 Makefile
And this means that shell expressions like
test "$(wc -l
[PATCH v2 1/1] check-non-portable-shell.pl: Quoted `wc -l` is not portable
From: Torsten Bögershausen

wc -l was used to count the number of lines in test scripts.
$ wc -l Makefile
gives a line like this:
105 Makefile
while Mac OS has 4 leading spaces:
    105 Makefile
And this means that shell expressions like
test "$(wc -l
[PATCH v1 1/1] check-non-portable-shell.pl: Quoted `wc -l` is not portable
From: Torsten Bögershausen

wc -l is used to count the number of lines in test scripts.
$ wc -l Makefile
gives a line like this:
105 Makefile
while Mac OS has 4 leading spaces:
    105 Makefile
And this means that shell expressions like
test "$(wc -l
[PATCH v1 1/2] t0027: Don't use git commit
From: Torsten Bögershausen Replace `git commit -m "comment" ""` with `git commit -m "comment"` to remove the empty path spec. Signed-off-by: Torsten Bögershausen --- t/t0027-auto-crlf.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh index c2c208fdcd..97154f5c79 100755 --- a/t/t0027-auto-crlf.sh +++ b/t/t0027-auto-crlf.sh @@ -370,7 +370,7 @@ test_expect_success 'setup master' ' echo >.gitattributes && git checkout -b master && git add .gitattributes && - git commit -m "add .gitattributes" "" && + git commit -m "add .gitattributes" && printf "\$Id: \$\nLINEONE\nLINETWO\nLINETHREE" >LF && printf "\$Id: \$\r\nLINEONE\r\nLINETWO\r\nLINETHREE" >CRLF && printf "\$Id: \$\nLINEONE\r\nLINETWO\nLINETHREE" >CRLF_mix_LF && -- 2.15.1.271.g1a4e40aa5d
[PATCH v1 2/2] t0027: Adapt the new MIX tests to Windows
From: Torsten Bögershausen The new MIX tests don't pass under Windows, adapt them to use the correct native line ending. Signed-off-by: Torsten Bögershausen --- Sorry for the breakage. This needs to go on top of tb/check-crlf-for-safe-crlf t/t0027-auto-crlf.sh | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh index 97154f5c79..8d929b76dc 100755 --- a/t/t0027-auto-crlf.sh +++ b/t/t0027-auto-crlf.sh @@ -170,22 +170,22 @@ commit_MIX_chkwrn () { git -c core.autocrlf=$crlf add $fname 2>"${pfx}_$f.err" done - test_expect_success "commit file with mixed EOL crlf=$crlf attr=$attr LF" ' + test_expect_success "commit file with mixed EOL onto LF crlf=$crlf attr=$attr" ' check_warning "$lfwarn" ${pfx}_LF.err ' - test_expect_success "commit file with mixed EOL attr=$attr aeol=$aeol crlf=$crlf CRLF" ' + test_expect_success "commit file with mixed EOL onto CLRF attr=$attr aeol=$aeol crlf=$crlf" ' check_warning "$crlfwarn" ${pfx}_CRLF.err ' - test_expect_success "commit file with mixed EOL attr=$attr aeol=$aeol crlf=$crlf CRLF_mix_LF" ' + test_expect_success "commit file with mixed EOL onto CRLF_mix_LF attr=$attr aeol=$aeol crlf=$crlf" ' check_warning "$lfmixcrlf" ${pfx}_CRLF_mix_LF.err ' - test_expect_success "commit file with mixed EOL attr=$attr aeol=$aeol crlf=$crlf LF_mix_cr" ' + test_expect_success "commit file with mixed EOL onto LF_mix_cr attr=$attr aeol=$aeol crlf=$crlf " ' check_warning "$lfmixcr" ${pfx}_LF_mix_CR.err ' - test_expect_success "commit file with mixed EOL attr=$attr aeol=$aeol crlf=$crlf CRLF_nul" ' + test_expect_success "commit file with mixed EOL onto CRLF_nul attr=$attr aeol=$aeol crlf=$crlf" ' check_warning "$crlfnul" ${pfx}_CRLF_nul.err ' } @@ -378,7 +378,7 @@ test_expect_success 'setup master' ' printf "\$Id: \$\r\nLINEONE\r\nLINETWO\rLINETHREE" >CRLF_mix_CR && printf "\$Id: \$\r\nLINEONEQ\r\nLINETWO\r\nLINETHREE" | q_to_nul >CRLF_nul && printf "\$Id: \$\nLINEONEQ\nLINETWO\nLINETHREE" | 
q_to_nul >LF_nul && - create_NNO_MIX_files CRLF_mix_LF CRLF_mix_LF CRLF_mix_LF CRLF_mix_LF CRLF_mix_LF && + create_NNO_MIX_files && git -c core.autocrlf=false add NNO_*.txt MIX_*.txt && git commit -m "mixed line endings" && test_tick @@ -441,13 +441,14 @@ test_expect_success 'commit files attr=crlf' ' ' # Commit "CRLFmixLF" on top of these files already in the repo: -# LF, CRLF, CRLFmixLF LF_mix_CR CRLFNULL +# mixed mixed mixed mixed mixed +# onto onto ontoonto onto # attrLFCRLF CRLFmixLF LF_mix_CR CRLFNUL commit_MIX_chkwrn "" "" false """""" "" "" commit_MIX_chkwrn "" "" true"LF_CRLF" """" "LF_CRLF" "LF_CRLF" commit_MIX_chkwrn "" "" input "CRLF_LF" """" "CRLF_LF" "CRLF_LF" -commit_MIX_chkwrn "auto" "" false "CRLF_LF" """" "CRLF_LF" "CRLF_LF" +commit_MIX_chkwrn "auto" "" false "$WAMIX" """" "$WAMIX""$WAMIX" commit_MIX_chkwrn "auto" "" true"LF_CRLF" """" "LF_CRLF" "LF_CRLF" commit_MIX_chkwrn "auto" "" input "CRLF_LF" """" "CRLF_LF" "CRLF_LF" -- 2.15.1.271.g1a4e40aa5d
[PATCH v2 1/1] convert: tighten the safe autocrlf handling
From: Torsten Bögershausen

When a text file had been committed with CRLF and the file is committed again, the CRLF are kept if .gitattributes has "text=auto". This is done by analyzing the content of the blob stored in the index: If a '\r' is found, Git assumes that the blob was committed with CRLF.

The simple search for a '\r' does not always work as expected: A file is encoded in UTF-16 with CRLF and committed. Git treats it as binary. Now the content is converted into UTF-8. At the next commit Git treats the file as text, the CRLF should be converted into LF, but isn't.

Replace has_cr_in_index() with has_crlf_in_index(). When no '\r' is found, 0 is returned directly, this is the most common case. If a '\r' is found, the content is analyzed more deeply.

Reported-By: Ashish Negi
Signed-off-by: Torsten Bögershausen
---
Changes against v1:
- change "if (crp && (crp[1] == '\n'))" to "if (crp)"
  (Thanks Eric. The new patch is more straightforward, and there is no risk of reading past the end of the data)
- Remove the "Solution:" in the commit message

Note: The original function has_cr_in_index() is from this commit:
c4805393d73 (Finn Arne Gangstad 2010-05-12 00:37:57 +0200 225)
And has this info:
>Change autocrlf to not do any conversions to files that in the
>repository already contain a CR. git with autocrlf set will never
>create such a file, or change a LF only file to contain CRs, so the
>(new) assumption is that if a file contains a CR, it is intentional,
>and autocrlf should not change that.
So the original assumption was slightly optimistic (but did work in 7 years) convert.c| 19 + t/t0027-auto-crlf.sh | 76 2 files changed, 85 insertions(+), 10 deletions(-) diff --git a/convert.c b/convert.c index 20d7ab67bd..1a41a48e15 100644 --- a/convert.c +++ b/convert.c @@ -220,18 +220,27 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action, } } -static int has_cr_in_index(const struct index_state *istate, const char *path) +static int has_crlf_in_index(const struct index_state *istate, const char *path) { unsigned long sz; void *data; - int has_cr; + const char *crp; + int has_crlf = 0; data = read_blob_data_from_index(istate, path, &sz); if (!data) return 0; - has_cr = memchr(data, '\r', sz) != NULL; + + crp = memchr(data, '\r', sz); + if (crp) { + unsigned int ret_stats; + ret_stats = gather_convert_stats(data, sz); + if (!(ret_stats & CONVERT_STAT_BITS_BIN) && + (ret_stats & CONVERT_STAT_BITS_TXT_CRLF)) + has_crlf = 1; + } free(data); - return has_cr; + return has_crlf; } static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats, @@ -290,7 +299,7 @@ static int crlf_to_git(const struct index_state *istate, * cherry-pick. */ if ((checksafe != SAFE_CRLF_RENORMALIZE) && - has_cr_in_index(istate, path)) + has_crlf_in_index(istate, path)) convert_crlf_into_lf = 0; } if ((checksafe == SAFE_CRLF_WARN || diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh index 68108d956a..0af35cfb1f 100755 --- a/t/t0027-auto-crlf.sh +++ b/t/t0027-auto-crlf.sh @@ -43,19 +43,31 @@ create_gitattributes () { } >.gitattributes } -create_NNO_files () { +# Create 2 sets of files: +# The NNO files are "Not NOrmalized in the repo. We use CRLF_mix_LF and store +# it under different names for the different test cases, see ${pfx} +# Depending on .gitattributes they are normalized at the next commit (or not) +# The MIX files have different contents in the repo. +# Depending on its contents, the "new safer autocrlf" may kick in. 
+create_NNO_MIX_files () { for crlf in false true input do for attr in "" auto text -text do for aeol in "" lf crlf do - pfx=NNO_attr_${attr}_aeol_${aeol}_${crlf} + pfx=NNO_attr_${attr}_aeol_${aeol}_${crlf} && cp CRLF_mix_LF ${pfx}_LF.txt && cp CRLF_mix_LF ${pfx}_CRLF.txt && cp CRLF_mix_LF ${pfx}_CRLF_mix_LF.txt && cp CRLF_mix_LF ${pfx}_LF_mix_CR.txt && - cp CRLF_mix_LF ${pfx}_CRLF_nul.txt + cp CRLF_mix_LF ${pfx}_CRLF_nul.txt && + pfx=MIX_attr_${attr}_aeol_${aeol}_${crlf} && + cp LF ${pfx}_LF.txt && + cp CRLF${pfx}_CRLF.txt && + cp CRLF_mix_LF ${pfx}_CRLF_mix_LF.t
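
The idea behind has_crlf_in_index() (a cheap memchr() first, a closer look at the content only when a '\r' exists) can be sketched without Git's internals. The function blob_has_text_crlf() below is a hypothetical, simplified stand-in: it treats any NUL byte as "binary", whereas gather_convert_stats() in Git applies its own, more detailed statistics that are not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Simplified stand-in for the tightened check: returns 1 only if the
 * buffer looks like text (no NUL byte) and contains at least one CRLF
 * pair. Content without any '\r' is answered by the memchr() fast path.
 */
static int blob_has_text_crlf(const char *data, size_t len)
{
	size_t i;
	int crlf = 0;

	if (!memchr(data, '\r', len))
		return 0;	/* most common case: no CR at all */
	for (i = 0; i < len; i++) {
		if (data[i] == '\0')
			return 0;	/* treated as binary, e.g. UTF-16 */
		if (data[i] == '\r' && i + 1 < len && data[i + 1] == '\n')
			crlf = 1;
	}
	return crlf;
}
```

With this kind of check, a UTF-16 blob that contains '\r' bytes no longer counts as "committed with CRLF", so a later UTF-8 version of the same file can be normalized to LF.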
[PATCH 1/1] convert: tighten the safe autocrlf handling
From: Torsten Bögershausen

When a text file had been committed with CRLF and the file is committed again, the CRLF are kept if .gitattributes has "text=auto". This is done by analyzing the content of the blob stored in the index: If a '\r' is found, Git assumes that the blob was committed with CRLF.

The simple search for a '\r' does not always work as expected: A file is encoded in UTF-16 with CRLF and committed. Git treats it as binary. Now the content is converted into UTF-8. At the next commit Git treats the file as text, the CRLF should be converted into LF, but isn't.

Solution: Replace has_cr_in_index() with has_crlf_in_index(). When no '\r' is found, 0 is returned directly, this is the most common case. If a '\r' is found, the content is analyzed more deeply.

Reported-By: Ashish Negi
Signed-off-by: Torsten Bögershausen
---
 convert.c            | 19 +
 t/t0027-auto-crlf.sh | 76
 2 files changed, 85 insertions(+), 10 deletions(-)

diff --git a/convert.c b/convert.c index 20d7ab67bd..63ef799239 100644 --- a/convert.c +++ b/convert.c @@ -220,18 +220,27 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action, } } -static int has_cr_in_index(const struct index_state *istate, const char *path) +static int has_crlf_in_index(const struct index_state *istate, const char *path) { unsigned long sz; void *data; - int has_cr; + const char *crp; + int has_crlf = 0; data = read_blob_data_from_index(istate, path, &sz); if (!data) return 0; - has_cr = memchr(data, '\r', sz) != NULL; + + crp = memchr(data, '\r', sz); + if (crp && (crp[1] == '\n')) { + unsigned int ret_stats; + ret_stats = gather_convert_stats(data, sz); + if (!(ret_stats & CONVERT_STAT_BITS_BIN) && + (ret_stats & CONVERT_STAT_BITS_TXT_CRLF)) + has_crlf = 1; + } free(data); - return has_cr; + return has_crlf; } static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats, @@ -290,7 +299,7 @@ static int crlf_to_git(const struct index_state *istate, * cherry-pick.
*/ if ((checksafe != SAFE_CRLF_RENORMALIZE) && - has_cr_in_index(istate, path)) + has_crlf_in_index(istate, path)) convert_crlf_into_lf = 0; } if ((checksafe == SAFE_CRLF_WARN || diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh index 68108d956a..0af35cfb1f 100755 --- a/t/t0027-auto-crlf.sh +++ b/t/t0027-auto-crlf.sh @@ -43,19 +43,31 @@ create_gitattributes () { } >.gitattributes } -create_NNO_files () { +# Create 2 sets of files: +# The NNO files are "Not NOrmalized in the repo. We use CRLF_mix_LF and store +# it under different names for the different test cases, see ${pfx} +# Depending on .gitattributes they are normalized at the next commit (or not) +# The MIX files have different contents in the repo. +# Depending on its contents, the "new safer autocrlf" may kick in. +create_NNO_MIX_files () { for crlf in false true input do for attr in "" auto text -text do for aeol in "" lf crlf do - pfx=NNO_attr_${attr}_aeol_${aeol}_${crlf} + pfx=NNO_attr_${attr}_aeol_${aeol}_${crlf} && cp CRLF_mix_LF ${pfx}_LF.txt && cp CRLF_mix_LF ${pfx}_CRLF.txt && cp CRLF_mix_LF ${pfx}_CRLF_mix_LF.txt && cp CRLF_mix_LF ${pfx}_LF_mix_CR.txt && - cp CRLF_mix_LF ${pfx}_CRLF_nul.txt + cp CRLF_mix_LF ${pfx}_CRLF_nul.txt && + pfx=MIX_attr_${attr}_aeol_${aeol}_${crlf} && + cp LF ${pfx}_LF.txt && + cp CRLF${pfx}_CRLF.txt && + cp CRLF_mix_LF ${pfx}_CRLF_mix_LF.txt && + cp LF_mix_CR ${pfx}_LF_mix_CR.txt && + cp CRLF_nul${pfx}_CRLF_nul.txt done done done @@ -136,6 +148,49 @@ commit_chk_wrnNNO () { ' } +# Commit a file with mixed line endings on top of different files +# in the index. Check for warnings +commit_MIX_chkwrn () { + attr=$1 ; shift + aeol=$1 ; shift + crlf=$1 ; shift + lfwarn=$1 ; shift + crlfwarn=$1 ; shift + lfmixcrlf=$1 ; shift + lfmixcr=$1 ; shift + crlfnul=$1 ; shift + pfx=MIX_attr_${attr}_aeol_${aeol}_${crlf} + #Commit file with CLRF_mix_LF on top of existing file + create_gitattributes "$attr" $a
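
The v1 condition "crp && (crp[1] == '\n')" looks one byte past the '\r' that memchr() found. If that '\r' is the last byte of the data, crp[1] lies outside the counted length; whether the buffer happens to carry a terminating NUL there is not stated in this message. A bounded version of the lookahead could look like the sketch below, where first_cr_is_crlf() is a hypothetical helper and not part of either version of the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if the first '\r' in data[0..len) is directly followed by
 * '\n' within the same len bytes, 0 otherwise.
 */
static int first_cr_is_crlf(const char *data, size_t len)
{
	const char *end = data + len;
	const char *crp = memchr(data, '\r', len);

	/* crp + 1 < end guards the one-byte lookahead. */
	return crp && crp + 1 < end && crp[1] == '\n';
}
```

The v2 patch avoids the question by dropping the lookahead and testing only "if (crp)" before gathering the statistics.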
[PATCH v3 1/1] Introduce git add --renormalize .
From: Torsten Bögershausen

Make it safer to normalize the line endings in a repository:
Files that had been committed with CRLF will be committed with LF.

The old way to normalize a repo was like this:

 # Make sure that there are not untracked files
 $ echo "* text=auto" >.gitattributes
 $ git read-tree --empty
 $ git add .
 $ git commit -m "Introduce end-of-line normalization"

The user must make sure that there are no untracked files,
otherwise they would have been added and tracked from now on.

The new "add --renormalize" does not add untracked files:

 $ echo "* text=auto" >.gitattributes
 $ git add --renormalize .
 $ git commit -m "Introduce end-of-line normalization"

Note that "git add --renormalize <pathspec>" is the short form for
"git add -u --renormalize <pathspec>".

While at it, document that the same renormalization may be needed
whenever a clean filter is added or changed.

Helped-By: Junio C Hamano
Signed-off-by: Torsten Bögershausen
---
Changes since V2:
  Add line endings in t0025
  Use the <<-\EOF pattern
  Improve the documentation for "git add --renormalize"

 Documentation/git-add.txt       |  9 -
 Documentation/gitattributes.txt |  6 --
 builtin/add.c                   | 28 ++--
 cache.h                         |  1 +
 read-cache.c                    | 30 +++---
 sha1_file.c                     | 16 ++--
 t/t0025-crlf-renormalize.sh     | 30 ++
 7 files changed, 102 insertions(+), 18 deletions(-)
 create mode 100755 t/t0025-crlf-renormalize.sh

diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt
index b700beaff5..d50fa339dc 100644
--- a/Documentation/git-add.txt
+++ b/Documentation/git-add.txt
@@ -10,7 +10,7 @@ SYNOPSIS
 [verse]
 'git add' [--verbose | -v] [--dry-run | -n] [--force | -f] [--interactive | -i] [--patch | -p]
 	  [--edit | -e] [--[no-]all | --[no-]ignore-removal | [--update | -u]]
-	  [--intent-to-add | -N] [--refresh] [--ignore-errors] [--ignore-missing]
+	  [--intent-to-add | -N] [--refresh] [--ignore-errors] [--ignore-missing] [--renormalize]
 	  [--chmod=(+|-)x] [--] [<pathspec>...]
 
 DESCRIPTION
@@ -175,6 +175,13 @@ for "git add --no-all <pathspec>...", i.e. ignored removed files.
 	warning (e.g., if you are manually performing operations on
 	submodules).
 
+--renormalize::
+	Apply the "clean" process freshly to all tracked files to
+	forcibly add them again to the index.  This is useful after
+	changing `core.autocrlf` configuration or the `text` attribute
+	in order to correct files added with wrong CRLF/LF line endings.
+	This option implies `-u`.
+
 --chmod=(+|-)x::
 	Override the executable bit of the added files.  The executable
 	bit is only changed in the index, the files on disk are left
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 4c68bc19d5..30687de81a 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -232,8 +232,7 @@ From a clean working directory:
 
 -------------------------------------------------
 $ echo "* text=auto" >.gitattributes
-$ git read-tree --empty   # Clean index, force re-scan of working directory
-$ git add .
+$ git add --renormalize .
 $ git status        # Show files that will be normalized
 $ git commit -m "Introduce end-of-line normalization"
 -------------------------------------------------
@@ -328,6 +327,9 @@ You can declare that a filter turns a content that by itself is unusable into
 a usable content by setting the filter.<driver>.required configuration
 variable to `true`.
 
+Note: Whenever the clean filter is changed, the repo should be renormalized:
+$ git add --renormalize .
+
 For example, in .gitattributes, you would assign the `filter`
 attribute for paths.
 
diff --git a/builtin/add.c b/builtin/add.c
index a648cf4c56..c42b50f857 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -26,6 +26,7 @@ static const char * const builtin_add_usage[] = {
 };
 static int patch_interactive, add_interactive, edit_interactive;
 static int take_worktree_changes;
+static int add_renormalize;
 
 struct update_callback_data {
 	int flags;
@@ -123,6 +124,25 @@ int add_files_to_cache(const char *prefix,
 	return !!data.add_errors;
 }
 
+static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
+{
+	int i, retval = 0;
+
+	for (i = 0; i < active_nr; i++) {
+		struct cache_entry *ce = active_cache[i];
+
+		if (ce_stage(ce))
+			continue; /* do not touch unmerged paths */
+		if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
+			continue; /* do not touch non blobs */
+		if (pathspec && !ce_path_match(ce, pathspec, NULL))
+			continue;
+		retval |= add_file_to_cache(ce->name, flags | HASH_RENORMALIZE);
+	}
+
+	return retval;
+}
+
 s
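The selection logic of renormalize_tracked_files() can be tried outside of Git with a small model. Everything below is a sketch under assumptions: struct demo_entry and count_renormalize_candidates() are hypothetical names, the pathspec is reduced to a plain path prefix, and the function only counts the entries that would be re-added instead of calling add_file_to_cache().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical, minimal model of an index entry. */
struct demo_entry {
	const char *name;
	unsigned int stage;	/* non-zero means unmerged */
	mode_t mode;
};

/*
 * Count the entries that a renormalize pass would touch.
 * Assumption: "prefix" stands in for a real pathspec; NULL matches all.
 */
static size_t count_renormalize_candidates(const struct demo_entry *entries,
					   size_t nr, const char *prefix)
{
	size_t i, count = 0;
	size_t plen = prefix ? strlen(prefix) : 0;

	assert(entries != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		const struct demo_entry *e = &entries[i];

		if (e->stage)
			continue; /* do not touch unmerged paths */
		if (!S_ISREG(e->mode) && !S_ISLNK(e->mode))
			continue; /* do not touch non blobs */
		if (plen && strncmp(e->name, prefix, plen))
			continue; /* outside the requested paths */
		count++;
	}
	return count;
}
```

Untracked files never show up here because the loop walks the (modelled) index rather than the working tree, which is what keeps "git add --renormalize ." from adding new files.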
[PATCH v2 1/1] Introduce git add --renormalize .
From: Torsten Bögershausen Make it safer to normalize the line endings in a repository: Files that had been commited with CRLF will be commited with LF. The old way to normalize a repo was like this: # Make sure that there are not untracked files $ echo "* text=auto" >.gitattributes $ git read-tree --empty $ git add . $ git commit -m "Introduce end-of-line normalization" The user must make sure that there are no untracked files, otherwise they would have been added and tracked from now on. The new "add ..renormalize" does not add untracked files: $ echo "* text=auto" >.gitattributes $ git add --renormalize . $ git commit -m "Introduce end-of-line normalization" Note that "git add --renormalize " is the short form for "git add -u --renormalize ". While add it, document that the same renormalization may be needed, whenever a clean filter is added or changed. Helped-By: Junio C Hamano Signed-off-by: Torsten Bögershausen --- Second version: - Removed the global flag - Make clearer that the clean filters may need renormalization - commit message improved Documentation/git-add.txt | 8 +++- Documentation/gitattributes.txt | 6 -- builtin/add.c | 28 ++-- cache.h | 1 + read-cache.c| 30 +++--- sha1_file.c | 16 ++-- t/t0025-crlf-renormalize.sh | 30 ++ 7 files changed, 101 insertions(+), 18 deletions(-) create mode 100755 t/t0025-crlf-renormalize.sh diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt index b700beaff5..09a08ce4c1 100644 --- a/Documentation/git-add.txt +++ b/Documentation/git-add.txt @@ -10,7 +10,7 @@ SYNOPSIS [verse] 'git add' [--verbose | -v] [--dry-run | -n] [--force | -f] [--interactive | -i] [--patch | -p] [--edit | -e] [--[no-]all | --[no-]ignore-removal | [--update | -u]] - [--intent-to-add | -N] [--refresh] [--ignore-errors] [--ignore-missing] + [--intent-to-add | -N] [--refresh] [--ignore-errors] [--ignore-missing] [--renormalize] [--chmod=(+|-)x] [--] [...] DESCRIPTION @@ -175,6 +175,12 @@ for "git add --no-all ...", i.e. 
ignored removed files. warning (e.g., if you are manually performing operations on submodules). +--renormalize:: + Normalizes the line endings from CRLF to LF of tracked files. + This applies to files which are either "text" or "text=auto" + in .gitattributes (or core.autocrlf is true or input) + --renormalize implies -u + --chmod=(+|-)x:: Override the executable bit of the added files. The executable bit is only changed in the index, the files on disk are left diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 4c68bc19d5..30687de81a 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -232,8 +232,7 @@ From a clean working directory: - $ echo "* text=auto" >.gitattributes -$ git read-tree --empty # Clean index, force re-scan of working directory -$ git add . +$ git add --renormalize . $ git status# Show files that will be normalized $ git commit -m "Introduce end-of-line normalization" - @@ -328,6 +327,9 @@ You can declare that a filter turns a content that by itself is unusable into a usable content by setting the filter..required configuration variable to `true`. +Note: Whenever the clean filter is changed, the repo should be renormalized: +$ git add --renormalize . + For example, in .gitattributes, you would assign the `filter` attribute for paths. 
diff --git a/builtin/add.c b/builtin/add.c index a648cf4c56..c42b50f857 100644 --- a/builtin/add.c +++ b/builtin/add.c @@ -26,6 +26,7 @@ static const char * const builtin_add_usage[] = { }; static int patch_interactive, add_interactive, edit_interactive; static int take_worktree_changes; +static int add_renormalize; struct update_callback_data { int flags; @@ -123,6 +124,25 @@ int add_files_to_cache(const char *prefix, return !!data.add_errors; } +static int renormalize_tracked_files(const struct pathspec *pathspec, int flags) +{ + int i, retval = 0; + + for (i = 0; i < active_nr; i++) { + struct cache_entry *ce = active_cache[i]; + + if (ce_stage(ce)) + continue; /* do not touch unmerged paths */ + if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode)) + continue; /* do not touch non blobs */ + if (pathspec && !ce_path_match(ce, pathspec, NULL)) + continue; + retval |= add_file_to_cache(ce->name, flags | HASH_RENORMALIZE); + } + + return retval; +} + static char *prune_directory(struct dir_struct *dir, struct pathspec *pathspec,
[PATCH v1 1/1] Introduce git add --renormalize .
From: Torsten Bögershausen Make it safer to normalize the line endings in a repository: Files that had been commited with CRLF will be commited with LF. (Unless core.autorclf and .gitattributes specify that Git should not do line ending conversions) The old way to normalize a repo was like this: # Make sure that there are not untracked files $ echo "* text=auto" >.gitattributes $ git read-tree --empty $ git add . $ git commit -m "Introduce end-of-line normalization" The new method is one step shorter, more intuitive and does not add untracked files: $ echo "* text=auto" >.gitattributes $ git add --renormalize . $ git commit -m "Introduce end-of-line normalization" Note that "git add --renormalize " is the short form for "git add -u --renormalize ". Signed-off-by: Torsten Bögershausen --- Documentation/git-add.txt | 8 +++- Documentation/gitattributes.txt | 3 +-- builtin/add.c | 27 +-- cache.h | 1 + convert.c | 1 + environment.c | 1 + read-cache.c| 24 ++-- t/t0025-crlf-renormalize.sh | 30 ++ 8 files changed, 80 insertions(+), 15 deletions(-) create mode 100755 t/t0025-crlf-renormalize.sh diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt index f4169fb1ec..b6e431903d 100644 --- a/Documentation/git-add.txt +++ b/Documentation/git-add.txt @@ -10,7 +10,7 @@ SYNOPSIS [verse] 'git add' [--verbose | -v] [--dry-run | -n] [--force | -f] [--interactive | -i] [--patch | -p] [--edit | -e] [--[no-]all | --[no-]ignore-removal | [--update | -u]] - [--intent-to-add | -N] [--refresh] [--ignore-errors] [--ignore-missing] + [--intent-to-add | -N] [--refresh] [--ignore-errors] [--ignore-missing] [--renormalize] [--chmod=(+|-)x] [--] [...] DESCRIPTION @@ -172,6 +172,12 @@ for "git add --no-all ...", i.e. ignored removed files. warning (e.g., if you are manually performing operations on submodules). +--renormalize:: + Normalizes the line endings from CRLF to LF of tracked files. 
+ This applies to files which are either "text" or "text=auto" + in .gitattributes (or core.autocrlf is true or input) +--renormalize implies -u + --chmod=(+|-)x:: Override the executable bit of the added files. The executable bit is only changed in the index, the files on disk are left diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 4c68bc19d5..071dec2bc4 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -232,8 +232,7 @@ From a clean working directory: - $ echo "* text=auto" >.gitattributes -$ git read-tree --empty # Clean index, force re-scan of working directory -$ git add . +$ git add --renormalize . $ git status# Show files that will be normalized $ git commit -m "Introduce end-of-line normalization" - diff --git a/builtin/add.c b/builtin/add.c index a648cf4c56..ee8e756fdc 100644 --- a/builtin/add.c +++ b/builtin/add.c @@ -123,6 +123,25 @@ int add_files_to_cache(const char *prefix, return !!data.add_errors; } +static int renormalize_tracked_files(const struct pathspec *pathspec, int flags) +{ + int i, retval = 0; + + for (i = 0; i < active_nr; i++) { + struct cache_entry *ce = active_cache[i]; + + if (ce_stage(ce)) + continue; /* do not touch unmerged paths */ + if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode)) + continue; /* do not touch non blobs */ + if (pathspec && !ce_path_match(ce, pathspec, NULL)) + continue; + retval |= add_file_to_cache(ce->name, flags); + } + + return retval; +} + static char *prune_directory(struct dir_struct *dir, struct pathspec *pathspec, int prefix) { char *seen; @@ -276,6 +295,7 @@ static struct option builtin_add_options[] = { OPT_BOOL('e', "edit", &edit_interactive, N_("edit current diff and apply")), OPT__FORCE(&ignored_too, N_("allow adding otherwise ignored files")), OPT_BOOL('u', "update", &take_worktree_changes, N_("update tracked files")), + OPT_BOOL(0, "renormalize", &add_renormalize, N_("renormalize EOL of tracked files (implies -u)")), 
OPT_BOOL('N', "intent-to-add", &intent_to_add, N_("record only the fact that the path will be added later")), OPT_BOOL('A', "all", &addremove_explicit, N_("add changes from all tracked and untracked files")), { OPTION_CALLBACK, 0, "ignore-removal", &addremove_explicit, @@ -406,7 +426,7 @@ int cmd_add(int argc, const char **argv, const char *prefix) chmod_arg[1] != 'x' || chmod_arg[2]))
[PATCH v1 1/1] test-lint: echo -e (or -E) is not portable
From: Torsten Bögershausen

Some implementations of `echo` support the '-e' option to enable
backslash interpretation of the following string.
In addition, they support '-E' to turn it off.

However, none of these are portable, POSIX doesn't even mention them,
and many implementations don't support them.

A check for '-n' is already done in check-non-portable-shell.pl,
extend it to cover '-n', '-e' or '-E'.

Signed-off-by: Torsten Bögershausen
---
 t/check-non-portable-shell.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
index b170cbc045..03dc9d2852 100755
--- a/t/check-non-portable-shell.pl
+++ b/t/check-non-portable-shell.pl
@@ -17,7 +17,7 @@ sub err {
 while (<>) {
 	chomp;
 	/\bsed\s+-i/ and err 'sed -i is not portable';
-	/\becho\s+-n/ and err 'echo -n is not portable (please use printf)';
+	/\becho\s+-[neE]/ and err 'echo with option is not portable (please use printf)';
 	/^\s*declare\s+/ and err 'arrays/declare not portable';
 	/^\s*[^#]\s*which\s/ and err 'which is not portable (please use type)';
 	/\btest\s+[^=]*==/ and err '"test a == b" is not portable (please use =)';
-- 
2.14.1.145.gb3622a4ee9
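For readers who want to experiment with the new check outside of Perl, here is a rough C equivalent built on POSIX regcomp()/regexec(). It is a sketch, not part of the patch: uses_echo_option() is a hypothetical name, and because POSIX extended regular expressions have no \b or \s, the pattern spells the word boundary as "start of line or a non-word character" and the whitespace as [[:space:]].

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if the shell line calls echo with -n, -e or -E,
 * 0 if it does not, and -1 if the pattern could not be compiled.
 * The ERE below approximates Perl's /\becho\s+-[neE]/.
 */
static int uses_echo_option(const char *line)
{
	regex_t re;
	int found;

	assert(line != NULL);
	if (regcomp(&re, "(^|[^[:alnum:]_])echo[[:space:]]+-[neE]",
		    REG_EXTENDED | REG_NOSUB))
		return -1;
	found = regexec(&re, line, 0, NULL, 0) == 0;
	regfree(&re);
	return found;
}
```

Like the Perl original, this is a lint heuristic: it only looks at the first character after the dash, and it does not try to understand quoting.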
[PATCH v4 2/2] File committed with CRLF should roundtrip diff and apply
From: Torsten Bögershausen

When a file had been committed with CRLF but now .gitattributes say
"* text=auto" (or core.autocrlf is true),
the following does not roundtrip, `git apply` fails:

    printf "Added line\r\n" >>file &&
    git diff >patch &&
    git checkout -- . &&
    git apply patch

Before applying the patch, the file from working tree is converted into the
index format (clean filter, CRLF conversion, ...)
Here, when committed with CRLF, the line endings should not be converted.

Note that `git apply --index` or `git apply --cached` doesn't call
convert_to_git() because the source material is already in index format.

Analyze the patch if there is a) any context line with CRLF,
or b) if any line with CRLF is to be removed.
In this case the patch file `patch` has mixed line endings, for a)
it looks like this:

    diff --git a/one b/one
    index 533790e..c30dea8 100644
    --- a/one
    +++ b/one
    @@ -1 +1,2 @@
     a\r
    +b\r

And for b) it looks like this:

    diff --git a/one b/one
    index 533790e..485540d 100644
    --- a/one
    +++ b/one
    @@ -1 +1 @@
    -a\r
    +b\r

If `git apply` detects that the patch itself has CRLF, (look at the line
" a\r" or "-a\r" above), the new flag crlf_in_old is set in "struct patch"
and two things will happen:
    - read_old_data() will not convert CRLF into LF by calling
      convert_to_git(..., SAFE_CRLF_KEEP_CRLF);
    - The WS_CR_AT_EOL bit is set in the "white space rule",
      CRLF are no longer treated as white space.

While at it, make clear that read_old_data() in apply.c knows what it wants
convert_to_git() to do with respect to CRLF. In fact, this codepath is
about applying a patch to a file in the filesystem, which may not exist
in the index, or may exist but may not match what is recorded in the
index, or in the extreme case, we may not even be in a Git repository.
If convert_to_git() peeked at the index while doing its work, it *would*
be a bug.

Pass NULL instead of &the_index to convert_to_git() to make this explicit
and to make sure that such bugs are caught in the future.
Update the test in t4124: split one test case into 3: - Detect the " a\r" line in the patch - Detect the "-a\r" line in the patch - Use LF in repo and CLRF in the worktree. Reported-by: Anthony Sottile Helped-by: Junio C Hamano Signed-off-by: Torsten Bögershausen --- Changes since v3: - took apply.c from junio/tb/apply-with-crlf - Remove the leading asterix in the commit message, at the place where the "git diff" is cited. - Mention "Pass NULL instead of &the_index to convert_to_git()" apply.c | 41 - t/t4124-apply-ws-rule.sh | 33 +++-- 2 files changed, 63 insertions(+), 11 deletions(-) diff --git a/apply.c b/apply.c index f2d599141d..66c68f193a 100644 --- a/apply.c +++ b/apply.c @@ -220,6 +220,7 @@ struct patch { unsigned int recount:1; unsigned int conflicted_threeway:1; unsigned int direct_to_threeway:1; + unsigned int crlf_in_old:1; struct fragment *fragments; char *result; size_t resultsize; @@ -1662,6 +1663,19 @@ static void check_whitespace(struct apply_state *state, record_ws_error(state, result, line + 1, len - 2, state->linenr); } +/* + * Check if the patch has context lines with CRLF or + * the patch wants to remove lines with CRLF. + */ +static void check_old_for_crlf(struct patch *patch, const char *line, int len) +{ + if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') { + patch->ws_rule |= WS_CR_AT_EOL; + patch->crlf_in_old = 1; + } +} + + /* * Parse a unified diff. 
Note that this really needs to parse each * fragment separately, since the only way to know the difference @@ -1712,11 +1726,14 @@ static int parse_fragment(struct apply_state *state, if (!deleted && !added) leading++; trailing++; + check_old_for_crlf(patch, line, len); if (!state->apply_in_reverse && state->ws_error_action == correct_ws_error) check_whitespace(state, line, len, patch->ws_rule); break; case '-': + if (!state->apply_in_reverse) + check_old_for_crlf(patch, line, len); if (state->apply_in_reverse && state->ws_error_action != nowarn_ws_error) check_whitespace(state, line, len, patch->ws_rule); @@ -1725,6 +1742,8 @@ static int parse_fragment(struct apply_state *state, trailing = 0; break; case '+': + if (state->apply_in_reverse) + check_old_for_crlf(patch, line, len); if (!state->apply_in_reverse && state->ws_error_action != nowarn_ws
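The rule for which patch lines describe the "old" side of the file can be condensed into a small self-contained sketch. It follows the v4 logic (context lines always count, '-' lines count when applying forward, '+' lines count when applying in reverse), but struct demo_patch and scan_fragment_line() are hypothetical names, and the ws_rule handling is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down version of "struct patch". */
struct demo_patch {
	unsigned int crlf_in_old:1;
};

static int ends_with_crlf(const char *line, size_t len)
{
	return len >= 2 && line[len - 1] == '\n' && line[len - 2] == '\r';
}

/*
 * Feed one fragment line, including its leading ' ', '-' or '+'.
 * Sets crlf_in_old when a line that belongs to the preimage ends in CRLF.
 */
static void scan_fragment_line(struct demo_patch *patch, const char *line,
			       size_t len, int apply_in_reverse)
{
	int old_side;

	assert(patch != NULL && line != NULL);
	if (!len)
		return;
	switch (line[0]) {
	case ' ':	/* context: present in both old and new */
		old_side = 1;
		break;
	case '-':	/* removed line: old side unless reversed */
		old_side = !apply_in_reverse;
		break;
	case '+':	/* added line: old side only when reversed */
		old_side = !!apply_in_reverse;
		break;
	default:
		old_side = 0;
	}
	if (old_side && ends_with_crlf(line, len))
		patch->crlf_in_old = 1;
}
```

A patch that only adds CRLF lines therefore leaves the flag clear when applied forward, since it says nothing about the line endings already in the file.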
[PATCH v4 1/2] convert: Add SAFE_CRLF_KEEP_CRLF
From: Torsten Bögershausen

When convert_to_git() is called, the caller may want CRLF to be kept
as CRLF (and not converted into LF).

This will be used in the next commit, when apply works with files that have
CRLF and patches are applied onto these files.

Add the new value "SAFE_CRLF_KEEP_CRLF" to safe_crlf.

Prepare convert_to_git() to be able to run the clean filter, skip
the CRLF conversion and run the ident filter.

Signed-off-by: Torsten Bögershausen
---
 convert.c | 10 ++
 convert.h |  3 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/convert.c b/convert.c
index deaf0ba7b3..040123b4fe 100644
--- a/convert.c
+++ b/convert.c
@@ -1104,10 +1104,12 @@ int convert_to_git(const struct index_state *istate,
 		src = dst->buf;
 		len = dst->len;
 	}
-	ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
-	if (ret && dst) {
-		src = dst->buf;
-		len = dst->len;
+	if (checksafe != SAFE_CRLF_KEEP_CRLF) {
+		ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
+		if (ret && dst) {
+			src = dst->buf;
+			len = dst->len;
+		}
 	}
 	return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
diff --git a/convert.h b/convert.h
index cecf59d1aa..cabd5ed6dd 100644
--- a/convert.h
+++ b/convert.h
@@ -10,7 +10,8 @@ enum safe_crlf {
 	SAFE_CRLF_FALSE = 0,
 	SAFE_CRLF_FAIL = 1,
 	SAFE_CRLF_WARN = 2,
-	SAFE_CRLF_RENORMALIZE = 3
+	SAFE_CRLF_RENORMALIZE = 3,
+	SAFE_CRLF_KEEP_CRLF = 4
 };
 
 extern enum safe_crlf safe_crlf;
-- 
2.14.0.rc1.15.gd40c2d4e85.dirty
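The effect of the new value can be illustrated with a toy conversion pipeline. This is a sketch under assumptions, not the code from convert.c: the demo_ names are hypothetical, the clean and ident filters are only marked by comments, and the CRLF stage converts unconditionally instead of consulting attributes and crlf_action as the real crlf_to_git() does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of "enum safe_crlf", same numeric values. */
enum demo_safe_crlf {
	DEMO_CRLF_FALSE = 0,
	DEMO_CRLF_KEEP_CRLF = 4
};

/* Rewrite every CRLF to LF in place and return the new length. */
static size_t demo_crlf_to_lf(char *buf, size_t len)
{
	size_t in, out = 0;

	for (in = 0; in < len; in++) {
		if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
			continue; /* drop the CR, keep the following LF */
		buf[out++] = buf[in];
	}
	return out;
}

/* Toy pipeline: only the CRLF stage depends on checksafe. */
static size_t demo_convert_to_git(char *buf, size_t len,
				  enum demo_safe_crlf checksafe)
{
	assert(buf != NULL || len == 0);
	/* (a clean filter would run here) */
	if (checksafe != DEMO_CRLF_KEEP_CRLF)
		len = demo_crlf_to_lf(buf, len);
	/* (the ident filter would run here) */
	return len;
}
```

The point of the design is that the caller, not the index, decides whether CRLF survives, while the other stages still run.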
[PATCH v3 1/2] convert: Add SAFE_CRLF_KEEP_CRLF
From: Torsten Bögershausen

When convert_to_git() is called, the caller may want CRLF to be kept
as CRLF (and not converted into LF).

This will be used in the next commit, when apply works with files that have
CRLF and patches are applied onto these files.

Add the new value "SAFE_CRLF_KEEP_CRLF" to safe_crlf.

Prepare convert_to_git() to be able to run the clean filter, skip
the CRLF conversion and run the ident filter.

Signed-off-by: Torsten Bögershausen
---
 convert.c | 10 ++
 convert.h |  3 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/convert.c b/convert.c
index deaf0ba7b3..040123b4fe 100644
--- a/convert.c
+++ b/convert.c
@@ -1104,10 +1104,12 @@ int convert_to_git(const struct index_state *istate,
 		src = dst->buf;
 		len = dst->len;
 	}
-	ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
-	if (ret && dst) {
-		src = dst->buf;
-		len = dst->len;
+	if (checksafe != SAFE_CRLF_KEEP_CRLF) {
+		ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
+		if (ret && dst) {
+			src = dst->buf;
+			len = dst->len;
+		}
 	}
 	return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
diff --git a/convert.h b/convert.h
index cecf59d1aa..cabd5ed6dd 100644
--- a/convert.h
+++ b/convert.h
@@ -10,7 +10,8 @@ enum safe_crlf {
 	SAFE_CRLF_FALSE = 0,
 	SAFE_CRLF_FAIL = 1,
 	SAFE_CRLF_WARN = 2,
-	SAFE_CRLF_RENORMALIZE = 3
+	SAFE_CRLF_RENORMALIZE = 3,
+	SAFE_CRLF_KEEP_CRLF = 4
 };
 
 extern enum safe_crlf safe_crlf;
-- 
2.14.1.145.gb3622a4ee9
[PATCH v3 2/2] File committed with CRLF should roundtrip diff and apply
From: Torsten Bögershausen

When a file had been committed with CRLF but now .gitattributes say
"* text=auto" (or core.autocrlf is true),
the following does not roundtrip, `git apply` fails:

    printf "Added line\r\n" >>file &&
    git diff >patch &&
    git checkout -- . &&
    git apply patch

Before applying the patch, the file from working tree is converted into the
index format (clean filter, CRLF conversion, ...)
Here, when committed with CRLF, the line endings should not be converted.

Note that `git apply --index` or `git apply --cached` doesn't call
convert_to_git() because the source material is already in index format.

Analyze the patch if there is a) any context line with CRLF,
or b) if any line with CRLF is to be removed.
In this case the patch file `patch` has mixed line endings, for a)
it looks like this (ignore the * at the beginning of the line):

* diff --git a/one b/one
* index 533790e..c30dea8 100644
* --- a/one
* +++ b/one
* @@ -1 +1,2 @@
*  a\r
* +b\r

And for b) it looks like this:

* diff --git a/one b/one
* index 533790e..485540d 100644
* --- a/one
* +++ b/one
* @@ -1 +1 @@
* -a\r
* +b\r

If `git apply` detects that the patch itself has CRLF, (look at the line
" a\r" or "-a\r" above), the new flag crlf_in_old is set in "struct patch"
and two things will happen:
    - read_old_data() will not convert CRLF into LF by calling
      convert_to_git(..., SAFE_CRLF_KEEP_CRLF);
    - The WS_CR_AT_EOL bit is set in the "white space rule",
      CRLF are no longer treated as white space.

Thanks to Junio C Hamano, his input became the base for the changes in t4124.
One test case is split up into 3:
    - Detect the " a\r" line in the patch
    - Detect the "-a\r" line in the patch
    - Use LF in repo and CRLF in the worktree.
Reported-by: Anthony Sottile Signed-off-by: Torsten Bögershausen --- Changes since v2: - Manually integrated all code changes from Junio (Thanks, I hope that I didn't miss something) - Having examples of "git diff" in the commit message confuses "git apply", so that all examples for git diff have a '*' at the beginnig of the line (V2 used '$' which is typically an example for a shell script) - The official version to apply the CRLF-rules without having an index is SAFE_CRLF_RENORMALIZE, that is already working today. - Now we have convert_to_git(NULL, ..., safe_crlf) with enum safe_crlf safe_crlf = patch->crlf_in_old ? SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_RENORMALIZE; apply.c | 40 +++- t/t4124-apply-ws-rule.sh | 33 +++-- 2 files changed, 62 insertions(+), 11 deletions(-) diff --git a/apply.c b/apply.c index f2d599141d..691f47c783 100644 --- a/apply.c +++ b/apply.c @@ -220,6 +220,7 @@ struct patch { unsigned int recount:1; unsigned int conflicted_threeway:1; unsigned int direct_to_threeway:1; + unsigned int crlf_in_old:1; struct fragment *fragments; char *result; size_t resultsize; @@ -1662,6 +1663,19 @@ static void check_whitespace(struct apply_state *state, record_ws_error(state, result, line + 1, len - 2, state->linenr); } +/* + * Check if the patch has context lines with CRLF or + * the patch wants to remove lines with CRLF. + */ +static void check_old_for_crlf(struct patch *patch, const char *line, int len) +{ + if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') { + patch->ws_rule |= WS_CR_AT_EOL; + patch->crlf_in_old = 1; + } +} + + /* * Parse a unified diff. 
Note that this really needs to parse each * fragment separately, since the only way to know the difference @@ -1712,11 +1726,15 @@ static int parse_fragment(struct apply_state *state, if (!deleted && !added) leading++; trailing++; + if (!state->apply_in_reverse) + check_old_for_crlf(patch, line, len); if (!state->apply_in_reverse && state->ws_error_action == correct_ws_error) check_whitespace(state, line, len, patch->ws_rule); break; case '-': + if (!state->apply_in_reverse) + check_old_for_crlf(patch, line, len); if (state->apply_in_reverse && state->ws_error_action != nowarn_ws_error) check_whitespace(state, line, len, patch->ws_rule); @@ -2268,8 +2286,11 @@ static void show_stats(struct apply_state *state, struct patch *patch) add, pluses, del, minuses); } -static int read_old_data(struct stat *st, const char *path, struct strbuf *buf) +static int read_old_data(struct stat *st, struct patch *patch, +const char *path, struct strbuf *buf) { + enum safe_crlf safe_crlf = patch->crlf_in_old ? +
[PATCH v2 1/2] convert: Add SAFE_CRLF_KEEP_CRLF
From: Torsten Bögershausen

When convert_to_git() is called, the caller may want CRLF to be kept
as CRLF (and not converted into LF).

This will be used in the next commit, when apply works with files that have
CRLF and patches are applied onto these files.

Add the new value "SAFE_CRLF_KEEP_CRLF" to safe_crlf.

Prepare convert_to_git() to be able to run the clean filter, skip
the CRLF conversion and run the ident filter.

Signed-off-by: Torsten Bögershausen
---
 convert.c | 10 ++
 convert.h |  3 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/convert.c b/convert.c
index deaf0ba7b3..040123b4fe 100644
--- a/convert.c
+++ b/convert.c
@@ -1104,10 +1104,12 @@ int convert_to_git(const struct index_state *istate,
 		src = dst->buf;
 		len = dst->len;
 	}
-	ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
-	if (ret && dst) {
-		src = dst->buf;
-		len = dst->len;
+	if (checksafe != SAFE_CRLF_KEEP_CRLF) {
+		ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
+		if (ret && dst) {
+			src = dst->buf;
+			len = dst->len;
+		}
 	}
 	return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
diff --git a/convert.h b/convert.h
index cecf59d1aa..cabd5ed6dd 100644
--- a/convert.h
+++ b/convert.h
@@ -10,7 +10,8 @@ enum safe_crlf {
 	SAFE_CRLF_FALSE = 0,
 	SAFE_CRLF_FAIL = 1,
 	SAFE_CRLF_WARN = 2,
-	SAFE_CRLF_RENORMALIZE = 3
+	SAFE_CRLF_RENORMALIZE = 3,
+	SAFE_CRLF_KEEP_CRLF = 4
 };
 
 extern enum safe_crlf safe_crlf;
-- 
2.14.1.145.gb3622a4ee9
[PATCH v2 2/2] File committed with CRLF should roundtrip diff and apply
From: Torsten Bögershausen

When a file had been committed with CRLF but now .gitattributes say
"* text=auto" (or core.autocrlf is true),
the following does not roundtrip, `git apply` fails:

    printf "Added line\r\n" >>file &&
    git diff >patch &&
    git checkout -- . &&
    git apply patch

Before applying the patch, the file from working tree is converted into the
index format (clean filter, CRLF conversion, ...)
Here, when committed with CRLF, the line endings should not be converted.

Note that `git apply --index` or `git apply --cached` doesn't call
convert_to_git() because the source material is already in index format.

Analyze the patch if there is a) any context line with CRLF,
or b) if any line with CRLF is to be removed.
In this case the patch file `patch` has mixed line endings, for a)
it looks like this (ignore the $ at the beginning of the line):

$ diff --git a/one b/one
$ index 533790e..c30dea8 100644
$ --- a/one
$ +++ b/one
$ @@ -1 +1,2 @@
$  a\r
$ +b\r

And for b) it looks like this:

$ diff --git a/one b/one
$ index 533790e..485540d 100644
$ --- a/one
$ +++ b/one
$ @@ -1 +1 @@
$ -a\r
$ +b\r

If `git apply` detects that the patch itself has CRLF, (look at the line
" a\r" or "-a\r" above), the new flag has_crlf is set in "struct patch"
and two things will happen:
    - read_old_data() will not convert CRLF into LF by calling
      convert_to_git(..., SAFE_CRLF_KEEP_CRLF);
    - The WS_CR_AT_EOL bit is set in the "white space rule",
      CRLF are no longer treated as white space.

Thanks to Junio C Hamano, his input became the base for the changes in t4124.
One test case is split up into 3:
    - Detect the " a\r" line in the patch
    - Detect the "-a\r" line in the patch
    - Use LF in repo and CRLF in the worktree. (*)

* This one proves that convert_to_git(&the_index,...) still needs to pass
  the &index, otherwise Git will crash.
Reported-by: Anthony Sottile Signed-off-by: Torsten Bögershausen --- apply.c | 28 +++- t/t4124-apply-ws-rule.sh | 33 +++-- 2 files changed, 50 insertions(+), 11 deletions(-) diff --git a/apply.c b/apply.c index f2d599141d..bebb176099 100644 --- a/apply.c +++ b/apply.c @@ -220,6 +220,7 @@ struct patch { unsigned int recount:1; unsigned int conflicted_threeway:1; unsigned int direct_to_threeway:1; + unsigned int has_crlf:1; struct fragment *fragments; char *result; size_t resultsize; @@ -1662,6 +1663,17 @@ static void check_whitespace(struct apply_state *state, record_ws_error(state, result, line + 1, len - 2, state->linenr); } +/* Check if the patch has context lines with CRLF or + the patch wants to remove lines with CRLF */ +static void check_old_for_crlf(struct patch *patch, const char *line, int len) +{ + if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') { + patch->ws_rule |= WS_CR_AT_EOL; + patch->has_crlf = 1; + } +} + + /* * Parse a unified diff. Note that this really needs to parse each * fragment separately, since the only way to know the difference @@ -1712,11 +1724,13 @@ static int parse_fragment(struct apply_state *state, if (!deleted && !added) leading++; trailing++; + check_old_for_crlf(patch, line, len); if (!state->apply_in_reverse && state->ws_error_action == correct_ws_error) check_whitespace(state, line, len, patch->ws_rule); break; case '-': + check_old_for_crlf(patch, line, len); if (state->apply_in_reverse && state->ws_error_action != nowarn_ws_error) check_whitespace(state, line, len, patch->ws_rule); @@ -2268,8 +2282,11 @@ static void show_stats(struct apply_state *state, struct patch *patch) add, pluses, del, minuses); } -static int read_old_data(struct stat *st, const char *path, struct strbuf *buf) +static int read_old_data(struct stat *st, struct patch *patch, +const char *path, struct strbuf *buf) { + enum safe_crlf safe_crlf = patch->has_crlf ? 
+ SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_FALSE; switch (st->st_mode & S_IFMT) { case S_IFLNK: if (strbuf_readlink(buf, path, st->st_size) < 0) @@ -2278,7 +2295,7 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf) case S_IFREG: if (strbuf_read_file(buf, path, st->st_size) != st->st_size) return error(_("unable to open or read %s"), path); - convert_to_git(&the_index, path, buf->buf, buf->len, buf, 0); + convert_to_git(&the_index, path, buf->buf, buf->len, buf, safe_crlf); return 0; default:
[PATCH/RFC 2/2] File commited with CRLF should roundtrip diff and apply
From: Torsten Bögershausen When a file had been commited with CRLF and core.autocrlf is true, the following does not roundtrip, `git apply` fails: printf "Added line\r\n" >>file && git diff >patch && git checkout -- . && git apply patch Before applying the patch, the file from working tree is converted into the index format (clean filter, CRLF conversion, ...) Here, when commited with CRLF, the line endings should not be converted. Analyze the patch if there is any context line with CRLF, or if any line with CRLF is to be removed. If yes, the new flag has_crlf is set in "struct patch", and two things will happen: - read_old_data() will not convert CRLF into LF by calling convert_to_git(..., SAFE_CRLF_KEEP_CRLF); - The WS_CR_AT_EOL bit is set in the "white space rule", CRLF are no longer treated as white space. Thanks to Junio C Hamano, his input became the base for t4140. Reported-by: Anthony Sottile Signed-off-by: Torsten Bögershausen --- The last version did not pass t4124, fix this. apply.c | 37 - apply.h | 4 t/t4124-apply-ws-rule.sh | 3 +-- t/t4140-apply-CRLF.sh| 46 ++ 4 files changed, 79 insertions(+), 11 deletions(-) create mode 100755 t/t4140-apply-CRLF.sh diff --git a/apply.c b/apply.c index f2d599141d..63455cd65f 100644 --- a/apply.c +++ b/apply.c @@ -220,6 +220,7 @@ struct patch { unsigned int recount:1; unsigned int conflicted_threeway:1; unsigned int direct_to_threeway:1; + unsigned int has_crlf:1; struct fragment *fragments; char *result; size_t resultsize; @@ -1662,6 +1663,17 @@ static void check_whitespace(struct apply_state *state, record_ws_error(state, result, line + 1, len - 2, state->linenr); } +/* Check if the patch has context lines with CRLF or + the patch wants to remove lines with CRLF */ +static void check_old_for_crlf(struct patch *patch, const char *line, int len) +{ + if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') { + patch->ws_rule |= WS_CR_AT_EOL; + patch->has_crlf = 1; + } +} + + /* * Parse a unified diff. 
Note that this really needs to parse each * fragment separately, since the only way to know the difference @@ -1712,11 +1724,13 @@ static int parse_fragment(struct apply_state *state, if (!deleted && !added) leading++; trailing++; + check_old_for_crlf(patch, line, len); if (!state->apply_in_reverse && state->ws_error_action == correct_ws_error) check_whitespace(state, line, len, patch->ws_rule); break; case '-': + check_old_for_crlf(patch, line, len); if (state->apply_in_reverse && state->ws_error_action != nowarn_ws_error) check_whitespace(state, line, len, patch->ws_rule); @@ -2268,8 +2282,10 @@ static void show_stats(struct apply_state *state, struct patch *patch) add, pluses, del, minuses); } -static int read_old_data(struct stat *st, const char *path, struct strbuf *buf) +static int read_old_data(struct stat *st, const char *path, struct strbuf *buf, int flags) { + enum safe_crlf safe_crlf = flags & APPLY_FLAGS_CR_AT_EOL ? + SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_FALSE; switch (st->st_mode & S_IFMT) { case S_IFLNK: if (strbuf_readlink(buf, path, st->st_size) < 0) @@ -2278,7 +2294,7 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf) case S_IFREG: if (strbuf_read_file(buf, path, st->st_size) != st->st_size) return error(_("unable to open or read %s"), path); - convert_to_git(&the_index, path, buf->buf, buf->len, buf, 0); + convert_to_git(&the_index, path, buf->buf, buf->len, buf, safe_crlf); return 0; default: return -1; @@ -3385,7 +3401,8 @@ static int load_patch_target(struct apply_state *state, const struct cache_entry *ce, struct stat *st, const char *name, -unsigned expected_mode) +unsigned expected_mode, +int flags) { if (state->cached || state->check_index) { if (read_file_or_gitlink(ce, buf)) @@ -3399,7 +3416,7 @@ static int load_patch_target(struct apply_state *state, } else if (has_symlink_leading_path(name, strlen(name))) { return error(_("reading from '%s' beyond a symbolic link"), name); } else { -
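The CRLF detection that the commit message describes can be looked at in isolation. The following is a minimal sketch, not code from apply.c: the helper name `line_ends_with_crlf` is hypothetical, and it only assumes that `len` counts the terminating newline, as in the patch.

```c
#include <stddef.h>

/*
 * Hypothetical standalone helper (not part of apply.c): return 1 when the
 * line of length len, including its terminating '\n', ends in "\r\n".
 */
int line_ends_with_crlf(const char *line, size_t len)
{
	return len >= 2 && line[len - 1] == '\n' && line[len - 2] == '\r';
}
```

In the patch, the same test is applied only to context lines and to lines that are removed, because those are the lines that must match the existing file content.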
[PATCH/RFC 1/2] convert: Add SAFE_CRLF_KEEP_CRLF
From: Torsten Bögershausen

When convert_to_git() is called, the caller may want CRLF to be kept as CRLF (and not converted into LF).

This will be used in the next commit, when apply works with files that have CRLF and patches are applied onto these files.

Add the new value "SAFE_CRLF_KEEP_CRLF" to safe_crlf.

Prepare convert_to_git() to be able to run the clean filter, skip the CRLF conversion and run the ident filter.

Signed-off-by: Torsten Bögershausen
---
 convert.c | 10 ++
 convert.h | 3 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/convert.c b/convert.c
index deaf0ba7b3..040123b4fe 100644
--- a/convert.c
+++ b/convert.c
@@ -1104,10 +1104,12 @@ int convert_to_git(const struct index_state *istate,
 		src = dst->buf;
 		len = dst->len;
 	}
-	ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
-	if (ret && dst) {
-		src = dst->buf;
-		len = dst->len;
+	if (checksafe != SAFE_CRLF_KEEP_CRLF) {
+		ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
+		if (ret && dst) {
+			src = dst->buf;
+			len = dst->len;
+		}
 	}
 	return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
diff --git a/convert.h b/convert.h
index cecf59d1aa..cabd5ed6dd 100644
--- a/convert.h
+++ b/convert.h
@@ -10,7 +10,8 @@ enum safe_crlf {
 	SAFE_CRLF_FALSE = 0,
 	SAFE_CRLF_FAIL = 1,
 	SAFE_CRLF_WARN = 2,
-	SAFE_CRLF_RENORMALIZE = 3
+	SAFE_CRLF_RENORMALIZE = 3,
+	SAFE_CRLF_KEEP_CRLF = 4
 };
 
 extern enum safe_crlf safe_crlf;
-- 
2.14.1.145.gb3622a4ee9
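To illustrate the effect of the new value, here is a minimal sketch of a conversion step that can be switched off by the caller. It is not the implementation of crlf_to_git(), which also takes attributes and content heuristics into account; the names `crlf_mode`, `CRLF_CONVERT`, `CRLF_KEEP` and `to_index_format` are hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two relevant cases of git's enum safe_crlf. */
enum crlf_mode {
	CRLF_CONVERT = 0,	/* convert CRLF to LF */
	CRLF_KEEP = 4		/* mirrors SAFE_CRLF_KEEP_CRLF: leave CRLF untouched */
};

/*
 * Copy src to dst, turning each "\r\n" into "\n" unless mode is CRLF_KEEP.
 * dst must have room for len bytes. Returns the number of bytes written.
 */
size_t to_index_format(const char *src, size_t len, char *dst,
		       enum crlf_mode mode)
{
	size_t i, out = 0;

	for (i = 0; i < len; i++) {
		if (mode != CRLF_KEEP && src[i] == '\r' &&
		    i + 1 < len && src[i + 1] == '\n')
			continue;	/* drop the CR; the LF is copied next */
		dst[out++] = src[i];
	}
	return out;
}
```

With CRLF_KEEP the buffer passes through unchanged, which is what a caller such as apply needs when the blob in the index already contains CRLF.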
[PATCH/RFC] File commited with CRLF should roundtrip diff and apply
From: Torsten Bögershausen When a file had been commited with CRLF and core.autocrlf is true, the following does not roundtrip, `git apply` fails: printf "Added line\r\n" >>file && git diff >patch && git checkout -- . && git apply patch Before applying the patch, the file from working tree is converted into the index format (clean filter, CRLF conversion, ...) Here, when commited with CRLF, the line endings should not be converted. Analyze the patch if there is any context line with CRLF, or if any line with CRLF is to be removed. If yes, the new flag has_crlf is set in "struct patch", and two things will happen: - read_old_data() will not convert CRLF into LF by calling convert_to_git(..., SAFE_CRLF_KEEP_CRLF); - The WS_CR_AT_EOL bit is set in the "white space rule", CRLF are no longer treated as white space. Thanks to Junio C Hamano, his input became the base for t4140. Reported-by: Anthony Sottile Signed-off-by: Torsten Bögershausen --- apply.c | 37 - apply.h | 4 t/t4140-apply-CRLF.sh | 46 ++ 3 files changed, 78 insertions(+), 9 deletions(-) create mode 100755 t/t4140-apply-CRLF.sh diff --git a/apply.c b/apply.c index f2d599141d..63455cd65f 100644 --- a/apply.c +++ b/apply.c @@ -220,6 +220,7 @@ struct patch { unsigned int recount:1; unsigned int conflicted_threeway:1; unsigned int direct_to_threeway:1; + unsigned int has_crlf:1; struct fragment *fragments; char *result; size_t resultsize; @@ -1662,6 +1663,17 @@ static void check_whitespace(struct apply_state *state, record_ws_error(state, result, line + 1, len - 2, state->linenr); } +/* Check if the patch has context lines with CRLF or + the patch wants to remove lines with CRLF */ +static void check_old_for_crlf(struct patch *patch, const char *line, int len) +{ + if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') { + patch->ws_rule |= WS_CR_AT_EOL; + patch->has_crlf = 1; + } +} + + /* * Parse a unified diff. 
Note that this really needs to parse each * fragment separately, since the only way to know the difference @@ -1712,11 +1724,13 @@ static int parse_fragment(struct apply_state *state, if (!deleted && !added) leading++; trailing++; + check_old_for_crlf(patch, line, len); if (!state->apply_in_reverse && state->ws_error_action == correct_ws_error) check_whitespace(state, line, len, patch->ws_rule); break; case '-': + check_old_for_crlf(patch, line, len); if (state->apply_in_reverse && state->ws_error_action != nowarn_ws_error) check_whitespace(state, line, len, patch->ws_rule); @@ -2268,8 +2282,10 @@ static void show_stats(struct apply_state *state, struct patch *patch) add, pluses, del, minuses); } -static int read_old_data(struct stat *st, const char *path, struct strbuf *buf) +static int read_old_data(struct stat *st, const char *path, struct strbuf *buf, int flags) { + enum safe_crlf safe_crlf = flags & APPLY_FLAGS_CR_AT_EOL ? + SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_FALSE; switch (st->st_mode & S_IFMT) { case S_IFLNK: if (strbuf_readlink(buf, path, st->st_size) < 0) @@ -2278,7 +2294,7 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf) case S_IFREG: if (strbuf_read_file(buf, path, st->st_size) != st->st_size) return error(_("unable to open or read %s"), path); - convert_to_git(&the_index, path, buf->buf, buf->len, buf, 0); + convert_to_git(&the_index, path, buf->buf, buf->len, buf, safe_crlf); return 0; default: return -1; @@ -3385,7 +3401,8 @@ static int load_patch_target(struct apply_state *state, const struct cache_entry *ce, struct stat *st, const char *name, -unsigned expected_mode) +unsigned expected_mode, +int flags) { if (state->cached || state->check_index) { if (read_file_or_gitlink(ce, buf)) @@ -3399,7 +3416,7 @@ static int load_patch_target(struct apply_state *state, } else if (has_symlink_leading_path(name, strlen(name))) { return error(_("reading from '%s' beyond a symbolic link"), name); } else { - if (read_old_data(st, 
name, buf)) + if (read_old_dat
[PATCH v1 1/1] correct apply for files commited with CRLF
From: Torsten Bögershausen

git apply does not find the source lines when files have CRLF in the index and core.autocrlf is true. These files should not get the CRLF converted to LF. Because cmd_apply() does not load the index, this does not work: CRLF is converted into LF and apply fails.

Fix this in the spirit of commit a08feb8ef0b6, "correct blame for files commited with CRLF", by loading the index.

As an optimization, skip read_cache() when no conversion is specified for this path.

Reported-by: Anthony Sottile
Signed-off-by: Torsten Bögershausen
---
 apply.c | 2 ++
 t/t0020-crlf.sh | 12 
 2 files changed, 14 insertions(+)

diff --git a/apply.c b/apply.c
index f2d599141d..66b8387360 100644
--- a/apply.c
+++ b/apply.c
@@ -2278,6 +2278,8 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
 	case S_IFREG:
 		if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
 			return error(_("unable to open or read %s"), path);
+		if (would_convert_to_git(&the_index, path))
+			read_cache();
 		convert_to_git(&the_index, path, buf->buf, buf->len, buf, 0);
 		return 0;
 	default:
diff --git a/t/t0020-crlf.sh b/t/t0020-crlf.sh
index 71350e0657..6611f8a6f6 100755
--- a/t/t0020-crlf.sh
+++ b/t/t0020-crlf.sh
@@ -386,4 +386,16 @@ test_expect_success 'New CRLF file gets LF in repo' '
 	test_cmp alllf alllf2
 '
 
+test_expect_success 'CRLF in repo, apply with autocrlf=true' '
+	git config core.autocrlf false &&
+	printf "1\r\n2\r\n" >crlf &&
+	git add crlf &&
+	git commit -m "commit crlf with crlf" &&
+	git config core.autocrlf true &&
+	printf "1\r\n2\r\n\r\n\r\n\r\n" >crlf &&
+	git diff >patch &&
+	git checkout -- . &&
+	git apply patch
+'
+
 test_done
-- 
2.13.2.533.ge0aaa1b
[PATCH v3 1/1] cygwin: Allow pushing to UNC paths
From: Torsten Bögershausen

cygwin can use a UNC path like //server/share/repo

 $ cd //server/share/dir
 $ mkdir test
 $ cd test
 $ git init --bare

However, when we try to push from a local Git repository to this repo, there is a problem: Git converts the leading "//" into a single "/".

As cygwin handles a UNC path so well, Git can support them better:

- Introduce cygwin_offset_1st_component() which keeps the leading "//", similar to what Git for Windows does.
- Move CYGWIN out of the POSIX in the tests for path normalization in t0060

Signed-off-by: Torsten Bögershausen
---
I think I will skip all the changes in setup.c and cygwin_access() for the moment:
- It is not clear what is a regression and what is an improvement
- It may be a problem that could be solved in cygwin itself
- I was able to push to a UNC path on a Windows server when the domain controller was reachable.

 compat/cygwin.c | 19 +++
 compat/cygwin.h | 2 ++
 config.mak.uname | 1 +
 git-compat-util.h | 3 +++
 t/t0060-path-utils.sh | 2 ++
 5 files changed, 27 insertions(+)
 create mode 100644 compat/cygwin.c
 create mode 100644 compat/cygwin.h

diff --git a/compat/cygwin.c b/compat/cygwin.c
new file mode 100644
index 000..b9862d6
--- /dev/null
+++ b/compat/cygwin.c
@@ -0,0 +1,19 @@
+#include "../git-compat-util.h"
+#include "../cache.h"
+
+int cygwin_offset_1st_component(const char *path)
+{
+	const char *pos = path;
+	/* unc paths */
+	if (is_dir_sep(pos[0]) && is_dir_sep(pos[1])) {
+		/* skip server name */
+		pos = strchr(pos + 2, '/');
+		if (!pos)
+			return 0; /* Error: malformed unc path */
+
+		do {
+			pos++;
+		} while (*pos && pos[0] != '/');
+	}
+	return pos + is_dir_sep(*pos) - path;
+}
diff --git a/compat/cygwin.h b/compat/cygwin.h
new file mode 100644
index 000..8e52de4
--- /dev/null
+++ b/compat/cygwin.h
@@ -0,0 +1,2 @@
+int cygwin_offset_1st_component(const char *path);
+#define offset_1st_component cygwin_offset_1st_component
diff --git a/config.mak.uname b/config.mak.uname
index adfb90b..551e465 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -184,6 +184,7 @@ ifeq ($(uname_O),Cygwin)
 	UNRELIABLE_FSTAT = UnfortunatelyYes
 	SPARSE_FLAGS = -isystem /usr/include/w32api -Wno-one-bit-signed-bitfield
 	OBJECT_CREATION_USES_RENAMES = UnfortunatelyNeedsTo
+	COMPAT_OBJS += compat/cygwin.o
 endif
 ifeq ($(uname_S),FreeBSD)
 	NEEDS_LIBICONV = YesPlease
diff --git a/git-compat-util.h b/git-compat-util.h
index 047172d..db9c22d 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -189,6 +189,9 @@
 #include
 #endif
+#if defined(__CYGWIN__)
+#include "compat/cygwin.h"
+#endif
 #if defined(__MINGW32__)
 /* pull in Windows compatibility stuff */
 #include "compat/mingw.h"
diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh
index 444b5a4..7ea2bb5 100755
--- a/t/t0060-path-utils.sh
+++ b/t/t0060-path-utils.sh
@@ -70,6 +70,8 @@ ancestor() {
 case $(uname -s) in
 *MINGW*)
 	;;
+*CYGWIN*)
+	;;
 *)
 	test_set_prereq POSIX
 	;;
-- 
2.10.0
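To make the offset computation easier to try outside of Git, here is a standalone restatement of the idea. It is only a sketch: the name `unc_offset_1st_component` is hypothetical, and it assumes that '/' is the only directory separator, whereas the patch uses Git's is_dir_sep().

```c
#include <string.h>

/*
 * Standalone sketch of the idea behind cygwin_offset_1st_component().
 * Assumption: only '/' is treated as a directory separator here.
 * Returns the offset of the first path component after the root,
 * where the root of a UNC path is "//server/share/".
 */
int unc_offset_1st_component(const char *path)
{
	const char *pos = path;

	if (pos[0] == '/' && pos[1] == '/') {
		/* skip the server name */
		pos = strchr(pos + 2, '/');
		if (!pos)
			return 0;	/* malformed UNC path: no share name */
		/* skip the share name */
		do {
			pos++;
		} while (*pos && *pos != '/');
	}
	/* step over one separator, if there is one */
	return (int)(pos - path) + (*pos == '/');
}
```

For "//server/share/repo" the whole "//server/share/" prefix counts as the root, so path normalization no longer collapses the leading "//".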
[PATCH v2 2/2] cygwin: Allow pushing to UNC paths
From: Torsten Bögershausen

cygwin can use a UNC path like //server/share/repo

 $ cd //server/share/dir
 $ mkdir test
 $ cd test
 $ git init --bare

However, when we try to push from a local Git repository to this repo, there is a problem: Git converts the leading "//" into a single "/".

As cygwin handles a UNC path so well, Git can support them better:

- Introduce cygwin_offset_1st_component() which keeps the leading "//", similar to what Git for Windows does.
- Move CYGWIN out of the POSIX in the tests for path normalization in t0060.
---
 config.mak.uname | 1 +
 git-compat-util.h | 3 +++
 t/t0060-path-utils.sh | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/config.mak.uname b/config.mak.uname
index adfb90b..551e465 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -184,6 +184,7 @@ ifeq ($(uname_O),Cygwin)
 	UNRELIABLE_FSTAT = UnfortunatelyYes
 	SPARSE_FLAGS = -isystem /usr/include/w32api -Wno-one-bit-signed-bitfield
 	OBJECT_CREATION_USES_RENAMES = UnfortunatelyNeedsTo
+	COMPAT_OBJS += compat/cygwin.o
 endif
 ifeq ($(uname_S),FreeBSD)
 	NEEDS_LIBICONV = YesPlease
diff --git a/git-compat-util.h b/git-compat-util.h
index 047172d..db9c22d 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -189,6 +189,9 @@
 #include
 #endif
+#if defined(__CYGWIN__)
+#include "compat/cygwin.h"
+#endif
 #if defined(__MINGW32__)
 /* pull in Windows compatibility stuff */
 #include "compat/mingw.h"
diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh
index 444b5a4..7ea2bb5 100755
--- a/t/t0060-path-utils.sh
+++ b/t/t0060-path-utils.sh
@@ -70,6 +70,8 @@ ancestor() {
 case $(uname -s) in
 *MINGW*)
 	;;
+*CYGWIN*)
+	;;
 *)
 	test_set_prereq POSIX
 	;;
-- 
2.10.0
[PATCH v2 1/2] Check DB_ENVIRONMENT using is_directory()
From: Torsten Bögershausen

In setup.c, is_git_directory() checks a Git directory using access(X_OK). This does not check whether the path is a file or a directory. Check the path with is_directory() instead.
---
After all the discussions (and lots of tests) I found that this patch works for my setup. All in all, the error reporting of is_git_directory() could be improved, as there may be "access denied", "not a directory", or other errors, but that is for another day.

 setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/setup.c b/setup.c
index 358fbc2..5a7ee2e 100644
--- a/setup.c
+++ b/setup.c
@@ -321,7 +321,7 @@ int is_git_directory(const char *suspect)
 	/* Check non-worktree-related signatures */
 	if (getenv(DB_ENVIRONMENT)) {
-		if (access(getenv(DB_ENVIRONMENT), X_OK))
+		if (!is_directory(getenv(DB_ENVIRONMENT)))
 			goto done;
 	}
 	else {
-- 
2.10.0
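The difference between the two checks is that access(path, X_OK) also succeeds for an executable regular file, while a stat()-based test looks at the file type. A minimal sketch of such a test follows; the name `path_is_directory` is hypothetical and the function is only written in the spirit of Git's is_directory(), not copied from it.

```c
#include <sys/stat.h>

/*
 * Hypothetical helper in the spirit of git's is_directory():
 * return 1 only if path exists and refers to a directory, else 0.
 */
int path_is_directory(const char *path)
{
	struct stat st;

	return !stat(path, &st) && S_ISDIR(st.st_mode);
}
```

Because the check does not depend on the execute bit, it also avoids relying on permission bits that may not be reported reliably on network shares.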
[PATCH/RFC v1 1/1] cygwin: Allow pushing to UNC paths
From: Torsten Bögershausen cygwin can use an UNC path like //server/share/repo $ cd //server/share/dir $ mkdir test $ cd test $ git init --bare However, when we try to push from a local Git repository to this repo, there are 2 problems: - Git converts the leading "//" into a single "/". - The remote repo is not accepted because setup.c calls access(getenv(DB_ENVIRONMENT), X_OK) and this call fails. In other words, checking the executable bit of a directory mounted on a SAMBA share is not reliable (and not needed). As cygwin handles an UNC path so well, Git can support them better. - Introduce cygwin_offset_1st_component() which keeps the leading "//", similar to what Git for Windows does. - Move CYGWIN out of the POSIX in the tests for path normalization in t0060. - Use cygwin_access() with a relaxed test for the executable bit on a directory pointed out by an UNC path. Signed-off-by: Torsten Bögershausen --- compat/cygwin.c | 29 + compat/cygwin.h | 7 +++ config.mak.uname | 1 + git-compat-util.h | 3 +++ t/t0060-path-utils.sh | 2 ++ 5 files changed, 42 insertions(+) create mode 100644 compat/cygwin.c create mode 100644 compat/cygwin.h diff --git a/compat/cygwin.c b/compat/cygwin.c new file mode 100644 index 000..d98e877 --- /dev/null +++ b/compat/cygwin.c @@ -0,0 +1,29 @@ +#include "../git-compat-util.h" +#include "../cache.h" + +int cygwin_offset_1st_component(const char *path) +{ + const char *pos = path; + /* unc paths */ + if (is_dir_sep(pos[0]) && is_dir_sep(pos[1])) { + /* skip server name */ + pos = strchr(pos + 2, '/'); + if (!pos) + return 0; /* Error: malformed unc path */ + + do { + pos++; + } while (*pos && pos[0] != '/'); + } + return pos + is_dir_sep(*pos) - path; +} + +#undef access +int cygwin_access(const char *filename, int mode) +{ + /* the execute bit does not work on SAMBA drives */ + if (filename[0] == '/' && filename[1] == '/' ) + return access(filename, mode & ~X_OK); + else + return access(filename, mode); +} diff --git a/compat/cygwin.h 
b/compat/cygwin.h new file mode 100644 index 000..efa12ad --- /dev/null +++ b/compat/cygwin.h @@ -0,0 +1,7 @@ +int cygwin_access(const char *filename, int mode); +#undef access +#define access cygwin_access + + +int cygwin_offset_1st_component(const char *path); +#define offset_1st_component cygwin_offset_1st_component diff --git a/config.mak.uname b/config.mak.uname index adfb90b..551e465 100644 --- a/config.mak.uname +++ b/config.mak.uname @@ -184,6 +184,7 @@ ifeq ($(uname_O),Cygwin) UNRELIABLE_FSTAT = UnfortunatelyYes SPARSE_FLAGS = -isystem /usr/include/w32api -Wno-one-bit-signed-bitfield OBJECT_CREATION_USES_RENAMES = UnfortunatelyNeedsTo + COMPAT_OBJS += compat/cygwin.o endif ifeq ($(uname_S),FreeBSD) NEEDS_LIBICONV = YesPlease diff --git a/git-compat-util.h b/git-compat-util.h index 047172d..db9c22d 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -189,6 +189,9 @@ #include #endif +#if defined(__CYGWIN__) +#include "compat/cygwin.h" +#endif #if defined(__MINGW32__) /* pull in Windows compatibility stuff */ #include "compat/mingw.h" diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh index 444b5a4..7ea2bb5 100755 --- a/t/t0060-path-utils.sh +++ b/t/t0060-path-utils.sh @@ -70,6 +70,8 @@ ancestor() { case $(uname -s) in *MINGW*) ;; +*CYGWIN*) + ;; *) test_set_prereq POSIX ;; -- 2.10.0
[PATCH v3 1/1] t0027: tests are not expensive; remove t0025
From: Torsten Bögershausen The purpose of t0027 is to test all CRLF related conversions at "git checkout" and "git add". Running t0027 under Git for Windows takes 3-4 minutes, so the whole script had been marked as "EXPENSIVE". The source code for "Git for Windows" overrides this since 2014: "t0027 is marked expensive, but really, for MinGW we want to run these tests always." Recent "stress" tests show that t0025 if flaky, reported by Lars Schneider, larsxschnei...@gmail.com All tests in t0025 are covered by t0027 already, so that t0025 can be retired. t0027 takes less than 14 seconds under Linux, and 63 seconds under Mac Os X, and this is more or less the same with a SSD or a spinning disk. Acked-by: Johannes Schindelin Signed-off-by: Torsten Bögershausen --- t/t0025-crlf-auto.sh | 181 --- t/t0027-auto-crlf.sh | 6 -- 2 files changed, 187 deletions(-) delete mode 100755 t/t0025-crlf-auto.sh diff --git a/t/t0025-crlf-auto.sh b/t/t0025-crlf-auto.sh deleted file mode 100755 index 89826c5..000 --- a/t/t0025-crlf-auto.sh +++ /dev/null @@ -1,181 +0,0 @@ -#!/bin/sh - -test_description='CRLF conversion' - -. ./test-lib.sh - -has_cr() { - tr '\015' Q <"$1" | grep Q >/dev/null -} - -test_expect_success setup ' - - git config core.autocrlf false && - - for w in Hello world how are you; do echo $w; done >LFonly && - for w in I am very very fine thank you; do echo ${w}Q; done | q_to_cr >CRLFonly && - for w in Oh here is a QNUL byte how alarming; do echo ${w}; done | q_to_nul >LFwithNUL && - git add . && - - git commit -m initial && - - LFonly=$(git rev-parse HEAD:LFonly) && - CRLFonly=$(git rev-parse HEAD:CRLFonly) && - LFwithNUL=$(git rev-parse HEAD:LFwithNUL) && - - echo happy. -' - -test_expect_success 'default settings cause no changes' ' - - rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL && - git read-tree --reset -u HEAD && - - ! 
has_cr LFonly && - has_cr CRLFonly && - LFonlydiff=$(git diff LFonly) && - CRLFonlydiff=$(git diff CRLFonly) && - LFwithNULdiff=$(git diff LFwithNUL) && - test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff" -' - -test_expect_success 'crlf=true causes a CRLF file to be normalized' ' - - # Backwards compatibility check - rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL && - echo "CRLFonly crlf" > .gitattributes && - git read-tree --reset -u HEAD && - - # Note, "normalized" means that git will normalize it if added - has_cr CRLFonly && - CRLFonlydiff=$(git diff CRLFonly) && - test -n "$CRLFonlydiff" -' - -test_expect_success 'text=true causes a CRLF file to be normalized' ' - - rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL && - echo "CRLFonly text" > .gitattributes && - git read-tree --reset -u HEAD && - - # Note, "normalized" means that git will normalize it if added - has_cr CRLFonly && - CRLFonlydiff=$(git diff CRLFonly) && - test -n "$CRLFonlydiff" -' - -test_expect_success 'eol=crlf gives a normalized file CRLFs with autocrlf=false' ' - - rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL && - git config core.autocrlf false && - echo "LFonly eol=crlf" > .gitattributes && - git read-tree --reset -u HEAD && - - has_cr LFonly && - LFonlydiff=$(git diff LFonly) && - test -z "$LFonlydiff" -' - -test_expect_success 'eol=crlf gives a normalized file CRLFs with autocrlf=input' ' - - rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL && - git config core.autocrlf input && - echo "LFonly eol=crlf" > .gitattributes && - git read-tree --reset -u HEAD && - - has_cr LFonly && - LFonlydiff=$(git diff LFonly) && - test -z "$LFonlydiff" -' - -test_expect_success 'eol=lf gives a normalized file LFs with autocrlf=true' ' - - rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL && - git config core.autocrlf true && - echo "LFonly eol=lf" > .gitattributes && - git read-tree --reset -u HEAD && - - ! 
has_cr LFonly && - LFonlydiff=$(git diff LFonly) && - test -z "$LFonlydiff" -' - -test_expect_success 'autocrlf=true does not normalize CRLF files' ' - - rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL && - git config core.autocrlf true && - git read-tree --reset -u HEAD && - - has_cr LFonly && - has_cr CRLFonly && - LFonlydiff=$(git diff LFonly) && - CRLFonlydiff=$(git diff CRLFonly) && - LFwithNULdiff=$(git diff LFwithNUL) && - test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff" -' - -test_expect_success 'text=auto, autocrlf=true does not normalize CRLF files' ' - - rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL && - git config core.autocrlf true && - echo "* text=auto" > .gitattributes && - gi
[PATCH v2 1/1] t0027: tests are not expensive; remove t0025
From: Torsten Bögershausen The purpose of t0027 is to test all CRLF related conversions at "git checkout" and "git add". Running t0027 under Git for Windows takes 3-4 minutes, so the whole script had been marked as "EXPENSIVE". The source code for "Git for Windows" overrides this since 2014: "t0027 is marked expensive, but really, for MinGW we want to run these tests always." Recent "stress" tests show that t0025 if flaky, reported by Lars Schneider, larsxschnei...@gmail.com All tests from t0025 are covered in t0027 already, so that t0025 can be retiered: The execution time for t0027 is 14 seconds under Linux, and 63 seconds under Mac Os X. And in case you ask, things are not going significantly faster using a SSD instead of a spinning disk. Signed-off-by: Torsten Bögershausen --- t/t0025-crlf-auto.sh | 181 --- t/t0027-auto-crlf.sh | 6 -- 2 files changed, 187 deletions(-) delete mode 100755 t/t0025-crlf-auto.sh diff --git a/t/t0025-crlf-auto.sh b/t/t0025-crlf-auto.sh deleted file mode 100755 index 89826c5..000 --- a/t/t0025-crlf-auto.sh +++ /dev/null @@ -1,181 +0,0 @@ -#!/bin/sh - -test_description='CRLF conversion' - -. ./test-lib.sh - -has_cr() { - tr '\015' Q <"$1" | grep Q >/dev/null -} - -test_expect_success setup ' - - git config core.autocrlf false && - - for w in Hello world how are you; do echo $w; done >LFonly && - for w in I am very very fine thank you; do echo ${w}Q; done | q_to_cr >CRLFonly && - for w in Oh here is a QNUL byte how alarming; do echo ${w}; done | q_to_nul >LFwithNUL && - git add . && - - git commit -m initial && - - LFonly=$(git rev-parse HEAD:LFonly) && - CRLFonly=$(git rev-parse HEAD:CRLFonly) && - LFwithNUL=$(git rev-parse HEAD:LFwithNUL) && - - echo happy. -' - -test_expect_success 'default settings cause no changes' ' - - rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL && - git read-tree --reset -u HEAD && - - ! 
has_cr LFonly && - has_cr CRLFonly && - LFonlydiff=$(git diff LFonly) && - CRLFonlydiff=$(git diff CRLFonly) && - LFwithNULdiff=$(git diff LFwithNUL) && - test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff" -' - -test_expect_success 'crlf=true causes a CRLF file to be normalized' ' - - # Backwards compatibility check - rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL && - echo "CRLFonly crlf" > .gitattributes && - git read-tree --reset -u HEAD && - - # Note, "normalized" means that git will normalize it if added - has_cr CRLFonly && - CRLFonlydiff=$(git diff CRLFonly) && - test -n "$CRLFonlydiff" -' - -test_expect_success 'text=true causes a CRLF file to be normalized' ' - - rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL && - echo "CRLFonly text" > .gitattributes && - git read-tree --reset -u HEAD && - - # Note, "normalized" means that git will normalize it if added - has_cr CRLFonly && - CRLFonlydiff=$(git diff CRLFonly) && - test -n "$CRLFonlydiff" -' - -test_expect_success 'eol=crlf gives a normalized file CRLFs with autocrlf=false' ' - - rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL && - git config core.autocrlf false && - echo "LFonly eol=crlf" > .gitattributes && - git read-tree --reset -u HEAD && - - has_cr LFonly && - LFonlydiff=$(git diff LFonly) && - test -z "$LFonlydiff" -' - -test_expect_success 'eol=crlf gives a normalized file CRLFs with autocrlf=input' ' - - rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL && - git config core.autocrlf input && - echo "LFonly eol=crlf" > .gitattributes && - git read-tree --reset -u HEAD && - - has_cr LFonly && - LFonlydiff=$(git diff LFonly) && - test -z "$LFonlydiff" -' - -test_expect_success 'eol=lf gives a normalized file LFs with autocrlf=true' ' - - rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL && - git config core.autocrlf true && - echo "LFonly eol=lf" > .gitattributes && - git read-tree --reset -u HEAD && - - ! 
has_cr LFonly && - LFonlydiff=$(git diff LFonly) && - test -z "$LFonlydiff" -' - -test_expect_success 'autocrlf=true does not normalize CRLF files' ' - - rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL && - git config core.autocrlf true && - git read-tree --reset -u HEAD && - - has_cr LFonly && - has_cr CRLFonly && - LFonlydiff=$(git diff LFonly) && - CRLFonlydiff=$(git diff CRLFonly) && - LFwithNULdiff=$(git diff LFwithNUL) && - test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff" -' - -test_expect_success 'text=auto, autocrlf=true does not normalize CRLF files' ' - - rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL && - git config core.autocrlf true && - echo "* text=auto" > .gitattr
[PATCH/RFC 1/1] t0027: Some tests are not expensive
From: Torsten Bögershausen

The purpose of t0027 is to test all CRLF related conversions at "git checkout" and "git add". Running t0027 under Git for Windows takes 3-4 minutes, so the whole script had been marked as "EXPENSIVE". The source code for "Git for Windows" has overridden this since 2014: "t0027 is marked expensive, but really, for MinGW we want to run these tests always."

Recent "stress" tests show that t0025 is flaky, reported by Lars Schneider, larsxschnei...@gmail.com

All tests from t0025 are covered in t0027 as well, so that t0025 can be retired later.

Split the tests in t0027 into 2 groups: expensive and not expensive. Expensive are all tests which check the CRLF conversion warnings and all tests which activate the Git internal "ident" filter. All other tests are now run on all platforms, which allows removing the flaky t0025 in the next commit.

The execution time for the non-expensive part is 6..8 seconds under Linux, and 32 seconds under Mac OS X. Running the "expensive" version roughly doubles the time. And in case you ask, things are not going significantly faster using an SSD instead of a spinning disk.

Signed-off-by: Torsten Bögershausen

PS: The removal of t0025 is not included (yet)
---
 t/t0027-auto-crlf.sh | 100 ++-
 1 file changed, 59 insertions(+), 41 deletions(-)

diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
index 90db54c..2c5aff6 100755
--- a/t/t0027-auto-crlf.sh
+++ b/t/t0027-auto-crlf.sh
@@ -4,10 +4,12 @@ test_description='CRLF conversion all combinations'

 . ./test-lib.sh

-if ! test_have_prereq EXPENSIVE
+if ! test_have_prereq EXPENSIVE && ! test_have_prereq MINGW
 then
-	skip_all="EXPENSIVE not set"
-	test_done
+	say "# EXPENSIVE or MINGW not set, skipping ident and warning tests"
+else
+	EXPENSIVE0027=t
+	export EXPENSIVE0027
 fi

 compare_files () {
@@ -95,11 +97,14 @@ commit_check_warn () {
 		git -c core.autocrlf=$crlf add $fname 2>"${pfx}_$f.err"
 	done &&
 	git commit -m "core.autocrlf $crlf" &&
-	check_warning "$lfname" ${pfx}_LF.err &&
-	check_warning "$crlfname" ${pfx}_CRLF.err &&
-	check_warning "$lfmixcrlf" ${pfx}_CRLF_mix_LF.err &&
-	check_warning "$lfmixcr" ${pfx}_LF_mix_CR.err &&
-	check_warning "$crlfnul" ${pfx}_CRLF_nul.err
+	if test "$EXPENSIVE0027" = t
+	then
+		check_warning "$lfname" ${pfx}_LF.err &&
+		check_warning "$crlfname" ${pfx}_CRLF.err &&
+		check_warning "$lfmixcrlf" ${pfx}_CRLF_mix_LF.err &&
+		check_warning "$lfmixcr" ${pfx}_LF_mix_CR.err &&
+		check_warning "$crlfnul" ${pfx}_CRLF_nul.err
+	fi
 }

 commit_chk_wrnNNO () {
@@ -122,24 +127,27 @@ commit_chk_wrnNNO () {
 		git -c core.autocrlf=$crlf add $fname 2>"${pfx}_$f.err"
 	done

-	test_expect_success "commit NNO files crlf=$crlf attr=$attr LF" '
-		check_warning "$lfwarn" ${pfx}_LF.err
-	'
-	test_expect_success "commit NNO files attr=$attr aeol=$aeol crlf=$crlf CRLF" '
-		check_warning "$crlfwarn" ${pfx}_CRLF.err
-	'
-
-	test_expect_success "commit NNO files attr=$attr aeol=$aeol crlf=$crlf CRLF_mix_LF" '
-		check_warning "$lfmixcrlf" ${pfx}_CRLF_mix_LF.err
-	'
-
-	test_expect_success "commit NNO files attr=$attr aeol=$aeol crlf=$crlf LF_mix_cr" '
-		check_warning "$lfmixcr" ${pfx}_LF_mix_CR.err
-	'
-
-	test_expect_success "commit NNO files attr=$attr aeol=$aeol crlf=$crlf CRLF_nul" '
-		check_warning "$crlfnul" ${pfx}_CRLF_nul.err
-	'
+	if test "$EXPENSIVE0027" = t
+	then
+		test_expect_success "commit NNO files crlf=$crlf attr=$attr LF" '
+			check_warning "$lfwarn" ${pfx}_LF.err
+		'
+		test_expect_success "commit NNO files attr=$attr aeol=$aeol crlf=$crlf CRLF" '
+			check_warning "$crlfwarn" ${pfx}_CRLF.err
+		'
+
+		test_expect_success "commit NNO files attr=$attr aeol=$aeol crlf=$crlf CRLF_mix_LF" '
+			check_warning "$lfmixcrlf" ${pfx}_CRLF_mix_LF.err
+		'
+
+		test_expect_success "commit NNO files attr=$attr aeol=$aeol crlf=$crlf LF_mix_cr" '
+			check_warning "$lfmixcr" ${pfx}_LF_mix_CR.err
+		'
+
+		test_expect_success "commit NNO files attr=$attr aeol=$aeol crlf=$crlf CRLF_nul" '
+			check_warning "$crlfnul" ${pfx}_CRLF_nul.err
+		'
+	fi
 }

 stats_ascii () {
@@ -250,21 +258,24 @@ checkout_files () {
 		fi
 	done

-	test_expect_success "ls-files --eol attr=$attr $ident aeol=$aeol core.autocrlf=$crlf core.eol=$ceol" '
-		test_when_finished "rm expect actual" &&
-		sort <<-EOF >expect &&
-
[PATCH v2 1/1] Document how to normalize the line endings
From: Torsten Bögershausen

The instructions how to normalize the line endings should have been updated as part of commit 6523728499e 'convert: unify the "auto" handling of CRLF', (but that part never made it into the commit).

Update the documentation in Documentation/gitattributes.txt and add a test case in t0025.

Reported by Kristian Adrup
https://github.com/git-for-windows/git/issues/954

Signed-off-by: Torsten Bögershausen
---
 Documentation/gitattributes.txt |  6 ++
 t/t0025-crlf-auto.sh            | 26 ++
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 976243a..3b76687 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -227,11 +227,9 @@ From a clean working directory:

 -
 $ echo "* text=auto" >.gitattributes
-$ rm .git/index     # Remove the index to force Git to
-$ git reset         # re-scan the working directory
+$ rm .git/index     # Remove the index to re-scan the working directory
+$ git add .
 $ git status        # Show files that will be normalized
-$ git add -u
-$ git add .gitattributes
 $ git commit -m "Introduce end-of-line normalization"
 -

diff --git a/t/t0025-crlf-auto.sh b/t/t0025-crlf-auto.sh
index d0bee08..89826c5 100755
--- a/t/t0025-crlf-auto.sh
+++ b/t/t0025-crlf-auto.sh
@@ -152,4 +152,30 @@ test_expect_success 'eol=crlf _does_ normalize binary files' '
 	test -z "$LFwithNULdiff"
 '

+test_expect_success 'prepare unnormalized' '
+	> .gitattributes &&
+	git config core.autocrlf false &&
+	printf "LINEONE\nLINETWO\r\n" >mixed &&
+	git add mixed .gitattributes &&
+	git commit -m "Add mixed" &&
+	git ls-files --eol | egrep "i/crlf" &&
+	git ls-files --eol | egrep "i/mixed"
+'
+
+test_expect_success 'normalize unnormalized' '
+	echo "* text=auto" >.gitattributes &&
+	rm .git/index &&
+	git add . &&
+	git commit -m "Introduce end-of-line normalization" &&
+	git ls-files --eol | tr "\\t" " " | sort >act &&
+cat >exp <
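The test greps the "i/" field of `git ls-files --eol` for "crlf" and "mixed". As a rough illustration of how such a field can be derived from file content, here is a minimal sketch. The function name classify_eol() is hypothetical, and the rules are a simplification that assumes NUL and lone CR mean "binary"; it is not Git's implementation and it ignores the printable/non-printable heuristic.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not Git's code: classify a buffer roughly the way
 * the "i/" and "w/" fields of `git ls-files --eol` are reported.
 */
static const char *classify_eol(const char *buf, size_t len)
{
	int has_lf = 0, has_crlf = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)buf[i];

		if (c == '\0')
			return "-text";         /* NUL: treat as binary */
		if (c == '\r') {
			if (i + 1 < len && buf[i + 1] == '\n') {
				has_crlf = 1;
				i++;            /* skip the LF of this CRLF */
			} else {
				return "-text"; /* lone CR: treat as binary */
			}
		} else if (c == '\n') {
			has_lf = 1;
		}
	}
	if (has_lf && has_crlf)
		return "mixed";
	if (has_crlf)
		return "crlf";
	if (has_lf)
		return "lf";
	return "none";
}
```

With the content written by the 'prepare unnormalized' test, "LINEONE\nLINETWO\r\n", this sketch reports "mixed", which is the state the new documentation asks the user to look for before normalizing.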
[PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
From: Junio C Hamano

git diff --quiet may take a short-cut to see if a file is changed in the working tree: Whenever the file size differs from what is recorded in the index, the file is assumed to be changed and git diff --quiet exits with code 1.

This shortcut must be suppressed whenever the line endings are converted or a filter is in use. The attributes say "* text=auto" and a file has "Hello\nWorld\n" in the index with a length of 12. The file in the working tree has "Hello\r\nWorld\r\n" with a length of 14. (Or even "Hello\r\nWorld\n"). In this case "git add" will not do any changes to the index, and "git diff --quiet" should exit 0.

Add calls to would_convert_to_git() before blindly saying that a different size means different content.

Reported-By: Mike Crowe
Signed-off-by: Torsten Bögershausen
---
This is what I can come up with, collecting all the loose ends. I'm not sure if Mike wants to have the Reported-By with a Signed-off-by? The other question is whether the commit message summarizes the discussion well enough.

 diff.c                    | 18 ++
 t/t0028-diff-converted.sh | 27 +++
 2 files changed, 41 insertions(+), 4 deletions(-)
 create mode 100755 t/t0028-diff-converted.sh

diff --git a/diff.c b/diff.c
index 051761b..c264758 100644
--- a/diff.c
+++ b/diff.c
@@ -4921,9 +4921,10 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
 	 *    differences.
 	 *
 	 * 2. At this point, the file is known to be modified,
-	 *    with the same mode and size, and the object
-	 *    name of one side is unknown. Need to inspect
-	 *    the identical contents.
+	 *    with the same mode and size, the object
+	 *    name of one side is unknown, or size comparison
+	 *    cannot be depended upon. Need to inspect the
+	 *    contents.
 	 */
 	if (!DIFF_FILE_VALID(p->one) || /* (1) */
 	    !DIFF_FILE_VALID(p->two) ||
@@ -4931,7 +4932,16 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
 	    (p->one->mode != p->two->mode) ||
 	    diff_populate_filespec(p->one, CHECK_SIZE_ONLY) ||
 	    diff_populate_filespec(p->two, CHECK_SIZE_ONLY) ||
-	    (p->one->size != p->two->size) ||
+
+	    /*
+	     * only if eol and other conversions are not involved,
+	     * we can say that two contents of different sizes
+	     * cannot be the same without checking their contents.
+	     */
+	    (!would_convert_to_git(p->one->path) &&
+	     !would_convert_to_git(p->two->path) &&
+	     (p->one->size != p->two->size)) ||
+
 	    !diff_filespec_is_identical(p->one, p->two)) /* (2) */
 		p->skip_stat_unmatch_result = 1;
 	return p->skip_stat_unmatch_result;
diff --git a/t/t0028-diff-converted.sh b/t/t0028-diff-converted.sh
new file mode 100755
index 000..3d5ab95
--- /dev/null
+++ b/t/t0028-diff-converted.sh
@@ -0,0 +1,27 @@
+#!/bin/sh
+#
+# Copyright (c) 2017 Mike Crowe
+#
+# These tests ensure that files changing line endings in the presence
+# of .gitattributes to indicate that line endings should be ignored
+# don't cause 'git diff' or 'git diff --quiet' to think that they have
+# been changed.
+
+test_description='git diff with files that require CRLF conversion'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	echo "* text=auto" >.gitattributes &&
+	printf "Hello\r\nWorld\r\n" >crlf.txt &&
+	git add .gitattributes crlf.txt &&
+	git commit -m "initial"
+'
+
+test_expect_success 'quiet diff works on file with line-ending change that has no effect on repository' '
+	printf "Hello\r\nWorld\n" >crlf.txt &&
+	git status &&
+	git diff --quiet
+'
+
+test_done
--
2.10.0
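To make the reasoning in the commit message concrete, here is a minimal, self-contained sketch of the size shortcut and why it must be skipped when a conversion applies. The names struct blob_view, same_after_crlf_to_lf() and is_changed() are hypothetical, and the would_convert flag merely stands in for what would_convert_to_git() reports. The real patch delegates the content comparison to diff_filespec_is_identical(); the CRLF-insensitive compare below is only a simplified model of it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one side of the comparison. */
struct blob_view {
	const char *data;
	size_t size;
	int would_convert; /* assumption: nonzero when eol conversion or a filter applies */
};

/* Compare two buffers while treating CRLF and LF as the same line ending. */
static int same_after_crlf_to_lf(const char *a, size_t alen,
				 const char *b, size_t blen)
{
	size_t i = 0, j = 0;

	while (i < alen && j < blen) {
		if (a[i] == '\r' && i + 1 < alen && a[i + 1] == '\n')
			i++; /* step over the CR of a CRLF pair */
		if (b[j] == '\r' && j + 1 < blen && b[j + 1] == '\n')
			j++;
		if (a[i] != b[j])
			return 0;
		i++;
		j++;
	}
	return i == alen && j == blen;
}

/* Returns 1 when the working-tree content must be reported as changed. */
static int is_changed(const struct blob_view *index,
		      const struct blob_view *worktree)
{
	/* The size shortcut is only valid when no conversion is involved. */
	if (!index->would_convert && !worktree->would_convert &&
	    index->size != worktree->size)
		return 1;
	if (index->would_convert || worktree->would_convert)
		return !same_after_crlf_to_lf(index->data, index->size,
					      worktree->data, worktree->size);
	/* No conversion and equal sizes: a plain byte compare is enough. */
	return memcmp(index->data, worktree->data, index->size) != 0;
}
```

With the 12-byte "Hello\nWorld\n" in the index and the 14-byte "Hello\r\nWorld\r\n" in the working tree, is_changed() reports no change when would_convert is set, and reports a change through the size shortcut when it is not.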
[PATCH v2 1/1] convert: git cherry-pick -Xrenormalize did not work
From: Torsten Bögershausen

Working with a repo that used to be all CRLF. At some point it was changed to all LF, with `text=auto` in .gitattributes. Trying to cherry-pick a commit from before the switchover fails:

$ git cherry-pick -Xrenormalize
fatal: CRLF would be replaced by LF in [path]

Commit 65237284 "unify the "auto" handling of CRLF" introduced a regression: Whenever crlf_action is CRLF_TEXT_XXX and not CRLF_AUTO_XXX, SAFE_CRLF_RENORMALIZE was fed into check_safe_crlf(). This is wrong because here everything other than SAFE_CRLF_WARN is treated as SAFE_CRLF_FAIL.

Call check_safe_crlf() only if checksafe is SAFE_CRLF_WARN or SAFE_CRLF_FAIL.

Reported-by: Eevee (Lexy Munroe)
Signed-off-by: Torsten Bögershausen
---
 convert.c | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/convert.c b/convert.c
index be91358..f8e4dfe 100644
--- a/convert.c
+++ b/convert.c
@@ -281,13 +281,13 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
 		/*
 		 * If the file in the index has any CR in it, do not convert.
 		 * This is the new safer autocrlf handling.
+		   - unless we want to renormalize in a merge or cherry-pick
 		 */
-		if (checksafe == SAFE_CRLF_RENORMALIZE)
-			checksafe = SAFE_CRLF_FALSE;
-		else if (has_cr_in_index(path))
+		if ((checksafe != SAFE_CRLF_RENORMALIZE) && has_cr_in_index(path))
 			convert_crlf_into_lf = 0;
 	}
-	if (checksafe && len) {
+	if ((checksafe == SAFE_CRLF_WARN ||
+	    (checksafe == SAFE_CRLF_FAIL)) && len) {
 		struct text_stat new_stats;
 		memcpy(&new_stats, &stats, sizeof(new_stats));
 		/* simulate "git add" */
--
2.10.0
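The effect of the changed guard can be seen in isolation with a small sketch. The enum values below are an assumption made for illustration (only the names appear in the patch), and runs_check_old() / runs_check_new() are hypothetical helpers that mirror the two `if` conditions from the diff.

```c
#include <stddef.h>

/* Assumed numeric values; only the names are taken from the patch. */
enum safe_crlf {
	SAFE_CRLF_FALSE = 0,
	SAFE_CRLF_FAIL = 1,
	SAFE_CRLF_WARN = 2,
	SAFE_CRLF_RENORMALIZE = 3
};

/* Old guard: any nonzero mode, including RENORMALIZE, ran the safety check. */
static int runs_check_old(enum safe_crlf checksafe, size_t len)
{
	return checksafe && len;
}

/* New guard from the patch: only WARN and FAIL run the safety check. */
static int runs_check_new(enum safe_crlf checksafe, size_t len)
{
	return (checksafe == SAFE_CRLF_WARN ||
		checksafe == SAFE_CRLF_FAIL) && len;
}
```

Under these assumptions, runs_check_old(SAFE_CRLF_RENORMALIZE, len) is true for any non-empty input, which is how the renormalize mode ended up being treated like SAFE_CRLF_FAIL; runs_check_new() excludes it.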
[PATCH v1 1/1] convert: git cherry-pick -Xrenormalize did not work
From: Torsten Bögershausen

Working with a repo that used to be all CRLF. At some point it was changed to all LF, with `text=auto` in .gitattributes. Trying to cherry-pick a commit from before the switchover fails:

$ git cherry-pick -Xrenormalize
fatal: CRLF would be replaced by LF in [path]

Whenever crlf_action is CRLF_TEXT_XXX and not CRLF_AUTO_XXX, SAFE_CRLF_RENORMALIZE must be turned into SAFE_CRLF_FALSE.

Reported-by: Eevee (Lexy Munroe)
Signed-off-by: Torsten Bögershausen
---
Thanks for reporting. Here is a less invasive patch. Please let me know if the patch is OK for you (email address, does it work..)

 convert.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/convert.c b/convert.c
index be91358..526ec1d 100644
--- a/convert.c
+++ b/convert.c
@@ -286,7 +286,9 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
 			checksafe = SAFE_CRLF_FALSE;
 		else if (has_cr_in_index(path))
 			convert_crlf_into_lf = 0;
-	}
+	} else if (checksafe == SAFE_CRLF_RENORMALIZE)
+		checksafe = SAFE_CRLF_FALSE;
+
 	if (checksafe && len) {
 		struct text_stat new_stats;
 		memcpy(&new_stats, &stats, sizeof(new_stats));
--
2.10.0
[PATCH/RFC v1 1/1] New way to normalize the line endings
From: Torsten Bögershausen

Since commit 6523728499e7 'convert: unify the "auto" handling of CRLF' the normalization instruction in Documentation/gitattributes.txt doesn't work any more. Update the documentation and add a test case.

Reported by Kristian Adrup
https://github.com/git-for-windows/git/issues/954

Signed-off-by: Torsten Bögershausen
---
 Documentation/gitattributes.txt |  7 +++
 t/t0025-crlf-auto.sh            | 29 +
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 976243a..1f7529a 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -227,11 +227,10 @@ From a clean working directory:

 -
 $ echo "* text=auto" >.gitattributes
-$ rm .git/index     # Remove the index to force Git to
-$ git reset         # re-scan the working directory
+$ git ls-files --eol | egrep "i/(crlf|mixed)" # find not normalized files
+$ rm .git/index     # Remove the index to re-scan the working directory
+$ git add .
 $ git status        # Show files that will be normalized
-$ git add -u
-$ git add .gitattributes
 $ git commit -m "Introduce end-of-line normalization"
 -

diff --git a/t/t0025-crlf-auto.sh b/t/t0025-crlf-auto.sh
index d0bee08..4ad4d02 100755
--- a/t/t0025-crlf-auto.sh
+++ b/t/t0025-crlf-auto.sh
@@ -152,4 +152,33 @@ test_expect_success 'eol=crlf _does_ normalize binary files' '
 	test -z "$LFwithNULdiff"
 '

+test_expect_success 'prepare unnormalized' '
+
+	> .gitattributes &&
+	git config core.autocrlf false &&
+	printf "LINEONE\nLINETWO\r\n" >mixed &&
+	git add mixed .gitattributes &&
+	git commit -m "Add mixed" &&
+	git ls-files --eol | egrep "i/crlf" &&
+	git ls-files --eol | egrep "i/mixed"
+
+'
+
+test_expect_success 'normalize unnormalized' '
+	echo "* text=auto" >.gitattributes &&
+	rm .git/index &&
+	git add . &&
+	git commit -m "Introduce end-of-line normalization" &&
+	git ls-files --eol | tr "\\t" " " | sort >act &&
+cat >exp <
[PATCH v2 2/2] convert.c: stream and fast search for binary
From: Torsten Bögershausen

When statistics are done for the autocrlf handling, the search in the content can be stopped, if e.g.
- a search for binary is done, and a NUL character is found
- a search for CRLF is done, and the first CRLF is found.

Similarly, when statistics for binary vs non-binary are gathered: Whenever a lone CR or NUL is found, the search can be aborted.

When checking out files in "auto" mode, any file that has a "lone CR" or a CRLF will not be converted, so the search can be aborted early.

Add the new bit, CONVERT_STAT_BITS_ANY_CR, which is set for either lone CR or CRLF.

Many binary files have a NUL very early and it is often not necessary to load the whole content of a file or blob into memory.

Split gather_stats() into gather_all_stats() and gather_stats_partly() to do a streaming handling for blobs and files in the worktree.

Signed-off-by: Torsten Bögershausen
---
 convert.c | 191 ++
 1 file changed, 129 insertions(+), 62 deletions(-)

diff --git a/convert.c b/convert.c
index 077f5e6..2396fe5 100644
--- a/convert.c
+++ b/convert.c
@@ -3,6 +3,7 @@
 #include "run-command.h"
 #include "quote.h"
 #include "sigchain.h"
+#include "streaming.h"

 /*
  * convert.c - convert a file when checking it out and checking it in.
@@ -13,10 +14,12 @@
  * translation when the "text" attribute or "auto_crlf" option is set.
  */

-/* Stat bits: When BIN is set, the txt bits are unset */
 #define CONVERT_STAT_BITS_TXT_LF    0x1
 #define CONVERT_STAT_BITS_TXT_CRLF  0x2
 #define CONVERT_STAT_BITS_BIN       0x4
+#define CONVERT_STAT_BITS_ANY_CR    0x8
+
+#define STREAM_BUFFER_SIZE (1024*16)

 enum crlf_action {
 	CRLF_UNDEFINED,
@@ -31,30 +34,36 @@ enum crlf_action {

 struct text_stat {
 	/* NUL, CR, LF and CRLF counts */
-	unsigned nul, lonecr, lonelf, crlf;
+	unsigned stat_bits, lonecr, lonelf, crlf;

 	/* These are just approximations! */
 	unsigned printable, nonprintable;
 };

-static void gather_stats(const char *buf, unsigned long size, struct text_stat *stats)
+static void gather_stats_partly(const char *buf, unsigned long size,
+				struct text_stat *stats, unsigned search_only)
 {
 	unsigned long i;

-	memset(stats, 0, sizeof(*stats));
-
+	if (!buf || !size)
+		return;
 	for (i = 0; i < size; i++) {
 		unsigned char c = buf[i];
 		if (c == '\r') {
+			stats->stat_bits |= CONVERT_STAT_BITS_ANY_CR;
 			if (i+1 < size && buf[i+1] == '\n') {
 				stats->crlf++;
 				i++;
-			} else
+				stats->stat_bits |= CONVERT_STAT_BITS_TXT_CRLF;
+			} else {
 				stats->lonecr++;
+				stats->stat_bits |= CONVERT_STAT_BITS_BIN;
+			}
 			continue;
 		}
 		if (c == '\n') {
 			stats->lonelf++;
+			stats->stat_bits |= CONVERT_STAT_BITS_TXT_LF;
 			continue;
 		}
 		if (c == 127)
@@ -67,7 +76,7 @@ static void gather_stats(const char *buf, unsigned long size, struct text_stat *
 				stats->printable++;
 				break;
 			case 0:
-				stats->nul++;
+				stats->stat_bits |= CONVERT_STAT_BITS_BIN;
 				/* fall through */
 			default:
 				stats->nonprintable++;
@@ -75,6 +84,8 @@ static void gather_stats(const char *buf, unsigned long size, struct text_stat *
 		}
 		else
 			stats->printable++;
+		if (stats->stat_bits & search_only)
+			break; /* We found what we have been searching for */
 	}

 	/* If file ends with EOF then don't count this EOF as non-printable. */
@@ -86,41 +97,62 @@ static void gather_stats(const char *buf, unsigned long size, struct text_stat *
  * The same heuristics as diff.c::mmfile_is_binary()
  * We treat files with bare CR as binary
  */
-static int convert_is_binary(unsigned long size, const struct text_stat *stats)
+static void convert_nonprintable(struct text_stat *stats)
 {
-	if (stats->lonecr)
-		return 1;
-	if (stats->nul)
-		return 1;
 	if ((stats->printable >> 7) < stats->nonprintable)
-		return 1;
-	return 0;
+		stats->stat_bits |= CONVERT_STAT_BITS_BIN;
 }

-static unsigned int gather_convert_stats(const char *data, unsigned long size)
+static void gather_all_stats(const char *buf, unsigned long size,
+			     struct text_stat *stats, unsigned sear
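The early-out idea from the commit message can be tried on its own with the following sketch. It is not the patch's gather_stats_partly(): the function scan_bits(), the shortened STAT_BITS_* names and the scanned out-parameter are hypothetical, the counters and the printable heuristic are left out, and the search mask is checked after every byte (the patch skips that check on the CR and LF paths because of `continue`).

```c
#include <stddef.h>

/* Shortened stand-ins for the CONVERT_STAT_BITS_* flags in the patch. */
#define STAT_BITS_TXT_LF   0x1
#define STAT_BITS_TXT_CRLF 0x2
#define STAT_BITS_BIN      0x4
#define STAT_BITS_ANY_CR   0x8

/*
 * Scan buf and return the bits seen so far. The scan stops as soon as
 * any bit in search_only has been set; *scanned receives the number of
 * bytes that were actually examined.
 */
static unsigned scan_bits(const char *buf, size_t size,
			  unsigned search_only, size_t *scanned)
{
	unsigned bits = 0;
	size_t i;

	for (i = 0; i < size; i++) {
		unsigned char c = (unsigned char)buf[i];

		if (c == '\r') {
			bits |= STAT_BITS_ANY_CR;
			if (i + 1 < size && buf[i + 1] == '\n') {
				bits |= STAT_BITS_TXT_CRLF;
				i++; /* consume the LF of the CRLF pair */
			} else {
				bits |= STAT_BITS_BIN; /* lone CR */
			}
		} else if (c == '\n') {
			bits |= STAT_BITS_TXT_LF;
		} else if (c == '\0') {
			bits |= STAT_BITS_BIN;
		}
		if (bits & search_only) {
			i++; /* count the byte that ended the search */
			break;
		}
	}
	*scanned = i;
	return bits;
}
```

Calling scan_bits() with search_only set to STAT_BITS_BIN on a buffer whose third byte is NUL stops after three bytes, which is the property that lets a streaming caller avoid reading the rest of a large binary blob. Passing 0 as search_only scans the whole buffer.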
[PATCH v2 1/2] read-cache: factor out get_sha1_from_index() helper
From: Torsten Bögershausen

Factor out the retrieval of the sha1 for a given path in read_blob_data_from_index() into the function get_sha1_from_index().

This will be used in the next commit, when convert.c can do the analysis for "text=auto" without slurping the whole blob into memory at once.

Add a wrapper definition get_sha1_from_cache().

Signed-off-by: Torsten Bögershausen
---
 cache.h      |  3 +++
 read-cache.c | 29 ++---
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/cache.h b/cache.h
index 1604e29..04de209 100644
--- a/cache.h
+++ b/cache.h
@@ -380,6 +380,7 @@ extern void free_name_hash(struct index_state *istate);
 #define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
 #define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
 #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
+#define get_sha1_from_cache(path) get_sha1_from_index (&the_index, (path))
 #endif

 enum object_type {
@@ -1089,6 +1090,8 @@ static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *
 	return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
 }

+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path);
+
 /*
  * This internal function is only declared here for the benefit of
  * lookup_replace_object(). Please do not call it directly.
diff --git a/read-cache.c b/read-cache.c
index 38d67fa..5a1df14 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2290,13 +2290,27 @@ int index_name_is_other(const struct index_state *istate, const char *name,

 void *read_blob_data_from_index(struct index_state *istate, const char *path, unsigned long *size)
 {
-	int pos, len;
+	const unsigned char *sha1;
 	unsigned long sz;
 	enum object_type type;
 	void *data;

-	len = strlen(path);
-	pos = index_name_pos(istate, path, len);
+	sha1 = get_sha1_from_index(istate, path);
+	if (!sha1)
+		return NULL;
+	data = read_sha1_file(sha1, &type, &sz);
+	if (!data || type != OBJ_BLOB) {
+		free(data);
+		return NULL;
+	}
+	if (size)
+		*size = sz;
+	return data;
+}
+
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path)
+{
+	int pos = index_name_pos(istate, path, strlen(path));
 	if (pos < 0) {
 		/*
 		 * We might be in the middle of a merge, in which
@@ -2312,14 +2326,7 @@ void *read_blob_data_from_index(struct index_state *istate, const char *path, un
 	}
 	if (pos < 0)
 		return NULL;
-	data = read_sha1_file(istate->cache[pos]->oid.hash, &type, &sz);
-	if (!data || type != OBJ_BLOB) {
-		free(data);
-		return NULL;
-	}
-	if (size)
-		*size = sz;
-	return data;
+	return istate->cache[pos]->oid.hash;
 }

 void stat_validity_clear(struct stat_validity *sv)
--
2.10.0
[PATCH v2 0/2] Stream and fast search
From: Torsten Bögershausen

Changes since v1:
- Rename earlyout into search_only
- Increase buffer from 2KiB to 16KiB
- s/mask/eol_bits/
- Reduce the "noise"
- Document "split gather_stats() into gather_all_stats()/gather_stats_partly()"

Torsten Bögershausen (2):
  read-cache: factor out get_sha1_from_index() helper
  convert.c: stream and fast search for binary

 cache.h      |   3 +
 convert.c    | 191 ---
 read-cache.c |  29 +
 3 files changed, 150 insertions(+), 73 deletions(-)

--
2.10.0
[PATCH v1 1/2] read-cache: factor out get_sha1_from_index() helper
From: Torsten Bögershausen

Factor out the retrieval of the sha1 for a given path in read_blob_data_from_index() into the function get_sha1_from_index().

This will be used in the next commit, when convert.c can do the analysis for "text=auto" without slurping the whole blob into memory at once.

Add a wrapper definition get_sha1_from_cache().
---
 cache.h      |  3 +++
 read-cache.c | 29 ++---
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/cache.h b/cache.h
index 1604e29..04de209 100644
--- a/cache.h
+++ b/cache.h
@@ -380,6 +380,7 @@ extern void free_name_hash(struct index_state *istate);
 #define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
 #define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
 #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
+#define get_sha1_from_cache(path) get_sha1_from_index (&the_index, (path))
 #endif

 enum object_type {
@@ -1089,6 +1090,8 @@ static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *
 	return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
 }

+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path);
+
 /*
  * This internal function is only declared here for the benefit of
  * lookup_replace_object(). Please do not call it directly.
diff --git a/read-cache.c b/read-cache.c
index 38d67fa..5a1df14 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2290,13 +2290,27 @@ int index_name_is_other(const struct index_state *istate, const char *name,

 void *read_blob_data_from_index(struct index_state *istate, const char *path, unsigned long *size)
 {
-	int pos, len;
+	const unsigned char *sha1;
 	unsigned long sz;
 	enum object_type type;
 	void *data;

-	len = strlen(path);
-	pos = index_name_pos(istate, path, len);
+	sha1 = get_sha1_from_index(istate, path);
+	if (!sha1)
+		return NULL;
+	data = read_sha1_file(sha1, &type, &sz);
+	if (!data || type != OBJ_BLOB) {
+		free(data);
+		return NULL;
+	}
+	if (size)
+		*size = sz;
+	return data;
+}
+
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path)
+{
+	int pos = index_name_pos(istate, path, strlen(path));
 	if (pos < 0) {
 		/*
 		 * We might be in the middle of a merge, in which
@@ -2312,14 +2326,7 @@ void *read_blob_data_from_index(struct index_state *istate, const char *path, un
 	}
 	if (pos < 0)
 		return NULL;
-	data = read_sha1_file(istate->cache[pos]->oid.hash, &type, &sz);
-	if (!data || type != OBJ_BLOB) {
-		free(data);
-		return NULL;
-	}
-	if (size)
-		*size = sz;
-	return data;
+	return istate->cache[pos]->oid.hash;
 }

 void stat_validity_clear(struct stat_validity *sv)
--
2.10.0
[PATCH v1 0/2] convert: stream and early out
From: Torsten Bögershausen

An optimization for when autocrlf is used and the binary/text detection is run, or when git ls-files --eol is run to analyze the content of files or blobs.

Torsten Bögershausen (2):
  read-cache: factor out get_sha1_from_index() helper
  convert.c: stream and early out

 cache.h      |   3 +
 convert.c    | 195 +++
 read-cache.c |  29 +
 3 files changed, 151 insertions(+), 76 deletions(-)

--
2.10.0
[PATCH v1 2/2] convert.c: stream and early out
From: Torsten Bögershausen

When statistics are done for the autocrlf handling, the search in the content can be stopped, if e.g.
- a search for binary is done, and a NUL character is found
- a search for CRLF is done, and the first CRLF is found.

Similarly, when statistics for binary vs non-binary are gathered: Whenever a lone CR or NUL is found, the search can be aborted.

When checking out files in "auto" mode, any file that has a "lone CR" or a CRLF will not be converted, so the search can be aborted early.

Add the new bit, CONVERT_STAT_BITS_ANY_CR, which is set for either lone CR or CRLF.

Many binary files have a NUL very early (within the first few bytes, at the latest within the first 1..2K). It is often not necessary to load the whole content of a file or blob into memory.

Use a streaming handling for blobs and files in the worktree.
---
 convert.c | 195 +-
 1 file changed, 130 insertions(+), 65 deletions(-)

diff --git a/convert.c b/convert.c
index 077f5e6..6a625e5 100644
--- a/convert.c
+++ b/convert.c
@@ -3,6 +3,7 @@
 #include "run-command.h"
 #include "quote.h"
 #include "sigchain.h"
+#include "streaming.h"

 /*
  * convert.c - convert a file when checking it out and checking it in.
@@ -13,10 +14,10 @@
  * translation when the "text" attribute or "auto_crlf" option is set.
  */

-/* Stat bits: When BIN is set, the txt bits are unset */
 #define CONVERT_STAT_BITS_TXT_LF    0x1
 #define CONVERT_STAT_BITS_TXT_CRLF  0x2
 #define CONVERT_STAT_BITS_BIN       0x4
+#define CONVERT_STAT_BITS_ANY_CR    0x8

 enum crlf_action {
 	CRLF_UNDEFINED,
@@ -31,30 +32,36 @@ enum crlf_action {

 struct text_stat {
 	/* NUL, CR, LF and CRLF counts */
-	unsigned nul, lonecr, lonelf, crlf;
+	unsigned stat_bits, lonecr, lonelf, crlf;

 	/* These are just approximations! */
 	unsigned printable, nonprintable;
 };

-static void gather_stats(const char *buf, unsigned long size, struct text_stat *stats)
+static void gather_stats_partly(const char *buf, unsigned long len,
+				struct text_stat *stats, unsigned earlyout)
 {
 	unsigned long i;

-	memset(stats, 0, sizeof(*stats));
-
-	for (i = 0; i < size; i++) {
+	if (!buf || !len)
+		return;
+	for (i = 0; i < len; i++) {
 		unsigned char c = buf[i];
 		if (c == '\r') {
-			if (i+1 < size && buf[i+1] == '\n') {
+			stats->stat_bits |= CONVERT_STAT_BITS_ANY_CR;
+			if (i+1 < len && buf[i+1] == '\n') {
 				stats->crlf++;
 				i++;
-			} else
+				stats->stat_bits |= CONVERT_STAT_BITS_TXT_CRLF;
+			} else {
 				stats->lonecr++;
+				stats->stat_bits |= CONVERT_STAT_BITS_BIN;
+			}
 			continue;
 		}
 		if (c == '\n') {
 			stats->lonelf++;
+			stats->stat_bits |= CONVERT_STAT_BITS_TXT_LF;
 			continue;
 		}
 		if (c == 127)
@@ -67,7 +74,7 @@ static void gather_stats(const char *buf, unsigned long size, struct text_stat *
 				stats->printable++;
 				break;
 			case 0:
-				stats->nul++;
+				stats->stat_bits |= CONVERT_STAT_BITS_BIN;
 				/* fall through */
 			default:
 				stats->nonprintable++;
@@ -75,10 +82,12 @@ static void gather_stats(const char *buf, unsigned long size, struct text_stat *
 		}
 		else
 			stats->printable++;
+		if (stats->stat_bits & earlyout)
+			break; /* We found what we have been searching for */
 	}

 	/* If file ends with EOF then don't count this EOF as non-printable. */
-	if (size >= 1 && buf[size-1] == '\032')
+	if (len >= 1 && buf[len-1] == '\032')
 		stats->nonprintable--;
 }

@@ -86,41 +95,62 @@ static void gather_stats(const char *buf, unsigned long size, struct text_stat *
  * The same heuristics as diff.c::mmfile_is_binary()
  * We treat files with bare CR as binary
  */
-static int convert_is_binary(unsigned long size, const struct text_stat *stats)
+static void convert_nonprintable(struct text_stat *stats)
 {
-	if (stats->lonecr)
-		return 1;
-	if (stats->nul)
-		return 1;
 	if ((stats->printable >> 7) < stats->nonprintable)
-		return 1;
-	return 0;
+		stats->stat_bits |= CONVERT_STAT_BITS_BIN;
 }

-static unsigned int gather_convert_stats(const char *data, unsigned long s
[PATCH v2 0/2] Adjust the documentation to the unified "auto" handling
From: Torsten Bögershausen

Changes since v1:
- 1/2 is left unchanged
- 2/2 is re-written and should be more consistent to read.

Torsten Bögershausen (2):
  git ls-files: text=auto eol=lf is supported in Git 2.10
  gitattributes: Document the unified "auto" handling

 Documentation/git-ls-files.txt  |  3 +--
 Documentation/gitattributes.txt | 58 +
 2 files changed, 25 insertions(+), 36 deletions(-)

--
2.9.0.243.g5c589a7
--
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/2] gitattributes: Document the unified "auto" handling
From: Torsten Bögershausen

Update the documentation about text=auto: text=auto now follows the core.autocrlf handling when files are not normalized in the repository.

For a cross platform project recommend the usage of attributes for line-ending conversions.

Signed-off-by: Torsten Bögershausen
---
 Documentation/gitattributes.txt | 58 +
 1 file changed, 24 insertions(+), 34 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 807577a..7aff940 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -182,23 +182,6 @@ While Git normally leaves file contents alone, it can be configured to
 normalize line endings to LF in the repository and, optionally, to
 convert them to CRLF when files are checked out.

-Here is an example that will make Git normalize .txt, .vcproj and .sh
-files, ensure that .vcproj files have CRLF and .sh files have LF in
-the working directory, and prevent .jpg files from being normalized
-regardless of their content.
-
-
-*		text=auto
-*.txt		text
-*.vcproj	text eol=crlf
-*.sh		text eol=lf
-*.jpg		-text
-
-
-Other source code management systems normalize all text files in their
-repositories, and there are two ways to enable similar automatic
-normalization in Git.
-
 If you simply want to have CRLF line endings in your working directory
 regardless of the repository you are working with, you can set the
 config variable "core.autocrlf" without using any attributes.
@@ -208,35 +191,42 @@ config variable "core.autocrlf" without using any attributes.
 	autocrlf = true

-This does not force normalization of all text files, but does ensure
+This does not force normalization of text files, but does ensure
 that text files that you introduce to the repository have their line
 endings normalized to LF when they are added, and that files that are
 already normalized in the repository stay normalized.

-If you want to interoperate with a source code management system that
-enforces end-of-line normalization, or you simply want all text files
-in your repository to be normalized, you should instead set the `text`
-attribute to "auto" for _all_ files.
+If you want to ensure that text files that any contributor introduces to
+the repository have their line endings normalized, you can set the
+`text` attribute to "auto" for _all_ files.

 *	text=auto

-This ensures that all files that Git considers to be text will have
-normalized (LF) line endings in the repository. The `core.eol`
-configuration variable controls which line endings Git will use for
-normalized files in your working directory; the default is to use the
-native line ending for your platform, or CRLF if `core.autocrlf` is
-set.
+The attributes allow a fine-grained control, how the line endings
+are converted.
+Here is an example that will make Git normalize .txt, .vcproj and .sh
+files, ensure that .vcproj files have CRLF and .sh files have LF in
+the working directory, and prevent .jpg files from being normalized
+regardless of their content.
+
+
+*		text=auto
+*.txt		text
+*.vcproj	text eol=crlf
+*.sh		text eol=lf
+*.jpg		-text
+
+
+NOTE: When `text=auto` conversion is enabled in a cross-platform
+project using push and pull to a central repository the text files
+containing CRLFs should be normalized.

-NOTE: When `text=auto` normalization is enabled in an existing
-repository, any text files containing CRLFs should be normalized. If
-they are not they will be normalized the next time someone tries to
-change them, causing unfortunate misattribution. From a clean working
-directory:
+From a clean working directory:

 -
-$ echo "* text=auto" >>.gitattributes
+$ echo "* text=auto" >.gitattributes
 $ rm .git/index     # Remove the index to force Git to
 $ git reset         # re-scan the working directory
 $ git status        # Show files that will be normalized
--
2.9.0.243.g5c589a7
[PATCH v2 1/2] git ls-files: text=auto eol=lf is supported in Git 2.10
From: Torsten Bögershausen

The man page for `git ls-files --eol` mentions the combination of
text attributes "text=auto eol=lf" or "text=auto eol=crlf" as not
supported yet, but may be in the future.
Now they are supported.

Signed-off-by: Torsten Bögershausen
---
 Documentation/git-ls-files.txt | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index 078b556..0d933ac 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -159,8 +159,7 @@ not accessible in the working tree.
 +
 <eolattr> is the attribute that is used when checking out or committing,
 it is either "", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf".
-Note: Currently Git does not support "text=auto eol=lf" or "text=auto eol=crlf",
-that may change in the future.
+Since Git 2.10 "text=auto eol=lf" and "text=auto eol=crlf" are supported.
 +
 Both the <eolinfo> in the index ("i/<eolinfo>")
 and in the working tree ("w/<eolinfo>") are shown for regular files,
--
2.9.0.243.g5c589a7
[PATCH v1 0/3] Update eol documentation
From: Torsten Bögershausen

Sorry for posting this so late:
While reviewing another patch I realized that the eol related
documentation was not updated as it should be.

Torsten Bögershausen (2):
  git ls-files: text=auto eol=lf is supported in Git 2.10
  gitattributes: Document the unified "auto" handling

 Documentation/git-ls-files.txt  |  3 +--
 Documentation/gitattributes.txt | 24
 2 files changed, 17 insertions(+), 10 deletions(-)

--
2.9.3.599.g2376d31.dirty
[PATCH v1 1/2] git ls-files: text=auto eol=lf is supported in Git 2.10
From: Torsten Bögershausen

The man page for `git ls-files --eol` mentions the combination of
text attributes "text=auto eol=lf" or "text=auto eol=crlf" as not
supported yet, but may be in the future.
Now they are supported.

Signed-off-by: Torsten Bögershausen
---
 Documentation/git-ls-files.txt | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index 078b556..0d933ac 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -159,8 +159,7 @@ not accessible in the working tree.
 +
 <eolattr> is the attribute that is used when checking out or committing,
 it is either "", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf".
-Note: Currently Git does not support "text=auto eol=lf" or "text=auto eol=crlf",
-that may change in the future.
+Since Git 2.10 "text=auto eol=lf" and "text=auto eol=crlf" are supported.
 +
 Both the <eolinfo> in the index ("i/<eolinfo>")
 and in the working tree ("w/<eolinfo>") are shown for regular files,
--
2.9.3.599.g2376d31.dirty
[PATCH v1 2/2] gitattributes: Document the unified "auto" handling
From: Torsten Bögershausen

Update the documentation about text=auto:
text=auto now follows the core.autocrlf handling when files are not
normalized in the repository.

For a cross platform project recommend the usage of attributes for
line-ending conversions.

Signed-off-by: Torsten Bögershausen
---
 Documentation/gitattributes.txt | 24
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 807577a..4012661 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -213,27 +213,35 @@ that text files that you introduce to the repository have their line
 endings normalized to LF when they are added, and that files that are
 already normalized in the repository stay normalized.
 
+If you want to ensure that text files that any contributor introduces to
+the repository have their line endings normalized, you could set the
+`text` attribute to "auto" for _all_ files.
+
+
+* text=auto
+
+
 If you want to interoperate with a source code management system that
 enforces end-of-line normalization, or you simply want all text files
 in your repository to be normalized, you should instead set the `text`
-attribute to "auto" for _all_ files.
+attribute to "text" for text files.
 
-* text=auto
+*.txt text
 
-This ensures that all files that Git considers to be text will have
+This ensures that all files marked as text will have
 normalized (LF) line endings in the repository. The `core.eol`
 configuration variable controls which line endings Git will use for
 normalized files in your working directory; the default is to use the
 native line ending for your platform, or CRLF if `core.autocrlf` is
 set.
 
-NOTE: When `text=auto` normalization is enabled in an existing
-repository, any text files containing CRLFs should be normalized. If
-they are not they will be normalized the next time someone tries to
-change them, causing unfortunate misattribution. From a clean working
-directory:
+NOTE: When you have a cross-platform project using push and pull
+to a central repository the text files containing CRLFs should be
+normalized. All text files should have a text attribute, either
+`text` or `text=auto`.
+From a clean working directory:
 
-
 $ echo "* text=auto" >>.gitattributes
--
2.9.3.599.g2376d31.dirty
[PATCH v1 0/1] Rename NotNormalized (NNO) into CRLF in index
From: Torsten Bögershausen

Here comes the promised cleanup of t0027:
- The wording NNO is removed and replaced by CRI
- No code changes
- Needs to go on top of next or pu or tb/t0027-raciness-fix

Torsten Bögershausen (1):
  t0027: Rename NotNormalized (NNO) into CRLF in index

 t/t0027-auto-crlf.sh | 122 +--
 1 file changed, 61 insertions(+), 61 deletions(-)

--
2.9.3.599.g2376d31.dirty
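As a rough illustration of the naming scheme this rename touches, the sketch below enumerates the file-name prefixes produced by the three nested loops in t0027 (core.autocrlf value, text attribute, eol attribute) once NNO is replaced by CRI. The function name list_cri_prefixes is hypothetical and not part of t0027; only the loop values and the prefix pattern are taken from the patch.

```shell
#!/bin/sh
# Hypothetical helper (not part of t0027): print every CRI_ prefix
# produced by the crlf/attr/aeol loops used in the test script.
list_cri_prefixes () {
	for crlf in false true input
	do
		for attr in "" auto text -text
		do
			for aeol in "" lf crlf
			do
				printf '%s\n' "CRI_attr_${attr}_aeol_${aeol}_${crlf}"
			done
		done
	done
}

# Show the first three prefixes; empty attributes leave empty fields.
list_cri_prefixes | sed -n '1,3p'
```

Each prefix is then combined with a content suffix such as _LF.txt or _CRLF_mix_LF.txt, so the loops yield 3 x 4 x 3 = 36 prefixes in total.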
[PATCH v1 1/1] t0027: Rename NotNormalized (NNO) into CRLF in index
From: Torsten Bögershausen

Originally NNO stands for content, that had been committed
"Not NOrmalized", in other words files with CRLF in the index.

Make more clear what should be tested:
- commit a file with CRLF into the index
- Change the content in the working tree
- Run "git add" and check for the conversion warnings
- Repeat for different content (text, LF, CRLF, mixed) and binary
  (LF and lone CR, CRLF with NUL)

Rename commit_chk_wrnNNO() into CRI_add_chk_wrn() and rename NNO into CRI.
Integrate create_NNO_files() into 'setup master'

Signed-off-by: Torsten Bögershausen
---
 t/t0027-auto-crlf.sh | 122 +--
 1 file changed, 61 insertions(+), 61 deletions(-)

diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
index 90db54c..bfcf14b 100755
--- a/t/t0027-auto-crlf.sh
+++ b/t/t0027-auto-crlf.sh
@@ -49,24 +49,6 @@ create_gitattributes () {
 	} >.gitattributes
 }
 
-create_NNO_files () {
-	for crlf in false true input
-	do
-		for attr in "" auto text -text
-		do
-			for aeol in "" lf crlf
-			do
-				pfx=NNO_attr_${attr}_aeol_${aeol}_${crlf}
-				cp CRLF_mix_LF ${pfx}_LF.txt &&
-				cp CRLF_mix_LF ${pfx}_CRLF.txt &&
-				cp CRLF_mix_LF ${pfx}_CRLF_mix_LF.txt &&
-				cp CRLF_mix_LF ${pfx}_LF_mix_CR.txt &&
-				cp CRLF_mix_LF ${pfx}_CRLF_nul.txt
-			done
-		done
-	done
-}
-
 check_warning () {
 	case "$1" in
 	LF_CRLF) echo "warning: LF will be replaced by CRLF" >"$2".expect ;;
@@ -102,7 +84,7 @@ commit_check_warn () {
 	check_warning "$crlfnul" ${pfx}_CRLF_nul.err
 }
 
-commit_chk_wrnNNO () {
+CRI_add_chk_wrn () {
 	attr=$1 ; shift
 	aeol=$1 ; shift
 	crlf=$1 ; shift
@@ -111,7 +93,7 @@ commit_chk_wrnNNO () {
 	lfmixcrlf=$1 ; shift
 	lfmixcr=$1 ; shift
 	crlfnul=$1 ; shift
-	pfx=NNO_attr_${attr}_aeol_${aeol}_${crlf}
+	pfx=CRI_attr_${attr}_aeol_${aeol}_${crlf}
 	#Commit files on top of existing file
 	create_gitattributes "$attr" $aeol &&
 	for f in LF CRLF CRLF_mix_LF LF_mix_CR CRLF_nul
@@ -122,22 +104,22 @@ commit_chk_wrnNNO () {
 		git -c core.autocrlf=$crlf add $fname 2>"${pfx}_$f.err"
 	done
 
-	test_expect_success "commit NNO files crlf=$crlf attr=$attr LF" '
+	test_expect_success "CRLF in index add file crlf=$crlf attr=$attr LF" '
 		check_warning "$lfwarn" ${pfx}_LF.err
 	'
-	test_expect_success "commit NNO files attr=$attr aeol=$aeol crlf=$crlf CRLF" '
+	test_expect_success "CRLF in index add file attr=$attr aeol=$aeol crlf=$crlf CRLF" '
 		check_warning "$crlfwarn" ${pfx}_CRLF.err
 	'
-	test_expect_success "commit NNO files attr=$attr aeol=$aeol crlf=$crlf CRLF_mix_LF" '
+	test_expect_success "CRLF in index add file attr=$attr aeol=$aeol crlf=$crlf CRLF_mix_LF" '
 		check_warning "$lfmixcrlf" ${pfx}_CRLF_mix_LF.err
 	'
-	test_expect_success "commit NNO files attr=$attr aeol=$aeol crlf=$crlf LF_mix_cr" '
+	test_expect_success "CRLF in index add file attr=$attr aeol=$aeol crlf=$crlf LF_mix_cr" '
 		check_warning "$lfmixcr" ${pfx}_LF_mix_CR.err
 	'
-	test_expect_success "commit NNO files attr=$attr aeol=$aeol crlf=$crlf CRLF_nul" '
+	test_expect_success "CRLF in index add file attr=$attr aeol=$aeol crlf=$crlf CRLF_nul" '
 		check_warning "$crlfnul" ${pfx}_CRLF_nul.err
 	'
 }
@@ -199,7 +181,7 @@ check_files_in_repo () {
 	compare_files $crlfnul ${pfx}CRLF_nul.txt
 }
 
-check_in_repo_NNO () {
+check_in_repo_CRI () {
 	attr=$1 ; shift
 	aeol=$1 ; shift
 	crlf=$1 ; shift
@@ -208,7 +190,7 @@ check_in_repo_NNO () {
 	lfmixcrlf=$1 ; shift
 	lfmixcr=$1 ; shift
 	crlfnul=$1 ; shift
-	pfx=NNO_attr_${attr}_aeol_${aeol}_${crlf}
+	pfx=CRI_attr_${attr}_aeol_${aeol}_${crlf}
 	test_expect_success "compare_files $lfname ${pfx}_LF.txt" '
 		compare_files $lfname ${pfx}_LF.txt
 	'
@@ -329,8 +311,22 @@ test_expect_success 'setup master' '
 	printf "\$Id: \$\r\nLINEONE\r\nLINETWO\rLINETHREE" >CRLF_mix_CR &&
 	printf "\$Id: \$\r\nLINEONEQ\r\nLINETWO\r\nLINETHREE" | q_to_nul >CRLF_nul &&
 	printf "\$Id: \$\nLINEONEQ\nLINETWO\nLINETHREE" | q_to_nul >LF_nul &&
-	create_NNO_files CRLF_mix_LF CRLF_mix_LF CRLF_mix_LF CRLF_mix_LF CRLF_mix_LF &&
-	git -c core.autocrlf=false add NNO_*.txt &&
+	for crlf in false true input
+	do
+		for attr in "" auto text -text
+