Re: [PATCH/RFC v1 1/1] Use size_t instead of unsigned long

2018-11-19 Thread Junio C Hamano
Torsten Bögershausen  writes:

> The only problematic system is Win64, where "unsigned long" is 32 bit,
> and therefore we must use size_t to address data in memory.
> This is not to be confused with off_t, which is used for "data on disk"
> (and nothing else) or timestamp_t which is used for timestamps (and nothing 
> else).
>
> I haven't followed the "coccinelle script" development at all, if someone
> makes a patch do replace "unsigned long" with size_t, that could replace
> my whole patch. (Some of them may be downgraded to "unsigned int" ?)

This paragraph makes it sound as if this patch is s/ulong/size_t/,
but that contradicts with the previous paragraph, no?  It is much
better to leave a ulong that is not about the size of a memory
region as-is, to be turned into appropriate and correct type later,
than changing it into another wrong type (size_t).

In short, we could do ulong to size_t with Coccinelle, but I do not
think we want to blindly rewrite all.



Re: [PATCH/RFC v1 1/1] Use size_t instead of unsigned long

2018-11-19 Thread René Scharfe
Am 19.11.2018 um 06:33 schrieb Torsten Bögershausen:
> The archive-tar.c is actually a good example, why a step-by-step update
> is not ideal (the code would not work any more on Win64).
> 
> If we look here:
> 
> static int stream_blocked(const struct object_id *oid)
> {
>   struct git_istream *st;
>   enum object_type type;
>   size_t sz;
>   char buf[BLOCKSIZE];
>   ssize_t readlen;
> 
>   st = open_istream(oid, , , NULL);
>   ^
>   if (!st)
>   return error(_("cannot stream blob %s"), oid_to_hex(oid));
>   for (;;) {
> 
> The sz variable must follow whatever open_istream() uses, so if we start
> with archive-tar.c, we must use either size_t or ulong, whatever
> open_istream() needs. Otherwise things will break:
> archive-tar.c uses ulong, open_istream() size_t, but we are passing pointers
> around, and here  != _t
> 
> If we only update open_istream(), but not archive-tar.c, then
> things are not better:
> _t != 
> 
> I don't have a good idea how to split the patch.

sz is not actually used later in that function; this change can
be done independently of any other ulong/size_t conversion in that
file.

Hmm, looking at that call I wonder why open_istream() doesn't return
type and size in the struct git_istream.  To make it match
read_object_file(), I suppose.  Perhaps it's an opportunity to
improve its interface?

René


Re: [PATCH/RFC v1 1/1] Use size_t instead of unsigned long

2018-11-19 Thread Torsten Bögershausen
On Sun, Nov 18, 2018 at 03:18:52PM -0500, Derrick Stolee wrote:
> On 11/17/2018 10:11 AM, tbo...@web.de wrote:
> >From: Torsten Bögershausen 
> >
> >Currently Git users can not commit files >4Gib under 64 bit Windows,
> >where "long" is 32 bit but size_t is 64 bit.
> >Improve the code base in small steps, as small as possible.
> >What started with a small patch to replace "unsigned long" with size_t
> >in one file (convert.c) ended up with a change in many files.
> >
> >Signed-off-by: Torsten Bögershausen 
> >---
> >
> >This needs to go on top of pu, to cover all the good stuff
> >   cooking here.
> 
> Better to work on top of 'master', as the work in 'pu' will be rewritten
> several times, probably.
> 
> >I have started this series on November 1st, since that 2 or 3 rebases
> >   had been done to catch up, and now it is on pu from November 15.
> >
> >I couldn't find a reason why changing "unsigned ling"
> >   into "size_t" may break anything, any thoughts, please ?
> 
> IIRC, the blocker for why we haven't done this already is that "size_t",
> "timestamp_t" and "off_t" are all 64-bit types that give more implied
> meaning, so we should swap types specifically to these as we want. One
> example series does a specific, small change [1].
> 
> If we wanted to do a single swap that removes 'unsigned long' with an
> unambiguously 64-bit type, I would recommend "uint64_t". Later replacements
> to size_t, off_t, and timestamp_t could happen on a case-by-case basis for
> type clarity.
> 
> It may benefit reviewers to split this change into multiple patches,
> starting at the lowest levels of the call stack (so higher 'unsigned long's
> can up-cast to the 64-bit types when calling a function) and focusing the
> changes to one or two files at a time (X.c and X.h, preferrably).
> 
> Since you are talking about the benefits for Git for Windows to accept 4GB
> files, I wonder if we can add a test that verifies such a file will work. If
> you have such a test, then I could help verify that the test fails before
> the change and succeeds afterward.
> 
> Finally, it may be good to add a coccinelle script that replaces 'unsigned
> long' with 'uint64_t' so we can easily fix any new introductions that happen
> in the future.

The plan was never to change "unsigned long" to a 64 bit value in general.
The usage of "unsigned long" (instead of size_t) was (and is) still good
for 32bit systems, where both are 32 bit. (at least all system I am aware of).

For 64 bit systems like Linux or Mac OS it is the same, both are 64 bit.

The only problematic system is Win64, where "unsigned long" is 32 bit,
and therefore we must use size_t to address data in memory.
This is not to be confused with off_t, which is used for "data on disk"
(and nothing else) or timestamp_t which is used for timestamps (and nothing 
else).

I haven't followed the "coccinelle script" development at all, if someone
makes a patch do replace "unsigned long" with size_t, that could replace
my whole patch. (Some of them may be downgraded to "unsigned int" ?)

However, as we need to let  tb/print-size-t-with-uintmax-format make it to
master, otherwise we are not able to print the variables in a portable way. 


> Thanks! I do think we should make this change, but we must be careful. It
> may be disruptive to topics in flight.
> 
> -Stolee
> 
> [1] https://public-inbox.org/git/20181112084031.11769-1-care...@gmail.com/
> 


Re: [PATCH/RFC v1 1/1] Use size_t instead of unsigned long

2018-11-18 Thread Torsten Bögershausen
On 2018-11-19 00:40, Junio C Hamano wrote:
> Derrick Stolee  writes:
> 
>>> This needs to go on top of pu, to cover all the good stuff
>>>cooking here.
>>
>> Better to work on top of 'master', as the work in 'pu' will be
>> rewritten several times, probably.
> 
> We may not be able to find any good moment to update some codepaths
> with deep callchains that reaches a basic API function that take
> ulong that way, as things are always in motion, but hopefully a lot
> of areas that need changes are rather isolated.  
> 
> For example, the changes I see around "offset" (which is "ulong" and
> the patch wants to change it to "size_t") in archive-tar.c in the
> patch do not have any interaction with the changes in this patch
> outside that single file, and I do not think any topic in-flight
> would interact with this change badly, either.  I didn't carefully
> look at the remainder of the patches, but I have a feeling that many
> can be separated out into independent and focused set of smaller
> patches that can be evaluated on their own.
> 

The archive-tar.c is actually a good example, why a step-by-step update
is not ideal (the code would not work any more on Win64).

If we look here:

static int stream_blocked(const struct object_id *oid)
{
struct git_istream *st;
enum object_type type;
size_t sz;
char buf[BLOCKSIZE];
ssize_t readlen;

st = open_istream(oid, , , NULL);
  ^
if (!st)
return error(_("cannot stream blob %s"), oid_to_hex(oid));
for (;;) {

The sz variable must follow whatever open_istream() uses, so if we start
with archive-tar.c, we must use either size_t or ulong, whatever
open_istream() needs. Otherwise things will break:
archive-tar.c uses ulong, open_istream() size_t, but we are passing pointers
around, and here  != _t

If we only update open_istream(), but not archive-tar.c, then
things are not better:
_t != 

I don't have a good idea how to split the patch.
However, "add a coccinelle script" may be a solution.


Re: [PATCH/RFC v1 1/1] Use size_t instead of unsigned long

2018-11-18 Thread Junio C Hamano
Derrick Stolee  writes:

>> This needs to go on top of pu, to cover all the good stuff
>>cooking here.
>
> Better to work on top of 'master', as the work in 'pu' will be
> rewritten several times, probably.

We may not be able to find any good moment to update some codepaths
with deep callchains that reaches a basic API function that take
ulong that way, as things are always in motion, but hopefully a lot
of areas that need changes are rather isolated.  

For example, the changes I see around "offset" (which is "ulong" and
the patch wants to change it to "size_t") in archive-tar.c in the
patch do not have any interaction with the changes in this patch
outside that single file, and I do not think any topic in-flight
would interact with this change badly, either.  I didn't carefully
look at the remainder of the patches, but I have a feeling that many
can be separated out into independent and focused set of smaller
patches that can be evaluated on their own.



Re: [PATCH/RFC v1 1/1] Use size_t instead of unsigned long

2018-11-18 Thread Derrick Stolee

On 11/17/2018 10:11 AM, tbo...@web.de wrote:

From: Torsten Bögershausen 

Currently Git users can not commit files >4Gib under 64 bit Windows,
where "long" is 32 bit but size_t is 64 bit.
Improve the code base in small steps, as small as possible.
What started with a small patch to replace "unsigned long" with size_t
in one file (convert.c) ended up with a change in many files.

Signed-off-by: Torsten Bögershausen 
---

This needs to go on top of pu, to cover all the good stuff
   cooking here.


Better to work on top of 'master', as the work in 'pu' will be rewritten 
several times, probably.



I have started this series on November 1st, since that 2 or 3 rebases
   had been done to catch up, and now it is on pu from November 15.

I couldn't find a reason why changing "unsigned ling"
   into "size_t" may break anything, any thoughts, please ?


IIRC, the blocker for why we haven't done this already is that "size_t", 
"timestamp_t" and "off_t" are all 64-bit types that give more implied 
meaning, so we should swap types specifically to these as we want. One 
example series does a specific, small change [1].


If we wanted to do a single swap that removes 'unsigned long' with an 
unambiguously 64-bit type, I would recommend "uint64_t". Later 
replacements to size_t, off_t, and timestamp_t could happen on a 
case-by-case basis for type clarity.


It may benefit reviewers to split this change into multiple patches, 
starting at the lowest levels of the call stack (so higher 'unsigned 
long's can up-cast to the 64-bit types when calling a function) and 
focusing the changes to one or two files at a time (X.c and X.h, 
preferrably).


Since you are talking about the benefits for Git for Windows to accept 
4GB files, I wonder if we can add a test that verifies such a file will 
work. If you have such a test, then I could help verify that the test 
fails before the change and succeeds afterward.


Finally, it may be good to add a coccinelle script that replaces 
'unsigned long' with 'uint64_t' so we can easily fix any new 
introductions that happen in the future.


Thanks! I do think we should make this change, but we must be careful. 
It may be disruptive to topics in flight.


-Stolee

[1] https://public-inbox.org/git/20181112084031.11769-1-care...@gmail.com/



[PATCH/RFC v1 1/1] Use size_t instead of unsigned long

2018-11-17 Thread tboegi
From: Torsten Bögershausen 

Currently Git users can not commit files >4Gib under 64 bit Windows,
where "long" is 32 bit but size_t is 64 bit.
Improve the code base in small steps, as small as possible.
What started with a small patch to replace "unsigned long" with size_t
in one file (convert.c) ended up with a change in many files.

Signed-off-by: Torsten Bögershausen 
---

This needs to go on top of pu, to cover all the good stuff
  cooking here.
I have started this series on November 1st, since that 2 or 3 rebases
  had been done to catch up, and now it is on pu from November 15.

I couldn't find a reason why changing "unsigned ling"
  into "size_t" may break anything, any thoughts, please ?
Side question: One thing I wondered about is why Git creates a conflict
like this, using git cherry-pick:
<<< HEAD
unsigned long size;
void *data = read_object_file(oid, , );
===
size_t size;
void *data = repo_read_object_file(the_repository, oid, ,
   );
>>> 3ee0abef4c... Use size_t instead of unsigned long

One commit changed "unsigned long size" into "size_t size",
the other commit swapped repo_read_object_file() with read_object_file().
Both changed are on different lines, but Git sees a conflict here.

 apply.c  | 14 -
 archive-tar.c| 18 +--
 archive-zip.c|  2 +-
 archive.c|  2 +-
 archive.h|  2 +-
 bisect.c |  2 +-
 blame.c  |  6 ++--
 blame.h  |  2 +-
 builtin/cat-file.c   | 10 +++---
 builtin/difftool.c   |  3 +-
 builtin/fast-export.c|  6 ++--
 builtin/fmt-merge-msg.c  |  4 ++-
 builtin/fsck.c   |  6 ++--
 builtin/grep.c   |  8 ++---
 builtin/index-pack.c | 27 
 builtin/log.c|  4 +--
 builtin/ls-tree.c|  2 +-
 builtin/merge-tree.c |  6 ++--
 builtin/mktag.c  |  5 +--
 builtin/notes.c  |  6 ++--
 builtin/pack-objects.c   | 56 +-
 builtin/reflog.c |  2 +-
 builtin/replace.c|  2 +-
 builtin/tag.c|  4 +--
 builtin/unpack-file.c|  2 +-
 builtin/unpack-objects.c | 35 ++---
 builtin/verify-commit.c  |  4 +--
 bundle.c |  2 +-
 cache.h  | 10 +++---
 combine-diff.c   | 11 ---
 commit.c | 22 +++---
 commit.h | 10 +++---
 config.c |  2 +-
 convert.c| 18 +--
 delta.h  | 20 ++--
 diff-delta.c |  4 +--
 diff.c   | 30 +-
 diff.h   |  2 +-
 diffcore-pickaxe.c   |  4 +--
 diffcore.h   |  2 +-
 dir.c|  6 ++--
 dir.h|  2 +-
 entry.c  |  4 +--
 fast-import.c| 26 
 fsck.c   | 12 
 fsck.h   |  2 +-
 fuzz-pack-headers.c  |  4 +--
 grep.h   |  2 +-
 http-push.c  |  2 +-
 list-objects-filter.c|  2 +-
 mailmap.c|  2 +-
 match-trees.c|  4 +--
 merge-blobs.c|  6 ++--
 merge-blobs.h|  2 +-
 merge-recursive.c|  4 +--
 notes-cache.c|  2 +-
 notes-merge.c|  4 +--
 notes.c  |  6 ++--
 object-store.h   | 20 ++--
 object.c |  4 +--
 object.h |  2 +-
 pack-check.c |  2 +-
 pack-objects.h   | 14 -
 pack.h   |  2 +-
 packfile.c   | 40 
 packfile.h   |  8 ++---
 patch-delta.c|  8 ++---
 range-diff.c |  2 +-
 read-cache.c | 48 ++---
 ref-filter.c | 30 +-
 remote-testsvn.c |  4 +--
 rerere.c |  2 +-
 sha1-file.c  | 66 
 sha1dc_git.c |  2 +-
 sha1dc_git.h |  2 +-
 streaming.c  | 12 
 streaming.h  |  2 +-
 submodule-config.c   |  2 +-
 t/helper/test-delta.c|  2 +-
 tag.c|  6 ++--
 tag.h|  2 +-
 tree-walk.c  | 14 -
 tree.c   |  2 +-
 xdiff-interface.c|  4 +--
 xdiff-interface.h|  4 +--
 85 files changed, 391 insertions(+), 384 deletions(-)

diff --git a/apply.c b/apply.c
index 3703bfc8d0..5e11b85d17 100644
--- a/apply.c
+++ b/apply.c
@@ -3096,7 +3096,7 @@ static int apply_binary_fragment(struct apply_state 
*state,
 struct patch *patch)
 {
struct fragment *fragment = patch->fragments;
-   unsigned long len;
+   size_t len;
void *dst;