Re: [PATCH v1 0/2] Fix default macOS build locally and on Travis CI

2016-11-09 Thread Torsten Bögershausen


On 10/11/16 00:39, Junio C Hamano wrote:


I've followed what was available at the public-inbox archive, but it
is unclear what the conclusion was.

For the first one your "how about" non-patch, to which Peff said
"that's simple and good", looked good to me as well, but is it
available as a final patch that I can just take and apply (otherwise
I think I can do the munging myself, but I'd rather be spoon-fed
when able ;-).


If you can munge the Peff version that may be easiest ?
I couldn't find a branch in your repo, but am happy to review
whatever shows up there and in pu.



Re: [PATCH v1 1/2] config.mak.in: set NO_OPENSSL and APPLE_COMMON_CRYPTO for macOS >10.11

2016-11-09 Thread Torsten Bögershausen
On 09.11.16 10:29, Lars Schneider wrote:
> 
>> On 09 Nov 2016, at 09:18, Torsten Bögershausen  wrote:
>>
>> On 07.11.16 18:26, Jeff King wrote:
>>> On Sun, Nov 06, 2016 at 08:35:04PM +0100, Lars Schneider wrote:
>>>
>>>> Good point. I think I found an even easier way to achieve the same.
>>>> What do you think about the patch below?
>>>>
>>>> [...]
>>>>
>>>> diff --git a/Makefile b/Makefile
>>>> index 9d6c245..f53fcc9 100644
>>>> --- a/Makefile
>>>> +++ b/Makefile
>>>> @@ -1047,6 +1047,7 @@ ifeq ($(uname_S),Darwin)
>>>>endif
>>>>endif
>>>>ifndef NO_APPLE_COMMON_CRYPTO
>>>> +  NO_OPENSSL = YesPlease
>>>>APPLE_COMMON_CRYPTO = YesPlease
>>>>COMPAT_CFLAGS += -DAPPLE_COMMON_CRYPTO
>>>>endif
>>>
>>> That is much simpler.
>> []
>> I don't know if that is a correct solution.
>>
>> If I have Mac OS 10.12 and Mac Ports installed, I may want to use
>> OPENSSL from Mac Ports.
> 
> Can't you define `NO_APPLE_COMMON_CRYPTO` in that case? 
> I think if you use OpenSSL then you don't need the Apple crypto lib, right?

After re-reading the Makefile: that makes sense :-)

Do you want to send a new patch ?

Feel free to omit
"Original-patch-by: Torsten Bögershausen "






Re: [PATCH v1 1/2] config.mak.in: set NO_OPENSSL and APPLE_COMMON_CRYPTO for macOS >10.11

2016-11-09 Thread Torsten Bögershausen
On 07.11.16 18:26, Jeff King wrote:
> On Sun, Nov 06, 2016 at 08:35:04PM +0100, Lars Schneider wrote:
> 
>> Good point. I think I found an even easier way to achieve the same.
>> What do you think about the patch below?
>>
>> [...]
>>
>> diff --git a/Makefile b/Makefile
>> index 9d6c245..f53fcc9 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -1047,6 +1047,7 @@ ifeq ($(uname_S),Darwin)
>>  endif
>>  endif
>>  ifndef NO_APPLE_COMMON_CRYPTO
>> +NO_OPENSSL = YesPlease
>>  APPLE_COMMON_CRYPTO = YesPlease
>>  COMPAT_CFLAGS += -DAPPLE_COMMON_CRYPTO
>>  endif
> 
> That is much simpler.
[]
I don't know if that is a correct solution.

If I have Mac OS 10.12 and Mac Ports installed, I may want to use
OPENSSL from Mac Ports.
How about this:


diff --git a/Makefile b/Makefile
index ee89c06..e93511f 100644
--- a/Makefile
+++ b/Makefile
@@ -1038,17 +1038,22 @@ ifeq ($(uname_S),Darwin)
ifeq ($(shell test -d /sw/lib && echo y),y)
BASIC_CFLAGS += -I/sw/include
BASIC_LDFLAGS += -L/sw/lib
+   HAS_OPENSSL = Yes
endif
endif
ifndef NO_DARWIN_PORTS
ifeq ($(shell test -d /opt/local/lib && echo y),y)
BASIC_CFLAGS += -I/opt/local/include
BASIC_LDFLAGS += -L/opt/local/lib
+   HAS_OPENSSL = Yes
endif
endif
ifndef NO_APPLE_COMMON_CRYPTO
APPLE_COMMON_CRYPTO = YesPlease
COMPAT_CFLAGS += -DAPPLE_COMMON_CRYPTO
+   ifndef HAS_OPENSSL
+   NO_OPENSSL = YesPlease
+   endif
endif
NO_REGEX = YesPlease
PTHREAD_LIBS =



Re: What's cooking in git.git (Oct 2016, #09; Mon, 31)

2016-11-05 Thread Torsten Bögershausen
> * st/verify-tag (2016-10-10) 7 commits
>  - t/t7004-tag: Add --format specifier tests
>  - t/t7030-verify-tag: Add --format specifier tests
>  - builtin/tag: add --format argument for tag -v
>  - builtin/verify-tag: add --format to verify-tag
>  - tag: add format specifier to gpg_verify_tag
>  - ref-filter: add function to print single ref_array_item
>  - gpg-interface, tag: add GPG_VERIFY_QUIET flag
> 
>  "git tag" and "git verify-tag" learned to put GPG verification
>  status in their "--format=" output format.
> 
>  Waiting for a reroll.
>  cf. <20161007210721.20437-1-santi...@nyu.edu>

I don't know if this has been reported before:
Some tests needs to be protected by GPG:
+test_expect_success GPG 'verifying a proper tag with --format pass and format 
accordingly' 

test_expect_success GPG 'verifying a forged tag with --format fail and format 
accordingly'

test_expect_success GPG 'verifying a forged tag with --format fail and format 
accordingly'


Re: What's cooking in git.git (Oct 2016, #09; Mon, 31)

2016-11-05 Thread Torsten Bögershausen
On Thu, Nov 03, 2016 at 09:30:44AM -0400, Martin-Louis Bright wrote:
> I will see if I can find a OSX 10.6 system to test with, and I'll try with
> perl 5.10.
> 
> --Martin

No need to worry too much:

I have tested Peffs patch applied on next OK- 
And the integration into pu that came the 2nd Novevember is tested OK as well.

(And please everybody: avoid top-posting here)


Re: [PATCH 4/4] t0021: fix filehandle usage on older perl

2016-11-02 Thread Torsten Bögershausen

> +use IO::File;

That did the trick (the J6T version work as well)

Thanks for the fast responses.


Re: What's cooking in git.git (Oct 2016, #09; Mon, 31)

2016-11-02 Thread Torsten Bögershausen
> * ls/filter-process (2016-10-17) 14 commits
>   (merged to 'next' on 2016-10-19 at ffd0de042c)

Some (late, as I recently got a new battery for the Mac OS 10.6 test system) 
comments:
t0021 failes here:


Can't locate object method "flush" via package "IO::Handle" at 
/Users/tb/projects/git/git.next/t/t0021/rot13-filter.pl line 90.
fatal: The remote end hung up unexpectedly


perl itself is 5.10 and we use the one shipped with Mac OS.
Why that ?
t0021 uses the hard-coded path:
t0021/rot13-filter.pl (around line 345) and the nice macro
PERL_PATH from the Makefile is fully ignored.

Commenting out the different "flush" makes the test hang, and I haven't digged 
further.



Re: What's cooking in git.git (Oct 2016, #09; Mon, 31)

2016-11-02 Thread Torsten Bögershausen
> 
> * tb/convert-stream-check (2016-10-27) 2 commits
>  - convert.c: stream and fast search for binary
>  - read-cache: factor out get_sha1_from_index() helper
> 
>  End-of-line conversion sometimes needs to see if the current blob
>  in the index has NULs and CRs to base its decision.  We used to
>  always get a full statistics over the blob, but in many cases we
>  can return early when we have seen "enough" (e.g. if we see a
>  single NUL, the blob will be handled as binary).  The codepaths
>  have been optimized by using streaming interface.
> 
>  The tip seems to do too much in a single commit and may be better split.
>  cf. <20161012134724.28287-1-tbo...@web.de>
>  cf. 

Reviews have been done, thanks everybody.

It looks to be a good "proof of concept".

The current series is far away from being ready, so please kick it
out of pu and feel free to discard.
 

 


Re: [PATCH v2 2/2] convert.c: stream and fast search for binary

2016-11-01 Thread Torsten Bögershausen
[]
> This probably should be done as four more patches to become
> reviewable.
> 
>  - One to use the CONVERT_STAT_BITS a lot more for the conversion
>decision than before, 
> 
>  - another to allow the caller to tell gather_stats() to give up
>early with the "search_only" bits, 
> 
>  - another to update the get_*_convert_stats() functions to use
>get_convert_stats_sha1(), and then finally 
> 
>  - use the streaming interface when reading from blob and file.
> 
> or something line that.

Many thanks for the detailed review. Let's see if I can come up
with a better series the next weeks or so.



Re: [PATCH] rev-list: restore the NUL commit separator in --header mode

2016-10-19 Thread Torsten Bögershausen

diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh

index 3e752ce..a2acff3 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -100,4 +100,11 @@ test_expect_success '--bisect and --first-parent can not 
be combined' '
test_must_fail git rev-list --bisect --first-parent HEAD
  '
  
+test_expect_success '--header shows a NUL after each commit' '

+   touch expect &&
+   printf "\0" > expect &&

Micronit: No ' ' after '>'
(And why do we need the touch ?)

+   printf "\0" >expect &&



+   git rev-list --header --max-count=1 HEAD | tail -n1 >actual &&
+   test_cmp_bin expect actual
+'
+
  test_done




Re: [RFC] Case insensitive Git attributes

2016-10-16 Thread Torsten Bögershausen



On 17/10/16 05:07, Stefan Beller wrote:

On Sun, Oct 16, 2016 at 6:04 PM, Lars Schneider
 wrote:

Hi,

Git attributes for path names are generally case sensitive. However, on
a case insensitive file system (e.g. macOS/Windows) they appear to be
case insensitive (`*.bar` would match `foo.bar` and `foo.BAR`).

This feels like a bug:

$ git diff
diff --git a/.gitattributes b/.gitattributes
index 5e98806..1419867 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -1,3 +1,4 @@
 * whitespace=!indent,trail,space
 *.[ch] whitespace=indent,trail,space
 *.sh whitespace=indent,trail,space
+*.C text

---
$ git -c core.ignorecase=false check-attr --all git.c
git.c: whitespace: indent,trail,space

#But running on a case insensitve FS I get:
$ git  check-attr --all git.c
git.c: text: set
git.c: whitespace: indent,trail,space



That
works great until a Git users joins the party with a case sensitive file
system. For this Git user only files that match the exact case of the
attribute pattern get the attributes (only `foo.bar`).

This inconsistent behavior can confuse Git users. An advanced Git user
could use a glob pattern (e.g. `*.[bB][aA][rR]) to match files in a
case insensitive way. However, this can get confusing quickly, too.

I wonder if we can do something about this. One idea could be to add an
attribute "case-sensitive" (or "caseSensitive") and set it to false
(if desired) for all files in .gitattributes for a given repo.

FYI: I am currently refactoring the attr subsystem (e.g.
https://public-inbox.org/git/20161012224109.23410-1-sbel...@google.com/
"attr: convert to new threadsafe API")


### .gitattributes example ###

* case-sensitive=false

How about
* ignorecase=true
 ?


Would this modify the current file only or the whole stack of attrs?
(In just one way or the whole stack, i.e. can you add this in .git/info/exclude
and the attribute file in the home dir also behaves differently? Or rather the
other way round when the system wide attr file enables case insensitivity,
each repository local config is set automatically? both ways?)


*.bar something

###

I haven't looked into the feasibility of an implementation, yet. However,
would that be an acceptable approach?

Conceptually I would prefer if we had a single switch that indicates a
case insensitive FS. That could be used for different purposes as well,
that are FS relevant such as checking in, checking out/renaming files
in the working tree? (does any such switch already exist for case
sensitivity?)

Thanks,
Stefan


Thanks,
Lars







Re: [PATCH v3 07/25] sequencer: completely revamp the "todo" script parsing

2016-10-15 Thread Torsten Bögershausen
On 15.10.16 19:19, Jeff King wrote:
> On Sat, Oct 15, 2016 at 07:03:46PM +0200, Torsten Bögershausen wrote:
> 
>> sequencer.c:633:14: warning: comparison of constant 2 with expression of 
>> type 'const enum todo_command' is always true 
>> [-Wtautological-constant-out-of-range-compare]
>> if (command < ARRAY_SIZE(todo_command_strings))
>> ~~~ ^ 
>> 1 warning generated.
>>
>> 53f8024e (Johannes Schindelin   2016-10-10 19:25:07 +0200  633) if 
>> (command < ARRAY_SIZE(todo_command_strings))
>>
> 
> Interesting. The compiler is right that this _should_ never happen, but
> I think the patch is quite reasonable to be defensive in case the enum
> happens to get a value outside of its acceptable range (which is
> probably undefined behavior, but...).
> 
> I wonder if:
> 
>   if ((int)command < ARRAY_SIZE(todo_command_strings))
> 
> silences the warning (I suppose size_t is probably an even better type,
> though obviously it does not matter in practice).
> 
> -Peff
> 
Both do (silence the warning)

enum may be signed or unsigned, right ?
So the size_t variant seams to be a better choice





Re: [PATCH v3 07/25] sequencer: completely revamp the "todo" script parsing

2016-10-15 Thread Torsten Bögershausen

Not sure is this has been reported before:


sequencer.c:633:14: warning: comparison of constant 2 with expression of type 
'const enum todo_command' is always true 
[-Wtautological-constant-out-of-range-compare]
if (command < ARRAY_SIZE(todo_command_strings))
~~~ ^ 
1 warning generated.


53f8024e (Johannes Schindelin   2016-10-10 19:25:07 +0200  633) if 
(command < ARRAY_SIZE(todo_command_strings))



Re: 2 directories same spelling one directory is camel cased

2016-10-13 Thread Torsten Bögershausen
On 12.10.16 18:05, David Brown wrote:
> Howdy git gurus,
> 
> I have the dubious distinction of working with a remote repo (master) that 
> has a class loader run-time error when cloned, built and executed.
> 
> The reason for the runtime issue is a directory hierarchical path has to 
> directories (folders) with the same name spelling but one of the directories 
> is camel-cased. The package names are the same.
> 
> The compiler doesn't care but the run-time class loader has an issue with the 
> 2 'same like named' classes.
> 
> How to remove the offending directories and files at the locally cloned repo 
> but not push 'deleted' directories and files back to origin/master the remote 
> repo?
> 
> Please advise.
> 
> Regards.
This email did not resolve from here: David Brown 

It is somewhat unclear, which issue the class loader has.
What does 
git ls-files | grep -i "sameNameBitDifferent"
say ?

What do you mean with 
"How to remove the offending directories" ?

Just run
rm -rf "offending directories" ?

Or, may be
git mv dir1 NewDir1

If you don't want to push, you don't have to, or what do I miss ?





Re: [PATCH v10 13/14] convert: add filter..process option

2016-10-11 Thread Torsten Bögershausen
On Tue, Oct 11, 2016 at 10:11:22AM +0200, Lars Schneider wrote:
> 
> > On 10 Oct 2016, at 21:58, Junio C Hamano  wrote:
> > 
> > larsxschnei...@gmail.com writes:
> > 
> > [...]
> >> +# Count unique lines except clean invocations in two files and compare
> >> +# them. Clean invocations are not counted because their number can vary.
> >> +# c.f. 
> >> http://public-inbox.org/git/xmqqshv18i8i@gitster.mtv.corp.google.com/
> >> +test_cmp_count_except_clean () {
> >> +  for FILE in $@
> >> +  do
> >> +  sort $FILE | uniq -c | sed "s/^[ ]*//" |
> >> +  sed "s/^\([0-9]\) IN: clean/x IN: clean/" >$FILE.tmp
> >> +  cat $FILE.tmp >$FILE
> >> +  done &&
> >> +  test_cmp $@
> >> +}
> > 
> > Why do you even _care_ about the number of invocations?  While I
> > told you why "clean" could be called multiple times under racy Git
> > as an example, that was not meant to be an exhaustive example.  I
> > wouldn't be surprised if we needed to run smudge twice, for example,
> > in some weirdly racy cases in the future.
> > 
> > Can we just have the correctness (i.e. "we expect that the working
> > tree file gets this as the result of checking it out, and we made
> > sure that is the case") test without getting into such an
> > implementation detail?
> 
> My goal is to check that clean/smudge is invoked at least once. I could
> just run `uniq` to achieve that but then all other filter commands could
> happen multiple times and the test would not detect that.
> 
> I also prefer to check the filter commands to ensure the filter is 
> working as expected (e.g. no multiple start ups etc) in addition to 
> checking the working tree.
> 
> Would the patch below work for you? If yes, then please squash it into
> "convert: add filter..process option".
> 
> Thank you,
> Lars
> 
> 
> 
> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
> index 9f892c0..714f706 100755
> --- a/t/t0021-conversion.sh
> +++ b/t/t0021-conversion.sh
> @@ -31,38 +31,33 @@ filter_git () {
>   rm -f git-stderr.log
>  }
>  
> -# Count unique lines in two files and compare them.
> -test_cmp_count () {
> - for FILE in $@
> - do
> - sort $FILE | uniq -c | sed "s/^[ ]*//" >$FILE.tmp
> - cat $FILE.tmp >$FILE
> - done &&
> - test_cmp $@
> -}
> -
> -# Count unique lines except clean invocations in two files and compare
> -# them. Clean invocations are not counted because their number can vary.
> +# Compare two files and ensure that `clean` and `smudge` respectively are
> +# called at least once if specified in the `expect` file. The actual
> +# invocation count is not relevant because their number can vary.
>  # c.f. 
> http://public-inbox.org/git/xmqqshv18i8i@gitster.mtv.corp.google.com/
> -test_cmp_count_except_clean () {
> - for FILE in $@

> +test_cmp_count () {
> + expect=$1 actual=$2

That could be 
expect="$1"
actual="$2"

> + for FILE in "$expect" "$actual"
>   do

> + sort "$FILE" | uniq -c | sed "s/^[ ]*//" |
> + sed "s/^\([0-9]\) IN: clean/x IN: clean/" |
> + sed "s/^\([0-9]\) IN: smudge/x IN: smudge/" 
> >"$FILE.tmp" &&
> + cat "$FILE.tmp" >"$FILE"

How about 
cp "$FILE.tmp" "$FILE"



Re: A head scratcher, clone results in modified files (tested linux and cygwin) - .gitattributes file?

2016-10-09 Thread Torsten Bögershausen



On 09/10/16 08:48, Jason Pyeron wrote:


The whole .gitattributes needs to be adopted, I think

Git 2.10 or higher has "git ls-files --eol":

git ls-files --eol   | grep "i/crlf.*auto"
i/crlf  w/crlf  attr/text=auto src/site/xdoc/upgradeto2_3.xml
i/crlf  w/crlf  attr/text=auto 
src/test/resources/org/apache/commons/io/FileUtilsTestDataCRLF.dat

i/crlf  w/crlf  attr/text=auto src/test/resources/test-file-gbk.bin
i/crlf  w/crlf  attr/text=auto 
src/test/resources/test-file-iso8859-1-shortlines-win-linebr.bin

i/crlf  w/crlf  attr/text=auto src/test/resources/test-file-utf8-win-linebr.bin
i/crlf  w/crlf  attr/text=auto src/test/resources/test-file-windows-31j.bin
i/crlf  w/crlf  attr/text=auto src/test/resources/test-file-x-windows-949.bin
i/crlf  w/crlf  attr/text=auto src/test/resources/test-file-x-windows-950.bin

Problem:
xml file had been commited  with CRLF : either normalize it or declare "-text".

The same is valid for the other files as well.
They are identified by auto as text, and commited with CRLF.
My feeling is that they should be declared as "-text".
Or, to be more compatible, with "-crlf":

Solution:
Make up your mind about the xml file and the html files.
If they are text, they need to be normalized.


Question:
What happens, if you do this:
# Auto detect text files and perform LF normalization
*crlf=auto

*.bin-crlf
*.dat-crlf
*.java   crlf diff=java
*.html   crlf diff=html
*.csscrlf
*.js crlf
*.sqlcrlf





Re: [PATCH v10 14/14] contrib/long-running-filter: add long running filter example

2016-10-08 Thread Torsten Bögershausen
On 08.10.16 13:25, larsxschnei...@gmail.com wrote:
> From: Lars Schneider 
> 
> Add a simple pass-thru filter as example implementation for the Git
> filter protocol version 2. See Documentation/gitattributes.txt, section
> "Filter Protocol" for more info.
> 

Nothing wrong with code in contrib.
I may have missed parts of the discussion, was there a good reason to
drop the test case completely?

>When adding a new feature, make sure that you have new tests to show
>the feature triggers the new behavior when it should, and to show the
>feature does not trigger when it shouldn't.  After any code change, make
>sure that the entire test suite passes.

Or is there a plan to add them later ?



Re: [PATCH v10 13/14] convert: add filter..process option

2016-10-08 Thread Torsten Bögershausen
On 09.10.16 01:06, Jakub Narębski wrote:
>> +
>> > +packet:  git< status=abort
>> > +packet:  git< 
>> > +
>> > +
>> > +After the filter has processed a blob it is expected to wait for
>> > +the next "key=value" list containing a command. Git will close
>> > +the command pipe on exit. The filter is expected to detect EOF
>> > +and exit gracefully on its own.
> Any "kill filter" solutions should probably be put here.  I guess
> that filter exiting means EOF on its standard output when read
> by Git command, isn't it?
>
Isn't it that Git closes the command pipe, then filter sees EOF on it's stdin

and does a graceful exit.





Re: [PATCH] run-command: fix build on cygwin (stdin is a macro)

2016-10-05 Thread Torsten Bögershausen

>I am not suggesting that you apply this exact patch (stdin_ is not a good
choice

How about fd_stdin ?


Re: Impossible to change working directory

2016-10-01 Thread Torsten Bögershausen
On 29.09.16 21:30, Sebastian Feldmann wrote:
> Hi there,
>
> I have a problem executing a pre-commit hook.
> The hook script has to change the working directory to work and if I use plain
>
> git commit
>
> it works as expected, the script executes without errors, but if I use
>
> git commit —only file.x file.y
>
> the script fails because changing the current working directory fails.
> If I echo the current working directory it always echoes the root repository 
> path
>
> Is this expected behavior?
> Thanks for your feedback.
Is there any chance to send us the content of the script ?
(Or a demo example, which doesn't work)



Re: [PATCH v8 00/11] Git filter protocol

2016-09-29 Thread Torsten Bögershausen



On 29/09/16 19:57, Lars Schneider wrote:

On 29 Sep 2016, at 18:57, Junio C Hamano  wrote:

Torsten Bögershausen  writes:


1) Git exits
2) The filter process receives EOF and prints "STOP" to the log
3) t0021 checks the content of the log

Sometimes 3 happened before 2 which makes the test fail.
(Example: https://travis-ci.org/git/git/jobs/162660563 )

I added a this to wait until the filter process terminates:

+wait_for_filter_termination () {
+   while ps | grep -v grep | grep -F "/t0021/rot13-filter.pl" >/dev/null 
2>&1
+   do
+   echo "Waiting for /t0021/rot13-filter.pl to finish..."
+   sleep 1
+   done
+}

Does this look OK to you?

Do we need the ps at all ?
How about this:

+wait_for_filter_termination () {
+   while ! grep "STOP"  LOGFILENAME >/dev/null
+   do
+   echo "Waiting for /t0021/rot13-filter.pl to finish..."
+   sleep 1
+   done
+}

Running "ps" and grepping for a command is not suitable for script
to reliably tell things, so it is out of question.  Compared to
that, your version looks slightly better, but what if the machinery
that being tested, i.e. the part that drives the filter process, is
buggy or becomes buggy and causes the filter process that writes
"STOP" to die before it actually writes that string?

I have a feeling that the machinery being tested needs to be fixed
so that the sequence is always be:

0) Git spawns the filter process, as it needs some contents to
   be filtered.

1) Git did everything it needed to do and decides that is time
   to go.

2) Filter process receives EOF and prints "STOP" to the log.

3) Git waits until the filter process finishes.

4) t0021, after Git finishes, checks the log.

Repeated sleep combined with grep is probably just sweeping the real
problem under the rug.  Do we have enough information to do the
above?

An inspiration may be in the way we centrally clean all tempfiles
and lockfiles before exiting.  We have a central registry of these
files that need cleaning up and have a single atexit(3) handler to
clean them up.  Perhaps we need a registry that filter processes
spawned by the mechanism Lars introduces in this series, and have an
atexit(3) handler that closes the pipe to them (which signals the
filters that it is time for them to go) and wait(2) on them, or
something?  I do not think we want any kill(2) to be involved in
this clean-up procedure, but I do think we should wait(2) on what we
spawn, as long as these processes are meant to be shut down when the
main process of Git exits (this is different from things like
credential-cache daemon where they are expected to persist and meant
to serve multiple Git processes).

We discussed that issue in v4 and v6:
http://public-inbox.org/git/20160803225313.pk3tfe5ovz4y3...@sigill.intra.peff.net/
http://public-inbox.org/git/xmqqbn0a3wy3@gitster.mtv.corp.google.com/

My impression was that you don't want Git to wait for the filter process.
If Git waits for the filter process - how long should Git wait?

Thanks,
Lars


Hm,
I would agree that  Git should not wait for the filter.
But does the test suite need to wait for the filter ?
May be, in this case we test the filter and Git, which is good.
Adding a 1 second delay, if, and only if, there is a racy condition,
is not that bad (or do we have better ways to check for a process to
be terminated ?)




Re: [PATCH v8 00/11] Git filter protocol

2016-09-29 Thread Torsten Bögershausen



On 29/09/16 12:28, Lars Schneider wrote:

On 28 Sep 2016, at 23:49, Junio C Hamano  wrote:

I suspect that you are preparing a reroll already, but the one that
is sitting in 'pu' seems to be flaky in t/t0021 and I seem to see
occasional failures from it.

I didn't trace where the test goes wrong, but one easy mistake you
could make (I am not saying that is the reason of the failure) is to
assume your filter will not be called under certain condition (like
immediately after you checked out from the index to the working
tree), when the automated test goes fast enough and get you into a
"racy git" situation---the filter may be asked to filter the
contents from the working tree again to re-validate what's there is
still what is in the index.

Thanks for the heads-up!

This is what happens:

1) Git exits
2) The filter process receives EOF and prints "STOP" to the log
3) t0021 checks the content of the log

Sometimes 3 happened before 2 which makes the test fail.
(Example: https://travis-ci.org/git/git/jobs/162660563 )

I added a this to wait until the filter process terminates:

+wait_for_filter_termination () {
+   while ps | grep -v grep | grep -F "/t0021/rot13-filter.pl" >/dev/null 
2>&1
+   do
+   echo "Waiting for /t0021/rot13-filter.pl to finish..."
+   sleep 1
+   done
+}

Does this look OK to you?

Do we need the ps at all ?
How about this:

+wait_for_filter_termination () {
+   while ! grep "STOP"  LOGFILENAME >/dev/null
+   do
+   echo "Waiting for /t0021/rot13-filter.pl to finish..."
+   sleep 1
+   done
+}




Re: [PATCH v7 10/10] convert: add filter..process option

2016-09-28 Thread Torsten Bögershausen
On 15.09.16 22:04, Junio C Hamano wrote:
> Lars Schneider  writes:
> 
>> Wouldn't that complicate the pathname parsing on the filter side?
>> Can't we just define in our filter protocol documentation that our 
>> "pathname" packet _always_ has a trailing "\n"? That would mean the 
>> receiver would know a packet "pathname=ABC\n\n" encodes the path
>> "ABC\n" [1].
> 
> That's fine, too.  If you declare that pathname over the protocol is
> a binary thing, you can also define that the packet does not have
> the terminating \n, i.e. the example encodes the path "ABC\n\n",
> which is also OK ;-)
> 
> As long as the rule is clearly documented, easy for filter
> implementors to follow it, and hard for them to get it wrong, I'd be
> perfectly happy.
>

(Sorry for the late reply)

In V8 the additional "\n" is clearly documented.

On the long run,
I would suggest to be more clear what BINARY is:

--- a/Documentation/technical/protocol-common.txt
+++ b/Documentation/technical/protocol-common.txt
@@ -61,6 +61,9 @@ the length's hexadecimal representation.
 A pkt-line MAY contain binary data, so implementors MUST ensure
 pkt-line parsing/formatting routines are 8-bit clean.
 
+Each pkt-line that may contain ASCII control characters should
+be treated as binary.
+



Re: Gitattributes file is not respected when switching between branches

2016-09-18 Thread Torsten Bögershausen
On 16.09.16 08:51, Виталий Ищенко wrote:
> Sorry for delay.
> 
No problem about the delay.

(And please no top-posting)


If you say
> ".gitattributes" indeed is not present in "master", but this is intentionally
then nobody has (to my knowledge) thought about this situation/workflow yet.


The short version:
Git is designed to have the same .gitattributes in different branches.
At least not in the long run.

A typical use case is to create a repo, adjust the
.gitattributes and keep this in all branches.



 





Re: [PATCH v7 10/10] convert: add filter..process option

2016-09-13 Thread Torsten Bögershausen
On 12.09.16 11:49, Lars Schneider wrote:

>> How do we send pathnames the have '\n' ?
>> Not really recommended, but allowed.
>> And here I am a little bit lost, is each of the lines packed into
>> a pkt-line ?
>> command=smudge is packet as pkt-line and pathname= is packed into
>> another one ? (The we don't need the '\n' at all)
> 
> Every line is a dedicated packet. That's why '\n' in a path name would
> not be a problem as the receiver is expected to read the entire packet
> when parsing the value (and the receiver knows the packet length, too).
> 
> The '\n' at the end is required by the pkt-line format:
> "A non-binary line SHOULD BE terminated by an LF..."
> (see protocol-common.txt)

That is only the half part of the story:
A non-binary line SHOULD BE terminated by an LF, which if present
MUST be included in the total length. Receivers MUST treat pkt-lines
with non-binary data the same whether or not they contain the trailing
LF (stripping the LF if present, and not complaining when it is
missing).


How do we treat pathnames ?
They can have each byte value except '\0'.
What should a receiver do, which reads a string like "ABC\n\n" ?
Is it "ABC\n" or "ABC\n\n" ?

I would really consider to treat pathnames as binary, and not add a trailing 
'\n',
are there other opinions ?
 







Re: Gitattributes file is not respected when switching between branches

2016-09-12 Thread Torsten Bögershausen
On 12.09.16 21:35, Torsten Bögershausen wrote:
> On 12.09.16 14:55, Виталий Ищенко wrote:
>> Good day
>>
>> I faced following issue with gitattributes file (at least eol setting)
>> when was trying to force `lf` mode on windows.
>>
>> We have 2 branches: master & dev. With master set as HEAD in repository
>>
>> I've added `.gitattributes` with following content to `dev` branch
>>
>> ```
>> * text eol=lf
>> ```
>>
>> Now when you clone this repo on other machine and checkout dev branch,
>> eol setting is not respected.
>> As a workaround you can rm all files except .git folder and do hard reset.
>>
>> Issue is reproducible on windows & unix versions. Test repo can be
>> found on github
>> https://github.com/betalb/gitattributes-issue
>>
>> master branch - one file without gitattributes
>> feature-branch - .gitattributes added with eol=lf
>> unix-feature-branch - .gitattributes added with eol=crlf
>>
>> Thanks,
>> Vitalii
> Some more information may be needed, to help to debug.
>
> Which version of Git are you using ?
> What does
>
> git ls-files --eol
>
> say ?
Obs, All information was in the email.

tb@xxx:/tmp/gitattributes-issue> git ls-files --eol
i/lfw/lfattr/   testfile-crlf.txt
tb@xxx:/tmp/gitattributes-issue> ls -al
total 8
drwxr-xr-x   4 tbwheel  136 Sep 12 21:38 .
drwxrwxrwt  19 root  wheel  646 Sep 12 21:38 ..
drwxr-xr-x  13 tbwheel  442 Sep 12 21:38 .git
-rw-r--r--   1 tbwheel   60 Sep 12 21:38 testfile-crlf.txt
tb@xxx:/tmp/gitattributes-issue>

Could it be that you didn't commit the file ".gitattributes" ?
This could help:
git add .gitattributes && git commit -m "Add .gitattributes"









Re: Gitattributes file is not respected when switching between branches

2016-09-12 Thread Torsten Bögershausen
On 12.09.16 14:55, Виталий Ищенко wrote:
> Good day
>
> I faced following issue with gitattributes file (at least eol setting)
> when was trying to force `lf` mode on windows.
>
> We have 2 branches: master & dev. With master set as HEAD in repository
>
> I've added `.gitattributes` with following content to `dev` branch
>
> ```
> * text eol=lf
> ```
>
> Now when you clone this repo on other machine and checkout dev branch,
> eol setting is not respected.
> As a workaround you can rm all files except .git folder and do hard reset.
>
> Issue is reproducible on windows & unix versions. Test repo can be
> found on github
> https://github.com/betalb/gitattributes-issue
>
> master branch - one file without gitattributes
> feature-branch - .gitattributes added with eol=lf
> unix-feature-branch - .gitattributes added with eol=crlf
>
> Thanks,
> Vitalii
Some more information may be needed, to help to debug.

Which version of Git are you using ?
What does

git ls-files --eol

say ?




Re: [PATCH v7 10/10] convert: add filter..process option

2016-09-10 Thread Torsten Bögershausen
[]

One general question up here, more comments inline.
The current order for a clean-filter is like this, I removed the error handling:

int convert_to_git()
{
ret |= apply_filter(path, src, len, -1, dst, filter);
if (ret && dst) {
src = dst->buf;
len = dst->len;
}
ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
return ret | ident_to_git(path, src, len, dst, ca.ident);
}

The first step is the clean filter, the CRLF-LF conversion (if needed),
then ident.
The current implementation streams the whole file content to the filter,
(STDIN of the filter) and reads back STDOUT from the filter into a STRBUF.
This is to use the UNIX-like STDIN--STDOUT method for writing a filter.

However, int would_convert_to_git_filter_fd() and convert_to_git_filter_fd()
offer a sort of short-cut:
The filter reads from the file directly, and the output of the filter is
read into a STRBUF.

It looks as if the multi-filter approach can use this in a similar way:
Give the pathname to the filter, the filter opens the file for reading
and stream the result via the pkt-line protocol into Git.
This needs some more changes, and may be very well go into a separate patch
series. (and should).

What I am asking for:
When a multi-filter is used, the content is handled to the filter via pkt-line,
and the result is given to Git via pkt-line ?
Nothing wrong with it, I just wonder, if it should be mentioned somewhere.

> diff --git a/contrib/long-running-filter/example.pl 
> b/contrib/long-running-filter/example.pl
> new file mode 100755
> index 000..279fbfb
> --- /dev/null
> +++ b/contrib/long-running-filter/example.pl
> @@ -0,0 +1,111 @@
> +#!/usr/bin/perl
> +#
> +# Example implementation for the Git filter protocol version 2
> +# See Documentation/gitattributes.txt, section "Filter Protocol"
> +#
> +
> +use strict;
> +use warnings;
> +
> +my $MAX_PACKET_CONTENT_SIZE = 65516;
> +
> +sub packet_read {
> +my $buffer;
> +my $bytes_read = read STDIN, $buffer, 4;
> +if ( $bytes_read == 0 ) {
> +
> +# EOF - Git stopped talking to us!
> +exit();
> +}
> +elsif ( $bytes_read != 4 ) {
> +die "invalid packet size '$bytes_read' field";
> +}

This is half-kosher, I would say,
(And I really. really would like to see an implementation in C ;-)

A read function may look like this:

   ret = read(0, &buffer, 4);
   if (ret < 0) {
 /* Error, nothing we can do */
 exit(1);
   } else if (ret == 0) {
 /* EOF  */
 exit(0);
   } else if (ret < 4) {
 /* 
  * Go and read more, until we have 4 bytes or EOF or Error */
   } else {
 /* Good case, see below */
   }

> +my $pkt_size = hex($buffer);
> +if ( $pkt_size == 0 ) {
> +return ( 1, "" );
> +}
> +elsif ( $pkt_size > 4 ) {
> +my $content_size = $pkt_size - 4;
> +$bytes_read = read STDIN, $buffer, $content_size;
> +if ( $bytes_read != $content_size ) {
> +die "invalid packet ($content_size expected; $bytes_read read)";
> +}
> +return ( 0, $buffer );
> +}
> +else {
> +die "invalid packet size";
> +}
> +}
> +
> +sub packet_write {
> +my ($packet) = @_;
> +print STDOUT sprintf( "%04x", length($packet) + 4 );
> +print STDOUT $packet;
> +STDOUT->flush();
> +}
> +
> +sub packet_flush {
> +print STDOUT sprintf( "%04x", 0 );
> +STDOUT->flush();
> +}
> +
> +( packet_read() eq ( 0, "git-filter-client" ) ) || die "bad initialization";
> +( packet_read() eq ( 0, "version=2" ) ) || die "bad version";
> +( packet_read() eq ( 1, "" ) )  || die "bad version end";
> +
> +packet_write("git-filter-server\n");
> +packet_write("version=2\n");
> +
> +( packet_read() eq ( 0, "clean=true" ) )  || die "bad capability";
> +( packet_read() eq ( 0, "smudge=true" ) ) || die "bad capability";
> +( packet_read() eq ( 1, "" ) )|| die "bad capability end";
> +
> +packet_write( "clean=true\n" );
> +packet_write( "smudge=true\n" );
> +packet_flush();
> +
> +while (1) {
> +my ($command) = packet_read() =~ /^command=([^=]+)\n$/;
> +my ($pathname) = packet_read() =~ /^pathname=([^=]+)\n$/;
> +
> +packet_read();
> +
> +my $input = "";
> +{
> +binmode(STDIN);
> +my $buffer;
> +my $done = 0;
> +while ( !$done ) {
> +( $done, $buffer ) = packet_read();
> +$input .= $buffer;
> +}
> +}
> +
> +my $output;
> +if ( $command eq "clean" ) {
> +### Perform clean here ###
> +$output = $input;
> +}
> +elsif ( $command eq "smudge" ) {
> +### Perform smudge here ###
> +$output = $input;
> +}
> +else {
> +die "bad command '$command'";
> +}
> +
> +packet_write("status=success\n");
> +packet_flush();
> +while ( length($output) > 0 ) {
> +my $packet = substr( $output, 0, $MAX_PACK

Re: [PATCH v7 10/10] convert: add filter..process option

2016-09-09 Thread Torsten Bögershausen
On Thu, Sep 08, 2016 at 08:21:32PM +0200, larsxschnei...@gmail.com wrote:
[]
> +packet:  git> git-filter-client
> +packet:  git> version=2
> +packet:  git> version=42
> +packet:  git> 
> +packet:  git< git-filter-server
> +packet:  git< version=2
> +packet:  git> clean=true
> +packet:  git> smudge=true
> +packet:  git> not-yet-invented=true
> +packet:  git> 
> +packet:  git< clean=true
> +packet:  git< smudge=true
> +packet:  git< 

It's probalby only me who has difficulties to distinguish
'>' from '<'.

packet:  git> git-filter-client
packet:  git> version=2
packet:  git> version=42
packet:  git> 
packet:   filter> git-filter-server
packet:   filter> version=2

(Otherwise the dialoge description is nice)

> +
> +Supported filter capabilities in version 2 are "clean" and
> +"smudge".
> +
> +Afterwards Git sends a list of "key=value" pairs terminated with
> +a flush packet. The list will contain at least the filter command
> +(based on the supported capabilities) and the pathname of the file
> +to filter relative to the repository root. Right after these packets
> +Git sends the content split in zero or more pkt-line packets and a
> +flush packet to terminate content.
> +
> +packet:  git> command=smudge\n
> +packet:  git> pathname=path/testfile.dat\n

How do we send pathnames the have '\n' ?
Not really recommended, but allowed.
And here I am a little bit lost, is each of the lines packed into
a pkt-line ?
command=smudge is packet as pkt-line and pathname= is packed into
another one ? (The we don't need the '\n' at all)
Or go both lines into one pkt-line (thats what I think), then
we don't need the '\n' afther the pathname.
And in this case the pathname is always the last element, and a '\n'
may occur in the pathname, since we know the length of the packet
we know how long the pathname must be.



> +packet:  git> 
> +packet:  git> CONTENT
> +packet:  git> 
> +
> +


> +The filter is expected to respond with a list of "key=value" pairs
> +terminated with a flush packet. If the filter does not experience
> +problems then the list must contain a "success" status. Right after
> +these packets the filter is expected to send the content in zero
> +or more pkt-line packets and a flush packet at the end. Finally, a
> +second list of "key=value" pairs terminated with a flush packet
> +is expected. The filter can change the status in the second list.
> +
> +packet:  git< status=success\n
> +packet:  git< 
> +packet:  git< SMUDGED_CONTENT
> +packet:  git< 
> +packet:  git<   # empty list!
> +
> +
> +If the result content is empty then the filter is expected to respond
> +with a success status and an empty list.
> +
> +packet:  git< status=success\n
> +packet:  git< 
> +packet:  git<   # empty content!
> +packet:  git<   # empty list!
> +
> +
> +In case the filter cannot or does not want to process the content,

Does not want ? 
I can see something like "I read through the file, there is nothing
to do. So Git, I don't send anything back, you know where the file is.

> +it is expected to respond with an "error" status. Depending on the
> +`filter..required` flag Git will interpret that as error
> +but it will not stop or restart the filter process.
> +
> +packet:  git< status=error\n
> +packet:  git< 
> +
> +
> +If the filter experiences an error during processing, then it can
> +send the status "error" after the content was (partially or
> +completely) sent. Depending on the `filter..required` flag
> +Git will interpret that as error but it will not stop or restart the
> +filter process.
> +
> +packet:  git< status=success\n
> +packet:  git< 
> +packet:  git< HALF_WRITTEN_ERRONEOUS_CONTENT
> +packet:  git< 
> +packet:  git< status=error\n
> +packet:  git< 
> +
> +
> +If the filter dies during the communication or does not adhere to
> +the protocol then Git will stop the filter process and restart it

My personal comment:
When a filter is mis-behaving, Git should say so loud and clear, and
die(). 
The filter process can be left running, so that it can be debugged.

Here I stopped the review for a moment, 
I still think that Git shouldn't kill anything, because we loose
the ability to debug these processes.



Re: [PATCH v4 1/3] tests: move test_lazy_prereq JGIT to test-lib.sh

2016-09-09 Thread Torsten Bögershausen
On Fri, Sep 09, 2016 at 10:36:28AM -0700, Jonathan Tan wrote:
[]
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index d731d66..c9c1037 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -1072,6 +1072,10 @@ test_lazy_prereq NOT_ROOT '
>   test "$uid" != 0
>  '
>  
> +test_lazy_prereq JGIT '
> + type jgit
> +'
> +

Minor note: 
Typically the stdout of `which` is suppressed like this:

if ! type cvs >/dev/null 2>&1


Re: How to simulate a real checkout to test a new smudge filter?

2016-09-06 Thread Torsten Bögershausen
On 06.09.16 19:47, john smith wrote:
> I am looking for a way to force smudge filter to run by simulating a
> real life checkout. Let's say I just created a new branch and did not
> modify any files but want to test my new smudge filter. According to
> some answers such as
> https://stackoverflow.com/questions/22909620/git-smudge-clean-filter-between-branches
> and
> https://stackoverflow.com/questions/21652242/git-re-checkout-files-after-creating-smudge-filter
> it should be possible by running:
>
> git checkout HEAD --
>
> but in doesn't work with git 2.9.0. Method suggested in accepted
> answer here
> https://stackoverflow.com/questions/17223527/how-do-i-force-git-to-checkout-the-master-branch-and-remove-carriage-returns-aft
> works but I don't like because it seems fragile. Is there a safe way
> to do what I want to do in Git still today?
>
It depends what you mean with "safe way".

git checkout, git checkout -f or other combinations will only

overwrite/rewrite the files in the working tree, if, and only if,

git comes to the conclusion that "git add" will do something,

like replace a blob for a file in the index.

(And by running "rm .git/index git will evaluate the "clean" filters,

and the CRLF->LF conversion).


If you want to test a smudge filter, simply remove the file:

mv file /tmp && git checkout file






Re: [PATCH v6 12/13] convert: add filter..process option

2016-08-30 Thread Torsten Bögershausen
On Tue, Aug 30, 2016 at 03:23:10PM -0700, Junio C Hamano wrote:
> Lars Schneider  writes:
> 
> > On 30 Aug 2016, at 20:59, Junio C Hamano  wrote:
> >> 
> >>> "abort" could be ambiguous because it could be read as "abort only
> >>> this file". "abort-all" would work, though. Would you prefer to see
> >>> "error" replaced by "abort" and "error-all" by "abort-all"?
> >> 
> >> No.
> >> 
> >> I was primarily reacting to "-all" part, so anything that ends with
> >> "-all" is equally ugly from my point of view and not an improvement.
> >> 
> >> As I said, "error-all" as long as other reviewers are happy with is
> >> OK by me, too.
> >
> > Now, I see your point. How about "error-and-stop" or "error-stop"
> > instead of "error-all"?
> 
> Not really.  I was primarily reacting to having "error" and
> "error-all", that share the same prefix.  Changing "-all" suffix to
> "-stop" does not make all that difference to me.  But in any case,
> as long as other reviewers are happy with it, it is OK by me, too.
> 
> > Good argument. However, my intention here was to mimic the v1 filter
> > mechanism.
> 
> I am not making any argument and in no way against the chosen
> behaviour.  That "intention here" that was revealed after two
> iterations is something we would want to see as the reasoning
> behind the choice of the final behaviour recorded somewhere,
> and now I drew it out of you, I achieved what I set out to do
> initially ;-)
> 
> > I am not sure I understand your last sentence. Just to be clear:
> > Would you prefer it, if Git would just close the pipe to the filter process
> > on Git exit and leave the filter running?
> 
> As a potential filter writer, I would probably feel it the easiest
> to handle if there is no signal involved.  My job would be to read
> from the command pipe, react to it, and when the command pipe gives
> me EOF, I am expected to exit myself.  That would be simple enough.
> 
> >> I meant it as primarily an example people can learn from when they
> >> want to write their own.
> >
> > I think `t/t0021/rot13-filter.pl` (part of this patch) serves this purpose 
> > already.
> 
> I would expect them to peek at contrib/, but do you seriously expect
> people to comb through t/ directory?

How about a filter written in C, and placed in ./t/helper/ ?
At least I feel that a filter in C-language could be a starting point
for others which prefer, depending on the task the filter is going to do,
to use shell scripts, perl, python or any other high-level language.

A test case, where data can not be filtered, would be a minimum.
As Jakub pointed out, you can use iconv with good and bad data.


Re: [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes

2016-08-30 Thread Torsten Bögershausen

> 
> diff --git a/sha1_file.c b/sha1_file.c
> index d5e1121..759991e 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1485,7 +1485,7 @@ int check_sha1_signature(const unsigned char *sha1, 
> void *map,
>  
>  int git_open_noatime(const char *name)

Hm, should the function then be renamed into

git_open_noatime_cloexec()

>  {
> - static int sha1_file_open_flag = O_NOATIME;
> + static int sha1_file_open_flag = O_NOATIME | O_CLOEXEC;
>  
>   for (;;) {
>   int fd;
> 
> 
> 


Re: [PATCH v1 0/3] Update eol documentation

2016-08-26 Thread Torsten Bögershausen



On 25/08/16 22:31, Junio C Hamano wrote:

[]



This [0/3] is meant to be a cover for [1/2] and [2/2]?

I am trying to see if we broke format-patch recently, or it is a
manual editing error.  The latter I do not care about; the former I
do.



No worry, 0/3 is patched by me by hand, sorry for the confusion.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: core.autocrlf, was Re: [git-for-windows] Re: [ANNOUNCE] Git for Windows 2.9.3

2016-08-25 Thread Torsten Bögershausen



I was not talking about the cost of correcting mistakes. Running --filters
is potentially very costly. Just so you understand what I am talking
about: I have a report that says that checking out a sizeable worktree
with core.autocrlf=true is 58% slower than with core.autocrlf=false. That
is horrible. []


Is this a public repo ?

Or is there a benchmark repo somewhere ?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-svn bridge and line endings

2016-08-23 Thread Torsten Bögershausen
On 23.08.16 19:50, Lucian Smith wrote:
> On Tue, Aug 23, 2016 at 10:36 AM, Julian Phillips
>  wrote:
>> On 23/08/2016 17:14, Lucian Smith wrote:
>>>
>>> Thanks for the quick responses!
>>>
>>> My situation is that the git side is entirely whatever github.org is
>>> running; presumably the latest stable version?  They provide a URL for
>>> repositories hosted there that can be accessed by an SVN
>>> client--somewhere on github is the 'git-svn bridge' (as I understand
>>> it): something that receives SVN requests, translates them to
>>> git-speak, and replies with what SVN expects.
>>
>>
>> The ability to use a Subversion client is functionality provided by GitHub,
>> and not native to git itself.  So unless someone for the appropriate GitHub
>> team happens to read this thread I expect there isn't much we can do to
>> help.  I don't know if they've even provided any real detail of how they
>> implemented the bridge.
> 
> All right, that makes sense.  And yeah, it was hard to find any
> information about the bridge, which is why I ended up here...
> 
>>> There is indeed a .gitattributes file in the repository, but the SVN
>>> client doesn't know what to do with it.  My hope was that something in
>>> the bridge code, that translated SVN requests to git and back, could
>>> take the SVN request, "Please give me this file; I'm on Windows" look
>>> at the .gitattributes file in the repository, and hand out a file with
>>> CR/LF's in it.  Conversely, when SVN tells git "Here is the new
>>> version of the file to check in," the bridge could look at the file,
>>> realize it had CR/LF's in it, look at the .gitattributes file to know
>>> if it needed to be converted, and then convert it appropriately.
>>>
>>> I can imagine a full-blown bridge that could even translate the SVN
>>> EOL propset back and forth appropriately, but I'm not sure if going
>>> that far is necessary and/or helpful.
>>>
>>> I don't know if this is the right mailing list for that particular bit
>>> of software, but it at least seemed like a good place to start.  Thank
>>> you!
>>
>>
>> Supported properties are listed here:
>> https://help.github.com/articles/subversion-properties-supported-by-github/
>>
>> You'll need to ask GitHub to implement support for the svn:eol-style
>> property I expect.
> 
> Thanks for finding that!  That even has an 'ask a human' button, which
> I expect is my next step.
> 
>> Might be easier to just use Tortoise Git?
> 
> It may be!  But thanks for the responses anyway.
Most text-files have been commited with LF:
/tmp/ttt/sbml-test-suite> git ls-files --eol | grep "i/lf" | wc -l
   10266
Some have been commited with CRLF:
/tmp/ttt/sbml-test-suite> git ls-files --eol | grep "i/crlf" | wc -l
1620


The whole repo deserves to be normalized and equipped with a .gitattributes 
file,
see

https://www.kernel.org/pub/software/scm/git/docs/gitattributes.html


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-svn bridge and line endings

2016-08-23 Thread Torsten Bögershausen
On Mon, Aug 22, 2016 at 05:04:47PM -0700, Lucian Smith wrote:
> I'm attempting to use the git-svn bridge, and am having problems with
> line endings on Windows.
> 
> The setup is that we have a git repository on github, and I've checked
> out a branch on my Windows machine using Tortoise svn.  I make
> changes, commit them, and the branch is updated.  In general, this
> works fine.

Just to make sure:
The repo is in git format.
Is it a public repo ?
Or could you make a piblic demo repo ?
Do I understand it right: Tortoise SVN talks directly to the Git server ?
Isn't Tortoise SVN a client to talk to SVN server?
What goes over the wire to the remote Git server, git or SVN ?

To my understanding, "git svn" can use Git locally, and talk to an SVN server.
What do I miss ?

> 
> If this was just SVN, I could set the 'eol-style' for files to
> 'native' to let it know to expect Windows/linux/mac line endings for
> particular files.  This seems to be handled in git by using the
> '.gitattributes' file instead.  Unfortunately, the git/svn bridge
> doesn't seem to be translate the information in the .gitattributes
> file to appropriate eol-style settings in SVN.  Checking out a file
> using SVN on Windows leaves me with a file without CRLF's, and if I
> check in a CRLF file, that's the way it goes into the repository.
> Differences in CRLF alone show up as 'real' differences that can be
> checked in, and, if this happens, this causes problems with other
> people's repositories.
> 
> Am I doing something wrong; is there another way to handle this; or
> can I file this as a bug report/feature request?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] cat-file: introduce the --filters option

2016-08-22 Thread Torsten Bögershausen
On 19.08.16 17:00, Johannes Schindelin wrote:
> Hi Torsten,
>
> On Fri, 19 Aug 2016, Torsten Bögershausen wrote:
>
>> On Thu, Aug 18, 2016 at 02:46:17PM +0200, Johannes Schindelin wrote:
>>
>>> +--filters::
>>> +   Show the content as transformed by the filters configured in
>> Minor comment:
>> s/transformed/converted/ ?
> Sure.
>
>> Does it make sense to be more specific here:
>> The order of conversion is
>> - ident
>> - CRLF
>> - smudge
> I do not think it makes sense to complexify the documentation in that
> manner. The filters should always be applied in the same order, methinks,
> and it would only clutter the man page to repeat that order here.
Can we can shorten the description and have something like this:

--filters::
+   Show the content converted by the filters configured in
+   the current working tree for the given . 
+has to be of the form : or :.
+


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] cat-file: introduce the --filters option

2016-08-19 Thread Torsten Bögershausen
[]
On Thu, Aug 18, 2016 at 02:46:17PM +0200, Johannes Schindelin wrote:

> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
> index 071029b..7d48735 100644
> --- a/Documentation/git-cat-file.txt
> +++ b/Documentation/git-cat-file.txt
> @@ -9,15 +9,15 @@ git-cat-file - Provide content or type and size information 
> for repository objec
>  SYNOPSIS
>  
>  [verse]
> -'git cat-file' (-t [--allow-unknown-type]| -s [--allow-unknown-type]| -e | 
> -p |  | --textconv ) 
> +'git cat-file' (-t [--allow-unknown-type]| -s [--allow-unknown-type]| -e | 
> -p |  | --textconv | --filters ) 
>  'git cat-file' (--batch | --batch-check) [--follow-symlinks]
>  
>  DESCRIPTION
>  ---
>  In its first form, the command provides the content or the type of an object 
> in
>  the repository. The type is required unless `-t` or `-p` is used to find the
> -object type, or `-s` is used to find the object size, or `--textconv` is used
> -(which implies type "blob").
> +object type, or `-s` is used to find the object size, or `--textconv` or
> +`--filters` is used (which imply type "blob").
>  
>  In the second form, a list of objects (separated by linefeeds) is provided on
>  stdin, and the SHA-1, type, and size of each object is printed on stdout.
> @@ -58,6 +58,12 @@ OPTIONS
>   order to apply the filter to the content recorded in the index at
>   .
>  
> +--filters::
> + Show the content as transformed by the filters configured in
Minor comment:
s/transformed/converted/ ?

Does it make sense to be more specific here:
The order of conversion is
- ident
- CRLF
- smudge
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] cat-file: support --textconv/--filters in batch mode

2016-08-18 Thread Torsten Bögershausen
On Thu, Aug 18, 2016 at 02:46:28PM +0200, Johannes Schindelin wrote:
> With this patch, --batch can be combined with --textconv or --filters.
> For this to work, the input needs to have the form
> 
>   
> 
> so that the filters can be chosen appropriately.
> 
> Signed-off-by: Johannes Schindelin 
> ---
>  Documentation/git-cat-file.txt | 18 +++-
>  builtin/cat-file.c | 49 
> +-
>  t/t8010-cat-file-filters.sh| 10 +
>  3 files changed, 67 insertions(+), 10 deletions(-)
> 
> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
> index 59a3c37..1f4d954 100644
> --- a/Documentation/git-cat-file.txt
> +++ b/Documentation/git-cat-file.txt
> @@ -10,7 +10,7 @@ SYNOPSIS
>  
>  [verse]
>  'git cat-file' (-t [--allow-unknown-type]| -s [--allow-unknown-type]| -e | 
> -p |  | --textconv | --filters ) [--use-path=] 
> -'git cat-file' (--batch | --batch-check) [--follow-symlinks]
> +'git cat-file' (--batch | --batch-check) [ --textconv | --filters ] 
> [--follow-symlinks]
>  
>  DESCRIPTION
>  ---
> @@ -20,7 +20,11 @@ object type, or `-s` is used to find the object size, or 
> `--textconv` or
>  `--filters` is used (which imply type "blob").
>  
>  In the second form, a list of objects (separated by linefeeds) is provided on
> -stdin, and the SHA-1, type, and size of each object is printed on stdout.
> +stdin, and the SHA-1, type, and size of each object is printed on stdout. The
> +output format can be overridden using the optional `` argument. If
> +either `--textconv` or `--filters` was specified, the input is expected to
> +list the object names followed by the path name, separated by a single white
> +space, so that the appropriate drivers can be determined.
>  
>  OPTIONS
>  ---
> @@ -72,13 +76,17 @@ OPTIONS
>  --batch::
>  --batch=::
>   Print object information and contents for each object provided
> - on stdin.  May not be combined with any other options or arguments.
> - See the section `BATCH OUTPUT` below for details.
> + on stdin.  May not be combined with any other options or arguments
> + except `--textconv` or `--filters`, in which case the input lines
> + also need to specify the path, separated by white space.  See the
> + section `BATCH OUTPUT` below for details.
>  
>  --batch-check::
>  --batch-check=::
>   Print object information for each object provided on stdin.  May
> - not be combined with any other options or arguments.  See the
> + not be combined with any other options or arguments except
> + `--textconv` or `--filters`, in which case the input lines also
> + need to specify the path, separated by white space.  See the
>   section `BATCH OUTPUT` below for details.
>  
>  --batch-all-objects::
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 5ff58b3..5f91cf4 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -17,6 +17,7 @@ struct batch_options {
>   int print_contents;
>   int buffer_output;
>   int all_objects;
> + int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
How do I read 'w' and 'c' ?
wilter and cextconv ? Does it make sense to use an enum here ?
Or a #define ?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] cat-file: introduce the --filters option

2016-08-18 Thread Torsten Bögershausen
On Thu, Aug 18, 2016 at 02:46:17PM +0200, Johannes Schindelin wrote:
> As suggested by its name, the --filters option applies the filters

[]
> diff --git a/t/t8010-cat-file-filters.sh b/t/t8010-cat-file-filters.sh
Does it make sense to integrate tests into t1006-cat-file ?
> new file mode 100755
> index 000..e466634
> --- /dev/null
> +++ b/t/t8010-cat-file-filters.sh
> @@ -0,0 +1,34 @@
> +#!/bin/sh
> +
> +test_description='git cat-file filters support'
> +. ./test-lib.sh
> +
> +test_expect_success 'setup ' '
> + echo "*.txt eol=crlf diff=txt" >.gitattributes &&
> + echo "hello" | append_cr >world.txt &&
> + git add .gitattributes world.txt &&
> + test_tick &&
> + git commit -m "Initial commit"
> +'
> +
> +has_cr () {
> + tr '\015' Q <"$1" | grep Q >/dev/null
> +}
> +
> +test_expect_success 'no filters with `git show`' '
> + git show HEAD:world.txt >actual &&
I would prefer to have something using
  cat >expect <<-\EOF &&
  xxx
  test_cmp expect actual
to make it easier to debug in case of a failure ?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] Git v2.10.0-rc0

2016-08-17 Thread Torsten Bögershausen
On 17.08.16 19:38, Junio C Hamano wrote:
> Junio C Hamano  writes:
> 
>> Torsten Bögershausen  writes:
>>
>>> Is this `text=auto` correct ?
>>
>> Thanks for spotting a typo.  Definitely.
>>
>>> I think it should be
>>>
>>>used to have the same effect as
>>>$ echo "* text eol=crlf" >.gitattributes
>>
>> Thanks.
>>
>>> # In other words, the `auto` was ignored, as explained here:
>>> +   $ git config core.eol crlf
>>> +   i.e. declaring all files are text; the combination now is
>>> +   equivalent to doing
>>> +   $ git config core.autocrlf true
>>> +
> 
> Just to make sure I got you right, how does this look?
> 
>  Documentation/RelNotes/2.10.0.txt | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/RelNotes/2.10.0.txt 
> b/Documentation/RelNotes/2.10.0.txt
> index 179e575..ccf5f64 100644
> --- a/Documentation/RelNotes/2.10.0.txt
> +++ b/Documentation/RelNotes/2.10.0.txt
> @@ -235,13 +235,13 @@ Performance, Internal Implementation, Development 
> Support etc.
>   * The API to iterate over all the refs (i.e. for_each_ref(), etc.)
> has been revamped.
>  
> - * The handling of the "text = auto" attribute has been updated.
> + * The handling of the "text=auto" attribute has been corrected.
> $ echo "* text=auto eol=crlf" >.gitattributes
> used to have the same effect as
> -   $ echo "* text=auto eol=crlf" >.gitattributes
> +   $ echo "* text eol=crlf" >.gitattributes
# Now we describe/define  the eol at checkouts twice.
# I think we can drop this line:
> $ git config core.eol crlf
# The rest looks good:
> -   i.e. declaring all files are text; the combination now is
> -   equivalent to doing
> +   i.e. declaring all files are text (ignoring "auto").  The
> +   combination has been fixed to be equivalent to doing
> $ git config core.autocrlf true
>  
>   * A few tests that specifically target "git rebase -i" have been
> 

# Just to be sure: the whole paragraph may look like this:

* The handling of the "text=auto" attribute has been corrected.
  $ echo "* text=auto eol=crlf" >.gitattributes
  used to have the same effect as
  $ echo "* text eol=crlf" >.gitattributes
  i.e. declaring all files are text (ignoring "auto").  The
  combination has been fixed to be equivalent to doing
  $ git config core.autocrlf true
















--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] Git v2.10.0-rc0

2016-08-16 Thread Torsten Bögershausen
On Mon, Aug 15, 2016 at 08:42:48AM -0700, Junio C Hamano wrote:
> Torsten Bögershausen  writes:
> 
> > On 15.08.16 00:47, Junio C Hamano wrote:
> >> Torsten Bögershausen (1):
> >>   convert: unify the "auto" handling of CRLF
> >
> > Should we mention this change in the release notes?
> >
> > The handling of "*.text = auto" was changed, and now
> >
> > $ echo "* text=auto eol=crlf" >.gitattributes
> > has the same effect as
> > $ git config core.autocrlf true
> 
> Oh, definitely.  Thanks.
Hm, one question remains:
 git show 07c92928f2b782330df6e78
[]
+ * The handling of the "text = auto" attribute has been updated.
+   $ echo "* text=auto eol=crlf" >.gitattributes
+   used to have the same effect as
+   $ echo "* text=auto eol=crlf" >.gitattributes
Is this `text=auto` correct ?
I think it should be

   used to have the same effect as
   $ echo "* text eol=crlf" >.gitattributes

# In other words, the `auto` was ignored, as explained here:
+   $ git config core.eol crlf
+   i.e. declaring all files are text; the combination now is
+   equivalent to doing
+   $ git config core.autocrlf true
+
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] Git v2.10.0-rc0

2016-08-15 Thread Torsten Bögershausen
On 15.08.16 00:47, Junio C Hamano wrote:
> Torsten Bögershausen (1):
>   convert: unify the "auto" handling of CRLF

Should we mention this change in the release notes?

The handling of "*.text = auto" was changed, and now

$ echo "* text=auto eol=crlf" >.gitattributes
has the same effect as
$ git config core.autocrlf true


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 1/2] t0027: Correct raciness in NNO test

2016-08-14 Thread Torsten Bögershausen
On Sat, Aug 13, 2016 at 06:50:19PM +0200, Johannes Sixt wrote:
> Am 12.08.2016 um 18:51 schrieb tbo...@web.de:
> >From: Torsten Bögershausen 
> >
> >When a non-reversible CRLF conversion is done in "git add",
> >a warning is printed on stderr (or Git dies, depending on checksafe)
> >
> >The function commit_chk_wrnNNO() in t0027 was written to test this,
> >but did the wrong thing: Instead of looking at the warning
> >from "git add", it looked at the warning from "git commit".
> >
> >This is racy because "git commit" may not have to do CRLF conversion
> >at all if it can use the sha1 value from the index (which depends on
> >whether "add" and "commit" run in a single second).
> >
> >Correct this and replace the commit for each and every file with a commit
> >of all files in one go.
> 
> The new test code does not only fix the race condition, but also
> tests different things, or prepares test cases in a different
> manner. I would have appreciated an explanation why this is
> necessary. Is it "on my machine, the race condition was triggered
> consistently for a bunch of tests, and so I recorded the wrong
> expected output in the test cases"?
> 
Good point. The origanal intention was to let t0027 pass, and fix
the bug in convert.c in a later commit.
(and rename NNO in a 3rd commit, that is postponed until we figured this out).

Looking at t0027, it turns out that the changes in the test matrix done in 1/2
are reverted in 2/2.
This is not a good thing.
Either (a) they should be marked as test_expect_failure in 1/2 and
corrected in 2/2,
or (b) 1/2 and 2/2 are squashed together and the noise in t0027 is minimal.
I'll send a patch for (b) in a second.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: t0027 racy?

2016-08-11 Thread Torsten Bögershausen
[]
> FWIW I would strongly prefer to use the warning of `git add` and not even
> bother with `git commit`. What we are interested in is the warning
> message, generated by convert_to_git(). 
The commit is needed, because we check the content of the commit later.
> Not using the first one and
[]
> 
> On that matter, I wonder whether there would be a chance to revamp t0027
> in a major way, with the following goals:
> 
> - to make it very obvious to the casual reader what is being tested
> 
> - to combine Git invocations when possible, e.g. running one big `git add`
>   on a couple of files and then verify the relevant parts of the output
> 
> - dramatically decreasing the time required to run the test, without
>   sacrificing correctness (I would wager a bet that not only a few of
>   those 1388 test cases essentially exercise identical code paths)
Good ideas, I will work on a series that fixes bugs first, and then we
can see if there is room for optimization.

What do you think about this as a starting point,
more things will follow.
I like to here comments about the commit msg first ;-)

commit 3754404d3d1ea4a0cbbed4986cc4ac1b5fe6b66e
Author: Torsten Bögershausen 
Date:   Thu Aug 11 18:47:29 2016 +0200

t0027: Correct raciness in NNO test

When a non-reversible CRLF conversion is done in "git add",
a warning is printed on stderr.

The commit_chk_wrnNNO() function  in t0027 was written to test this,
but did the wrong thing: Instead of looking at the warning
from "git add", it looked at the warning from "git commit".

Correct this and replace the commit for each and every file with a commit
of all files in one go.

The function commit_chk_wrnNNO() will to be renamed in a separate commit.
Thanks to Jeff King  for analizing t0027.
Reporyed-By: Johannes Schindelin 

diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
index 2860d2d..6e44382 100755
--- a/t/t0027-auto-crlf.sh
+++ b/t/t0027-auto-crlf.sh
@@ -119,8 +119,7 @@ commit_chk_wrnNNO () {
fname=${pfx}_$f.txt &&
cp $f $fname &&
printf Z >>"$fname" &&
-   git -c core.autocrlf=$crlf add $fname 2>/dev/null &&
-   git -c core.autocrlf=$crlf commit -m "commit_$fname" $fname 
>"${pfx}_$f.err" 2>&1
+   git -c core.autocrlf=$crlf add $fname 2>"${pfx}_$f.err"
done
 
test_expect_success "commit NNO files crlf=$crlf attr=$attr LF" '
@@ -394,11 +393,10 @@ test_expect_success 'commit files attr=crlf' '
 
 # attrLFCRLF  CRLFmixLF   
LF_mix_CR   CRLFNUL
 commit_chk_wrnNNO ""  ""  false   """"""  ""   
   ""
-commit_chk_wrnNNO ""  ""  trueLF_CRLF   """"  ""   
   ""
+commit_chk_wrnNNO ""  ""  true""""""  ""   
   ""
 commit_chk_wrnNNO ""  ""  input   """"""  ""   
   ""
-
-commit_chk_wrnNNO "auto"  ""  false   "$WILC"   """"  ""   
   ""
-commit_chk_wrnNNO "auto"  ""  trueLF_CRLF   """"  ""   
   ""
+commit_chk_wrnNNO "auto"  ""  false   """"""  ""   
   ""
+commit_chk_wrnNNO "auto"  ""  true""""""  ""   
   ""
 commit_chk_wrnNNO "auto"  ""  input   """"""  ""   
   ""
 for crlf in true false input
 do
@@ -408,7 +406,7 @@ do
commit_chk_wrnNNO ""lf  $crlf   ""   CRLF_LFCRLF_LF 
 "" CRLF_LF
commit_chk_wrnNNO ""crlf$crlf   LF_CRLF   ""LF_CRLF 
LF_CRLF ""
commit_chk_wrnNNO auto  lf  $crlf   """"""  
""  ""
-   commit_chk_wrnNNO auto  crlf$crlf   LF_CRLF   """"  
""  ""
+   commit_chk_wrnNNO auto  crlf$crlf   """"""  
""  ""
commit_chk_wrnNNO text  lf  $crlf   ""   CRLF_LFCRLF_LF 
""  CRLF_LF
commit_chk_wrnNNO text  crlf$crlf   LF_CRLF   ""LF_CRLF 
LF_CRLF ""
 done
@@ -417,7 +415,8 @@ commit_chk_wrnNNO "text"  ""  false   "$WILC"   "$WICL" 
  "$WAMIX""$WILC
 commit_chk_wrnNNO "text"  ""  trueLF_CRLF   ""LF_CRLF 
LF_CRLF ""
 commit_chk_wrnNNO "text"  ""  input   ""CRLF_LF   CRLF_LF ""   
   CRLF_LF
 
-test_expect_success 'create files cleanup' '
+test_expect_success 'commit NNO and cleanup' '
+   git commit -m "commit files on top of NNO" &&
rm -f *.txt &&
git -c core.autocrlf=false reset --hard
 '
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: t0027 racy?

2016-08-09 Thread Torsten Bögershausen
>   git -c core.autocrlf=$crlf add $fname >"${pfx}_$f.err" 2>&1
> 
> would make more sense. We _know_ that we have to do convert_to_git() in
> that step because the content is changed. And then you can ignore the
> warnings from "git commit" (which are racy), or you can simply commit as
> a whole later, as some other loops do.
> 
> But like Dscho, I do not actually understand what this test is checking.
> The function is called commit_chk_wrnNNO(), so perhaps you really are
> interested in what "commit" has to say. But IMHO that is not an
> interesting test. We know that if it has to read the content from disk,
> it will call convert_to_git(), which is the exact same code path used by
> "git add". So I do not understand what it is accomplishing to make a
> commit at all here.

It seems as if the test has been written without understanding the raciness.
It should commit files with different line endings on top of
a file with mixed line endings.
The warning should be checked (and here "git add" can be used,
or the file can be commited directly).
I'm not sure why the test ended up in doing both.

However, doing it the right way triggers a bug in convert.c,
(some warnings are missing, so I need some days to come up
with a proper patch)

Thanks for reporting & digging.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: t0027 racy?

2016-08-09 Thread Torsten Bögershausen
On Tue, Aug 09, 2016 at 07:49:38AM -0400, Jeff King wrote:
> On Tue, Aug 09, 2016 at 11:33:37AM +0000, Torsten Bögershausen wrote:
> 
> > > The second one seems plausible, given the history of issues with
> > > changing CRLF settings for an existing checkout. I'm not sure if it
> > > would be feasible to reset the index completely before each tested
> > > command, but that would probably solve it.
> > The content of the file has been changed (we appended the letter 'Z' to it,
> > so even if mtime is the same, st.st_size should differ.
> > And it seems as if the commit is triggered, see below.
> 
> I don't think I made myself clear. It's not a question of whether there
> is something to commit. It's that when git asks the index "what is the
> sha1 of the content at this path?", the index may be able to answer
> directly (the file is up-to-date, so we return the index value), or it
> may have to go to the filesystem and read the file content. It is this
> latter which triggers convert_to_git(), which is what generates the
> message in question.
> 
> For a more stripped-down example, try:
> 
>   git add foo
>   git commit -m msg
> 
> versus:
> 
>   git add foo
>   sleep 1
>   git commit -m msg
> 
> In the latter case, we should not generally need convert_to_git() in the
> "commit" step. It was already done by "git add", and we reuse the cached
> result.
> 
> Whereas in the first one, we may run into the racy-index problem and
> have to re-read the file to be on the safe side.
> 
> -Peff

Thanks for the explanation, so there are 2 chances for a race.

I assume that the suggested "touch" will fix race#2 in most cases.
In my understanding, the change of the file size will be more reliable:
 

diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
index 2860d2d..9933a9b 100755
--- a/t/t0027-auto-crlf.sh
+++ b/t/t0027-auto-crlf.sh
@@ -120,6 +120,7 @@ commit_chk_wrnNNO () {
cp $f $fname &&
printf Z >>"$fname" &&
git -c core.autocrlf=$crlf add $fname 2>/dev/null &&
+   printf Z >>"$fname" &&
git -c core.autocrlf=$crlf commit -m "commit_$fname" $fname 
>"${pfx}_$f.err" 2>&1
done
---
Does anybody agree ?
And, by the way, the convert warning may be issued twice, once in
"git add" and once in "git commit".
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: t0027 racy?

2016-08-09 Thread Torsten Bögershausen
On Tue, Aug 09, 2016 at 02:51:10AM -0400, Jeff King wrote:
> On Mon, Aug 08, 2016 at 08:32:24PM +0000, Torsten Bögershausen wrote:
> 
> > > The verbose output is not very exciting, though:
> > > 
> > >   expecting success: 
> > >   check_warning "$lfwarn" ${pfx}_LF.err
> > > 
> > >   --- NNO_attr_auto_aeol_crlf_false_LF.err.expect 2016-08-08 
> > > 15:26:37.061701392 +
> > >   +++ NNO_attr_auto_aeol_crlf_false_LF.err.actual 2016-08-08 
> > > 15:26:37.061701392 +
> > >   @@ -1 +0,0 @@
> > >   -warning: LF will be replaced by CRLF
> > >   not ok 114 - commit NNO files crlf=false attr=auto LF
> > [...]
> > The warning is missing, but should be there:
> > 
> > The file has LF, and after commit and a new checkout these LF will
> > be convertet into CRLF.
> > 
> > So why isn't the warning there (but here on my oldish machines)
> 
> To be clear, the warning _is_ there when I just run t0027 by itself, and
> the test passes.  It's only under heavy load that it isn't. So it's a
> race condition either in the test script or in git itself.
> 
> Usually race conditions like these are due to one of:
> 
>   - git dying from SIGPIPE before it has a chance to output the command.
> But I don't see any pipes being used in the test script.
> 
>   - index raciness causing us to avoid reading file content. For
> example, if you do:
> 
>   echo foo >bar
>   git add bar
> 
> Then _usually_ "bar" and the index will have the same mtime. And
> therefore subsequent commands that need to refresh the index will
> re-read the content of "bar", because they cannot tell from the stat
> information if we have the latest version of "bar" in the index or
> not (it could have been written after the index update, but in the
> same second).
> 
> But on a slow or heavily loaded system (or if you simply get unlucky
> in crossing the boundary to a new second), they'll have different
> mtimes. And therefore git knows it can skip reading the content from
> the filesystem.
> 
> So if your test relies on git actually re-converting the file
> content, it would sometimes randomly fail.
> 
> The second one seems plausible, given the history of issues with
> changing CRLF settings for an existing checkout. I'm not sure if it
> would be feasible to reset the index completely before each tested
> command, but that would probably solve it.
The content of the file has been changed (we appended the letter 'Z' to it,
so even if mtime is the same, st.st_size should differ.
And it seems as if the commit is triggered, see below.
> 

(I got the stress script working; no I can reproduce it nicely)

Is it possible to vote for a 3rd option ?
I added some more warnings, to have always some output in stderr. 

This is the good case, this test case passed:
cat NNO_attr_auto_aeol_crlf_true_LF.err 

warning: check_safe_crlf in NNO_attr_auto_aeol_crlf_true_LF.txt 2
warning: LF was here in NNO_attr_auto_aeol_crlf_true_LF.txt..
warning: LF will be replaced by CRLF in NNO_attr_auto_aeol_crlf_true_LF.txt.
The file will have its original line endings in your working directory.
warning: check_safe_crlf in NNO_attr__aeol_crlf_true_CRLF.txt 0
warning: check_safe_crlf in NNO_attr__aeol_crlf_true_CRLF_mix_LF.txt 0
warning: check_safe_crlf in NNO_attr__aeol_crlf_true_LF.txt 0
warning: check_safe_crlf in NNO_attr__aeol_lf_true_CRLF.txt 0
warning: check_safe_crlf in NNO_attr__aeol_lf_true_CRLF_mix_LF.txt 0
warning: check_safe_crlf in NNO_attr__aeol_lf_true_LF.txt 0
warning: check_safe_crlf in NNO_attr_auto_aeol_lf_true_LF.txt 0
[master 2decee0] commit_NNO_attr_auto_aeol_crlf_true_LF.txt
 Author: A U Thor 
 1 file changed, 2 insertions(+), 2 deletions(-)
---
And this one failed, the same code base, but a different file:
cat commit_NNO_attr_auto_aeol_crlf_input_LF.

[master ce2045a] commit_NNO_attr_auto_aeol_crlf_input_LF.txt
 Author: A U Thor 
 1 file changed, 2 insertions(+), 2 deletions(-)

---
Both had been generated with a patched convert.c:
git diff convert.c
index 67d69b5..ae64a58 100644
--- a/convert.c
+++ b/convert.c
@@ -191,6 +191,7 @@ static enum eol output_eol(enum crlf_action crlf_action)
 static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
 struct text_stat *stats, enum safe_crlf checksafe)
 {
+   warning("check_safe_crlf in %s %d", path, (int)checksafe);
if (!checksafe)
return;
 
@@ -210,6 +211,7 @@ static void check_safe_crlf(const char *path, enum 
crlf_action crlf_action,
 * CRLFs would 

Re: t0027 racy?

2016-08-08 Thread Torsten Bögershausen
On Mon, Aug 08, 2016 at 11:29:26AM -0400, Jeff King wrote:
> On Mon, Aug 08, 2016 at 05:05:07PM +0200, Johannes Schindelin wrote:
> 
> > I remember that you did a ton of work on t0027. Now I see problems, and
> > not only that the entire script now takes a whopping 4 minutes 20 seconds
> > to run on my high-end Windows machine.
> > 
> > It appears that t0027 fails randomly for me, in seemingly random places.
> > Sometimes all 1388 cases pass. Sometimes "29 - commit NNO files crlf=true
> > attr=auto LF" fails. Sometimes it is "24 - commit NNO files crlf=false
> > attr=auto LF". Sometimes it is "114 - commit NNO files crlf=false
> > attr=auto LF", and sometimes "111 - commit NNO files attr=auto aeol=lf
> > crlf=false CRLF_mix_LF".
> > 
> > When I run it with -i -v -x --tee, it passes every single time (taking
> > over 5 minutes, just to make things worse)...
> > 
> > Any idea about any possible races?
> 
> Try:
> 
>   https://github.com/peff/git/blob/meta/stress
> 
> which you can run as "sh /path/to/stress t0027" in the top-level of your
> git repository. I got failure within about 30 seconds on t0027 (though 5
> minutes? Yeesh. It runs in 9s on my laptop. I weep for you).
> 
> The verbose output is not very exciting, though:
> 
>   expecting success: 
>   check_warning "$lfwarn" ${pfx}_LF.err
> 
>   --- NNO_attr_auto_aeol_crlf_false_LF.err.expect 2016-08-08 
> 15:26:37.061701392 +
>   +++ NNO_attr_auto_aeol_crlf_false_LF.err.actual 2016-08-08 
> 15:26:37.061701392 +
>   @@ -1 +0,0 @@
>   -warning: LF will be replaced by CRLF
>   not ok 114 - commit NNO files crlf=false attr=auto LF

(I realized that t0027 is not yet self-explaining, I have it on my list)

NNO_attr_auto_aeol_crlf_false_LF means:

NNO:   "Not NOrmalized" A file had been commited with CRLF in the repo
attr_auto: .gitattributes has "* text=auto"
aeol_eol   .gitattributes has "eol=crlf"
crlf_false git config core.autocrlf = false
LF We commit a file with LF line endings.

This should happend:
- The file is commited "as is", with LF line endings.
- While commiting, git should print the warning
- "warning: LF will be replaced by CRLF" to stderr
- stderr is piped (redirected) into NNO_attr_auto_aeol_crlf_false_LF.err
- we grep for "will be replaced by" in xx.err and pipe it into xx.err.actual

The rest is test_cmp, this is what you see.
The warning is missing, but should be there:

The file has LF, and after commit and a new checkout these LF will
be convertet into CRLF.

So why isn't the warning there (but here on my oldish machines)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: t0027 racy?

2016-08-08 Thread Torsten Bögershausen
On 2016-08-08 17.05, Johannes Schindelin wrote:
> Hi Torsten,
> 
> I remember that you did a ton of work on t0027. Now I see problems, and
> not only that the entire script now takes a whopping 4 minutes 20 seconds
> to run on my high-end Windows machine.
> 
> It appears that t0027 fails randomly for me, in seemingly random places.
> Sometimes all 1388 cases pass. Sometimes "29 - commit NNO files crlf=true
> attr=auto LF" fails. Sometimes it is "24 - commit NNO files crlf=false
> attr=auto LF". Sometimes it is "114 - commit NNO files crlf=false
> attr=auto LF", and sometimes "111 - commit NNO files attr=auto aeol=lf
> crlf=false CRLF_mix_LF".
> 
> When I run it with -i -v -x --tee, it passes every single time (taking
> over 5 minutes, just to make things worse)...
> 
> Any idea about any possible races?
Just to double-check: I assume that you have this
commit ded2444ad8ab8128cae2b91b8efa57ea2dd8c7a5
Author: Torsten Bögershausen 
Date:   Mon Apr 25 18:56:27 2016 +0200

t0027: make commit_chk_wrnNNO() reliable
in your tree ?


Is there a special pattern ?
Did you
a) Update the machine ?
b) Update Git code base ?
or both ?
(Yes, I know that this may be stupid questions)

Is it only the NNO tests that fail ?
Did they ever pass ?
(I think I run them some time ago on a Virtual machine running Windows 7)

I see only "commit NNO files" in you report, they belong to
check_warning(), which is called around line 126 in t0027.


check_warning() does a grep on a file which has been re-directed from stderr.

How reproducible is the problem ?
If you add
exit 0
After the last "commit_chk_wrnNNO" line (line 418),
does
ls -l crlf*.err
give you any hint ?




--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (Aug 2016, #02; Thu, 4)

2016-08-07 Thread Torsten Bögershausen
On 2016-08-05 01.32, Junio C Hamano wrote:
> Mike Hommey  writes:
[]

>> What kind of support are you expecting?
> 
> The only rationale I recall you justifying this series was that this
> makes the resulting code easier to read, but I do not recall other
> people agreeing with you, and I do not particularly see the
> resulting code easier to follow.
> 
If that helps:
I can offer to make a code review,
and come back at the end of the week or so.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 11/12] convert: add filter..process option

2016-08-06 Thread Torsten Bögershausen
On 2016-08-06 00.06, Junio C Hamano wrote:
> Torsten Bögershausen  writes:
>
>> On 2016-08-03 18.42, larsxschnei...@gmail.com wrote:
>>> The filter is expected to respond with the result content in zero
>>> or more pkt-line packets and a flush packet at the end. Finally, a
>>> "result=success" packet is expected if everything went well.
>>> 
>>> packet:  git< SMUDGED_CONTENT
>>> packet:  git< 
>>> packet:  git< result=success\n
>>> 
>> I would really send the diagnostics/return codes before the content.
> I smell the assumption "by the time the filter starts output, it
> must have finished everything and knows both size and the status".
>
> I'd prefer to have a protocol that allows us to do streaming I/O on
> both ends when possible, even if the initial version of the filters
> (and the code that sits on the Git side) hold everything in-core
> before starting to talk.
>
>>> If the result content is empty then the filter is expected to respond
>>> only with a flush packet and a "result=success" packet.
>> ...
>> Which may be:
>>
>> packet:  git< result=success\n
>> packet:  git< SMUDGED_CONTENT
>> packet:  git< 
>>
>> or for an empty file:
>>
>> packet:  git< result=success\n
>> packet:  git< SMUDGED_CONTENT
>> packet:  git< 
> The above two look the same to me.
Copy-paste error.
i see that we need a status after the complete transfer,
and after some thinking I would like to take back my comment.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 11/12] convert: add filter..process option

2016-08-05 Thread Torsten Bögershausen
On 2016-08-03 18.42, larsxschnei...@gmail.com wrote:
> The filter is expected to respond with the result content in zero
> or more pkt-line packets and a flush packet at the end. Finally, a
> "result=success" packet is expected if everything went well.
> 
> packet:  git< SMUDGED_CONTENT
> packet:  git< 
> packet:  git< result=success\n
> 
I would really send the diagnostics/return codes before the content.

> If the result content is empty then the filter is expected to respond
> only with a flush packet and a "result=success" packet.
> 
> packet:  git< 
> packet:  git< result=success\n
> 

Which may be:

packet:  git< result=success\n
packet:  git< SMUDGED_CONTENT
packet:  git< 

or for an empty file:

packet:  git< result=success\n
packet:  git< SMUDGED_CONTENT
packet:  git< 


or in case of an error:
packet:  git< result=reject\n
# And this will not send the "" packet

Does this makes sense ?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 07/12] run-command: add clean_on_exit_handler

2016-08-05 Thread Torsten Bögershausen
On 2016-08-05 15.08, Lars Schneider wrote:

[]
> Yeah it could do that. But then the filter cannot do things like
> modifying the index after the fact... however, that might be considered
> nasty by the Git community anyways... I am thinking about dropping
> this patch in the next roll as it is not strictly necessary for my
> current use case.
(Thanks Peff for helping me out with the EOF explanation)

I would say that a filter is a filter, and should do nothing else than filtering
one file,
(or a stream).
When you want to modify the index, a hook may be your friend.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Making file permissions match

2016-08-03 Thread Torsten Bögershausen
On Wed, Aug 03, 2016 at 09:46:06AM -0400, jonsm...@gmail.com wrote:
> I'm working with some Windows programmers that don't believe in file
> permissions They keep sending me zip files of their source tree.  I
> have my copy of the tree in git on Linux with all of the correct file
> permissions.
> 
> So I unzip the archive they send me and to see what they changed. I
> unzip it on top of my git tree. But now all of the file permissions
> don't match. The code deltas are there but there is also all of this
> noise from the file permissions.
> 
> Is there an easy way to deal with this? I want to keep the code deltas
> and ignore the permission changes. Since there are about 100K files
> this is too much to do manually.

(I'm not sure if I understand "match" correctly)

You can ignore changes in the executable permissions:
git config core.filemode false

Please let us know if it works
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 03/10] pkt-line: add packet_flush_gentle()

2016-08-02 Thread Torsten Bögershausen
On Sun, Jul 31, 2016 at 11:45:08PM +0200, Lars Schneider wrote:
> 
> > On 31 Jul 2016, at 22:36, Torstem Bögershausen  wrote:
> > 
> > 
> > 
> >> Am 29.07.2016 um 20:37 schrieb larsxschnei...@gmail.com:
> >> 
> >> From: Lars Schneider 
> >> 
> >> packet_flush() would die in case of a write error even though for some 
> >> callers
> >> an error would be acceptable.
> > What happens if there is a write error ?
> > Basically the protocol is out of synch.
> > Lenght information is mixed up with payload, or the other way
> > around.
> > It may be, that the consequences of a write error are acceptable,
> > because a filter is allowed to fail.
> > What is not acceptable is a "broken" protocol.
> > The consequence schould be to close the fd and tear down all
> > resources. connected to it.
> > In our case to terminate the external filter daemon in some way,
> > and to never use this instance again.
> 
> Correct! That is exactly what is happening in kill_protocol2_filter()
> here:

Wait a second.
Is kill the same as shutdown ?
I would expect that
The process terminates itself as soon as it detects EOF.
As there is nothing more read.

Then the next question: The combination of kill & protocol in kill_protocol(),
what does it mean ?
Is it more like a graceful shutdown_protocol() ?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 5/5] convert: add filter..process option

2016-07-28 Thread Torsten Bögershausen


On 07/27/2016 02:06 AM, larsxschnei...@gmail.com wrote:
Some comments inline
[]
> The filter is expected to respond with the result content size as
> ASCII number in bytes and the result content in packet format with
> a flush packet at the end:
> 
> packet:  git< 57
> packet:  git< SMUDGED_CONTENT
> packet:  git< 
how does the filter report possible errors here ?
Let's say I want to convert UTF-8 into ISO-8859-1,
but the conversion is impossible.

 > packet:  git< -1
 > packet:  git< Error message, (or empty), followed by a '\n'
 > packet:  git< 

Side note: a filter may return length 0.
Suggestion: add 2 test cases.


> 
> Please note: In a future version of Git the capability "stream"
> might be supported. In that case the content size must not be
> part of the filter response.
>
> Afterwards the filter is expected to wait for the next command.
>
> Signed-off-by: Lars Schneider 
> Helped-by: Martin-Louis Bright 
> Signed-off-by: Lars Schneider 
> ---
>  Documentation/gitattributes.txt |  54 +++-
>  convert.c   | 269 
++--

>  t/t0021-conversion.sh   | 175 ++
>  t/t0021/rot13-filter.pl | 146 ++
>  4 files changed, 631 insertions(+), 13 deletions(-)
>  create mode 100755 t/t0021/rot13-filter.pl
>
> diff --git a/Documentation/gitattributes.txt 
b/Documentation/gitattributes.txt

> index 8882a3e..8fb40d2 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -300,7 +300,11 @@ checkout, when the `smudge` command is 
specified, the command is

>  fed the blob object from its standard input, and its standard
>  output is used to update the worktree file.  Similarly, the
>  `clean` command is used to convert the contents of worktree file
> -upon checkin.
> +upon checkin. By default these commands process only a single
> +blob and terminate. If a long running filter process (see section
> +below) is used then Git can process all blobs with a single filter
> +invocation for the entire life of a single Git command (e.g.
> +`git add .`).
>
>  One use of the content filtering is to massage the content into a shape
>  that is more convenient for the platform, filesystem, and the user 
to use.

> @@ -375,6 +379,54 @@ substitution.  For example:
>  
>
>
> +Long Running Filter Process
> +^^^
> +
> +If the filter command (string value) is defined via
> +filter..process then Git can process all blobs with a
> +single filter invocation for the entire life of a single Git
> +command by talking with the following packet format (pkt-line)
> +based protocol over standard input and standard output.
> +
> +Git starts the filter on first usage and expects a welcome
> +message, protocol version number, and filter capabilities
> +seperated by spaces:
> +
> +packet:  git< git-filter-protocol
> +packet:  git< version 2
> +packet:  git< clean smudge
> +
> +Supported filter capabilities are "clean" and "smudge".
> +
> +Afterwards Git sends a command (e.g. "smudge" or "clean" - based
> +on the supported capabilities), the filename, the content size as
> +ASCII number in bytes, and the content in packet format with a
> +flush packet at the end:
> +
> +packet:  git> smudge
> +packet:  git> testfile.dat
> +packet:  git> 7
> +packet:  git> CONTENT
> +packet:  git> 
> +
> +
> +The filter is expected to respond with the result content size as
> +ASCII number in bytes and the result content in packet format with
> +a flush packet at the end:
> +
> +packet:  git< 57
> +packet:  git< SMUDGED_CONTENT
> +packet:  git< 
> +
> +Please note: In a future version of Git the capability "stream"
> +might be supported. In that case the content size must not be
> +part of the filter response.
> +
> +Afterwards the filter is expected to wait for the next command.
> +A demo implementation can be found in `t/t0021/rot13-filter.pl`
> +located in the Git core repository.
> +
> +
>  Interaction between checkin/checkout attributes
>  ^^^
>
> diff --git a/convert.c b/convert.c
> index 522e2c5..5ff200b 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -3,6 +3,7 @@
>  #include "run-command.h"
>  #include "quote.h"
>  #include "sigchain.h"
> +#include "pkt-line.h"
>
>  /*
>   * convert.c - convert a file when checking it out and checking it in.
> @@ -481,11 +482,232 @@ static int apply_filter(const char *path, 
const char *src, size_t len, int fd,

>return ret;
>  }
>
> +static off_t multi_packet_read(struct strbuf *sb, const int fd, 
const size_t size)

> +{
> +  off_t bytes_read;

Re: [PATCH v2 4/5] convert: generate large test files only once

2016-07-26 Thread Torsten Bögershausen



On 07/27/2016 02:06 AM, larsxschnei...@gmail.com wrote:

From: Lars Schneider 

Generate a more interesting large test file with random characters in
between and reuse this test file in multiple tests. Run tests formerly
marked as EXPENSIVE every time but with a smaller test file.

Signed-off-by: Lars Schneider 
---
 t/t0021-conversion.sh | 35 +--
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 7b45136..b9911a4 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -4,6 +4,13 @@ test_description='blob conversion via gitattributes'

 . ./test-lib.sh

+if test_have_prereq EXPENSIVE
+then
+   T0021_LARGE_FILE_SIZE=2048
+else
+   T0021_LARGE_FILE_SIZE=30
+fi
+
 cat test.i &&
git add test test.t test.i &&
rm -f test test.t test.i &&
-   git checkout -- test test.t test.i
+   git checkout -- test test.t test.i &&
+
+   mkdir -p generated-test-data &&
+   for i in $(test_seq 1 $T0021_LARGE_FILE_SIZE)
+   do
+   # Generate 1MB of empty data and 100 bytes of random characters
+   printf "%1048576d" 1
+   printf "$(LC_ALL=C tr -dc "A-Za-z0-9" >8)) count=1 2>/dev/null)"

I'm not sure how portable /dev/urandom is.
The other thing, that "really random" numbers are an overkill, and
it may be easier to use pre-defined numbers,

The rest of 1..4 looks good, I will look at 5/5 later.


+   done >generated-test-data/large.file
 '

 script='s/^\$Id: \([0-9a-f]*\) \$/\1/p'
@@ -199,9 +214,9 @@ test_expect_success 'required filter clean failure' '
 test_expect_success 'filtering large input to small output should use little 
memory' '
test_config filter.devnull.clean "cat >/dev/null" &&
test_config filter.devnull.required true &&
-   for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
-   echo "30MB filter=devnull" >.gitattributes &&
-   GIT_MMAP_LIMIT=1m GIT_ALLOC_LIMIT=1m git add 30MB
+   cp generated-test-data/large.file large.file &&
+   echo "large.file filter=devnull" >.gitattributes &&
+   GIT_MMAP_LIMIT=1m GIT_ALLOC_LIMIT=1m git add large.file
 '

 test_expect_success 'filter that does not read is fine' '
@@ -214,15 +229,15 @@ test_expect_success 'filter that does not read is fine' '
test_cmp expect actual
 '

-test_expect_success EXPENSIVE 'filter large file' '
+test_expect_success 'filter large file' '
test_config filter.largefile.smudge cat &&
test_config filter.largefile.clean cat &&
-   for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
-   echo "2GB filter=largefile" >.gitattributes &&
-   git add 2GB 2>err &&
+   echo "large.file filter=largefile" >.gitattributes &&
+   cp generated-test-data/large.file large.file &&
+   git add large.file 2>err &&
test_must_be_empty err &&
-   rm -f 2GB &&
-   git checkout -- 2GB 2>err &&
+   rm -f large.file &&
+   git checkout -- large.file 2>err &&
test_must_be_empty err
 '



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v10 12/12] bisect--helper: `get_terms` & `bisect_terms` shell function in C

2016-07-25 Thread Torsten Bögershausen



On 07/25/2016 06:53 PM, Junio C Hamano wrote:

Pranit Bauva  writes:


>>> +enum terms_defined {
>>> +   TERM_BAD = 1,
>>> +   TERM_GOOD = 2,
>>> +   TERM_NEW = 4,
>>> +   TERM_OLD = 8
>>> +};
>>> +

>>
>> What does TERM stand for  ?

The terms (words) used to denote the newer and the older parts of
the history.  Traditionally, as a regression-hunting tool (i.e. it
used to work, where did I break it?), we called older parts of the
history "good" and newer one "bad", but as people gained experience
with the tool, it was found that the pair of words was error-prone
to use for an opposite use case "I do not recall fixing it, but it
seems to have been fixed magically, when did that happen?", and a
more explicit "new" and "old" were introduced.


Thanks for the explanation.
Is there any risk that a more generic term like "TERM_BAD" may collide
with some other definition some day ?

Would it make sense to call it GIT_BISECT_TERM_BAD, GBS_TERM_BAD,
BIS_TERM_BAD or something more unique ?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 3/3] convert: add filter..useProtocol option

2016-07-22 Thread Torsten Bögershausen



On 07/22/2016 05:49 PM, larsxschnei...@gmail.com wrote:

From: Lars Schneider 

Git's clean/smudge mechanism invokes an external filter process for every
single blob that is affected by a filter. If Git filters a lot of blobs
then the startup time of the external filter processes can become a
significant part of the overall Git execution time.

This patch adds the filter..useProtocol option which, if enabled,
keeps the external filter process running and processes all blobs with
the following protocol over stdin/stdout.

1. Git starts the filter on first usage and expects a welcome message
with protocol version number:
Git <-- Filter: "git-filter-protocol\n"
Git <-- Filter: "version 1"

Is there no terminator here ?
How long should the reading side wait without a '\n', or should it read
"version 1\n" ?



2. Git sends the command (either "smudge" or "clean"), the filename, the
content size in bytes, and the content separated by a newline character:
Git --> Filter: "smudge\n"
Git --> Filter: "testfile.dat\n"
Git --> Filter: "7\n"
Git --> Filter: "CONTENT"

3. The filter is expected to answer with the result content size in
bytes and the result content separated by a newline character:
Git <-- Filter: "15\n"
Git <-- Filter: "SMUDGED_CONTENT"

4. The filter is expected to wait for the next file in step 2, again.

Please note that the protocol filters do not support stream processing
with this implemenatation because the filter needs to know the length of

typo

the result in advance. A protocol version 2 could address this in a
future patch.

Signed-off-by: Lars Schneider 
Helped-by: Martin-Louis Bright 
---
 Documentation/gitattributes.txt |  41 +++-
 convert.c   | 210 ++--
 t/t0021-conversion.sh   | 170 
 t/t0021/rot13.pl|  80 +++
 4 files changed, 494 insertions(+), 7 deletions(-)
 create mode 100755 t/t0021/rot13.pl

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 8882a3e..7026d62 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -300,7 +300,10 @@ checkout, when the `smudge` command is specified, the 
command is
 fed the blob object from its standard input, and its standard
 output is used to update the worktree file.  Similarly, the
 `clean` command is used to convert the contents of worktree file
-upon checkin.
+upon checkin. By default these commands process only a single
+blob and terminate. If the setting filter..useProtocol is
+enabled then Git can process all blobs with a single filter command
+invocation (see filter protocol below).

 One use of the content filtering is to massage the content into a shape
 that is more convenient for the platform, filesystem, and the user to use.
@@ -375,6 +378,42 @@ substitution.  For example:
 


+Filter Protocol
+^^^
+
+If the setting filter..useProtocol is enabled then Git
+can process all blobs with a single filter command invocation
+by talking with the following protocol over stdin/stdout.
+
+Git starts the filter on first usage and expects a welcome
+message with protocol version number:
+
+Git < Filter: "git-filter-protocol\n"
+Git < Filter: "version 1"
+
+
+Afterwards Git sends a blob command (either "smudge" or "clean"),
+the filename, the content size in bytes, and the content separated
+by a newline character:
+
+Git > Filter: "smudge\n"
+Git > Filter: "testfile.dat\n"
+Git > Filter: "7\n"
+Git > Filter: "CONTENT"
+
+
+The filter is expected to answer with the result content size in
+bytes and the result content separated by a newline character:
+
+Git < Filter: "15\n"
+Git < Filter: "SMUDGED_CONTENT"
+
+
+Afterwards the filter is expected to wait for the next blob process
+command. A demo implementation can be found in `t/t0021/rot13.pl`
+located in the Git core repository.
+
+
 Interaction between checkin/checkout attributes
 ^^^

diff --git a/convert.c b/convert.c
index 522e2c5..91ce86f 100644
--- a/convert.c
+++ b/convert.c
@@ -481,12 +481,188 @@ static int apply_filter(const char *path, const char 
*src, size_t len, int fd,
return ret;
 }

+static int cmd_process_map_init = 0;
+static struct hashmap cmd_process_map;
+
+struct cmd2process {
+   struct hashmap_entry ent; /* must be the first member! */
+   const char *cmd;
+   long protocol;
+   struct child_process process;
+};
+
+static int cmd2process_cmp(const struct cmd2process *e1,
+   const struct 
cmd2process *e2,
+   const void *unused)
+{
+   return

Re: [PATCH v10 12/12] bisect--helper: `get_terms` & `bisect_terms` shell function in C

2016-07-21 Thread Torsten Bögershausen



On 07/20/2016 11:47 PM, Pranit Bauva wrote:

Reimplement the `get_terms` and `bisect_terms` shell function in C and
add `bisect-terms` subcommand to `git bisect--helper` to call it from
git-bisect.sh .

In the shell version, the terms were identified by strings but in C
version its done by bit manipulation and passing the integer value to
the function.

Using `--bisect-terms` subcommand is a temporary measure to port shell
function in C so as to use the existing test suite. As more functions
are ported, this subcommand will be retired and will be called by some
other methods.

Mentored-by: Lars Schneider 
Mentored-by: Christian Couder 
Signed-off-by: Pranit Bauva 
---
 builtin/bisect--helper.c | 74 +++-
 git-bisect.sh| 35 ++-
 2 files changed, 75 insertions(+), 34 deletions(-)

diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index 001096a..185a8ad 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -8,6 +8,13 @@
 #include "run-command.h"
 #include "prompt.h"

+enum terms_defined {
+   TERM_BAD = 1,
+   TERM_GOOD = 2,
+   TERM_NEW = 4,
+   TERM_OLD = 8
+};
+

What does TERM stand for  ?
It could be TERMinal, TERMinator or just TERM.
Something like BIS_TERM_DEF_BAD .. may be more intuitive,
and may avoid name clashes in the long run.

And why are the defines 1,2,4,8 ?
It looks as if a #define bitmap may be a better choice here ?
How do we do these kind of bit-wise opions otherwise ?


 static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
 static GIT_PATH_FUNC(git_path_bisect_expected_rev, "BISECT_EXPECTED_REV")
 static GIT_PATH_FUNC(git_path_bisect_ancestors_ok, "BISECT_ANCESTORS_OK")
@@ -26,6 +33,7 @@ static const char * const git_bisect_helper_usage[] = {
N_("git bisect--helper --bisect-write 
[]"),
N_("git bisect--helper --bisect-check-and-set-terms   
"),
N_("git bisect--helper --bisect-next-check []  
term_bad, fp) == EOF ||
+ strbuf_getline(&terms->term_good, fp) == EOF;
+
+   fclose(fp);
+   return res ? -1 : 0;
+}
+
+static int bisect_terms(struct bisect_terms *terms, int term_defined)
+{
+   if (get_terms(terms)) {
+   fprintf(stderr, "no terms defined\n");
+   return -1;
+   }
+   if (!term_defined) {
+   printf("Your current terms are %s for the old state\nand "
+  "%s for the new state.\n", terms->term_good.buf,
+  terms->term_bad.buf);
+   return 0;
+   }
+
+   if (term_defined == TERM_GOOD || term_defined == TERM_OLD)
+   printf("%s\n", terms->term_good.buf);
+   if (term_defined == TERM_BAD || term_defined == TERM_NEW)
+   printf("%s\n", terms->term_bad.buf);

May be a switch-case ?
Or at least "else if" ?


+
+   return 0;
+}
+
 int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
 {
enum {
@@ -369,9 +414,11 @@ int cmd_bisect__helper(int argc, const char **argv, const 
char *prefix)
CHECK_EXPECTED_REVS,
BISECT_WRITE,
CHECK_AND_SET_TERMS,
-   BISECT_NEXT_CHECK
+   BISECT_NEXT_CHECK,
+   BISECT_TERMS
} cmdmode = 0;
int no_checkout = 0, res = 0;
+   enum terms_defined term_defined = 0;
struct option options[] = {
OPT_CMDMODE(0, "next-all", &cmdmode,
 N_("perform 'git bisect next'"), NEXT_ALL),
@@ -389,6 +436,16 @@ int cmd_bisect__helper(int argc, const char **argv, const 
char *prefix)
 N_("check and set terms in a bisection state"), 
CHECK_AND_SET_TERMS),
OPT_CMDMODE(0, "bisect-next-check", &cmdmode,
 N_("check whether bad or good terms exist"), 
BISECT_NEXT_CHECK),
+   OPT_CMDMODE(0, "bisect-terms", &cmdmode,
+N_("print out the bisect terms"), BISECT_TERMS),
+   OPT_BIT(0, "term-bad", &term_defined,
+N_("show the bad term"), TERM_BAD),
+   OPT_BIT(0, "term-good", &term_defined,
+N_("show the good term"), TERM_GOOD),
+   OPT_BIT(0, "term-new", &term_defined,
+N_("show the new term"), TERM_NEW),
+   OPT_BIT(0, "term-old", &term_defined,
+N_("show the old term"), TERM_OLD),
OPT_BOOL(0, "no-checkout", &no_checkout,
 N_("update BISECT_HEAD instead of checking out the current 
commit")),
OPT_END()
@@ -399,6 +456,16 @@ int cmd_bisect__helper(int argc, const char **argv, const 
char *prefix)
argc = parse_options(argc, argv, prefix, options,
 git_bisect_helper_usage, 0);

+   if (cmdmode != BISECT_TERMS && term_defined)
+   die(_("--term-bad, --term-good, --ter

Re: Looking for help to understand external filter driver code

2016-07-19 Thread Torsten Bögershausen



On 07/20/2016 12:01 AM, Lars Schneider wrote:


On 19 Jul 2016, at 23:33, Junio C Hamano  wrote:


Lars Schneider  writes:


Git writes --> 4 byte filename length
Git writes --> filename string


Why limit to 32GB?  Perhaps NUL termination is more appropriate
here?

OK, I will use NUL termination for the filename.
You're also right about the limit - I will use 8 byte to encode the
content length.

Is there any reason to encode the file length in binary format?
With all the discussions about big endianess, little endianess, 4GiB or 
32 GiB.

How about simply writing the length as ASCII ?

Unless we don't want to have a "spare" field for future extensions,
it could be good to add an option field, which may be empty.
On top of that, do we want a field separator different from the line
separator ?

How about this:


 may be "var=value;var2=value2" or simply ""
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers

2016-07-15 Thread Torsten Bögershausen



On 07/15/2016 12:38 AM, Jeff King wrote:

On Thu, Jul 14, 2016 at 03:30:58PM -0700, Junio C Hamano wrote:


If we move to time_t everywhere, I think we'll need an extra
TIME_T_IS_64BIT, but we can cross that bridge when we come to it.

Likewise I think we'll need SIZE_T_IS_64BIT eventually (for real 32-bit
systems; LLP64 systems like Windows will then be able to run the test).


I guess I wrote essentially the same thing before refreshing my
Inbox.

I am a bit fuzzy between off_t and size_t; the former is for the
size of things you see on the filesystem, while the latter is for
you to give malloc(3).  I would have thought that off_t is the type
we would want at the end of the raw object header, denoting the size
of a blob object when deflated, which could be larger than the size
of a region of memory we can get from malloc(3), in which case we
would use the streaming interface.


Yeah, your understanding is right (s/deflated/inflated/). I agree that
off_t is probably a better size for blobs. Traditionally git assumed any
object could fit in memory. The streaming interface helps that somewhat,
but I think there are cases where we still must load a blob (e.g., if it
is stored as a delta). In theory that never happens because of
core.bigfilethreshold, but you may get a packfile from somebody with a
higher threshold than you.

I wouldn't be surprised if there are other cases that aren't smart
enough to use the streaming interface yet, but the solution there is to
make them smarter. :)

So off_t is probably better. We do need to be careful, though, when
allocating objects. E.g., this:

  off_t size;
  struct git_istream *stream;
  void *buf;

  stream = open_istream(sha1, &type, &size, NULL);
  buf = xmalloc(size);
  while (1) {
/* read stream into buf */
  }

is a security hole when size_t is less than off_t (it gets truncated in
the call to xmalloc, which allocates too few bytes). This is a toy
example, obviously, but it's something to watch out for.

-Peff

That code is "illegal", it should be
 buf = xmalloc(xsize_t(size));

And the transition from off_t into size_t
should always got via xsize_t():

static inline size_t xsize_t(off_t len)
{
if (len > (size_t) len)
die("Cannot handle files this big");
return (size_t)len;
}

There are some more things to be done, on the long run:
- convert "unsigned long" into either off_t of size_t in e.g. convert.c
- Use the streaming interface to analyze if blobs are binary
  (That is already on my list, the old "stream and early out"
  from the olc 10/10, gmane/$293010 or so can be reused)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 3/5] archive-tar: write extended headers for file sizes >= 8GB

2016-07-14 Thread Torsten Bögershausen



On 07/14/2016 06:48 PM, Johannes Sixt wrote:

Am 30.06.2016 um 11:09 schrieb Jeff King:

+/*
+ * This is the max value that a ustar size header can specify, as it
is fixed
+ * at 11 octal digits. POSIX specifies that we switch to extended
headers at
+ * this size.
+ */
+#define USTAR_MAX_SIZE 0777UL


This is too large by one bit for our 32-bit unsigned long on Windows:

archive-tar.c: In function 'write_tar_entry':
archive-tar.c:295: warning: integer constant is too large for 'unsigned
long' type
archive-tar.c: In function 'write_global_extended_header':
archive-tar.c:332: warning: integer constant is too large for 'unsigned
long' type
archive-tar.c:335: warning: integer constant is too large for 'unsigned
long' type
archive-tar.c:335: warning: overflow in implicit constant conversion

-- Hannes

Similar problem on 32 Bit Linux:
 In function ‘write_global_extended_header’:
archive-tar.c:29:25: warning: overflow in implicit constant conversion 
[-Woverflow]

 #define USTAR_MAX_MTIME 0777UL
 ^
archive-tar.c:335:16: note: in expansion of macro ‘USTAR_MAX_MTIME’
   args->time = USTAR_MAX_MTIME;

--
I want to volunteer to do more tests on 32 bit Linux ;-)
Does this fix it and look as a good thing to change ?


--- a/archive-tar.c
+++ b/archive-tar.c
@@ -332,7 +332,7 @@ static void write_global_extended_header(struct 
archiver_args *args)

if (args->time > USTAR_MAX_MTIME) {
strbuf_append_ext_header_uint(&ext_header, "mtime",
  args->time);
-   args->time = USTAR_MAX_MTIME;
+   args->time = (time_t)USTAR_MAX_MTIME;
}
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (Jun 2016, #05; Thu, 16)

2016-07-13 Thread Torsten Bögershausen



On 07/13/2016 12:20 AM, Joey Hess wrote:

Torsten Bögershausen wrote re jh/clean-smudge-annex:

The thing is that we need to check the file system to find .gitatttibutes,
even if we just did it 1 nanosecond ago.

So the .gitattributes is done 3 times:
-1 would_convert_to_git_filter_fd(
-2 assert(would_convert_to_git_filter_fd(path));
-3 convert.c/convert_to_git_filter_fd()

The only situation where this could be useful is when the .gitattributes
change between -1 and -2,
but then they would have changed between -1 and -3, and convert.c
will die().

Does it make sense to remove -2 ?


There's less redundant work going on than at first seems, because
.gitattribute files are only read once and cached. Verified by strace.


OK, I think I missed that work (not enough time for Git at the moment)
Junio, please help me out, do we have a cache here now?
I tried to figure out that following your attr branch, but failed.

So, the redundant work is only in the processing that convert_attrs() does
of the cached .gitattributes.

Notice that there was a similar redundant triple call to convert_attrs()
before my patch set:

1. index_fd checks would_convert_to_git_filter_fd
2. index_stream_convert_blob does assert(would_convert_to_git_filter_fd(path))
   (Again redundantly since 1. is its only caller and has already
   checked.)
3. in convert_to_git_filter_fd

If convert_attrs() is somehow expensive, it might be worth passing a
struct conv_attrs * into the functions that currently call
convert_attrs(). But it does not look expensive to me.

I have that on the list, but seems to be uneccesary now.


I think it would be safe enough to remove both asserts, at least as the
code is structured now.


OK.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/3] correct ce_compare_data() in a middle of a merge

2016-07-11 Thread Torsten Bögershausen


How do things look at this point?  This version is what I ended up
queuing in 'pu', but I took your "Thanks" in $gmane/299120 to only
mean "Thanks for feeding some ideas to help me move forward", not
necessarily "Tnanks that looks like the right approach." yet, so
right now both topics are stalled and waiting for an action from
you.

Yes, the code looks good to me.
And the commit message does explain what is going on.

For my taste, these 3 lines don't explain too much,may be remove them ?
> The test update was taken from a series by Torsten Bögershausen
> that attempted to fix this with a different approach (which was a
> lot more intrusive).
So thanks for your efforts, ack from my side.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/3] correct ce_compare_data() in a middle of a merge

2016-07-08 Thread Torsten Bögershausen



On 07/08/2016 06:36 PM, Junio C Hamano wrote:

Torsten Bögershausen  writes:


I dunno.  I really do not like that extra sha1 argument added all
over the callchain by this patch.

Or did you mean other calls to add_cacheinfo()?


I didn't mean too much - the whole call chain touches code where I
am not able to comment on details.
I'm happy to test other implementations, if someone suggests a
path, so to say.


I did a bit of experiment.

When 1/3 alone is applied, and then only changes for t/t6038 from
3/3 is picked, (i.e. we do not add the extra "don't look at index,
check this contents"), your "Merge addition of text=auto eol=CRLF"
test would fail.

And then with this further on top:

diff --git a/merge-recursive.c b/merge-recursive.c
index b880ae5..628c8ed 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -202,6 +202,9 @@ static int add_cacheinfo(unsigned int mode, const unsigned 
char *sha1,
const char *path, int stage, int refresh, int options)
 {
struct cache_entry *ce;
+
+   if (!stage)
+   remove_file_from_cache(path);
ce = make_cache_entry(mode, sha1 ? sha1 : null_sha1, path, stage,
  (refresh ? (CE_MATCH_REFRESH |
  CE_MATCH_IGNORE_MISSING) : 0 ));


Thanks :-)
Did that experiment made it to a branch somewhere ?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/3] correct ce_compare_data() in a middle of a merge

2016-07-08 Thread Torsten Bögershausen



On 07/07/16 20:43, Junio C Hamano wrote:
> Torsten Bögershausen  writes:
>
>> This is the call stack:
>>
>> merge-recursive.c/add_cacheinfo(unsigned int mode, const unsigned 
char *sha1,

>>const char *path, int stage, int refresh, int options)
>> {
>>struct cache_entry *ce;
>>ce = make_cache_entry
>>if (!ce)
>>return error(_("addinfo_cache failed for path '%s'"), path);
>>return add_cache_entry(ce, options);
>> }
>> #Calls
>> read-cache.c/make_cache_entry(path=file sha1=0x99b633 stage=0)
>>
>>
>> struct cache_entry *make_cache_entry(unsigned int mode,
>>const unsigned char *sha1, const char *path, int stage,
>>unsigned int refresh_options)
>> {
>> [snip]
>>ret = refresh_cache_entry(ce, refresh_options);
>>if (ret != ce)
>>free(ce);
>>return ret;
>> }
>>
>> #Calls
>> refresh_cache_ent(&the_index, ce, options, NULL, NULL);
>>
>> #Calls
>> ie_modified()
>>
>> #Calls
>> read-cache.c/ie_match_stat()
>>
>> #Calls
>> changed |= ce_modified_check_fs(ce, st);
>>
>> #Calls
>> ce_compare_data(path=file sha1=0x99b633)
>>
>> #Calls
>> index_fd(..., ..., ce->name, ...)
>> #Note the sha is lost here, the parameter sha is the output.
>>
>> Deep down, has_cr_in_index(path) will look at ad55e2 instead,
>> which is wrong.
> The call to add_cacheinfo() is made with 99b633 to record the 3-way
> merge result to the index in this callchain:
>
>  merge_trees()
>  -> git_merge_trees()
>  -> process_renames() # does nothing for this case
>  -> process_entry()
> -> merge_content()
>-> merge_file_special_markers()
>   -> merge_file_1()
>  -> merge_3way()
> -> ll_merge() # correctly handles the renormalization
>  -> write_sha1_file() # this is what gives us 99b633
> -> update_file() # this is fed the 99b633 write_sha1_file() computed
>-> update_file_flags()
>   -> read_sha1_file() # reads 99b633
>   -> convert_to_working_tree()
>   -> write_in_full() # updates the working tree
>   -> add_cacheinfo() # to record 99b633 at "file" in the index
>
> So refresh() may incorrectly find that 99b633 does not match what is
> in the filesystem if it looks at _any_ entry in the index.  We know
> we have written the file ourselves, so by definition it should match
> ;-)

Does it, always ?
There was a lengthy discussion around January, if
git checkout -f
should always give a clean workspace.
(My suggestion was yes, please), but the conclusion
was to always check the other way around:
What would "git add" say ?
If the smudge/clean don't provide
a round trip, then the worktree should not be clean.

My understanding is, that after merge, the filters must be round trip
(and all other conversions), otherwise the merge willl fail.

In this case the merge fails because of a bug in Git.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/3] correct ce_compare_data() in a middle of a merge

2016-07-08 Thread Torsten Bögershausen



On 07/07/16 20:43, Junio C Hamano wrote:

Torsten Bögershausen  writes:


This is the callstack:

merge-recursive.c/add_cacheinfo(unsigned int mode, const unsigned char *sha1,
const char *path, int stage, int refresh, int options)
{
struct cache_entry *ce;
ce = make_cache_entry
if (!ce)
return error(_("addinfo_cache failed for path '%s'"), path);
return add_cache_entry(ce, options);
}
#Calls
read-cache.c/make_cache_entry(path=file sha1=0x99b633 stage=0)


struct cache_entry *make_cache_entry(unsigned int mode,
const unsigned char *sha1, const char *path, int stage,
unsigned int refresh_options)
{
 [snip]
ret = refresh_cache_entry(ce, refresh_options);
if (ret != ce)
free(ce);
return ret;
}

#Calls
refresh_cache_ent(&the_index, ce, options, NULL, NULL);

#Calls
ie_modified()

#Calls
read-cache.c/ie_match_stat()

#Calls
changed |= ce_modified_check_fs(ce, st);

#Calls
ce_compare_data(path=file sha1=0x99b633)

#Calls
index_fd(..., ..., ce->name, ...)
#Note the sha is lost here, the parameter sha is the output.

Deep down, has_cr_in_index(path) will look at ad55e2 instead,
which is wrong.

The call to add_cacheinfo() is made with 99b633 to record the 3-way
merge result to the index in this callchain:

  merge_trees()
  -> git_merge_trees()
  -> process_renames() # does nothing for this case
  -> process_entry()
 -> merge_content()
-> merge_file_special_markers()
   -> merge_file_1()
  -> merge_3way()
 -> ll_merge() # correctly handles the renormalization
  -> write_sha1_file() # this is what gives us 99b633
 -> update_file() # this is fed the 99b633 write_sha1_file() computed
-> update_file_flags()
   -> read_sha1_file() # reads 99b633
   -> convert_to_working_tree()
   -> write_in_full() # updates the working tree
   -> add_cacheinfo() # to record 99b633 at "file" in the index

So refresh() may incorrectly find that 99b633 does not match what is
in the filesystem if it looks at _any_ entry in the index.  We know
we have written the file ourselves, so by definition it should match
;-) Even though I am not sure why that should affect the overall
correctness of the program (because we have written the correct
result to both working tree and to the index), this needs to be
fixed.

I am however not convinced passing the full SHA-1 is a good
solution.

It seems at least that this is the correct way to do it.

The make_cache_entry() function may be creating a cache
entry to stuff in a non-default index (i.e. not "the_index"), but
its caller does not have a way to tell it which index the resulting
cache entry will go, and calls refresh_cache_entry(), which always
assumes that the caller is interested in "the_index", so what
add_cacheinfo() does by calling make_cache_entry() feels wrong in
the first place.  Also, the call from update_file_flags() knows that
the working tree is in sync with the resulting cache entry.

Perhaps update_file_flags() should not even call add_cacheinfo()
there in the first place?  I wonder if it should just call
add_file_to_index()---after all, even though the current code
already knows that 99b633 _is_ the result it wants in the index, it
still goes to the filesystem and rehashes.

I dunno.  I really do not like that extra sha1 argument added all
over the callchain by this patch.

Or did you mean other calls to add_cacheinfo()?

I didn't mean too much - the whole call chain touches code where I am not able 
to
comment on details.
I'm happy to test other implementations, if someone suggests a path, so to say.




--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/3] correct ce_compare_data() in a middle of a merge

2016-07-07 Thread Torsten Bögershausen
On 2016-07-06 16.57, Junio C Hamano wrote:
> Torsten Bögershausen  writes:
> 
>> At some point inside the merge, Git calls read-cache.c/ce_compare_data(),
>>   to check if the path named "file" is clean.
>>   According to the tree information, the path "file" has the sha1 99b633.
>>   #Note:
>>   #sha1 99b633 is "first line\nsame line\n"
> 
> I thought your scenario was that our side had CRLF both in the blob
> and in the working tree.  So this is a different example?  Our side
> has LF in the blob that is checked out with CRLF in the working tree,
> and their side has CRLF in the blob?
> 
That was probably to confuse myself, and the rest of the world,
sorry for confusion.

This is the callstack:

merge-recursive.c/add_cacheinfo(unsigned int mode, const unsigned char *sha1,
const char *path, int stage, int refresh, int options)
{
struct cache_entry *ce;
ce = make_cache_entry
if (!ce)
return error(_("addinfo_cache failed for path '%s'"), path);
return add_cache_entry(ce, options);
}
#Calls
read-cache.c/make_cache_entry(path=file sha1=0x99b633 stage=0)


struct cache_entry *make_cache_entry(unsigned int mode,
const unsigned char *sha1, const char *path, int stage,
unsigned int refresh_options)
{
[snip]
ret = refresh_cache_entry(ce, refresh_options);
if (ret != ce)
free(ce);
return ret;
}

#Calls
refresh_cache_ent(&the_index, ce, options, NULL, NULL);

#Calls
ie_modified()

#Calls
read-cache.c/ie_match_stat()

#Calls
changed |= ce_modified_check_fs(ce, st);

#Calls
ce_compare_data(path=file sha1=0x99b633)

#Calls
index_fd(..., ..., ce->name, ...)
#Note the sha is lost here, the parameter sha is the output.

Deep down, has_cr_in_index(path) will look at ad55e2 instead,
which is wrong.


>> ce_compare_data() starts the work:
>> OK, lets run index_fd(...,ce->name,...)
>> # index_fd will simulate a "git add"  and return the sha1 (via the sha1 
>> pointer)
>> # after the hashing.
>>
>> # ce_compare_data() will later compare ce->sha1 with the result stored in
>> # the local sha1. That's why a sha1 is in the parameter list.
>> # To return the resulting hash:
>>
>> ce_compare_data() calls index_fd(sha1, ..., ce->name, ...)
>>
>> #Down in index_fd():
>>
>> sha1_file.c/index_fd() calls index_core() (after consulting the attributes)
>> index_core() reads the file from the working tree into memory using
>> read_in_full(), and after that index_mem() calls hash_sha1_file(buf)
>> to calculate the sha1.
>>
>> Before the hashing is done, index_mem() runs
>> convert_to_git(path, buf, size, &nbuf, SAFE_CRLF_FALSE)
>> to convert  "blobs to git internal format".
>>
>>
>> Here, convert_to_git() consults the .gitattributes (again) to find out that
>> crlf_action is CRLF_AUTO in our case.
>> The "new safer autocrlf handling" says that if a blob as any CR, don't 
>> convert
>> CRLFs at "git add".
>>
>> convert.c/crlf_to_git() starts to do it's job:
>> - look at buf (It has CRLF, conversion may be needed)
>> - consult blob_has_cr()
>>   # Note: Before this patch, has_cr_in_index(path)) was used
>>
>> #Again, before this patch,
>> has_cr_in_index(path) calls read_blob_data_from_cache(path, &sz) to read the
>> blob into memory.
>>
>> Now read_blob_data_from_cache() is a macro, and we end up in
>> read_blob_data_from_index(&the_index, (path), (sz))
>>
>> read-cache.c/read_blob_data_from_index() starts its work:
>>  pos = index_name_pos(istate, path, len);
>>  if (pos < 0) {
>>  /*
>>   * We might be in the middle of a merge, in which
>>   * case we would read stage #2 (ours).
>>   */
>>
>> # And here, and this is important to notice, "ours" is sha1 ad55e2,
>> # which corresponds to "first line\r\nsame line\r\n"
> 
> Where did 99b633 come from then?  There still is something missing
> in this description.
> 
> Puzzled...
This is an unfinished attempt for a commit message:
--
correct ce_compare_data() at the end of a merge

The following didn't work as expected:

 - At the end of a merge
 - merge.renormalize is true,
 - .gitattributes = "* text=auto"
 - core.eol = crlf

Merge a blob with CRLF "first line\r\nsame line\r\n" and a blob
with LF "first line\nsame line\n".

The expected result of the merge is "first line\nsame line

Re: [PATCH v3 3/3] correct ce_compare_data() in a middle of a merge

2016-07-02 Thread Torsten Bögershausen

On 2016-07-02 00.11, Junio C Hamano wrote:
[snip]

diff --git a/read-cache.c b/read-cache.c
index a3ef967..c109b6d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,7 +163,9 @@ static int ce_compare_data(const struct cache_entry *ce, 
struct stat *st)

if (fd >= 0) {
unsigned char sha1[20];
-   if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
+   unsigned flags = HASH_USE_SHA_NOT_PATH;
+   memcpy(sha1, ce->sha1, sizeof(sha1));
+   if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
match = hashcmp(sha1, ce->sha1);
/* index_fd() closed the file descriptor already */
}

I do not quite see how "Don't use ce->sha1, but use sha1 given to
you" flag affects this call, when sha1 and ce->sha1 are the same due
to memcpy() just before the call?



Lets step back a second. and try to explain whats going on.

Do a merge (and renormalize the files).

At some point inside the merge, Git calls read-cache.c/ce_compare_data(),
  to check if the path named "file" is clean.
  According to the tree information, the path "file" has the sha1 99b633.
  #Note:
  #sha1 99b633 is "first line\nsame line\n"

ce_compare_data() starts the work:
OK, lets run index_fd(...,ce->name,...)
# index_fd will simulate a "git add"  and return the sha1 (via the sha1 pointer)
# after the hashing.

# ce_compare_data() will later compare ce->sha1 with the result stored in
# the local sha1. That's why a sha1 is in the parameter list.
# To return the resulting hash:

ce_compare_data() calls index_fd(sha1, ..., ce->name, ...)

#Down in index_fd():

sha1_file.c/index_fd() calls index_core() (after consulting the attributes)
index_core() reads the file from the working tree into memory using
read_in_full(), and after that index_mem() calls hash_sha1_file(buf)
to calculate the sha1.

Before the hashing is done, index_mem() runs
convert_to_git(path, buf, size, &nbuf, SAFE_CRLF_FALSE)
to convert  "blobs to git internal format".


Here, convert_to_git() consults the .gitattributes (again) to find out that
crlf_action is CRLF_AUTO in our case.
The "new safer autocrlf handling" says that if a blob as any CR, don't convert
CRLFs at "git add".

convert.c/crlf_to_git() starts to do it's job:
- look at buf (It has CRLF, conversion may be needed)
- consult blob_has_cr()
  # Note: Before this patch, has_cr_in_index(path)) was used

#Again, before this patch,
has_cr_in_index(path) calls read_blob_data_from_cache(path, &sz) to read the
blob into memory.

Now read_blob_data_from_cache() is a macro, and we end up in
read_blob_data_from_index(&the_index, (path), (sz))

read-cache.c/read_blob_data_from_index() starts its work:
pos = index_name_pos(istate, path, len);
if (pos < 0) {
/*
 * We might be in the middle of a merge, in which
 * case we would read stage #2 (ours).
 */

# And here, and this is important to notice, "ours" is sha1 ad55e2,
# which corresponds to "first line\r\nsame line\r\n"
# On our way in the callstack the information what to check had been lost:
# blob 99b633 and nothing else.
# read_blob_data_from_index() doesn't know the sha, but makes a guess,
# derived from path.
# The guess works OK for most callers: to take "ours" in a middle
# of a merge, but not here.

#Back to crlf_to_git():
The (wrong) blob has CRs in it, so crlf_to_git() decides not to convert CRLFs,
(and this is wrong).

# And now we go all the way back in the call chain:

The content in the worktree which is "first line\r\nsame line\r\n" is hashed:
hash_sha1_file(buf, size, typename(type), sha1);
#and the result is stored in the return pointer "sha1": ad55e2

#Back to ce_compare_data():
match = hashcmp(sha1, ce->sha1);

#And here sha1=ad55e2 != ce->sha199b633

The whole merge is aborted:

   error: addinfo_cache failed for path 'file'
file: unmerged (cbd69ec7cd12dd0989e853923867d94c8519aa52)
file: unmerged (ad55e240aeb42e0d9a0e18d6d8b02dd82ee3e527)
file: unmerged (99b633103c15c20cebebf821133ab526b0ff90b2)
fatal: git write-tree failed to write a tree
#Side note: cbd69ec is another file in the working tree.

###

With this patch,
ce_compare_data() feeds the sha1 99b633 into
blob_has_cr(const unsigned char *index_blob_sha1).

crlf_to_git() says:
"first line\r\nsame line\r\n" in the worktree needs to be converted
into
"first line\nsame line\n" as the "new" blob.

The new (converted) blob "first line\nsame line\n"
is hashed into 99b633, so ce_compare_data() says
fine, converting "first line\r\nsame line\r\n" will give sha1 99b633,
lets continue with the merge, and do the normalization.

# Note1:
# The renormalization is outside what this patch fixes.
# But I didn't manage to construct a test case, which triggered
# "fatal: git write-tree failed to write a tree" without using
# merge.renormalize

# Note2:
# ce_compare_data() has (if I understand thing

Re: [PATCH v3 3/3] correct ce_compare_data() in a middle of a merge

2016-06-30 Thread Torsten Bögershausen
On 29.06.16 18:14, Junio C Hamano wrote:
> tbo...@web.de writes:
> 
>> From: Torsten Bögershausen 
>>
>> The following didn't work as expected:
> 
> Sorry for being slow (not in response but in understanding), but
> let's examine the expectation first.

Thanks for the patience.
There is one detail missing in my description:
The check if a file in the working tree is clean:

static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
{
int match = -1;
int fd = open(ce->name, O_RDONLY);

if (fd >= 0) {
unsigned char sha1[20];
unsigned flags = HASH_USE_SHA_NOT_PATH;
--
Remove the "new feature":

git diff  read-cache.c
diff --git a/read-cache.c b/read-cache.c
index dd98b36..1f951ea 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -170,7 +170,7 @@ static int ce_compare_data(const struct cache_entry *ce, 
struct stat *st)
 
if (fd >= 0) {
unsigned char sha1[20];
-   unsigned flags = HASH_USE_SHA_NOT_PATH;
+   unsigned flags = 0;//HASH_USE_SHA_NOT_PATH;
memcpy(sha1, ce->sha1, sizeof(sha1));
if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
match = hashcmp(sha1, ce->sha1);

The the problem can be re-produced, and debugged with help of t6038:

not ok 5 - Merge addition of text=auto eol=CRLF


> 
>>  - In a middle of a merge
>>  - merge.renormalize is true,
> 
> gitattributes(5) tells us that this would make a "virtual check-out
> and check-in of all stages", i.e. each of the three blob objects
> involved is run through convert_to_working_tree(), and the result is
> run through convert_to_git().  Now, what are these two operations
> told to do?
> 
>>  - .gitattributes = "* text=auto"
>>  - core.eol = crlf
> 
> git-config(1) tells us that a text file will be checked out with
> CRLF line endings, so convert_to_working_tree() would work that
> way.  Even though core.eol entry in that manual page does not tell
> us anything, presumably convert_to_git() is expected to turn it back
> to LF line endings.
> 
Yes, the git-config(1) may need improvements, but that can go as a later path.

>> Merge a blob with CRLF "first line\r\nsame line\r\n" and a blob
>> with LF "first line\nsame line\n".
I may have written:
Merge a blob with CRLF "first line\r\nsame line\r\n" 
(which is checked out as
"first line\r\nsame line\r\n" 
in the working tree, and therefore clean)
and a blob with LF "first line\nsame line\n".

> 
> The former, when renormalized, should (we are discussing the
> expectation, not necessarily what is done by today's code) be
> processed like this:
> 
>  * Pretend as if blob "first line\r\nsame line\r\n" is in the stage
>#0 of the index at the path;
>  * Pretend as if the user said "git checkout" that path;
>  * Pretend as if the user then said "git add" that path.
> 
> The checkout process hopefully does not blindly turn LF into CRLF,
> making it "first line \r\r\nsame line\r\rn".  Instead the (virtual)
> working tree file will have "first line\r\nsame line\r\n".
Yes
> 
> Then "git add" should turn that CRLF into LF, which would give us
> "first line\nsame line\n", but because the "safer autocrlf" rule
> prevents it from happening.  The (real) index already has CR in it
> in the blob in stage #2, so the check-in process would (this is not
> describing the ideal, but what is done by today's code) disable
> CRLF->LF conversion.
> 
> Is that the problem you are trying to solve?
Not sure, if that problem isn't already solved.

> 
> If that is the case, I do not see how "don't use stage #2 of the
> real index; use the blob being renormalized instead" would help.
> The blob being renormalized may have CR in it, triggering that
> "safer autocrlf" rule and cause you the same trouble, no?
> 
> To me, it appears that if you consider that the "safer autocrlf" is
> a sensible thing, you _must_ expect that the renormalization would
> not work at all, in the scenario you presented.  Also,
> 
>> The expected result of the merge is "first line\nsame line\n".
> 
> if you expect this, to me, it appears that you need to reject the
> "safer autocrlf", at least during renormalization, as a sensible
> thing.  And if you _are_ to disable the "safer autocrlf" thing, then
> it does not matter what is currently in the index--the conversion
> can happen solely based on the data being converted and the
> configuration and attribute settings.
&

Re: [PATCH v4 0/6] worktree lock/unlock

2016-06-27 Thread Torsten Bögershausen
>On 03.06.16 14:19, Nguyễn Thái Ngọc Duy wrote:

Minor problem:
t2028 fails, when the test is run from a directory that is

a softlink.
(In my case 
/Users/tb/projects/git/git.pu
is a softlink to
/Users/tb/NoBackup/projects/git/git.pu/


[master (root-commit) 2519212] init
 Author: A U Thor 
 1 file changed, 1 insertion(+)
 create mode 100644 init.t
Preparing source (identifier source)
HEAD is now at 2519212 init
--- expected2016-06-27 09:50:19.0 +
+++ actual  2016-06-27 09:50:19.0 +
@@ -1,2 +1,2 @@
-worktree /Users/tb/projects/git/git.pu/t/trash directory.t2028-worktree-move
-worktree /Users/tb/projects/git/git.pu/t/trash 
directory.t2028-worktree-move/source
+worktree /Users/tb/NoBackup/projects/git/git.pu/t/trash 
directory.t2028-worktree-move
+worktree /Users/tb/NoBackup/projects/git/git.pu/t/trash 
directory.t2028-worktree-move/source
not ok 1 - setup
#

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (Jun 2016, #05; Thu, 16)

2016-06-23 Thread Torsten Bögershausen



On 22/06/16 23:09, Joey Hess wrote:

Torsten Bögershausen wrote:

There is a conflict in pu:
"jh/clean-smudge-annex" does not work together with "tb/convert-peek-in-index"

(And currently pu didn't compile)

I'm sending a v4 of jh/clean-smudge-annex that is rebased on top of
tb/convert-peek-in-index to fix this.


(I will hopefully be able to do a separate review of the smudge/clean patch)

Would be appreciated. It'll be 2 weeks until I can work on this any more.


(And jo...@joeyh.name is not reachable from web.de)

I'd like to fix whatever's broken; you could send details out of band to
joeyh...@gmail.com


Currently there is one comment:
The (new) usage of assert() in sha1_file.c:
  assert(would_convert_to_git_filter_fd(path));

The thing is that we need to check the file system to find .gitatttibutes,
even if we just did it 1 nanosecond ago.

So the .gitattributes is done 3 times:
-1 would_convert_to_git_filter_fd(
-2 assert(would_convert_to_git_filter_fd(path));
-3 convert.c/convert_to_git_filter_fd()

The only situation where this could be useful is when the .gitattributes
change between -1 and -2,
but then they would have changed between -1 and -3, and convert.c
will die().

Does it make sense to remove -2 ?


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] t7800 readlink not found

2016-06-21 Thread Torsten Bögershausen

On 06/21/2016 08:39 PM, Junio C Hamano wrote:

Armin Kunaschik  writes:


On Tue, May 31, 2016 at 7:51 AM, Junio C Hamano  wrote:

Torsten Bögershausen  writes:


diff --git a/t/t7800-difftool.sh b/t/t7800-difftool.sh
index 7ce4cd7..905035c 100755
--- a/t/t7800-difftool.sh
+++ b/t/t7800-difftool.sh
@@ -446,7 +446,7 @@ write_script .git/CHECK_SYMLINKS <<\EOF
   for f in file file2 sub/sub
   do
  echo "$f"
-readlink "$2/$f"
+ls -ld "$2/$f" | sed -e 's/.* -> //'
   done >actual
   EOF


I don't know how portable #ls -ld" really is.

The parts with mode bits, nlinks, uid, gid, size, and date part do
have some variations.  For example, we have been burned on ACL
enabled systems having some funny suffix after the usual mode bits
stuff.

However, as far as this test is concerned, I do not think "how
portable is the output from ls -ld" is an especially relevant
question.  None of the things we expect early in the output (the
fields I enumerated in the previous paragraph) would contain " -> ".
And we know that we do not use a filename that has " -> " (or "->")
as a substring in our tests.

We don't have to use readlink, even on platforms where we do have
readlink.  Building the conditional to be checked at runtime and
providing a shell function read_link that uses "ls -ld | sed" or
"readlink" depending on the runtime check is wasteful.

Just a short, curious question: Is this patch to be accepted/included some time?
I didn't see it in 2.8.4 nor 2.9.0. Maybe it just fell off the table...

Yes, I think this fell off the table as I was waiting for some kind
of agreement or counter-proposal, neither of which came and the
thread was forgotten.

Unless Torsten still has strong objections (or better yet, a better
implementation), I am inclined to queue it as-is.


I just double-checked the man pages for Mac OS and opengroup:
No better implementation from my side -> No objections

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (Jun 2016, #05; Thu, 16)

2016-06-19 Thread Torsten Bögershausen


There is a conflict in pu:
"jh/clean-smudge-annex" does not work together with "tb/convert-peek-in-index"

(And currently pu didn't compile)

(I will hopefully be able to do a separate review of the smudge/clean patch)

(And jo...@joeyh.name is not reachable from web.de)

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: problems installing GIT on my MAC OS X 10.11.5

2016-06-14 Thread Torsten Bögershausen
On 14.06.16 18:45, Maria Jose Fernandez wrote:
> From http://git-scm.com/download/mac I clicked to download manually.
> Then it goes to 
> https://sourceforge.net/projects/git-osx-installer/files/git-2.8.1-intel-universal-mavericks.dmg/download?use_mirror=autoselect
> I found the git - 2.8.1-intel-universal-mavericks.dmg 
> 
>  downloaded on my desktop. I open that and go through the installation 
> process and then it says is installed but not found anywhere on my computer. 
>
>
It says ?

Do you think that you open a terminal and type
which git
git --version
and post the output here ?


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: problems installing GIT on my MAC OS X 10.11.5

2016-06-14 Thread Torsten Bögershausen
On 14.06.16 18:06, Konstantin Khomoutov wrote:
> On Tue, 14 Jun 2016 16:56:15 +0100
> Maria Jose Fernandez  wrote:
> 
>> I am doing a data science course and need to download GIT but for
>> some reason I can’t installed it. I called Apple but they couldn’t
>> help and suggested me to contact you guys.
> 
> So you proceeded to http://git-scm.com/download/mac fetched the
> installer, then found the downloaded file in the Finder, clicked it --
> and then what happened?
> --

Not sure, we need to verify first, what had happened.

Typically, if you install Xcode, there is a program
wrapper called git, saying that you need to install it
as part of Xcode.

You can do this.
You can go to http://git-scm.com/download/mac.
You can install git via homebrew, via fink or mac ports.

So it could be nice to know what exactly happened, please.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] dir: introduce file_size() to check the size of file

2016-06-12 Thread Torsten Bögershausen
>> So what I understand, you want something like this:
>>
>> +ssize_t file_size_not_zero(const char *filename)
>> +{
>> +   struct stat st;
>> +   if (stat(filename, &st) < 0)
>> +   return -1;
>> +   return !!st.st_size);
>> +}
> 
> For the purpose of bisect_reset(), Yes. BTW a similar function exist
> in builtin/am.c with the name is_empty_file(). But as Christian points
> out file_size() could help to refactor other parts of code.
> 

Please allow one or more late comments:
If is_empty_file() does what you need, then it can be moved into wrapper.c
and simply be re-used in your code.

If you want to introduce a new function, that can be used for other refactoring,
then the whole thing would ideally go into a single commit,
or into a single series.
That may probably be out of the scope for your current efforts ?

What really makes me concern is the mixture of signed - and unsigned:
ssize_t file_size(const char *filename)
+{
+   struct stat st;
+   if (stat(filename, &st) < 0)
+   return -1;
+   return xsize_t(st.st_size);
+}

To my understanding a file size is either 0, or a positive integer.
Returning -1 is of course impossible with a positive integer.

So either the function is changed like this:

int file_size(const char *filename, size_t *len)
+{
+   struct stat st;
+   if (stat(filename, &st) < 0)
+   return -1;
+   *len = xsize_t(st.st_size);
+   return 0;
+}

Or, if that works for you:

size_t file_size(const char *filename)
+{
+   struct stat st;
+   if (stat(filename, &st) < 0)
+   return 0;
+   return xsize_t(st.st_size);
+}

Or, more git-ish:

size_t file_size(const char *filename)
+{
+   struct stat st;
+   if (stat(filename, &st))
+   return 0;
+   return xsize_t(st.st_size);
+}

(And then builtin/am.c  can be changed to use the new function.

 
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] dir: introduce file_size() to check the size of file

2016-06-08 Thread Torsten Bögershausen

On 06/08/2016 09:57 AM, Pranit Bauva wrote:

Hey Eric,

On Wed, Jun 8, 2016 at 1:07 PM, Eric Sunshine  wrote:

On Tue, Jun 7, 2016 at 4:54 PM, Pranit Bauva  wrote:

dir: introduce file_size() to check the size of file

At times we require to see if the file is empty and get the size of the
file. By using stat we can get the file size without actually having to
open the file to check for its contents.

The sole caller of this function in patch 4/4 does so only to check if
the file exists; it doesn't even care about the file's size, thus
neither this function nor this patch seem justified and probably ought
to be dropped unless some better and stronger justification can be
shown.

Umm, actually there is a subtle difference between file_exists() and
file_size(). file_exists() *only* checks whether the file exists or
not while file_size() can also be used to check whether the file is
empty or not also see the implementation of both of them which shows
the difference. In fact it doesn't care at all about the file size.
Now there are a lot of instances in shell scripts where there are
quite some differences with -f and -s and some places *do care* about
this subtle difference. For eg. in bisect_reset() we test whether the
file .git/BISECT_START has some contents in it. But I guess I can add
some more justification in the commit message. What do you think?


Signed-off-by: Pranit Bauva 
---
diff --git a/dir.c b/dir.c
@@ -2036,6 +2036,14 @@ int file_exists(const char *f)
+ssize_t file_size(const char *filename)
+{
+   struct stat st;
+   if (stat(filename, &st) < 0)
+   return -1;
+   return xsize_t(st.st_size);
+}
+
diff --git a/dir.h b/dir.h
@@ -248,6 +248,13 @@ extern void clear_exclude_list(struct exclude_list *el);
+/*
+ * Return the size of the file `filename`. It returns -1 if error
+ * occurred, 0 if file is empty and a positive number denoting the size
+ * of the file.
+ */
+extern ssize_t file_size(const char *);



So what I understand, you want something like this:

+ssize_t file_size_not_zero(const char *filename)
+{
+   struct stat st;
+   if (stat(filename, &st) < 0)
+   return -1;
+   return !!st.st_size);
+}



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (Jun 2016, #02; Mon, 6)

2016-06-07 Thread Torsten Bögershausen




* mh/connect (2016-06-06) 10 commits
- connect: [host:port] is legacy for ssh
- connect: move ssh command line preparation to a separate function
- connect: actively reject git:// urls with a user part
- connect: change the --diag-url output to separate user and host
- connect: make parse_connect_url() return the user part of the url as a 
separate value
- connect: group CONNECT_DIAG_URL handling code
- connect: make parse_connect_url() return separated host and port
- connect: re-derive a host:port string from the separate host and port 
variables
- connect: call get_host_and_port() earlier
- connect: document why we sometimes call get_port after get_host_and_port



Ok, folks, is everybody happy with this version?



I am.


* tb/convert-peek-in-index (2016-05-24) 2 commits
- convert: ce_compare_data() checks for a sha1 of a path
- read-cache: factor out get_sha1_from_index() helper
The motivation is rather iffy.


Will be re-rolled.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Minor Bug in Renaming Branches

2016-06-06 Thread Torsten Bögershausen

On 06/06/2016 09:35 PM, Stefan Beller wrote:

On Mon, Jun 6, 2016 at 12:17 PM, Torsten Bögershausen  wrote:


A limitation is introduced by Mac OS and Windows:
BRANCH/NAME and branch/name refer to the same object in the file
system.
As a workaround, you can pack the branch names:
git pack-refs --all


Once you packed a branch into the packed refs file, you can
create another loose branch of different capitalization,
which then 'hides' the packed ref?

That sounds error prone to me, as a seemingly unrelated branch
changed its value:

 git branch BRANCH 012345
 git pack-refs --all
 git branch branch BRANCH^
 git rev-parse BRANCH
 (I'd expect BRANCH^ as return)

(I don't have a windows machine for testing here, so that
is pure speculation)


Yes, another reason not to use branch and BRANCH in the same repo.
(You can test under Linux & vfat)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Minor Bug in Renaming Branches

2016-06-06 Thread Torsten Bögershausen
On 06.06.16 19:52, Samuel Lijin wrote:
> Hi,
> 
> Not quite sure where to submit bug reports about Git, this was the
> best I could find, so if there's a better place to do this, please let
> me know and I will.
> 
> The short of this issue is that on Mac and Windows, if a branch has a
> slash in its name, changing it from lowercase to uppercase requires
> diving into .git/refs/heads/ and manually moving stuff around - I
> think the behavior should be at least something like this:
> http://stackoverflow.com/questions/26810252/how-to-change-my-local-github-branch-name-to-uppercase.
> 
> It happens that on both Windows box, to rename a branch from
> "branch/name" to "BRANCH/NAME", one has to actually dive into
> .git/refs/heads/ because of how Windows handles upper/lowercase in
> directory paths. (Win7, mingw64 via Git for Bash, git version
> 2.7.0.windows.1)
> user@windows-box MINGW64 ~
> $ mkdir sandbox && cd sandbox && git init
> Initialized empty Git repository in C:/Users/user/sandbox/.git/
> user@windows-box MINGW64 ~/sandbox (master)
> $ touch empty.txt && git add empty.txt && git commit -m 'initialize repo'
> [master (root-commit) 761113d] initialize repo
>  1 file changed, 0 insertions(+), 0 deletions(-)
>  create mode 100644 empty.txt
> user@windows-box MINGW64 ~/sandbox (master)
> $ git branch
> * master
> user@windows-box MINGW64 ~/sandbox (master)
> $ git checkout -b branch/name
> Switched to a new branch 'branch/name'
> user@windows-box MINGW64 ~/sandbox (branch/name)
> $ git branch
> * branch/name
>   master
> user@windows-box MINGW64 ~/sandbox (branch/name)
> $ git branch -m BRANCH/NAME
> fatal: A branch named 'BRANCH/NAME' already exists.
> user@windows-box MINGW64 ~/sandbox (branch/name)
> $ git branch -m tmp
> user@windows-box MINGW64 ~/sandbox (tmp)
> $ git branch -m BRANCH/NAME
> user@windows-box MINGW64 ~/sandbox (BRANCH/NAME)
> $ git branch
>   branch/NAME
>   master
> user@windows-box MINGW64 ~/sandbox (BRANCH/NAME)
> $ mv .git/refs/heads/branch/ .git/refs/heads/BRANCH/
> user@windows-box MINGW64 ~/sandbox (BRANCH/NAME)
> $ git branch
> * BRANCH/NAME
>   master
> 
> Interestingly, from inside an Ubuntu VM (with the directory in
> question mounted as a VBox fileshare), this is not an issue.
> 
> A colleague on a Mac also reproduces the issue successfully (OSX
> 10.11.5, git 2.8.3):
> 
> mac-box:sandbox user$ touch empty.txt && git add empty.txt && git
> commit -m 'initial commit'
> [master (root-commit) 1f4f1fa] initial commit
>  1 file changed, 0 insertions(+), 0 deletions(-)
>  create mode 100644 empty.txt
> mac-box:sandbox user$ git branch
> * master
> mac-box:sandbox user$ git checkout -b branch/name
> Switched to a new branch 'branch/name'
> mac-box:sandbox user$ git branch -m BRANCH/NAME
> fatal: A branch named 'BRANCH/NAME' already exists.
> mac-box:sandbox user$ git branch
> * branch/name
>   master
> mac-box:sandbox user$ git branch -m tmp
> mac-box:sandbox user$ git branch
>   master
> * tmp
> mac-box:sandbox user$ git branch -m BRANCH/NAME
> mac-box:sandbox user$ git branch
>   branch/NAME
>   master
> mac-box:sandbox user$ mv .git/refs/heads/branch/ .git/refs/heads/BRANCH/
> mac-box:sandbox user$ git branch
> * BRANCH/NAME
>   master
> 
> Thanks,
> Sam
A limitation is introduced by Mac OS and Windows:
BRANCH/NAME and branch/name refer to the same object in the file
system.
As a workaround, you can pack the branch names:
git pack-refs --all
or, 
that is my preference, avoid using the same name upper- and
lower-case.



> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] cherry-pick: allow to pick to unborn branches

2016-06-06 Thread Torsten Bögershausen
On 06.06.16 15:23, Michael J Gruber wrote:
> Currently, cherry-pick allows tp pick single commits to an empty HEAD
Typo:   ^^

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (Jun 2016, #01; Thu, 2)

2016-06-05 Thread Torsten Bögershausen
On 04.06.16 18:24, Junio C Hamano wrote:
> Can we have a final submission to be queued? 
Yes

Thanks for the improvements and the discussions.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (Jun 2016, #01; Thu, 2)

2016-06-04 Thread Torsten Bögershausen
On 2016-06-04 07.14, Mike Hommey wrote:
> On Fri, Jun 03, 2016 at 04:47:33PM -0700, Junio C Hamano wrote:
>> Mike Hommey  writes:
>>
>>> In fact, the parser doesn't even reject the one that is considered
>>> invalid (the first).
>>
>> My question was what the desired behaviour is, and if your "fix"
>> gives us that desired outcome.
> 
> From my POV, the desired outcome from this patch series is that there is
> no change of behavior, and Torsten's fix makes
> git://[example.com:123]:/path/to/repo urls handled the same before and
> after the patch series.
> 
> Whether that's the desired behavior is another topic, that we can,
> IMHO, leave for later.
This is the old behavior:

 GIT_TRACE=2  git clone -v git://[github.com:9418]:/tboegi/emacs.d.git $$
16:32:28.692918 git.c:350   trace: built-in: git 'clone' '-v'
'git://[github.com:9418]:/tboegi/emacs.d.git' '95214'
Cloning into '95214'...
Looking up github.com:9418 ... fatal: Unable to look up github.com:9418 (port
9418) (nodename nor servname provided, or not known)
# Everything inside [] goes into the host part (good),
# At the same time :9418 is printed as the port (weird)

This is the new behavior:
GIT_TRACE=2  git clone -v git://[github.com:9418]:/tboegi/emacs.d.git $$
Connecting to github.com (port 9418) ... 192.30.252.122 done.
or
GIT_TRACE=2  git clone -v git://[github.com:9418]/tboegi/emacs.d.git $$
Connecting to github.com (port 9418) ... 192.30.252.122 done.

In any case, everyting inside the [] should go into the host part,
for all protocols that have a scheme.
But.
There is a little but.
the ssh protocol did, in early days, not parse
ssh://user@[::1]/path/to/repo correctly :-(
The only workaround to use a user name with hostname in [] was
to use
ssh://[user@::1]/path/to/repo

The thing I'm asking is, if we should handle

git://[github.com:9418]/tboegi/emacs.d.git
or
git://[github.com:9418]:/tboegi/emacs.d.git
the same as
git://github.com:9418/tboegi/emacs.d.git


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] t1308: do not get fooled by symbolic links to the source tree

2016-06-03 Thread Torsten Bögershausen

On 06/03/2016 08:53 AM, Johannes Sixt wrote:

Am 03.06.2016 um 08:10 schrieb Jeff King:

On Fri, Jun 03, 2016 at 08:05:56AM +0200, Johannes Sixt wrote:


-name=$(pwd)/.gitconfig
+name=$HOME/.gitconfig


I haven't tested this, yet, but my guess is that this breaks on 
Windows:

test-config will produce C:/foo style path, but the updated test would
expect /c/foo style path. Dscho, do you have an idea how to fix this?


Hmm. This should come directly from expand_user_path("~/.gitconfig")
which prepends the literal contents of the $HOME variable. It does go
through convert_slashes() afterwards, but I don't see any other
massaging (but I won't be surprised when you tell me there is some that
happens behind the scenes).


Yes, it happens behind the scenes: /c/foo absolute paths are a 
convention used by the POSIX emulation layer (MSYS). When bash (an 
MSYS program) runs a non-MSYS program such as git or test-config, it 
converts the /c/foo paths in the environment (and argument list) to 
c:/foo style because the non-MSYS programs do not understand the MSYS 
convention.


-- Hannes

Compiling pu didn't succed:
unix_stream_connect is missing in read_cache.c

(And many more in index-heloer.c)

(I thought that the index-helper is only compiled on systems,
  which are known to have unix-sockets and other stuff ?)

After patching that out,  t1308 fails:
-name=/c/
+name=c:/


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (Jun 2016, #01; Thu, 2)

2016-06-02 Thread Torsten Bögershausen

On 06/03/2016 01:13 AM, Mike Hommey wrote:

On Thu, Jun 02, 2016 at 03:52:41PM -0700, Junio C Hamano wrote:

* mh/connect (2016-06-01) 9 commits
  - connect: move ssh command line preparation to a separate function
  - connect: actively reject git:// urls with a user part
  - connect: change the --diag-url output to separate user and host
  - connect: make parse_connect_url() return the user part of the url as a 
separate value
  - connect: group CONNECT_DIAG_URL handling code
  - connect: make parse_connect_url() return separated host and port
  - connect: re-derive a host:port string from the separate host and port 
variables
  - connect: call get_host_and_port() earlier
  - connect: document why we sometimes call get_port after get_host_and_port

  Needs review.

I /think/ Torsten reviewed it all, and his last comments are in
$gmane/295800. It's still not clear to me why he wants to remove the
comment about [].

There where 2 comments in the review.
The most important thing is that now
git://[example.com:123]/path/to/repo is valid, but it shouldn't.
This patch fixes it:

@@ -673,7 +669,7 @@ static enum protocol parse_connect_url(const char 
*url_orig, char **ret_user,
 * "host:port" and NULL.
 * To support this undocumented legacy we still need to split the port.
 */
-   if (!port)
+   if (!port && protocol == PROTO_SSH)


The other thing is that I asked for a test case for
git://[example.com:123]/path/to/repo
which shouldn't be hard to do.

If nobody else things that this comment in the code is stale:
-   /*
-* Don't do destructive transforms as protocol code does
-* '[]' unwrapping in get_host_and_port()
-*/

then just leave it as it is.
Thanks
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (May 2016, #09; Tue, 31) t1308 broken

2016-06-02 Thread Torsten Bögershausen
It seams as  ./t1308-config-set.sh is broken,
when the the directory is a soft link:

-name=/home/tb/NoBackup/projects/git/git.pu/t/trash
directory.t1308-config-set/.gitconfig
+name=/home/tb/projects/git/git.pu/t/trash directory.t1308-config-set/.gitconfig
 scope=global

 key=foo.bar
not ok 28 - iteration shows correct origins

I havent't digged further, too many conflicts in the config code, may be
somebody knows it directly ?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] t7800 readlink not found

2016-05-30 Thread Torsten Bögershausen

On 05/31/2016 02:26 AM, Armin Kunaschik wrote:

On 05/27/2016 06:19 AM, David Aguilar wrote:

On Wed, May 25, 2016 at 11:33:33AM +0200, Armin Kunaschik wrote:

Would you mind submitting a patch so that we can support these
tests when running on AIX/HP-UX?

I don't feel comfortable to submit patches for tests I can't verify. I
don't have valgrind and python/p4 here. Looking to the code I'd say,
patching the p4 tests with "ls -ld | sed" looks quite save.
But I'm not sure about the test-lib.sh. When you are really super
paranoid, as written in the comment, you should probably use perl like

perl -e 'print readlink $ARGV[0]' $name

as a replacement.

So, as suggested by Junio, here the readlink workaround for t7800 only.
(hopefully whitespace clean this time)

--- 8< --- 8< ---
From: Armin Kunaschik 
Subject: t7800: readlink is not portable

The readlink(1) command is not available on all platforms (notably not
on AIX and HP-UX) and can be replaced in this test with the "workaround"

ls -ld  | sed -e 's/.* -> //'

This is no universal readlink replacement but works in the controlled
test environment good enough.

Signed-off-by: Armin Kunaschik 
---

diff --git a/t/t7800-difftool.sh b/t/t7800-difftool.sh
index 7ce4cd7..905035c 100755
--- a/t/t7800-difftool.sh
+++ b/t/t7800-difftool.sh
@@ -446,7 +446,7 @@ write_script .git/CHECK_SYMLINKS <<\EOF
  for f in file file2 sub/sub
  do
echo "$f"
-   readlink "$2/$f"
+   ls -ld "$2/$f" | sed -e 's/.* -> //'
  done >actual
  EOF


I don't know how portable #ls -ld" really is.
If there is one platform, that doesn't support readlink, would it
make sense to implement readlink() in test-lib.sh,
similar to what we have for MINGW, e.g. sort() or find() ?
And keep t7800 as it is ?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 0/9] connect: various cleanups

2016-05-28 Thread Torsten Bögershausen
On 28.05.16 07:33, Mike Hommey wrote:
> On Sat, May 28, 2016 at 07:02:01AM +0200, Torsten Bögershausen wrote:
>> On 27.05.16 23:59, Mike Hommey wrote:
>>> On Fri, May 27, 2016 at 04:24:20PM +0200, Torsten Bögershausen wrote:
>>>> On 27.05.16 04:27, Mike Hommey wrote:
>>>>> Changes from v7:
>>>>> - Fixed comments.
>>>>>
>>>>> Mike Hommey (9):
>> All in all, a great improvement.
>> 2 things left.
>> - The comment about [] is now stale, isn't it ?
> No, it's still valid at this point, that's what the 2nd argument to
> host_end (0) does.
>
> Mike

The protocol (specific) code doesn't do this anymore, 
because that is now common to all protocols.


   /*
* Don't do destructive transforms now, the
* '[]' unwrapping is done in get_host_and_port()
*/

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 0/9] connect: various cleanups

2016-05-27 Thread Torsten Bögershausen
On 27.05.16 23:59, Mike Hommey wrote:
> On Fri, May 27, 2016 at 04:24:20PM +0200, Torsten Bögershausen wrote:
>> On 27.05.16 04:27, Mike Hommey wrote:
>>> Changes from v7:
>>> - Fixed comments.
>>>
>>> Mike Hommey (9):
All in all, a great improvement.
2 things left.
- The comment about [] is now stale, isn't it ?
- The legacy support should only be active for ssh, and not
  be used by other schemes.



diff --git a/connect.c b/connect.c
index 076ae09..79b8dae 100644
--- a/connect.c
+++ b/connect.c
@@ -618,10 +618,6 @@ static enum protocol parse_connect_url(const char 
*url_orig, char **ret_user,
}
}
 
-   /*
-* Don't do destructive transforms as protocol code does
-* '[]' unwrapping in get_host_and_port()
-*/
end = host_end(&host, 0);
 
if (protocol == PROTO_LOCAL)
@@ -673,7 +669,7 @@ static enum protocol parse_connect_url(const char 
*url_orig, char **ret_user,
 * "host:port" and NULL.
 * To support this undocumented legacy we still need to split the port.
 */
-   if (!port)
+   if (!port && protocol == PROTO_SSH)
port = get_port(host);
 
*ret_user = user ? xstrdup(user) : NULL;

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 0/9] connect: various cleanups

2016-05-27 Thread Torsten Bögershausen
On 27.05.16 04:27, Mike Hommey wrote:
> Changes from v7:
> - Fixed comments.
> 
> Mike Hommey (9):
>   connect: document why we sometimes call get_port after
> get_host_and_port
>   connect: call get_host_and_port() earlier
>   connect: re-derive a host:port string from the separate host and port
> variables
>   connect: make parse_connect_url() return separated host and port
>   connect: group CONNECT_DIAG_URL handling code
>   connect: make parse_connect_url() return the user part of the url as a
> separate value
>   connect: change the --diag-url output to separate user and host
>   connect: actively reject git:// urls with a user part
>   connect: move ssh command line preparation to a separate function
> 
>  connect.c | 235 
> +-
>  t/t5500-fetch-pack.sh |  42 ++---
>  2 files changed, 170 insertions(+), 107 deletions(-)
> 
Is the whole series somewhere on a public repo ?
No major remarks so far, if possible, I would like
to have a look at the resulting connect.c

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port

2016-05-25 Thread Torsten Bögershausen

On 05/26/2016 01:34 AM, Mike Hommey wrote:

On Tue, May 24, 2016 at 06:44:26AM +0200, Torsten Bögershausen wrote:

On 05/23/2016 11:30 PM, Junio C Hamano wrote:

Torsten Bögershausen  writes:


get_host_and_port(&ssh_host, &port);
+   /* get_host_and_port may not return a port
even when
+* there is one: In the [host:port]:path case,
+* get_host_and_port is called with "[host:port]" and
+* returns "host:port" and NULL.
+* In that specific case, we still need to split the
+* port. */

Is it worth to mention that this case is "still supported legacy" ?

If it's worth mentioning anywhere, it seems to me it would start with
urls.txt?

Mike


I don't know.
urls.txt is for Git users, and connect.c is for Git developers.
urls.txt does not mention that Git follows any RFC when parsing the
URLS', it doesn't claim to be 100% compliant.
Even if it makes sense to do so, as many user simply expect Git to accept
RFC compliant URL's, and it makes the development easier, if there is
an already
written specification, that describes all the details.
The parser is not 100% RFC compliant, one example:
- old-style usgage like "git clone [host:222]:~/path/to/repo are supported

Is it an option to fix get_host_and_port() so that it returns what
the caller expects even when it is given "[host:port]"?  When the
caller passes other forms like "host:port", it expects to get "host"
and "port" parsed out into two variables.  Why can't the caller
expect to see the same happen when feeding "[host:port]" to the
function?


This is somewhat out of my head:
git clone   git://[example.com:123]:/test#illegal, malformated URL
git clone   [example.com:123]:/test   #scp-like URL, legacy
git clone   ssh://[example.com:123]:/test   #illegal, but supported as
legacy, because
git clone  ssh://[user@::1]/test   # was the only way to
specify a user name at a literal IPv6 address

May be we should have some  more test cases for malformated git:// URLs?


None of these malformed urls are rejected with or without my series
applied:

Without:
$ git fetch-pack --diag-url git://[example.com:123]:/test
Diag: url=git://[example.com:123]:/test
Diag: protocol=git
Diag: hostandport=[example.com:123]:
Diag: path=/test
$ git fetch-pack --diag-url
ssh://[example.com:123]:/test
Diag: url=ssh://[example.com:123]:/test
Diag: protocol=ssh
Diag: userandhost=example.com
Diag: port=123
Diag: path=/test

With:
$ git fetch-pack --diag-url git://[example.com:123]:/test
Diag: url=git://[example.com:123]:/test
Diag: protocol=git
Diag: user=NULL
Diag: host=example.com
Diag: port=123
Diag: path=/test
$ git fetch-pack --diag-url ssh://[example.com:123]:/test
Diag: url=ssh://[example.com:123]:/test
Diag: protocol=ssh
Diag: user=NULL
Diag: host=example.com
Diag: port=123
Diag: path=/test

Note in the first case, hostandport is "[example.com:123]:", and that
is treated as host=example.com:123 and port=NULL further down, so my
series is changing something here, but arguably, it makes it less worse.
(note that both with and without my series,
"git://[example.com:123]:42/path" is treated the same, so only a corner
case changed)

Can we go forward with the current series (modulo the comment style
thing Eric noted, and maybe adding a note about the parser handling urls
as per urls.txt), and not bloat scope it? If anything, the state of the
code after the series should make further parser changes easier.

Cheers,

Mike




Thanks for digging.

How about something like this:

/*
 * get_host_and_port may not return a port in the [host:port]:path case.
 * To support this undocumented legacy we still need to split the port.
*/

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] can not escape sharps in git config file

2016-05-24 Thread Torsten Bögershausen

On 05/24/2016 01:23 PM, Jean-Noël Avila wrote:

My mistake, sorry for the noise,

JFTR:

  * only double quotes can fully escape a string (it is safer to enclose
the whole value in double quotes)
  * backslashes have to be doubled because they are interpreted by git

So

[filter "kicad_sch"]
 clean = "sed -E 's/#(PWR|FLG)[0-9]+/#\\1?/'"
 smudge = cat


(I'm not a filter expert)
What happens if you put the sed expression into a shell script, and
configure the filter to call the script instead ?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Odd Difference Between Windows Git and Standard Git

2016-05-24 Thread Torsten Bögershausen

On 05/24/2016 01:57 PM, Johannes Schindelin wrote:

Hi,

On Tue, 24 May 2016, Torsten Bögershausen wrote:


On 05/23/2016 08:52 PM, Junio C Hamano wrote:

Johannes Schindelin  writes:


Of course, if you are doing network mount between systems with and
without filemode support, the result would depend on where you did
the "git init", so that would not help.

Which means that other probed things like symlink support and case
sensitivity are likely to be wrong in the .git/config that the
user may want to fix.

What we could do is to make the default config setting
platform-dependent, a la CRLF_NATIVE.

I imagine that we would want this for core.filemode, core.ignorecase and
core.symlinks.

What do you think?

The reason why we probe for filemode, icase, etc. at repository
creation time and record the result in the configuration is because we
do not to want to do the auto-probing at runtime, every time we run
any Git command.

Right, I missed this of course. My idea was to have saner defaults *iff
the config variables are not set explicitly*. But they *are* set, of
course. Just not in a way that makes sense when the very same working
directory is accessed from different Operating Systems.


if core.filemode is true, Git for Windows could:
a) Behave as today, report changed files (filemode)
b) Give warning to the user (and report changed filemode)
c) Error out, saying misconfigured worktree
d) use core.filemode = false anyway.
e) Give a warning and use core.filemode = false anyway.

At the moment I tend for c), as it makes it clear what is going wrong,
what do you think ?

The problem with that is that we would need to probe again.

The probing for the filemode:
Wouldn't it be enough to run lstat() on .git/ ?
If the user-execuatable bit is not set, but core.filemode is true, error 
out ?

That would not cost too much.

  Or dictate for
all eternity that Git for Windows cannot determine the executable bit (but
who knows for certain?)
Can we can limit the eternity until the day when Windows can determine 
the executable

bit ?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Odd Difference Between Windows Git and Standard Git

2016-05-23 Thread Torsten Bögershausen

On 05/23/2016 08:52 PM, Junio C Hamano wrote:

Johannes Schindelin  writes:


Of course, if you are doing network mount between systems with and
without filemode support, the result would depend on where you did
the "git init", so that would not help.

Which means that other probed things like symlink support and case
sensitivity are likely to be wrong in the .git/config that the user
may want to fix.

What we could do is to make the default config setting platform-dependent,
a la CRLF_NATIVE.

I imagine that we would want this for core.filemode, core.ignorecase and
core.symlinks.

What do you think?

The reason why we probe for filemode, icase, etc. at repository
creation time and record the result in the configuration is because
we do not to want to do the auto-probing at runtime, every time we
run any Git command.  You may be able to say "On this platform, no
matter what filesystem is in use, you will always get icase and you
will never have executable bit", and a build on such a platform can
hardcode these three values.  But on other platforms these may be
per-filesystem properties, and their binaries would not be able to
hardcode the choices, which would mean we would record these three
in .git/config on these platforms when a repository is created.

Git built for Windows may have core.filemode=false as "the default
config setting platform-dependent, a la CRLF_NATIVE"; how would that
interact with a configured core.filemode value in .git/config?

if core.filemode is true, Git for Windows could:
a) Behave as today, report changed files (filemode)
b) Give warning to the user (and report changed filemode)
c) Error out, saying misconfigured worktree
d) use core.filemode = false anyway.
e) Give a warning and use core.filemode = false anyway.

At the moment I tend for c), as it makes it clear what is going wrong,
what do you think ?

[skip ignorecase for a second]

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port

2016-05-23 Thread Torsten Bögershausen

On 05/23/2016 11:30 PM, Junio C Hamano wrote:

Torsten Bögershausen  writes:


get_host_and_port(&ssh_host, &port);
   +/* get_host_and_port may not return a port
even when
+* there is one: In the [host:port]:path case,
+* get_host_and_port is called with "[host:port]" and
+* returns "host:port" and NULL.
+* In that specific case, we still need to split the
+* port. */

Is it worth to mention that this case is "still supported legacy" ?

If it's worth mentioning anywhere, it seems to me it would start with
urls.txt?

Mike


I don't know.
urls.txt is for Git users, and connect.c is for Git developers.
urls.txt does not mention that Git follows any RFC when parsing the
URLS', it doesn't claim to be 100% compliant.
Even if it makes sense to do so, as many user simply expect Git to accept
RFC compliant URL's, and it makes the development easier, if there is
an already
written specification, that describes all the details.
The parser is not 100% RFC compliant, one example:
- old-style usgage like "git clone [host:222]:~/path/to/repo are supported

Is it an option to fix get_host_and_port() so that it returns what
the caller expects even when it is given "[host:port]"?  When the
caller passes other forms like "host:port", it expects to get "host"
and "port" parsed out into two variables.  Why can't the caller
expect to see the same happen when feeding "[host:port]" to the
function?


This is somewhat out of my head:
git clone   git://[example.com:123]:/test#illegal, malformated URL
git clone   [example.com:123]:/test   #scp-like URL, legacy
git clone   ssh://[example.com:123]:/test   #illegal, but supported 
as legacy, because
git clone  ssh://[user@::1]/test   # was the only 
way to specify a user name at a literal IPv6 address


May be we should have some  more test cases for malformated git:// URLs?


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port

2016-05-22 Thread Torsten Bögershausen

On 05/22/2016 10:03 AM, Mike Hommey wrote:

On Sun, May 22, 2016 at 08:07:05AM +0200, Torsten Bögershausen wrote:

On 22.05.16 01:17, Mike Hommey wrote:

Signed-off-by: Mike Hommey 
---
  connect.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/connect.c b/connect.c
index c53f3f1..caa2a3c 100644
--- a/connect.c
+++ b/connect.c
@@ -742,6 +742,12 @@ struct child_process *git_connect(int fd[2], const char 
*url,
transport_check_allowed("ssh");
get_host_and_port(&ssh_host, &port);
  
+			/* get_host_and_port may not return a port even when

+* there is one: In the [host:port]:path case,
+* get_host_and_port is called with "[host:port]" and
+* returns "host:port" and NULL.
+* In that specific case, we still need to split the
+* port. */

Is it worth to mention that this case is "still supported legacy" ?

If it's worth mentioning anywhere, it seems to me it would start with
urls.txt?

Mike


I don't know.
urls.txt is for Git users, and connect.c is for Git developers.
urls.txt does not mention that Git follows any RFC when parsing the
URLS', it doesn't claim to be 100% compliant.
Even if it makes sense to do so, as many user simply expect Git to accept
RFC compliant URL's, and it makes the development easier, if there is an 
already

written specification, that describes all the details.
The parser is not 100% RFC compliant, one example:
- old-style usgage like "git clone [host:222]:~/path/to/repo are supported
To easy the live for developers, it could make sense why the code is as 
it is,

simply to avoid that people around the world suddenly run into problems,
when they upgrade the Git version.
So if there is a comment in the code, it could help new developers to 
understand

things easier.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


<    1   2   3   4   5   6   7   8   9   10   >