Re: Exact format of tree objects

2013-06-18 Thread Thomas Rast
Chico Sokol writes:

> What is the encoding of the filename?

Git just considers the filename a bunch of bytes that form a POSIX filename
(i.e., it may not contain '/' or '\0').  So depending on your point of
view, it's either "no encoding" or "whatever you put into it".
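
To make the "bunch of bytes" point concrete, here is a minimal Python
sketch. The helper name is_valid_entry_name is hypothetical and checks only
the two constraints named above; real Git may be stricter in places. The
same visible name encoded two ways gives two different byte strings, and
Git would store either one without interpreting it.

```python
def is_valid_entry_name(name: bytes) -> bool:
    """Check only the two constraints from the text: no '/' and no NUL.

    Hypothetical helper for illustration, not Git's own validation.
    """
    return b"/" not in name and b"\0" not in name


# The same human-readable name in two encodings: different bytes on disk.
utf8_name = "caf\u00e9.txt".encode("utf-8")      # b'caf\xc3\xa9.txt'
latin1_name = "caf\u00e9.txt".encode("latin-1")  # b'caf\xe9.txt'
```

Both byte strings pass the check, but they are different names as far as a
tree object is concerned.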

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Exact format of tree objects

2013-06-18 Thread Chico Sokol
What is the encoding of the filename?


--
Chico Sokol


On Tue, Jun 11, 2013 at 3:26 PM, Ilari Liusvaara wrote:
> On Tue, Jun 11, 2013 at 01:25:14PM -0300, Chico Sokol wrote:
>> Is there any official documentation of the tree object format? Are tree
>> objects encoded specially in some way? How can I parse the inflated
>> contents of a tree object?
>
> A tree object consists of entries, each a concatenation of:
> - Octal mode (using ASCII digits 0-7).
> - Single SPACE (0x20)
> - Filename
> - Single NUL (0x00)
> - 20-byte binary SHA-1 of referenced object.
>
> At least the following octal modes are known:
> 40000: Directory (tree).
> 100644: Regular file (blob).
> 100755: Executable file (blob).
> 120000: Symbolic link (blob).
> 160000: Submodule (commit).
>
> The entries are always sorted in (bytewise) lexicographical order,
> except that directories sort as if there were an implied '/' at the end.
>
> So e.g.:
> ! < 0 < 9 < a < a- < a- (directory) < a (directory) < a0 < ab < b < z.
>
>
> The idea of sorting directories specially is that if one recurses
> upon hitting a directory and uses '/' as path separator, then the
> full filenames are in bytewise lexicographical order.
>
> -Ilari


Re: Exact format of tree objects

2013-06-18 Thread Chico Sokol
Thanks!

By the way, where can I find this kind of specification? I couldn't
find the spec of tree objects here:
https://github.com/git/git/tree/master/Documentation


--
Chico Sokol


On Wed, Jun 12, 2013 at 11:06 AM, Jakub Narebski wrote:
> Junio C Hamano writes:
>> Chico Sokol writes:
>>
>> > Is there any official documentation of the tree object format? Are tree
>> > objects encoded specially in some way? How can I parse the inflated
>> > contents of a tree object?
>> >
>> > We suspect that there is some kind of special format or
>> > encoding, because the command "git cat-file -p " shows me ...
>> > while "git cat-file tree " generates ...
>>
>> "cat-file -p" is meant to be human-readable form.  The latter gives
>> the exact byte contents read_sha1_file() sees, which is a binary
>> format.  Essentially, it is a sequence of:
>>
>>  - mode of the entry encoded in octal, without any leading '0' pad;
>>  - pathname component of the entry, terminated with NUL;
>>  - 20-byte SHA-1 object name.
>
> I always wondered why this is the sole object format where the SHA-1 is in
> 20-byte binary format and not 40-character hexadecimal string format...
>
> --
> Jakub Narębski
>
>
>
>


Re: Exact format of tree objects

2013-06-12 Thread Jakub Narebski
Junio C Hamano writes:
> Chico Sokol writes:
> 
> > Is there any official documentation of the tree object format? Are tree
> > objects encoded specially in some way? How can I parse the inflated
> > contents of a tree object?
> >
> > We suspect that there is some kind of special format or
> > encoding, because the command "git cat-file -p " shows me ...
> > while "git cat-file tree " generates ...
> 
> "cat-file -p" is meant to be human-readable form.  The latter gives
> the exact byte contents read_sha1_file() sees, which is a binary
> format.  Essentially, it is a sequence of:
> 
>  - mode of the entry encoded in octal, without any leading '0' pad;
>  - pathname component of the entry, terminated with NUL;
>  - 20-byte SHA-1 object name.

I always wondered why this is the sole object format where the SHA-1 is in
20-byte binary format and not 40-character hexadecimal string format...
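
For anyone parsing trees, the two forms convert trivially. A small Python
sketch, using the well-known ID of the empty blob purely as sample data
(the variable names are made up for illustration):

```python
# 40 hexadecimal characters, as printed by most Git commands.
hex_name = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"

# 20 raw bytes, the form stored inside a tree entry.
raw_name = bytes.fromhex(hex_name)

# And back again to the printable form.
round_trip = raw_name.hex()
```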

-- 
Jakub Narębski






Re: Exact format of tree objects

2013-06-11 Thread Junio C Hamano
Chico Sokol writes:

> Is there any official documentation of the tree object format? Are tree
> objects encoded specially in some way? How can I parse the inflated
> contents of a tree object?
>
> We suspect that there is some kind of special format or
> encoding, because the command "git cat-file -p " shows me ...
> while "git cat-file tree " generates ...

"cat-file -p" is meant to be human-readable form.  The latter gives
the exact byte contents read_sha1_file() sees, which is a binary
format.  Essentially, it is a sequence of:

 - mode of the entry encoded in octal, without any leading '0' pad;
 - pathname component of the entry, terminated with NUL;
 - 20-byte SHA-1 object name.

sorted in a particular order.
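
As a rough illustration of that sequence, the following Python sketch
builds the byte contents from a list of entries. It assumes the entries are
already in the required order and that a single space separates the mode
from the pathname, as the on-disk format does. The function names are
hypothetical, and the "<type> <size>NUL" header used for hashing is general
Git object knowledge rather than something stated in this message.

```python
import hashlib


def serialize_tree(entries):
    """Concatenate (mode, name, sha1) entries into tree contents.

    Assumes `entries` is already in Git's tree order; mode is an int,
    name is bytes, and sha1 is the 20-byte binary object name.
    """
    body = b""
    for mode, name, sha1 in entries:
        if len(sha1) != 20:
            raise ValueError("expected a 20-byte binary SHA-1")
        # format(mode, "o") yields octal digits with no leading '0' pad.
        body += format(mode, "o").encode("ascii") + b" " + name + b"\0" + sha1
    return body


def object_id(obj_type, body):
    """Hash the contents behind a '<type> <size>NUL' header."""
    header = obj_type + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()
```

With no entries at all, object_id(b"tree", serialize_tree([])) gives
4b825dc642cb6eb9a060e54bf8d69288fbee4904, the ID of the empty tree.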




Re: Exact format of tree objects

2013-06-11 Thread Ilari Liusvaara
On Tue, Jun 11, 2013 at 01:25:14PM -0300, Chico Sokol wrote:
> Is there any official documentation of the tree object format? Are tree
> objects encoded specially in some way? How can I parse the inflated
> contents of a tree object?

A tree object consists of entries, each a concatenation of:
- Octal mode (using ASCII digits 0-7).
- Single SPACE (0x20)
- Filename
- Single NUL (0x00)
- 20-byte binary SHA-1 of referenced object.
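
A minimal Python sketch of a parser for that layout might look like the
following. It assumes the input is the already inflated entry sequence with
no object header in front (what "git cat-file tree" prints), and the
function name parse_tree is made up for this example.

```python
from typing import Iterator, Tuple


def parse_tree(data: bytes) -> Iterator[Tuple[int, bytes, bytes]]:
    """Yield (mode, name, sha1) for each entry in inflated tree contents."""
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)       # first space ends the octal mode
        nul = data.index(b"\0", space + 1)  # first NUL ends the filename
        mode = int(data[pos:space], 8)      # ASCII octal digits to int
        name = data[space + 1:nul]          # raw bytes; may contain spaces
        sha1 = data[nul + 1:nul + 21]       # 20 raw bytes, not hex
        if len(sha1) != 20:
            raise ValueError("truncated SHA-1 in tree entry")
        yield mode, name, sha1
        pos = nul + 21
```

The SHA-1 is skipped by position rather than searched, since its raw bytes
can themselves contain 0x20 or 0x00.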

At least the following octal modes are known:
40000: Directory (tree).
100644: Regular file (blob).
100755: Executable file (blob).
120000: Symbolic link (blob).
160000: Submodule (commit).
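
As a lookup table this could be sketched in Python as below, using the full
octal values 40000, 120000 and 160000 for directories, symbolic links and
submodules. The names KNOWN_MODES and describe_mode are hypothetical, and
the table covers only the modes listed here.

```python
# Modes as integers; the tree stores them as ASCII octal digits.
KNOWN_MODES = {
    0o40000: "directory (tree)",
    0o100644: "regular file (blob)",
    0o100755: "executable file (blob)",
    0o120000: "symbolic link (blob)",
    0o160000: "submodule (commit)",
}


def describe_mode(octal_text: bytes) -> str:
    """Map the ASCII octal mode field of an entry to a description."""
    return KNOWN_MODES.get(int(octal_text, 8), "unknown")
```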

The entries are always sorted in (bytewise) lexicographical order,
except that directories sort as if there were an implied '/' at the end.

So e.g.:
! < 0 < 9 < a < a- < a- (directory) < a (directory) < a0 < ab < b < z.


The idea of sorting directories specially is that if one recurses
upon hitting a directory and uses '/' as path separator, then the
full filenames are in bytewise lexicographical order.
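
One way to express this ordering is as a sort key, sketched here in Python.
The function name tree_sort_key is made up, and the sketch assumes names
are bytes without '/' or NUL, as tree entry names must be.

```python
def tree_sort_key(name: bytes, is_dir: bool) -> bytes:
    """Return a key that sorts directories as if they ended in '/'."""
    return name + b"/" if is_dir else name


# The example ordering from above, as (name, is_dir) pairs.
expected = [
    (b"!", False), (b"0", False), (b"9", False), (b"a", False),
    (b"a-", False), (b"a-", True), (b"a", True), (b"a0", False),
    (b"ab", False), (b"b", False), (b"z", False),
]

# Sorting a shuffled copy with the key reproduces that order.
result = sorted(reversed(expected), key=lambda e: tree_sort_key(*e))
```

Python compares bytes objects bytewise, so no extra comparison logic is
needed beyond the appended '/'.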

-Ilari