Re: Exact format of tree objects
Chico Sokol writes:

> What is the encoding of the filename?

Git just considers the filename a bunch of bytes that form a POSIX
filename (i.e., it may not contain '/' or '\0'). So depending on your
point of view, it's either "no encoding" or "whatever you put into it".

--
Thomas Rast
trast@{inf,student}.ethz.ch
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
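Since the filename is just bytes, a parser has to decide how to show it without losing information. The following is a minimal Python sketch of one option, not anything git itself does: it assumes UTF-8 for display and uses the `surrogateescape` error handler so that names in other encodings still round-trip byte for byte. The helper names `is_valid_name`, `decode_name` and `encode_name` are hypothetical.

```python
def is_valid_name(raw: bytes) -> bool:
    """Check the only constraints stated above: non-empty, no '/' and no NUL."""
    return len(raw) > 0 and b"/" not in raw and b"\0" not in raw


def decode_name(raw: bytes) -> str:
    """Decode a filename for display (assumption: UTF-8 is the likely encoding).

    Bytes that are not valid UTF-8 become lone surrogates instead of raising,
    so no information is lost.
    """
    return raw.decode("utf-8", errors="surrogateescape")


def encode_name(text: str) -> bytes:
    """Inverse of decode_name: recover the exact original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


# The same visible name stored in two different encodings.
utf8_name = "caf\u00e9.txt".encode("utf-8")
latin1_name = "caf\u00e9.txt".encode("latin-1")

for raw in (utf8_name, latin1_name):
    # Both round-trip exactly, whatever the original encoding was.
    assert encode_name(decode_name(raw)) == raw
```

Keeping the name as `bytes` internally and decoding only for display avoids having to guess the encoding at all.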
Re: Exact format of tree objects
What is the encoding of the filename?

--
Chico Sokol

On Tue, Jun 11, 2013 at 3:26 PM, Ilari Liusvaara wrote:
> On Tue, Jun 11, 2013 at 01:25:14PM -0300, Chico Sokol wrote:
>> Is there any official documentation of the tree object format? Are
>> tree objects encoded specially in some way? How can I parse the
>> inflated contents of a tree object?
>
> A tree object consists of entries, each a concatenation of:
> - Octal mode (using ASCII digits 0-7).
> - Single SPACE (0x20)
> - Filename
> - Single NUL (0x00)
> - 20-byte binary SHA-1 of the referenced object.
>
> At least the following octal modes are known:
> 40000: Directory (tree).
> 100644: Regular file (blob).
> 100755: Executable file (blob).
> 120000: Symbolic link (blob).
> 160000: Submodule (commit).
>
> The entries are always sorted in (bytewise) lexicographical order,
> except directories sort as if there were an implied '/' at the end.
>
> So e.g.:
> ! < 0 < 9 < a < a- < a- (directory) < a (directory) < a0 < ab < b < z.
>
> The idea of sorting directories specially is that if one recurses
> upon hitting a directory and uses '/' as path separator, then the
> full filenames are in bytewise lexicographical order.
>
> -Ilari
Re: Exact format of tree objects
Thanks!

By the way, where can I find this kind of specification? I couldn't
find the spec of tree objects here:
https://github.com/git/git/tree/master/Documentation

--
Chico Sokol

On Wed, Jun 12, 2013 at 11:06 AM, Jakub Narebski wrote:
> Junio C Hamano writes:
>> Chico Sokol writes:
>>
>>> Is there any official documentation of the tree object format? Are
>>> tree objects encoded specially in some way? How can I parse the
>>> inflated contents of a tree object?
>>>
>>> We're suspecting that there is some kind of special format or
>>> encoding, because the command "git cat-file -p " show me ...
>>> While "git cat-file tree " generate ...
>>
>> "cat-file -p" is meant to be human-readable form. The latter gives
>> the exact byte contents read_sha1_file() sees, which is a binary
>> format. Essentially, it is a sequence of:
>>
>> - mode of the entry encoded in octal, without any leading '0' pad;
>> - pathname component of the entry, terminated with NUL;
>> - 20-byte SHA-1 object name.
>
> I always wondered why this is the sole object format where SHA-1 is
> in 20-byte binary format and not 40-char hexadecimal string format...
>
> --
> Jakub Narębski
Re: Exact format of tree objects
Junio C Hamano writes:
> Chico Sokol writes:
>
>> Is there any official documentation of the tree object format? Are
>> tree objects encoded specially in some way? How can I parse the
>> inflated contents of a tree object?
>>
>> We're suspecting that there is some kind of special format or
>> encoding, because the command "git cat-file -p " show me ...
>> While "git cat-file tree " generate ...
>
> "cat-file -p" is meant to be human-readable form. The latter gives
> the exact byte contents read_sha1_file() sees, which is a binary
> format. Essentially, it is a sequence of:
>
> - mode of the entry encoded in octal, without any leading '0' pad;
> - pathname component of the entry, terminated with NUL;
> - 20-byte SHA-1 object name.

I always wondered why this is the sole object format where SHA-1 is in
20-byte binary format and not 40-char hexadecimal string format...

--
Jakub Narębski
Re: Exact format of tree objects
Chico Sokol writes:

> Is there any official documentation of the tree object format? Are
> tree objects encoded specially in some way? How can I parse the
> inflated contents of a tree object?
>
> We're suspecting that there is some kind of special format or
> encoding, because the command "git cat-file -p " show me ...
> While "git cat-file tree " generate ...

"cat-file -p" is meant to be human-readable form. The latter gives
the exact byte contents read_sha1_file() sees, which is a binary
format. Essentially, it is a sequence of:

 - mode of the entry encoded in octal, without any leading '0' pad;
 - pathname component of the entry, terminated with NUL;
 - 20-byte SHA-1 object name.

sorted in a particular order.
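To make the relation between the two outputs concrete, here is a hedged Python sketch that turns one raw entry (mode, pathname, 20-byte SHA-1) into a line shaped like the human-readable form. The function names are hypothetical, inferring the object type from the mode is an assumption of this sketch, and the tab before the pathname follows the usual "cat-file -p" layout.

```python
def object_type(mode: bytes) -> str:
    """Guess the object type from the octal mode (assumption of this sketch)."""
    if mode == b"40000":
        return "tree"      # directory
    if mode == b"160000":
        return "commit"    # submodule
    return "blob"          # regular file, executable or symlink


def pretty_entry(mode: bytes, name: bytes, sha: bytes) -> str:
    """Format one raw tree entry in a human-readable, cat-file -p style."""
    if len(sha) != 20:
        raise ValueError("expected a 20-byte binary SHA-1")
    # The raw mode has no leading zero; the readable form pads it to 6 digits.
    padded_mode = "%06o" % int(mode.decode("ascii"), 8)
    # The 20 binary bytes become 40 hexadecimal characters.
    return "%s %s %s\t%s" % (
        padded_mode,
        object_type(mode),
        sha.hex(),
        name.decode("utf-8", errors="replace"),
    )


# Entry taken from the example output in the original question.
sha = bytes.fromhex("2beae51a0e14b3167fd7e81119972caef95779f4")
line = pretty_entry(b"100644", b".gitignore", sha)
```

The apparent "encoding problem" in the raw output is just these 20 binary bytes being written to a terminal unconverted.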
Re: Exact format of tree objects
On Tue, Jun 11, 2013 at 01:25:14PM -0300, Chico Sokol wrote:
> Is there any official documentation of the tree object format? Are
> tree objects encoded specially in some way? How can I parse the
> inflated contents of a tree object?

A tree object consists of entries, each a concatenation of:
- Octal mode (using ASCII digits 0-7).
- Single SPACE (0x20)
- Filename
- Single NUL (0x00)
- 20-byte binary SHA-1 of the referenced object.

At least the following octal modes are known:
40000: Directory (tree).
100644: Regular file (blob).
100755: Executable file (blob).
120000: Symbolic link (blob).
160000: Submodule (commit).

The entries are always sorted in (bytewise) lexicographical order,
except directories sort as if there were an implied '/' at the end.

So e.g.:
! < 0 < 9 < a < a- < a- (directory) < a (directory) < a0 < ab < b < z.

The idea of sorting directories specially is that if one recurses
upon hitting a directory and uses '/' as path separator, then the
full filenames are in bytewise lexicographical order.

-Ilari
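The layout and ordering described above can be sketched in Python as follows. This is an illustration under the stated format only, not git's own code: `parse_tree` and `sort_key` are hypothetical names, the input is assumed to be the already-inflated tree body with the object header removed, and the sort key simply appends '/' to directory names as the description suggests.

```python
from typing import Iterator, Tuple

Entry = Tuple[bytes, bytes, bytes]  # (octal mode, filename, 20-byte SHA-1)


def parse_tree(data: bytes) -> Iterator[Entry]:
    """Yield (mode, name, sha) for each entry in a raw tree body."""
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)        # end of the octal mode
        nul = data.index(b"\0", space + 1)   # end of the filename
        mode = data[pos:space]
        name = data[space + 1:nul]
        sha = data[nul + 1:nul + 21]         # fixed-width binary SHA-1
        if len(sha) != 20:
            raise ValueError("truncated SHA-1 in tree entry")
        yield mode, name, sha
        # Skip exactly 20 bytes: the SHA-1 may itself contain 0x20 or 0x00.
        pos = nul + 21


def sort_key(mode: bytes, name: bytes) -> bytes:
    """Directories compare as if their name ended with '/'."""
    return name + b"/" if mode == b"40000" else name


# The ordering example from the message, given in scrambled order.
FILE, DIR = b"100644", b"40000"
entries = [
    (FILE, b"z"), (DIR, b"a"), (FILE, b"ab"), (FILE, b"a-"), (FILE, b"!"),
    (FILE, b"a0"), (DIR, b"a-"), (FILE, b"9"), (FILE, b"a"), (FILE, b"b"),
    (FILE, b"0"),
]
ordered = sorted(entries, key=lambda e: sort_key(e[0], e[1]))
```

Sorting on raw bytes rather than decoded strings matters here, because the order is defined bytewise and the filename has no guaranteed encoding.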
Exact format of tree objects
Is there any official documentation of the tree object format? Are
tree objects encoded specially in some way? How can I parse the
inflated contents of a tree object?

We suspect that there is some kind of special format or encoding,
because the command "git cat-file -p " shows me the expected output,
something like:

100644 blob 2beae51a0e14b3167fd7e81119972caef95779f4    .gitignore
100644 blob 7c817960e954f0278a6eee8d58611f61445167e8    LICENSE.txt
100644 blob 30e849cba985d74bfd29696f6dee5a40abaacb03    README
...

While "git cat-file tree " generates a strange output, which indicates
some kind of encoding problem. Something like:

100644 .gitignore+��▒,��Wy�100644 LICENSE.txt|�y`�T�'�n��XaaDQg�100644 README0�I˩��K�)

Thanks,

--
Chico Sokol