Re: Reading commit objects
On Tue, May 21, 2013 at 3:18 PM, Chico Sokol chico.so...@gmail.com wrote:
> Ok, we discovered that the commit object actually contains the tree object's sha1, by reading its contents with the Python zlib library. So the bug must be in our Java code (we're building a Java lib).
>
> Is there any non-standard issue in git's zlib compression? We're decompressing its contents with the default Java zlib API, so it should work normally. Here's our code, which prints the wrong output:
>
>     import java.io.File;
>     import java.io.FileInputStream;
>     import java.util.zip.InflaterInputStream;
>     import org.apache.commons.io.IOUtils;
>     ...
>     File obj = new File(".git/objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
>     InflaterInputStream inflaterInputStream = new InflaterInputStream(new FileInputStream(obj));
>     System.out.println(IOUtils.readLines(inflaterInputStream));
>     ...
>
> Currently, we're trying to parse commit objects. After decompressing the contents of a commit object file we got the following output:
>
> commit 191
> author Francisco Sokol chico.so...@gmail.com 1369140112 -0300
> committer Francisco Sokol chico.so...@gmail.com 1369140112 -0300
>
> first commit

Your code is broken. IOUtils is probably corrupting what you get back. After inflating the stream you should see the object type ("commit"), a space, its length in bytes as a base-10 string, and then a NUL ('\0'). Following that is the tree line, and parent(s) if any. I wonder if IOUtils discarded the remainder of the line after the NUL and did not consider the tree line.

And you wonder why JGit code is confusing. We can't rely on standard Java APIs to do the right thing, because commonly used libraries have made assumptions that disagree with the way Git works.

-- 
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
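The layout Shawn describes can be parsed with nothing but java.util.zip, avoiding IOUtils entirely. The sketch below makes four assumptions:

* It inflates the whole loose object into memory, so the object must fit in RAM.
* It splits the result at the first NUL.
* It checks the declared length against the body.
* It expects the standard zlib format, not the legacy experimental one.

`LooseObjectReader` is a hypothetical name, not JGit's API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.InflaterInputStream;

/** Hypothetical helper: splits an inflated loose object into header and body. */
final class LooseObjectReader {
    final String type;
    final int size;
    final byte[] body;

    private LooseObjectReader(String type, int size, byte[] body) {
        this.type = type;
        this.size = size;
        this.body = body;
    }

    /** Inflates a whole loose object and parses its "<type> <size>\0" header. */
    static LooseObjectReader read(InputStream compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(compressed)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        byte[] raw = out.toByteArray();

        // Locate the NUL byte that terminates the header.
        int nul = -1;
        for (int i = 0; i < raw.length; i++) {
            if (raw[i] == 0) {
                nul = i;
                break;
            }
        }
        if (nul < 0) {
            throw new IOException("no NUL after object header");
        }

        // The header is plain ASCII: "<type> <decimal size>".
        String header = new String(raw, 0, nul, StandardCharsets.US_ASCII);
        int sp = header.indexOf(' ');
        if (sp < 0) {
            throw new IOException("malformed header: " + header);
        }
        String type = header.substring(0, sp);
        int size = Integer.parseInt(header.substring(sp + 1));

        // Everything after the NUL is the body (tree line, parents, ... for a commit).
        byte[] body = Arrays.copyOfRange(raw, nul + 1, raw.length);
        if (body.length != size) {
            throw new IOException("declared size " + size + " but body has " + body.length + " bytes");
        }
        return new LooseObjectReader(type, size, body);
    }
}
```

Because the body is kept as raw bytes, the caller decides later how to decode it. The body includes the tree line that the line-oriented printing appeared to lose.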
Re: Reading commit objects
I'm not criticizing JGit, guys. It simply doesn't fit our needs. We're not interested in mapping git commands to Java, and we don't have the same RAM limitations. I know the JGit team is doing a great job, and we do not intend to build a library with such completeness.

Are you guys contributors to JGit? Can you point me to the code that unpacks git objects? The closest I could get was this class:

https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/UnpackedObject.java

There seem to be a standard and a non-standard format for the object, judging by the comments in this method:

https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/UnpackedObject.java#L272

I suspect that the default inflater class of the Java API expects the object to be in the standard format. What does the following comment mean? What's the "experimental pack-based" format? Are there any docs on its spec?

    We must determine if the buffer contains the standard
    zlib-deflated stream or the experimental format based
    on the in-pack object format. Compare the header byte
    for each format:

    RFC1950 zlib w/ deflate : 0www1000 : 0 <= www <= 7
    Experimental pack-based : Sttt : ttt = 1,2,3,4

-- 
Chico Sokol
Re: Reading commit objects
> Your code is broken. IOUtils is probably corrupting what you get back. After inflating the stream you should see the object type ("commit"), a space, its length in bytes as a base-10 string, and then a NUL ('\0'). Following that is the tree line, and parent(s) if any. I wonder if IOUtils discarded the remainder of the line after the NUL and did not consider the tree line.

Maybe you're right, Shawn. I've also tried the following code:

    File dotGit = new File("objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
    InflaterInputStream inflaterInputStream = new InflaterInputStream(new FileInputStream(dotGit));
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    IOUtils.copyLarge(inflaterInputStream, os);
    System.out.println(new String(os.toByteArray()));

But we got the same result. I'll try to read the bytes myself (without Apache IOUtils). Are the contents of an unpacked object UTF-8 encoded?
Re: Reading commit objects
Solved! It was exactly the problem Shawn pointed out. Here is the working code:

    File dotGit = new File("objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
    InflaterInputStream inflaterInputStream = new InflaterInputStream(new FileInputStream(dotGit));
    // Read and echo the "commit <length>" header up to its NUL terminator;
    // also stop on -1 so a truncated stream cannot loop forever.
    int read = inflaterInputStream.read();
    while (read > 0) {
        System.out.print((char) read);
        read = inflaterInputStream.read();
    }
    System.out.println();
    // The rest of the stream is the commit body, starting with the tree line.
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    IOUtils.copyLarge(inflaterInputStream, os);
    System.out.println(new String(os.toByteArray()));

Thank you all!

-- 
Chico Sokol
Re: Reading commit objects
On Wed, May 22, 2013 at 7:25 AM, Chico Sokol chico.so...@gmail.com wrote:
>> Your code is broken. IOUtils is probably corrupting what you get back. After inflating the stream you should see the object type ("commit"), a space, its length in bytes as a base-10 string, and then a NUL ('\0'). Following that is the tree line, and parent(s) if any. I wonder if IOUtils discarded the remainder of the line after the NUL and did not consider the tree line.
> ...
> Are the contents of an unpacked object UTF-8 encoded?

It's more complicated than that. Commit objects are usually in UTF-8, unless a repository configuration setting told you otherwise, or an encoding header appears in the commit. And sometimes that data lies anyway. ISO-8859-1 is one of the safer forms of reading a commit, but that also isn't always accurate.
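One way to act on Shawn's advice is sketched below. It scans the commit headers (everything before the first blank line) for an `encoding` header and uses UTF-8 when there is none. If the JVM does not recognize the named charset, it falls back to ISO-8859-1, which decodes any byte sequence without loss. `CommitCharset.guess` is a hypothetical helper. It ignores any repository configuration and cannot detect data that lies about its encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

/** Hypothetical helper: picks a charset for a raw commit body (header already stripped). */
final class CommitCharset {
    /**
     * Returns the charset named by an "encoding" header, if any, else UTF-8.
     * Falls back to ISO-8859-1 when the named charset is unknown to the JVM.
     */
    static Charset guess(byte[] commitBody) {
        int pos = 0;
        while (pos < commitBody.length) {
            int eol = pos;
            while (eol < commitBody.length && commitBody[eol] != '\n') {
                eol++;
            }
            if (eol == pos) {
                break; // blank line: headers end, the message begins
            }
            // Header lines are ASCII, so ISO-8859-1 decodes them byte-for-byte.
            String line = new String(commitBody, pos, eol - pos, StandardCharsets.ISO_8859_1);
            if (line.startsWith("encoding ")) {
                String name = line.substring("encoding ".length()).trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                    return StandardCharsets.ISO_8859_1;
                }
            }
            pos = eol + 1;
        }
        return StandardCharsets.UTF_8;
    }
}
```

A caller would then decode with `new String(body, CommitCharset.guess(body))`. Per Shawn's caveat, the result is a best guess, not a guarantee.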
Re: Reading commit objects
On Wed, May 22, 2013 at 7:20 AM, Chico Sokol chico.so...@gmail.com wrote:
> I'm not criticizing JGit, guys. It simply doesn't fit our needs. We're not interested in mapping git commands to Java and don't have the same RAM limitations.

I guess you aren't trying to process the WebKit or Linux kernel repositories. Or you can afford more RAM than I can[1]. :-)

[1] $DAY_JOB has lots of RAM. Lots.

> Are you guys contributors to JGit?

Not really. I had nothing to do with JGit. :-)

> Can you point me to the code that unpacks git objects? The closest I could get was this class:
> https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/UnpackedObject.java

This class handles the loose object format in $GIT_DIR/objects, but does not handle objects contained in pack files. That is elsewhere, and well, more complex. Look at PackFile.java.

> There seem to be a standard and a non-standard format for the object, judging by the comments in this method:
> https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/UnpackedObject.java#L272

There are two formats: the official format that is used, and an experimental format that was discarded but is still supported for legacy reasons.

> I suspect that the default inflater class of the Java API expects the object to be in the standard format. What does the following comment mean? What's the "experimental pack-based" format? Are there any docs on its spec?

Read the code. This is the dead format that is no longer written, but is still supported.
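The JGit comment quoted in this thread describes two header checks. Below is a sketch of those checks, written from RFC 1950 rather than copied from JGit.

* **zlib stream.** It starts with a CMF byte whose low nibble is 8 (deflate) and whose high nibble is at most 7. In addition, CMF*256 + FLG must be a multiple of 31.
* **Experimental format.** It keeps a 3-bit type (1 to 4) in bits 4 to 6 of the first byte, which is what "ttt" in the comment refers to.

`LooseFormat` and its methods are hypothetical names.

```java
/** Hypothetical helper: tells the two loose-object encodings apart by their first bytes. */
final class LooseFormat {
    /**
     * RFC 1950 zlib header: CMF low nibble == 8 (deflate), CMF high nibble
     * (CINFO) <= 7, and (CMF * 256 + FLG) divisible by 31.
     */
    static boolean isZlib(byte[] hdr) {
        if (hdr.length < 2) {
            return false;
        }
        int cmf = hdr[0] & 0xff;
        int flg = hdr[1] & 0xff;
        // Mask 0x8f keeps the top bit (must be 0, so CINFO <= 7) and the low nibble (must be 8).
        return (cmf & 0x8f) == 0x08 && ((cmf << 8) | flg) % 31 == 0;
    }

    /**
     * Experimental pack-style first byte "Sttt....": returns the 3-bit type
     * (1 commit, 2 tree, 3 blob, 4 tag), or -1 if the bits hold none of those.
     */
    static int experimentalType(byte[] hdr) {
        if (hdr.length < 1) {
            return -1;
        }
        int type = ((hdr[0] & 0xff) >> 4) & 0x07;
        return (type >= 1 && type <= 4) ? type : -1;
    }
}
```

A reader would try `isZlib` first, as the comment suggests, and hand only the zlib case to `java.util.zip.Inflater`.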
Re: Reading commit objects
On Tue, May 21, 2013 at 4:21 PM, Chico Sokol chico.so...@gmail.com wrote:
> Hello,
>
> I'm building a library to manipulate git repositories (interacting directly with the filesystem).
>
> Currently, we're trying to parse commit objects. After decompressing the contents of a commit object file we got the following output:
>
> commit 191
> author Francisco Sokol chico.so...@gmail.com 1369140112 -0300
> committer Francisco Sokol chico.so...@gmail.com 1369140112 -0300
>
> first commit
>
> We hoped to get the same output as git cat-file -p <sha1>, but that didn't happen. From a commit object, how can I find the tree object hash of this commit?

git rev-parse <sha1>:

-- 
Felipe Contreras
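Felipe's command asks Git itself to resolve `<sha1>:`, the commit's root tree. A library that reads objects directly can find the same answer on the first line of the commit body, because commit objects begin with the `tree` header. The sketch below assumes that the `<type> <size>\0` header has already been stripped and that object ids are 40-digit SHA-1 hex. `CommitTree.treeId` is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: reads the tree id out of a raw commit body. */
final class CommitTree {
    private static final String PREFIX = "tree ";
    private static final int ID_LEN = 40; // SHA-1 in hex

    /** The body must start with "tree <40 hex digits>\n". */
    static String treeId(byte[] commitBody) {
        int lineLen = PREFIX.length() + ID_LEN + 1; // +1 for the trailing '\n'
        if (commitBody.length < lineLen) {
            throw new IllegalArgumentException("commit body too short");
        }
        String line = new String(commitBody, 0, lineLen, StandardCharsets.US_ASCII);
        if (!line.startsWith(PREFIX) || line.charAt(lineLen - 1) != '\n') {
            throw new IllegalArgumentException("commit does not start with a tree line");
        }
        String id = line.substring(PREFIX.length(), PREFIX.length() + ID_LEN);
        if (!id.matches("[0-9a-f]{40}")) {
            throw new IllegalArgumentException("bad tree id: " + id);
        }
        return id;
    }
}
```

If the header has not been stripped first, the check fails loudly instead of returning a wrong id.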
Re: Reading commit objects
On Tue, May 21, 2013 at 5:21 PM, Chico Sokol chico.so...@gmail.com wrote:
> Hello,
>
> I'm building a library to manipulate git repositories (interacting directly with the filesystem).
>
> Currently, we're trying to parse commit objects. After decompressing the contents of a commit object file we got the following output:
>
> commit 191
> author Francisco Sokol chico.so...@gmail.com 1369140112 -0300
> committer Francisco Sokol chico.so...@gmail.com 1369140112 -0300
>
> first commit

Does `git cat-file -p <sha1>` show a tree object? FWIW, I expected to see a tree line there, so maybe this object was created without a tree? I also don't see a parent listed.

I did this on one of my repos:

    >>> buf = open('.git/objects/cd/da219e4d7beceae55af73c44cb3c9e1ec56802', 'rb').read()
    >>> import zlib
    >>> zlib.decompress(buf)
    'commit 246\x00tree 2abfe1a7bedb29672a223a5c5f266b7dc70a8d87\nparent 0636e7ff6b79470b0cd53ceacea88e7796f202ce\nauthor John Szakmeister j...@szakmeister.net 1369168481 -0400\ncommitter John Szakmeister j...@szakmeister.net 1369168481 -0400\n\nGot a file listing.\n'

So at least when creating commits with Git, I see a tree. How was the commit you're referencing created? Perhaps something is wrong with that process?

> We hoped to get the same output as git cat-file -p <sha1>, but that didn't happen. From a commit object, how can I find the tree object hash of this commit?

I'd expect that too.

-John
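John's output also shows why the header matters. The `246` after `commit` is the body length in bytes. The loose object's path (`.git/objects/cd/da219e4d7beceae55af73c44cb3c9e1ec56802`) is the SHA-1 of header plus body, computed before compression. Below is a sketch of recomputing that id with `java.security.MessageDigest`. It assumes a SHA-1 repository, and `GitObjectHash.of` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: recomputes an object id from its type and raw body. */
final class GitObjectHash {
    static String of(String type, byte[] body) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // Hash the same "<type> <size>\0" header that the loose file stores.
            sha1.update((type + " " + body.length).getBytes(StandardCharsets.US_ASCII));
            sha1.update((byte) 0);
            sha1.update(body);
            StringBuilder hex = new StringBuilder(40);
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }
}
```

Comparing this value with the file name is a cheap way to confirm that a reader kept every byte of the body, including the tree line.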
Re: Reading commit objects
Ok, we discovered that the commit object actually contains the tree object's sha1, by reading its contents with the Python zlib library. So the bug must be in our Java code (we're building a Java lib).

Is there any non-standard issue in git's zlib compression? We're decompressing its contents with the default Java zlib API, so it should work normally. Here's our code, which prints the wrong output:

    import java.io.File;
    import java.io.FileInputStream;
    import java.util.zip.InflaterInputStream;
    import org.apache.commons.io.IOUtils;
    ...
    File obj = new File(".git/objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
    InflaterInputStream inflaterInputStream = new InflaterInputStream(new FileInputStream(obj));
    System.out.println(IOUtils.readLines(inflaterInputStream));

I know this is not the right place to ask about Java issues, but we would appreciate any help.

-- 
Chico Sokol
Re: Reading commit objects
Chico Sokol chico.so...@gmail.com writes:

> Hello,
>
> I'm building a library to manipulate git repositories (interacting directly with the filesystem).
>
> Currently, we're trying to parse commit objects. After decompressing the contents of a commit object file we got the following output:

Who wrote this commit object you are trying to read? Us, or your library (this question is to see if you are chasing the right problem)?

> commit 191
> author Francisco Sokol chico.so...@gmail.com 1369140112 -0300
> committer Francisco Sokol chico.so...@gmail.com 1369140112 -0300
>
> first commit
>
> We hoped to get the same output as git cat-file -p <sha1>, but that didn't happen. From a commit object, how can I find the tree object hash of this commit?

If you care about byte-for-byte compatibility, never use "cat-file -p". That is meant for human consumption.

"git cat-file commit <sha1>" gives you the raw representation after inflating and stripping out the leading "<type> SP <length> NUL" header.
Re: Reading commit objects
Chico Sokol chico.so...@gmail.com writes:

> Ok, we discovered that the commit object actually contains the tree object's sha1, by reading its contents with the Python zlib library. So the bug must be in our Java code (we're building a Java lib).

Why aren't you using jgit?
Re: Reading commit objects
It was git that created that object.

We're trying to build an improved Java library focused on our needs (JGit has a really confusing API focused on solving EGit's needs). But we're about to dig into their code to discover how to decompress git objects.

-- 
Chico Sokol
Re: Reading commit objects
Chico Sokol wrote:

> We're trying to build an improved Java library focused on our needs (JGit has a really confusing API focused on solving EGit's needs).

JGit is also open to contributions, including contributions that add less confusing API calls. :)

See
http://wiki.eclipse.org/JGit/User_Guide
http://wiki.eclipse.org/EGit/Contributor_Guide#JGit
http://wiki.eclipse.org/EGit/Contributor_Guide#Using_Gerrit_at_https:.2F.2Fgit.eclipse.org.2Fr
https://dev.eclipse.org/mailman/listinfo/jgit-dev

Thanks,
Jonathan
Re: Reading commit objects
On Tue, May 21, 2013 at 3:33 PM, Chico Sokol chico.so...@gmail.com wrote:
> It was git that created that object.
>
> We're trying to build an improved Java library focused on our needs (JGit has a really confusing API focused on solving EGit's needs).

JGit code... is confusing because it's fast. We spent a lot of time trying to make things fast on the JVM, and somewhat comparable with C Git even though it's not in C. Some of the low-level APIs are fast because they bypass conventional Java wisdom and just tell the #@!* machine what to do, with no pretty bits about it. Make it pretty, it goes slower. Or uses more RAM. Java likes RAM.

Good luck making an improved library. JGit of course is also interested in contributions. The api package has been trying to make a simpler calling convention for common use cases that match the command-line interface users are familiar with, but it's still incomplete and hides some optimizations that are possible with the lower-level calls.