[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files
PeterAlfredLee commented on pull request #113: URL: https://github.com/apache/commons-compress/pull/113#issuecomment-757086817 Thanks @theobisproject for your hard work! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files
PeterAlfredLee commented on pull request #113: URL: https://github.com/apache/commons-compress/pull/113#issuecomment-757086817 Thanks @theobisproject for your hard work! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files
PeterAlfredLee commented on pull request #113: URL: https://github.com/apache/commons-compress/pull/113#issuecomment-755056091 > I removed the code. Now everything I wanted to address is done. Looks good. I will do some final checks these days. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files
PeterAlfredLee commented on pull request #113: URL: https://github.com/apache/commons-compress/pull/113#issuecomment-752295265 > I don't think we need this method because the TarFile#getInputStream only takes an instance of TarArchiveEntry. +1. The `canReadEntryData` in `TarArchiveInputStream` was checking if the entry is a sparse entry or not - cause the sparse entries were not supported. The sparse entry was supported later but we could not remove the `canReadEntryData` because it's a API breaking change. I don't think we need it in `TarFile` any more. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files
PeterAlfredLee commented on pull request #113: URL: https://github.com/apache/commons-compress/pull/113#issuecomment-751975974 And the japicmp report no BC breaking change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files
PeterAlfredLee commented on pull request #113: URL: https://github.com/apache/commons-compress/pull/113#issuecomment-751972401 The coverage of TarFile looks good enough. Are there any other work left now? @theobisproject This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files
PeterAlfredLee commented on pull request #113: URL: https://github.com/apache/commons-compress/pull/113#issuecomment-751144813 > So we should decide what we want to do about it in this PR. Let's start with the easiest one : there are some conflicts now. Could you resolve them? :) BTW : wish you a great christmas. :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files
PeterAlfredLee commented on pull request #113: URL: https://github.com/apache/commons-compress/pull/113#issuecomment-750693574 Hey @theobisproject . Finally I can carve out some time these days. :( I'm thinking that we can make this PR merged within this Christmas. WDYT? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files
PeterAlfredLee commented on pull request #113: URL: https://github.com/apache/commons-compress/pull/113#issuecomment-730879784 I have tested using your bench on my computer and got this : ``` BenchmarkMode Cnt ScoreError Units ReadLargeTarBenchmark.readAllEntries_tarFile avgt 10 5.601 ± 0.061 s/op ReadLargeTarBenchmark.readAllEntries_tarStream avgt 10 6.728 ± 0.012 s/op ReadLargeTarBenchmark.readFirstEntry_tarFile avgt 10 2.279 ± 0.017 s/op ReadLargeTarBenchmark.readFirstEntry_tarStream avgt 10 0.001 ± 0.001 s/op ReadLargeTarBenchmark.readLastEntry_tarFile avgt 10 2.266 ± 0.020 s/op ReadLargeTarBenchmark.readLastEntry_tarStreamavgt 10 13.068 ± 0.030 s/op ReadLargeTarBenchmark.readSecondEntry_tarFileavgt 10 2.257 ± 0.012 s/op ReadLargeTarBenchmark.readSecondEntry_tarStream avgt 10 0.001 ± 0.001 s/op ``` The score is average time by setting `BenchmarkMode` to `Mode.AverageTime`. I'm testing with linux-4.4.tar, which has a size of 618MB with 55708 entries. I'm testing on my machine : Windows10, SSD, Java 8. The result is interesting. Seems the `tarFile` needs an extra 2.65s to read all entries. `readAllEntries_tarStream` is slower than `readAllEntries_tarFile`, and `readLastEntry_tarStream` is even slower than eithor. Will try to figure out what's going on here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files
PeterAlfredLee commented on pull request #113: URL: https://github.com/apache/commons-compress/pull/113#issuecomment-730065042 > My testing showed the TarFile and TarArchiveStream performance is equivalent for sequential access. The performance gain for accessing a single entry in the TarFile is huge because the TarArchiveStream needs to read and skip all data until it reaches the entry. > If you would like some numbers for some cases feel free to name them. I can create some benchmarks and share them with you. That's interesting. Would you please upload the tests so I can have a look? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files
PeterAlfredLee commented on pull request #113: URL: https://github.com/apache/commons-compress/pull/113#issuecomment-72619 > In the SparseFilesTest extracting of sparsefile-0.1 in pax_gnu_sparse.tar is skipped because of problems with tar on Ubuntu 16.04. I tested this on Ubuntu 18.04 and everything worked fine. Should we reenable extracting of the file? Ah. I missed this in my last reviews. It's wired. :) I tested it on Ubuntu 18.04 and it do works now. That test was failed on Ubuntu 16.04 but unfortunly I don't have a Ubuntu 16.04 now. Agree with @kinow that we can leave it as-is and have it in a separate PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files
PeterAlfredLee commented on pull request #113: URL: https://github.com/apache/commons-compress/pull/113#issuecomment-727703377 Hi @theobisproject I have started a thread in mailing list about this. > Since I will have to do quite some work with the removal of code duplication I'd like to know how long you are willing to wait for feedback. I'm hoping to have this change dealed within version 1.21. Thanks for your great work once again. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files
PeterAlfredLee commented on pull request #113: URL: https://github.com/apache/commons-compress/pull/113#issuecomment-699962152 IMO, it's OK to have a random access for Tar - even through it would be slower as we read the `tar archive` to get the positions of entries. But I think we should hear more voice from others about this idea, cause this is a big change and `tar archives` are not designed to have a random access. @theobisproject And as you said, there are still some work(e.g. the refactor of the duplication) to do. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files
PeterAlfredLee commented on pull request #113: URL: https://github.com/apache/commons-compress/pull/113#issuecomment-659805215 > In my opinion this adds no value for Commons IO and should not be used outside of the tar implementation. +1. The `TarArchiveSparseZeroInputStream` will only return 0 if you are reading it. It's simple enough and I think this should be limited to tar in Commons Compress. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org