[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files

2021-01-09 Thread GitBox


PeterAlfredLee commented on pull request #113:
URL: https://github.com/apache/commons-compress/pull/113#issuecomment-757086817


   Thanks @theobisproject for your hard work!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files

2021-01-08 Thread GitBox


PeterAlfredLee commented on pull request #113:
URL: https://github.com/apache/commons-compress/pull/113#issuecomment-757086817


   Thanks @theobisproject for your hard work!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files

2021-01-05 Thread GitBox


PeterAlfredLee commented on pull request #113:
URL: https://github.com/apache/commons-compress/pull/113#issuecomment-755056091


   > I removed the code. Now everything I wanted to address is done.
   
   Looks good. I will do some final checks these days.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files

2020-12-29 Thread GitBox


PeterAlfredLee commented on pull request #113:
URL: https://github.com/apache/commons-compress/pull/113#issuecomment-752295265


   > I don't think we need this method because the TarFile#getInputStream only 
takes an instance of TarArchiveEntry.
   
   +1.
   
   The `canReadEntryData` in `TarArchiveInputStream` was checking if the entry 
is a sparse entry or not - cause the sparse entries were not supported. The 
sparse entry was supported later but we could not remove the `canReadEntryData` 
because it's a  API breaking change.
   
   I don't think we need it in `TarFile` any more.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files

2020-12-28 Thread GitBox


PeterAlfredLee commented on pull request #113:
URL: https://github.com/apache/commons-compress/pull/113#issuecomment-751975974


   And the japicmp report no BC breaking change.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files

2020-12-28 Thread GitBox


PeterAlfredLee commented on pull request #113:
URL: https://github.com/apache/commons-compress/pull/113#issuecomment-751972401


   The coverage of TarFile looks good enough. Are there any other work left 
now? @theobisproject 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files

2020-12-24 Thread GitBox


PeterAlfredLee commented on pull request #113:
URL: https://github.com/apache/commons-compress/pull/113#issuecomment-751144813


   > So we should decide what we want to do about it in this PR.
   
   Let's start with the easiest one : there are some conflicts now. Could you 
resolve them? :)
   
   BTW : wish you a great christmas. :-)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files

2020-12-23 Thread GitBox


PeterAlfredLee commented on pull request #113:
URL: https://github.com/apache/commons-compress/pull/113#issuecomment-750693574


   Hey @theobisproject .
   Finally I can carve out some time these days. :(
   I'm thinking that we can make this PR merged within this Christmas. WDYT?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files

2020-11-19 Thread GitBox


PeterAlfredLee commented on pull request #113:
URL: https://github.com/apache/commons-compress/pull/113#issuecomment-730879784


   I have tested using your bench on my computer and got this :
   ```
   BenchmarkMode  Cnt   ScoreError  
Units
   ReadLargeTarBenchmark.readAllEntries_tarFile avgt   10   5.601 ±  0.061  
 s/op
   ReadLargeTarBenchmark.readAllEntries_tarStream   avgt   10   6.728 ±  0.012  
 s/op
   ReadLargeTarBenchmark.readFirstEntry_tarFile avgt   10   2.279 ±  0.017  
 s/op
   ReadLargeTarBenchmark.readFirstEntry_tarStream   avgt   10   0.001 ±  0.001  
 s/op
   ReadLargeTarBenchmark.readLastEntry_tarFile  avgt   10   2.266 ±  0.020  
 s/op
   ReadLargeTarBenchmark.readLastEntry_tarStreamavgt   10  13.068 ±  0.030  
 s/op
   ReadLargeTarBenchmark.readSecondEntry_tarFileavgt   10   2.257 ±  0.012  
 s/op
   ReadLargeTarBenchmark.readSecondEntry_tarStream  avgt   10   0.001 ±  0.001  
 s/op
   ```
   
   The score is average time by setting `BenchmarkMode` to `Mode.AverageTime`. 
I'm testing with linux-4.4.tar, which has a size of 618MB with 55708 entries. 
I'm testing on my machine : Windows10, SSD, Java 8.
   
   The result is interesting. Seems the `tarFile` needs an extra 2.65s to read 
all entries. `readAllEntries_tarStream` is slower than 
`readAllEntries_tarFile`, and `readLastEntry_tarStream` is even slower than 
eithor. Will try to figure out what's going on here.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files

2020-11-18 Thread GitBox


PeterAlfredLee commented on pull request #113:
URL: https://github.com/apache/commons-compress/pull/113#issuecomment-730065042


   > My testing showed the TarFile and TarArchiveStream performance is 
equivalent for sequential access. The performance gain for accessing a single 
entry in the TarFile is huge because the TarArchiveStream needs to read and 
skip all data until it reaches the entry.
   
   > If you would like some numbers for some cases feel free to name them. I 
can create some benchmarks and share them with you.
   
   That's interesting. Would you please upload the tests so I can have a look? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files

2020-11-15 Thread GitBox


PeterAlfredLee commented on pull request #113:
URL: https://github.com/apache/commons-compress/pull/113#issuecomment-72619


   > In the SparseFilesTest extracting of sparsefile-0.1 in pax_gnu_sparse.tar 
is skipped because of problems with tar on Ubuntu 16.04. I tested this on 
Ubuntu 18.04 and everything worked fine. Should we reenable extracting of the 
file?
   
   Ah. I missed this in my last reviews. It's wired. :)
   I tested it on Ubuntu 18.04 and it do works now. That test was failed on 
Ubuntu 16.04 but unfortunly I don't have a Ubuntu 16.04 now. Agree with @kinow 
that we can leave it as-is and have it in a separate PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files

2020-11-15 Thread GitBox


PeterAlfredLee commented on pull request #113:
URL: https://github.com/apache/commons-compress/pull/113#issuecomment-727703377


   Hi @theobisproject 
   I have started a thread in mailing list about this.
   
   > Since I will have to do quite some work with the removal of code 
duplication I'd like to know how long you are willing to wait for feedback.
   
   I'm hoping to have this change dealed within version 1.21.
   
   Thanks for your great work once again.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files

2020-09-28 Thread GitBox


PeterAlfredLee commented on pull request #113:
URL: https://github.com/apache/commons-compress/pull/113#issuecomment-699962152


   IMO, it's OK to have a random access for Tar - even through it would be 
slower as we read the `tar archive` to get the positions of entries. But I 
think we should hear more voice from others about this idea, cause this is a 
big change and `tar archives` are not designed to have a random access.
   
@theobisproject And as you said, there are still some work(e.g. the 
refactor of the duplication) to do.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-compress] PeterAlfredLee commented on pull request #113: COMPRESS-540: Implement TarFile to allow random access to tar files

2020-07-16 Thread GitBox


PeterAlfredLee commented on pull request #113:
URL: https://github.com/apache/commons-compress/pull/113#issuecomment-659805215


   > In my opinion this adds no value for Commons IO and should not be used 
outside of the tar implementation.
   
   +1. The `TarArchiveSparseZeroInputStream` will only return 0 if you are 
reading it. It's simple enough and I think this should be limited to tar in 
Commons Compress.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org