psalagnac commented on pull request #120:
URL: https://github.com/apache/solr/pull/120#issuecomment-858766473


   Thanks for all your feedback on this PR @gerlowskija 
   
   > What distinguishes a few of the I/O classes in this PR (e.g. 
BlobIndexInput) from existing alternatives (e.g. BufferedChecksumIndexInput) 
that do similar things at a quick glance?
   
   Any implementation of `BackupRepository` needs a suitable `IndexInput` 
to return from `openInput()`.
   
   The only two things we have when working with S3 are an `InputStream` 
created by the AWS client (so we can't make many assumptions about its actual 
implementation) and the total file length. I haven't found any existing 
implementation of `IndexInput` in the Solr codebase that works with only these 
two parameters. Maybe there is one I missed; if so, please point me to it.
   
   My understanding is that `BufferedChecksumIndexInput` requires a delegate 
`IndexInput` to access the data, so it cannot wrap a raw `InputStream` by itself.
   
   BTW, I just found a bug in your implementation of `IndexInput` for the GCS 
backup repository. We had a similar bug in production environments with the AWS 
client. I'm not sure about the details of the AWS client internals, but there 
is a buffer somewhere in the stack. Depending on exactly when you invoke 
`read()` on the input stream, the buffer may be full or only partially full. 
The read call does not wait for the full requested length, so if the buffer 
holds less data than you asked for, you only get what is already in the buffer. 
That's why checking the returned number of bytes read is mandatory here. Since 
the contract of `IndexInput` does not allow reading fewer bytes than requested, 
I had to solve this with a retry loop.
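   
   To illustrate the retry loop described above, here is a minimal sketch in 
plain Java. It is not the code from this PR: the class name `ShortReadDemo` and 
the helpers `readFully` and `shortReads` are hypothetical names made up for this 
example, and `shortReads` only simulates a client whose `read()` returns fewer 
bytes than requested.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ShortReadDemo {

    /**
     * Reads exactly len bytes into b starting at offset, calling read()
     * again whenever the stream returns fewer bytes than requested.
     */
    static void readFully(InputStream in, byte[] b, int offset, int len) throws IOException {
        int total = 0;
        while (total < len) {
            // read() may return any count between 1 and the requested length.
            int n = in.read(b, offset + total, len - total);
            if (n < 0) {
                // The stream ended before we got everything we asked for.
                throw new EOFException("Stream ended after " + total + " of " + len + " bytes");
            }
            total += n;
        }
    }

    /**
     * Hypothetical helper for illustration only: wraps data in a stream
     * that never returns more than maxChunk bytes per read() call.
     */
    static InputStream shortReads(byte[] data, int maxChunk) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, maxChunk));
            }
        };
    }
}
```

   A single `read()` on `shortReads(data, 3)` yields at most 3 bytes even when 
more are requested, while `readFully` keeps looping until the buffer is filled. 
The JDK's `DataInputStream.readFully` follows the same idea.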
   
   With AWS, it happens for real. With GCP, it may be just theoretical? This 
might be code worth sharing between the different blob implementations? 😄 
   

