psalagnac commented on pull request #120: URL: https://github.com/apache/solr/pull/120#issuecomment-858766473
Thanks for all your feedback on this PR @gerlowskija

> What distinguishes a few of the I/O classes in this PR (e.g. BlobIndexInput) from existing alternatives (e.g. BufferedChecksumIndexInput) that do similar things at a quick glance?

Any implementation of `BackupRepository` requires a friendly `IndexInput` for method `openInput()`. The only two things we have when working with S3 are an `InputStream` created by the AWS client (so we can't make many assumptions about its actual implementation) and the total file length. I haven't found any existing implementation of `IndexInput` in the Solr codebase that would work with only these two parameters. Maybe there is one I missed; if so, please point me to it. My understanding is that `BufferedChecksumIndexInput` requires a delegate `IndexInput` to access data.

BTW, I just found a bug in your implementation of `IndexInput` for the GCS backup repository. We had a similar bug happening in production environments with the AWS client. I'm not sure about the details of the AWS client internals, but there is a buffer somewhere in the stack. Depending on exactly when you invoke `read()` on the input stream, the buffer may be full or only partially full. But the read operation does not wait for the full requested length, so if the buffer has less data than you want to read, you only get what's already in the buffer. That's why checking the returned number of read bytes was mandatory here. Since the contract of `IndexInput` does not support reading fewer bytes than initially requested, I had to solve this with a retry loop.

With AWS, it happens for real. With GCP, it may be just theoretical?

Maybe that is some code to share between the different blob implementations? 😄
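To illustrate the retry loop mentioned above, here is a minimal sketch, not the code from this PR. The names `ReadFully`, `readFully` and `ChunkedInputStream` are hypothetical and exist only for this example. `ChunkedInputStream` simulates a client stream that hands back fewer bytes than requested.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public final class ReadFully {
    private ReadFully() {}

    /**
     * Fills b[offset .. offset + len) from the stream, calling read() again
     * whenever a call returns fewer bytes than were still needed.
     * Throws EOFException if the stream ends before len bytes were read.
     */
    public static void readFully(InputStream in, byte[] b, int offset, int len)
            throws IOException {
        int total = 0;
        while (total < len) {
            // read() may legally return any count between 1 and the requested length.
            int n = in.read(b, offset + total, len - total);
            if (n < 0) {
                throw new EOFException(
                    "Stream ended after " + total + " of " + len + " bytes");
            }
            total += n;
        }
    }

    /** Hypothetical test stream: never returns more than maxChunk bytes per read. */
    static final class ChunkedInputStream extends InputStream {
        private final InputStream delegate;
        private final int maxChunk;

        ChunkedInputStream(byte[] data, int maxChunk) {
            this.delegate = new ByteArrayInputStream(data);
            this.maxChunk = maxChunk;
        }

        @Override
        public int read() throws IOException {
            return delegate.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Cap the request to simulate a partially filled internal buffer.
            return delegate.read(b, off, Math.min(len, maxChunk));
        }
    }
}
```

On Java 9 and later, `InputStream.readNBytes(byte[], int, int)` performs a similar loop, although it returns the number of bytes read instead of throwing when the stream ends early. Whether a shared helper like this belongs in a common base class for the blob repositories is a design question for the PR, not something this sketch settles.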