alamb commented on code in PR #6961:
URL: https://github.com/apache/arrow-rs/pull/6961#discussion_r1915012018


##########
object_store/src/azure/client.rs:
##########
@@ -1058,7 +1058,7 @@ impl TryFrom<Blob> for ObjectMeta {
         Ok(Self {
             location: Path::parse(value.name)?,
             last_modified: value.properties.last_modified,
-            size: value.properties.content_length as usize,
+            size: value.properties.content_length,

Review Comment:
   that is a good sign in my mind



##########
object_store/src/chunked.rs:
##########
@@ -100,7 +100,7 @@ impl ObjectStore for ChunkedStore {
                         if exhausted {
                             return None;
                         }
-                        while buffer.len() < chunk_size {
+                        while buffer.len() < chunk_size as usize {

Review Comment:
   What would happen in this case if buffer.len() is 32 bit and chunk size is 
greater than `usize::MAX` This might truncate chunk size to the lower 32 bits 
right?
   
   Maybe to be safer we could check the conversion rather than `as`.
   
   I tried this locally and it seemed to work:
   ```diff
   (venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/arrow-rs$ git diff
   diff --git a/object_store/src/chunked.rs b/object_store/src/chunked.rs
   index 30fcf73022..149352c53e 100644
   --- a/object_store/src/chunked.rs
   +++ b/object_store/src/chunked.rs
   @@ -85,6 +85,13 @@ impl ObjectStore for ChunkedStore {
   
        async fn get_opts(&self, location: &Path, options: GetOptions) -> 
Result<GetResult> {
            let r = self.inner.get_opts(location, options).await?;
   +        let chunk_size: usize = self.chunk_size.try_into()
   +            .map_err(|e| {
   +                crate::Error::Generic {
   +                    store: "ChunkedStore",
   +                    source: Box::new(e),
   +                }
   +            })?;
            let stream = match r.payload {
                #[cfg(all(feature = "fs", not(target_arch = "wasm32")))]
                GetResultPayload::File(file, path) => {
   @@ -93,7 +100,7 @@ impl ObjectStore for ChunkedStore {
                GetResultPayload::Stream(stream) => {
                    let buffer = BytesMut::new();
                    futures::stream::unfold(
   -                    (stream, buffer, false, self.chunk_size),
   +                    (stream, buffer, false, chunk_size),
                        |(mut stream, mut buffer, mut exhausted, chunk_size)| 
async move {
                            // Keep accumulating bytes until we reach capacity 
as long
   ```
   
   



##########
object_store/src/lib.rs:
##########
@@ -904,7 +904,7 @@ pub struct ObjectMeta {
     /// The last modified time
     pub last_modified: DateTime<Utc>,
     /// The size in bytes of the object

Review Comment:
   I think adding some context about why this is u64 (and not usize) might be 
helpful as it may not be obvious on a quick read
   
   ```suggestion
       /// The size in bytes of the object. 
       ///
       /// Note this is not `usize` as `object_store` supports 32-bit 
architectures such as WASM
   ```



##########
object_store/src/util.rs:
##########
@@ -196,20 +197,20 @@ pub enum GetRange {
     /// an error will be returned. Additionally, if the range ends after the 
end
     /// of the object, the entire remainder of the object will be returned.
     /// Otherwise, the exact requested range will be returned.
-    Bounded(Range<usize>),
+    Bounded(Range<u64>),

Review Comment:
   likewise it would be nice here to add a quick note about hy u64 is used 
rather than `usize`



##########
object_store/src/memory.rs:
##########
@@ -266,15 +269,15 @@ impl ObjectStore for InMemory {
         })
     }
 
-    async fn get_ranges(&self, location: &Path, ranges: &[Range<usize>]) -> 
Result<Vec<Bytes>> {
+    async fn get_ranges(&self, location: &Path, ranges: &[Range<u64>]) -> 
Result<Vec<Bytes>> {
         let entry = self.entry(location).await?;
         ranges
             .iter()
             .map(|range| {
                 let r = GetRange::Bounded(range.clone())
-                    .as_range(entry.data.len())
+                    .as_range(entry.data.len() as u64)
                     .map_err(|source| Error::Range { source })?;
-
+                let r = r.start as usize..r.end as usize;

Review Comment:
   I think this should also check conversion to generate an error if an offset 
greater than 32 bits is requested -- though it may be theoretical as the buffer 
wouldn't be able to hold that data. I just worry this would potentially return 
the wrong offset (if the size was truncated to 32 bits)



##########
object_store/src/lib.rs:
##########
@@ -1060,7 +1060,7 @@ impl GetResult {
                             path: path.clone(),
                         })?;
 
-                    let mut buffer = Vec::with_capacity(len);
+                    let mut buffer = Vec::with_capacity(len as usize);

Review Comment:
   This also seems like it has the possibility of overflowing. Maybe it doesn't 
matter as the `read_to_end` is going to fail anyways but it again might be 
safter to explicitly try to convert
   
   Something like
   
   ```suggestion
                       let mut buffer = if let Ok(len) = len.try_into() {
                           Vec::with_capacity(len)
                       } else {
                           Vec::new()
                       };
   ```



##########
object_store/src/local.rs:
##########
@@ -848,16 +842,16 @@ pub(crate) fn chunked_stream(
                     }
 
                     let to_read = remaining.min(chunk_size);
-                    let mut buffer = Vec::with_capacity(to_read);
+                    let mut buffer = Vec::with_capacity(to_read as usize);

Review Comment:
   same comment as above with `as usize` and `to_read` -- I think we should 
explictly check the conversation



##########
object_store/src/lib.rs:
##########
@@ -1019,7 +1019,7 @@ pub struct GetResult {
     /// The [`ObjectMeta`] for this object
     pub meta: ObjectMeta,
     /// The range of bytes returned by this request

Review Comment:
   ```suggestion
       /// The range of bytes returned by this request
       ///
       /// Note this is not `usize` as `object_store` supports 32-bit 
architectures such as WASM
   ```



##########
object_store/src/util.rs:
##########
@@ -332,7 +333,9 @@ mod tests {
             &ranges,
             |range| {
                 fetches.push(range.clone());
-                futures::future::ready(Ok(Bytes::from(src[range].to_vec())))
+                futures::future::ready(Ok(Bytes::from(
+                    src[range.start as usize..range.end as usize].to_vec(),

Review Comment:
   again, I think we need to check the conversion here rather than truncating
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to