coryan commented on a change in pull request #11812:
URL: https://github.com/apache/arrow/pull/11812#discussion_r760205288
##########
File path: cpp/src/arrow/filesystem/gcsfs.cc
##########
@@ -324,17 +407,46 @@ Result<std::shared_ptr<io::InputStream>> GcsFileSystem::OpenInputStream(
     return Status::IOError("Only files can be opened as input streams");
   }
   ARROW_ASSIGN_OR_RAISE(auto p, GcsPath::FromString(info.path()));
-  return impl_->OpenInputStream(p);
+  return impl_->OpenInputStream(p.bucket, p.object, gcs::Generation(),
+                                gcs::ReadFromOffset());
 }
 Result<std::shared_ptr<io::RandomAccessFile>> GcsFileSystem::OpenInputFile(
     const std::string& path) {
-  return Status::NotImplemented("The GCS FileSystem is not fully implemented");
+  ARROW_ASSIGN_OR_RAISE(auto p, GcsPath::FromString(path));
+  auto metadata = impl_->GetObjectMetadata(p);
Review comment:
Yes, it does. I am trying to ensure that `Read()`, `ReadAt()`, and a
backwards `Seek()` all use the same generation of an object [*]. We
could try to use undocumented (and likely to break) APIs to extract the
generation without this roundtrip. If it turns out that the cost of the
roundtrip (and, I should add, the additional API charges) really matters, then
I would rather add a documented API to the C++ client library and use that here.
[*]: you probably know this, but objects in GCS are versioned. You can have
more than one version of the same object, and/or have the "latest" version
replaced while you are reading from it. I would think we want all operations
in one `io::RandomAccessFile` to refer to the same generation.
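
To illustrate the point about generations, here is a minimal sketch using a toy in-memory store. It is not the GCS client library and not the code in this PR: `ToyBucket`, `PinnedFile`, and `PinnedReadsStayConsistent` are hypothetical names made up for the example. The sketch assumes only what is described above: a metadata lookup at open time yields the current generation, and every later read passes that generation explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Toy stand-in for a versioned bucket. This is NOT the GCS client API.
class ToyBucket {
 public:
  // Stores a new generation of `name` and returns its generation number.
  std::int64_t Write(const std::string& name, std::string contents) {
    const std::int64_t generation = ++last_generation_;
    objects_[name][generation] = std::move(contents);
    return generation;
  }

  // Models the metadata roundtrip: reports the current ("latest") generation.
  std::optional<std::int64_t> LatestGeneration(const std::string& name) const {
    auto it = objects_.find(name);
    if (it == objects_.end()) return std::nullopt;
    return it->second.rbegin()->first;
  }

  // Reads up to `count` bytes starting at `offset`. With no generation given,
  // the read goes to whatever generation is latest at the time of the call.
  std::string ReadRange(
      const std::string& name, std::size_t offset, std::size_t count,
      std::optional<std::int64_t> generation = std::nullopt) const {
    auto it = objects_.find(name);
    if (it == objects_.end()) return {};
    const auto& versions = it->second;
    auto v = generation ? versions.find(*generation) : std::prev(versions.end());
    if (v == versions.end() || offset >= v->second.size()) return {};
    return v->second.substr(offset, count);
  }

 private:
  std::map<std::string, std::map<std::int64_t, std::string>> objects_;
  std::int64_t last_generation_ = 0;
};

// Hypothetical random-access handle that pins the generation seen at open time.
class PinnedFile {
 public:
  static std::optional<PinnedFile> Open(const ToyBucket& bucket, std::string name) {
    auto generation = bucket.LatestGeneration(name);  // the extra roundtrip
    if (!generation) return std::nullopt;
    return PinnedFile(&bucket, std::move(name), *generation);
  }

  // Every read names the pinned generation, so offsets may go backwards safely.
  std::string ReadAt(std::size_t offset, std::size_t count) const {
    return bucket_->ReadRange(name_, offset, count, generation_);
  }

  std::int64_t generation() const { return generation_; }

 private:
  PinnedFile(const ToyBucket* bucket, std::string name, std::int64_t generation)
      : bucket_(bucket), name_(std::move(name)), generation_(generation) {}

  const ToyBucket* bucket_;
  std::string name_;
  std::int64_t generation_;
};

// Replays the scenario: the object is replaced between two reads.
bool PinnedReadsStayConsistent() {
  ToyBucket bucket;
  bucket.Write("data.bin", "AAAAAAAA");
  auto file = PinnedFile::Open(bucket, "data.bin");
  assert(file.has_value());
  const std::string head = file->ReadAt(0, 4);
  bucket.Write("data.bin", "BBBBBBBB");  // a writer replaces the "latest" version
  const std::string tail = file->ReadAt(4, 4);
  const std::string unpinned_tail = bucket.ReadRange("data.bin", 4, 4);
  // Pinned reads still see the old bytes; an unpinned read sees the new ones.
  return head + tail == "AAAAAAAA" && unpinned_tail == "BBBB";
}
```

In this toy model, a handle that did not pin the generation would return `AAAA` followed by `BBBB` for the same logical file, which is the kind of mixed read the roundtrip is meant to prevent.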