westonpace commented on a change in pull request #12530:
URL: https://github.com/apache/arrow/pull/12530#discussion_r829848259
##########
File path: cpp/src/arrow/filesystem/path_util.h
##########
@@ -30,13 +30,16 @@ namespace fs {
namespace internal {
constexpr char kSep = '/';
+constexpr char kFilenameSep = '_';
// Computations on abstract paths (not local paths with system-dependent
behaviour).
// Abstract paths are typically used in URIs.
// Split an abstract path into its individual components.
ARROW_EXPORT
-std::vector<std::string> SplitAbstractPath(const std::string& s);
+std::vector<std::string> SplitAbstractPath(util::string_view path, const char&
sep);
+ARROW_EXPORT
+std::vector<std::string> SplitAbstractPath(const std::string& path);
Review comment:
Do we need two methods or could we get away with:
```
ARROW_EXPORT
std::vector<std::string> SplitAbstractPath(util::string_view path, char sep
= kSep)
```
Also, you don't need to pass primitives by const reference (it is generally
harmless and happens sometimes with templates but if we know the type is a
primitive then it's clearer do away with it).
##########
File path: cpp/src/arrow/dataset/partition.cc
##########
@@ -361,7 +411,32 @@ Result<std::string> DirectoryPartitioning::FormatValues(
break;
}
- return fs::internal::JoinAbstractPath(std::move(segments));
+ return std::make_pair(fs::internal::JoinAbstractPath(std::move(segments)),
"");
+}
+
+Result<std::pair<std::string, std::string>> FilenamePartitioning::FormatValues(
+ const ScalarVector& values) const {
+ std::vector<std::string>
segments(static_cast<size_t>(schema_->num_fields()));
+
+ for (int i = 0; i < schema_->num_fields(); ++i) {
+ if (values[i] != nullptr && values[i]->is_valid) {
+ segments[i] = values[i]->ToString();
+ continue;
+ }
+
+ if (auto illegal_index = NextValid(values, i)) {
+ // XXX maybe we should just ignore keys provided after the first absent
one?
+ return Status::Invalid("No partition key for ",
schema_->field(i)->name(),
+ " but a key was provided subsequently for ",
+ schema_->field(*illegal_index)->name(), ".");
+ }
+
+ // if all subsequent keys are absent we'll just print the available keys
+ break;
Review comment:
Are there tests in place for this case? In other words, do we have a
test where the schema is something like `schema({field("a", int32()),
field("b", int32())}` and we only pass in one field (or no fields)?
##########
File path: cpp/src/arrow/filesystem/path_util.h
##########
@@ -95,22 +98,22 @@ std::vector<std::string>
MinimalCreateDirSet(std::vector<std::string> dirs);
// Join the components of an abstract path.
template <class StringIt>
-std::string JoinAbstractPath(StringIt it, StringIt end) {
+std::string JoinAbstractPath(StringIt it, StringIt end, const char& sep =
kSep) {
std::string path;
for (; it != end; ++it) {
if (it->empty()) continue;
if (!path.empty()) {
- path += kSep;
+ path += sep;
}
path += *it;
}
return path;
}
template <class StringRange>
-std::string JoinAbstractPath(const StringRange& range) {
- return JoinAbstractPath(range.begin(), range.end());
+std::string JoinAbstractPath(const StringRange& range, const char& sep = kSep)
{
Review comment:
```suggestion
std::string JoinAbstractPath(const StringRange& range, char sep = kSep) {
```
##########
File path: cpp/src/arrow/filesystem/path_util.cc
##########
@@ -30,28 +30,23 @@ namespace internal {
// XXX How does this encode Windows UNC paths?
-std::vector<std::string> SplitAbstractPath(const std::string& path) {
+std::vector<std::string> SplitAbstractPath(util::string_view path, const char&
sep) {
std::vector<std::string> parts;
- auto v = util::string_view(path);
- // Strip trailing slash
- if (v.length() > 0 && v.back() == kSep) {
- v = v.substr(0, v.length() - 1);
- }
Review comment:
We would still need to strip the trailing slash. If you need to keep
the trailing slash to make the partitioning code work then I would stick with
two methods but have the second method (that doesn't strip the trailing
separator) be in partition.cc.
##########
File path: cpp/src/arrow/filesystem/path_util.h
##########
@@ -30,13 +30,16 @@ namespace fs {
namespace internal {
constexpr char kSep = '/';
+constexpr char kFilenameSep = '_';
Review comment:
We should move this constant into partition.h and rename it to something
like `kFilenamePartitionSep`. When I see `kFilenameSep` I think of `/` or `.`
##########
File path: cpp/src/arrow/filesystem/path_util.h
##########
@@ -95,22 +98,22 @@ std::vector<std::string>
MinimalCreateDirSet(std::vector<std::string> dirs);
// Join the components of an abstract path.
template <class StringIt>
-std::string JoinAbstractPath(StringIt it, StringIt end) {
+std::string JoinAbstractPath(StringIt it, StringIt end, const char& sep =
kSep) {
Review comment:
```suggestion
std::string JoinAbstractPath(StringIt it, StringIt end, char sep = kSep) {
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]