ryanrussell opened a new issue, #8120:
URL: https://github.com/apache/arrow-rs/issues/8120
# ARROW-012-015: Remaining Arrow 56.0 Breaking Changes
## Problem
Several smaller but critical breaking changes in Arrow 56.0 need linter
coverage for complete migration support.
## API Changes Details
### ARROW-012: Int96::to_i64() Removal (Parquet)
```rust
-pub fn parquet::data_type::Int96::to_i64(&self) -> i64
```
### ARROW-013: Parquet Level Enum Removal
```rust
-pub enum parquet::column::writer::Level
-pub parquet::column::writer::Level::Column
-pub parquet::column::writer::Level::Page
```
### ARROW-014: Field Dictionary Behavior Changes
**Behavioral breaking change** in commit ed0213143:
- `Field::try_merge()` no longer checks `dict_id` equality
- `Field::contains()` no longer compares `dict_id` fields
This can cause subtle bugs in schema merging logic.
### ARROW-015: New Decimal Types (Enhancement Detection)
```rust
+pub arrow_schema::DataType::Decimal32(u8, i8)
+pub arrow_schema::DataType::Decimal64(u8, i8)
```
## Implementation Task
Create `src/rules/arrow_012_015_misc_breaking_changes.rs` with these
specifications:
### Rule Implementation
```rust
use regex::Regex;
use std::path::Path;
use crate::output::{Issue, Severity};
use crate::rules::Rule;
/// ARROW-012-015: Detect miscellaneous breaking changes in Arrow 56.0
///
/// Covers Int96::to_i64 removal, Level enum removal, Field behavior changes,
/// and opportunities to use new Decimal32/64 types.
pub struct Arrow012015Rule;
impl Rule for Arrow012015Rule {
fn rule_id(&self) -> &'static str {
"ARROW-012-015"
}
fn check_rust_source(&self, file_path: &Path, content: &str) ->
Result<Vec<Issue>, Box<dyn std::error::Error>> {
let mut issues = Vec::new();
for (line_num, line) in content.lines().enumerate() {
// ARROW-012: Int96::to_i64() removal
if line.contains(".to_i64()") && (line.contains("Int96") ||
line.contains("int96")) {
if let Some(mat) =
Regex::new(r"\.to_i64\s*\(\s*\)")?.find(line) {
issues.push(Issue {
rule_id: "ARROW-012".to_string(),
severity: Severity::Error,
message: "Int96::to_i64() method was removed in
Arrow 56.0".to_string(),
file_path: file_path.to_path_buf(),
line: line_num + 1,
column: mat.start() + 1,
suggestion: Some("Use manual conversion:
(int96.data[0] as i64) | ((int96.data[1] as i64) << 32). Note: This loses
precision if the value exceeds i64 range.".to_string()),
auto_fixable: false,
deprecated_since: Some("56.0.0".to_string()),
changelog_url:
Some("https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md#560".to_string()),
migration_guide_url:
Some("https://arrow.apache.org/docs/rust/migration_guide.html#int96-to-i64".to_string()),
});
}
}
// ARROW-013: Level enum usage
if let Some(mat) =
Regex::new(r"Level::(Column|Page)\b")?.find(line) {
issues.push(Issue {
rule_id: "ARROW-013".to_string(),
severity: Severity::Error,
message: "parquet::column::writer::Level enum was
removed in Arrow 56.0".to_string(),
file_path: file_path.to_path_buf(),
line: line_num + 1,
column: mat.start() + 1,
suggestion: Some("The Level enum and its variants
(Column, Page) were removed. Use the appropriate writer configuration methods
directly.".to_string()),
auto_fixable: false,
deprecated_since: Some("56.0.0".to_string()),
changelog_url:
Some("https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md#560".to_string()),
migration_guide_url:
Some("https://arrow.apache.org/docs/rust/migration_guide.html#level-enum-removal".to_string()),
});
}
// ARROW-014: Field merge/contains behavior change warnings
if line.contains("try_merge") || line.contains("contains") {
if line.contains("Field") || line.contains("field") {
if let Some(mat) =
Regex::new(r"\.(try_merge|contains)\s*\(")?.find(line) {
issues.push(Issue {
rule_id: "ARROW-014".to_string(),
severity: Severity::Warning,
message: "Field::try_merge() and
Field::contains() behavior changed in Arrow 56.0".to_string(),
file_path: file_path.to_path_buf(),
line: line_num + 1,
column: mat.start() + 1,
suggestion: Some("These methods no longer check
dict_id equality. If you depend on dict_id comparison, add explicit checks:
field1.dict_id == field2.dict_id".to_string()),
auto_fixable: false,
deprecated_since: Some("56.0.0 (behavior
change)".to_string()),
changelog_url:
Some("https://github.com/apache/arrow-rs/pull/7968".to_string()),
migration_guide_url:
Some("https://arrow.apache.org/docs/rust/migration_guide.html#field-dict-id-behavior".to_string()),
});
}
}
}
// ARROW-015: Suggest new Decimal32/64 types for small precision
decimals
if let Some(mat) =
Regex::new(r"DataType::Decimal128\s*\(\s*([1-9]|1[0-8]),")?.find(line) {
if let Some(captures) =
Regex::new(r"DataType::Decimal128\s*\(\s*(\d+),")?.captures(line) {
if let Ok(precision) = captures[1].parse::<u8>() {
let suggestion = if precision <= 9 {
"Consider using DataType::Decimal32 for better
performance with precision ≤9"
} else if precision <= 18 {
"Consider using DataType::Decimal64 for better
performance with precision ≤18"
} else {
continue;
};
issues.push(Issue {
rule_id: "ARROW-015".to_string(),
severity: Severity::Info,
message: "New smaller decimal types available in
Arrow 56.0".to_string(),
file_path: file_path.to_path_buf(),
line: line_num + 1,
column: mat.start() + 1,
suggestion: Some(suggestion.to_string()),
auto_fixable: false,
deprecated_since: None,
changelog_url:
Some("https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md#560".to_string()),
migration_guide_url:
Some("https://arrow.apache.org/docs/rust/migration_guide.html#new-decimal-types".to_string()),
});
}
}
}
}
Ok(issues)
}
fn check_cargo_toml(&self, _file_path: &Path, _content: &str) ->
Result<Vec<Issue>, Box<dyn std::error::Error>> {
Ok(Vec::new())
}
}
```
### Tests Required
```rust
#[cfg(test)]
mod tests {
use super::*;
use std::path::PathBuf;
#[test]
fn test_detects_int96_to_i64() {
let rule = Arrow012015Rule;
let content = r#"
use parquet::data_type::Int96;
fn convert_int96(value: Int96) -> i64 {
value.to_i64() // Should trigger ARROW-012
}
"#;
let issues = rule.check_rust_source(&PathBuf::from("test.rs"),
content).unwrap();
assert\!(issues.iter().any(|i| i.rule_id == "ARROW-012"));
}
#[test]
fn test_detects_level_enum() {
let rule = Arrow012015Rule;
let content = "let level = Level::Column;";
let issues = rule.check_rust_source(&PathBuf::from("test.rs"),
content).unwrap();
assert\!(issues.iter().any(|i| i.rule_id == "ARROW-013"));
}
#[test]
fn test_warns_field_behavior_change() {
let rule = Arrow012015Rule;
let content = r#"
if field1.contains(&field2) {
field1.try_merge(&field2)?;
}
"#;
let issues = rule.check_rust_source(&PathBuf::from("test.rs"),
content).unwrap();
assert_eq\!(issues.iter().filter(|i| i.rule_id ==
"ARROW-014").count(), 2);
assert\!(issues.iter().any(|i| matches\!(i.severity,
Severity::Warning)));
}
#[test]
fn test_suggests_decimal32() {
let rule = Arrow012015Rule;
let content = "DataType::Decimal128(5, 2)";
let issues = rule.check_rust_source(&PathBuf::from("test.rs"),
content).unwrap();
assert\!(issues.iter().any(|i| i.rule_id == "ARROW-015" &&
matches\!(i.severity, Severity::Info)));
}
#[test]
fn test_suggests_decimal64() {
let rule = Arrow012015Rule;
let content = "DataType::Decimal128(15, 4)";
let issues = rule.check_rust_source(&PathBuf::from("test.rs"),
content).unwrap();
assert\!(issues.iter().any(|i| i.rule_id == "ARROW-015"));
}
#[test]
fn test_no_suggestion_for_large_precision() {
let rule = Arrow012015Rule;
let content = "DataType::Decimal128(25, 8)"; // Precision > 18
let issues = rule.check_rust_source(&PathBuf::from("test.rs"),
content).unwrap();
assert\!(\!issues.iter().any(|i| i.rule_id == "ARROW-015"));
}
}
```
### Integration Steps
1. Add to `src/rules/mod.rs`:
```rust
pub mod arrow_012_015_misc_breaking_changes;
// In RuleEngine::new():
rules.push(Box::new(arrow_012_015_misc_breaking_changes::Arrow012015Rule));
```
2. Test patterns:
```rust
int96_val.to_i64() // ARROW-012 Error
Level::Column // ARROW-013 Error
field.try_merge(&other) // ARROW-014 Warning
DataType::Decimal128(7, 2) // ARROW-015 Info
```
## Acceptance Criteria
- ✅ Detects Int96::to_i64() usage (Error)
- ✅ Detects Level enum variants (Error)
- ✅ Warns about Field behavior changes (Warning)
- ✅ Suggests better decimal types (Info)
- ✅ Appropriate severity levels for each issue type
- ✅ Comprehensive test coverage
## Migration Summary
- **ARROW-012**: Manual Int96 conversion required
- **ARROW-013**: Level enum completely removed
- **ARROW-014**: Behavior change in Field methods
- **ARROW-015**: Performance improvement opportunity
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]