GitHub user matt-martin closed a discussion: Issues with repartition
Hello,
I'm fairly new to Rust and DataFusion, so please excuse the basic question. I'm
trying to understand why `repartition` does not always produce the requested
number of partitions. Here's a minimal test I put together:
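```rust
#[cfg(test)]
mod tests {
    use std::sync::Arc;

    use datafusion::arrow::array::StringArray;
    use datafusion::arrow::datatypes::{DataType, Field, Schema};
    use datafusion::arrow::record_batch::RecordBatch;
    use datafusion::logical_expr::Partitioning;
    use datafusion::prelude::*;

    #[tokio::test]
    async fn repartition_test() {
        // 199 rows: the strings "1" through "199" in a single Utf8 column.
        let data_vec = (1..200).map(|x| x.to_string()).collect::<Vec<_>>();
        let batch = RecordBatch::try_new(
            Arc::new(Schema::new(vec![Field::new("foo", DataType::Utf8, false)])),
            vec![Arc::new(StringArray::from(data_vec))],
        )
        .unwrap();

        // Hash-repartition on "foo" into 3 partitions, then collect the
        // output as one Vec<RecordBatch> per partition.
        let result = SessionContext::new()
            .read_batch(batch)
            .unwrap()
            .repartition(Partitioning::Hash(vec![col("foo")], 3))
            .unwrap()
            .collect_partitioned()
            .await
            .unwrap();

        print!("RESULTS look like: {:?}", result);
        // Expect one inner vector per partition.
        assert_eq!(result.len(), 3);
    }
}
```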
If I run the test:
```sh
cargo test -- tests::repartition_test --exact
```
I see the following output:
```text
running 1 test
test tests::repartition_test ... FAILED
failures:
---- tests::repartition_test stdout ----
RESULTS look like: [[RecordBatch { schema: Schema { fields: [Field { name:
"foo", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false,
metadata: {} }], metadata: {} }, columns: [StringArray
[
"1",
"2",
"3",
"4",
"5",
"6",
"7",
"8",
"9",
"10",
...179 elements...,
"190",
"191",
"192",
"193",
"194",
"195",
"196",
"197",
"198",
"199",
]], row_count: 199 }]]thread 'tests::repartition_test' panicked at
src/main.rs:808:9:
assertion `left == right` failed
left: 1
right: 3
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
tests::repartition_test
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out;
finished in 0.02s
```
I'm probably missing something obvious, but shouldn't the top-level vector
returned by `collect_partitioned` have 3 elements (i.e., one per partition)
rather than 1?
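For reference, the shape the test expects can be illustrated without DataFusion
at all. The snippet below is a conceptual sketch of hash partitioning, not
DataFusion's implementation: `hash_partition` is a hypothetical helper, and it
uses std's `DefaultHasher` rather than whatever hash function DataFusion uses
internally. It allocates one bucket per requested partition up front, so the
result always has exactly 3 entries, which is what the assertion on
`collect_partitioned` is checking for.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper illustrating hash partitioning: each key is hashed and
/// placed in bucket `hash % num_partitions`. Not DataFusion's implementation.
fn hash_partition(keys: &[String], num_partitions: usize) -> Vec<Vec<String>> {
    assert!(num_partitions > 0, "need at least one partition");
    // Create every bucket up front so the output always has exactly
    // `num_partitions` entries, even if some buckets stay empty.
    let mut partitions: Vec<Vec<String>> = vec![Vec::new(); num_partitions];
    for key in keys {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let bucket = (hasher.finish() % num_partitions as u64) as usize;
        partitions[bucket].push(key.clone());
    }
    partitions
}

fn main() {
    // Same input as the test: the strings "1" through "199".
    let keys: Vec<String> = (1..200).map(|x| x.to_string()).collect();
    let partitions = hash_partition(&keys, 3);

    // One entry per partition, and every row lands in exactly one of them.
    assert_eq!(partitions.len(), 3);
    let total: usize = partitions.iter().map(|p| p.len()).sum();
    assert_eq!(total, 199);

    for (i, p) in partitions.iter().enumerate() {
        println!("partition {}: {} rows", i, p.len());
    }
}
```

The per-partition row counts depend on the hash function, so they will not
match DataFusion's; only the shape (one entry per partition, each row in
exactly one partition) carries over.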
GitHub link: https://github.com/apache/datafusion/discussions/9701
----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to:
[email protected]