vegarsti commented on code in PR #8069:
URL: https://github.com/apache/arrow-rs/pull/8069#discussion_r2321089424
##########
parquet/src/arrow/arrow_writer/mod.rs:
##########
@@ -4293,4 +4304,50 @@ mod tests {
assert_eq!(get_dict_page_size(col0_meta), 1024 * 1024);
assert_eq!(get_dict_page_size(col1_meta), 1024 * 1024 * 4);
}
+
+ #[test]
+ fn arrow_writer_run_end_encoded() {
+ // Create a run array of strings
+ let mut builder = StringRunBuilder::<Int16Type>::new();
+ builder.extend(
+ vec![Some("alpha"); 1000]
+ .into_iter()
+ .chain(vec![Some("beta"); 1000]),
+ );
+ let run_array: RunArray<Int16Type> = builder.finish();
+ println!("run_array type: {:?}", run_array.data_type());
+ let schema = Arc::new(Schema::new(vec![Field::new(
+ "ree",
+ run_array.data_type().clone(),
+ run_array.is_nullable(),
+ )]));
+
+ // Write to parquet
+ let mut parquet_bytes: Vec<u8> = Vec::new();
+ let mut writer = ArrowWriter::try_new(&mut parquet_bytes, schema.clone(), None).unwrap();
+ let batch = RecordBatch::try_new(schema.clone(), vec![Arc::new(run_array)]).unwrap();
+ writer.write(&batch).unwrap();
+ writer.close().unwrap();
+
+ // Schema of output is plain, not dictionary or REE encoded!!
Review Comment:
Yeah, let's do the reader in another PR!
I've applied your diff -- thank you! The test I added
(`arrow_writer_run_end_encoded_string`) now fails with the error `Casting from Utf8
to RunEndEncoded`, which is raised here in `cast_with_options`:
https://github.com/apache/arrow-rs/blob/4506998155a5d915e7d70ffb8e0d511a24ada4ee/arrow-cast/src/cast/mod.rs#L1248-L1250
Similarly, I added another test, `arrow_writer_run_end_encoded_int`, which also
fails on the cast in `cast_with_options`.
I know there's work on casting in progress here:
https://github.com/apache/arrow-rs/pull/7713#discussion_r2320634788
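
For context, here is a rough sketch of what a flat-to-run-end-encoded conversion has to produce, using only the standard library. This is not the arrow-rs implementation and not the code from the casting PR; `run_end_encode` is a hypothetical helper, and `i16` run ends are assumed only because the test uses `Int16Type`.

```rust
/// Hypothetical helper (not part of arrow-rs): run-end encode a flat slice.
///
/// Returns `(run_ends, values)`, where `run_ends[i]` is the exclusive logical
/// end index of run `i` and `values[i]` is the value repeated in that run.
/// Returns `None` if a run end does not fit in `i16`.
fn run_end_encode<T: PartialEq + Clone>(input: &[T]) -> Option<(Vec<i16>, Vec<T>)> {
    let mut run_ends = Vec::new();
    let mut values: Vec<T> = Vec::new();

    for (i, item) in input.iter().enumerate() {
        // Run ends are 1-based cumulative lengths, so element `i` ends at `i + 1`.
        let end = i16::try_from(i + 1).ok()?;

        if values.last() == Some(item) {
            // Same value as the current run: move that run's end forward.
            if let Some(last_end) = run_ends.last_mut() {
                *last_end = end;
            }
        } else {
            // Value changed (or first element): start a new run.
            values.push(item.clone());
            run_ends.push(end);
        }
    }

    Some((run_ends, values))
}
```

For the input used in the test (1000 copies of `"alpha"` followed by 1000 copies of `"beta"`), this yields run ends `[1000, 2000]` and values `["alpha", "beta"]`.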