[jira] [Created] (ARROW-13300) [Integration] Add Rust map
Neville Dipale created ARROW-13300: -- Summary: [Integration] Add Rust map Key: ARROW-13300 URL: https://issues.apache.org/jira/browse/ARROW-13300 Project: Apache Arrow Issue Type: New Feature Reporter: Neville Dipale I'm working on Rust map support at https://github.com/apache/arrow-rs/pull/491. We can add integration testing support after the PR is merged. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12156) [Rust] Calculate the size of a RecordBatch
Neville Dipale created ARROW-12156: -- Summary: [Rust] Calculate the size of a RecordBatch Key: ARROW-12156 URL: https://issues.apache.org/jira/browse/ARROW-12156 Project: Apache Arrow Issue Type: New Feature Reporter: Neville Dipale We can compute the size of an array, but there's no facility yet to compute the size of a RecordBatch. This is useful if we need to measure the size of data we're about to write. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12153) [Rust] [Parquet] Return file metadata after writing Parquet file
Neville Dipale created ARROW-12153: -- Summary: [Rust] [Parquet] Return file metadata after writing Parquet file Key: ARROW-12153 URL: https://issues.apache.org/jira/browse/ARROW-12153 Project: Apache Arrow Issue Type: New Feature Reporter: Neville Dipale Assignee: Neville Dipale Parquet writers like delta-rs rely on the Parquet metadata to write file-level statistics for file pruning purposes. We currently do not expose these stats, requiring the writer to re-read the file that has just been written in order to get the stats. This is more problematic for in-memory sinks, as there is currently no way of getting the metadata from the sink before it's persisted. Explore whether we can expose these stats to the writer, to make the above easier. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12121) [Rust] [Parquet] Arrow writer benchmarks
Neville Dipale created ARROW-12121: -- Summary: [Rust] [Parquet] Arrow writer benchmarks Key: ARROW-12121 URL: https://issues.apache.org/jira/browse/ARROW-12121 Project: Apache Arrow Issue Type: Improvement Reporter: Neville Dipale The common concern with Parquet's Arrow readers and writers is that they're slow. My diagnosis is that we rely on a chain of processes, which introduces overhead. For example, writing an Arrow RecordBatch involves the following:
1. Iterate through arrays to create def/rep levels
2. Extract Parquet primitive values from arrays using these levels
3. Write primitive values, validating them in the process (when they should already be validated)
4. Split the already materialised values into small batches for Parquet chunks (consider the case where we have 1e6 values in a batch)
5. Write these batches, computing the stats of each batch, and encoding values
The above is a side-effect of convenience, as it would likely require a lot more effort to bypass some of the steps. I have ideas around going from step 1 to 5 directly, but won't know if they're better without performance benchmarks. I also struggle to see whether I'm making improvements while I clean up the writer code, especially when removing the allocations that I created to reduce the complexity of the level calculations. With ARROW-12120 (random array & batch generator), it becomes more convenient to benchmark (and test many combinations of) the Arrow writer. I would thus like to start adding benchmarks for the Arrow writer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
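Step 1 above can be illustrated with a minimal, dependency-free sketch for the simplest (flat, optional) case: computing definition levels for an optional primitive column, where 1 marks a present value and 0 a null. The `def_levels` function name is hypothetical, not the writer's actual API, and nested types need deeper level computation than this.

```rust
// Definition levels for a flat optional column: 1 = value present, 0 = null.
// This is a simplified sketch; nested types require recursive level logic.
fn def_levels(values: &[Option<i32>]) -> Vec<i16> {
    values.iter().map(|v| if v.is_some() { 1 } else { 0 }).collect()
}

fn main() {
    let column = [Some(1), None, Some(3)];
    assert_eq!(def_levels(&column), vec![1, 0, 1]);
    println!("{:?}", def_levels(&column));
}
```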
[jira] [Created] (ARROW-12120) [Rust] Generate random arrays and batches
Neville Dipale created ARROW-12120: -- Summary: [Rust] Generate random arrays and batches Key: ARROW-12120 URL: https://issues.apache.org/jira/browse/ARROW-12120 Project: Apache Arrow Issue Type: Bug Reporter: Neville Dipale Assignee: Neville Dipale I need a random data generator for the Parquet <> Arrow integration. It takes me a while to craft a test case, so being able to create random data would make it a bit easier to improve test coverage and catch edge-cases in the code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12116) [Rust] Fix or ignore 1.51 clippy lints
Neville Dipale created ARROW-12116: -- Summary: [Rust] Fix or ignore 1.51 clippy lints Key: ARROW-12116 URL: https://issues.apache.org/jira/browse/ARROW-12116 Project: Apache Arrow Issue Type: Bug Reporter: Neville Dipale Assignee: Neville Dipale Rust 1.51 introduces some lints that have broken CI. We can either fix or ignore them, depending on the amount of time it'll take to fix them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12043) [Rust] [Parquet] Write fixed size binary arrays
Neville Dipale created ARROW-12043: -- Summary: [Rust] [Parquet] Write fixed size binary arrays Key: ARROW-12043 URL: https://issues.apache.org/jira/browse/ARROW-12043 Project: Apache Arrow Issue Type: Sub-task Reporter: Neville Dipale We already write FSB when writing binary arrays, so this extends the support by removing unimplemented code paths -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12019) [Rust] [Parquet] Update README for 2.6.0 support
Neville Dipale created ARROW-12019: -- Summary: [Rust] [Parquet] Update README for 2.6.0 support Key: ARROW-12019 URL: https://issues.apache.org/jira/browse/ARROW-12019 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale The Parquet README still talks about supporting 2.4.0, with a TODO for 2.5.0. When the 2.6.0 support is completed, we can update the README. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12018) [Rust] [Parquet] Write lower precision Arrow decimal to int32/64
Neville Dipale created ARROW-12018: -- Summary: [Rust] [Parquet] Write lower precision Arrow decimal to int32/64 Key: ARROW-12018 URL: https://issues.apache.org/jira/browse/ARROW-12018 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale When ARROW-10818 is completed, we should start writing decimal arrays of lower precisions as i32 and i64. I have left a TODO in the code as part of ARROW-11824 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11898) [Rust] Pretty print columns
Neville Dipale created ARROW-11898: -- Summary: [Rust] Pretty print columns Key: ARROW-11898 URL: https://issues.apache.org/jira/browse/ARROW-11898 Project: Apache Arrow Issue Type: Improvement Reporter: Neville Dipale We can pretty print a slice of record batches, but it's also useful to pretty print a slice of columns. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11824) [Rust] [Parquet] Use logical types in Arrow writer
Neville Dipale created ARROW-11824: -- Summary: [Rust] [Parquet] Use logical types in Arrow writer Key: ARROW-11824 URL: https://issues.apache.org/jira/browse/ARROW-11824 Project: Apache Arrow Issue Type: Sub-task Reporter: Neville Dipale Start using the logical type for Arrow <> Parquet schema conversion, so that we can support more Arrow types, like nanosecond temporal types. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11803) [Rust] [Parquet] Support v2 LogicalType
Neville Dipale created ARROW-11803: -- Summary: [Rust] [Parquet] Support v2 LogicalType Key: ARROW-11803 URL: https://issues.apache.org/jira/browse/ARROW-11803 Project: Apache Arrow Issue Type: Sub-task Reporter: Neville Dipale Assignee: Neville Dipale We currently do not read nor write the version 2 logical types. This is mainly because we do not have a mapping for it from parquet-format-rs. To implement this, we can:
- convert "parquet::basic::LogicalType" to "parquet::basic::ConvertedType"
- implement "parquet::basic::LogicalType" which mirrors "parquet_format::LogicalType"
- create a mapping between ConvertedType and LogicalType
- write LogicalType to "parquet_format::SchemaElement" if v2 of the writer is used
This would be a good starting point for implementing 2.6 types (UUID, NANOS precision time & timestamp). Follow-up work would be:
- parsing v2 of the schema
- using v2 in the Arrow writer (mostly schema conversion)
- supporting nanosecond precision time & timestamp
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11798) [Integration] Update testing submodule
Neville Dipale created ARROW-11798: -- Summary: [Integration] Update testing submodule Key: ARROW-11798 URL: https://issues.apache.org/jira/browse/ARROW-11798 Project: Apache Arrow Issue Type: Task Reporter: Neville Dipale Updates submodule after ARROW-11666, and removes references to files that no longer exist (generated_large_batch) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11619) [Rust] String-based path field projection
Neville Dipale created ARROW-11619: -- Summary: [Rust] String-based path field projection Key: ARROW-11619 URL: https://issues.apache.org/jira/browse/ARROW-11619 Project: Apache Arrow Issue Type: New Feature Components: Rust Reporter: Neville Dipale Similar to ARROW-11618, we could benefit from the ability to pluck out specific fields from an Arrow schema. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11618) [Rust] [Parquet] String-based path column projection
Neville Dipale created ARROW-11618: -- Summary: [Rust] [Parquet] String-based path column projection Key: ARROW-11618 URL: https://issues.apache.org/jira/browse/ARROW-11618 Project: Apache Arrow Issue Type: Task Components: Rust Reporter: Neville Dipale There is currently no way to select a column by its path, e.g. 'a.b.c'. We have to select the column by its index, which is not trivial for nested structures. For example, if a record has the following schema, the leaf column indices are shown in square brackets:
{code}
schema:
  a [struct] ("a")
    b [struct] ("a.b")
      c [int32] ("a.b.c")
      d [struct] ("a.b.d")
        e [int32] ("a.b.d.e") [0]
        f [bool] ("a.b.d.f") [1]
      g [int64] ("a.b.g") [2]
{code}
If one wants to select 'a.b', they need to know that 'a.b' spans 3 columns (0 to 2). This is inconvenient, and potentially forces readers to read whole records to avoid this inconvenience. A string-based projection could allow one to select columns 0 and 1 via "a.b.d", or column 2 via "a.b.g". -- This message was sent by Atlassian Jira (v8.3.4#803005)
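The proposed selection can be sketched without any Parquet dependency, assuming a precomputed mapping from leaf paths to column indices in schema order; the `project` helper is hypothetical, not an existing API.

```rust
// Given (leaf path, column index) pairs in schema order, return the indices
// of all leaf columns at or under a dotted path prefix.
fn project(leaves: &[(&str, usize)], path: &str) -> Vec<usize> {
    let prefix = format!("{}.", path);
    leaves
        .iter()
        .filter(|&&(p, _)| p == path || p.starts_with(&prefix))
        .map(|&(_, i)| i)
        .collect()
}

fn main() {
    // Leaf columns from the example schema above.
    let leaves = [("a.b.d.e", 0), ("a.b.d.f", 1), ("a.b.g", 2)];
    assert_eq!(project(&leaves, "a.b"), vec![0, 1, 2]);
    assert_eq!(project(&leaves, "a.b.d"), vec![0, 1]);
    assert_eq!(project(&leaves, "a.b.g"), vec![2]);
}
```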
[jira] [Created] (ARROW-11605) [Rust] Adopt a MSRV policy
Neville Dipale created ARROW-11605: -- Summary: [Rust] Adopt a MSRV policy Key: ARROW-11605 URL: https://issues.apache.org/jira/browse/ARROW-11605 Project: Apache Arrow Issue Type: Task Components: Rust Reporter: Neville Dipale With all our crates now supporting stable Rust, we can decide on a Minimum Supported Rust Version, so that we don't introduce breakage to people relying on older Rust versions. We could: * Determine what the earliest Rust version that compiles is (at least 1.39 due to async in DF) * Use this version in CI * Decide on, and document, a policy for how we update versions This might mean that when there's fresh new changes landing in Stable, we'd likely hold off on them until those changes meet our MSRV. Thoughts [~Dandandan] [~alamb] [~jorgecarleitao] [~andygrove] [~paddyhoran] [~sunchao]? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11599) [Rust] Add function to create array with all nulls
Neville Dipale created ARROW-11599: -- Summary: [Rust] Add function to create array with all nulls Key: ARROW-11599 URL: https://issues.apache.org/jira/browse/ARROW-11599 Project: Apache Arrow Issue Type: New Feature Reporter: Neville Dipale Assignee: Neville Dipale -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11381) [Rust] [Parquet] LZ4 compressed files written in Rust can't be opened with C++
Neville Dipale created ARROW-11381: -- Summary: [Rust] [Parquet] LZ4 compressed files written in Rust can't be opened with C++ Key: ARROW-11381 URL: https://issues.apache.org/jira/browse/ARROW-11381 Project: Apache Arrow Issue Type: Bug Affects Versions: 3.0.0 Reporter: Neville Dipale Parquet files that are written with LZ4 compression cannot be read by pyarrow. It seems that the issue might be LZ4 block vs frame compression, which we're also seeing in ARROW-8767. I'll update this JIRA with more info, as I'm struggling to get pyspark up on MacOS (Rosetta 2 issues) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11365) [Rust] [Parquet] Implement parsers for v2 of the text schema
Neville Dipale created ARROW-11365: -- Summary: [Rust] [Parquet] Implement parsers for v2 of the text schema Key: ARROW-11365 URL: https://issues.apache.org/jira/browse/ARROW-11365 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 3.0.0 Reporter: Neville Dipale V2 of the writer produces a schema like:
{code}
required INT32 fieldname INTEGER(32, true);
{code}
We should support parsing this format, as it maps to logical types. I'm unsure of what the implications are for fields that don't have a logical type representation but do have a converted type (e.g. INTERVAL). We can try writing a V2 file with parquet-cpp and observe the behaviour. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11364) [Rust] Umbrella issue for parquet 2.6.0 support
Neville Dipale created ARROW-11364: -- Summary: [Rust] Umbrella issue for parquet 2.6.0 support Key: ARROW-11364 URL: https://issues.apache.org/jira/browse/ARROW-11364 Project: Apache Arrow Issue Type: Improvement Components: Rust Affects Versions: 3.0.0 Reporter: Neville Dipale This is the umbrella issue where we can collect everything related to parquet 2.6.0 support (parquet-format-rs: 2.6.1). It looks like there's some plumbing needed on the type system + parsing logic to fully support writing and reading v2 of the file format. Existing compatibility issues can also be linked to this, or added as sub-tasks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11312) [Rust] Implement FromIter for timestamps, that includes timezone info
Neville Dipale created ARROW-11312: -- Summary: [Rust] Implement FromIter for timestamps, that includes timezone info Key: ARROW-11312 URL: https://issues.apache.org/jira/browse/ARROW-11312 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Neville Dipale We currently have TimestampArray::from_vec and TimestampArray::from_opt_vec in order to provide timezone information. We do not have an option that uses FromIter. When implementing this, we should search the codebase (especially Parquet) and replace the vector-based methods above with iterators where it makes sense. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11308) [Rust] [Parquet] Add Arrow decimal array writer
Neville Dipale created ARROW-11308: -- Summary: [Rust] [Parquet] Add Arrow decimal array writer Key: ARROW-11308 URL: https://issues.apache.org/jira/browse/ARROW-11308 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11294) [Rust] [Parquet] Read list field correctly in struct<list>
Neville Dipale created ARROW-11294: -- Summary: [Rust] [Parquet] Read list field correctly in struct<list> Key: ARROW-11294 URL: https://issues.apache.org/jira/browse/ARROW-11294 Project: Apache Arrow Issue Type: Sub-task Reporter: Neville Dipale I noticed that when reading a struct containing a list, we overwrite the list's field name with the struct's one. If we have a struct called "a", and a list called "items", the list gets the name "a", which is incorrect. See the test case called "arrow::arrow_writer::tests::arrow_writer_complex", which produces this behaviour. The test will be merged as part of ARROW-10766. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11271) [Rust] [Parquet] List schema to Arrow parser misinterpreting child nullability
Neville Dipale created ARROW-11271: -- Summary: [Rust] [Parquet] List schema to Arrow parser misinterpreting child nullability Key: ARROW-11271 URL: https://issues.apache.org/jira/browse/ARROW-11271 Project: Apache Arrow Issue Type: Bug Components: Rust Affects Versions: 2.0.0 Reporter: Neville Dipale Assignee: Neville Dipale We currently do not propagate child nullability correctly when reading parquet files from Spark 3.0.1 (parquet-mr 1.10.1). For example, the below taken from [https://github.com/apache/parquet-format/blob/master/LogicalTypes.md] is currently interpreted incorrectly:
{code:java}
// List (list nullable, elements non-null)
optional group my_list (LIST) {
  repeated group list {
    required binary element (UTF8);
  }
}
{code}
The Arrow type should be:
{code:java}
Field::new(
    "my_list",
    DataType::List(
        box Field::new("element", DataType::Utf8, nullable: false),
    ),
    nullable: true
)
{code}
but we currently end up with
{code:java}
Field::new(
    "my_list",
    DataType::List(
        box Field::new("list", DataType::Utf8, nullable: true),
    ),
    nullable: true
)
{code}
This doesn't seem to be an issue with the master branch as of opening this issue, so it might not be severe enough to try to force into the 3.0.0 release. I tested null and non-null Spark files, and was able to read them correctly. This becomes an issue with nested lists, which I'm working on. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11194) [Rust] Enable SIMD for aarch64
Neville Dipale created ARROW-11194: -- Summary: [Rust] Enable SIMD for aarch64 Key: ARROW-11194 URL: https://issues.apache.org/jira/browse/ARROW-11194 Project: Apache Arrow Issue Type: Improvement Reporter: Neville Dipale Enable SIMD for aarch64, which includes the Apple ARM CPUs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11187) [Rust] [Parquet] Pin specific parquet-format-rs version
Neville Dipale created ARROW-11187: -- Summary: [Rust] [Parquet] Pin specific parquet-format-rs version Key: ARROW-11187 URL: https://issues.apache.org/jira/browse/ARROW-11187 Project: Apache Arrow Issue Type: Improvement Reporter: Neville Dipale We released parquet-format-rs v2.7.0, which has some incompatibilities with v2.6.x, so we should pin to the latter version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11181) [Rust] [Parquet] Document supported features
Neville Dipale created ARROW-11181: -- Summary: [Rust] [Parquet] Document supported features Key: ARROW-11181 URL: https://issues.apache.org/jira/browse/ARROW-11181 Project: Apache Arrow Issue Type: Improvement Reporter: Neville Dipale Document supported Parquet features in the Rust implementation, similar to ARROW-10941 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11126) [Rust] Document and test ARROW-10656
Neville Dipale created ARROW-11126: -- Summary: [Rust] Document and test ARROW-10656 Key: ARROW-11126 URL: https://issues.apache.org/jira/browse/ARROW-11126 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Neville Dipale Looks like I rebased against the PR branch, but didn't push my changes before the PR was merged. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11125) [Rust] Implement logical equality for list arrays
Neville Dipale created ARROW-11125: -- Summary: [Rust] Implement logical equality for list arrays Key: ARROW-11125 URL: https://issues.apache.org/jira/browse/ARROW-11125 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Neville Dipale Assignee: Neville Dipale We implemented logical equality for struct arrays, but not list arrays. This work is now required for the Parquet nested list writer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11063) [Rust] Validate null counts when building arrays
Neville Dipale created ARROW-11063: -- Summary: [Rust] Validate null counts when building arrays Key: ARROW-11063 URL: https://issues.apache.org/jira/browse/ARROW-11063 Project: Apache Arrow Issue Type: Improvement Reporter: Neville Dipale ArrayDataBuilder allows the user to specify a null count, alternatively calculating it if it is not set. The problem is that the user-specified null count is never validated against the actual count of the buffer. I suggest removing the ability to specify a null-count, and instead always calculating it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
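The proposed check can be sketched with a plain bool slice standing in for the real validity bitmap; the names here are illustrative, not ArrayDataBuilder's actual API.

```rust
// Compute the null count from a validity mask (true = valid) and reject a
// user-declared count that disagrees with the actual count.
fn checked_null_count(validity: &[bool], declared: Option<usize>) -> Result<usize, String> {
    let actual = validity.iter().filter(|v| !**v).count();
    match declared {
        Some(d) if d != actual => Err(format!("declared {} nulls but counted {}", d, actual)),
        _ => Ok(actual),
    }
}

fn main() {
    assert_eq!(checked_null_count(&[true, false, true], Some(1)), Ok(1));
    assert!(checked_null_count(&[true, false, true], Some(0)).is_err());
    // The always-calculate alternative is just passing None.
    assert_eq!(checked_null_count(&[true, true], None), Ok(0));
}
```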
[jira] [Created] (ARROW-11061) [Rust] Validate array properties against schema
Neville Dipale created ARROW-11061: -- Summary: [Rust] Validate array properties against schema Key: ARROW-11061 URL: https://issues.apache.org/jira/browse/ARROW-11061 Project: Apache Arrow Issue Type: Improvement Reporter: Neville Dipale We have a problem when it comes to nested arrays, where one could create a nested array (e.g. a list of structs) whose child fields can't be null, but where the list can have null slots. This creates a lot of work when working with such nested arrays, because we have to create work-arounds to account for this, and take unnecessarily slower paths. I propose that we prevent this problem at the source, by:
* checking that a batch can't be created with arrays that have incompatible null contracts
* preventing list and struct children from being non-null if any descendant of such children are null (might be less of an issue for structs)
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11060) [Rust] Logical equality for list arrays
Neville Dipale created ARROW-11060: -- Summary: [Rust] Logical equality for list arrays Key: ARROW-11060 URL: https://issues.apache.org/jira/browse/ARROW-11060 Project: Apache Arrow Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Neville Dipale Apply logical equality to lists. This requires computing the merged nulls of a list and its child based on list offsets. For example, a list with 3 slots and 5 values, with offsets (0, 1, 3, 5), needs its validity expanded to a 5-value mask. If the list's validity is [true, false, true], and the values' validity is [t, f, t, f, t], we would get: [t, f, f, t, t] AND [t, f, t, f, t] = [t, f, f, f, t] -- This message was sent by Atlassian Jira (v8.3.4#803005)
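The expansion above can be sketched without any Arrow dependency, using plain offsets and bool masks; the function name is illustrative, not the crate's equality code.

```rust
// Expand a list's validity to value granularity via its offsets, then AND it
// with the child's validity, reproducing the worked example above.
fn merged_child_validity(
    offsets: &[usize],
    list_validity: &[bool],
    child_validity: &[bool],
) -> Vec<bool> {
    let mut expanded = Vec::with_capacity(child_validity.len());
    for (slot, &valid) in list_validity.iter().enumerate() {
        // Each list slot covers child values offsets[slot]..offsets[slot + 1].
        for _ in offsets[slot]..offsets[slot + 1] {
            expanded.push(valid);
        }
    }
    expanded.iter().zip(child_validity).map(|(a, b)| *a && *b).collect()
}

fn main() {
    let merged = merged_child_validity(
        &[0, 1, 3, 5],
        &[true, false, true],
        &[true, false, true, false, true],
    );
    assert_eq!(merged, vec![true, false, false, false, true]);
}
```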
[jira] [Created] (ARROW-10925) [Rust] Validate temporal data that has restrictions
Neville Dipale created ARROW-10925: -- Summary: [Rust] Validate temporal data that has restrictions Key: ARROW-10925 URL: https://issues.apache.org/jira/browse/ARROW-10925 Project: Apache Arrow Issue Type: Improvement Reporter: Neville Dipale Some temporal data types have restrictions (e.g. date64 values should be a multiple of 86,400,000, the number of milliseconds in a day). We should validate them when creating the arrays. -- This message was sent by Atlassian Jira (v8.3.4#803005)
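A minimal sketch of one such check, assuming the Date64 rule from the columnar format spec (milliseconds since the epoch, expected to fall on whole-day boundaries); `validate_date64` is a hypothetical helper, not an existing API.

```rust
// Date64 stores milliseconds since the UNIX epoch; the format spec expects
// values to fall on day boundaries, i.e. be divisible by 86_400_000 ms.
const MS_PER_DAY: i64 = 86_400_000;

fn validate_date64(values: &[i64]) -> Result<(), String> {
    match values.iter().find(|&&v| v % MS_PER_DAY != 0) {
        Some(v) => Err(format!("{} is not a whole-day Date64 value", v)),
        None => Ok(()),
    }
}

fn main() {
    assert!(validate_date64(&[0, 86_400_000, -86_400_000]).is_ok());
    assert!(validate_date64(&[123]).is_err());
}
```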
[jira] [Created] (ARROW-10893) [Rust] [DataFusion] Easier clippy fixes
Neville Dipale created ARROW-10893: -- Summary: [Rust] [DataFusion] Easier clippy fixes Key: ARROW-10893 URL: https://issues.apache.org/jira/browse/ARROW-10893 Project: Apache Arrow Issue Type: Sub-task Components: Rust - DataFusion Affects Versions: 2.0.0 Reporter: Neville Dipale Address some of the clippy lints that clippy can fix automatically -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10771) [Rust] Extend JSON schema inference to nested types
Neville Dipale created ARROW-10771: -- Summary: [Rust] Extend JSON schema inference to nested types Key: ARROW-10771 URL: https://issues.apache.org/jira/browse/ARROW-10771 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale Schema inference is currently limited to primitive types and lists of primitive types. This ticket is for work to extend it to nested types -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10770) [Rust] Support reading nested JSON lists
Neville Dipale created ARROW-10770: -- Summary: [Rust] Support reading nested JSON lists Key: ARROW-10770 URL: https://issues.apache.org/jira/browse/ARROW-10770 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale The JSON reader now supports reading nested structs, but we are still left with nested lists, which can be lists of lists, or lists of structs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10766) [Rust] Compute nested definition and repetition for list arrays
Neville Dipale created ARROW-10766: -- Summary: [Rust] Compute nested definition and repetition for list arrays Key: ARROW-10766 URL: https://issues.apache.org/jira/browse/ARROW-10766 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale This extends on ARROW-9728 by only focusing on list array repetition and definition levels -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10764) [Rust] Inline small JSON and CSV test files
Neville Dipale created ARROW-10764: -- Summary: [Rust] Inline small JSON and CSV test files Key: ARROW-10764 URL: https://issues.apache.org/jira/browse/ARROW-10764 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Neville Dipale Some of our tests use small CSV and JSON files, which we could inline in the code, instead of adding more files to test data. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10757) [Rust] [CI] Sporadic failures due to disk filling up
Neville Dipale created ARROW-10757: -- Summary: [Rust] [CI] Sporadic failures due to disk filling up Key: ARROW-10757 URL: https://issues.apache.org/jira/browse/ARROW-10757 Project: Apache Arrow Issue Type: Bug Components: CI, Rust Reporter: Neville Dipale Assignee: Neville Dipale CI is failing due to disk size filling up, affecting almost all Rust PRs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10684) [Rust] Logical equality should consider parent array nullability
Neville Dipale created ARROW-10684: -- Summary: [Rust] Logical equality should consider parent array nullability Key: ARROW-10684 URL: https://issues.apache.org/jira/browse/ARROW-10684 Project: Apache Arrow Issue Type: Bug Components: Rust Affects Versions: 2.0.0 Reporter: Neville Dipale When creating a struct array with a primitive child array, it is possible for the child to be non-nullable while its parent struct array is nullable. In this scenario, the child array's slots where the parent is null become invalidated, such that an array with [1, 2, 3], where slot 2 is null at the parent, should be interpreted as [1, 0, 3]. This issue becomes evident in Parquet roundtrip tests, as we end up unable to correctly compare nested structures that have non-null children. The specification caters for the above behaviour, see [http://arrow.apache.org/docs/format/Columnar.html#struct-layout] . When a struct has nulls, its child arrays' nullability is subject to the parent struct. -- This message was sent by Atlassian Jira (v8.3.4#803005)
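The rule from the columnar spec can be sketched with plain bool masks; `logical_validity` is an illustrative helper, not the crate's equality code.

```rust
// A struct child's logical validity is its own validity AND-ed with the
// parent struct's validity, per the spec's struct layout.
fn logical_validity(parent: &[bool], child: &[bool]) -> Vec<bool> {
    parent.iter().zip(child).map(|(&p, &c)| p && c).collect()
}

fn main() {
    // The child [1, 2, 3] is fully valid on its own, but the parent struct is
    // null at the middle slot, so comparisons must treat that slot as null.
    let parent = [true, false, true];
    let child = [true, true, true];
    assert_eq!(logical_validity(&parent, &child), vec![true, false, true]);
}
```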
[jira] [Created] (ARROW-10674) [Rust] Add integration tests for Decimal type
Neville Dipale created ARROW-10674: -- Summary: [Rust] Add integration tests for Decimal type Key: ARROW-10674 URL: https://issues.apache.org/jira/browse/ARROW-10674 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale We have basic decimal support, but we have not yet included decimals in the integration testing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10550) [Rust] [Parquet] Write nested types (struct, list)
Neville Dipale created ARROW-10550: -- Summary: [Rust] [Parquet] Write nested types (struct, list) Key: ARROW-10550 URL: https://issues.apache.org/jira/browse/ARROW-10550 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale Fix For: 3.0.0 After being able to compute arbitrarily nested definition and repetitions, we should be able to write structs and lists -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10391) [Rust] [Parquet] Nested Arrow reader
Neville Dipale created ARROW-10391: -- Summary: [Rust] [Parquet] Nested Arrow reader Key: ARROW-10391 URL: https://issues.apache.org/jira/browse/ARROW-10391 Project: Apache Arrow Issue Type: New Feature Components: Rust Affects Versions: 2.0.0 Reporter: Neville Dipale The objective here is to create a reader that complies with at least Parquet 2.4.0. It complements the tasks for the writer -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10334) [Rust] [Parquet] Support reading and writing Arrow NullArray
Neville Dipale created ARROW-10334: -- Summary: [Rust] [Parquet] Support reading and writing Arrow NullArray Key: ARROW-10334 URL: https://issues.apache.org/jira/browse/ARROW-10334 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 2.0.0 Reporter: Neville Dipale -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10299) [Rust] Support reading and writing V5 of IPC metadata
Neville Dipale created ARROW-10299: -- Summary: [Rust] Support reading and writing V5 of IPC metadata Key: ARROW-10299 URL: https://issues.apache.org/jira/browse/ARROW-10299 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 2.0.0 Reporter: Neville Dipale This is mostly alignment issues and tracking when we encounter the v4 legacy padding. I had done this work in another branch, but discarded it without noticing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10289) [Rust] Support reading dictionary streams
Neville Dipale created ARROW-10289: -- Summary: [Rust] Support reading dictionary streams Key: ARROW-10289 URL: https://issues.apache.org/jira/browse/ARROW-10289 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 2.0.0 Reporter: Neville Dipale We support reading dictionaries in the IPC file reader. We should do the same with the stream reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10269) [Rust] Update nightly: Oct 2020 Edition
Neville Dipale created ARROW-10269: -- Summary: [Rust] Update nightly: Oct 2020 Edition Key: ARROW-10269 URL: https://issues.apache.org/jira/browse/ARROW-10269 Project: Apache Arrow Issue Type: Task Components: Rust Reporter: Neville Dipale We should update to a more recent nightly after the 2.0.0 release. It carries some clippy annoyances, which will mean that I have to revert much of what I did around float comparisons. It might also be preferable to do this sooner, so that we can complete the clippy integration and throw away the carrot in favour of the stick. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10268) [Rust] Support writing dictionaries to IPC file and stream
Neville Dipale created ARROW-10268: -- Summary: [Rust] Support writing dictionaries to IPC file and stream Key: ARROW-10268 URL: https://issues.apache.org/jira/browse/ARROW-10268 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 1.0.1 Reporter: Neville Dipale We currently do not support writing dictionary arrays to the IPC file and stream format. When this is supported, we can test the integration with other implementations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10261) [Rust] [BREAKING] Lists should take Field instead of DataType
Neville Dipale created ARROW-10261: -- Summary: [Rust] [BREAKING] Lists should take Field instead of DataType Key: ARROW-10261 URL: https://issues.apache.org/jira/browse/ARROW-10261 Project: Apache Arrow Issue Type: Sub-task Components: Integration, Rust Affects Versions: 1.0.1 Reporter: Neville Dipale There is currently no way of tracking nested field metadata on lists. For example, if a list's children are nullable, there's no way of telling just by looking at the Field. This causes problems with integration testing, and also affects Parquet roundtrips. I propose the breaking change of [Large|FixedSize]List taking a Field instead of Box<DataType>, as this will overcome this issue, and ensure that the Rust implementation passes integration tests. CC [~andygrove] [~jorgecarleitao] [~alamb] [~jhorstmann] ([~carols10cents] as this addresses some of the roundtrip failures). I'm leaning towards this landing in 3.0.0, as I'd love for us to have completed or made significant traction on the Arrow Parquet writer (and reader), and integration testing, by then. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10259) [Rust] Support field metadata
Neville Dipale created ARROW-10259: -- Summary: [Rust] Support field metadata Key: ARROW-10259 URL: https://issues.apache.org/jira/browse/ARROW-10259 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale The biggest hurdle to adding field metadata is HashMap and HashSet not implementing Hash, Ord and PartialOrd. I was thinking of implementing the metadata as a Vec<(String, String)> to overcome this limitation, and then serializing correctly to JSON. -- This message was sent by Atlassian Jira (v8.3.4#803005)
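The workaround proposed above can be sketched with std types only. This is a minimal illustration, not the actual arrow crate API: `FieldMetadata` is a hypothetical name, and the point is that a sorted `Vec<(String, String)>` can derive `Hash`, `Ord` and `PartialOrd`, which `HashMap` cannot.

```rust
// Sketch of the proposed workaround: store field metadata as a sorted
// Vec<(String, String)> so the containing Field can still derive
// Hash/Ord/PartialOrd (which HashMap does not implement).
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct FieldMetadata(Vec<(String, String)>);

impl FieldMetadata {
    pub fn new(mut pairs: Vec<(String, String)>) -> Self {
        // Keep entries sorted so equality and hashing are
        // independent of insertion order.
        pairs.sort();
        FieldMetadata(pairs)
    }

    pub fn get(&self, key: &str) -> Option<&str> {
        self.0
            .iter()
            .find(|(k, _)| k.as_str() == key)
            .map(|(_, v)| v.as_str())
    }
}

fn main() {
    let a = FieldMetadata::new(vec![
        ("b".to_string(), "2".to_string()),
        ("a".to_string(), "1".to_string()),
    ]);
    let b = FieldMetadata::new(vec![
        ("a".to_string(), "1".to_string()),
        ("b".to_string(), "2".to_string()),
    ]);
    assert_eq!(a, b); // order-independent equality
    assert_eq!(a.get("a"), Some("1"));
}
```

Serializing such a Vec to the JSON object form the Arrow schema format expects is then a straightforward mapping over the pairs.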
[jira] [Created] (ARROW-10258) [Rust] Support extension arrays
Neville Dipale created ARROW-10258: -- Summary: [Rust] Support extension arrays Key: ARROW-10258 URL: https://issues.apache.org/jira/browse/ARROW-10258 Project: Apache Arrow Issue Type: New Feature Components: Integration, Rust Affects Versions: 1.0.1 Reporter: Neville Dipale This should include: * supporting the Arrow format * supporting field metadata We can optionally: * support recognising known extensions (like UUID) I'm mainly opening this up for wider visibility, I noticed that I was catching strays from metadata integration tests failing because Field doesn't support metadata :( -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10225) [Rust] [Parquet] Fix null bitmap comparisons in roundtrip tests
Neville Dipale created ARROW-10225: -- Summary: [Rust] [Parquet] Fix null bitmap comparisons in roundtrip tests Key: ARROW-10225 URL: https://issues.apache.org/jira/browse/ARROW-10225 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 1.0.1 Reporter: Neville Dipale The Arrow spec makes the null bitmap optional if an array has no nulls [~carols10cents], so the tests were failing because we're comparing `None` with a 100% populated bitmap. -- This message was sent by Atlassian Jira (v8.3.4#803005)
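The comparison rule can be sketched with a small std-only helper. This is an illustration of the semantics, not the arrow crate's actual comparison code: a missing bitmap must compare equal to a fully-set bitmap of the same logical length.

```rust
// Hypothetical helper illustrating the fix: per the Arrow spec the null
// bitmap is optional when an array has no nulls, so `None` must compare
// equal to a 100% populated bitmap.
pub fn bitmaps_equal(a: Option<&[u8]>, b: Option<&[u8]>, len: usize) -> bool {
    // Bit i is "valid" if the bitmap is absent or bit i is set
    // (Arrow bitmaps are LSB-first).
    let bit = |bm: Option<&[u8]>, i: usize| match bm {
        None => true,
        Some(bytes) => bytes[i / 8] & (1 << (i % 8)) != 0,
    };
    (0..len).all(|i| bit(a, i) == bit(b, i))
}

fn main() {
    // A missing bitmap equals a fully populated one...
    assert!(bitmaps_equal(None, Some(&[0xFF]), 8));
    // ...but not one with a cleared validity bit.
    assert!(!bitmaps_equal(None, Some(&[0xFE]), 8));
}
```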
[jira] [Created] (ARROW-10198) [Dev] Python merge script doesn't close PRs if not merged on master
Neville Dipale created ARROW-10198: -- Summary: [Dev] Python merge script doesn't close PRs if not merged on master Key: ARROW-10198 URL: https://issues.apache.org/jira/browse/ARROW-10198 Project: Apache Arrow Issue Type: Bug Components: Developer Tools Affects Versions: 1.0.1 Reporter: Neville Dipale When using the merge script to merge PRs against non-master branches, the PR on Github doesn't get closed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10191) [Rust] [Parquet] Add roundtrip tests for single column batches
Neville Dipale created ARROW-10191: -- Summary: [Rust] [Parquet] Add roundtrip tests for single column batches Key: ARROW-10191 URL: https://issues.apache.org/jira/browse/ARROW-10191 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 1.0.1 Reporter: Neville Dipale To aid with test coverage and picking up information loss during Parquet and Arrow roundtrips, we can add tests that assert that all supported Arrow datatypes can be written and read correctly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10168) [Rust] Extend arrow schema conversion to projected fields
Neville Dipale created ARROW-10168: -- Summary: [Rust] Extend arrow schema conversion to projected fields Key: ARROW-10168 URL: https://issues.apache.org/jira/browse/ARROW-10168 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 1.0.1 Reporter: Neville Dipale When writing Arrow data to Parquet, we serialise the schema's IPC representation. This schema is then read back by the Parquet reader, and used to preserve the array type information from the original Arrow data. We however do not rely on the above mechanism when reading projected columns from a Parquet file; i.e. if we have a file with 3 columns, but we only read 2 columns, we do not yet rely on the serialised arrow schema; and can thus lose type information. This behaviour was deliberately left out, as the function *parquet_to_arrow_schema_by_columns* does not check for the existence of arrow schema in the metadata. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10103) [Rust] Add a Contains kernel
Neville Dipale created ARROW-10103: -- Summary: [Rust] Add a Contains kernel Key: ARROW-10103 URL: https://issues.apache.org/jira/browse/ARROW-10103 Project: Apache Arrow Issue Type: New Feature Components: Rust Affects Versions: 1.0.1 Reporter: Neville Dipale Add a `contains` function that checks whether a list array contains a primitive value. The result of the function is a boolean array -- This message was sent by Atlassian Jira (v8.3.4#803005)
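The proposed kernel's shape can be sketched with plain std types. This is not the arrow crate's ListArray API: each `Option<Vec<i32>>` stands in for one (possibly null) list slot, and the output is one boolean per slot.

```rust
// Minimal sketch of a `contains` kernel: check whether each list slot
// contains a primitive value, yielding a boolean per slot (None models
// a null list slot).
pub fn contains(lists: &[Option<Vec<i32>>], value: i32) -> Vec<Option<bool>> {
    lists
        .iter()
        .map(|slot| slot.as_ref().map(|items| items.contains(&value)))
        .collect()
}

fn main() {
    let lists = vec![Some(vec![1, 2, 3]), None, Some(vec![4, 5])];
    // Null slots stay null; others report membership.
    assert_eq!(contains(&lists, 2), vec![Some(true), None, Some(false)]);
}
```

A real implementation would iterate the list array's offsets over the child values buffer instead of nested Vecs, but the null-propagation semantics are the same.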
[jira] [Created] (ARROW-10095) [Rust] [Parquet] Update for IPC changes
Neville Dipale created ARROW-10095: -- Summary: [Rust] [Parquet] Update for IPC changes Key: ARROW-10095 URL: https://issues.apache.org/jira/browse/ARROW-10095 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 1.0.1 Reporter: Neville Dipale The IPC changes made to comply with MetadataVersion 4 broke the rust-parquet writer branch. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10041) [Rust] Possible to create LargeStringArray with DataType::Utf8
Neville Dipale created ARROW-10041: -- Summary: [Rust] Possible to create LargeStringArray with DataType::Utf8 Key: ARROW-10041 URL: https://issues.apache.org/jira/browse/ARROW-10041 Project: Apache Arrow Issue Type: Bug Components: Rust Affects Versions: 1.0.1 Reporter: Neville Dipale We don't perform enough checks on ArrayData when creating StringArray and LargeStringArray. As they use different integer sizes for offsets, this can create a problem where Offset<i32> data could be incorrectly reinterpreted as Offset<i64> data and vice versa. We should add checks that prevent this. The same might apply for List and LargeList -- This message was sent by Atlassian Jira (v8.3.4#803005)
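The kind of check asked for here can be sketched as follows. This is a simplified stand-in, not the arrow crate's validation code: a string array with `len` values needs `len + 1` offsets, so the offsets buffer's byte length pins down whether it holds i32 (Utf8) or i64 (LargeUtf8) offsets.

```rust
// Sketch of an offsets-width sanity check: a buffer sized for i32
// offsets cannot be silently reinterpreted as i64 offsets or vice versa.
pub fn validate_offsets(buffer_len_bytes: usize, array_len: usize, large: bool) -> bool {
    let width = if large { 8 } else { 4 }; // i64 (LargeUtf8) vs i32 (Utf8)
    buffer_len_bytes == (array_len + 1) * width
}

fn main() {
    // 3 values -> 4 offsets: 16 bytes for Utf8, 32 bytes for LargeUtf8.
    assert!(validate_offsets(16, 3, false));
    assert!(validate_offsets(32, 3, true));
    // Mixing the widths up fails the check.
    assert!(!validate_offsets(16, 3, true));
}
```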
[jira] [Created] (ARROW-10040) [Rust] Create a way to slice unaligned offset buffers
Neville Dipale created ARROW-10040: -- Summary: [Rust] Create a way to slice unaligned offset buffers Key: ARROW-10040 URL: https://issues.apache.org/jira/browse/ARROW-10040 Project: Apache Arrow Issue Type: Bug Components: Rust Affects Versions: 1.0.1 Reporter: Neville Dipale We have limitations on the boolean kernels, where we can't apply the kernels on buffers whose offsets aren't a multiple of 8. This has the potential of preventing users from applying some computations on arrays whose offsets aren't divisible by 8. We could create methods on Buffer that allow slicing buffers and copying them into aligned buffers. An idea would be Buffer::slice(&self, offset: usize, len: usize) -> Buffer; -- This message was sent by Atlassian Jira (v8.3.4#803005)
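The copy-into-an-aligned-buffer idea can be sketched on a plain byte slice. This is an illustration of the bit-shifting involved, not the proposed `Buffer::slice` implementation itself: given a bitmap and a bit offset that is not a multiple of 8, produce a fresh buffer whose bits start at offset 0, so byte-wise boolean kernels can run on it.

```rust
// Sketch of copy-on-slice for bitmaps: re-pack `len` bits starting at
// `offset` (LSB-first, as in Arrow) into a new zero-offset buffer.
pub fn slice_bitmap(src: &[u8], offset: usize, len: usize) -> Vec<u8> {
    let mut out = vec![0u8; (len + 7) / 8];
    for i in 0..len {
        let bit = src[(offset + i) / 8] >> ((offset + i) % 8) & 1;
        if bit == 1 {
            out[i / 8] |= 1 << (i % 8);
        }
    }
    out
}

fn main() {
    // Source bits (LSB-first): 0,0,1,1,0,1,1,0; slicing 4 bits from
    // bit offset 2 yields 1,1,0,1 -> 0b1011.
    assert_eq!(slice_bitmap(&[0b0110_1100], 2, 4), vec![0b1011]);
}
```

A production version would shift whole words at a time rather than single bits, but the per-bit form keeps the intent visible.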
[jira] [Created] (ARROW-9981) [Rust] Allow configuring flight IPC with IpcWriteOptions
Neville Dipale created ARROW-9981: - Summary: [Rust] Allow configuring flight IPC with IpcWriteOptions Key: ARROW-9981 URL: https://issues.apache.org/jira/browse/ARROW-9981 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 1.0.0 Reporter: Neville Dipale We have introduced an IPC write option, but we use the default for the arrow-flight crate, which is not ideal. Change this to allow configuring writer options. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9980) [Rust] Fix parquet crate clippy lints
Neville Dipale created ARROW-9980: - Summary: [Rust] Fix parquet crate clippy lints Key: ARROW-9980 URL: https://issues.apache.org/jira/browse/ARROW-9980 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 1.0.0 Reporter: Neville Dipale This addresses most clippy lints on the parquet crate. Other remaining lints can be addressed as part of future PRs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9979) [Rust] Fix arrow crate clippy lints
Neville Dipale created ARROW-9979: - Summary: [Rust] Fix arrow crate clippy lints Key: ARROW-9979 URL: https://issues.apache.org/jira/browse/ARROW-9979 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 1.0.0 Reporter: Neville Dipale This fixes many clippy lints, but not all. It takes hours to address lints, and we can work on remaining ones in future PRs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9978) [Rust] Umbrella issue for clippy integration
Neville Dipale created ARROW-9978: - Summary: [Rust] Umbrella issue for clippy integration Key: ARROW-9978 URL: https://issues.apache.org/jira/browse/ARROW-9978 Project: Apache Arrow Issue Type: New Feature Components: CI, Rust Affects Versions: 1.0.0 Reporter: Neville Dipale This is an umbrella issue to collate outstanding and new tasks to enable clippy integration -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9957) [Rust] Remove unmaintained tempdir dependency
Neville Dipale created ARROW-9957: - Summary: [Rust] Remove unmaintained tempdir dependency Key: ARROW-9957 URL: https://issues.apache.org/jira/browse/ARROW-9957 Project: Apache Arrow Issue Type: Improvement Components: Rust - DataFusion Affects Versions: 1.0.0 Reporter: Neville Dipale Replace tempdir with tempfile, also removing older versions of some dependencies like rand. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9848) [Rust] Implement changes to ensure flatbuffer alignment
Neville Dipale created ARROW-9848: - Summary: [Rust] Implement changes to ensure flatbuffer alignment Key: ARROW-9848 URL: https://issues.apache.org/jira/browse/ARROW-9848 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 1.0.0 Reporter: Neville Dipale See ARROW-6313, changes were made to all IPC implementations except for Rust -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9841) [Rust] Update checked-in flatbuffer files
Neville Dipale created ARROW-9841: - Summary: [Rust] Update checked-in flatbuffer files Key: ARROW-9841 URL: https://issues.apache.org/jira/browse/ARROW-9841 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 1.0.0 Reporter: Neville Dipale We can't automatically generate flatbuffer files in Rust due to a bug with required fields. The currently checked-in generated files are outdated, and should either be updated manually or by building the flatbuffers project from master in order to update them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9777) [Rust] Implement IPC changes to catch up to 1.0.0 format
Neville Dipale created ARROW-9777: - Summary: [Rust] Implement IPC changes to catch up to 1.0.0 format Key: ARROW-9777 URL: https://issues.apache.org/jira/browse/ARROW-9777 Project: Apache Arrow Issue Type: Bug Components: Rust Affects Versions: 1.0.0 Reporter: Neville Dipale There are a number of IPC changes and features which the Rust implementation has fallen behind on. It's effectively using the legacy format that was released in 0.14.x. Some that I encountered are: * change padding from 4 bytes to 8 bytes (along with the padding algorithm) * add an IPC writer option to support the legacy format and updated format * add error handling for the different metadata versions; we should support v4+, so it's an oversight to not explicitly return errors if unsupported versions are read Some of the work already has Jiras open (e.g. body compression), I'll find them and mark them as related to this. I'm tight for spare time, but I'll try to work on this before the next release (along with the Parquet writer) -- This message was sent by Atlassian Jira (v8.3.4#803005)
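The first listed change, padding buffers to 8-byte boundaries instead of 4, amounts to rounding each buffer length up to the alignment. A minimal sketch of that arithmetic (not the arrow crate's actual helper):

```rust
// Round a byte length up to the next multiple of `alignment`, as the
// 1.0.0 IPC format requires 8-byte buffer padding (the legacy format
// used 4 bytes).
pub fn padded_len(len: usize, alignment: usize) -> usize {
    (len + alignment - 1) / alignment * alignment
}

fn main() {
    assert_eq!(padded_len(5, 8), 8);
    assert_eq!(padded_len(16, 8), 16);
    assert_eq!(padded_len(17, 8), 24);
    // The same 17 bytes under the legacy 4-byte padding:
    assert_eq!(padded_len(17, 4), 20);
}
```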
[jira] [Created] (ARROW-9728) [Rust] [Parquet] Compute nested spacing
Neville Dipale created ARROW-9728: - Summary: [Rust] [Parquet] Compute nested spacing Key: ARROW-9728 URL: https://issues.apache.org/jira/browse/ARROW-9728 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 1.0.0 Reporter: Neville Dipale When computing definition levels for deeply nested arrays that include lists, the definition levels are correctly calculated, but they are not translated into correct indexes for the eventual primitive arrays. For example, an int32 array could have no null values, but be a child of a list that has null values. If say the first 5 values of the int32 array are members of the first list item (i.e. list_array[0] = [1,2,3,4,5], and that list is itself a child of a struct whose index is null, the whole 5 values of the int32 array *should* be skipped. Further, the list's definition and repetition levels will be represented by 1 slot instead of the 5. The current logic cannot cater for this, and potentially results in slicing the int32 array incorrectly (sometimes including some of those first 5 values). This Jira is for the work necessary to compute the index into the eventual leaf arrays correctly. I started doing it as part of the initial writer PR, but it's complex and is blocking progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9413) [Rust] Fix clippy lint on master
Neville Dipale created ARROW-9413: - Summary: [Rust] Fix clippy lint on master Key: ARROW-9413 URL: https://issues.apache.org/jira/browse/ARROW-9413 Project: Apache Arrow Issue Type: Bug Components: Rust Affects Versions: 0.17.1 Reporter: Neville Dipale There was a clippy lint error with the float sort PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9411) [Rust] Update dependencies
Neville Dipale created ARROW-9411: - Summary: [Rust] Update dependencies Key: ARROW-9411 URL: https://issues.apache.org/jira/browse/ARROW-9411 Project: Apache Arrow Issue Type: Improvement Components: Rust Affects Versions: 0.17.1 Reporter: Neville Dipale Update dependencies like tonic and rand (to reduce total dependencies) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9408) [Integration] Tests do not run in Windows due to numpy 64-bit errors
Neville Dipale created ARROW-9408: - Summary: [Integration] Tests do not run in Windows due to numpy 64-bit errors Key: ARROW-9408 URL: https://issues.apache.org/jira/browse/ARROW-9408 Project: Apache Arrow Issue Type: Bug Components: Integration Affects Versions: 0.17.1 Reporter: Neville Dipale We found that the integer range check when generating integration data doesn't work on Windows because the C integers that numpy uses are 32-bit by default on Windows. This fixes that issue by forcing 64-bit integers. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9292) [Rust] Update feature matrix with passing tests
Neville Dipale created ARROW-9292: - Summary: [Rust] Update feature matrix with passing tests Key: ARROW-9292 URL: https://issues.apache.org/jira/browse/ARROW-9292 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 0.17.0 Reporter: Neville Dipale When we created the feature matrix, I preemptively populated the Rust column with supported features. We've subsequently been having trouble with integration tests. This blocker is so that I can update the feature matrix before 1.0.0 release based on which tests are passing by then. CC [~wesm] [~apitrou] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9274) [Rust] Builds failing due to IPC test failures
Neville Dipale created ARROW-9274: - Summary: [Rust] Builds failing due to IPC test failures Key: ARROW-9274 URL: https://issues.apache.org/jira/browse/ARROW-9274 Project: Apache Arrow Issue Type: Bug Components: Rust Reporter: Neville Dipale I just saw this after merging 2 PRs, I'm investigating what the cause of the failures is -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9082) [Rust] Stream reader fails when stream not ended with (optional) 0xFFFFFFFF 0x00000000
[ https://issues.apache.org/jira/browse/ARROW-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-9082: -- Component/s: Rust > [Rust] Stream reader fails when stream not ended with (optional) 0xFFFFFFFF > 0x00000000 > > > Key: ARROW-9082 > URL: https://issues.apache.org/jira/browse/ARROW-9082 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.17.1 >Reporter: Eyal Leshem >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > according to the spec: > [https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format], > the 0xFFFFFFFF 0x00000000 end-of-stream marker is optional in the arrow response stream, but > currently when the client receives such a response it reads all the batches well, > but returns an error at the end (instead of Ok(None)) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-9082) [Rust] Stream reader fails when stream not ended with (optional) 0xFFFFFFFF 0x00000000
[ https://issues.apache.org/jira/browse/ARROW-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale resolved ARROW-9082. --- Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 7384 [https://github.com/apache/arrow/pull/7384] > [Rust] Stream reader fails when stream not ended with (optional) 0xFFFFFFFF > 0x00000000 > > > Key: ARROW-9082 > URL: https://issues.apache.org/jira/browse/ARROW-9082 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.17.1 >Reporter: Eyal Leshem >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > according to the spec: > [https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format], > the 0xFFFFFFFF 0x00000000 end-of-stream marker is optional in the arrow response stream, but > currently when the client receives such a response it reads all the batches well, > but returns an error at the end (instead of Ok(None)) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-9062) [Rust] Support to read JSON into dictionary type
[ https://issues.apache.org/jira/browse/ARROW-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale resolved ARROW-9062. --- Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 7379 [https://github.com/apache/arrow/pull/7379] > [Rust] Support to read JSON into dictionary type > > > Key: ARROW-9062 > URL: https://issues.apache.org/jira/browse/ARROW-9062 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust >Reporter: Sven Wagner-Boysen >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Currently a JSON reader built from a schema using the dictionary type for one > of the fields in the schema will fail with JsonError("struct types are not > yet supported") > {code:java} > let builder = ReaderBuilder::new().with_schema(..) > let mut reader: Reader<File> = > builder.build::<File>(File::open(path).unwrap()).unwrap(); > let rb = reader.next().unwrap() > {code} > > Suggested solution: > Support reading into a dictionary in Json Reader: > [https://github.com/apache/arrow/blob/master/rust/arrow/src/json/reader.rs#L368] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-9088) [Rust] Recent version of arrow crate does not compile into wasm target
[ https://issues.apache.org/jira/browse/ARROW-9088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132699#comment-17132699 ] Neville Dipale commented on ARROW-9088: --- I've made prettytable-rs optional, so once the PR is merged, you should be able to turn it off. I forgot that we removed libc at some point, so it didn't dawn on me that we can now compile arrow to wasm. > [Rust] Recent version of arrow crate does not compile into wasm target > -- > > Key: ARROW-9088 > URL: https://issues.apache.org/jira/browse/ARROW-9088 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Sergey Todyshev >Assignee: Neville Dipale >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > arrow 0.16 compiles successfully into wasm32-unknown-unknown, but recent git > version does not. it would be nice to fix that. > compiler errors: > > {noformat} > error[E0433]: failed to resolve: could not find `unix` in `os` > --> > /home/regl/.cargo/registry/src/github.com-1ecc6299db9ec823/dirs-1.0.5/src/lin.rs:41:18 > | > 41 | use std::os::unix::ffi::OsStringExt; > | could not find `unix` in `os` > > error[E0432]: unresolved import `unix` >--> > /home/regl/.cargo/registry/src/github.com-1ecc6299db9ec823/dirs-1.0.5/src/lin.rs:6:5 > | > 6 | use unix; > | no `unix` in the root{noformat} > the problem is that prettytable-rs dependency depends on term->dirs which > causes this error > consider making prettytable-rs as dev dependency > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-9088) [Rust] Recent version of arrow crate does not compile into wasm target
[ https://issues.apache.org/jira/browse/ARROW-9088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale reassigned ARROW-9088: - Assignee: Neville Dipale > [Rust] Recent version of arrow crate does not compile into wasm target > -- > > Key: ARROW-9088 > URL: https://issues.apache.org/jira/browse/ARROW-9088 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Sergey Todyshev >Assignee: Neville Dipale >Priority: Major > > arrow 0.16 compiles successfully into wasm32-unknown-unknown, but recent git > version does not. it would be nice to fix that. > compiler errors: > > {noformat} > error[E0433]: failed to resolve: could not find `unix` in `os` > --> > /home/regl/.cargo/registry/src/github.com-1ecc6299db9ec823/dirs-1.0.5/src/lin.rs:41:18 > | > 41 | use std::os::unix::ffi::OsStringExt; > | could not find `unix` in `os` > > error[E0432]: unresolved import `unix` >--> > /home/regl/.cargo/registry/src/github.com-1ecc6299db9ec823/dirs-1.0.5/src/lin.rs:6:5 > | > 6 | use unix; > | no `unix` in the root{noformat} > the problem is that prettytable-rs dependency depends on term->dirs which > causes this error > consider making prettytable-rs as dev dependency > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9095) [Rust] Fix NullArray to comply with spec
Neville Dipale created ARROW-9095: - Summary: [Rust] Fix NullArray to comply with spec Key: ARROW-9095 URL: https://issues.apache.org/jira/browse/ARROW-9095 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 0.17.0 Reporter: Neville Dipale When I implemented the NullArray, I didn't comply with the spec under the premise that I'd handle reading and writing IPC in a spec-compliant way as that looked like the easier approach. After some integration testing, I realised that I wasn't doing it correctly, so it's better to comply with the spec by not allocating any buffers for the array. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-3089) [Rust] Add ArrayBuilder for different Arrow arrays
[ https://issues.apache.org/jira/browse/ARROW-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale resolved ARROW-3089. --- Assignee: Neville Dipale Resolution: Implemented The remaining task was completed > [Rust] Add ArrayBuilder for different Arrow arrays > -- > > Key: ARROW-3089 > URL: https://issues.apache.org/jira/browse/ARROW-3089 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Chao Sun >Assignee: Neville Dipale >Priority: Major > Fix For: 1.0.0 > > > Similar to the CPP version, we should add `ArrayBuilder` for different kinds > of Arrow arrays. This provides a convenient way to incrementally build Arrow > arrays. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-9007) [Rust] Support appending arrays by merging array data
[ https://issues.apache.org/jira/browse/ARROW-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale resolved ARROW-9007. --- Resolution: Fixed Issue resolved by pull request 7365 [https://github.com/apache/arrow/pull/7365] > [Rust] Support appending arrays by merging array data > - > > Key: ARROW-9007 > URL: https://issues.apache.org/jira/browse/ARROW-9007 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Affects Versions: 0.17.0 >Reporter: Neville Dipale >Assignee: Neville Dipale >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > ARROW-9005 introduces a concat kernel which allows for concatenating multiple > arrays of the same type into a single array. This is useful for sorting on > multiple arrays, among other things. > The concat kernel is implemented for most array types, but not yet for nested > arrays (lists, structs, etc). > This Jira is for creating a way of appending/merging all array types, so that > concat (and functionality that depends on it) can support all array types. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-3089) [Rust] Add ArrayBuilder for different Arrow arrays
[ https://issues.apache.org/jira/browse/ARROW-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-3089: -- Fix Version/s: 1.0.0 > [Rust] Add ArrayBuilder for different Arrow arrays > -- > > Key: ARROW-3089 > URL: https://issues.apache.org/jira/browse/ARROW-3089 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Chao Sun >Priority: Major > Fix For: 1.0.0 > > > Similar to the CPP version, we should add `ArrayBuilder` for different kinds > of Arrow arrays. This provides a convenient way to incrementally build Arrow > arrays. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8170) [Rust] [Parquet] Allow Position to support arbitrary Cursor type
[ https://issues.apache.org/jira/browse/ARROW-8170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-8170: -- Summary: [Rust] [Parquet] Allow Position to support arbitrary Cursor type (was: [Rust] Allow Position to support arbitrary Cursor type) > [Rust] [Parquet] Allow Position to support arbitrary Cursor type > > > Key: ARROW-8170 > URL: https://issues.apache.org/jira/browse/ARROW-8170 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Affects Versions: 0.16.0 >Reporter: Jeong, Heon >Priority: Trivial > Original Estimate: 1h > Remaining Estimate: 1h > > Hi, I'm currently writing an in-memory page writer in order to support a buffered > row group writer (just like in the C++ version), and... > * I'd like to reuse SerializedPageWriter > * SerializedPageWriter requires that the sink supports util::Position (which is > private) > * There's a Position impl for Cursor, but it unnecessarily restricts the internal type to mutable > references. > So I'd like to make a one-line change in order to lift that type restriction > and allow my implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9053) [Rust] Add sort for lists and structs
Neville Dipale created ARROW-9053: - Summary: [Rust] Add sort for lists and structs Key: ARROW-9053 URL: https://issues.apache.org/jira/browse/ARROW-9053 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7252) [Rust] [Parquet] Reading UTF-8/JSON/ENUM field results in a lot of vec allocation
[ https://issues.apache.org/jira/browse/ARROW-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-7252: -- Summary: [Rust] [Parquet] Reading UTF-8/JSON/ENUM field results in a lot of vec allocation (was: [Rust] Reading UTF-8/JSON/ENUM field results in a lot of vec allocation) > [Rust] [Parquet] Reading UTF-8/JSON/ENUM field results in a lot of vec > allocation > - > > Key: ARROW-7252 > URL: https://issues.apache.org/jira/browse/ARROW-7252 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Wong Shek Hei >Priority: Minor > > Reading a very large parquet file (430MB gzipped) with basically all string fields was > very slow. After profiling with osx instruments, I noticed > that a lot of time is spent in "convert_byte_array", in particular > "reserving" and allocating Vec::with_capacity, which is done before > String::from_utf8_unchecked. > It seems like using String as the underlying storage is causing this (String > uses Vec<u8> for its underlying storage); this also requires copying from the > slice into a Vec. > "Field::Str" is part of a pub enum, so I am not sure how "refactorable" the > String part is; for example, converting it into a &str (we can perhaps then defer > the conversion from &[u8] to Vec<u8> until the user really needs a String). > But of course, changing it to &str can result in quite a bit of interface > changes... So I am wondering if there are already some plans or solutions on > the way to improve the handling of the "Field::Str" case? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5153) [Rust] [Parquet] Use IntoIter trait for write_batch/write_mini_batch
[ https://issues.apache.org/jira/browse/ARROW-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-5153: -- Summary: [Rust] [Parquet] Use IntoIter trait for write_batch/write_mini_batch (was: [Rust] Use IntoIter trait for write_batch/write_mini_batch) > [Rust] [Parquet] Use IntoIter trait for write_batch/write_mini_batch > > > Key: ARROW-5153 > URL: https://issues.apache.org/jira/browse/ARROW-5153 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Xavier Lange >Priority: Major > > Writing data to a parquet file requires a lot of copying and intermediate Vec > creation. Take a record struct like: {{struct MyData { name: String, address: Option<String> }}} > Over the course of working sets of this data, you'll have the bulk data > Vec<MyData>, the names column in a Vec<&String>, the address column in a > Vec<Option<&String>>. This puts extra memory pressure on the system; at the > minimum we have to allocate a Vec the same size as the bulk data even if we > are using references. > What I'm proposing is to use an IntoIter style. This will maintain backward > compat as a slice automatically implements IntoIter. > ColumnWriterImpl#write_batch would go from "values: &[T::T]" to an > IntoIter-based "values". Then you can do things like > {{write_batch(bulk.iter().map(|x| x.name), None, None)}} > {{write_batch(bulk.iter().map(|x| x.address), Some(bulk.iter().map(|x| x.is_some())), None)}} > and you can see there's no need for an intermediate Vec, so no short-term > allocations to write out the data. > I am writing data with many columns and I think this would really help to > speed things up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
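The proposal above can be sketched with a simplified, std-only stand-in. This is not the real parquet `ColumnWriterImpl::write_batch` signature; `MyData` and the byte-counting "writer" are illustrative only. The point is that a generic `IntoIterator` bound lets callers stream column values straight off the bulk data without materialising an intermediate Vec.

```rust
// Simplified record type from the issue description (illustrative).
struct MyData {
    name: String,
}

// Sketch of the proposed signature change: accept any IntoIterator
// instead of a slice. Here "writing" just counts bytes written.
pub fn write_batch<I>(values: I) -> usize
where
    I: IntoIterator,
    I::Item: AsRef<str>,
{
    values.into_iter().map(|v| v.as_ref().len()).sum()
}

fn main() {
    let bulk = vec![
        MyData { name: "ab".to_string() },
        MyData { name: "cde".to_string() },
    ];
    // No intermediate Vec<&String>: the map adapter feeds the writer
    // lazily, one value at a time.
    let written = write_batch(bulk.iter().map(|x| x.name.as_str()));
    assert_eq!(written, 5);
}
```

Because slices implement `IntoIterator`, existing `&[T]` call sites would keep compiling, which is the backward-compatibility argument the issue makes.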
[jira] [Updated] (ARROW-4927) [Rust] Update top level README to describe current functionality
[ https://issues.apache.org/jira/browse/ARROW-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-4927: -- Fix Version/s: 1.0.0 > [Rust] Update top level README to describe current functionality > > > Key: ARROW-4927 > URL: https://issues.apache.org/jira/browse/ARROW-4927 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Affects Versions: 0.12.0 >Reporter: Andy Grove >Priority: Minor > Fix For: 1.0.0 > > > Update top level Rust README to reflect new functionality, such as SIMD, > cast, date/time, DataFusion, etc -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-4465) [Rust] [DataFusion] Add support for ORDER BY
[ https://issues.apache.org/jira/browse/ARROW-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127239#comment-17127239 ] Neville Dipale commented on ARROW-4465: --- [~andygrove] [~houqp] does ARROW-9005 completely cover this? > [Rust] [DataFusion] Add support for ORDER BY > > > Key: ARROW-4465 > URL: https://issues.apache.org/jira/browse/ARROW-4465 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Priority: Major > > As a user, I would like to be able to specify an ORDER BY clause on my query. > Work involved: > * Add OrderBy to LogicalPlan enum > * Write query planner code to translate SQL AST to OrderBy (SQL parser that > we use already supports parsing ORDER BY) > * Implement SortRelation > My high level thoughts on implementing the SortRelation: > * Create Arrow array of uint32 same size as batch and populate such that > each element contains its own index i.e. array will be 0, 1, 2, 3 > * Find a Rust crate for sorting that allows us to provide our own comparison > lambda > * Implement the comparison logic (probably can reuse existing execution code > - see filter.rs for how it implements comparison expressions) > * Use index array to store the result of the sort i.e. no need to rewrite > the whole batch, just the index > * Rewrite the batch after the sort has completed > It would also be good to see how Gandiva has implemented this > -- This message was sent by Atlassian Jira (v8.3.4#803005)
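The SortRelation outline quoted above (build an index array, sort the indices, gather by index rather than rewriting the batch during the sort) can be sketched with std only. The function names here are hypothetical, not DataFusion's API:

```rust
// Build an index array 0..n and sort the indices by the column values,
// leaving the column itself untouched.
pub fn sort_indices(column: &[i32]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..column.len()).collect();
    idx.sort_by_key(|&i| column[i]);
    idx
}

// Gather the column through the index array, i.e. rewrite the batch
// only once, after the sort has completed.
pub fn take(column: &[i32], indices: &[usize]) -> Vec<i32> {
    indices.iter().map(|&i| column[i]).collect()
}

fn main() {
    let col = vec![30, 10, 20];
    let idx = sort_indices(&col);
    assert_eq!(idx, vec![1, 2, 0]);
    assert_eq!(take(&col, &idx), vec![10, 20, 30]);
}
```

With multiple columns, the same index array would be applied to every column in the batch, which is why sorting indices rather than rows pays off.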
[jira] [Updated] (ARROW-6189) [Rust] [Parquet] Plain encoded boolean column chunks limited to 2048 values
[ https://issues.apache.org/jira/browse/ARROW-6189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-6189: -- Summary: [Rust] [Parquet] Plain encoded boolean column chunks limited to 2048 values (was: [Rust] Plain encoded boolean column chunks limited to 2048 values) > [Rust] [Parquet] Plain encoded boolean column chunks limited to 2048 values > --- > > Key: ARROW-6189 > URL: https://issues.apache.org/jira/browse/ARROW-6189 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.14.1 >Reporter: Simon Jones >Priority: Major > > encoding::PlainEncoder::new creates a BitWriter with 256 bytes of storage, > which limits the data page size that can be used. > I suggest that in > {{impl Encoder for PlainEncoder}} > the return value of put_value is tested and the BitWriter flushed+cleared > whenever it runs out of space. -- This message was sent by Atlassian Jira (v8.3.4#803005)
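The fix suggested in ARROW-6189 above (test `put_value`'s return value and flush + clear the `BitWriter` whenever it runs out of space) can be illustrated with a minimal stand-alone sketch. The types below are hypothetical simplifications, not the actual parquet-rs API:

```rust
// Fixed-capacity bit writer: 256 bytes of storage holds 2048
// boolean values, matching the limit described in the report.
struct BitWriter {
    buffer: Vec<u8>,
    capacity: usize,
    bit_len: usize,
}

impl BitWriter {
    fn new(capacity: usize) -> Self {
        Self { buffer: vec![0; capacity], capacity, bit_len: 0 }
    }

    // Returns false when out of space instead of silently dropping
    // values, so the caller can react.
    fn put_value(&mut self, bit: bool) -> bool {
        if self.bit_len >= self.capacity * 8 {
            return false;
        }
        if bit {
            self.buffer[self.bit_len / 8] |= 1 << (self.bit_len % 8);
        }
        self.bit_len += 1;
        true
    }

    // Flush the filled bytes to the sink and clear for reuse.
    fn flush_to(&mut self, sink: &mut Vec<u8>) {
        let bytes = (self.bit_len + 7) / 8;
        sink.extend_from_slice(&self.buffer[..bytes]);
        self.buffer.iter_mut().for_each(|b| *b = 0);
        self.bit_len = 0;
    }
}

// Encoder loop applying the suggested fix: when put_value reports
// the writer is full, flush + clear, then retry the value.
fn encode(values: &[bool], sink: &mut Vec<u8>) {
    let mut writer = BitWriter::new(256);
    for &v in values {
        if !writer.put_value(v) {
            writer.flush_to(sink);
            assert!(writer.put_value(v));
        }
    }
    writer.flush_to(sink);
}

fn main() {
    let mut sink = Vec::new();
    encode(&vec![true; 5000], &mut sink); // well past the 2048 limit
    assert_eq!(sink.len(), 625); // 256 + 256 + 113 bytes
}
```

With this loop, the data page size is no longer bounded by the writer's internal 256-byte buffer.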
[jira] [Updated] (ARROW-8993) [Rust] Support reading non-seekable sources in text readers
[ https://issues.apache.org/jira/browse/ARROW-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-8993: -- Fix Version/s: 1.0.0 > [Rust] Support reading non-seekable sources in text readers > --- > > Key: ARROW-8993 > URL: https://issues.apache.org/jira/browse/ARROW-8993 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Mohamed Zenadi >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > It would be interesting to be able to read already compressed json files. > This is regularly used, with many storing their files as json.gz (we do > the same). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-9047) [Rust] Setting 0-bits of a 0-length bitset segfaults
[ https://issues.apache.org/jira/browse/ARROW-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale resolved ARROW-9047. --- Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 7360 [https://github.com/apache/arrow/pull/7360] > [Rust] Setting 0-bits of a 0-length bitset segfaults > > > Key: ARROW-9047 > URL: https://issues.apache.org/jira/browse/ARROW-9047 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Max Burke >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > See PR for details -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-9047) [Rust] Setting 0-bits of a 0-length bitset segfaults
[ https://issues.apache.org/jira/browse/ARROW-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale reassigned ARROW-9047: - Assignee: Max Burke > [Rust] Setting 0-bits of a 0-length bitset segfaults > > > Key: ARROW-9047 > URL: https://issues.apache.org/jira/browse/ARROW-9047 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Affects Versions: 0.17.0 >Reporter: Max Burke >Assignee: Max Burke >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > See PR for details -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9047) [Rust] Setting 0-bits of a 0-length bitset segfaults
[ https://issues.apache.org/jira/browse/ARROW-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-9047: -- Component/s: Rust > [Rust] Setting 0-bits of a 0-length bitset segfaults > > > Key: ARROW-9047 > URL: https://issues.apache.org/jira/browse/ARROW-9047 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Affects Versions: 0.17.0 >Reporter: Max Burke >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > See PR for details -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8723) [Rust] Remove SIMD specific benchmark code
[ https://issues.apache.org/jira/browse/ARROW-8723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale resolved ARROW-8723. --- Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 7359 [https://github.com/apache/arrow/pull/7359] > [Rust] Remove SIMD specific benchmark code > -- > > Key: ARROW-8723 > URL: https://issues.apache.org/jira/browse/ARROW-8723 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Paddy Horan >Assignee: Paddy Horan >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Now that SIMD is behind a feature flag it's trivial to compare SIMD vs > non-SIMD and the SIMD versions of benchmarks can be removed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8723) [Rust] Remove SIMD specific benchmark code
[ https://issues.apache.org/jira/browse/ARROW-8723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-8723: -- Affects Version/s: 0.17.0 > [Rust] Remove SIMD specific benchmark code > -- > > Key: ARROW-8723 > URL: https://issues.apache.org/jira/browse/ARROW-8723 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Affects Versions: 0.17.0 >Reporter: Paddy Horan >Assignee: Paddy Horan >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Now that SIMD is behind a feature flag it's trivial to compare SIMD vs > non-SIMD and the SIMD versions of benchmarks can be removed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9047) [Rust] Setting 0-bits of a 0-length bitset segfaults
[ https://issues.apache.org/jira/browse/ARROW-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-9047: -- Affects Version/s: 0.17.0 > [Rust] Setting 0-bits of a 0-length bitset segfaults > > > Key: ARROW-9047 > URL: https://issues.apache.org/jira/browse/ARROW-9047 > Project: Apache Arrow > Issue Type: Improvement >Affects Versions: 0.17.0 >Reporter: Max Burke >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > See PR for details -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9007) [Rust] Support appending arrays by merging array data
[ https://issues.apache.org/jira/browse/ARROW-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-9007: -- Fix Version/s: 1.0.0 > [Rust] Support appending arrays by merging array data > - > > Key: ARROW-9007 > URL: https://issues.apache.org/jira/browse/ARROW-9007 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Affects Versions: 0.17.0 >Reporter: Neville Dipale >Assignee: Neville Dipale >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > ARROW-9005 introduces a concat kernel which allows for concatenating multiple > arrays of the same type into a single array. This is useful for sorting on > multiple arrays, among other things. > The concat kernel is implemented for most array types, but not yet for nested > arrays (lists, structs, etc). > This Jira is for creating a way of appending/merging all array types, so that > concat (and functionality that depends on it) can support all array types. -- This message was sent by Atlassian Jira (v8.3.4#803005)
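The append/merge described in ARROW-9007 above can be sketched in plain Rust with a deliberately simplified array representation (hypothetical, not the arrow crate's `ArrayData`): an array is a value buffer plus a validity vector, and concatenation merges both so nulls are preserved:

```rust
// Simplified stand-in for an Arrow primitive array: values plus a
// validity vector (true = valid, false = null).
#[derive(Debug, PartialEq)]
struct Int32Array {
    values: Vec<i32>,
    validity: Vec<bool>,
}

// Concatenate multiple arrays of the same type into one array by
// merging their underlying buffers in order.
fn concat(arrays: &[Int32Array]) -> Int32Array {
    let mut values = Vec::new();
    let mut validity = Vec::new();
    for a in arrays {
        values.extend_from_slice(&a.values);
        validity.extend_from_slice(&a.validity);
    }
    Int32Array { values, validity }
}

fn main() {
    let a = Int32Array { values: vec![1, 2], validity: vec![true, false] };
    let b = Int32Array { values: vec![3], validity: vec![true] };
    let c = concat(&[a, b]);
    assert_eq!(c.values, vec![1, 2, 3]);
    assert_eq!(c.validity, vec![true, false, true]);
}
```

The harder part tracked by the Jira is nested types (lists, structs), where child arrays and offset buffers must be merged recursively rather than simply appended; the sketch above covers only the flat-primitive case.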
[jira] [Assigned] (ARROW-9007) [Rust] Support appending arrays by merging array data
[ https://issues.apache.org/jira/browse/ARROW-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale reassigned ARROW-9007: - Assignee: Neville Dipale > [Rust] Support appending arrays by merging array data > - > > Key: ARROW-9007 > URL: https://issues.apache.org/jira/browse/ARROW-9007 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Affects Versions: 0.17.0 >Reporter: Neville Dipale >Assignee: Neville Dipale >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > ARROW-9005 introduces a concat kernel which allows for concatenating multiple > arrays of the same type into a single array. This is useful for sorting on > multiple arrays, among other things. > The concat kernel is implemented for most array types, but not yet for nested > arrays (lists, structs, etc). > This Jira is for creating a way of appending/merging all array types, so that > concat (and functionality that depends on it) can support all array types. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8993) [Rust] Support reading non-seekable sources in text readers
[ https://issues.apache.org/jira/browse/ARROW-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale updated ARROW-8993: -- Summary: [Rust] Support reading non-seekable sources in text readers (was: [Rust] Support gzipped json files) > [Rust] Support reading non-seekable sources in text readers > --- > > Key: ARROW-8993 > URL: https://issues.apache.org/jira/browse/ARROW-8993 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Mohamed Zenadi >Priority: Minor > Labels: pull-request-available > Time Spent: 3.5h > Remaining Estimate: 0h > > It would be interesting to be able to read already compressed json files. > This is regularly used, with many storing their files as json.gz (we do > the same). -- This message was sent by Atlassian Jira (v8.3.4#803005)
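The renamed ARROW-8993 above generalizes "gzipped json" to any non-seekable source. The key design point is that a text reader generic over `std::io::Read` alone (no `Seek` bound) accepts a streaming decompressor just as easily as a file. A minimal sketch, using an in-memory cursor in place of a real gzip decoder (e.g. flate2's `GzDecoder`, which also implements `Read`):

```rust
use std::io::{BufRead, BufReader, Read};

// Read line-delimited records from any source that implements Read.
// Because there is no Seek bound, a non-seekable stream (socket,
// pipe, gzip decoder) works the same as a plain file.
fn read_lines<R: Read>(source: R) -> std::io::Result<Vec<String>> {
    let mut lines = Vec::new();
    for line in BufReader::new(source).lines() {
        lines.push(line?);
    }
    Ok(lines)
}

fn main() -> std::io::Result<()> {
    // Any `Read` slots in here; a decompressing reader would too.
    let data = std::io::Cursor::new(b"{\"a\":1}\n{\"a\":2}\n".to_vec());
    let lines = read_lines(data)?;
    assert_eq!(lines.len(), 2);
    Ok(())
}
```

The trade-off is that features which require rewinding (such as two-pass schema inference) need a different strategy on non-seekable sources, e.g. buffering the sampled records.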
[jira] [Resolved] (ARROW-8906) [Rust] Support reading multiple CSV files for schema inference
[ https://issues.apache.org/jira/browse/ARROW-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale resolved ARROW-8906. --- Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 7252 [https://github.com/apache/arrow/pull/7252] > [Rust] Support reading multiple CSV files for schema inference > -- > > Key: ARROW-8906 > URL: https://issues.apache.org/jira/browse/ARROW-8906 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: QP Hou >Assignee: QP Hou >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)