[jira] [Created] (ARROW-8881) [Rust] Add large list and binary support
Neville Dipale created ARROW-8881: - Summary: [Rust] Add large list and binary support Key: ARROW-8881 URL: https://issues.apache.org/jira/browse/ARROW-8881 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 0.17.0 Reporter: Neville Dipale Rust does not yet support large lists and large binary arrays. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8883) [Rust] [Integration Testing] Disable unsupported tests
Neville Dipale created ARROW-8883: - Summary: [Rust] [Integration Testing] Disable unsupported tests Key: ARROW-8883 URL: https://issues.apache.org/jira/browse/ARROW-8883 Project: Apache Arrow Issue Type: Sub-task Components: Integration, Rust Affects Versions: 0.17.0 Reporter: Neville Dipale Some of the integration test failures can be avoided by disabling tests for unsupported features, such as large lists and nested types.
[jira] [Created] (ARROW-9007) [Rust] Support appending arrays by merging array data
Neville Dipale created ARROW-9007: - Summary: [Rust] Support appending arrays by merging array data Key: ARROW-9007 URL: https://issues.apache.org/jira/browse/ARROW-9007 Project: Apache Arrow Issue Type: New Feature Components: Rust Affects Versions: 0.17.0 Reporter: Neville Dipale ARROW-9005 introduces a concat kernel which allows for concatenating multiple arrays of the same type into a single array. This is useful for sorting on multiple arrays, among other things. The concat kernel is implemented for most array types, but not yet for nested arrays (lists, structs, etc). This Jira is for creating a way of appending/merging all array types, so that concat (and functionality that depends on it) can support all array types.
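As a rough std-only illustration of why nested types need more than buffer concatenation, the sketch below appends list arrays. `ListArray` here is a hypothetical simplification (i32 offsets, i64 child values, no validity bitmap), not the arrow crate's type: child values can be copied across, but each input's offsets must be rebased onto the running child length.

```rust
/// Hypothetical, simplified list array: `offsets` has len + 1 entries and
/// slot `i` covers `values[offsets[i]..offsets[i + 1]]`. No null bitmap.
#[derive(Debug, Clone, PartialEq)]
pub struct ListArray {
    pub offsets: Vec<i32>,
    pub values: Vec<i64>,
}

impl ListArray {
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    pub fn value(&self, i: usize) -> &[i64] {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        &self.values[start..end]
    }
}

/// Concatenates list arrays by appending child values and rebasing offsets.
pub fn concat_lists(arrays: &[&ListArray]) -> ListArray {
    let mut offsets = vec![0i32];
    let mut values = Vec::new();
    for arr in arrays {
        // Copy only the child range the offsets reference, so inputs whose
        // offsets do not start at zero (e.g. sliced arrays) also work.
        let first = arr.offsets[0];
        let last = *arr.offsets.last().unwrap();
        let base = values.len() as i32;
        values.extend_from_slice(&arr.values[first as usize..last as usize]);
        for &o in &arr.offsets[1..] {
            offsets.push(base + (o - first));
        }
    }
    ListArray { offsets, values }
}
```

The same rebasing would apply recursively for lists of lists; a struct would instead concatenate each child column and its own validity bitmap.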
[jira] [Created] (ARROW-9053) [Rust] Add sort for lists and structs
Neville Dipale created ARROW-9053: - Summary: [Rust] Add sort for lists and structs Key: ARROW-9053 URL: https://issues.apache.org/jira/browse/ARROW-9053 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale
[jira] [Created] (ARROW-9095) [Rust] Fix NullArray to comply with spec
Neville Dipale created ARROW-9095: - Summary: [Rust] Fix NullArray to comply with spec Key: ARROW-9095 URL: https://issues.apache.org/jira/browse/ARROW-9095 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 0.17.0 Reporter: Neville Dipale When I implemented the NullArray, I didn't follow the spec, on the premise that I'd handle reading and writing IPC in a spec-compliant way, as that looked like the easier approach. After some integration testing, I realised that I wasn't doing it correctly, so it's better to comply with the spec by not allocating any buffers for the array.
[jira] [Created] (ARROW-6650) [Rust] [Integration] Add method to generate JSON from RecordBatch
Neville Dipale created ARROW-6650: - Summary: [Rust] [Integration] Add method to generate JSON from RecordBatch Key: ARROW-6650 URL: https://issues.apache.org/jira/browse/ARROW-6650 Project: Apache Arrow Issue Type: Sub-task Components: Integration, Rust Affects Versions: 0.14.1 Reporter: Neville Dipale [~emkornfi...@gmail.com] recommended that we use the integration IPC files. To be able to compare against the JSON files that are used, we need to be able to generate a JSON representation of Arrow data in Rust. We can already do this for schemas, and this ticket is for supporting converting RecordBatch to JSON.
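For reference, a primitive column in the integration JSON files is, as far as we understand the format, an object with `name`, `count`, `VALIDITY` (1 for valid, 0 for null) and `DATA` fields. The sketch below renders one nullable Int32 column that way using only the standard library. `int32_column_to_json` is a hypothetical helper: it does not escape field names, and it writes 0 into `DATA` for null slots.

```rust
/// Hypothetical helper: render a nullable Int32 column in the layout used by
/// primitive columns in the integration JSON files.
pub fn int32_column_to_json(name: &str, values: &[Option<i32>]) -> String {
    // One validity flag per slot: 1 = valid, 0 = null.
    let validity: Vec<&str> = values
        .iter()
        .map(|v| if v.is_some() { "1" } else { "0" })
        .collect();
    // Null slots still occupy a position in DATA; 0 is used as a placeholder.
    let data: Vec<String> = values
        .iter()
        .map(|v| v.unwrap_or(0).to_string())
        .collect();
    format!(
        r#"{{"name":"{}","count":{},"VALIDITY":[{}],"DATA":[{}]}}"#,
        name,
        values.len(),
        validity.join(","),
        data.join(",")
    )
}
```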
[jira] [Created] (ARROW-6928) [Rust] Add FixedSizeList type
Neville Dipale created ARROW-6928: - Summary: [Rust] Add FixedSizeList type Key: ARROW-6928 URL: https://issues.apache.org/jira/browse/ARROW-6928 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale Support FixedSizeList, which is required for integration testing
[jira] [Created] (ARROW-6944) [Rust] Add StringType
Neville Dipale created ARROW-6944: - Summary: [Rust] Add StringType Key: ARROW-6944 URL: https://issues.apache.org/jira/browse/ARROW-6944 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale Create a separate String type which uses UTF8, and restrict the BinaryArray to opaque binary data
[jira] [Created] (ARROW-7193) [Rust] Create Arrow stream reader
Neville Dipale created ARROW-7193: - Summary: [Rust] Create Arrow stream reader Key: ARROW-7193 URL: https://issues.apache.org/jira/browse/ARROW-7193 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale
[jira] [Created] (ARROW-7194) [Rust] CSV Writer causing recursion errors
Neville Dipale created ARROW-7194: - Summary: [Rust] CSV Writer causing recursion errors Key: ARROW-7194 URL: https://issues.apache.org/jira/browse/ARROW-7194 Project: Apache Arrow Issue Type: Bug Components: Rust Reporter: Neville Dipale As reported in [https://github.com/apache/arrow/pull/5805], the CSV writer's use of std::io::Write is causing recursion issues.
[jira] [Created] (ARROW-7207) [Rust] Update Generated Flatbuffer Files
Neville Dipale created ARROW-7207: - Summary: [Rust] Update Generated Flatbuffer Files Key: ARROW-7207 URL: https://issues.apache.org/jira/browse/ARROW-7207 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale We last built the fbs files early in the year, and since then there have been some changes like LargeLists. We should update the generated Rust files to incorporate these changes
[jira] [Created] (ARROW-7324) [Rust] Add Timezone to Timestamp
Neville Dipale created ARROW-7324: - Summary: [Rust] Add Timezone to Timestamp Key: ARROW-7324 URL: https://issues.apache.org/jira/browse/ARROW-7324 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale Proposal to add a timezone to the timestamp type
[jira] [Created] (ARROW-7364) [Rust] Add cast options to cast kernel
Neville Dipale created ARROW-7364: - Summary: [Rust] Add cast options to cast kernel Key: ARROW-7364 URL: https://issues.apache.org/jira/browse/ARROW-7364 Project: Apache Arrow Issue Type: Improvement Reporter: Neville Dipale The cast kernels currently do not take explicit options, but instead convert overflows and invalid UTF-8 to nulls. We can create options that customise this behaviour, similar to CastOptions in CPP ([https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/cast.h#L38])
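A minimal sketch of what explicit options could look like, assuming a single `safe` flag that chooses between the current null-on-overflow behaviour and returning an error. `CastOptions` and `cast_i64_to_i32` below are hypothetical std-only stand-ins, not the arrow crate's API.

```rust
/// Hypothetical cast options.
#[derive(Debug, Clone, Copy)]
pub struct CastOptions {
    /// true: values that do not fit become nulls (current behaviour).
    /// false: the first value that does not fit produces an error.
    pub safe: bool,
}

/// Casts nullable i64 values to i32 according to `options`.
pub fn cast_i64_to_i32(
    values: &[Option<i64>],
    options: CastOptions,
) -> Result<Vec<Option<i32>>, String> {
    values
        .iter()
        .map(|v| match *v {
            // Nulls stay null regardless of the options.
            None => Ok(None),
            Some(x) if x >= i32::MIN as i64 && x <= i32::MAX as i64 => Ok(Some(x as i32)),
            Some(_) if options.safe => Ok(None),
            Some(x) => Err(format!("value {} cannot be represented as i32", x)),
        })
        .collect()
}
```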
[jira] [Created] (ARROW-7460) [Rust] Improve arithmetic kernels with autovec
Neville Dipale created ARROW-7460: - Summary: [Rust] Improve arithmetic kernels with autovec Key: ARROW-7460 URL: https://issues.apache.org/jira/browse/ARROW-7460 Project: Apache Arrow Issue Type: Improvement Components: Rust Affects Versions: 0.15.1 Reporter: Neville Dipale In a comment to an open ticket for optimising a cast kernel by using SIMD, [~andy-thomason] mentioned that LLVM does autovec well for Rust. I'd like to explore whether we could improve the kernel performance by simplifying the loops enough to allow the compiler to vectorise.
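The kind of loop that tends to vectorise well has no per-element bounds checks or branches. A hedged sketch with hypothetical names (`add_f64`, `and_validity`): iterating over zipped slices instead of indexing lets the compiler drop bounds checks, and nulls are handled by AND-ing the validity bitmaps separately rather than branching inside the arithmetic loop. Whether LLVM actually emits SIMD instructions depends on the target and optimisation level, so this would need checking with benchmarks or by inspecting the generated assembly.

```rust
/// Element-wise addition without indexing, so no bounds checks remain in the
/// loop body; a typical candidate for auto-vectorisation in release builds.
pub fn add_f64(left: &[f64], right: &[f64], out: &mut [f64]) {
    assert!(left.len() == right.len() && left.len() == out.len());
    for ((o, l), r) in out.iter_mut().zip(left).zip(right) {
        *o = l + r;
    }
}

/// Combines two validity bitmaps byte by byte: a slot is valid only if it is
/// valid in both inputs. Keeps null handling out of the arithmetic loop.
pub fn and_validity(left: &[u8], right: &[u8]) -> Vec<u8> {
    left.iter().zip(right).map(|(l, r)| l & r).collect()
}
```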
[jira] [Created] (ARROW-7475) [Rust] Create Arrow Stream writer
Neville Dipale created ARROW-7475: - Summary: [Rust] Create Arrow Stream writer Key: ARROW-7475 URL: https://issues.apache.org/jira/browse/ARROW-7475 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale
[jira] [Created] (ARROW-7521) [Rust] Remove tuple on FixedSizeList datatype
Neville Dipale created ARROW-7521: - Summary: [Rust] Remove tuple on FixedSizeList datatype Key: ARROW-7521 URL: https://issues.apache.org/jira/browse/ARROW-7521 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Neville Dipale The FixedSizeList datatype takes a tuple of `Box<DataType>` and length, but this could be simplified to take the two values without a tuple.
[jira] [Created] (ARROW-7620) [Rust] Windows builds failing due to flatbuffer compile error
Neville Dipale created ARROW-7620: - Summary: [Rust] Windows builds failing due to flatbuffer compile error Key: ARROW-7620 URL: https://issues.apache.org/jira/browse/ARROW-7620 Project: Apache Arrow Issue Type: Bug Components: Rust Reporter: Neville Dipale I've now noticed on a few PRs whose tests should otherwise pass that the Rust Windows tests are failing due to `*_generated.rs` not being found while trying to rename the generated flatbuffer files. An example is at [https://github.com/apache/arrow/pull/6227/checks?check_run_id=397505832]:
{code}
+ flatc --rust -o arrow/src/ipc/gen/ ../format/File.fbs ../format/Message.fbs ../format/Schema.fbs ../format/SparseTensor.fbs ../format/Tensor.fbs
+ find arrow/src/ipc/gen/ -name '*_generated.rs' -exec sed -i s/type__type/type_type/g '{}' ';'
File not found - *_generated.rs
{code}
[jira] [Created] (ARROW-7704) [Rust] Support sort
Neville Dipale created ARROW-7704: - Summary: [Rust] Support sort Key: ARROW-7704 URL: https://issues.apache.org/jira/browse/ARROW-7704 Project: Apache Arrow Issue Type: New Feature Components: Rust Reporter: Neville Dipale This lays out the work needed to support sorting arrays and record batches
[jira] [Created] (ARROW-7705) [Rust] Initial sort implementation
Neville Dipale created ARROW-7705: - Summary: [Rust] Initial sort implementation Key: ARROW-7705 URL: https://issues.apache.org/jira/browse/ARROW-7705 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale An initial sort implementation that allows sorting an array by various options (e.g. sort order). This is mainly to iterate on the design and inner workings of a sort algorithm.
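One possible shape for a first implementation, as a std-only sketch: sort a nullable column to a permutation of indices under hypothetical `SortOptions` (sort order plus null placement). Returning indices rather than a sorted copy means the same permutation could later reorder the other columns of a record batch. The names and defaults here are assumptions for illustration.

```rust
use std::cmp::Ordering;

/// Hypothetical sort options. `Default` here means ascending, nulls last.
#[derive(Debug, Clone, Copy, Default)]
pub struct SortOptions {
    pub descending: bool,
    pub nulls_first: bool,
}

/// Returns the indices that would sort `values`; the input is not modified.
/// `sort_by` is stable, so equal values keep their original relative order.
pub fn sort_to_indices(values: &[Option<i32>], options: SortOptions) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..values.len()).collect();
    indices.sort_by(|&a, &b| match (values[a], values[b]) {
        (None, None) => Ordering::Equal,
        // Null placement is independent of the sort direction.
        (None, Some(_)) => if options.nulls_first { Ordering::Less } else { Ordering::Greater },
        (Some(_), None) => if options.nulls_first { Ordering::Greater } else { Ordering::Less },
        (Some(x), Some(y)) => {
            let ord = x.cmp(&y);
            if options.descending { ord.reverse() } else { ord }
        }
    });
    indices
}
```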
[jira] [Created] (ARROW-7924) [Rust] Add sort for float types
Neville Dipale created ARROW-7924: - Summary: [Rust] Add sort for float types Key: ARROW-7924 URL: https://issues.apache.org/jira/browse/ARROW-7924 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale Floats need a different sort approach from other primitives, so this ticket will implement float sorting separately.
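The underlying issue is that `f64` only implements `PartialOrd`: NaN is neither less than, equal to, nor greater than any value, so `slice::sort` cannot be used directly. One option, sketched below, is `f64::total_cmp` (stable since Rust 1.62, which postdates this ticket), which defines a total order over all floats including NaNs. `sort_f64` is a hypothetical name.

```rust
/// Sorts f64 values in place using the IEEE 754 totalOrder predicate.
/// NaNs with the sign bit clear sort after +infinity, NaNs with the sign bit
/// set sort before -infinity, and -0.0 sorts before 0.0.
pub fn sort_f64(values: &mut [f64]) {
    values.sort_by(|a, b| a.total_cmp(b));
}
```

A kernel might instead want NaNs grouped with nulls, which would need a custom comparator on top of this.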
[jira] [Created] (ARROW-8308) [Rust] [Flight] Implement DoExchange on examples
Neville Dipale created ARROW-8308: - Summary: [Rust] [Flight] Implement DoExchange on examples Key: ARROW-8308 URL: https://issues.apache.org/jira/browse/ARROW-8308 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Neville Dipale The gRPC server examples in Rust require all trait members to be exhaustively implemented. The recently added `DoExchange` endpoint in the Flight service is causing failures in Rust.
[jira] [Created] (ARROW-4449) [Rust] Convert File to T: Read + Seek for schema inference
Neville Dipale created ARROW-4449: - Summary: [Rust] Convert File to T: Read + Seek for schema inference Key: ARROW-4449 URL: https://issues.apache.org/jira/browse/ARROW-4449 Project: Apache Arrow Issue Type: New Feature Components: Rust Affects Versions: 0.12.0 Reporter: Neville Dipale Assignee: Neville Dipale ARROW-4376 allowed us to read CSV from a record iterator. We still require a `File` when inferring schemas. We propose changing from a `File` to something more generic. See discussion: https://github.com/apache/arrow/pull/3508#issuecomment-457986171 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
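A sketch of why `Read + Seek` is enough: inference only needs to read ahead and then rewind, which any seekable source (a `File`, or a `std::io::Cursor` in tests) supports. `infer_column_count` is a hypothetical, simplified stand-in that only counts comma-separated fields; it does no quoting or type inference.

```rust
use std::io::{BufRead, BufReader, Read, Seek, SeekFrom};

/// Hypothetical helper: look at up to `max_records` lines to find the widest
/// row, then rewind so the caller can read the data from the start.
pub fn infer_column_count<R: Read + Seek>(reader: &mut R, max_records: usize) -> std::io::Result<usize> {
    let mut max_cols: usize = 0;
    {
        // Borrow the reader only for the duration of inference.
        let buf = BufReader::new(&mut *reader);
        for line in buf.lines().take(max_records) {
            let line = line?;
            max_cols = max_cols.max(line.split(',').count());
        }
    }
    // BufReader may have read past the inspected lines, so always seek back.
    reader.seek(SeekFrom::Start(0))?;
    Ok(max_cols)
}
```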
[jira] [Created] (ARROW-4463) [Rust] Support read:write of Feather files
Neville Dipale created ARROW-4463: - Summary: [Rust] Support read:write of Feather files Key: ARROW-4463 URL: https://issues.apache.org/jira/browse/ARROW-4463 Project: Apache Arrow Issue Type: New Feature Components: Rust Affects Versions: 0.12.0 Reporter: Neville Dipale As an Arrow developer/user, I'd like to be able to read and write Feather files. The current I/O story in Rust isn't great: we don't yet fully support reading from and writing to Parquet, and we can read CSV but not yet write it. This is an inconvenience (at least for me). I propose supporting the Feather format in Rust, initially with the following limitations:
* No date/time support until ARROW-4386 (and potentially more work) lands
* Reading categorical data (from other languages) but not writing it
* Reading and writing from and to single record batches only, as we don't yet support slicing of arrays (ARROW-3954)
If the above are accept(ed|able), we can enhance the Feather support as the dependencies behind these limitations land. We can also refactor the Feather code as we work on more IPC in Rust.
[jira] [Created] (ARROW-4534) [Rust] Build JSON reader for reading record batches from line-delimited JSON files
Neville Dipale created ARROW-4534: - Summary: [Rust] Build JSON reader for reading record batches from line-delimited JSON files Key: ARROW-4534 URL: https://issues.apache.org/jira/browse/ARROW-4534 Project: Apache Arrow Issue Type: New Feature Components: Rust Affects Versions: 0.12.0 Reporter: Neville Dipale Similar to ARROW-694, this is an umbrella issue for supporting reading line-delimited JSON files in Arrow. I have a reference implementation at [https://github.com/nevi-me/rust-dataframe/blob/io/json/src/io/json.rs], where I'm building a Rust-based dataframe library using Arrow. I'd like us to have feature parity with CPP at some point.
[jira] [Created] (ARROW-4540) [Rust] Add basic JSON reader
Neville Dipale created ARROW-4540: - Summary: [Rust] Add basic JSON reader Key: ARROW-4540 URL: https://issues.apache.org/jira/browse/ARROW-4540 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale This is the first step in getting a JSON reader working in Rust
[jira] [Created] (ARROW-4544) [Rust] Read nested JSON structs into StructArrays
Neville Dipale created ARROW-4544: - Summary: [Rust] Read nested JSON structs into StructArrays Key: ARROW-4544 URL: https://issues.apache.org/jira/browse/ARROW-4544 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale _Adding this as a separate task as it's a bit involved._ Add the ability to read in JSON structs that are children of the JSON record being read. The main concern here is deeply nested structures, which will require a performant and reusable basic JSON reader before dealing with recursion.
[jira] [Created] (ARROW-4556) [Rust] Preserve order of JSON inferred schema
Neville Dipale created ARROW-4556: - Summary: [Rust] Preserve order of JSON inferred schema Key: ARROW-4556 URL: https://issues.apache.org/jira/browse/ARROW-4556 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale serde_json has the ability to preserve the order of keys in the JSON objects it reads (its `preserve_order` feature). This feature might be necessary to ensure that schema inference returns fields in a consistent order each time. I'd like to add it separately as I'd also need to update JSON tests in datatypes.
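Without `preserve_order`, serde_json's `Map` is backed by a `BTreeMap`, so keys come back sorted alphabetically rather than in document order. Assuming each record's keys are available in document order, the std-only sketch below keeps fields in first-seen order across records; `infer_field_order` is a hypothetical helper.

```rust
use std::collections::HashSet;

/// Collects field names in the order they are first seen across records.
/// Each record is given as its keys in document order.
pub fn infer_field_order(records: &[Vec<&str>]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut fields = Vec::new();
    for record in records {
        for &key in record {
            // `insert` returns true only the first time a key is seen.
            if seen.insert(key) {
                fields.push(key.to_string());
            }
        }
    }
    fields
}
```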
[jira] [Created] (ARROW-4680) [CI] [Rust] Travis CI builds fail with latest Rust 1.34.0-nightly (2019-02-25)
Neville Dipale created ARROW-4680: - Summary: [CI] [Rust] Travis CI builds fail with latest Rust 1.34.0-nightly (2019-02-25) Key: ARROW-4680 URL: https://issues.apache.org/jira/browse/ARROW-4680 Project: Apache Arrow Issue Type: Bug Components: Continuous Integration, Rust Reporter: Neville Dipale There's an unstable feature that's now marked for stabilisation in 1.34, and as a result Travis builds are failing. This is affecting all PRs that have been created or updated since 26 Feb 2019. AppVeyor only emits failures.
[jira] [Created] (ARROW-4769) [Rust] Improve array limit function where max records > len
Neville Dipale created ARROW-4769: - Summary: [Rust] Improve array limit function where max records > len Key: ARROW-4769 URL: https://issues.apache.org/jira/browse/ARROW-4769 Project: Apache Arrow Issue Type: Improvement Components: Rust Affects Versions: 0.12.0 Reporter: Neville Dipale When we have an array of n records and we want to take a limit that's higher than or equal to n, we still iterate through the array values and create a new array. We could improve this by returning a copy of the array as-is.
[jira] [Created] (ARROW-4804) [Rust] Read temporal values from CSV
Neville Dipale created ARROW-4804: - Summary: [Rust] Read temporal values from CSV Key: ARROW-4804 URL: https://issues.apache.org/jira/browse/ARROW-4804 Project: Apache Arrow Issue Type: New Feature Components: Rust Affects Versions: 0.12.0 Reporter: Neville Dipale The CSV reader should support reading temporal values. It should support timestamp, date and time, with sane defaults provided for schema inference. To keep inference performant, the user should provide a Vec of the columns to try converting to temporal arrays.
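Arrow's Date32 type stores days since the UNIX epoch, so reading a date column amounts to parsing each cell and converting a civil date to a day count. The sketch below assumes ISO-style `YYYY-MM-DD` cells and uses Howard Hinnant's well-known "days from civil" algorithm; `parse_date32` is a hypothetical name, and it only loosely range-checks the day of month (it would accept 2019-02-31).

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's
/// days_from_civil). Years are shifted so that each era starts in March.
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400; // year of era, [0, 399]
    let mp = (m + 9) % 12; // month index with March = 0
    let doy = (153 * mp + 2) / 5 + d - 1; // day of year, [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Hypothetical parser for a CSV cell like "2019-03-08" into a Date32-style
/// value. Returns None if the cell does not look like YYYY-MM-DD.
pub fn parse_date32(s: &str) -> Option<i32> {
    let mut parts = s.trim().splitn(3, '-');
    let y = parts.next()?.parse::<i64>().ok()?;
    let m = parts.next()?.parse::<i64>().ok()?;
    let d = parts.next()?.parse::<i64>().ok()?;
    if !(1..=12).contains(&m) || !(1..=31).contains(&d) {
        return None;
    }
    Some(days_from_civil(y, m, d) as i32)
}
```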
[jira] [Created] (ARROW-4803) [Rust] Read temporal values from JSON
Neville Dipale created ARROW-4803: - Summary: [Rust] Read temporal values from JSON Key: ARROW-4803 URL: https://issues.apache.org/jira/browse/ARROW-4803 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 0.12.0 Reporter: Neville Dipale Ability to parse strings that look like timestamps into the timestamp type. We need to consider whether only the timestamp type should be supported, as most JSON libraries stick to ISO8601. It might also be inefficient to use regex for timestamps, so the user should provide a hint of which columns to convert to timestamps.
[jira] [Created] (ARROW-4806) [Rust] Support casting temporal arrays in cast kernels
Neville Dipale created ARROW-4806: - Summary: [Rust] Support casting temporal arrays in cast kernels Key: ARROW-4806 URL: https://issues.apache.org/jira/browse/ARROW-4806 Project: Apache Arrow Issue Type: New Feature Components: Rust Affects Versions: 0.12.0 Reporter: Neville Dipale [ARROW-3882] is too far along in the review process to add temporal casts to it.
[jira] [Created] (ARROW-4805) [Rust] Write temporal arrays to CSV
Neville Dipale created ARROW-4805: - Summary: [Rust] Write temporal arrays to CSV Key: ARROW-4805 URL: https://issues.apache.org/jira/browse/ARROW-4805 Project: Apache Arrow Issue Type: New Feature Components: Rust Affects Versions: 0.12.0 Reporter: Neville Dipale The CSV writer should start supporting writing temporal arrays back to disk. To be consistent with norms, we should look at what other libraries do for date and time where the resolution is finer than seconds, and potentially deal with the following:
* Is there optionality in how dates are written, or should it always be DD/MM/.
* Should / or - be used?
* Should time types be written as HH:MM:SS.ms, or 12345ms, 12345us, 12345ns?
* Should timestamps always be written in the ISO8601 JSON-like format?
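As an illustration of the `HH:MM:SS.ms` option for time types, the hypothetical `format_time32_ms` below formats a millisecond time-of-day value (as stored by a Time32 millisecond column) and rejects values outside a single day.

```rust
/// Hypothetical formatter: milliseconds since midnight -> "HH:MM:SS.mmm".
pub fn format_time32_ms(ms: i32) -> Option<String> {
    if !(0..86_400_000).contains(&ms) {
        return None;
    }
    let (h, rem) = (ms / 3_600_000, ms % 3_600_000);
    let (m, rem) = (rem / 60_000, rem % 60_000);
    let (s, frac) = (rem / 1_000, rem % 1_000);
    Some(format!("{:02}:{:02}:{:02}.{:03}", h, m, s, frac))
}
```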
[jira] [Created] (ARROW-4853) [Rust] Array slice doesn't work on ListArray and StructArray
Neville Dipale created ARROW-4853: - Summary: [Rust] Array slice doesn't work on ListArray and StructArray Key: ARROW-4853 URL: https://issues.apache.org/jira/browse/ARROW-4853 Project: Apache Arrow Issue Type: Bug Components: Rust Reporter: Neville Dipale -ARROW-3954- added the ability to slice arrays. It's been implemented on the Array trait, so callers might expect it to also work on ListArray and StructArray. It looks like for ListArray, the offset buffer is sliced, but the child_data buffer is not modified. This leads to an assertion failure.
[jira] [Created] (ARROW-4854) [Rust] Use Array Slice for limit kernel
Neville Dipale created ARROW-4854: - Summary: [Rust] Use Array Slice for limit kernel Key: ARROW-4854 URL: https://issues.apache.org/jira/browse/ARROW-4854 Project: Apache Arrow Issue Type: Improvement Components: Rust Affects Versions: 0.13.0 Reporter: Neville Dipale We currently reconstruct an array when taking a limit from it; we can improve performance by using slice from ARROW-3954.
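A std-only sketch of why slicing makes limit cheap: an array can be modelled as a shared buffer plus an offset and length, so a slice only clones a reference-counted pointer, and clamping `n` to the length also covers the `n >= len` case from ARROW-4769. `Int32View` is hypothetical and not the arrow crate's `ArrayData`.

```rust
use std::sync::Arc;

/// Hypothetical array view: a shared buffer plus an offset and length.
#[derive(Debug, Clone)]
pub struct Int32View {
    buffer: Arc<Vec<i32>>,
    offset: usize,
    len: usize,
}

impl Int32View {
    pub fn new(values: Vec<i32>) -> Self {
        let len = values.len();
        Int32View { buffer: Arc::new(values), offset: 0, len }
    }

    /// O(1): shares the buffer and only adjusts offset/length.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Int32View { buffer: Arc::clone(&self.buffer), offset: self.offset + offset, len }
    }

    pub fn values(&self) -> &[i32] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Limit as a slice; `n >= len` returns a view of the whole array.
    pub fn limit(&self, n: usize) -> Self {
        self.slice(0, n.min(self.len))
    }
}
```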
[jira] [Created] (ARROW-4865) [Rust] Support casting lists and primitives to lists
Neville Dipale created ARROW-4865: - Summary: [Rust] Support casting lists and primitives to lists Key: ARROW-4865 URL: https://issues.apache.org/jira/browse/ARROW-4865 Project: Apache Arrow Issue Type: Improvement Components: Rust Affects Versions: 0.12.0 Reporter: Neville Dipale This adds support for casting between list arrays and from primitive arrays to single-value list arrays
[jira] [Created] (ARROW-4886) [Rust] Inconsistent behaviour with casting sliced primitive array to list array
Neville Dipale created ARROW-4886: - Summary: [Rust] Inconsistent behaviour with casting sliced primitive array to list array Key: ARROW-4886 URL: https://issues.apache.org/jira/browse/ARROW-4886 Project: Apache Arrow Issue Type: Bug Components: Rust Affects Versions: 0.12.0 Reporter: Neville Dipale [~csun] I was going through the C++ cast implementation to see if I had missed anything, and I noticed that ListCastKernel ([https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/cast.cc#L665]) doesn't support casting non-zero-offset arrays. So I investigated what happens in the Rust implementation from ARROW-4865. I found an inconsistency where inheriting the incoming array's offset could lead us to read invalid data. I tried fixing it, but found that a buffer I expected to be invalid was reported as valid while returning invalid data. I've currently disabled casting primitive arrays to list arrays where the offset is not zero, and I'd like to wait for ARROW-4853 so I can see how sliced lists behave before fixing this inconsistency. That might only happen in 0.14, so I'm fine with that.
[jira] [Created] (ARROW-4914) [Rust] Array slice returns incorrect bitmask
Neville Dipale created ARROW-4914: - Summary: [Rust] Array slice returns incorrect bitmask Key: ARROW-4914 URL: https://issues.apache.org/jira/browse/ARROW-4914 Project: Apache Arrow Issue Type: Bug Components: Rust Affects Versions: 0.13.0 Reporter: Neville Dipale Slicing arrays changes the offset, length and null count of their array data, but the bitmask is not changed. This results in the correct null count, but the array values might be marked incorrectly as valid/invalid based on the old bitmask positions before the offset. To reproduce, create an array with some null values, slice the array, and then dbg!() it (after downcasting).
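For context, Arrow validity bitmaps use least-significant-bit numbering: slot `j` lives in byte `j / 8`, bit `j % 8`. After slicing, logical slot `i` corresponds to physical slot `offset + i`, so every bitmap read has to add the array's offset; reading bit `i` directly gives the misplaced validity described above. A hypothetical std-only sketch:

```rust
/// Reads the validity of logical slot `i` of an array whose data starts
/// `offset` slots into `bitmap` (LSB bit numbering, 1 = valid).
pub fn is_valid(bitmap: &[u8], offset: usize, i: usize) -> bool {
    let j = offset + i;
    bitmap[j / 8] & (1 << (j % 8)) != 0
}

/// Counts nulls in the `len` logical slots starting at `offset`.
pub fn null_count(bitmap: &[u8], offset: usize, len: usize) -> usize {
    (0..len).filter(|&i| !is_valid(bitmap, offset, i)).count()
}
```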
[jira] [Created] (ARROW-4968) [Rust] StructArray builder and From<> methods should check that field types match schema
Neville Dipale created ARROW-4968: - Summary: [Rust] StructArray builder and From<> methods should check that field types match schema Key: ARROW-4968 URL: https://issues.apache.org/jira/browse/ARROW-4968 Project: Apache Arrow Issue Type: Bug Components: Rust Affects Versions: 0.13.0 Reporter: Neville Dipale Similar to how we assert that array data types are equal to their field types, we should do the same for StructArray and StructBuilder where necessary
[jira] [Created] (ARROW-5180) [Rust] IPC Support
Neville Dipale created ARROW-5180: - Summary: [Rust] IPC Support Key: ARROW-5180 URL: https://issues.apache.org/jira/browse/ARROW-5180 Project: Apache Arrow Issue Type: New Feature Components: Rust Reporter: Neville Dipale The overall ticket to keep track of initial IPC support
[jira] [Created] (ARROW-5181) [Rust] Create Arrow File reader
Neville Dipale created ARROW-5181: - Summary: [Rust] Create Arrow File reader Key: ARROW-5181 URL: https://issues.apache.org/jira/browse/ARROW-5181 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale Initial support for reading the Arrow File format
[jira] [Created] (ARROW-5182) [Rust] Create Arrow File writer
Neville Dipale created ARROW-5182: - Summary: [Rust] Create Arrow File writer Key: ARROW-5182 URL: https://issues.apache.org/jira/browse/ARROW-5182 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale
[jira] [Created] (ARROW-5187) [Rust] Ability to flatten StructArray into a RecordBatch
Neville Dipale created ARROW-5187: - Summary: [Rust] Ability to flatten StructArray into a RecordBatch Key: ARROW-5187 URL: https://issues.apache.org/jira/browse/ARROW-5187 Project: Apache Arrow Issue Type: New Feature Components: Rust Affects Versions: 0.13.0 Reporter: Neville Dipale Add the ability to flatten a StructArray into a record batch. StructBuilder and StructArray have convenient methods to build multiple arrays. Being able to use these methods and then convert the result to a record batch reduces the amount of boilerplate when creating Arrow data from sources like databases.
[jira] [Created] (ARROW-5188) [Rust] Add temporal builders for StructArray
Neville Dipale created ARROW-5188: - Summary: [Rust] Add temporal builders for StructArray Key: ARROW-5188 URL: https://issues.apache.org/jira/browse/ARROW-5188 Project: Apache Arrow Issue Type: New Feature Components: Rust Reporter: Neville Dipale StructBuilder currently doesn't have builders for temporal arrays.
[jira] [Created] (ARROW-5191) [Rust] Expose schema in readers (CSV, JSON) without reading batches
Neville Dipale created ARROW-5191: - Summary: [Rust] Expose schema in readers (CSV, JSON) without reading batches Key: ARROW-5191 URL: https://issues.apache.org/jira/browse/ARROW-5191 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Neville Dipale It's sometimes convenient to be able to view a datasource's schema without reading the first record batch. This is a proposal to add a `pub fn schema(&self) -> Arc<Schema>` method on the various readers that we support. I think this would also enable schema inference in datafusion
[jira] [Created] (ARROW-5303) [Rust] Add SIMD vectorization of numeric casts
Neville Dipale created ARROW-5303: - Summary: [Rust] Add SIMD vectorization of numeric casts Key: ARROW-5303 URL: https://issues.apache.org/jira/browse/ARROW-5303 Project: Apache Arrow Issue Type: Improvement Components: Rust Affects Versions: 0.13.0 Reporter: Neville Dipale To improve the performance of cast kernels, we need SIMD support in numeric casts. An initial exploration shows that we can't trivially add SIMD casts between our Arrow T::Simd types, because `packed_simd` only supports a cast between T::Simd types that have the same number of lanes. This means that adding casts from f64 to i64 (same lane length) satisfies the trait bound `where TO::Simd : packed_simd::FromCast`, but f64 to i32 (different lane length) doesn't. We would benefit from investigating work-arounds to this limitation. Please see [github::nevi_me::arrow/\{branch:simd-cast}/../kernels/cast.rs|https://github.com/nevi-me/arrow/blob/simd-cast/rust/arrow/src/compute/kernels/cast.rs#L601] for an example implementation that's limited by the differences in lane length.
[jira] [Created] (ARROW-5350) [Rust] Support filtering on nested array types
Neville Dipale created ARROW-5350: - Summary: [Rust] Support filtering on nested array types Key: ARROW-5350 URL: https://issues.apache.org/jira/browse/ARROW-5350 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Neville Dipale We currently only filter on primitive types, but not on lists and structs. Add the ability to filter on nested array types
[jira] [Created] (ARROW-5351) [Rust] Add support for take kernel functions
Neville Dipale created ARROW-5351: - Summary: [Rust] Add support for take kernel functions Key: ARROW-5351 URL: https://issues.apache.org/jira/browse/ARROW-5351 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Neville Dipale Similar to https://issues.apache.org/jira/browse/ARROW-772, a take function would give us random access into arrays, which is useful for sorting and (potentially) filtering.
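A minimal std-only sketch of take semantics, with hypothetical names: output slot `k` is `values[indices[k]]`, a null index produces a null output, and the input is left untouched. Combined with a sort that returns indices, this is enough to sort one column and reorder others by the same permutation.

```rust
/// Hypothetical take over a nullable column. Panics on an out-of-bounds index.
pub fn take<T: Copy>(values: &[Option<T>], indices: &[Option<usize>]) -> Vec<Option<T>> {
    indices
        .iter()
        // A null index yields null; otherwise copy the (possibly null) value.
        .map(|idx| idx.and_then(|i| values[i]))
        .collect()
}
```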
[jira] [Created] (ARROW-5352) [Rust] BinaryArray filter replaces nulls with empty strings
Neville Dipale created ARROW-5352: - Summary: [Rust] BinaryArray filter replaces nulls with empty strings Key: ARROW-5352 URL: https://issues.apache.org/jira/browse/ARROW-5352 Project: Apache Arrow Issue Type: Bug Components: Rust Affects Versions: 0.13.0 Reporter: Neville Dipale The filter implementation for BinaryArray discards the nullness of the data. Null slots in a BinaryArray (seem to) always return an empty string slice when getting a value, so whether the way filter works is a bug depends on what Arrow developers' or users' intentions are. I think we should either preserve nulls (and their count) or document this as intended behaviour. Below is a test case that reproduces the bug:
{code:java}
#[test]
fn test_filter_binary_array_with_nulls() {
    let mut a: BinaryBuilder = BinaryBuilder::new(100);
    a.append_null().unwrap();
    a.append_string("a string").unwrap();
    a.append_null().unwrap();
    a.append_string("with nulls").unwrap();
    let array = a.finish();
    let b = BooleanArray::from(vec![true, true, true, true]);
    let c = filter(&array, &b).unwrap();
    let d: &BinaryArray = c.as_any().downcast_ref::<BinaryArray>().unwrap();
    // I didn't expect this behaviour
    assert_eq!("", d.get_string(0));
    // fails here
    assert!(d.is_null(0));
    assert_eq!(4, d.len());
    // fails here
    assert_eq!(2, d.null_count());
    assert_eq!("a string", d.get_string(1));
    // fails here
    assert!(d.is_null(2));
    assert_eq!("with nulls", d.get_string(3));
}
{code}
[jira] [Created] (ARROW-5360) [Rust] Builds are broken by rustyline on nightly 2019-05-16+
Neville Dipale created ARROW-5360: - Summary: [Rust] Builds are broken by rustyline on nightly 2019-05-16+ Key: ARROW-5360 URL: https://issues.apache.org/jira/browse/ARROW-5360 Project: Apache Arrow Issue Type: Bug Components: Rust - DataFusion Reporter: Neville Dipale Rust builds are broken on nightly since 2019-05-16. Please see [https://github.com/kkawakam/rustyline/issues/217]. The issue might need to be fixed in the rustyline crate.
[jira] [Created] (ARROW-5366) [Rust] Implement Duration and Interval Types
Neville Dipale created ARROW-5366: - Summary: [Rust] Implement Duration and Interval Types Key: ARROW-5366 URL: https://issues.apache.org/jira/browse/ARROW-5366 Project: Apache Arrow Issue Type: New Feature Components: Rust Reporter: Neville Dipale This should ideally include covering:
* data types
* arrays and builders
* adding to kernels (e.g. including support in cast)
[jira] [Created] (ARROW-5367) [Rust] Add temporal kernels
Neville Dipale created ARROW-5367: - Summary: [Rust] Add temporal kernels Key: ARROW-5367 URL: https://issues.apache.org/jira/browse/ARROW-5367 Project: Apache Arrow Issue Type: New Feature Components: Rust Reporter: Neville Dipale When creating temporal arrays, we added a sample function that extracts the hour from a temporal array. This ticket is to add support for other common temporal functions like minute, second and hour, and might include temporal arithmetic such as adding dates and times, calculating durations, etc.
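A sketch of the extra extractors, assuming timestamps in seconds since the UNIX epoch with no timezone (other units would divide down to seconds first). `rem_euclid` is used so that pre-1970 timestamps land in the correct day; `hour`, `minute`, `second` and `unary` are hypothetical std-only helpers rather than the arrow crate's kernels.

```rust
/// Hour of day (0..=23) for a second-resolution UNIX timestamp.
pub fn hour(ts: i64) -> i64 {
    ts.rem_euclid(86_400) / 3_600
}

/// Minute of hour (0..=59).
pub fn minute(ts: i64) -> i64 {
    ts.rem_euclid(3_600) / 60
}

/// Second of minute (0..=59).
pub fn second(ts: i64) -> i64 {
    ts.rem_euclid(60)
}

/// Applies a per-value kernel to a nullable column, keeping nulls as nulls.
pub fn unary(values: &[Option<i64>], f: fn(i64) -> i64) -> Vec<Option<i64>> {
    values.iter().map(|v| v.map(f)).collect()
}
```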
[jira] [Created] (ARROW-5399) [Rust] [Testing] Add IPC test files to arrow-testing
Neville Dipale created ARROW-5399: - Summary: [Rust] [Testing] Add IPC test files to arrow-testing Key: ARROW-5399 URL: https://issues.apache.org/jira/browse/ARROW-5399 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale We're generating a lot of files for testing, which should ideally live in arrow-testing
[jira] [Created] (ARROW-5400) [Rust] Test/ensure that reader and writer support zero-length record batches
Neville Dipale created ARROW-5400: - Summary: [Rust] Test/ensure that reader and writer support zero-length record batches Key: ARROW-5400 URL: https://issues.apache.org/jira/browse/ARROW-5400 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Neville Dipale
[jira] [Created] (ARROW-5408) [Rust] Create struct array builder that creates null buffers
Neville Dipale created ARROW-5408: - Summary: [Rust] Create struct array builder that creates null buffers Key: ARROW-5408 URL: https://issues.apache.org/jira/browse/ARROW-5408 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Neville Dipale We currently have a way of creating a struct array from a list of (field, array) tuples. This does not create null buffers for the struct (because no index is null). While this works fine within Rust, it often produces data that is incompatible with IPC data and kernel function outputs. Having a function that caters for nulls, or expanding the current one, would alleviate this issue.