[jira] [Created] (ARROW-13977) [Format] Clarify leap seconds and leap days for interval type
QP Hou created ARROW-13977: -- Summary: [Format] Clarify leap seconds and leap days for interval type Key: ARROW-13977 URL: https://issues.apache.org/jira/browse/ARROW-13977 Project: Apache Arrow Issue Type: Task Reporter: QP Hou Assignee: QP Hou It's unclear how leap seconds and leap days should be handled for the interval type; we should clarify this in the spec. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11721) json schema inference should return Schema type instead of SchemaRef
QP Hou created ARROW-11721: -- Summary: json schema inference should return Schema type instead of SchemaRef Key: ARROW-11721 URL: https://issues.apache.org/jira/browse/ARROW-11721 Project: Apache Arrow Issue Type: Task Components: Rust Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11719) Support merged schema for memory table
QP Hou created ARROW-11719: -- Summary: Support merged schema for memory table Key: ARROW-11719 URL: https://issues.apache.org/jira/browse/ARROW-11719 Project: Apache Arrow Issue Type: Task Components: Rust - DataFusion Reporter: QP Hou Assignee: QP Hou Memory table should support loading batches with compatible schemas instead of forcing all schemas to be the same. -- This message was sent by Atlassian Jira (v8.3.4#803005)
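The issue does not define what "compatible" means. As a rough sketch only, one reading is that fields are matched by name, new fields are appended, and a field that reappears with a different type is rejected. The `Field` struct, the `field` helper and `merge_schemas` below are hypothetical std-only stand-ins, not the arrow or DataFusion types.

```rust
/// Hypothetical stand-in for a schema field; not the arrow `Field` type.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
}

/// Small helper to build a field (illustrative only).
fn field(name: &str, data_type: &str) -> Field {
    Field {
        name: name.to_string(),
        data_type: data_type.to_string(),
    }
}

/// Merge schemas under the assumed rule: match fields by name, append new
/// names, and fail when a name reappears with a different type.
fn merge_schemas(schemas: &[Vec<Field>]) -> Result<Vec<Field>, String> {
    let mut merged: Vec<Field> = Vec::new();
    for schema in schemas {
        for incoming in schema {
            // Look up the type already recorded for this name, if any.
            let existing_type = merged
                .iter()
                .find(|f| f.name == incoming.name)
                .map(|f| f.data_type.clone());
            match existing_type {
                Some(t) if t != incoming.data_type => {
                    return Err(format!(
                        "field '{}' has conflicting types: {} vs {}",
                        incoming.name, t, incoming.data_type
                    ));
                }
                Some(_) => {} // same name and type: nothing to add
                None => merged.push(incoming.clone()),
            }
        }
    }
    Ok(merged)
}
```

Under this rule, batches with schemas `(id, name)` and `(id, score)` would load into a table whose schema is `(id, name, score)`, while a batch that redefines `id` with another type would be refused.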
[jira] [Created] (ARROW-11708) Clean up Rust 2021 linting warning
QP Hou created ARROW-11708: -- Summary: Clean up Rust 2021 linting warning Key: ARROW-11708 URL: https://issues.apache.org/jira/browse/ARROW-11708 Project: Apache Arrow Issue Type: Task Components: Rust Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11707) Support CSV schema inference without seek
QP Hou created ARROW-11707: -- Summary: Support CSV schema inference without seek Key: ARROW-11707 URL: https://issues.apache.org/jira/browse/ARROW-11707 Project: Apache Arrow Issue Type: Task Components: Rust Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11542) [Rust] json reader should not crash when reading nested list
QP Hou created ARROW-11542: -- Summary: [Rust] json reader should not crash when reading nested list Key: ARROW-11542 URL: https://issues.apache.org/jira/browse/ARROW-11542 Project: Apache Arrow Issue Type: Task Components: Rust Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11491) support json schema inference for nested list and struct
QP Hou created ARROW-11491: -- Summary: support json schema inference for nested list and struct Key: ARROW-11491 URL: https://issues.apache.org/jira/browse/ARROW-11491 Project: Apache Arrow Issue Type: Task Components: Rust Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11435) Allow creating ParquetPartition from external crate
QP Hou created ARROW-11435: -- Summary: Allow creating ParquetPartition from external crate Key: ARROW-11435 URL: https://issues.apache.org/jira/browse/ARROW-11435 Project: Apache Arrow Issue Type: Task Components: Rust - DataFusion Reporter: QP Hou Assignee: QP Hou Without this functionality, it's not possible to implement a table provider in an external crate that targets the parquet format, since ParquetExec takes ParquetPartition as an argument. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11366) Support boolean literal in equality expression
QP Hou created ARROW-11366: -- Summary: Support boolean literal in equality expression Key: ARROW-11366 URL: https://issues.apache.org/jira/browse/ARROW-11366 Project: Apache Arrow Issue Type: Task Components: Rust - DataFusion Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11310) implement arrow JSON writer
QP Hou created ARROW-11310: -- Summary: implement arrow JSON writer Key: ARROW-11310 URL: https://issues.apache.org/jira/browse/ARROW-11310 Project: Apache Arrow Issue Type: Task Components: Rust Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11113) [Rust] support as_struct_array cast
QP Hou created ARROW-11113: -- Summary: [Rust] support as_struct_array cast Key: ARROW-11113 URL: https://issues.apache.org/jira/browse/ARROW-11113 Project: Apache Arrow Issue Type: Bug Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11110) [Rust] [Datafusion] context.table should not take a mutable self reference
QP Hou created ARROW-11110: -- Summary: [Rust] [Datafusion] context.table should not take a mutable self reference Key: ARROW-11110 URL: https://issues.apache.org/jira/browse/ARROW-11110 Project: Apache Arrow Issue Type: Improvement Components: Rust - DataFusion Reporter: QP Hou Assignee: QP Hou Fix For: 3.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10876) [Rust] json reader should validate value type
QP Hou created ARROW-10876: -- Summary: [Rust] json reader should validate value type Key: ARROW-10876 URL: https://issues.apache.org/jira/browse/ARROW-10876 Project: Apache Arrow Issue Type: Bug Reporter: QP Hou The json reader should return an error if a row's type is not object. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10875) simplify simd cfg check
QP Hou created ARROW-10875: -- Summary: simplify simd cfg check Key: ARROW-10875 URL: https://issues.apache.org/jira/browse/ARROW-10875 Project: Apache Arrow Issue Type: Bug Components: Rust Reporter: QP Hou Assignee: QP Hou make simd cfg check DRY for easier maintenance -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10842) [Rust] decouple IO from json schema inference code
QP Hou created ARROW-10842: -- Summary: [Rust] decouple IO from json schema inference code Key: ARROW-10842 URL: https://issues.apache.org/jira/browse/ARROW-10842 Project: Apache Arrow Issue Type: Bug Components: Rust Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10830) [Rust] json reader should not hard crash on invalid json
QP Hou created ARROW-10830: -- Summary: [Rust] json reader should not hard crash on invalid json Key: ARROW-10830 URL: https://issues.apache.org/jira/browse/ARROW-10830 Project: Apache Arrow Issue Type: Bug Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10822) [Rust] [Datafusion] support compiling datafusion with simd support
QP Hou created ARROW-10822: -- Summary: [Rust] [Datafusion] support compiling datafusion with simd support Key: ARROW-10822 URL: https://issues.apache.org/jira/browse/ARROW-10822 Project: Apache Arrow Issue Type: New Feature Components: Rust - DataFusion Reporter: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10821) [Rust] [Datafusion] implement negative expression
QP Hou created ARROW-10821: -- Summary: [Rust] [Datafusion] implement negative expression Key: ARROW-10821 URL: https://issues.apache.org/jira/browse/ARROW-10821 Project: Apache Arrow Issue Type: New Feature Components: Rust - DataFusion Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10458) [Rust] [Datafusion] context.create_logical_plan should not take a mutable self reference
QP Hou created ARROW-10458: -- Summary: [Rust] [Datafusion] context.create_logical_plan should not take a mutable self reference Key: ARROW-10458 URL: https://issues.apache.org/jira/browse/ARROW-10458 Project: Apache Arrow Issue Type: Improvement Components: Rust - DataFusion Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10454) [Rust][Datafusion] support creating ParquetExec from externally resolved file list and schema
QP Hou created ARROW-10454: -- Summary: [Rust][Datafusion] support creating ParquetExec from externally resolved file list and schema Key: ARROW-10454 URL: https://issues.apache.org/jira/browse/ARROW-10454 Project: Apache Arrow Issue Type: Improvement Components: Rust - DataFusion Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9327) Fix all clippy errors for arrow crate
QP Hou created ARROW-9327: - Summary: Fix all clippy errors for arrow crate Key: ARROW-9327 URL: https://issues.apache.org/jira/browse/ARROW-9327 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9192) [Rust] Enable clippy linting for arrow crate in CI pipeline
QP Hou created ARROW-9192: - Summary: [Rust] Enable clippy linting for arrow crate in CI pipeline Key: ARROW-9192 URL: https://issues.apache.org/jira/browse/ARROW-9192 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9184) [Rust][Datafusion] table scan without projection should return all columns
QP Hou created ARROW-9184: - Summary: [Rust][Datafusion] table scan without projection should return all columns Key: ARROW-9184 URL: https://issues.apache.org/jira/browse/ARROW-9184 Project: Apache Arrow Issue Type: Bug Reporter: QP Hou Assignee: QP Hou Projection should be optional when the user wants to fetch all columns. -- This message was sent by Atlassian Jira (v8.3.4#803005)
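A minimal sketch of what an optional projection could look like, assuming it is modeled as an `Option` over column indices where `None` means "all columns". The `project` function is hypothetical and is not the DataFusion table scan API.

```rust
/// Select columns by index. `None` means no projection was given, so every
/// column is returned in its original order.
/// Note: this sketch panics if an index is out of bounds.
fn project<T: Clone>(columns: &[T], projection: Option<&[usize]>) -> Vec<T> {
    match projection {
        Some(indices) => indices.iter().map(|&i| columns[i].clone()).collect(),
        None => columns.to_vec(),
    }
}
```

With columns `["a", "b", "c"]`, calling `project(&cols, None)` would return all three, while `project(&cols, Some(&[2, 0][..]))` would return only `c` and `a`.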
[jira] [Created] (ARROW-9158) [Rust][Datafusion] Projection physical plan compilation should preserve nullability
QP Hou created ARROW-9158: - Summary: [Rust][Datafusion] Projection physical plan compilation should preserve nullability Key: ARROW-9158 URL: https://issues.apache.org/jira/browse/ARROW-9158 Project: Apache Arrow Issue Type: Improvement Reporter: QP Hou Assignee: QP Hou When compiling a logical plan to a physical plan, field nullability should be preserved. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9157) [Rust][Datafusion] execution context's create_physical_plan should take self as immutable reference
QP Hou created ARROW-9157: - Summary: [Rust][Datafusion] execution context's create_physical_plan should take self as immutable reference Key: ARROW-9157 URL: https://issues.apache.org/jira/browse/ARROW-9157 Project: Apache Arrow Issue Type: Improvement Reporter: QP Hou Assignee: QP Hou It's not mutating self, so a mutable reference is not necessary. -- This message was sent by Atlassian Jira (v8.3.4#803005)
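To illustrate why the receiver type matters, here is a small sketch with a hypothetical `Context` struct (not the DataFusion execution context). Because the method only reads state, it can take `&self`, and several shared borrows of the same context can then call it side by side.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an execution context; not the DataFusion type.
struct Context {
    /// Table name mapped to its column count.
    tables: HashMap<String, usize>,
}

impl Context {
    /// Build a context from (name, column count) pairs (illustrative helper).
    fn new(tables: &[(&str, usize)]) -> Self {
        Context {
            tables: tables.iter().map(|(n, c)| (n.to_string(), *c)).collect(),
        }
    }

    /// Only reads from `self`, so a shared `&self` borrow is enough.
    fn create_physical_plan(&self, table: &str) -> Option<String> {
        self.tables
            .get(table)
            .map(|cols| format!("Scan({}, {} columns)", table, cols))
    }
}
```

Had the method been declared with `&mut self`, a caller holding two references such as `let (a, b) = (&ctx, &ctx);` could not invoke it through either of them.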
[jira] [Updated] (ARROW-9124) [Rust][Datafusion] DFParser should consume sql query as &str instead of String
[ https://issues.apache.org/jira/browse/ARROW-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] QP Hou updated ARROW-9124: -- Summary: [Rust][Datafusion] DFParser should consume sql query as &str instead of String (was: DFParser should consume sql query as &str instead of String) > [Rust][Datafusion] DFParser should consume sql query as &str instead of String > -- > > Key: ARROW-9124 > URL: https://issues.apache.org/jira/browse/ARROW-9124 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust - DataFusion >Reporter: QP Hou >Assignee: QP Hou >Priority: Minor > > It's more efficient to use &str instead of String -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9124) DFParser should consume sql query as &str instead of String
QP Hou created ARROW-9124: - Summary: DFParser should consume sql query as &str instead of String Key: ARROW-9124 URL: https://issues.apache.org/jira/browse/ARROW-9124 Project: Apache Arrow Issue Type: Improvement Reporter: QP Hou Assignee: QP Hou It's more efficient to use &str instead of String -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9124) DFParser should consume sql query as &str instead of String
[ https://issues.apache.org/jira/browse/ARROW-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] QP Hou updated ARROW-9124: -- Component/s: Rust - DataFusion > DFParser should consume sql query as &str instead of String > --- > > Key: ARROW-9124 > URL: https://issues.apache.org/jira/browse/ARROW-9124 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust - DataFusion >Reporter: QP Hou >Assignee: QP Hou >Priority: Minor > > It's more efficient to use &str instead of String -- This message was sent by Atlassian Jira (v8.3.4#803005)
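The efficiency argument can be sketched as follows, assuming the borrowed type meant by the issue is `&str`. The `tokenize` function is a hypothetical stand-in and not DFParser. Taking `&str` lets callers pass either a string literal or a borrowed `String` without allocating or giving up ownership.

```rust
/// Hypothetical tokenizer, not DFParser. It borrows the query as `&str` and
/// returns slices that point into the caller's buffer, so no copy of the
/// query text is made.
fn tokenize(sql: &str) -> Vec<&str> {
    sql.split_whitespace().collect()
}
```

A caller that owns a `String` would write `tokenize(&query)` and can keep using `query` afterwards, whereas a signature taking `String` would force a clone or a move.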
[jira] [Created] (ARROW-9057) Projection should work on InMemoryScan without error
QP Hou created ARROW-9057: - Summary: Projection should work on InMemoryScan without error Key: ARROW-9057 URL: https://issues.apache.org/jira/browse/ARROW-9057 Project: Apache Arrow Issue Type: Bug Components: Rust - DataFusion Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8824) [Rust] [DataFusion] Implement new SQL parser
[ https://issues.apache.org/jira/browse/ARROW-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127242#comment-17127242 ] QP Hou commented on ARROW-8824: --- +1 on rewriting a new dedicated parser for datafusion. > [Rust] [DataFusion] Implement new SQL parser > > > Key: ARROW-8824 > URL: https://issues.apache.org/jira/browse/ARROW-8824 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 1.0.0 > > > We currently depend on the sqlparser crate that I originally created but has > moved on since the version we use and that project is aiming to support > multiple SQL dialects and I don't think it is appropriate for what we need in > DataFusion. > I think it would be better to build a new SQL parser as part of the > DataFusion crate so that we can more easily maintain it, and it can use Arrow > as the native type system. > Another option would be to try and donate the sqlparser 0.2.x code base but > there are a fair number of committers and it is probably easier just to > implement it from scratch (without referencing the existing code). > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9005) Support sort expression
QP Hou created ARROW-9005: - Summary: Support sort expression Key: ARROW-9005 URL: https://issues.apache.org/jira/browse/ARROW-9005 Project: Apache Arrow Issue Type: Improvement Components: Rust - DataFusion Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8931) [Rust] Support lexical sort in arrow compute kernel
QP Hou created ARROW-8931: - Summary: [Rust] Support lexical sort in arrow compute kernel Key: ARROW-8931 URL: https://issues.apache.org/jira/browse/ARROW-8931 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8906) [Rust] Support reading multiple CSV files for schema inference
QP Hou created ARROW-8906: - Summary: [Rust] Support reading multiple CSV files for schema inference Key: ARROW-8906 URL: https://issues.apache.org/jira/browse/ARROW-8906 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8877) [Rust] add CSV read option struct to simplify datafusion interface
QP Hou created ARROW-8877: - Summary: [Rust] add CSV read option struct to simplify datafusion interface Key: ARROW-8877 URL: https://issues.apache.org/jira/browse/ARROW-8877 Project: Apache Arrow Issue Type: Improvement Components: Rust, Rust - DataFusion Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8840) [Rust] datafusion ExecutionError should implement std::error::Error trait
QP Hou created ARROW-8840: - Summary: [Rust] datafusion ExecutionError should implement std::error::Error trait Key: ARROW-8840 URL: https://issues.apache.org/jira/browse/ARROW-8840 Project: Apache Arrow Issue Type: Improvement Components: Rust - DataFusion Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
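For context, implementing `std::error::Error` usually takes a `Debug` derive, a `Display` impl and the trait impl itself. The enum below is a hypothetical sketch; its variants and the `lookup` and `column_count` helpers are assumptions and do not mirror DataFusion's actual `ExecutionError`.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error enum; variant names are illustrative only.
#[derive(Debug)]
enum ExecutionError {
    General(String),
    Io(std::io::Error),
}

impl fmt::Display for ExecutionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ExecutionError::General(msg) => write!(f, "execution error: {}", msg),
            ExecutionError::Io(e) => write!(f, "io error: {}", e),
        }
    }
}

impl Error for ExecutionError {
    /// Expose the wrapped io error as the underlying cause, if there is one.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            ExecutionError::Io(e) => Some(e),
            ExecutionError::General(_) => None,
        }
    }
}

/// Hypothetical lookup that fails with the custom error type.
fn lookup(table: &str) -> Result<u32, ExecutionError> {
    if table == "t" {
        Ok(3)
    } else {
        Err(ExecutionError::General(format!("no such table: {}", table)))
    }
}

/// Because the trait is implemented, `?` can convert the error into
/// `Box<dyn Error>` without any manual mapping.
fn column_count(table: &str) -> Result<u32, Box<dyn Error>> {
    let n = lookup(table)?;
    Ok(n)
}
```

The practical benefit is interoperability: callers that return `Box<dyn Error>` or use error-handling helpers built on the standard trait can propagate the error with `?`.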
[jira] [Created] (ARROW-8839) [Rust] datafusion logical plan should support scanning csv without provided schema
QP Hou created ARROW-8839: - Summary: [Rust] datafusion logical plan should support scanning csv without provided schema Key: ARROW-8839 URL: https://issues.apache.org/jira/browse/ARROW-8839 Project: Apache Arrow Issue Type: Improvement Components: Rust - DataFusion Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8821) [Rust] nested binary expression with Like, NotLike and Not operator results in type cast error
[ https://issues.apache.org/jira/browse/ARROW-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] QP Hou updated ARROW-8821: -- Component/s: (was: Rust) Rust - DataFusion > [Rust] nested binary expression with Like, NotLike and Not operator results > in type cast error > -- > > Key: ARROW-8821 > URL: https://issues.apache.org/jira/browse/ARROW-8821 > Project: Apache Arrow > Issue Type: Bug > Components: Rust - DataFusion >Reporter: QP Hou >Assignee: QP Hou >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8821) [Rust] nested binary expression with Like, NotLike and Not operator results in type cast error
[ https://issues.apache.org/jira/browse/ARROW-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] QP Hou updated ARROW-8821: -- Component/s: Rust > [Rust] nested binary expression with Like, NotLike and Not operator results > in type cast error > -- > > Key: ARROW-8821 > URL: https://issues.apache.org/jira/browse/ARROW-8821 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: QP Hou >Assignee: QP Hou >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8821) [Rust] nested binary expression with Like, NotLike and Not operator results in type cast error
QP Hou created ARROW-8821: - Summary: [Rust] nested binary expression with Like, NotLike and Not operator results in type cast error Key: ARROW-8821 URL: https://issues.apache.org/jira/browse/ARROW-8821 Project: Apache Arrow Issue Type: Bug Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8752) Remove unused hashmap
QP Hou created ARROW-8752: - Summary: Remove unused hashmap Key: ARROW-8752 URL: https://issues.apache.org/jira/browse/ARROW-8752 Project: Apache Arrow Issue Type: Improvement Reporter: QP Hou Assignee: QP Hou Neither base_nodes nor base_nodes_set seems to be used at all in build_array_reader. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8751) [Rust] ParquetFileArrowReader should be able to read empty parquet file without error
QP Hou created ARROW-8751: - Summary: [Rust] ParquetFileArrowReader should be able to read empty parquet file without error Key: ARROW-8751 URL: https://issues.apache.org/jira/browse/ARROW-8751 Project: Apache Arrow Issue Type: New Feature Reporter: QP Hou Assignee: QP Hou Sometimes Spark writes out parquet files with zero row groups, which results in an error when they are read with ParquetFileArrowReader. It would be more convenient if ParquetFileArrowReader supported this edge case out of the box. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8744) [Rust] ParquetIterator's next method should be safe to call even after reached end of iteration
QP Hou created ARROW-8744: - Summary: [Rust] ParquetIterator's next method should be safe to call even after reached end of iteration Key: ARROW-8744 URL: https://issues.apache.org/jira/browse/ARROW-8744 Project: Apache Arrow Issue Type: Improvement Reporter: QP Hou Assignee: QP Hou Once the end of iteration is reached, calling next on ParquetIterator results in an error. This is inconvenient in two ways: * when the iterator is shared between multiple threads, only one of the threads will be able to terminate without error * the sender for response_rx cannot terminate the iteration early and free up resources; instead, it always needs to wait for a signal from request_tx before closing the connection -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8744) [Rust] ParquetIterator's next method should be safe to call even after reached end of iteration
[ https://issues.apache.org/jira/browse/ARROW-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] QP Hou updated ARROW-8744: -- Component/s: Rust - DataFusion Priority: Minor (was: Major) > [Rust] ParquetIterator's next method should be safe to call even after > reached end of iteration > --- > > Key: ARROW-8744 > URL: https://issues.apache.org/jira/browse/ARROW-8744 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust - DataFusion >Reporter: QP Hou >Assignee: QP Hou >Priority: Minor > > Once the end of iteration is reached, calling next on ParquetIterator results in > an error. This is inconvenient in two ways: > * when the iterator is shared between multiple threads, only one of the threads will be able > to terminate without error > * the sender for response_rx cannot terminate the iteration early and free up > resources; instead, it always needs to wait for a signal from request_tx before > closing the connection -- This message was sent by Atlassian Jira (v8.3.4#803005)
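The requested behavior can be sketched with the standard library's channels. This is an illustration under assumptions, not the ParquetIterator implementation: `BatchIter` and `spawn_batches` are hypothetical, and a single `std::sync::mpsc` channel stands in for the request_tx/response_rx pair mentioned in the issue. The idea is that a closed channel is treated as the end of iteration rather than as an error, so `next` stays safe to call after the end.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// Hypothetical batch iterator fed by a producer thread.
struct BatchIter {
    rx: Receiver<u32>,
}

impl Iterator for BatchIter {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        // `recv` returns an error once the sender is dropped and the queue is
        // drained. Mapping that to `None` keeps repeated calls harmless.
        self.rx.recv().ok()
    }
}

/// Spawn a producer that sends each batch and then drops its sender.
fn spawn_batches(batches: Vec<u32>) -> BatchIter {
    let (tx, rx) = channel();
    thread::spawn(move || {
        for batch in batches {
            // If the receiver is gone, stop early and free resources.
            if tx.send(batch).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which signals the end of iteration.
    });
    BatchIter { rx }
}
```

In this design the producer needs no explicit shutdown handshake: dropping either end of the channel is enough for the other side to finish cleanly.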
[jira] [Created] (ARROW-8725) redundant directory walk in rust parquet datasource code
QP Hou created ARROW-8725: - Summary: redundant directory walk in rust parquet datasource code Key: ARROW-8725 URL: https://issues.apache.org/jira/browse/ARROW-8725 Project: Apache Arrow Issue Type: Improvement Reporter: QP Hou Assignee: QP Hou In the rust code base, `common::build_file_list` is called within `ParquetExec::try_new`, so there is no need to build the file list before calling `try_new`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8552) [Rust] support column iteration for parquet row
QP Hou created ARROW-8552: - Summary: [Rust] support column iteration for parquet row Key: ARROW-8552 URL: https://issues.apache.org/jira/browse/ARROW-8552 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: QP Hou It would be useful to be able to iterate through all the columns in a parquet row. -- This message was sent by Atlassian Jira (v8.3.4#803005)