[jira] [Created] (ARROW-11707) Support CSV schema inference without seek
QP Hou created ARROW-11707: -- Summary: Support CSV schema inference without seek Key: ARROW-11707 URL: https://issues.apache.org/jira/browse/ARROW-11707 Project: Apache Arrow Issue Type: Task Components: Rust Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11707) Decouple CSV schema inference from IO
[ https://issues.apache.org/jira/browse/ARROW-11707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] QP Hou updated ARROW-11707: --- Summary: Decouple CSV schema inference from IO (was: Support CSV schema inference without seek) > Decouple CSV schema inference from IO > - > > Key: ARROW-11707 > URL: https://issues.apache.org/jira/browse/ARROW-11707 > Project: Apache Arrow > Issue Type: Task > Components: Rust >Reporter: QP Hou >Assignee: QP Hou >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11707) Support CSV schema inference without IO
[ https://issues.apache.org/jira/browse/ARROW-11707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11707: --- Labels: pull-request-available (was: ) > Support CSV schema inference without IO > --- > > Key: ARROW-11707 > URL: https://issues.apache.org/jira/browse/ARROW-11707 > Project: Apache Arrow > Issue Type: Task > Components: Rust >Reporter: QP Hou >Assignee: QP Hou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11707) Support CSV schema inference without IO
[ https://issues.apache.org/jira/browse/ARROW-11707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] QP Hou updated ARROW-11707: --- Summary: Support CSV schema inference without IO (was: Decouple CSV schema inference from IO) > Support CSV schema inference without IO > --- > > Key: ARROW-11707 > URL: https://issues.apache.org/jira/browse/ARROW-11707 > Project: Apache Arrow > Issue Type: Task > Components: Rust >Reporter: QP Hou >Assignee: QP Hou >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11708) Clean up Rust 2021 linting warning
QP Hou created ARROW-11708: -- Summary: Clean up Rust 2021 linting warning Key: ARROW-11708 URL: https://issues.apache.org/jira/browse/ARROW-11708 Project: Apache Arrow Issue Type: Task Components: Rust Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11708) Clean up Rust 2021 linting warning
[ https://issues.apache.org/jira/browse/ARROW-11708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11708: --- Labels: pull-request-available (was: ) > Clean up Rust 2021 linting warning > -- > > Key: ARROW-11708 > URL: https://issues.apache.org/jira/browse/ARROW-11708 > Project: Apache Arrow > Issue Type: Task > Components: Rust >Reporter: QP Hou >Assignee: QP Hou >Priority: Trivial > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11709) [Rust][DataFusion] Move `expressions` and `inputs` into LogicalPlan rather than helpers in util
Andrew Lamb created ARROW-11709: --- Summary: [Rust][DataFusion] Move `expressions` and `inputs` into LogicalPlan rather than helpers in util Key: ARROW-11709 URL: https://issues.apache.org/jira/browse/ARROW-11709 Project: Apache Arrow Issue Type: Sub-task Reporter: Andrew Lamb move `expressions` and `inputs` into LogicalPlan rather than helpers in util, and use Visitor rather than hard coded list Goal is to consolidate the expression walking in one place -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11710) [Rust][DataFusion] Implement ExprRewriter to avoid tree traversal redundancy
Andrew Lamb created ARROW-11710: --- Summary: [Rust][DataFusion] Implement ExprRewriter to avoid tree traversal redundancy Key: ARROW-11710 URL: https://issues.apache.org/jira/browse/ARROW-11710 Project: Apache Arrow Issue Type: Sub-task Reporter: Andrew Lamb

The idea is to:
1. Reduce the amount of repetition in optimizer rules to make them easier to implement
2. Reduce the amount of repetition to make it easier to see the actual logic (rather than having it intertwined with the code needed to do recursion)
3. Set the stage for a more general `PlanRewriter` that doesn't have to clone its input and can instead take its input by value and consume it

The plan is to make an `ExprRewriter`, the mutable counterpart to `ExpressionVisitor`, and demonstrate its usefulness by rewriting several expression transformation passes using it.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
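To illustrate the idea in ARROW-11710, here is a minimal sketch of a rewriter that owns the recursion once, so that each rule only describes what to do with a single node. Everything in it is an assumption made for illustration: the toy `Expr` enum, the `mutate` and `rewrite` names, and the `ConstantFolder` rule are hypothetical and are not taken from DataFusion's actual API.

```rust
// Toy expression tree (hypothetical; DataFusion's real `Expr` is much larger).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Negate(Box<Expr>),
}

// Hypothetical trait: an implementor only says how to rewrite one node.
trait ExprRewriter {
    fn mutate(&mut self, expr: Expr) -> Expr;
}

impl Expr {
    // The recursion lives here, in one place. Children are rewritten first
    // (bottom up), then the rebuilt node is handed to the rewriter.
    // `self` is taken by value, so no subtree has to be cloned.
    fn rewrite<R: ExprRewriter>(self, rewriter: &mut R) -> Expr {
        let rebuilt = match self {
            Expr::Add(l, r) => Expr::Add(
                Box::new((*l).rewrite(rewriter)),
                Box::new((*r).rewrite(rewriter)),
            ),
            Expr::Negate(e) => Expr::Negate(Box::new((*e).rewrite(rewriter))),
            leaf => leaf,
        };
        rewriter.mutate(rebuilt)
    }
}

// Example rule: fold operations on literals. Note that it contains no
// recursion of its own.
struct ConstantFolder;

impl ExprRewriter for ConstantFolder {
    fn mutate(&mut self, expr: Expr) -> Expr {
        match expr {
            Expr::Add(l, r) => match (*l, *r) {
                (Expr::Literal(a), Expr::Literal(b)) => Expr::Literal(a + b),
                (l, r) => Expr::Add(Box::new(l), Box::new(r)),
            },
            Expr::Negate(e) => match *e {
                Expr::Literal(a) => Expr::Literal(-a),
                other => Expr::Negate(Box::new(other)),
            },
            other => other,
        }
    }
}
```

With this shape, a new optimizer rule would only need to implement `mutate`; the traversal order is decided once in `rewrite`.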
[jira] [Created] (ARROW-11711) [Rust][DataFusion] Rename ExpressionVisitor --> ExprVisitor and standardize input
Andrew Lamb created ARROW-11711: --- Summary: [Rust][DataFusion] Rename ExpressionVisitor --> ExprVisitor and standardize input Key: ARROW-11711 URL: https://issues.apache.org/jira/browse/ARROW-11711 Project: Apache Arrow Issue Type: Sub-task Reporter: Andrew Lamb

Rename `ExpressionVisitor` to `ExprVisitor` for consistency, and change it to use `&mut self` rather than consuming the visitor, for consistency with `PlanVisitor` (as well as the soon to be created `ExprVisitor`).

-- This message was sent by Atlassian Jira (v8.3.4#803005)
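The difference between a visitor that is consumed and one that is borrowed with `&mut self` can be sketched as follows. This is only an illustration under stated assumptions: the toy `Expr` enum, the `pre_visit` and `accept` method names, and the `ColumnCollector` type are hypothetical and may not match the signatures DataFusion ends up with.

```rust
// Toy expression tree (hypothetical, for illustration only).
#[derive(Debug)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

// Hypothetical visitor trait that borrows the visitor mutably
// instead of taking it by value and handing it back.
trait ExprVisitor {
    fn pre_visit(&mut self, expr: &Expr);
}

impl Expr {
    // Walks the tree top down, calling the visitor on every node.
    fn accept<V: ExprVisitor>(&self, visitor: &mut V) {
        visitor.pre_visit(self);
        if let Expr::Add(l, r) = self {
            l.accept(visitor);
            r.accept(visitor);
        }
    }
}

// Example visitor: accumulates the names of all referenced columns.
#[derive(Default)]
struct ColumnCollector {
    names: Vec<String>,
}

impl ExprVisitor for ColumnCollector {
    fn pre_visit(&mut self, expr: &Expr) {
        if let Expr::Column(name) = expr {
            self.names.push(name.clone());
        }
    }
}
```

Because the visitor is only borrowed, the caller keeps ownership of it: the same `ColumnCollector` can be passed to `accept` on several expressions in a row, and its accumulated state is read directly afterwards.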
[jira] [Created] (ARROW-11712) [Rust][DataFusion] Introduce PlanRewriter for rewriting plans
Andrew Lamb created ARROW-11712: --- Summary: [Rust][DataFusion] Introduce PlanRewriter for rewriting plans Key: ARROW-11712 URL: https://issues.apache.org/jira/browse/ARROW-11712 Project: Apache Arrow Issue Type: Sub-task Reporter: Andrew Lamb Introduce a PlanRewriter to encapsulate visiting all logical plan nodes and rewriting them bottom up (and get rid of utils::inputs, utils::exprs, etc) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-11690) [Rust][DataFusion] Avoid Expr::clone in Expr builder methods
[ https://issues.apache.org/jira/browse/ARROW-11690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb reassigned ARROW-11690: --- Assignee: Andrew Lamb > [Rust][DataFusion] Avoid Expr::clone in Expr builder methods > > > Key: ARROW-11690 > URL: https://issues.apache.org/jira/browse/ARROW-11690 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust - DataFusion >Reporter: Andrew Lamb >Assignee: Andrew Lamb >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11711) [Rust][DataFusion] Rename ExpressionVisitor --> ExprVisitor and standardize input
[ https://issues.apache.org/jira/browse/ARROW-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb updated ARROW-11711: Component/s: Rust - DataFusion > [Rust][DataFusion] Rename ExpressionVisitor --> ExprVisitor and standardize > input > - > > Key: ARROW-11711 > URL: https://issues.apache.org/jira/browse/ARROW-11711 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust - DataFusion >Reporter: Andrew Lamb >Priority: Major > > Rename `ExpressionVisitor` to `ExprVisitor` for consistency, and change it to use > `&mut self` rather than consuming the visitor, for consistency with > `PlanVisitor` (as well as the soon to be created `ExprVisitor`). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-11712) [Rust][DataFusion] Introduce PlanRewriter for rewriting plans
[ https://issues.apache.org/jira/browse/ARROW-11712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb reassigned ARROW-11712: --- Assignee: Andrew Lamb > [Rust][DataFusion] Introduce PlanRewriter for rewriting plans > - > > Key: ARROW-11712 > URL: https://issues.apache.org/jira/browse/ARROW-11712 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust - DataFusion >Reporter: Andrew Lamb >Assignee: Andrew Lamb >Priority: Major > > Introduce a PlanRewriter to encapsulate visiting all logical plan nodes and > rewriting them bottom up (and get rid of utils::inputs, utils::exprs, etc) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11710) [Rust][DataFusion] Implement ExprRewriter to avoid tree traversal redundancy
[ https://issues.apache.org/jira/browse/ARROW-11710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb updated ARROW-11710: Component/s: Rust - DataFusion > [Rust][DataFusion] Implement ExprRewriter to avoid tree traversal redundancy > > > Key: ARROW-11710 > URL: https://issues.apache.org/jira/browse/ARROW-11710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust - DataFusion >Reporter: Andrew Lamb >Priority: Major > > The idea is to: > 1. Reduce the amount of repetition in optimizer rules to make them easier to > implement > 2. Reduce the amount of repetition to make it easier to see the actual logic > (rather than having it intertwined with the code needed to do recursion) > 3. Set the stage for a more general `PlanRewriter` that doesn't have to > clone its input and can instead take its input by value and consume it > The plan is to make an `ExprRewriter`, the mutable counterpart to > `ExpressionVisitor`, and demonstrate its usefulness by rewriting several > expression transformation passes using it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11709) [Rust][DataFusion] Move `expressions` and `inputs` into LogicalPlan rather than helpers in util
[ https://issues.apache.org/jira/browse/ARROW-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb updated ARROW-11709: Component/s: Rust - DataFusion > [Rust][DataFusion] Move `expressions` and `inputs` into LogicalPlan rather > than helpers in util > --- > > Key: ARROW-11709 > URL: https://issues.apache.org/jira/browse/ARROW-11709 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust - DataFusion >Reporter: Andrew Lamb >Assignee: Andrew Lamb >Priority: Major > > move `expressions` and `inputs` into LogicalPlan rather than helpers in > util, and use Visitor rather than hard coded list > Goal is to consolidate the expression walking in one place -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11690) [Rust][DataFusion] Avoid Expr::clone in Expr builder methods
[ https://issues.apache.org/jira/browse/ARROW-11690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb updated ARROW-11690: Component/s: Rust - DataFusion > [Rust][DataFusion] Avoid Expr::clone in Expr builder methods > > > Key: ARROW-11690 > URL: https://issues.apache.org/jira/browse/ARROW-11690 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust - DataFusion >Reporter: Andrew Lamb >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-11711) [Rust][DataFusion] Rename ExpressionVisitor --> ExprVisitor and standardize input
[ https://issues.apache.org/jira/browse/ARROW-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb reassigned ARROW-11711: --- Assignee: Andrew Lamb > [Rust][DataFusion] Rename ExpressionVisitor --> ExprVisitor and standardize > input > - > > Key: ARROW-11711 > URL: https://issues.apache.org/jira/browse/ARROW-11711 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust - DataFusion >Reporter: Andrew Lamb >Assignee: Andrew Lamb >Priority: Major > > Rename `ExpressionVisitor` to `ExprVisitor` for consistency, and change it to use > `&mut self` rather than consuming the visitor, for consistency with > `PlanVisitor` (as well as the soon to be created `ExprVisitor`). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-11709) [Rust][DataFusion] Move `expressions` and `inputs` into LogicalPlan rather than helpers in util
[ https://issues.apache.org/jira/browse/ARROW-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb reassigned ARROW-11709: --- Assignee: Andrew Lamb > [Rust][DataFusion] Move `expressions` and `inputs` into LogicalPlan rather > than helpers in util > --- > > Key: ARROW-11709 > URL: https://issues.apache.org/jira/browse/ARROW-11709 > Project: Apache Arrow > Issue Type: Sub-task >Reporter: Andrew Lamb >Assignee: Andrew Lamb >Priority: Major > > move `expressions` and `inputs` into LogicalPlan rather than helpers in > util, and use Visitor rather than hard coded list > Goal is to consolidate the expression walking in one place -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-11710) [Rust][DataFusion] Implement ExprRewriter to avoid tree traversal redundancy
[ https://issues.apache.org/jira/browse/ARROW-11710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb reassigned ARROW-11710: --- Assignee: Andrew Lamb > [Rust][DataFusion] Implement ExprRewriter to avoid tree traversal redundancy > > > Key: ARROW-11710 > URL: https://issues.apache.org/jira/browse/ARROW-11710 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust - DataFusion >Reporter: Andrew Lamb >Assignee: Andrew Lamb >Priority: Major > > The idea is to: > 1. Reduce the amount of repetition in optimizer rules to make them easier to > implement > 2. Reduce the amount of repetition to make it easier to see the actual logic > (rather than having it intertwined with the code needed to do recursion) > 3. Set the stage for a more general `PlanRewriter` that doesn't have to > clone its input and can instead take its input by value and consume it > The plan is to make an `ExprRewriter`, the mutable counterpart to > `ExpressionVisitor`, and demonstrate its usefulness by rewriting several > expression transformation passes using it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11712) [Rust][DataFusion] Introduce PlanRewriter for rewriting plans
[ https://issues.apache.org/jira/browse/ARROW-11712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb updated ARROW-11712: Component/s: Rust - DataFusion > [Rust][DataFusion] Introduce PlanRewriter for rewriting plans > - > > Key: ARROW-11712 > URL: https://issues.apache.org/jira/browse/ARROW-11712 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust - DataFusion >Reporter: Andrew Lamb >Priority: Major > > Introduce a PlanRewriter to encapsulate visiting all logical plan nodes and > rewriting them bottom up (and get rid of utils::inputs, utils::exprs, etc) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11713) [Rust] Get MIRI running again
Andrew Lamb created ARROW-11713: --- Summary: [Rust] Get MIRI running again Key: ARROW-11713 URL: https://issues.apache.org/jira/browse/ARROW-11713 Project: Apache Arrow Issue Type: Improvement Reporter: Andrew Lamb Rust's MIRI https://github.com/rust-lang/miri can help detect logical errors in programs The Rust arrow implementation now runs the MIRI checks as part of CI, but it does not pass cleanly For example: https://github.com/apache/arrow/pull/9535/checks?check_run_id=1941313240 {code} Compiling criterion v0.3.4 Compiling h2 v0.3.0 Compiling tower v0.4.5 Compiling hyper v0.14.4 error[E0463]: can't find crate for `tracing` --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1 | 68 | extern crate tracing; | ^ can't find crate error: aborting due to previous error {code} Previously MIRI ran but the check failed in FFI somewhere -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11713) [Rust] Get MIRI running again
[ https://issues.apache.org/jira/browse/ARROW-11713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb updated ARROW-11713: Description: Rust's MIRI https://github.com/rust-lang/miri can help detect logical errors in programs The Rust arrow implementation now runs the MIRI checks as part of CI, but it does not pass cleanly For example: https://github.com/apache/arrow/pull/9535/checks?check_run_id=1941313240 {code} Compiling criterion v0.3.4 Compiling h2 v0.3.0 Compiling tower v0.4.5 Compiling hyper v0.14.4 error[E0463]: can't find crate for `tracing` --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1 | 68 | extern crate tracing; | ^ can't find crate error: aborting due to previous error {code} Previously MIRI ran but the check failed in FFI somewhere Help wanted! was: Rust's MIRI https://github.com/rust-lang/miri can help detect logical errors in programs The Rust arrow implementation now runs the MIRI checks as part of CI, but it does not pass cleanly For example: https://github.com/apache/arrow/pull/9535/checks?check_run_id=1941313240 {code} Compiling criterion v0.3.4 Compiling h2 v0.3.0 Compiling tower v0.4.5 Compiling hyper v0.14.4 error[E0463]: can't find crate for `tracing` --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1 | 68 | extern crate tracing; | ^ can't find crate error: aborting due to previous error {code} Previously MIRI ran but the check failed in FFI somewhere > [Rust] Get MIRI running again > - > > Key: ARROW-11713 > URL: https://issues.apache.org/jira/browse/ARROW-11713 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Andrew Lamb >Priority: Major > > Rust's MIRI https://github.com/rust-lang/miri can help detect logical errors > in programs > The Rust arrow implementation now runs the MIRI checks as part of CI, but it > does not pass cleanly > For example: > 
https://github.com/apache/arrow/pull/9535/checks?check_run_id=1941313240 > {code} >Compiling criterion v0.3.4 >Compiling h2 v0.3.0 >Compiling tower v0.4.5 >Compiling hyper v0.14.4 > error[E0463]: can't find crate for `tracing` > --> > /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1 >| > 68 | extern crate tracing; >| ^ can't find crate > error: aborting due to previous error > {code} > Previously MIRI ran but the check failed in FFI somewhere > Help wanted! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11713) [Rust] Get MIRI running again
[ https://issues.apache.org/jira/browse/ARROW-11713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb updated ARROW-11713: Description: Rust's MIRI https://github.com/rust-lang/miri can help detect logical errors in programs The Rust arrow implementation now runs the MIRI checks as part of CI thanks to [~vertexclique] but it does not pass cleanly yet For example: https://github.com/apache/arrow/pull/9535/checks?check_run_id=1941313240 {code} Compiling criterion v0.3.4 Compiling h2 v0.3.0 Compiling tower v0.4.5 Compiling hyper v0.14.4 error[E0463]: can't find crate for `tracing` --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1 | 68 | extern crate tracing; | ^ can't find crate error: aborting due to previous error {code} Previously MIRI ran but the check failed in FFI somewhere Help wanted! was: Rust's MIRI https://github.com/rust-lang/miri can help detect logical errors in programs The Rust arrow implementation now runs the MIRI checks as part of CI, but it does not pass cleanly For example: https://github.com/apache/arrow/pull/9535/checks?check_run_id=1941313240 {code} Compiling criterion v0.3.4 Compiling h2 v0.3.0 Compiling tower v0.4.5 Compiling hyper v0.14.4 error[E0463]: can't find crate for `tracing` --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1 | 68 | extern crate tracing; | ^ can't find crate error: aborting due to previous error {code} Previously MIRI ran but the check failed in FFI somewhere Help wanted! 
> [Rust] Get MIRI running again > - > > Key: ARROW-11713 > URL: https://issues.apache.org/jira/browse/ARROW-11713 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Andrew Lamb >Priority: Major > > Rust's MIRI https://github.com/rust-lang/miri can help detect logical errors > in programs > The Rust arrow implementation now runs the MIRI checks as part of CI thanks > to [~vertexclique] but it does not pass cleanly yet > For example: > https://github.com/apache/arrow/pull/9535/checks?check_run_id=1941313240 > {code} >Compiling criterion v0.3.4 >Compiling h2 v0.3.0 >Compiling tower v0.4.5 >Compiling hyper v0.14.4 > error[E0463]: can't find crate for `tracing` > --> > /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1 >| > 68 | extern crate tracing; >| ^ can't find crate > error: aborting due to previous error > {code} > Previously MIRI ran but the check failed in FFI somewhere > Help wanted! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11714) [Rust] Fix MIRI build on CI
Andrew Lamb created ARROW-11714: --- Summary: [Rust] Fix MIRI build on CI Key: ARROW-11714 URL: https://issues.apache.org/jira/browse/ARROW-11714 Project: Apache Arrow Issue Type: Sub-task Reporter: Andrew Lamb the MIRI check doesn't even compile anymore: {code} Compiling criterion v0.3.4 Compiling h2 v0.3.0 Compiling tower v0.4.5 Compiling hyper v0.14.4 error[E0463]: can't find crate for `tracing` --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1 | 68 | extern crate tracing; | ^ can't find crate error: aborting due to previous error {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11715) [Rust] Ensure a successful MIRI Run on CI
Andrew Lamb created ARROW-11715: --- Summary: [Rust] Ensure a successful MIRI Run on CI Key: ARROW-11715 URL: https://issues.apache.org/jira/browse/ARROW-11715 Project: Apache Arrow Issue Type: Sub-task Reporter: Andrew Lamb

Currently the MIRI check is set up to pass even if `cargo miri` returns an error. https://github.com/apache/arrow/blob/master/.github/workflows/rust.yml#L263-L264
{code}
# Ignore MIRI errors until we can get a clean run
cargo miri test || true
{code}
The goal is to make MIRI pass and then remove this `|| true` workaround in CI.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11713) [Rust] Get MIRI running again
[ https://issues.apache.org/jira/browse/ARROW-11713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb updated ARROW-11713: Component/s: Rust > [Rust] Get MIRI running again > - > > Key: ARROW-11713 > URL: https://issues.apache.org/jira/browse/ARROW-11713 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Andrew Lamb >Priority: Major > > Rust's MIRI https://github.com/rust-lang/miri can help detect logical errors > in programs > The Rust arrow implementation now runs the MIRI checks as part of CI thanks > to [~vertexclique] but it does not pass cleanly yet > For example: > https://github.com/apache/arrow/pull/9535/checks?check_run_id=1941313240 > {code} >Compiling criterion v0.3.4 >Compiling h2 v0.3.0 >Compiling tower v0.4.5 >Compiling hyper v0.14.4 > error[E0463]: can't find crate for `tracing` > --> > /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1 >| > 68 | extern crate tracing; >| ^ can't find crate > error: aborting due to previous error > {code} > Previously MIRI ran but the check failed in FFI somewhere > Help wanted! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-11708) Clean up Rust 2021 linting warning
[ https://issues.apache.org/jira/browse/ARROW-11708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb resolved ARROW-11708. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9535 [https://github.com/apache/arrow/pull/9535] > Clean up Rust 2021 linting warning > -- > > Key: ARROW-11708 > URL: https://issues.apache.org/jira/browse/ARROW-11708 > Project: Apache Arrow > Issue Type: Task > Components: Rust >Reporter: QP Hou >Assignee: QP Hou >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11716) [Rust][DataFusion] Change tests in sql.rs to use `assert_batch`
Andrew Lamb created ARROW-11716: --- Summary: [Rust][DataFusion] Change tests in sql.rs to use `assert_batch` Key: ARROW-11716 URL: https://issues.apache.org/jira/browse/ARROW-11716 Project: Apache Arrow Issue Type: Improvement Reporter: Andrew Lamb The idea is to make the tests in [sql.rs|https://github.com/apache/arrow/blob/master/rust/datafusion/tests/sql.rs#L103] more maintainable by using the `assert_batches_eq` macro that was introduced here: https://github.com/apache/arrow/pull/9264 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11692) [Rust][DataFusion] Improve documentation on Optimizer
[ https://issues.apache.org/jira/browse/ARROW-11692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb updated ARROW-11692: Component/s: Rust - DataFusion > [Rust][DataFusion] Improve documentation on Optimizer > - > > Key: ARROW-11692 > URL: https://issues.apache.org/jira/browse/ARROW-11692 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust - DataFusion >Reporter: Andrew Lamb >Assignee: Andrew Lamb >Priority: Trivial > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-11692) [Rust][DataFusion] Improve documentation on Optimizer
[ https://issues.apache.org/jira/browse/ARROW-11692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb reassigned ARROW-11692: --- Assignee: Andrew Lamb > [Rust][DataFusion] Improve documentation on Optimizer > - > > Key: ARROW-11692 > URL: https://issues.apache.org/jira/browse/ARROW-11692 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Andrew Lamb >Assignee: Andrew Lamb >Priority: Trivial > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-11692) [Rust][DataFusion] Improve documentation on Optimizer
[ https://issues.apache.org/jira/browse/ARROW-11692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb resolved ARROW-11692. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9529 [https://github.com/apache/arrow/pull/9529] > [Rust][DataFusion] Improve documentation on Optimizer > - > > Key: ARROW-11692 > URL: https://issues.apache.org/jira/browse/ARROW-11692 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust - DataFusion >Reporter: Andrew Lamb >Assignee: Andrew Lamb >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11717) [Integration] Intermittent (but frequent) flight integration failures with auth:basic_proto
Andrew Lamb created ARROW-11717: --- Summary: [Integration] Intermittent (but frequent) flight integration failures with auth:basic_proto Key: ARROW-11717 URL: https://issues.apache.org/jira/browse/ARROW-11717 Project: Apache Arrow Issue Type: Bug Components: Integration Reporter: Andrew Lamb

Link to discussion on list: https://lists.apache.org/thread.html/r0dcdc2b6334e7f067a828634cf7584406ed859ff4d3fb622fef1bdd7%40%3Cdev.arrow.apache.org%3E

I noticed that the Rust/CPP integration tests are failing, seemingly intermittently, on master (and on Rust PRs). The tests pass if they are re-run (enough times).

Several commits on master show the little red `X`, meaning that CI didn't pass: https://github.com/apache/arrow/commits/master

Here are some example CI runs that are failing:
https://github.com/apache/arrow/runs/1935673508
https://github.com/apache/arrow/runs/1926705212

Here is another example: https://github.com/apache/arrow/pull/9359/checks?check_run_id=1941967422

Example failure:
{code}
== Testing file auth:basic_proto ==
Traceback (most recent call last):
  File "/arrow/dev/archery/archery/integration/util.py", line 139, in run_cmd
    output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/opt/conda/envs/arrow/lib/python3.8/subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/opt/conda/envs/arrow/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/build/cpp/debug/flight-test-integration-client', '-host', 'localhost', '-port=33569', '-scenario', 'auth:basic_proto']' died with .

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/arrow/dev/archery/archery/integration/runner.py", line 308, in _run_flight_test_case
    consumer.flight_request(port, **client_args)
  File "/arrow/dev/archery/archery/integration/tester_cpp.py", line 116, in flight_request
    run_cmd(cmd)
  File "/arrow/dev/archery/archery/integration/util.py", line 148, in run_cmd
    raise RuntimeError(sio.getvalue())
RuntimeError: Command failed: /build/cpp/debug/flight-test-integration-client -host localhost -port=33569 -scenario auth:basic_proto
With output:
--
-- Arrow Fatal Error --
Invalid: Expected UNAUTHENTICATED but got Unavailable
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-11681) [Rust] IPC writers shouldn't unwrap in destructors
[ https://issues.apache.org/jira/browse/ARROW-11681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb resolved ARROW-11681. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9520 [https://github.com/apache/arrow/pull/9520] > [Rust] IPC writers shouldn't unwrap in destructors > -- > > Key: ARROW-11681 > URL: https://issues.apache.org/jira/browse/ARROW-11681 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 3.0.0 >Reporter: Steven Fackler >Assignee: Steven Fackler >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > FileWriter and StreamWriter call `self.finish().unwrap()` in their `Drop` > implementations if the write has not already been finished. However, a common > reason for the write to not be finished is an earlier IO error on the > underlying stream. In that case, the destructor will panic, which is not > desired. -- This message was sent by Atlassian Jira (v8.3.4#803005)
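The problem described in ARROW-11681 can be sketched with a small stand-in writer. This is an illustration under stated assumptions, not the Arrow implementation and not necessarily what pull request 9520 does: `SketchWriter`, `FailingSink`, and the `EOS` end marker are all hypothetical. The point is only that a `Drop` implementation can attempt `finish()` on a best-effort basis and discard the error instead of calling `unwrap()`.

```rust
use std::io::{self, Write};

// Hypothetical stand-in for an IPC writer that must emit an end marker.
struct SketchWriter<W: Write> {
    inner: W,
    finished: bool,
}

impl<W: Write> SketchWriter<W> {
    fn new(inner: W) -> Self {
        SketchWriter { inner, finished: false }
    }

    // Writes the (made-up) end marker and reports IO errors to the caller.
    fn finish(&mut self) -> io::Result<()> {
        self.inner.write_all(b"EOS")?;
        self.inner.flush()?;
        self.finished = true;
        Ok(())
    }
}

impl<W: Write> Drop for SketchWriter<W> {
    fn drop(&mut self) {
        if !self.finished {
            // Best effort only: if the underlying stream already failed,
            // swallow the error here rather than panic in a destructor.
            let _ = self.finish();
        }
    }
}

// A sink whose writes always fail, simulating an earlier IO error.
struct FailingSink;

impl Write for FailingSink {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::Other, "simulated IO failure"))
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Callers that care about the outcome would still call `finish()` explicitly and handle the `Result`; the destructor is only a fallback. With `self.finish().unwrap()` in `drop`, dropping a writer over `FailingSink` would panic instead.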
[jira] [Resolved] (ARROW-11459) [Rust] Allow ListArray of primitives to be built from iterator
[ https://issues.apache.org/jira/browse/ARROW-11459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb resolved ARROW-11459. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9388 [https://github.com/apache/arrow/pull/9388] > [Rust] Allow ListArray of primitives to be built from iterator > -- > > Key: ARROW-11459 > URL: https://issues.apache.org/jira/browse/ARROW-11459 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Jorge Leitão >Assignee: Jorge Leitão >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11706) [JS] Better BigInt compatibility check
[ https://issues.apache.org/jira/browse/ARROW-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Diana Clarke updated ARROW-11706: - Description: See: https://github.com/apache/arrow/pull/9110 Check for whether {{BigInt64ArrayAvailable}} and {{BigUint64ArrayAvailable}} are available, rather than just {{BigIntAvailable}}. Recent versions of JavaScriptCore/WebKit in Safari support {{BigInt}} but do not support {{BigInt64Array}}, and so anything that relies on {{BigInt64Array}} will fail despite {{BigIntAvailable}} being true. The manifestation of this issue can be seen when trying to run the following within Safari on a table that contains bigints: {code:java} RecordBatchJSONWriter.writeAll(table).toString(true) message: "BigUint64Array is not available in this environment" BigUint64ArrayUnavailableError BigUint64ArrayUnavailable bignumToString bigNumsToStrings generatorResume@[native code] performIteration@[native code] visitInt visit map@[native code] recordBatchToJSON close finish global code {code} See also: https://bugs.webkit.org/show_bug.cgi?id=190800 was: See: https://github.com/apache/arrow/pull/9110 Check for whether BigInt64ArrayAvailable and BigUint64ArrayAvailable are available, rather than just BigIntAvailable. Recent versions of JavaScriptCore/WebKit in Safari support BigInt but do not support BigInt64Array, and so anything that relies on BigInt64Array will fail despite BigIntAvailable being true. 
The manifestation of this issue can be seen when trying to run the following within Safari on a table that contains bigints: {code:java} RecordBatchJSONWriter.writeAll(table).toString(true) message: "BigUint64Array is not available in this environment" BigUint64ArrayUnavailableError BigUint64ArrayUnavailable bignumToString bigNumsToStrings generatorResume@[native code] performIteration@[native code] visitInt visit map@[native code] recordBatchToJSON close finish global code {code} See also: https://bugs.webkit.org/show_bug.cgi?id=190800 > [JS] Better BigInt compatibility check > -- > > Key: ARROW-11706 > URL: https://issues.apache.org/jira/browse/ARROW-11706 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Diana Clarke >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > See: https://github.com/apache/arrow/pull/9110 > Check for whether {{BigInt64ArrayAvailable}} and {{BigUint64ArrayAvailable}} > are available, rather than just {{BigIntAvailable}}. Recent versions of > JavaScriptCore/WebKit in Safari support {{BigInt}} but do not support > {{BigInt64Array}}, and so anything that relies on {{BigInt64Array}} will fail > despite {{BigIntAvailable}} being true. > The manifestation of this issue can be seen when trying to run the following > within Safari on a table that contains bigints: > {code:java} > RecordBatchJSONWriter.writeAll(table).toString(true) > message: "BigUint64Array is not available in this environment" > BigUint64ArrayUnavailableError > BigUint64ArrayUnavailable > bignumToString > bigNumsToStrings > generatorResume@[native code] > performIteration@[native code] > visitInt > visit > map@[native code] > recordBatchToJSON > close > finish > global code > {code} > See also: https://bugs.webkit.org/show_bug.cgi?id=190800 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11706) [JS] Better BigInt compatibility check
[ https://issues.apache.org/jira/browse/ARROW-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Diana Clarke updated ARROW-11706: - Component/s: JavaScript > [JS] Better BigInt compatibility check > -- > > Key: ARROW-11706 > URL: https://issues.apache.org/jira/browse/ARROW-11706 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Diana Clarke >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > See: https://github.com/apache/arrow/pull/9110 > Check for whether {{BigInt64ArrayAvailable}} and {{BigUint64ArrayAvailable}} > are available, rather than just {{BigIntAvailable}}. Recent versions of > JavaScriptCore/WebKit in Safari support {{BigInt}} but do not support > {{BigInt64Array}}, and so anything that relies on {{BigInt64Array}} will fail > despite {{BigIntAvailable}} being true. > The manifestation of this issue can be seen when trying to run the following > within Safari on a table that contains bigints: > {code:java} > RecordBatchJSONWriter.writeAll(table).toString(true) > message: "BigUint64Array is not available in this environment" > BigUint64ArrayUnavailableError > BigUint64ArrayUnavailable > bignumToString > bigNumsToStrings > generatorResume@[native code] > performIteration@[native code] > visitInt > visit > map@[native code] > recordBatchToJSON > close > finish > global code > {code} > See also: https://bugs.webkit.org/show_bug.cgi?id=190800 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11698) [Website] Write blog post about C++ endianness compatibility
[ https://issues.apache.org/jira/browse/ARROW-11698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287755#comment-17287755 ] Kazuaki Ishizaki commented on ARROW-11698: -- Sounds good. It would be nice to mention big-endian support in Java, too, because our community has not announced it even though Arrow 3.0 includes the feature. What do you think? > [Website] Write blog post about C++ endianness compatibility > > > Key: ARROW-11698 > URL: https://issues.apache.org/jira/browse/ARROW-11698 > Project: Apache Arrow > Issue Type: Task > Components: Website >Reporter: Antoine Pitrou >Priority: Minor > > It might be nice to announce the cross-endian compatibility effort on the > website. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11718) [Rust] IPC writers shouldn't implicitly finish on drop
Steven Fackler created ARROW-11718: -- Summary: [Rust] IPC writers shouldn't implicitly finish on drop Key: ARROW-11718 URL: https://issues.apache.org/jira/browse/ARROW-11718 Project: Apache Arrow Issue Type: Bug Components: Rust Affects Versions: 3.0.0 Reporter: Steven Fackler Assignee: Steven Fackler The Rust IPC writer types have a destructor that automatically writes the footer if necessary. This is not ideal, though, since it can hide errors. For example, if a web server is streaming data to a client in the Arrow IPC format and it encounters an internal error trying to generate the next batch, the outbound stream will appear valid to the client as the footer will automatically be written out but some amount of data will actually be missing. If the footer was not automatically written, the client would properly detect the truncation. For reference, the C++ implementation does not attempt to write the footer implicitly on drop. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11718) [Rust] IPC writers shouldn't implicitly finish on drop
[ https://issues.apache.org/jira/browse/ARROW-11718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11718: --- Labels: pull-request-available (was: ) > [Rust] IPC writers shouldn't implicitly finish on drop > -- > > Key: ARROW-11718 > URL: https://issues.apache.org/jira/browse/ARROW-11718 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 3.0.0 >Reporter: Steven Fackler >Assignee: Steven Fackler >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The Rust IPC writer types have a destructor that automatically writes the > footer if necessary. This is not ideal, though, since it can hide errors. For > example, if a web server is streaming data to a client in the Arrow IPC > format and it encounters an internal error trying to generate the next batch, > the outbound stream will appear valid to the client as the footer will > automatically be written out but some amount of data will actually be > missing. If the footer was not automatically written, the client would > properly detect the truncation. > For reference, the C++ implementation does not attempt to write the footer > implicitly on drop. -- This message was sent by Atlassian Jira (v8.3.4#803005)
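The argument in ARROW-11718 can be illustrated with a toy stream writer. This is only a sketch of the design idea in Python, not the Rust IPC writer or the Arrow IPC format: the framing (a 4-byte length prefix, with a zero length as the end marker) and the names ToyStreamWriter, read_stream, stream_batches and failing_batches are all hypothetical. The end marker is written only by an explicit finish() call, so a producer that fails midway leaves a stream that the reader can recognise as truncated.

```python
import io


class ToyStreamWriter:
    """Toy writer: each batch is framed as <4-byte little-endian length><payload>.

    A zero length is reserved as the end-of-stream marker (hypothetical framing).
    """

    def __init__(self, sink):
        self.sink = sink
        self.finished = False

    def write(self, batch: bytes) -> None:
        if self.finished:
            raise ValueError("writer already finished")
        if not batch:
            raise ValueError("empty batches are reserved for the end marker")
        self.sink.write(len(batch).to_bytes(4, "little"))
        self.sink.write(batch)

    def finish(self) -> None:
        # The end marker is written only here, never implicitly on cleanup.
        if not self.finished:
            self.sink.write((0).to_bytes(4, "little"))
            self.finished = True


def read_stream(data: bytes):
    """Return (batches, complete); complete is True only if the end marker was seen."""
    batches, pos = [], 0
    while pos + 4 <= len(data):
        size = int.from_bytes(data[pos:pos + 4], "little")
        pos += 4
        if size == 0:
            return batches, True
        if pos + size > len(data):
            break  # payload cut short
        batches.append(data[pos:pos + size])
        pos += size
    return batches, False


def stream_batches(batches, sink) -> None:
    writer = ToyStreamWriter(sink)
    for batch in batches:
        writer.write(batch)
    writer.finish()  # reached only if every batch was produced and written


def failing_batches():
    yield b"batch-0"
    raise RuntimeError("internal error generating the next batch")


if __name__ == "__main__":
    sink = io.BytesIO()
    try:
        stream_batches(failing_batches(), sink)
    except RuntimeError:
        pass
    print(read_stream(sink.getvalue()))
```

If finish() were instead called from a destructor or a finally block, the failed stream would carry the end marker and read_stream would report it as complete, which is the error-hiding behaviour the issue describes.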
[jira] [Commented] (ARROW-11017) [Rust] [DataFusion] Add support for Parquet schema merging
[ https://issues.apache.org/jira/browse/ARROW-11017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287787#comment-17287787 ] QP Hou commented on ARROW-11017: The Schema struct does have a try_merge method defined, should we just reuse that one? > [Rust] [DataFusion] Add support for Parquet schema merging > --- > > Key: ARROW-11017 > URL: https://issues.apache.org/jira/browse/ARROW-11017 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust - DataFusion >Reporter: Andy Grove >Priority: Major > > Add support for Parquet schema merging so that we can read data sets where > some files have additional columns. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6399) [C++] More extensive attributes usage could improve debugging
[ https://issues.apache.org/jira/browse/ARROW-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287794#comment-17287794 ] Wes McKinney commented on ARROW-6399: - Is it possible to go ahead and add {{ARROW_MUST_USE_RESULT}} to {{Result}}? > [C++] More extensive attributes usage could improve debugging > - > > Key: ARROW-6399 > URL: https://issues.apache.org/jira/browse/ARROW-6399 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Ben Kietzman >Priority: Minor > > Wrapping raw or smart pointer parameters and other declarations with > {{gsl::not_null}} will assert they are not null. The check is dropped for > release builds. > Status is tagged with ARROW_MUST_USE_RESULT which emits warnings when a > Status might be ignored if compiling with clang, but Result<> should probably > be tagged with this too -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-6414) [Python] pyarrow cannot (de)serialise an empty MultiIndex-ed column DataFrame
[ https://issues.apache.org/jira/browse/ARROW-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-6414. --- Resolution: Won't Fix These functions are now deprecated. > [Python] pyarrow cannot (de)serialise an empty MultiIndex-ed column DataFrame > - > > Key: ARROW-6414 > URL: https://issues.apache.org/jira/browse/ARROW-6414 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.14.0 >Reporter: Stephen Gowdy >Priority: Major > > If a pandas DataFrame has empty MultiIndex columns, pyarrow cannot > serialise and deserialise it. The example code below shows this. > {code:python} > import pandas as pd > import pyarrow as pa > columns = pd.MultiIndex.from_tuples([('a', 'b', 'c')]) > df = pd.DataFrame(columns = columns) > df = df[[]] > pa.deserialize_pandas(pa.serialize_pandas(df).to_pybytes()) > ... > AttributeError: 'dict' object has no attribute 'dtype' > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-6455) [C++] Implement ExtensionType for non-UTF8 Unicode data
[ https://issues.apache.org/jira/browse/ARROW-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-6455. --- Resolution: Later Closing this until it can be demonstrated that it's actually needed > [C++] Implement ExtensionType for non-UTF8 Unicode data > --- > > Key: ARROW-6455 > URL: https://issues.apache.org/jira/browse/ARROW-6455 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Minor > Fix For: 4.0.0 > > > For use cases where it's OK to transport such data as binary (without > transcoding to UTF-8) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-6495) [Plasma] Use xxh3 for object hashing
[ https://issues.apache.org/jira/browse/ARROW-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-6495. --- Resolution: Won't Fix > [Plasma] Use xxh3 for object hashing > > > Key: ARROW-6495 > URL: https://issues.apache.org/jira/browse/ARROW-6495 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Reporter: Antoine Pitrou >Priority: Minor > > We recently vendored xxh3 in Arrow. Plasma may want to use it for object > hashing, since it's supposed to be even faster than XXH64. > See https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for > performance numbers. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-6498) [C++][CI] Download googletest tarball and use for EP build to avoid occasional flakiness
[ https://issues.apache.org/jira/browse/ARROW-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-6498. --- Resolution: Cannot Reproduce Closing since this doesn't seem to be an active problem > [C++][CI] Download googletest tarball and use for EP build to avoid > occasional flakiness > > > Key: ARROW-6498 > URL: https://issues.apache.org/jira/browse/ARROW-6498 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Continuous Integration >Reporter: Wes McKinney >Priority: Major > > Failures such as > https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/27281370/job/dn0ji349v8popkd9 > seem to be happening a fair amount. > We might try to avoid this by wget-ing a tarball and setting > {{$ARROW_GTEST_URL}}. Open to other ideas about how to reduce flakiness -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-6571) [Developer] Provide means to "plug in" a third party Arrow implementation into the integration test suite for validation purposes
[ https://issues.apache.org/jira/browse/ARROW-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-6571. --- Resolution: Won't Fix Someone is free to contribute this, but I don't think it's a priority for the community > [Developer] Provide means to "plug in" a third party Arrow implementation > into the integration test suite for validation purposes > - > > Key: ARROW-6571 > URL: https://issues.apache.org/jira/browse/ARROW-6571 > Project: Apache Arrow > Issue Type: Improvement > Components: Developer Tools >Reporter: Wes McKinney >Priority: Major > > Our current integration test suite has the details of our reference > implementations hard coded. If a third party implements integration tests for > their Arrow library, we should provide a way for them to test their library > against one or more reference implementations -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6680) [Python] Add Array ctor microbenchmarks
[ https://issues.apache.org/jira/browse/ARROW-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-6680: Fix Version/s: 4.0.0 > [Python] Add Array ctor microbenchmarks > --- > > Key: ARROW-6680 > URL: https://issues.apache.org/jira/browse/ARROW-6680 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 4.0.0 > > > Since more unavoidable validation is being added in e.g. > https://github.com/apache/arrow/pull/5488 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-6673) [Python] Consider separating libarrow.pxd into multiple definition files
[ https://issues.apache.org/jira/browse/ARROW-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-6673. --- Resolution: Won't Fix > [Python] Consider separating libarrow.pxd into multiple definition files > > > Key: ARROW-6673 > URL: https://issues.apache.org/jira/browse/ARROW-6673 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Krisztian Szucs >Priority: Minor > > See discussion https://github.com/apache/arrow/pull/5423#discussion_r327522836 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6779) [Python] Conversion from datetime.datetime to timestamp('ns') can overflow
[ https://issues.apache.org/jira/browse/ARROW-6779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-6779: Fix Version/s: 4.0.0 > [Python] Conversion from datetime.datetime to timestamp('ns') can overflow > - > > Key: ARROW-6779 > URL: https://issues.apache.org/jira/browse/ARROW-6779 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Joris Van den Bossche >Priority: Major > Fix For: 4.0.0 > > > In the Python conversion of datetime scalars, there is no check for integer > overflow: > {code} > In [32]: pa.array([datetime.datetime(3000, 1, 1)], pa.timestamp('ns')) > Out[32]: > [ > 1830-11-23 00:50:52.580896768 > ] > {code} > So if the target type has nanosecond unit, this can give wrong results > (I don't think the other resolutions can reach overflow, given the limited > range of years of datetime.datetime). > We should probably check for this case and raise an error. -- This message was sent by Atlassian Jira (v8.3.4#803005)
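The arithmetic behind the ARROW-6779 report can be reproduced with the standard library alone. The sketch below does not use pyarrow and is not its conversion code; the helper names (to_epoch_nanoseconds, wrap_int64, checked_epoch_nanoseconds) are made up for illustration. It computes the exact nanosecond count for datetime.datetime(3000, 1, 1), shows that it exceeds the signed 64-bit range, simulates the wraparound, and sketches the kind of range check the issue proposes.

```python
import datetime

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
EPOCH = datetime.datetime(1970, 1, 1)


def to_epoch_nanoseconds(value: datetime.datetime) -> int:
    """Exact nanoseconds since the epoch for a naive datetime (unbounded Python int)."""
    delta = value - EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 1_000_000_000 + delta.microseconds * 1_000


def wrap_int64(n: int) -> int:
    """Simulate two's-complement wraparound into a signed 64-bit integer."""
    return (n + 2**63) % 2**64 - 2**63


def checked_epoch_nanoseconds(value: datetime.datetime) -> int:
    """Like to_epoch_nanoseconds, but raise instead of silently overflowing."""
    nanos = to_epoch_nanoseconds(value)
    if not INT64_MIN <= nanos <= INT64_MAX:
        raise OverflowError(f"{value!r} does not fit in a 64-bit nanosecond timestamp")
    return nanos


if __name__ == "__main__":
    nanos = to_epoch_nanoseconds(datetime.datetime(3000, 1, 1))
    wrapped = wrap_int64(nanos)
    print(nanos > INT64_MAX)  # True: out of range for int64
    # Interpreting the wrapped value as a timestamp (truncated to microseconds,
    # since datetime has no nanosecond field) gives a date in 1830.
    print(EPOCH + datetime.timedelta(microseconds=wrapped // 1000))
```

The wrapped value corresponds to 1830-11-23 00:50:52.580896, which agrees with the output quoted in the issue up to the microsecond precision of datetime.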
[jira] [Commented] (ARROW-6414) [Python] pyarrow cannot (de)serialise an empty MultiIndex-ed column DataFrame
[ https://issues.apache.org/jira/browse/ARROW-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287799#comment-17287799 ] Stephen Gowdy commented on ARROW-6414: -- What are the replacements? > [Python] pyarrow cannot (de)serialise an empty MultiIndex-ed column DataFrame > - > > Key: ARROW-6414 > URL: https://issues.apache.org/jira/browse/ARROW-6414 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.14.0 >Reporter: Stephen Gowdy >Priority: Major > > If a pandas DataFrame has empty MultiIndex columns, pyarrow cannot > serialise and deserialise it. The example code below shows this. > {code:python} > import pandas as pd > import pyarrow as pa > columns = pd.MultiIndex.from_tuples([('a', 'b', 'c')]) > df = pd.DataFrame(columns = columns) > df = df[[]] > pa.deserialize_pandas(pa.serialize_pandas(df).to_pybytes()) > ... > AttributeError: 'dict' object has no attribute 'dtype' > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6498) [C++][CI] Download googletest tarball and use for EP build to avoid occasional flakiness
[ https://issues.apache.org/jira/browse/ARROW-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287806#comment-17287806 ] Neal Richardson commented on ARROW-6498: And by "we host" I mean GitHub/bintray. > [C++][CI] Download googletest tarball and use for EP build to avoid > occasional flakiness > > > Key: ARROW-6498 > URL: https://issues.apache.org/jira/browse/ARROW-6498 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Continuous Integration >Reporter: Wes McKinney >Priority: Major > > Failures such as > https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/27281370/job/dn0ji349v8popkd9 > seem to be happening a fair amount. > We might try to avoid this by wget-ing a tarball and setting > {{$ARROW_GTEST_URL}}. Open to other ideas about how to reduce flakiness -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6498) [C++][CI] Download googletest tarball and use for EP build to avoid occasional flakiness
[ https://issues.apache.org/jira/browse/ARROW-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287805#comment-17287805 ] Neal Richardson commented on ARROW-6498: We ended up doing a different solution: mirrors of dependencies that we host. I don't recall the issue that added them, but ARROW-11611 is the issue for updating them following our most recent dependency version bump. > [C++][CI] Download googletest tarball and use for EP build to avoid > occasional flakiness > > > Key: ARROW-6498 > URL: https://issues.apache.org/jira/browse/ARROW-6498 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Continuous Integration >Reporter: Wes McKinney >Priority: Major > > Failures such as > https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/27281370/job/dn0ji349v8popkd9 > seem to be happening a fair amount. > We might try to avoid this by wget-ing a tarball and setting > {{$ARROW_GTEST_URL}}. Open to other ideas about how to reduce flakiness -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11719) Support merged schema for memory table
QP Hou created ARROW-11719: -- Summary: Support merged schema for memory table Key: ARROW-11719 URL: https://issues.apache.org/jira/browse/ARROW-11719 Project: Apache Arrow Issue Type: Task Components: Rust - DataFusion Reporter: QP Hou Assignee: QP Hou Memory table should support loading batches with compatible schemas instead of forcing all schemas to be the same. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11719) Support merged schema for memory table
[ https://issues.apache.org/jira/browse/ARROW-11719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11719: --- Labels: pull-request-available (was: ) > Support merged schema for memory table > -- > > Key: ARROW-11719 > URL: https://issues.apache.org/jira/browse/ARROW-11719 > Project: Apache Arrow > Issue Type: Task > Components: Rust - DataFusion >Reporter: QP Hou >Assignee: QP Hou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Memory table should support loading batches with compatible schemas instead > of forcing all schemas to be the same. -- This message was sent by Atlassian Jira (v8.3.4#803005)
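As a rough illustration of what loading batches with compatible schemas could involve, here is a minimal Python sketch. It is not DataFusion code and makes several assumptions: schemas are modelled as plain {field name: type name} dicts, "compatible" is taken to mean that fields sharing a name also share a type, and batches missing a column are padded with nulls. The functions merge_schemas and pad_batch are hypothetical names.

```python
def merge_schemas(schemas):
    """Union the fields of several schemas in first-seen order.

    Each schema is a dict mapping field name to a type name. Two schemas are
    treated as compatible when every shared field has the same type.
    """
    merged = {}
    for schema in schemas:
        for name, dtype in schema.items():
            if name in merged and merged[name] != dtype:
                raise ValueError(
                    f"field {name!r} has conflicting types: {merged[name]} vs {dtype}"
                )
            merged.setdefault(name, dtype)
    return merged


def pad_batch(batch, schema):
    """Return the batch with columns missing from it filled with None values."""
    num_rows = len(next(iter(batch.values()))) if batch else 0
    return {name: batch.get(name, [None] * num_rows) for name in schema}


if __name__ == "__main__":
    schema_a = {"id": "int64", "name": "utf8"}
    schema_b = {"id": "int64", "score": "float64"}
    merged = merge_schemas([schema_a, schema_b])
    print(merged)
    print(pad_batch({"id": [1, 2], "score": [0.5, 0.7]}, merged))
```

A table built this way exposes the merged schema, and each stored batch is either padded up front or projected lazily at scan time; which of the two a real implementation chooses is a design decision this sketch does not settle.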
[jira] [Commented] (ARROW-11432) [Rust][DataFusion] Join Statement: Schema contains duplicate unqualified field name
[ https://issues.apache.org/jira/browse/ARROW-11432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287821#comment-17287821 ] R J commented on ARROW-11432: - I'm not sure there is an easy fix without making breaking changes to the public API. When building a join schema, it checks if the join set is valid (physical_plan::hash_utils::check_join_set_is_valid), which has a parent public API call (physical_plan::hash_utils::check_join_is_valid). This join is unaware of the registered name (CSV or parquet) as it is performed with arrow schemas rather than DataFusion schemas. > [Rust][DataFusion] Join Statement: Schema contains duplicate unqualified > field name > --- > > Key: ARROW-11432 > URL: https://issues.apache.org/jira/browse/ARROW-11432 > Project: Apache Arrow > Issue Type: Bug > Components: Rust - DataFusion >Affects Versions: 3.0.0 >Reporter: GANG LIAO >Priority: Critical > > https://github.com/apache/arrow/issues/9307 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11720) [Python] Cannot install pyarrow via pip on Ubuntu 18.04.5
Sachit Vithaldas created ARROW-11720: Summary: [Python] Cannot install pyarrow via pip on Ubuntu 18.04.5 Key: ARROW-11720 URL: https://issues.apache.org/jira/browse/ARROW-11720 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 3.0.0 Environment: Ubuntu 18.04.5 LTS Reporter: Sachit Vithaldas I was attempting to install pyarrow via pip on Ubuntu 18.04 by using the following command: {code:java} pip3 install pyarrow{code} When doing so I get the following error: {code:java} Collecting pyarrow Downloading https://files.pythonhosted.org/packages/62/d3/a482d8a4039bf931ed6388308f0cc0541d0cab46f0bbff7c897a74f1c576/pyarrow-3.0.0.tar.gz (682kB) 100% || 686kB 2.3MB/s Complete output from command python setup.py egg_info: Traceback (most recent call last): File "", line 1, in File "/tmp/pip-build-2zxk66af/pyarrow/setup.py", line 37, in from Cython.Distutils import build_ext as _build_ext ModuleNotFoundError: No module named 'Cython' Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-2zxk66af/pyarrow/{code} However this does seem to work without any issues on Ubuntu 20.04.1 LTS. This problem seems to be specific to 18.04.5. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11720) [Python] Cannot install pyarrow via pip on Ubuntu 18.04.5
[ https://issues.apache.org/jira/browse/ARROW-11720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287823#comment-17287823 ] Kouhei Sutou commented on ARROW-11720: -- Could you upgrade your pip to use manylinux2010 or manylinux2014? See also: ARROW-11498 > [Python] Cannot install pyarrow via pip on Ubuntu 18.04.5 > - > > Key: ARROW-11720 > URL: https://issues.apache.org/jira/browse/ARROW-11720 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 3.0.0 > Environment: Ubuntu 18.04.5 LTS >Reporter: Sachit Vithaldas >Priority: Major > > I was attempting to install pyarrow via pip on Ubuntu 18.04 by using the > following command: > {code:java} > pip3 install pyarrow{code} > When doing so I get the following error: > {code:java} > Collecting pyarrow > Downloading > https://files.pythonhosted.org/packages/62/d3/a482d8a4039bf931ed6388308f0cc0541d0cab46f0bbff7c897a74f1c576/pyarrow-3.0.0.tar.gz > (682kB) > 100% || 686kB 2.3MB/s > Complete output from command python setup.py egg_info: > Traceback (most recent call last): > File "", line 1, in > File "/tmp/pip-build-2zxk66af/pyarrow/setup.py", line 37, in > from Cython.Distutils import build_ext as _build_ext > ModuleNotFoundError: No module named 'Cython' > > Command "python setup.py egg_info" failed with error code 1 in > /tmp/pip-build-2zxk66af/pyarrow/{code} > However this does seem to work without any issues on Ubuntu 20.04.1 LTS. This > problem seems to be specific to 18.04.5. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11720) [Python] Cannot install pyarrow via pip on Ubuntu 18.04.5
[ https://issues.apache.org/jira/browse/ARROW-11720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287825#comment-17287825 ] Sachit Vithaldas commented on ARROW-11720: -- Upgrading pip resolved the issue for me. Thanks for your help! > [Python] Cannot install pyarrow via pip on Ubuntu 18.04.5 > - > > Key: ARROW-11720 > URL: https://issues.apache.org/jira/browse/ARROW-11720 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 3.0.0 > Environment: Ubuntu 18.04.5 LTS >Reporter: Sachit Vithaldas >Priority: Major > > I was attempting to install pyarrow via pip on Ubuntu 18.04 by using the > following command: > {code:java} > pip3 install pyarrow{code} > When doing so I get the following error: > {code:java} > Collecting pyarrow > Downloading > https://files.pythonhosted.org/packages/62/d3/a482d8a4039bf931ed6388308f0cc0541d0cab46f0bbff7c897a74f1c576/pyarrow-3.0.0.tar.gz > (682kB) > 100% || 686kB 2.3MB/s > Complete output from command python setup.py egg_info: > Traceback (most recent call last): > File "", line 1, in > File "/tmp/pip-build-2zxk66af/pyarrow/setup.py", line 37, in > from Cython.Distutils import build_ext as _build_ext > ModuleNotFoundError: No module named 'Cython' > > Command "python setup.py egg_info" failed with error code 1 in > /tmp/pip-build-2zxk66af/pyarrow/{code} > However this does seem to work without any issues on Ubuntu 20.04.1 LTS. This > problem seems to be specific to 18.04.5. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-11720) [Python] Cannot install pyarrow via pip on Ubuntu 18.04.5
[ https://issues.apache.org/jira/browse/ARROW-11720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sachit Vithaldas resolved ARROW-11720. -- Resolution: Fixed > [Python] Cannot install pyarrow via pip on Ubuntu 18.04.5 > - > > Key: ARROW-11720 > URL: https://issues.apache.org/jira/browse/ARROW-11720 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 3.0.0 > Environment: Ubuntu 18.04.5 LTS >Reporter: Sachit Vithaldas >Priority: Major > > I was attempting to install pyarrow via pip on Ubuntu 18.04 by using the > following command: > {code:java} > pip3 install pyarrow{code} > When doing so I get the following error: > {code:java} > Collecting pyarrow > Downloading > https://files.pythonhosted.org/packages/62/d3/a482d8a4039bf931ed6388308f0cc0541d0cab46f0bbff7c897a74f1c576/pyarrow-3.0.0.tar.gz > (682kB) > 100% || 686kB 2.3MB/s > Complete output from command python setup.py egg_info: > Traceback (most recent call last): > File "", line 1, in > File "/tmp/pip-build-2zxk66af/pyarrow/setup.py", line 37, in > from Cython.Distutils import build_ext as _build_ext > ModuleNotFoundError: No module named 'Cython' > > Command "python setup.py egg_info" failed with error code 1 in > /tmp/pip-build-2zxk66af/pyarrow/{code} > However this does seem to work without any issues on Ubuntu 20.04.1 LTS. This > problem seems to be specific to 18.04.5. -- This message was sent by Atlassian Jira (v8.3.4#803005)
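The log in ARROW-11720 shows pip downloading the source archive (pyarrow-3.0.0.tar.gz) and then failing to build it because Cython is missing, which suggests the installed pip did not consider any prebuilt wheel usable. The sketch below, assuming the minimum pip versions for each manylinux tag from pip's release history (8.1 for manylinux1, 19.0 for manylinux2010, 19.3 for manylinux2014), shows how a pip version maps to the wheel tags it can install. The helper names are hypothetical and the check is a simplification of pip's real tag matching.

```python
# Minimum pip version (major, minor) that recognises each manylinux wheel tag.
MANYLINUX_MIN_PIP = {
    "manylinux1": (8, 1),
    "manylinux2010": (19, 0),
    "manylinux2014": (19, 3),
}


def parse_version(text):
    """Parse the leading numeric components of a version string such as '9.0.1'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 2:
        parts.append(0)  # treat '19' as '19.0'
    return tuple(parts)


def supported_manylinux_tags(pip_version):
    """List the manylinux tags a given pip version is able to install."""
    version = parse_version(pip_version)
    return [tag for tag, minimum in MANYLINUX_MIN_PIP.items() if version >= minimum]


if __name__ == "__main__":
    for candidate in ("9.0.1", "19.0", "21.0.1"):
        print(candidate, supported_manylinux_tags(candidate))
```

An old pip that only understands manylinux1 falls back to the source distribution when a release ships only newer wheel tags, so upgrading with python3 -m pip install --upgrade pip before installing pyarrow is consistent with the resolution reported in the thread.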
[jira] [Comment Edited] (ARROW-11432) [Rust][DataFusion] Join Statement: Schema contains duplicate unqualified field name
[ https://issues.apache.org/jira/browse/ARROW-11432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287821#comment-17287821 ] R J edited comment on ARROW-11432 at 2/20/21, 11:59 PM: I'm not sure there is an easy fix without making breaking changes to the public API. When building a join schema, it checks if the join set is valid (physical_plan::hash_utils::check_join_set_is_valid), which has a parent public API call (physical_plan::hash_utils::check_join_is_valid). This join is unaware of the registered name (CSV or parquet) as it is performed with arrow schemas rather than DataFusion schemas. EDIT: It could be my lack of knowledge of the DataFusion codebase, but it appears it would need a lot of changes. was (Author: turnofacard): I'm not sure there is an easy fix without making breaking changes to the public API. When building a join schema, it checks if the join set is valid (physical_plan::hash_utils::check_join_set_is_valid), which has a parent public API call (physical_plan::hash_utils::check_join_is_valid). This join is unaware of the registered name (CSV or parquet) as it is performed with arrow schemas rather than DataFusion schemas. > [Rust][DataFusion] Join Statement: Schema contains duplicate unqualified > field name > --- > > Key: ARROW-11432 > URL: https://issues.apache.org/jira/browse/ARROW-11432 > Project: Apache Arrow > Issue Type: Bug > Components: Rust - DataFusion >Affects Versions: 3.0.0 >Reporter: GANG LIAO >Priority: Critical > > https://github.com/apache/arrow/issues/9307 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11721) json schame inference should return Schema type instead of SchemaRef
QP Hou created ARROW-11721: -- Summary: json schame inference should return Schema type instead of SchemaRef Key: ARROW-11721 URL: https://issues.apache.org/jira/browse/ARROW-11721 Project: Apache Arrow Issue Type: Task Components: Rust Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11721) json schema inference should return Schema type instead of SchemaRef
[ https://issues.apache.org/jira/browse/ARROW-11721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] QP Hou updated ARROW-11721: --- Summary: json schema inference should return Schema type instead of SchemaRef (was: json schame inference should return Schema type instead of SchemaRef) > json schema inference should return Schema type instead of SchemaRef > > > Key: ARROW-11721 > URL: https://issues.apache.org/jira/browse/ARROW-11721 > Project: Apache Arrow > Issue Type: Task > Components: Rust >Reporter: QP Hou >Assignee: QP Hou >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11721) json schema inference should return Schema type instead of SchemaRef
[ https://issues.apache.org/jira/browse/ARROW-11721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-11721: --- Labels: pull-request-available (was: ) > json schema inference should return Schema type instead of SchemaRef > > > Key: ARROW-11721 > URL: https://issues.apache.org/jira/browse/ARROW-11721 > Project: Apache Arrow > Issue Type: Task > Components: Rust >Reporter: QP Hou >Assignee: QP Hou >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)