[jira] [Created] (ARROW-11707) Support CSV schema inference without seek

2021-02-20 Thread QP Hou (Jira)
QP Hou created ARROW-11707:
--

 Summary: Support CSV schema inference without seek
 Key: ARROW-11707
 URL: https://issues.apache.org/jira/browse/ARROW-11707
 Project: Apache Arrow
  Issue Type: Task
  Components: Rust
Reporter: QP Hou
Assignee: QP Hou






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-11707) Decouple CSV schema inference from IO

2021-02-20 Thread QP Hou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QP Hou updated ARROW-11707:
---
Summary: Decouple CSV schema inference from IO  (was: Support CSV schema 
inference without seek)

> Decouple CSV schema inference from IO
> -
>
> Key: ARROW-11707
> URL: https://issues.apache.org/jira/browse/ARROW-11707
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust
>Reporter: QP Hou
>Assignee: QP Hou
>Priority: Major
>






[jira] [Updated] (ARROW-11707) Support CSV schema inference without IO

2021-02-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11707:
---
Labels: pull-request-available  (was: )

> Support CSV schema inference without IO
> ---
>
> Key: ARROW-11707
> URL: https://issues.apache.org/jira/browse/ARROW-11707
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust
>Reporter: QP Hou
>Assignee: QP Hou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-11707) Support CSV schema inference without IO

2021-02-20 Thread QP Hou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QP Hou updated ARROW-11707:
---
Summary: Support CSV schema inference without IO  (was: Decouple CSV schema 
inference from IO)

> Support CSV schema inference without IO
> ---
>
> Key: ARROW-11707
> URL: https://issues.apache.org/jira/browse/ARROW-11707
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust
>Reporter: QP Hou
>Assignee: QP Hou
>Priority: Major
>






[jira] [Created] (ARROW-11708) Clean up Rust 2021 linting warning

2021-02-20 Thread QP Hou (Jira)
QP Hou created ARROW-11708:
--

 Summary: Clean up Rust 2021 linting warning
 Key: ARROW-11708
 URL: https://issues.apache.org/jira/browse/ARROW-11708
 Project: Apache Arrow
  Issue Type: Task
  Components: Rust
Reporter: QP Hou
Assignee: QP Hou








[jira] [Updated] (ARROW-11708) Clean up Rust 2021 linting warning

2021-02-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11708:
---
Labels: pull-request-available  (was: )

> Clean up Rust 2021 linting warning
> --
>
> Key: ARROW-11708
> URL: https://issues.apache.org/jira/browse/ARROW-11708
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust
>Reporter: QP Hou
>Assignee: QP Hou
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-11709) [Rust][DataFusion] Move `expressions` and `inputs` into LogicalPlan rather than helpers in util

2021-02-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11709:
---

 Summary: [Rust][DataFusion] Move `expressions` and `inputs` into 
LogicalPlan rather than helpers in util
 Key: ARROW-11709
 URL: https://issues.apache.org/jira/browse/ARROW-11709
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb



Move `expressions` and `inputs` into LogicalPlan rather than keeping them as 
helpers in util, and use a Visitor rather than a hard-coded list.

The goal is to consolidate the expression walking in one place.
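
As a rough illustration of the first half of this proposal (methods on the plan instead of free helper functions), here is a minimal sketch. The `Expr` and `LogicalPlan` enums below are tiny hypothetical stand-ins, not DataFusion's real types, and the method names simply mirror the `expressions` and `inputs` helpers named above. The Visitor part is not shown.

```rust
/// Toy expression type; a stand-in for DataFusion's `Expr` (assumption).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
}

/// Toy plan type with three node kinds (assumption, not the real enum).
#[derive(Debug)]
enum LogicalPlan {
    TableScan { table: String },
    Filter { predicate: Expr, input: Box<LogicalPlan> },
    Projection { exprs: Vec<Expr>, input: Box<LogicalPlan> },
}

impl LogicalPlan {
    /// Direct child plans of this node.
    fn inputs(&self) -> Vec<&LogicalPlan> {
        match self {
            LogicalPlan::TableScan { .. } => vec![],
            LogicalPlan::Filter { input, .. }
            | LogicalPlan::Projection { input, .. } => vec![input.as_ref()],
        }
    }

    /// Expressions held directly by this node (not by its inputs).
    fn expressions(&self) -> Vec<Expr> {
        match self {
            LogicalPlan::TableScan { .. } => vec![],
            LogicalPlan::Filter { predicate, .. } => vec![predicate.clone()],
            LogicalPlan::Projection { exprs, .. } => exprs.clone(),
        }
    }
}
```

With the accessors living on the type, every caller goes through the same `match`, so adding a plan node means updating one place.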





[jira] [Created] (ARROW-11710) [Rust][DataFusion] Implement ExprRewriter to avoid tree traversal redundancy

2021-02-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11710:
---

 Summary: [Rust][DataFusion] Implement ExprRewriter to avoid tree 
traversal redundancy
 Key: ARROW-11710
 URL: https://issues.apache.org/jira/browse/ARROW-11710
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb



The idea is to:
1. Reduce the amount of repetition in optimizer rules to make them easier to 
implement

2. Reduce the amount of repetition to make it easier to see the actual logic 
(rather than having it intertwined with the code needed to do recursion)

3. Set the stage for a more general `PlanRewriter` that doesn't have to clone 
its input, and can take its input by value and consume it

The plan is to make an ExprRewriter, the mutable counterpart to 
`ExpressionVisitor`, and demonstrate its usefulness by rewriting several 
expression transformation passes using it.
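
To make the idea concrete, here is a minimal sketch of what such a rewriter could look like. Everything in it is an assumption for illustration: the toy `Expr` enum, the `mutate` and `rewrite` method names, and the `ConstantFolder` rule are hypothetical and are not taken from DataFusion. The point is only that the recursion is written once and each rule implements a single method that takes its input by value.

```rust
/// Toy expression tree; a stand-in for DataFusion's much larger `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
}

/// Hypothetical rewriter trait: receives each node by value, after its
/// children have been rewritten, and returns the replacement node.
trait ExprRewriter {
    fn mutate(&mut self, expr: Expr) -> Expr;
}

impl Expr {
    /// The recursion lives here, once; rules only implement `mutate`.
    /// `self` is consumed, so no node is cloned along the way.
    fn rewrite<R: ExprRewriter>(self, rewriter: &mut R) -> Expr {
        let expr = match self {
            Expr::Add(l, r) => Expr::Add(
                Box::new((*l).rewrite(rewriter)),
                Box::new((*r).rewrite(rewriter)),
            ),
            Expr::Not(e) => Expr::Not(Box::new((*e).rewrite(rewriter))),
            leaf => leaf,
        };
        rewriter.mutate(expr)
    }
}

/// Example rule: fold `Literal + Literal` into a single literal.
struct ConstantFolder {
    folded: usize,
}

impl ExprRewriter for ConstantFolder {
    fn mutate(&mut self, expr: Expr) -> Expr {
        match expr {
            Expr::Add(l, r) => match (*l, *r) {
                (Expr::Literal(a), Expr::Literal(b)) => {
                    self.folded += 1;
                    Expr::Literal(a + b)
                }
                // Not foldable: rebuild the node from its owned parts.
                (l, r) => Expr::Add(Box::new(l), Box::new(r)),
            },
            other => other,
        }
    }
}
```

A caller would write something like `expr.rewrite(&mut ConstantFolder { folded: 0 })`; the rule itself contains no traversal code.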





[jira] [Created] (ARROW-11711) [Rust][DataFusion] Rename ExpressionVisitor --> ExprVisitor and standardize input

2021-02-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11711:
---

 Summary: [Rust][DataFusion] Rename ExpressionVisitor --> 
ExprVisitor and standardize input
 Key: ARROW-11711
 URL: https://issues.apache.org/jira/browse/ARROW-11711
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb


Rename ExpressionVisitor to ExprVisitor for consistency, and change it to use 
`&mut self` rather than consuming the visitor, for consistency with 
`PlanVisitor` (as well as the soon to be created `ExprRewriter`).
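
The difference between consuming the visitor and borrowing it mutably can be sketched as follows. The trait shape, the `pre_visit` and `accept` names, and the toy `Expr` enum are assumptions made for this example and may not match the real DataFusion signatures.

```rust
/// Toy expression tree used only for this sketch.
#[derive(Debug)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Hypothetical visitor trait taking `&mut self`, so the caller keeps
/// ownership of the visitor and can read its state after the walk.
trait ExprVisitor {
    /// Called once per node; return `false` to skip that node's children.
    fn pre_visit(&mut self, expr: &Expr) -> bool;
}

impl Expr {
    /// Pre-order walk that lends the visitor to each node in turn.
    fn accept<V: ExprVisitor>(&self, visitor: &mut V) {
        if !visitor.pre_visit(self) {
            return;
        }
        if let Expr::Add(l, r) = self {
            l.accept(visitor);
            r.accept(visitor);
        }
    }
}

/// Example visitor: collects column names in visit order.
struct ColumnCollector {
    columns: Vec<String>,
}

impl ExprVisitor for ColumnCollector {
    fn pre_visit(&mut self, expr: &Expr) -> bool {
        if let Expr::Column(name) = expr {
            self.columns.push(name.clone());
        }
        true
    }
}
```

Because `accept` only borrows the visitor, the caller can inspect `columns` afterwards without the walk having to hand the visitor back as a return value.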





[jira] [Created] (ARROW-11712) [Rust][DataFusion] Introduce PlanRewriter for rewriting plans

2021-02-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11712:
---

 Summary: [Rust][DataFusion] Introduce PlanRewriter for rewriting 
plans
 Key: ARROW-11712
 URL: https://issues.apache.org/jira/browse/ARROW-11712
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb


Introduce a PlanRewriter to encapsulate visiting all logical plan nodes and 
rewriting them bottom up (and get rid of utils::inputs, utils::exprs, etc)
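
A bottom-up plan rewriter along these lines might look like the sketch below. The `LogicalPlan` enum, the `PlanRewriter` trait shape, and the `RemoveTrivialFilters` rule are all hypothetical and heavily simplified; they only illustrate rewriting children first and taking each node by value.

```rust
/// Toy plan type; the `always_true` flag stands in for a real predicate.
#[derive(Debug, PartialEq)]
enum LogicalPlan {
    TableScan(String),
    Filter { always_true: bool, input: Box<LogicalPlan> },
    Limit { n: usize, input: Box<LogicalPlan> },
}

/// Hypothetical trait: called on each node after its inputs were rewritten.
trait PlanRewriter {
    fn mutate(&mut self, plan: LogicalPlan) -> LogicalPlan;
}

impl LogicalPlan {
    /// Bottom-up traversal: rewrite the inputs, then the node itself.
    fn rewrite<R: PlanRewriter>(self, rewriter: &mut R) -> LogicalPlan {
        let plan = match self {
            LogicalPlan::Filter { always_true, input } => LogicalPlan::Filter {
                always_true,
                input: Box::new((*input).rewrite(rewriter)),
            },
            LogicalPlan::Limit { n, input } => LogicalPlan::Limit {
                n,
                input: Box::new((*input).rewrite(rewriter)),
            },
            scan => scan,
        };
        rewriter.mutate(plan)
    }
}

/// Example rule: drop filters that keep every row.
struct RemoveTrivialFilters;

impl PlanRewriter for RemoveTrivialFilters {
    fn mutate(&mut self, plan: LogicalPlan) -> LogicalPlan {
        match plan {
            LogicalPlan::Filter { always_true: true, input } => *input,
            other => other,
        }
    }
}
```

Since the traversal owns each node, a rule can return one of the node's inputs directly, as `RemoveTrivialFilters` does, without cloning.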





[jira] [Assigned] (ARROW-11690) [Rust][DataFusion] Avoid Expr::clone in Expr builder methods

2021-02-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb reassigned ARROW-11690:
---

Assignee: Andrew Lamb

> [Rust][DataFusion] Avoid Expr::clone in Expr builder methods
> 
>
> Key: ARROW-11690
> URL: https://issues.apache.org/jira/browse/ARROW-11690
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-11711) [Rust][DataFusion] Rename ExpressionVisitor --> ExprVisitor and standardize input

2021-02-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-11711:

Component/s: Rust - DataFusion

> [Rust][DataFusion] Rename ExpressionVisitor --> ExprVisitor and standardize 
> input
> -
>
> Key: ARROW-11711
> URL: https://issues.apache.org/jira/browse/ARROW-11711
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Major
>
> Rename ExpressionVisitor to ExprVisitor for consistency, and change it to use 
> `&mut self` rather than consuming the visitor, for consistency with 
> `PlanVisitor` (as well as the soon to be created `ExprRewriter`).





[jira] [Assigned] (ARROW-11712) [Rust][DataFusion] Introduce PlanRewriter for rewriting plans

2021-02-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb reassigned ARROW-11712:
---

Assignee: Andrew Lamb

> [Rust][DataFusion] Introduce PlanRewriter for rewriting plans
> -
>
> Key: ARROW-11712
> URL: https://issues.apache.org/jira/browse/ARROW-11712
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>
> Introduce a PlanRewriter to encapsulate visiting all logical plan nodes and 
> rewriting them bottom up (and get rid of utils::inputs, utils::exprs, etc)





[jira] [Updated] (ARROW-11710) [Rust][DataFusion] Implement ExprRewriter to avoid tree traversal redundancy

2021-02-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-11710:

Component/s: Rust - DataFusion

> [Rust][DataFusion] Implement ExprRewriter to avoid tree traversal redundancy
> 
>
> Key: ARROW-11710
> URL: https://issues.apache.org/jira/browse/ARROW-11710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Major
>
> The idea is to:
> 1. Reduce the amount of repetition in optimizer rules to make them easier to 
> implement
> 2. Reduce the amount of repetition to make it easier to see the actual logic 
> (rather than having it intertwined with the code needed to do recursion)
> 3. Set the stage for a more general `PlanRewriter` that doesn't have to 
> clone its input, and can take its input by value and consume it
> The plan is to make an ExprRewriter, the mutable counterpart to 
> `ExpressionVisitor`, and demonstrate its usefulness by rewriting several 
> expression transformation passes using it.





[jira] [Updated] (ARROW-11709) [Rust][DataFusion] Move `expressions` and `inputs` into LogicalPlan rather than helpers in util

2021-02-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-11709:

Component/s: Rust - DataFusion

> [Rust][DataFusion] Move `expressions` and `inputs` into LogicalPlan rather 
> than helpers in util
> ---
>
> Key: ARROW-11709
> URL: https://issues.apache.org/jira/browse/ARROW-11709
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>
> Move `expressions` and `inputs` into LogicalPlan rather than keeping them as 
> helpers in util, and use a Visitor rather than a hard-coded list.
> The goal is to consolidate the expression walking in one place.





[jira] [Updated] (ARROW-11690) [Rust][DataFusion] Avoid Expr::clone in Expr builder methods

2021-02-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-11690:

Component/s: Rust - DataFusion

> [Rust][DataFusion] Avoid Expr::clone in Expr builder methods
> 
>
> Key: ARROW-11690
> URL: https://issues.apache.org/jira/browse/ARROW-11690
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (ARROW-11711) [Rust][DataFusion] Rename ExpressionVisitor --> ExprVisitor and standardize input

2021-02-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb reassigned ARROW-11711:
---

Assignee: Andrew Lamb

> [Rust][DataFusion] Rename ExpressionVisitor --> ExprVisitor and standardize 
> input
> -
>
> Key: ARROW-11711
> URL: https://issues.apache.org/jira/browse/ARROW-11711
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>
> Rename ExpressionVisitor to ExprVisitor for consistency, and change it to use 
> `&mut self` rather than consuming the visitor, for consistency with 
> `PlanVisitor` (as well as the soon to be created `ExprRewriter`).





[jira] [Assigned] (ARROW-11709) [Rust][DataFusion] Move `expressions` and `inputs` into LogicalPlan rather than helpers in util

2021-02-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb reassigned ARROW-11709:
---

Assignee: Andrew Lamb

> [Rust][DataFusion] Move `expressions` and `inputs` into LogicalPlan rather 
> than helpers in util
> ---
>
> Key: ARROW-11709
> URL: https://issues.apache.org/jira/browse/ARROW-11709
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>
> Move `expressions` and `inputs` into LogicalPlan rather than keeping them as 
> helpers in util, and use a Visitor rather than a hard-coded list.
> The goal is to consolidate the expression walking in one place.





[jira] [Assigned] (ARROW-11710) [Rust][DataFusion] Implement ExprRewriter to avoid tree traversal redundancy

2021-02-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb reassigned ARROW-11710:
---

Assignee: Andrew Lamb

> [Rust][DataFusion] Implement ExprRewriter to avoid tree traversal redundancy
> 
>
> Key: ARROW-11710
> URL: https://issues.apache.org/jira/browse/ARROW-11710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>
> The idea is to:
> 1. Reduce the amount of repetition in optimizer rules to make them easier to 
> implement
> 2. Reduce the amount of repetition to make it easier to see the actual logic 
> (rather than having it intertwined with the code needed to do recursion)
> 3. Set the stage for a more general `PlanRewriter` that doesn't have to 
> clone its input, and can take its input by value and consume it
> The plan is to make an ExprRewriter, the mutable counterpart to 
> `ExpressionVisitor`, and demonstrate its usefulness by rewriting several 
> expression transformation passes using it.





[jira] [Updated] (ARROW-11712) [Rust][DataFusion] Introduce PlanRewriter for rewriting plans

2021-02-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-11712:

Component/s: Rust - DataFusion

> [Rust][DataFusion] Introduce PlanRewriter for rewriting plans
> -
>
> Key: ARROW-11712
> URL: https://issues.apache.org/jira/browse/ARROW-11712
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Major
>
> Introduce a PlanRewriter to encapsulate visiting all logical plan nodes and 
> rewriting them bottom up (and get rid of utils::inputs, utils::exprs, etc)





[jira] [Created] (ARROW-11713) [Rust] Get MIRI running again

2021-02-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11713:
---

 Summary: [Rust] Get MIRI running again
 Key: ARROW-11713
 URL: https://issues.apache.org/jira/browse/ARROW-11713
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb


Rust's MIRI https://github.com/rust-lang/miri can help detect logical errors in 
programs

The Rust arrow implementation now runs the MIRI checks as part of CI, but it 
does not pass cleanly

For example:
https://github.com/apache/arrow/pull/9535/checks?check_run_id=1941313240

{code}

   Compiling criterion v0.3.4
   Compiling h2 v0.3.0
   Compiling tower v0.4.5
   Compiling hyper v0.14.4
error[E0463]: can't find crate for `tracing`
  --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1
   |
68 | extern crate tracing;
   | ^ can't find crate

error: aborting due to previous error
{code}

Previously MIRI ran but the check failed in FFI somewhere





[jira] [Updated] (ARROW-11713) [Rust] Get MIRI running again

2021-02-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-11713:

Description: 
Rust's MIRI https://github.com/rust-lang/miri can help detect logical errors in 
programs

The Rust arrow implementation now runs the MIRI checks as part of CI, but it 
does not pass cleanly

For example:
https://github.com/apache/arrow/pull/9535/checks?check_run_id=1941313240

{code}

   Compiling criterion v0.3.4
   Compiling h2 v0.3.0
   Compiling tower v0.4.5
   Compiling hyper v0.14.4
error[E0463]: can't find crate for `tracing`
  --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1
   |
68 | extern crate tracing;
   | ^ can't find crate

error: aborting due to previous error
{code}

Previously MIRI ran but the check failed in FFI somewhere

Help wanted!

  was:
Rust's MIRI https://github.com/rust-lang/miri can help detect logical errors in 
programs

The Rust arrow implementation now runs the MIRI checks as part of CI, but it 
does not pass cleanly

For example:
https://github.com/apache/arrow/pull/9535/checks?check_run_id=1941313240

{code}

   Compiling criterion v0.3.4
   Compiling h2 v0.3.0
   Compiling tower v0.4.5
   Compiling hyper v0.14.4
error[E0463]: can't find crate for `tracing`
  --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1
   |
68 | extern crate tracing;
   | ^ can't find crate

error: aborting due to previous error
{code}

Previously MIRI ran but the check failed in FFI somewhere


> [Rust] Get MIRI running again
> -
>
> Key: ARROW-11713
> URL: https://issues.apache.org/jira/browse/ARROW-11713
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Andrew Lamb
>Priority: Major
>
> Rust's MIRI https://github.com/rust-lang/miri can help detect logical errors 
> in programs
> The Rust arrow implementation now runs the MIRI checks as part of CI, but it 
> does not pass cleanly
> For example:
> https://github.com/apache/arrow/pull/9535/checks?check_run_id=1941313240
> {code}
>Compiling criterion v0.3.4
>Compiling h2 v0.3.0
>Compiling tower v0.4.5
>Compiling hyper v0.14.4
> error[E0463]: can't find crate for `tracing`
>   --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1
>|
> 68 | extern crate tracing;
>| ^ can't find crate
> error: aborting due to previous error
> {code}
> Previously MIRI ran but the check failed in FFI somewhere
> Help wanted!





[jira] [Updated] (ARROW-11713) [Rust] Get MIRI running again

2021-02-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-11713:

Description: 
Rust's MIRI https://github.com/rust-lang/miri can help detect logical errors in 
programs

The Rust arrow implementation now runs the MIRI checks as part of CI thanks to 
[~vertexclique] but it does not pass cleanly yet

For example:
https://github.com/apache/arrow/pull/9535/checks?check_run_id=1941313240

{code}

   Compiling criterion v0.3.4
   Compiling h2 v0.3.0
   Compiling tower v0.4.5
   Compiling hyper v0.14.4
error[E0463]: can't find crate for `tracing`
  --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1
   |
68 | extern crate tracing;
   | ^ can't find crate

error: aborting due to previous error
{code}

Previously MIRI ran but the check failed in FFI somewhere

Help wanted!

  was:
Rust's MIRI https://github.com/rust-lang/miri can help detect logical errors in 
programs

The Rust arrow implementation now runs the MIRI checks as part of CI, but it 
does not pass cleanly

For example:
https://github.com/apache/arrow/pull/9535/checks?check_run_id=1941313240

{code}

   Compiling criterion v0.3.4
   Compiling h2 v0.3.0
   Compiling tower v0.4.5
   Compiling hyper v0.14.4
error[E0463]: can't find crate for `tracing`
  --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1
   |
68 | extern crate tracing;
   | ^ can't find crate

error: aborting due to previous error
{code}

Previously MIRI ran but the check failed in FFI somewhere

Help wanted!


> [Rust] Get MIRI running again
> -
>
> Key: ARROW-11713
> URL: https://issues.apache.org/jira/browse/ARROW-11713
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Andrew Lamb
>Priority: Major
>
> Rust's MIRI https://github.com/rust-lang/miri can help detect logical errors 
> in programs
> The Rust arrow implementation now runs the MIRI checks as part of CI thanks 
> to [~vertexclique] but it does not pass cleanly yet
> For example:
> https://github.com/apache/arrow/pull/9535/checks?check_run_id=1941313240
> {code}
>Compiling criterion v0.3.4
>Compiling h2 v0.3.0
>Compiling tower v0.4.5
>Compiling hyper v0.14.4
> error[E0463]: can't find crate for `tracing`
>   --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1
>|
> 68 | extern crate tracing;
>| ^ can't find crate
> error: aborting due to previous error
> {code}
> Previously MIRI ran but the check failed in FFI somewhere
> Help wanted!





[jira] [Created] (ARROW-11714) [Rust] Fix MIRI build on CI

2021-02-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11714:
---

 Summary: [Rust] Fix MIRI build on CI
 Key: ARROW-11714
 URL: https://issues.apache.org/jira/browse/ARROW-11714
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb


The MIRI check doesn't even compile anymore:

{code}

   Compiling criterion v0.3.4
   Compiling h2 v0.3.0
   Compiling tower v0.4.5
   Compiling hyper v0.14.4
error[E0463]: can't find crate for `tracing`
  --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1
   |
68 | extern crate tracing;
   | ^ can't find crate

error: aborting due to previous error
{code}





[jira] [Created] (ARROW-11715) [Rust] Ensure a successful MIRI Run on CI

2021-02-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11715:
---

 Summary: [Rust] Ensure a successful MIRI Run on CI
 Key: ARROW-11715
 URL: https://issues.apache.org/jira/browse/ARROW-11715
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb


Currently the MIRI check is set up to pass even if `cargo miri` returns an error.

https://github.com/apache/arrow/blob/master/.github/workflows/rust.yml#L263-L264
{code}
  # Ignore MIRI errors until we can get a clean run
  cargo miri test || true
{code}

The goal is to make MIRI pass and then remove the `|| true` workaround from CI.






[jira] [Updated] (ARROW-11713) [Rust] Get MIRI running again

2021-02-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-11713:

Component/s: Rust

> [Rust] Get MIRI running again
> -
>
> Key: ARROW-11713
> URL: https://issues.apache.org/jira/browse/ARROW-11713
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andrew Lamb
>Priority: Major
>
> Rust's MIRI https://github.com/rust-lang/miri can help detect logical errors 
> in programs
> The Rust arrow implementation now runs the MIRI checks as part of CI thanks 
> to [~vertexclique] but it does not pass cleanly yet
> For example:
> https://github.com/apache/arrow/pull/9535/checks?check_run_id=1941313240
> {code}
>Compiling criterion v0.3.4
>Compiling h2 v0.3.0
>Compiling tower v0.4.5
>Compiling hyper v0.14.4
> error[E0463]: can't find crate for `tracing`
>   --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1
>|
> 68 | extern crate tracing;
>| ^ can't find crate
> error: aborting due to previous error
> {code}
> Previously MIRI ran but the check failed in FFI somewhere
> Help wanted!





[jira] [Resolved] (ARROW-11708) Clean up Rust 2021 linting warning

2021-02-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-11708.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 9535
[https://github.com/apache/arrow/pull/9535]

> Clean up Rust 2021 linting warning
> --
>
> Key: ARROW-11708
> URL: https://issues.apache.org/jira/browse/ARROW-11708
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust
>Reporter: QP Hou
>Assignee: QP Hou
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-11716) [Rust][DataFusion] Change tests in sql.rs to use `assert_batch`

2021-02-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11716:
---

 Summary: [Rust][DataFusion] Change tests in sql.rs to use 
`assert_batch` 
 Key: ARROW-11716
 URL: https://issues.apache.org/jira/browse/ARROW-11716
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb



The idea is to make the tests in 
[sql.rs|https://github.com/apache/arrow/blob/master/rust/datafusion/tests/sql.rs#L103]
 more maintainable by using the `assert_batches_eq` macro that was introduced 
here: https://github.com/apache/arrow/pull/9264






[jira] [Updated] (ARROW-11692) [Rust][DataFusion] Improve documentation on Optimizer

2021-02-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb updated ARROW-11692:

Component/s: Rust - DataFusion

> [Rust][DataFusion] Improve documentation on Optimizer
> -
>
> Key: ARROW-11692
> URL: https://issues.apache.org/jira/browse/ARROW-11692
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Assigned] (ARROW-11692) [Rust][DataFusion] Improve documentation on Optimizer

2021-02-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb reassigned ARROW-11692:
---

Assignee: Andrew Lamb

> [Rust][DataFusion] Improve documentation on Optimizer
> -
>
> Key: ARROW-11692
> URL: https://issues.apache.org/jira/browse/ARROW-11692
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-11692) [Rust][DataFusion] Improve documentation on Optimizer

2021-02-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-11692.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 9529
[https://github.com/apache/arrow/pull/9529]

> [Rust][DataFusion] Improve documentation on Optimizer
> -
>
> Key: ARROW-11692
> URL: https://issues.apache.org/jira/browse/ARROW-11692
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-11717) [Integration] Intermittent (but frequent) flight integration failures with auth:basic_proto

2021-02-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11717:
---

 Summary: [Integration] Intermittent (but frequent) flight 
integration failures with auth:basic_proto
 Key: ARROW-11717
 URL: https://issues.apache.org/jira/browse/ARROW-11717
 Project: Apache Arrow
  Issue Type: Bug
  Components: Integration
Reporter: Andrew Lamb


Link to discussion on list: 
https://lists.apache.org/thread.html/r0dcdc2b6334e7f067a828634cf7584406ed859ff4d3fb622fef1bdd7%40%3Cdev.arrow.apache.org%3E

I noticed that the Rust/CPP integration tests are failing seemingly
intermittently on master (and on Rust PRs). The tests pass if they are re-run 
(enough times).

There are several commits that have the little red `X`, meaning that CI didn't
pass on master: https://github.com/apache/arrow/commits/master

Here are some example CI runs that are failing:
https://github.com/apache/arrow/runs/1935673508
https://github.com/apache/arrow/runs/1926705212

Here is another example:
https://github.com/apache/arrow/pull/9359/checks?check_run_id=1941967422

Example failure:
{code}

==
Testing file auth:basic_proto
==
Traceback (most recent call last):
  File "/arrow/dev/archery/archery/integration/util.py", line 139, in run_cmd
    output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/opt/conda/envs/arrow/lib/python3.8/subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/opt/conda/envs/arrow/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/build/cpp/debug/flight-test-integration-client', '-host', 'localhost', '-port=33569', '-scenario', 'auth:basic_proto']' died with .

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/arrow/dev/archery/archery/integration/runner.py", line 308, in _run_flight_test_case
    consumer.flight_request(port, **client_args)
  File "/arrow/dev/archery/archery/integration/tester_cpp.py", line 116, in flight_request
    run_cmd(cmd)
  File "/arrow/dev/archery/archery/integration/util.py", line 148, in run_cmd
    raise RuntimeError(sio.getvalue())
RuntimeError: Command failed: /build/cpp/debug/flight-test-integration-client -host localhost -port=33569 -scenario auth:basic_proto
With output:
--
-- Arrow Fatal Error --
Invalid: Expected UNAUTHENTICATED but got Unavailable
{code}






[jira] [Resolved] (ARROW-11681) [Rust] IPC writers shouldn't unwrap in destructors

2021-02-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-11681.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 9520
[https://github.com/apache/arrow/pull/9520]

> [Rust] IPC writers shouldn't unwrap in destructors
> --
>
> Key: ARROW-11681
> URL: https://issues.apache.org/jira/browse/ARROW-11681
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 3.0.0
>Reporter: Steven Fackler
>Assignee: Steven Fackler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> FileWriter and StreamWriter call `self.finish().unwrap()` in their `Drop` 
> implementations if the write has not already been finished. However, a common 
> reason for the write to not be finished is an earlier IO error on the 
> underlying stream. In that case, the destructor will panic, which is not 
> desired.





[jira] [Resolved] (ARROW-11459) [Rust] Allow ListArray of primitives to be built from iterator

2021-02-20 Thread Andrew Lamb (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-11459.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 9388
[https://github.com/apache/arrow/pull/9388]

> [Rust] Allow ListArray of primitives to be built from iterator
> --
>
> Key: ARROW-11459
> URL: https://issues.apache.org/jira/browse/ARROW-11459
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Jorge Leitão
>Assignee: Jorge Leitão
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-11706) [JS] Better BigInt compatibility check

2021-02-20 Thread Diana Clarke (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diana Clarke updated ARROW-11706:
-
Description: 
See: https://github.com/apache/arrow/pull/9110

Check whether {{BigInt64ArrayAvailable}} and {{BigUint64ArrayAvailable}} 
are true, rather than just {{BigIntAvailable}}. Recent versions of 
JavaScriptCore/WebKit in Safari support {{BigInt}} but do not support 
{{BigInt64Array}}, and so anything that relies on {{BigInt64Array}} will fail 
despite {{BigIntAvailable}} being true.

The manifestation of this issue can be seen when trying to run the following 
within Safari on a table that contains bigints:
{code:java}
RecordBatchJSONWriter.writeAll(table).toString(true)

message: "BigUint64Array is not available in this environment"
  BigUint64ArrayUnavailableError
  BigUint64ArrayUnavailable
  bignumToString
  bigNumsToStrings
  generatorResume@[native code]
  performIteration@[native code]
  visitInt
  visit
  map@[native code]
  recordBatchToJSON
  close
  finish
  global code
{code}

See also: https://bugs.webkit.org/show_bug.cgi?id=190800

  was:
See: https://github.com/apache/arrow/pull/9110

Check for whether BigInt64ArrayAvailable and BigUint64ArrayAvailable are 
available, rather than just BigIntAvailable. Recent versions of 
JavaScriptCore/WebKit in Safari support BigInt but do not support 
BigInt64Array, and so anything that relies on BigInt64Array will fail despite 
BigIntAvailable being true.

The manifestation of this issue can be seen when trying to run the following 
within Safari on a table that contains bigints:
{code:java}
RecordBatchJSONWriter.writeAll(table).toString(true)

message: "BigUint64Array is not available in this environment"
  BigUint64ArrayUnavailableError
  BigUint64ArrayUnavailable
  bignumToString
  bigNumsToStrings
  generatorResume@[native code]
  performIteration@[native code]
  visitInt
  visit
  map@[native code]
  recordBatchToJSON
  close
  finish
  global code
{code}

See also: https://bugs.webkit.org/show_bug.cgi?id=190800


> [JS] Better BigInt compatibility check
> --
>
> Key: ARROW-11706
> URL: https://issues.apache.org/jira/browse/ARROW-11706
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Diana Clarke
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See: https://github.com/apache/arrow/pull/9110
> Check whether {{BigInt64ArrayAvailable}} and {{BigUint64ArrayAvailable}} 
> are true, rather than just {{BigIntAvailable}}. Recent versions of 
> JavaScriptCore/WebKit in Safari support {{BigInt}} but do not support 
> {{BigInt64Array}}, and so anything that relies on {{BigInt64Array}} will fail 
> despite {{BigIntAvailable}} being true.
> The manifestation of this issue can be seen when trying to run the following 
> within Safari on a table that contains bigints:
> {code:java}
> RecordBatchJSONWriter.writeAll(table).toString(true)
> message: "BigUint64Array is not available in this environment"
>   BigUint64ArrayUnavailableError
>   BigUint64ArrayUnavailable
>   bignumToString
>   bigNumsToStrings
>   generatorResume@[native code]
>   performIteration@[native code]
>   visitInt
>   visit
>   map@[native code]
>   recordBatchToJSON
>   close
>   finish
>   global code
> {code}
> See also: https://bugs.webkit.org/show_bug.cgi?id=190800





[jira] [Updated] (ARROW-11706) [JS] Better BigInt compatibility check

2021-02-20 Thread Diana Clarke (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diana Clarke updated ARROW-11706:
-
Component/s: JavaScript

> [JS] Better BigInt compatibility check
> --
>
> Key: ARROW-11706
> URL: https://issues.apache.org/jira/browse/ARROW-11706
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Diana Clarke
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See: https://github.com/apache/arrow/pull/9110
> Check whether {{BigInt64ArrayAvailable}} and {{BigUint64ArrayAvailable}} 
> are true, rather than just {{BigIntAvailable}}. Recent versions of 
> JavaScriptCore/WebKit in Safari support {{BigInt}} but do not support 
> {{BigInt64Array}}, and so anything that relies on {{BigInt64Array}} will fail 
> despite {{BigIntAvailable}} being true.
> The manifestation of this issue can be seen when trying to run the following 
> within Safari on a table that contains bigints:
> {code:java}
> RecordBatchJSONWriter.writeAll(table).toString(true)
> message: "BigUint64Array is not available in this environment"
>   BigUint64ArrayUnavailableError
>   BigUint64ArrayUnavailable
>   bignumToString
>   bigNumsToStrings
>   generatorResume@[native code]
>   performIteration@[native code]
>   visitInt
>   visit
>   map@[native code]
>   recordBatchToJSON
>   close
>   finish
>   global code
> {code}
> See also: https://bugs.webkit.org/show_bug.cgi?id=190800





[jira] [Commented] (ARROW-11698) [Website] Write blog post about C++ endianness compatibility

2021-02-20 Thread Kazuaki Ishizaki (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287755#comment-17287755
 ] 

Kazuaki Ishizaki commented on ARROW-11698:
--

Sounds good.

It would be nice to cover big-endian support in Java, too, since our community 
has not announced it even though Arrow 3.0 includes the feature. What do you 
think?

> [Website] Write blog post about C++ endianness compatibility
> 
>
> Key: ARROW-11698
> URL: https://issues.apache.org/jira/browse/ARROW-11698
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Website
>Reporter: Antoine Pitrou
>Priority: Minor
>
> It might be nice to announce the cross-endian compatibility effort on the 
> website.





[jira] [Created] (ARROW-11718) [Rust] IPC writers shouldn't implicitly finish on drop

2021-02-20 Thread Steven Fackler (Jira)
Steven Fackler created ARROW-11718:
--

 Summary: [Rust] IPC writers shouldn't implicitly finish on drop
 Key: ARROW-11718
 URL: https://issues.apache.org/jira/browse/ARROW-11718
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 3.0.0
Reporter: Steven Fackler
Assignee: Steven Fackler


The Rust IPC writer types have a destructor that automatically writes the 
footer if necessary. This is not ideal, though, since it can hide errors. For 
example, if a web server is streaming data to a client in the Arrow IPC format 
and it encounters an internal error trying to generate the next batch, the 
outbound stream will appear valid to the client, because the footer is written 
out automatically, even though some data is actually missing. If the footer 
were not written automatically, the client would properly detect the 
truncation.

For reference, the C++ implementation does not attempt to write the footer 
implicitly on drop.
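
To illustrate the failure mode described above, here is a minimal Python sketch. It does not use the Arrow IPC format or any Arrow API: `ToyWriter`, `END_MARKER`, and `read_batches` are hypothetical names, and the length-prefixed framing is an assumption made only for illustration. The end marker is written only by an explicit `finish()` call, so a stream that is abandoned after an error stays detectably truncated.

```python
import io
import struct

# Hypothetical end-of-stream marker; the real Arrow IPC footer is different.
END_MARKER = 0xFFFFFFFF


class ToyWriter:
    """Writes length-prefixed batches; the end marker is written only by finish()."""

    def __init__(self, sink):
        self.sink = sink
        self.finished = False

    def write(self, batch: bytes) -> None:
        if self.finished:
            raise ValueError("cannot write after finish()")
        if len(batch) >= END_MARKER:
            raise ValueError("batch too large for this toy framing")
        self.sink.write(struct.pack("<I", len(batch)) + batch)

    def finish(self) -> None:
        # Explicit step: only the caller knows whether the stream is complete.
        if not self.finished:
            self.sink.write(struct.pack("<I", END_MARKER))
            self.finished = True


def read_batches(data: bytes) -> list:
    """Return all batches, raising EOFError if the stream was truncated."""
    stream = io.BytesIO(data)
    batches = []
    while True:
        header = stream.read(4)
        if len(header) < 4:
            raise EOFError("stream truncated: end marker missing")
        (length,) = struct.unpack("<I", header)
        if length == END_MARKER:
            return batches
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError("stream truncated: incomplete batch")
        batches.append(payload)
```

If a producer using this sketch fails before calling `finish()`, `read_batches` raises `EOFError` instead of silently returning a partial result. A destructor that wrote the marker automatically would remove that signal.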





[jira] [Updated] (ARROW-11718) [Rust] IPC writers shouldn't implicitly finish on drop

2021-02-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11718:
---
Labels: pull-request-available  (was: )

> [Rust] IPC writers shouldn't implicitly finish on drop
> --
>
> Key: ARROW-11718
> URL: https://issues.apache.org/jira/browse/ARROW-11718
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 3.0.0
>Reporter: Steven Fackler
>Assignee: Steven Fackler
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Rust IPC writer types have a destructor that automatically writes the 
> footer if necessary. This is not ideal, though, since it can hide errors. For 
> example, if a web server is streaming data to a client in the Arrow IPC 
> format and it encounters an internal error trying to generate the next batch, 
> the outbound stream will appear valid to the client, because the footer is 
> written out automatically, even though some data is actually missing. If 
> the footer were not written automatically, the client would properly detect 
> the truncation.
> For reference, the C++ implementation does not attempt to write the footer 
> implicitly on drop.





[jira] [Commented] (ARROW-11017) [Rust] [DataFusion] Add support for Parquet schema merging

2021-02-20 Thread QP Hou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287787#comment-17287787
 ] 

QP Hou commented on ARROW-11017:


The Schema struct does have a try_merge method defined; should we just reuse 
that one?

> [Rust] [DataFusion] Add support for Parquet schema merging 
> ---
>
> Key: ARROW-11017
> URL: https://issues.apache.org/jira/browse/ARROW-11017
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
>
> Add support for Parquet schema merging so that we can read data sets where 
> some files have additional columns.





[jira] [Commented] (ARROW-6399) [C++] More extensive attributes usage could improve debugging

2021-02-20 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287794#comment-17287794
 ] 

Wes McKinney commented on ARROW-6399:
-

Is it possible to go ahead and add {{ARROW_MUST_USE_RESULT}} to {{Result}}?

> [C++] More extensive attributes usage could improve debugging
> -
>
> Key: ARROW-6399
> URL: https://issues.apache.org/jira/browse/ARROW-6399
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ben Kietzman
>Priority: Minor
>
> Wrapping raw or smart pointer parameters and other declarations with 
> {{gsl::not_null}} will assert they are not null. The check is dropped for 
> release builds.
> Status is tagged with {{ARROW_MUST_USE_RESULT}}, which emits warnings when a 
> Status might be ignored if compiling with clang, but Result<> should probably 
> be tagged with this too.





[jira] [Closed] (ARROW-6414) [Python] pyarrow cannot (de)serialise an empty MultiIndex-ed column DataFrame

2021-02-20 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-6414.
---
Resolution: Won't Fix

These functions are now deprecated

> [Python] pyarrow cannot (de)serialise an empty MultiIndex-ed column DataFrame
> -
>
> Key: ARROW-6414
> URL: https://issues.apache.org/jira/browse/ARROW-6414
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.0
>Reporter: Stephen Gowdy
>Priority: Major
>
> If a pandas DataFrame has empty MultiIndex columns, pyarrow cannot serialise 
> and deserialise it. The example code below shows this.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> columns = pd.MultiIndex.from_tuples([('a', 'b', 'c')])
> df = pd.DataFrame(columns = columns)
> df = df[[]]
> pa.deserialize_pandas(pa.serialize_pandas(df).to_pybytes())
> ...
> AttributeError: 'dict' object has no attribute 'dtype'
> {code}





[jira] [Closed] (ARROW-6455) [C++] Implement ExtensionType for non-UTF8 Unicode data

2021-02-20 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-6455.
---
Resolution: Later

Closing this until it can be demonstrated that it's actually needed

> [C++] Implement ExtensionType for non-UTF8 Unicode data
> ---
>
> Key: ARROW-6455
> URL: https://issues.apache.org/jira/browse/ARROW-6455
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Minor
> Fix For: 4.0.0
>
>
> For use cases where it's OK to transport such data as binary (without 
> transcoding to UTF-8)





[jira] [Closed] (ARROW-6495) [Plasma] Use xxh3 for object hashing

2021-02-20 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-6495.
---
Resolution: Won't Fix

> [Plasma] Use xxh3 for object hashing
> 
>
> Key: ARROW-6495
> URL: https://issues.apache.org/jira/browse/ARROW-6495
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Reporter: Antoine Pitrou
>Priority: Minor
>
> We recently vendored xxh3 in Arrow. Plasma may want to use it for object 
> hashing, since it's supposed to be even faster than XXH64.
> See https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for 
> performance numbers. 





[jira] [Closed] (ARROW-6498) [C++][CI] Download googletest tarball and use for EP build to avoid occasional flakiness

2021-02-20 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-6498.
---
Resolution: Cannot Reproduce

Closing since this doesn't seem to be an active problem

> [C++][CI] Download googletest tarball and use for EP build to avoid 
> occasional flakiness
> 
>
> Key: ARROW-6498
> URL: https://issues.apache.org/jira/browse/ARROW-6498
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Wes McKinney
>Priority: Major
>
> Failures such as 
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/27281370/job/dn0ji349v8popkd9
>  seem to be happening a fair amount.
> We might try to avoid this by wget-ing a tarball and setting 
> {{$ARROW_GTEST_URL}}. Open to other ideas about how to reduce flakiness





[jira] [Closed] (ARROW-6571) [Developer] Provide means to "plug in" a third party Arrow implementation into the integration test suite for validation purposes

2021-02-20 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-6571.
---
Resolution: Won't Fix

Someone is free to contribute this, but I don't think it's a priority for the 
community

> [Developer] Provide means to "plug in" a third party Arrow implementation 
> into the integration test suite for validation purposes
> -
>
> Key: ARROW-6571
> URL: https://issues.apache.org/jira/browse/ARROW-6571
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Wes McKinney
>Priority: Major
>
> Our current integration test suite has the details of our reference 
> implementations hard coded. If a third party implements integration tests for 
> their Arrow library, we should provide a way for them to test their library 
> against one or more reference implementations





[jira] [Updated] (ARROW-6680) [Python] Add Array ctor microbenchmarks

2021-02-20 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-6680:

Fix Version/s: 4.0.0

> [Python] Add Array ctor microbenchmarks
> ---
>
> Key: ARROW-6680
> URL: https://issues.apache.org/jira/browse/ARROW-6680
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 4.0.0
>
>
> Since more unavoidable validation is being added in e.g. 
> https://github.com/apache/arrow/pull/5488





[jira] [Closed] (ARROW-6673) [Python] Consider separating libarrow.pxd into multiple definition files

2021-02-20 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-6673.
---
Resolution: Won't Fix

> [Python] Consider separating libarrow.pxd into multiple definition files
> 
>
> Key: ARROW-6673
> URL: https://issues.apache.org/jira/browse/ARROW-6673
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Krisztian Szucs
>Priority: Minor
>
> See discussion https://github.com/apache/arrow/pull/5423#discussion_r327522836





[jira] [Updated] (ARROW-6779) [Python] Conversion from datetime.datetime to timestamp('ns') can overflow

2021-02-20 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-6779:

Fix Version/s: 4.0.0

> [Python] Conversion from datetime.datetime to timestamp('ns') can overflow
> -
>
> Key: ARROW-6779
> URL: https://issues.apache.org/jira/browse/ARROW-6779
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Joris Van den Bossche
>Priority: Major
> Fix For: 4.0.0
>
>
> In the python conversion of datetime scalars, there is no check for integer 
> overflow:
> {code}
> In [32]: pa.array([datetime.datetime(3000, 1, 1)], pa.timestamp('ns'))
> Out[32]: 
> 
> [
>   1830-11-23 00:50:52.580896768
> ]
> {code}
> So in case the target type has nanosecond unit, this can give wrong results 
> (I don't think the other resolutions can reach overflow, given the limited 
> range of years of datetime.datetime).
> We should probably check for this case and raise an error.
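
The overflow described above can be reproduced with the standard library alone. The following is a minimal sketch, not pyarrow's conversion code: `to_nanoseconds`, `to_int64_nanoseconds`, and `wrap_int64` are hypothetical helpers, and naive (timezone-free) datetimes measured from 1970-01-01 are assumed.

```python
import datetime

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
EPOCH = datetime.datetime(1970, 1, 1)


def to_nanoseconds(value: datetime.datetime) -> int:
    """Exact nanoseconds since the epoch as an unbounded Python int."""
    delta = value - EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1_000


def to_int64_nanoseconds(value: datetime.datetime) -> int:
    """Like to_nanoseconds, but raise instead of overflowing int64."""
    nanos = to_nanoseconds(value)
    if not INT64_MIN <= nanos <= INT64_MAX:
        raise OverflowError(f"{value!r} does not fit in timestamp('ns')")
    return nanos


def wrap_int64(n: int) -> int:
    """Simulate silent two's-complement wraparound of a 64-bit integer."""
    return (n + 2**63) % 2**64 - 2**63
```

Here `to_int64_nanoseconds(datetime.datetime(3000, 1, 1))` raises `OverflowError`, whereas `wrap_int64(to_nanoseconds(datetime.datetime(3000, 1, 1)))` yields a negative count that falls in the year 1830, consistent with the output quoted above.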





[jira] [Commented] (ARROW-6414) [Python] pyarrow cannot (de)serialise an empty MultiIndex-ed column DataFrame

2021-02-20 Thread Stephen Gowdy (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287799#comment-17287799
 ] 

Stephen Gowdy commented on ARROW-6414:
--

What are the replacements? 

> [Python] pyarrow cannot (de)serialise an empty MultiIndex-ed column DataFrame
> -
>
> Key: ARROW-6414
> URL: https://issues.apache.org/jira/browse/ARROW-6414
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.0
>Reporter: Stephen Gowdy
>Priority: Major
>
> If a pandas DataFrame has empty MultiIndex columns, pyarrow cannot serialise 
> and deserialise it. The example code below shows this.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> columns = pd.MultiIndex.from_tuples([('a', 'b', 'c')])
> df = pd.DataFrame(columns = columns)
> df = df[[]]
> pa.deserialize_pandas(pa.serialize_pandas(df).to_pybytes())
> ...
> AttributeError: 'dict' object has no attribute 'dtype'
> {code}





[jira] [Commented] (ARROW-6498) [C++][CI] Download googletest tarball and use for EP build to avoid occasional flakiness

2021-02-20 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287806#comment-17287806
 ] 

Neal Richardson commented on ARROW-6498:


And by "we host" I mean GitHub/bintray.

> [C++][CI] Download googletest tarball and use for EP build to avoid 
> occasional flakiness
> 
>
> Key: ARROW-6498
> URL: https://issues.apache.org/jira/browse/ARROW-6498
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Wes McKinney
>Priority: Major
>
> Failures such as 
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/27281370/job/dn0ji349v8popkd9
>  seem to be happening a fair amount.
> We might try to avoid this by wget-ing a tarball and setting 
> {{$ARROW_GTEST_URL}}. Open to other ideas about how to reduce flakiness





[jira] [Commented] (ARROW-6498) [C++][CI] Download googletest tarball and use for EP build to avoid occasional flakiness

2021-02-20 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287805#comment-17287805
 ] 

Neal Richardson commented on ARROW-6498:


We ended up doing a different solution: mirrors of dependencies that we host. I 
don't recall the issue that added them, but ARROW-11611 is the issue for 
updating them following our most recent dependency version bump.

> [C++][CI] Download googletest tarball and use for EP build to avoid 
> occasional flakiness
> 
>
> Key: ARROW-6498
> URL: https://issues.apache.org/jira/browse/ARROW-6498
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Wes McKinney
>Priority: Major
>
> Failures such as 
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/27281370/job/dn0ji349v8popkd9
>  seem to be happening a fair amount.
> We might try to avoid this by wget-ing a tarball and setting 
> {{$ARROW_GTEST_URL}}. Open to other ideas about how to reduce flakiness





[jira] [Created] (ARROW-11719) Support merged schema for memory table

2021-02-20 Thread QP Hou (Jira)
QP Hou created ARROW-11719:
--

 Summary: Support merged schema for memory table
 Key: ARROW-11719
 URL: https://issues.apache.org/jira/browse/ARROW-11719
 Project: Apache Arrow
  Issue Type: Task
  Components: Rust - DataFusion
Reporter: QP Hou
Assignee: QP Hou


Memory table should support loading batches with compatible schemas instead of 
forcing all schemas to be the same.
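
As a rough illustration of what "compatible schemas" could mean, here is a minimal Python sketch that models a schema as a plain mapping from field name to type name. `merge_schemas` is a hypothetical helper written for this example; it is not the DataFusion or Arrow API, and the real merge rules (nullability, metadata, nested types) may differ.

```python
def merge_schemas(schemas):
    """Merge field-name -> type-name mappings, keeping first-seen field order.

    Raises ValueError when two schemas give the same field different types.
    """
    merged = {}
    for schema in schemas:
        for name, dtype in schema.items():
            if name in merged and merged[name] != dtype:
                raise ValueError(
                    f"field {name!r} has conflicting types: "
                    f"{merged[name]} vs {dtype}"
                )
            merged.setdefault(name, dtype)
    return merged
```

Under this assumption, batches with schemas `{"a": "int64"}` and `{"a": "int64", "b": "utf8"}` are compatible and merge to a schema with both fields, while two batches that disagree on the type of `a` are rejected.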





[jira] [Updated] (ARROW-11719) Support merged schema for memory table

2021-02-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11719:
---
Labels: pull-request-available  (was: )

> Support merged schema for memory table
> --
>
> Key: ARROW-11719
> URL: https://issues.apache.org/jira/browse/ARROW-11719
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust - DataFusion
>Reporter: QP Hou
>Assignee: QP Hou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Memory table should support loading batches with compatible schemas instead 
> of forcing all schemas to be the same.





[jira] [Commented] (ARROW-11432) [Rust][DataFusion] Join Statement: Schema contains duplicate unqualified field name

2021-02-20 Thread R J (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287821#comment-17287821
 ] 

R J commented on ARROW-11432:
-

I'm not sure there is an easy fix without making breaking changes to the public 
API. When building a join schema, DataFusion checks whether the join set is valid 
(physical_plan::hash_utils::check_join_set_is_valid), which is called from a 
public API function (physical_plan::hash_utils::check_join_is_valid). This check 
is unaware of the registered table name (CSV or Parquet) because it operates on 
Arrow schemas rather than DataFusion schemas.

> [Rust][DataFusion] Join Statement: Schema contains duplicate unqualified 
> field name
> ---
>
> Key: ARROW-11432
> URL: https://issues.apache.org/jira/browse/ARROW-11432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Affects Versions: 3.0.0
>Reporter: GANG LIAO
>Priority: Critical
>
> https://github.com/apache/arrow/issues/9307





[jira] [Created] (ARROW-11720) [Python] Cannot install pyarrow via pip on Ubuntu 18.04.5

2021-02-20 Thread Sachit Vithaldas (Jira)
Sachit Vithaldas created ARROW-11720:


 Summary: [Python] Cannot install pyarrow via pip on Ubuntu 18.04.5
 Key: ARROW-11720
 URL: https://issues.apache.org/jira/browse/ARROW-11720
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 3.0.0
 Environment: Ubuntu 18.04.5 LTS
Reporter: Sachit Vithaldas


I was attempting to install pyarrow via pip on Ubuntu 18.04 by using the 
following command:
{code:java}
pip3 install pyarrow{code}
When doing so I get the following error:
{code:java}
Collecting pyarrow
 Downloading 
https://files.pythonhosted.org/packages/62/d3/a482d8a4039bf931ed6388308f0cc0541d0cab46f0bbff7c897a74f1c576/pyarrow-3.0.0.tar.gz
 (682kB)
 100% || 686kB 2.3MB/s 
 Complete output from command python setup.py egg_info:
 Traceback (most recent call last):
 File "", line 1, in 
 File "/tmp/pip-build-2zxk66af/pyarrow/setup.py", line 37, in 
 from Cython.Distutils import build_ext as _build_ext
 ModuleNotFoundError: No module named 'Cython'

 
Command "python setup.py egg_info" failed with error code 1 in 
/tmp/pip-build-2zxk66af/pyarrow/{code}

However this does seem to work without any issues on Ubuntu 20.04.1 LTS. This 
problem seems to be specific to 18.04.5.





[jira] [Commented] (ARROW-11720) [Python] Cannot install pyarrow via pip on Ubuntu 18.04.5

2021-02-20 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287823#comment-17287823
 ] 

Kouhei Sutou commented on ARROW-11720:
--

Could you upgrade your pip to use manylinux2010 or manylinux2014?

See also:  ARROW-11498

> [Python] Cannot install pyarrow via pip on Ubuntu 18.04.5
> -
>
> Key: ARROW-11720
> URL: https://issues.apache.org/jira/browse/ARROW-11720
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 3.0.0
> Environment: Ubuntu 18.04.5 LTS
>Reporter: Sachit Vithaldas
>Priority: Major
>
> I was attempting to install pyarrow via pip on Ubuntu 18.04 by using the 
> following command:
> {code:java}
> pip3 install pyarrow{code}
> When doing so I get the following error:
> {code:java}
> Collecting pyarrow
>  Downloading 
> https://files.pythonhosted.org/packages/62/d3/a482d8a4039bf931ed6388308f0cc0541d0cab46f0bbff7c897a74f1c576/pyarrow-3.0.0.tar.gz
>  (682kB)
>  100% || 686kB 2.3MB/s 
>  Complete output from command python setup.py egg_info:
>  Traceback (most recent call last):
>  File "", line 1, in 
>  File "/tmp/pip-build-2zxk66af/pyarrow/setup.py", line 37, in 
>  from Cython.Distutils import build_ext as _build_ext
>  ModuleNotFoundError: No module named 'Cython'
>  
> Command "python setup.py egg_info" failed with error code 1 in 
> /tmp/pip-build-2zxk66af/pyarrow/{code}
> However this does seem to work without any issues on Ubuntu 20.04.1 LTS. This 
> problem seems to be specific to 18.04.5.





[jira] [Commented] (ARROW-11720) [Python] Cannot install pyarrow via pip on Ubuntu 18.04.5

2021-02-20 Thread Sachit Vithaldas (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287825#comment-17287825
 ] 

Sachit Vithaldas commented on ARROW-11720:
--

Upgrading pip resolved the issue for me. Thanks for your help!

> [Python] Cannot install pyarrow via pip on Ubuntu 18.04.5
> -
>
> Key: ARROW-11720
> URL: https://issues.apache.org/jira/browse/ARROW-11720
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 3.0.0
> Environment: Ubuntu 18.04.5 LTS
>Reporter: Sachit Vithaldas
>Priority: Major
>
> I was attempting to install pyarrow via pip on Ubuntu 18.04 by using the 
> following command:
> {code:java}
> pip3 install pyarrow{code}
> When doing so I get the following error:
> {code:java}
> Collecting pyarrow
>  Downloading 
> https://files.pythonhosted.org/packages/62/d3/a482d8a4039bf931ed6388308f0cc0541d0cab46f0bbff7c897a74f1c576/pyarrow-3.0.0.tar.gz
>  (682kB)
>  100% || 686kB 2.3MB/s 
>  Complete output from command python setup.py egg_info:
>  Traceback (most recent call last):
>  File "", line 1, in 
>  File "/tmp/pip-build-2zxk66af/pyarrow/setup.py", line 37, in 
>  from Cython.Distutils import build_ext as _build_ext
>  ModuleNotFoundError: No module named 'Cython'
>  
> Command "python setup.py egg_info" failed with error code 1 in 
> /tmp/pip-build-2zxk66af/pyarrow/{code}
> However this does seem to work without any issues on Ubuntu 20.04.1 LTS. This 
> problem seems to be specific to 18.04.5.





[jira] [Resolved] (ARROW-11720) [Python] Cannot install pyarrow via pip on Ubuntu 18.04.5

2021-02-20 Thread Sachit Vithaldas (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachit Vithaldas resolved ARROW-11720.
--
Resolution: Fixed






[jira] [Comment Edited] (ARROW-11432) [Rust][DataFusion] Join Statement: Schema contains duplicate unqualified field name

2021-02-20 Thread R J (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-11432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287821#comment-17287821
 ] 

R J edited comment on ARROW-11432 at 2/20/21, 11:59 PM:


I'm not sure there is an easy fix that avoids breaking changes to the public 
API. When building a join schema, DataFusion checks whether the join set is 
valid (physical_plan::hash_utils::check_join_set_is_valid), which is called 
from a public API function (physical_plan::hash_utils::check_join_is_valid). 
This check is unaware of the registered table name (CSV or Parquet) because it 
is performed on Arrow schemas rather than DataFusion schemas.

 

EDIT:

This may reflect my limited knowledge of the DataFusion codebase, but a fix 
appears to need a lot of changes.
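
To illustrate the collision described above, here is a minimal Python sketch. It is not DataFusion's code: `Field`, `build_join_schema`, and `qualify` are hypothetical stand-ins, and the error text is borrowed from this issue's title. It assumes the joined schema is the left fields plus the right fields minus the join keys, and shows that unqualified names collide while table-qualified names do not.

```python
from typing import List, NamedTuple


class Field(NamedTuple):
    name: str       # column name, e.g. "id" or "t1.id"
    data_type: str  # simplified stand-in for an Arrow data type


def build_join_schema(
    left: List[Field], right: List[Field], on: List[str]
) -> List[Field]:
    """Concatenate two field lists, keeping each join key only once.

    Raises ValueError if any name occurs twice in the result.
    """
    keys = set(on)
    fields = list(left) + [f for f in right if f.name not in keys]
    seen = set()
    for field in fields:
        if field.name in seen:
            raise ValueError(
                f"Schema contains duplicate unqualified field name '{field.name}'"
            )
        seen.add(field.name)
    return fields


def qualify(table: str, fields: List[Field]) -> List[Field]:
    """Prefix every field name with the (registered) table name."""
    return [Field(f"{table}.{f.name}", f.data_type) for f in fields]


if __name__ == "__main__":
    t1 = [Field("id", "int64"), Field("name", "utf8")]
    t2 = [Field("id", "int64"), Field("name", "utf8")]
    try:
        build_join_schema(t1, t2, on=["id"])
    except ValueError as err:
        print("unqualified:", err)
    joined = build_join_schema(qualify("t1", t1), qualify("t2", t2), on=[])
    print("qualified:", [f.name for f in joined])
```

In this sketch the qualified variant works only because the table name is still available when the schemas are merged, which is the information the comment says is missing at the Arrow-schema level.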



> [Rust][DataFusion] Join Statement: Schema contains duplicate unqualified 
> field name
> ---
>
> Key: ARROW-11432
> URL: https://issues.apache.org/jira/browse/ARROW-11432
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Affects Versions: 3.0.0
>Reporter: GANG LIAO
>Priority: Critical
>
> https://github.com/apache/arrow/issues/9307





[jira] [Created] (ARROW-11721) json schame inference should return Schema type instead of SchemaRef

2021-02-20 Thread QP Hou (Jira)
QP Hou created ARROW-11721:
--

 Summary: json schame inference should return Schema type instead 
of SchemaRef
 Key: ARROW-11721
 URL: https://issues.apache.org/jira/browse/ARROW-11721
 Project: Apache Arrow
  Issue Type: Task
  Components: Rust
Reporter: QP Hou
Assignee: QP Hou








[jira] [Updated] (ARROW-11721) json schema inference should return Schema type instead of SchemaRef

2021-02-20 Thread QP Hou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QP Hou updated ARROW-11721:
---
Summary: json schema inference should return Schema type instead of 
SchemaRef  (was: json schame inference should return Schema type instead of 
SchemaRef)

> json schema inference should return Schema type instead of SchemaRef
> 
>
> Key: ARROW-11721
> URL: https://issues.apache.org/jira/browse/ARROW-11721
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust
>Reporter: QP Hou
>Assignee: QP Hou
>Priority: Minor
>






[jira] [Updated] (ARROW-11721) json schema inference should return Schema type instead of SchemaRef

2021-02-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11721:
---
Labels: pull-request-available  (was: )

> json schema inference should return Schema type instead of SchemaRef
> 
>
> Key: ARROW-11721
> URL: https://issues.apache.org/jira/browse/ARROW-11721
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust
>Reporter: QP Hou
>Assignee: QP Hou
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



