[jira] [Created] (ARROW-10575) [Rust] Rename union.rs to be consistent with other arrays

2020-11-12 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-10575:
---

 Summary: [Rust] Rename union.rs to be consistent with other arrays
 Key: ARROW-10575
 URL: https://issues.apache.org/jira/browse/ARROW-10575
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


Each array type was split into its own module, but union.rs was not renamed to 
be consistent with them (it had been split out earlier).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9591) [Rust] Investigate removing the offset requirement on the boolean kernels

2020-07-28 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-9591:
--

 Summary: [Rust] Investigate removing the offset requirement on the 
boolean kernels
 Key: ARROW-9591
 URL: https://issues.apache.org/jira/browse/ARROW-9591
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan








[jira] [Created] (ARROW-9590) [Rust] Ensure SIMD kernel implementations handle slicing correctly

2020-07-28 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-9590:
--

 Summary: [Rust] Ensure SIMD kernel implementations handle slicing 
correctly
 Key: ARROW-9590
 URL: https://issues.apache.org/jira/browse/ARROW-9590
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Paddy Horan








[jira] [Created] (ARROW-9494) [Rust] master fails due to use of "fXX::NAN"

2020-07-15 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-9494:
--

 Summary: [Rust] master fails due to use of "fXX::NAN"
 Key: ARROW-9494
 URL: https://issues.apache.org/jira/browse/ARROW-9494
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


I'm getting a compile error saying that no associated item named NAN exists.  
Changing to "std::fXX::NAN" fixes the issue.

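A minimal sketch of the two spellings, shown for `f64` (`f32` is analogous). Associated constants such as `f64::NAN` were stabilised in Rust 1.43, so an older toolchain is one plausible cause of the error; the module-level `std::f64::NAN` form compiles on both older and newer toolchains. The function names are illustrative only.

```rust
/// NaN via the module-level constant: compiles on old and new toolchains.
pub fn nan_module_const() -> f64 {
    std::f64::NAN
}

/// NaN via the associated constant: only available from Rust 1.43 onward.
pub fn nan_assoc_const() -> f64 {
    f64::NAN
}
```

Since NaN never compares equal to itself, check results with `is_nan()` rather than `==`.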




[jira] [Created] (ARROW-9491) [Rust] "simd" feature is not tested in CI

2020-07-15 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-9491:
--

 Summary: [Rust] "simd" feature is not tested in CI
 Key: ARROW-9491
 URL: https://issues.apache.org/jira/browse/ARROW-9491
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan








[jira] [Created] (ARROW-9361) [Rust] Move other array types into their own modules

2020-07-07 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-9361:
--

 Summary: [Rust] Move other array types into their own modules
 Key: ARROW-9361
 URL: https://issues.apache.org/jira/browse/ARROW-9361
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


The array module is getting too big to be practical.  We should leave the core 
types like the Array trait in `array.rs` and move each array type into its own 
sub-module as we did while implementing the Union array.
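A minimal sketch of the proposed layout, using simplified stand-in types: the core `Array` trait stays in the parent module, each array type lives in its own sub-module, and the types are re-exported so existing import paths keep working. The inline `mod` blocks stand in for separate files, and the `array_*` module names are illustrative assumptions.

```rust
// In a real crate each inline `mod` below would be its own file,
// e.g. `array/mod.rs`, `array/array_boolean.rs`, `array/array_union.rs`.
pub mod array {
    /// Core trait shared by every array type; stays in the parent module.
    pub trait Array {
        fn len(&self) -> usize;

        fn is_empty(&self) -> bool {
            self.len() == 0
        }
    }

    // Each concrete array type lives in its own sub-module...
    mod array_boolean {
        use super::Array;

        /// Simplified stand-in: the real type stores a bit-packed buffer.
        pub struct BooleanArray {
            pub values: Vec<bool>,
        }

        impl Array for BooleanArray {
            fn len(&self) -> usize {
                self.values.len()
            }
        }
    }

    mod array_union {
        use super::Array;

        /// Simplified stand-in: only the per-slot type ids are modelled.
        pub struct UnionArray {
            pub type_ids: Vec<i8>,
        }

        impl Array for UnionArray {
            fn len(&self) -> usize {
                self.type_ids.len()
            }
        }
    }

    // ...and is re-exported, so callers keep writing `array::BooleanArray`.
    pub use self::array_boolean::BooleanArray;
    pub use self::array_union::UnionArray;
}
```

Because of the `pub use` re-exports, moving a type into its own file is invisible to downstream code.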





[jira] [Created] (ARROW-9339) [Rust] Comments on SIMD in Arrow README are incorrect

2020-07-06 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-9339:
--

 Summary: [Rust] Comments on SIMD in Arrow README are incorrect
 Key: ARROW-9339
 URL: https://issues.apache.org/jira/browse/ARROW-9339
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan








[jira] [Created] (ARROW-9338) [Rust] Add instructions for running clippy locally

2020-07-06 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-9338:
--

 Summary: [Rust] Add instructions for running clippy locally
 Key: ARROW-9338
 URL: https://issues.apache.org/jira/browse/ARROW-9338
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan


Similar to the "Code Formatting" section in the top-level README, it would be 
useful to add instructions for running clippy locally to avoid wasting CI time.





[jira] [Created] (ARROW-9324) [Rust] Implement Union validity bitmap changes from ARROW-9222

2020-07-04 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-9324:
--

 Summary: [Rust] Implement Union validity bitmap changes from 
ARROW-9222
 Key: ARROW-9324
 URL: https://issues.apache.org/jira/browse/ARROW-9324
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan








[jira] [Created] (ARROW-9200) [Rust] Investigate running Valgrind on the Rust tests in CI

2020-06-21 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-9200:
--

 Summary: [Rust] Investigate running Valgrind on the Rust tests in 
CI
 Key: ARROW-9200
 URL: https://issues.apache.org/jira/browse/ARROW-9200
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan








[jira] [Commented] (ARROW-6482) [Rust] Investigate enabling features in regex crate to reduce compile times

2020-06-01 Thread Paddy Horan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17121073#comment-17121073
 ] 

Paddy Horan commented on ARROW-6482:


Yep, it's the unicode features I had in mind.

> [Rust] Investigate enabling features in regex crate to reduce compile times
> ---
>
> Key: ARROW-6482
> URL: https://issues.apache.org/jira/browse/ARROW-6482
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.14.1
>Reporter: Paddy Horan
>Priority: Minor
>  Labels: beginner
> Fix For: 1.0.0
>
>
> The regex crate recently added a feature flag to reduce compile times and 
> binary size when certain Unicode-related features are not needed.  We should 
> investigate using this feature.





[jira] [Closed] (ARROW-8819) [Rust] Rust docs don't compile for the Arrow crate

2020-05-27 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan closed ARROW-8819.
--
Resolution: Not A Problem

> [Rust] Rust docs don't compile for the Arrow crate
> ---
>
> Key: ARROW-8819
> URL: https://issues.apache.org/jira/browse/ARROW-8819
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Paddy Horan
>Priority: Major
>
> See GitHub [issue|https://github.com/apache/arrow/issues/7194]





[jira] [Updated] (ARROW-8902) [rust][datafusion] optimize count(*) queries on parquet sources

2020-05-23 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan updated ARROW-8902:
---
Component/s: Rust

> [rust][datafusion] optimize count(*) queries on parquet sources
> ---
>
> Key: ARROW-8902
> URL: https://issues.apache.org/jira/browse/ARROW-8902
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Alex Gaynor
>Priority: Minor
>
> Currently, as far as I can tell, when you perform a `select count(*) from 
> dataset` in datafusion against a parquet dataset, the way this is implemented 
> is by doing a scan on column 0, and counting up all of the rows (specifically 
> I think it counts the # of rows in each batch).
>  
> However, for the specific case of just counting _everything_ in a parquet 
> file, you can just read the row count from the footer metadata, so it's O(1) 
> instead of O(n).
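A minimal sketch of the idea for an unfiltered count(*), using hypothetical stand-in types rather than the `parquet` crate: the Parquet footer already records how many rows each row group holds (plus a file-level total), so the count can come from metadata alone instead of a column scan. In the real `parquet` crate the file-level total is exposed as `num_rows()` on the footer's file metadata. Any filter predicate would make the metadata shortcut invalid.

```rust
/// Hypothetical stand-ins for the metadata stored in a Parquet footer.
pub struct RowGroupMeta {
    pub num_rows: i64,
}

pub struct FileMeta {
    pub row_groups: Vec<RowGroupMeta>,
}

/// Metadata-only count: cost depends on the number of row groups,
/// not the number of rows, and no column data is read or decoded.
pub fn count_from_metadata(meta: &FileMeta) -> i64 {
    meta.row_groups.iter().map(|rg| rg.num_rows).sum()
}

/// Scan-based count: roughly what the issue describes, i.e. reading
/// column 0 in batches and summing the batch lengths (cost grows with rows).
pub fn count_by_scanning(column0_batches: &[Vec<i32>]) -> i64 {
    column0_batches.iter().map(|batch| batch.len() as i64).sum()
}
```

Both functions agree whenever the batches cover every row, which is what makes the shortcut safe for `select count(*) from dataset`.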





[jira] [Updated] (ARROW-8902) [rust][datafusion] optimize count(*) queries on parquet sources

2020-05-23 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan updated ARROW-8902:
---
Issue Type: Improvement  (was: Bug)

> [rust][datafusion] optimize count(*) queries on parquet sources
> ---
>
> Key: ARROW-8902
> URL: https://issues.apache.org/jira/browse/ARROW-8902
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Alex Gaynor
>Priority: Minor
>
> Currently, as far as I can tell, when you perform a `select count(*) from 
> dataset` in datafusion against a parquet dataset, the way this is implemented 
> is by doing a scan on column 0, and counting up all of the rows (specifically 
> I think it counts the # of rows in each batch).
>  
> However, for the specific case of just counting _everything_ in a parquet 
> file, you can just read the row count from the footer metadata, so it's O(1) 
> instead of O(n).





[jira] [Commented] (ARROW-8559) [Rust] Consolidate Record Batch iterator traits in main arrow crate

2020-05-16 Thread Paddy Horan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17109019#comment-17109019
 ] 

Paddy Horan commented on ARROW-8559:


Yep, I agree.  Reader it is.

On the _SendableBatchReader_: both traits have a _schema_ method in addition to 
_next_, and I think their return types differ (_Arc_ vs _Rc_).  I guess it's 
also a convenience (when used as a trait object, etc.).

> [Rust] Consolidate Record Batch iterator traits in main arrow crate
> ---
>
> Key: ARROW-8559
> URL: https://issues.apache.org/jira/browse/ARROW-8559
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Major
>
> We have the `BatchIterator` trait in DataFusion and the `RecordBatchReader` 
> trait in the main arrow crate.
> They differ in that `BatchIterator` is Send + Sync.  They should both be in 
> the Arrow crate and be named `BatchIterator` and `SendableBatchIterator`





[jira] [Updated] (ARROW-8559) [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-05-16 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan updated ARROW-8559:
---
Summary: [Rust] Consolidate Record Batch reader traits in main arrow crate  
(was: [Rust] Consolidate Record Batch iterator traits in main arrow crate)

> [Rust] Consolidate Record Batch reader traits in main arrow crate
> -
>
> Key: ARROW-8559
> URL: https://issues.apache.org/jira/browse/ARROW-8559
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Major
>
> We have the `BatchIterator` trait in DataFusion and the `RecordBatchReader` 
> trait in the main arrow crate.
> They differ in that `BatchIterator` is Send + Sync.  They should both be in 
> the Arrow crate and be named `BatchIterator` and `SendableBatchIterator`





[jira] [Created] (ARROW-8819) [Rust] Rust docs don't compile for the Arrow crate

2020-05-15 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8819:
--

 Summary: [Rust] Rust docs don't compile for the Arrow crate
 Key: ARROW-8819
 URL: https://issues.apache.org/jira/browse/ARROW-8819
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Affects Versions: 0.17.0
Reporter: Paddy Horan


See GitHub [issue|https://github.com/apache/arrow/issues/7194]





[jira] [Created] (ARROW-8818) [Rust] Failing to build on master due to Flatbuffers/Union issues

2020-05-15 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8818:
--

 Summary: [Rust] Failing to build on master due to 
Flatbuffers/Union issues
 Key: ARROW-8818
 URL: https://issues.apache.org/jira/browse/ARROW-8818
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan








[jira] [Created] (ARROW-8817) [Rust] Add support for Union arrays in Parquet

2020-05-15 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8817:
--

 Summary: [Rust] Add support for Union arrays in Parquet
 Key: ARROW-8817
 URL: https://issues.apache.org/jira/browse/ARROW-8817
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Paddy Horan








[jira] [Resolved] (ARROW-8751) [Rust] ParquetFileArrowReader should be able to read empty parquet file without error

2020-05-14 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-8751.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7140
[https://github.com/apache/arrow/pull/7140]

> [Rust] ParquetFileArrowReader should be able to read empty parquet file 
> without error
> -
>
> Key: ARROW-8751
> URL: https://issues.apache.org/jira/browse/ARROW-8751
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: QP Hou
>Assignee: QP Hou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Spark sometimes writes out parquet files with zero row groups, which results 
> in an error when they are read using ParquetFileArrowReader.
> It would be more convenient if ParquetFileArrowReader supported this 
> edge case out of the box.
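A minimal sketch of the desired behaviour, assuming a simplified model in which a file is described only by the row count of each row group; `plan_batches` is a hypothetical helper, not the `ParquetFileArrowReader` API. The point is that zero row groups is a valid, empty file that yields no record batches instead of an error.

```rust
/// Hypothetical planner: given the row count of each row group, return the
/// lengths of the record batches a reader would emit for `batch_size`.
pub fn plan_batches(row_group_rows: &[usize], batch_size: usize) -> Result<Vec<usize>, String> {
    if batch_size == 0 {
        return Err("batch_size must be greater than zero".to_string());
    }
    let mut remaining: usize = row_group_rows.iter().sum();
    let mut batches = Vec::new();
    // With zero row groups `remaining` is 0, the loop never runs,
    // and the result is Ok(empty) rather than an error.
    while remaining > 0 {
        let n = remaining.min(batch_size);
        batches.push(n);
        remaining -= n;
    }
    Ok(batches)
}
```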





[jira] [Created] (ARROW-8723) [Rust] Remove SIMD specific benchmark code

2020-05-06 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8723:
--

 Summary: [Rust] Remove SIMD specific benchmark code
 Key: ARROW-8723
 URL: https://issues.apache.org/jira/browse/ARROW-8723
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


Now that SIMD is behind a feature flag it's trivial to compare SIMD vs non-SIMD 
and the SIMD versions of benchmarks can be removed.
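A minimal sketch of what "behind a feature flag" means here, assuming a Cargo feature named `simd` that is off by default; the kernel name `add` and the fixed-width chunk loop standing in for explicit SIMD are illustrative assumptions. Because both paths share one signature, the same benchmark exercises whichever path was compiled in.

```rust
// Assumed Cargo.toml declaration (feature off unless requested):
//   [features]
//   default = []
//   simd = ["packed_simd"]

/// Scalar path: compiled when the `simd` feature is off (the default).
#[cfg(not(feature = "simd"))]
pub fn add(left: &[i32], right: &[i32]) -> Vec<i32> {
    assert_eq!(left.len(), right.len());
    left.iter().zip(right).map(|(a, b)| a + b).collect()
}

/// "SIMD" path: compiled only with `--features simd`. A real kernel would use
/// explicit vector types; fixed-width chunks stand in for them here.
#[cfg(feature = "simd")]
pub fn add(left: &[i32], right: &[i32]) -> Vec<i32> {
    const LANES: usize = 8;
    assert_eq!(left.len(), right.len());
    let mut out = Vec::with_capacity(left.len());
    let (l_chunks, r_chunks) = (left.chunks_exact(LANES), right.chunks_exact(LANES));
    // Grab the tails before the chunk iterators are consumed.
    let (l_rem, r_rem) = (l_chunks.remainder(), r_chunks.remainder());
    for (l, r) in l_chunks.zip(r_chunks) {
        for i in 0..LANES {
            out.push(l[i] + r[i]);
        }
    }
    out.extend(l_rem.iter().zip(r_rem).map(|(a, b)| a + b));
    out
}
```

Comparing the two is then a matter of running `cargo bench` and `cargo bench --features simd`.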





[jira] [Commented] (ARROW-8614) [Website] Create Rust-specific 0.17.0 blog post

2020-05-06 Thread Paddy Horan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101118#comment-17101118
 ] 

Paddy Horan commented on ARROW-8614:


Nothing specific for the 0.17.0 release.  I think we should specifically call 
out the integration testing as our top goal for the next release and "rally the 
community" so to speak.

> [Website] Create Rust-specific 0.17.0 blog post
> ---
>
> Key: ARROW-8614
> URL: https://issues.apache.org/jira/browse/ARROW-8614
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
>
> I wrote a brief blog post about DataFusion 0.17.0 [1] and Wes suggested that 
> I post it on the Arrow blog. We might want to expand this to cover the Rust 
> implementation of Arrow in general and it might be a good opportunity to talk 
> about things we want help with for the next release (such as integration 
> testing, implementing a parquet writer, etc).
>  [1] https://andygrove.io/2020/04/datafusion-0.17.0/





[jira] [Resolved] (ARROW-8622) [Rust] Parquet crate does not compile on aarch64

2020-04-30 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-8622.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7059
[https://github.com/apache/arrow/pull/7059]

> [Rust] Parquet crate does not compile on aarch64
> 
>
> Key: ARROW-8622
> URL: https://issues.apache.org/jira/browse/ARROW-8622
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: R. Tyler Croy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (ARROW-8622) [Rust] Parquet crate does not compile on aarch64

2020-04-29 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan reassigned ARROW-8622:
--

Assignee: R. Tyler Croy

> [Rust] Parquet crate does not compile on aarch64
> 
>
> Key: ARROW-8622
> URL: https://issues.apache.org/jira/browse/ARROW-8622
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: R. Tyler Croy
>Priority: Major
>






[jira] [Created] (ARROW-8622) [Rust] Parquet crate does not compile on aarch64

2020-04-29 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8622:
--

 Summary: [Rust] Parquet crate does not compile on aarch64
 Key: ARROW-8622
 URL: https://issues.apache.org/jira/browse/ARROW-8622
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Paddy Horan








[jira] [Resolved] (ARROW-8617) [Rust] simd_load_set_invalid does not exist on aarch64

2020-04-28 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-8617.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7049
[https://github.com/apache/arrow/pull/7049]

> [Rust] simd_load_set_invalid does not exist on aarch64
> --
>
> Key: ARROW-8617
> URL: https://issues.apache.org/jira/browse/ARROW-8617
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: R. Tyler Croy
>Assignee: R. Tyler Croy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/arrow/pull/7049





[jira] [Assigned] (ARROW-8617) [Rust] simd_load_set_invalid does not exist on aarch64

2020-04-28 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan reassigned ARROW-8617:
--

Assignee: R. Tyler Croy

> [Rust] simd_load_set_invalid does not exist on aarch64
> --
>
> Key: ARROW-8617
> URL: https://issues.apache.org/jira/browse/ARROW-8617
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: R. Tyler Croy
>Assignee: R. Tyler Croy
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/arrow/pull/7049





[jira] [Resolved] (ARROW-8591) [Rust] Reverse lookup for a key in DictionaryArray

2020-04-28 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-8591.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7036
[https://github.com/apache/arrow/pull/7036]

> [Rust] Reverse lookup for a key in DictionaryArray
> --
>
> Key: ARROW-8591
> URL: https://issues.apache.org/jira/browse/ARROW-8591
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Mahmut Bulut
>Assignee: Mahmut Bulut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, there is no way to do a reverse lookup for DictionaryArray. A 
> reverse lookup would be beneficial (it enables the creation of combiner masks).
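A minimal sketch of a reverse lookup on a hypothetical, simplified dictionary-encoded string array (not the arrow `DictionaryArray` API). It reads "combiner masks" as boolean row masks that can be combined with other predicates: once the key for a value is known, the mask is built by comparing integer keys instead of decoding every string.

```rust
/// Hypothetical dictionary-encoded string array: each row stores a key into `values`.
pub struct DictStringArray {
    pub keys: Vec<i32>,
    pub values: Vec<String>,
}

impl DictStringArray {
    /// Reverse lookup: the key under which `value` is stored, if present.
    /// Linear in the dictionary size; a hash map could serve repeated lookups.
    pub fn lookup_key(&self, value: &str) -> Option<i32> {
        self.values.iter().position(|v| v == value).map(|i| i as i32)
    }

    /// Boolean mask of rows whose decoded value equals `value`.
    pub fn mask_eq(&self, value: &str) -> Vec<bool> {
        match self.lookup_key(value) {
            Some(key) => self.keys.iter().map(|&k| k == key).collect(),
            None => vec![false; self.keys.len()],
        }
    }
}
```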





[jira] [Created] (ARROW-8616) [Rust] Turn explicit SIMD off by default

2020-04-28 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8616:
--

 Summary: [Rust] Turn explicit SIMD off by default
 Key: ARROW-8616
 URL: https://issues.apache.org/jira/browse/ARROW-8616
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


We plan to remove packed_simd so we should turn it off by default.





[jira] [Resolved] (ARROW-8598) [Rust] simd_compare_op creates buffer of incorrect length when item count is not a multiple of T::lanes()

2020-04-28 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-8598.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7043
[https://github.com/apache/arrow/pull/7043]

> [Rust] simd_compare_op creates buffer of incorrect length when item count is 
> not a multiple of T::lanes()
> -
>
> Key: ARROW-8598
> URL: https://issues.apache.org/jira/browse/ARROW-8598
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Yordan Pavlov
>Assignee: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The simd_compare_op function defined here 
> [https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/comparison.rs#L229]
> appears to only work correctly when item count is a multiple of T::lanes().
> Otherwise the resulting boolean array is created with a buffer of incorrect 
> length and subsequent binary operations such as compute::and return an error.
> The no_simd_compare_op function defined in the same module does not appear to 
> have this problem.
> This bug can be reproduced with the following code:
>  
> {code:java}
> use arrow::array::{BooleanArray, PrimitiveArray};
> use arrow::compute;
> use arrow::datatypes::{ArrowNumericType, Int8Type};
>
> fn main() {
>     let lanes = Int8Type::lanes();
>     println!("i8 lanes: {}", lanes); // 64
>     // let item_count = 128; // works: 128 is a multiple of 64 (lanes)
>     let item_count = 130; // fails: 130 is not a multiple of 64 (lanes)
>     // with item_count = 130, compute::and below returns this error:
>     // ComputeError("Buffers must be the same size to apply Bitwise AND.")
>
>     // create boolean array
>     let mut select_mask: BooleanArray = vec![true; item_count].into();
>     // create arrays with i8 values
>     let value_array: PrimitiveArray<Int8Type> = vec![1i8; item_count].into();
>     let filter_array: PrimitiveArray<Int8Type> = vec![2i8; item_count].into();
>     // compare i8 arrays and produce a boolean array
>     let result_mask = compute::gt_eq(&value_array, &filter_array).unwrap();
>     // combine boolean arrays
>     select_mask = compute::and(&select_mask, &result_mask).unwrap();
>     // print result, should be all false
>     println!("select mask: {:?}", select_mask);
> }
> {code}
>  
>  
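A minimal sketch of the general technique such a kernel needs, not the arrow implementation (real Arrow buffers are also padded, which this ignores): process full `lanes`-sized chunks, handle the tail separately, and size the output bitmap from the element count so that 128 and 130 elements both yield correctly sized results. `gt_eq_bitmap` is a hypothetical name.

```rust
/// Hypothetical lane-chunked `>=` kernel packing results into an LSB-ordered
/// bitmap (one bit per element, as in Arrow).
pub fn gt_eq_bitmap(left: &[i8], right: &[i8], lanes: usize) -> Vec<u8> {
    assert_eq!(left.len(), right.len());
    assert!(lanes > 0);
    let len = left.len();
    // Size from the element count: ceil(len / 8) bytes,
    // not from the number of full `lanes`-sized chunks.
    let mut bitmap = vec![0u8; (len + 7) / 8];
    let full = len - len % lanes;
    // Chunked part: a SIMD kernel would compare one vector register per chunk.
    for start in (0..full).step_by(lanes) {
        for i in start..start + lanes {
            if left[i] >= right[i] {
                bitmap[i / 8] |= 1 << (i % 8);
            }
        }
    }
    // Remainder: the trailing `len % lanes` elements still need a bit each.
    for i in full..len {
        if left[i] >= right[i] {
            bitmap[i / 8] |= 1 << (i % 8);
        }
    }
    bitmap
}
```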





[jira] [Assigned] (ARROW-8598) [Rust] simd_compare_op creates buffer of incorrect length when item count is not a multiple of T::lanes()

2020-04-26 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan reassigned ARROW-8598:
--

Assignee: Paddy Horan

> [Rust] simd_compare_op creates buffer of incorrect length when item count is 
> not a multiple of T::lanes()
> -
>
> Key: ARROW-8598
> URL: https://issues.apache.org/jira/browse/ARROW-8598
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Yordan Pavlov
>Assignee: Paddy Horan
>Priority: Major
>
> The simd_compare_op function defined here 
> [https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/comparison.rs#L229]
> appears to only work correctly when item count is a multiple of T::lanes().
> Otherwise the resulting boolean array is created with a buffer of incorrect 
> length and subsequent binary operations such as compute::and return an error.
> The no_simd_compare_op function defined in the same module does not appear to 
> have this problem.
> This bug can be reproduced with the following code:
>  
> {code:java}
> use arrow::array::{BooleanArray, PrimitiveArray};
> use arrow::compute;
> use arrow::datatypes::{ArrowNumericType, Int8Type};
>
> fn main() {
>     let lanes = Int8Type::lanes();
>     println!("i8 lanes: {}", lanes); // 64
>     // let item_count = 128; // works: 128 is a multiple of 64 (lanes)
>     let item_count = 130; // fails: 130 is not a multiple of 64 (lanes)
>     // with item_count = 130, compute::and below returns this error:
>     // ComputeError("Buffers must be the same size to apply Bitwise AND.")
>
>     // create boolean array
>     let mut select_mask: BooleanArray = vec![true; item_count].into();
>     // create arrays with i8 values
>     let value_array: PrimitiveArray<Int8Type> = vec![1i8; item_count].into();
>     let filter_array: PrimitiveArray<Int8Type> = vec![2i8; item_count].into();
>     // compare i8 arrays and produce a boolean array
>     let result_mask = compute::gt_eq(&value_array, &filter_array).unwrap();
>     // combine boolean arrays
>     select_mask = compute::and(&select_mask, &result_mask).unwrap();
>     // print result, should be all false
>     println!("select mask: {:?}", select_mask);
> }
> {code}
>  
>  





[jira] [Created] (ARROW-8576) [Rust] Implement ArrayEqual for UnionArray

2020-04-23 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8576:
--

 Summary: [Rust] Implement ArrayEqual for UnionArray
 Key: ARROW-8576
 URL: https://issues.apache.org/jira/browse/ARROW-8576
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan








[jira] [Resolved] (ARROW-8516) [Rust] Slow BufferBuilder inserts within PrimitiveBuilder::append_slice

2020-04-23 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-8516.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 6980
[https://github.com/apache/arrow/pull/6980]

> [Rust] Slow BufferBuilder inserts within 
> PrimitiveBuilder::append_slice
> 
>
> Key: ARROW-8516
> URL: https://issues.apache.org/jira/browse/ARROW-8516
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Raphael Taylor-Davies
>Assignee: Raphael Taylor-Davies
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> BufferBuilder<BooleanType>::append_slice is called by 
> PrimitiveBuilder::append_slice with a constructed vector of true values.
> Even in release builds the associated allocations and iterations are not 
> optimised out, resulting in a third of the time to parse a parquet file 
> containing single integers being spent in PrimitiveBuilder::append_slice.
> This PR adds an append_n method to the BufferBuilderTrait that allows this to 
> be handled more efficiently. My rather unscientific testing shows it to halve 
> the amount of time spent in this method, yielding a ~20% speedup for my 
> particular workload.
>  
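A minimal sketch of why an `append_n` method helps, using a hypothetical bit-packed boolean builder rather than the arrow crate's `BufferBuilder`: appending n copies of one value through a slice API first requires materialising a `Vec<bool>`, whereas `append_n` fills the current partial byte, then writes whole bytes directly, then appends the leftover bits.

```rust
/// Hypothetical bit-packed boolean builder (LSB bit order, as in Arrow bitmaps).
#[derive(Default)]
pub struct BoolBufferBuilder {
    pub bytes: Vec<u8>,
    pub len: usize, // number of bits appended so far
}

impl BoolBufferBuilder {
    pub fn append(&mut self, v: bool) {
        if self.len % 8 == 0 {
            self.bytes.push(0); // start a new byte
        }
        if v {
            *self.bytes.last_mut().unwrap() |= 1 << (self.len % 8);
        }
        self.len += 1;
    }

    /// Slice API: appending n copies of `v` means allocating `vec![v; n]` first.
    pub fn append_slice(&mut self, values: &[bool]) {
        for &v in values {
            self.append(v);
        }
    }

    /// Appends `n` copies of `v` without an intermediate allocation.
    pub fn append_n(&mut self, n: usize, v: bool) {
        let mut remaining = n;
        // 1. Fill the current partial byte bit by bit.
        while remaining > 0 && self.len % 8 != 0 {
            self.append(v);
            remaining -= 1;
        }
        // 2. Write whole bytes at once.
        let whole_bytes = remaining / 8;
        let fill = if v { 0xFF } else { 0x00 };
        self.bytes.resize(self.bytes.len() + whole_bytes, fill);
        self.len += whole_bytes * 8;
        remaining -= whole_bytes * 8;
        // 3. Append the leftover bits.
        for _ in 0..remaining {
            self.append(v);
        }
    }
}
```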





[jira] [Resolved] (ARROW-8552) [Rust] support column iteration for parquet row

2020-04-23 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-8552.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7009
[https://github.com/apache/arrow/pull/7009]

> [Rust] support column iteration for parquet row
> ---
>
> Key: ARROW-8552
> URL: https://issues.apache.org/jira/browse/ARROW-8552
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: QP Hou
>Assignee: QP Hou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It would be useful to be able to iterate through all the columns in a parquet 
> row.
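A minimal sketch of the requested behaviour, assuming a simplified row type that stores its fields as ordered (name, value) pairs; `Row`, `Field` and `column_iter` are hypothetical stand-ins, not the parquet crate's API.

```rust
/// Hypothetical, simplified field value of a parquet-style row.
#[derive(Debug, PartialEq)]
pub enum Field {
    Null,
    Int(i64),
    Str(String),
}

/// A row as an ordered list of (column name, value) pairs.
pub struct Row {
    fields: Vec<(String, Field)>,
}

impl Row {
    pub fn new(fields: Vec<(String, Field)>) -> Self {
        Row { fields }
    }

    /// Iterate over (column name, value) pairs in column order.
    pub fn column_iter(&self) -> impl Iterator<Item = (&str, &Field)> + '_ {
        self.fields.iter().map(|(name, field)| (name.as_str(), field))
    }
}
```

Returning `impl Iterator` borrows from the row, so no column data is cloned.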





[jira] [Assigned] (ARROW-8552) [Rust] support column iteration for parquet row

2020-04-23 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan reassigned ARROW-8552:
--

Assignee: QP Hou

> [Rust] support column iteration for parquet row
> ---
>
> Key: ARROW-8552
> URL: https://issues.apache.org/jira/browse/ARROW-8552
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: QP Hou
>Assignee: QP Hou
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It would be useful to be able to iterate through all the columns in a parquet 
> row.





[jira] [Created] (ARROW-8560) [Rust] Docs for MutableBuffer resize are incorrect

2020-04-22 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8560:
--

 Summary: [Rust] Docs for MutableBuffer resize are incorrect
 Key: ARROW-8560
 URL: https://issues.apache.org/jira/browse/ARROW-8560
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan








[jira] [Created] (ARROW-8559) [Rust] Consolidate Record Batch iterator traits in main arrow crate

2020-04-22 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8559:
--

 Summary: [Rust] Consolidate Record Batch iterator traits in main 
arrow crate
 Key: ARROW-8559
 URL: https://issues.apache.org/jira/browse/ARROW-8559
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


We have the `BatchIterator` trait in DataFusion and the `RecordBatchReader` 
trait in the main arrow crate.

They differ in that `BatchIterator` is Send + Sync.  They should both be in the 
Arrow crate and be named `BatchIterator` and `SendableBatchIterator`
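A minimal sketch of the proposed split, using hypothetical stand-in `Schema`/`RecordBatch` types and a hypothetical `next_batch` method name: `SendableBatchIterator` adds only the `Send + Sync` bounds on top of `BatchIterator`, and a blanket impl means thread-safe implementors never have to opt in explicitly.

```rust
use std::sync::Arc;

/// Hypothetical stand-ins for arrow's `Schema` and `RecordBatch`.
#[derive(Debug)]
pub struct Schema {
    pub field_names: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct RecordBatch {
    pub num_rows: usize,
}

/// Exposes a schema and yields batches one at a time.
pub trait BatchIterator {
    fn schema(&self) -> Arc<Schema>;
    fn next_batch(&mut self) -> Option<RecordBatch>;
}

/// Same contract, plus `Send + Sync` so it can cross thread boundaries.
pub trait SendableBatchIterator: BatchIterator + Send + Sync {}

/// Blanket impl: any thread-safe `BatchIterator` is automatically sendable.
impl<T: BatchIterator + Send + Sync> SendableBatchIterator for T {}

/// Example implementation over an in-memory list of batches.
pub struct VecBatchIterator {
    schema: Arc<Schema>,
    batches: std::vec::IntoIter<RecordBatch>,
}

impl VecBatchIterator {
    pub fn new(schema: Arc<Schema>, batches: Vec<RecordBatch>) -> Self {
        VecBatchIterator { schema, batches: batches.into_iter() }
    }
}

impl BatchIterator for VecBatchIterator {
    fn schema(&self) -> Arc<Schema> {
        Arc::clone(&self.schema)
    }

    fn next_batch(&mut self) -> Option<RecordBatch> {
        self.batches.next()
    }
}
```

Code that needs thread safety can then accept a `Box<dyn SendableBatchIterator>`, while single-threaded code can keep using `BatchIterator`.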





[jira] [Resolved] (ARROW-8558) [Rust] GitHub Actions missing rustfmt

2020-04-22 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-8558.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7010
[https://github.com/apache/arrow/pull/7010]

> [Rust] GitHub Actions missing rustfmt
> -
>
> Key: ARROW-8558
> URL: https://issues.apache.org/jira/browse/ARROW-8558
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: CI, Rust
>Reporter: Paddy Horan
>Assignee: Neville Dipale
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Created] (ARROW-8558) [Rust] GitHub Actions missing rustfmt

2020-04-22 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8558:
--

 Summary: [Rust] GitHub Actions missing rustfmt
 Key: ARROW-8558
 URL: https://issues.apache.org/jira/browse/ARROW-8558
 Project: Apache Arrow
  Issue Type: New Feature
  Components: CI, Rust
Reporter: Paddy Horan
Assignee: Neville Dipale








[jira] [Created] (ARROW-8547) [Rust] Implement JsonEqual for UnionArray

2020-04-21 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8547:
--

 Summary: [Rust] Implement JsonEqual for UnionArray
 Key: ARROW-8547
 URL: https://issues.apache.org/jira/browse/ARROW-8547
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan








[jira] [Created] (ARROW-8546) [Rust] Handle UnionArray in get_fb_field_type

2020-04-21 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8546:
--

 Summary: [Rust] Handle UnionArray in get_fb_field_type
 Key: ARROW-8546
 URL: https://issues.apache.org/jira/browse/ARROW-8546
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan








[jira] [Assigned] (ARROW-8516) [Rust] Slow BufferBuilder inserts within PrimitiveBuilder::append_slice

2020-04-19 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan reassigned ARROW-8516:
--

Assignee: Raphael Taylor-Davies

> [Rust] Slow BufferBuilder inserts within 
> PrimitiveBuilder::append_slice
> 
>
> Key: ARROW-8516
> URL: https://issues.apache.org/jira/browse/ARROW-8516
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Raphael Taylor-Davies
>Assignee: Raphael Taylor-Davies
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> BufferBuilder<BooleanType>::append_slice is called by
> PrimitiveBuilder::append_slice with a constructed vector of true values.
> Even in release builds, the associated allocations and iterations are not
> optimised out, so a third of the time spent parsing a parquet file
> containing single integers is spent in PrimitiveBuilder::append_slice.
> This PR adds an append_n method to the BufferBuilderTrait that allows this
> to be handled more efficiently. My rather unscientific testing shows it
> halves the time spent in this method, yielding a ~20% speedup for my
> particular workload.
>  
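The difference between the two paths can be shown with a minimal sketch of a bit-packed boolean builder. `BoolBufferBuilder` and its methods are hypothetical. The sketch only assumes Arrow's least-significant-bit-first bitmap layout, and the real `BufferBuilder` code differs:

```rust
/// Hypothetical bit-packed boolean buffer builder: bit i lives in byte i / 8,
/// least-significant bit first, as in Arrow validity bitmaps.
pub struct BoolBufferBuilder {
    bytes: Vec<u8>,
    len: usize, // number of bits appended so far
}

impl BoolBufferBuilder {
    pub fn new() -> Self {
        BoolBufferBuilder { bytes: Vec::new(), len: 0 }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn get(&self, i: usize) -> bool {
        assert!(i < self.len, "index out of bounds");
        self.bytes[i / 8] & (1 << (i % 8)) != 0
    }

    pub fn append(&mut self, v: bool) {
        if self.len % 8 == 0 {
            self.bytes.push(0); // start a new, zeroed byte
        }
        if v {
            self.bytes[self.len / 8] |= 1 << (self.len % 8);
        }
        self.len += 1;
    }

    /// Slice-based path: the caller first has to build a `&[bool]`,
    /// e.g. `vec![true; n]`, which is the extra allocation described above.
    pub fn append_slice(&mut self, values: &[bool]) {
        for &v in values {
            self.append(v);
        }
    }

    /// `append_n`-style path: grows the byte buffer once and sets bits
    /// directly, without materialising a temporary vector.
    pub fn append_n(&mut self, n: usize, v: bool) {
        let new_len = self.len + n;
        self.bytes.resize((new_len + 7) / 8, 0); // newly added bytes start zeroed
        if v {
            for i in self.len..new_len {
                self.bytes[i / 8] |= 1 << (i % 8);
            }
        }
        self.len = new_len;
    }
}
```

A production version would probably set whole bytes to `0xFF` instead of looping bit by bit. The sketch only shows that `append_n` needs no temporary `Vec<bool>`.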





[jira] [Updated] (ARROW-8516) [Rust] Slow BufferBuilder inserts within PrimitiveBuilder::append_slice

2020-04-19 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan updated ARROW-8516:
---
Summary: [Rust] Slow BufferBuilder inserts within 
PrimitiveBuilder::append_slice  (was: Slow BufferBuilder inserts 
within PrimitiveBuilder::append_slice)

> [Rust] Slow BufferBuilder inserts within 
> PrimitiveBuilder::append_slice
> 
>
> Key: ARROW-8516
> URL: https://issues.apache.org/jira/browse/ARROW-8516
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Raphael Taylor-Davies
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> BufferBuilder<BooleanType>::append_slice is called by
> PrimitiveBuilder::append_slice with a constructed vector of true values.
> Even in release builds, the associated allocations and iterations are not
> optimised out, so a third of the time spent parsing a parquet file
> containing single integers is spent in PrimitiveBuilder::append_slice.
> This PR adds an append_n method to the BufferBuilderTrait that allows this
> to be handled more efficiently. My rather unscientific testing shows it
> halves the time spent in this method, yielding a ~20% speedup for my
> particular workload.
>  





[jira] [Created] (ARROW-8480) [Rust] There is no check for allocation failure

2020-04-16 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8480:
--

 Summary: [Rust] There is no check for allocation failure
 Key: ARROW-8480
 URL: https://issues.apache.org/jira/browse/ARROW-8480
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Paddy Horan


Reported by bluss on GitHub:

[https://github.com/rust-ndarray/ndarray/issues/771]

 

"What I can see, there is no check for allocation success, so any buffer can be 
created with a null pointer, which leads to soundness problems in most methods. 
Best look into using {{std::alloc::handle_alloc_error}} or alternatives. (This 
problem means that the mutablebuffer is not a safe abstraction, and it should 
preferably not be exposed as public API like this.)"
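The check being suggested can be sketched as follows. The sketch assumes 64-byte alignment and uses only `std::alloc`. `allocate_aligned` and `free_aligned` here are illustrative helpers, not the arrow crate's actual functions:

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Alignment assumed for this sketch (the Arrow format recommends 64 bytes).
pub const ALIGNMENT: usize = 64;

/// Allocates `size` uninitialized bytes aligned to `ALIGNMENT`.
/// Never returns null: on failure it calls `handle_alloc_error`, which does not return.
///
/// # Safety
/// `size` must be non-zero, and the memory must be released with
/// `free_aligned(ptr, size)` using the same `size`.
pub unsafe fn allocate_aligned(size: usize) -> NonNull<u8> {
    assert!(size > 0, "zero-sized allocations are out of scope for this sketch");
    let layout = Layout::from_size_align(size, ALIGNMENT).expect("invalid layout");
    // SAFETY: `layout` has a non-zero size.
    let raw = unsafe { alloc(layout) };
    // `alloc` signals failure by returning null; route that to the standard
    // out-of-memory handler instead of handing a null pointer to callers.
    NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout))
}

/// # Safety
/// `ptr` must come from `allocate_aligned(size)` with the same `size`.
pub unsafe fn free_aligned(ptr: NonNull<u8>, size: usize) {
    let layout = Layout::from_size_align(size, ALIGNMENT).expect("invalid layout");
    // SAFETY: guaranteed by the caller.
    unsafe { dealloc(ptr.as_ptr(), layout) };
}
```

Returning `NonNull<u8>` puts the "never null" guarantee in the type, so code built on top of it cannot forget the check.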





[jira] [Created] (ARROW-8479) [Rust] Use "raw_data_mut" in "Buffer::typed_data_mut"

2020-04-16 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8479:
--

 Summary: [Rust] Use "raw_data_mut" in "Buffer::typed_data_mut"
 Key: ARROW-8479
 URL: https://issues.apache.org/jira/browse/ARROW-8479
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Paddy Horan


See: [https://github.com/apache/arrow/pull/6395/files#r408699014]

 





[jira] [Resolved] (ARROW-8387) [Rust] Make schema_to_fb public

2020-04-11 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-8387.

Fix Version/s: 0.17.0
   Resolution: Fixed

Issue resolved by pull request 6884
[https://github.com/apache/arrow/pull/6884]

> [Rust] Make schema_to_fb public
> ---
>
> Key: ARROW-8387
> URL: https://issues.apache.org/jira/browse/ARROW-8387
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Max Burke
>Assignee: Max Burke
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Make schema_to_fb public because it is very useful!





[jira] [Assigned] (ARROW-8387) [Rust] Make schema_to_fb public

2020-04-11 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan reassigned ARROW-8387:
--

Assignee: Max Burke

> [Rust] Make schema_to_fb public
> ---
>
> Key: ARROW-8387
> URL: https://issues.apache.org/jira/browse/ARROW-8387
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Max Burke
>Assignee: Max Burke
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Make schema_to_fb public because it is very useful!





[jira] [Created] (ARROW-8219) [Rust] sqlparser crate needs to be bumped to version 0.2.5

2020-03-25 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8219:
--

 Summary: [Rust] sqlparser crate needs to be bumped to version 0.2.5
 Key: ARROW-8219
 URL: https://issues.apache.org/jira/browse/ARROW-8219
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Affects Versions: 0.16.0
Reporter: Paddy Horan
Assignee: Paddy Horan








[jira] [Resolved] (ARROW-8204) [Rust] [DataFusion] Add support for aliased expressions in SQL

2020-03-25 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-8204.

Resolution: Fixed

Issue resolved by pull request 6713
[https://github.com/apache/arrow/pull/6713]

> [Rust] [DataFusion] Add support for aliased expressions in SQL
> --
>
> Key: ARROW-8204
> URL: https://issues.apache.org/jira/browse/ARROW-8204
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-8197) [Rust] DataFusion "create_physical_plan" returns incorrect schema?

2020-03-24 Thread Paddy Horan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066145#comment-17066145
 ] 

Paddy Horan commented on ARROW-8197:


OK, that's basically the conclusion I came to: Column::new should be
updated to receive the name. Then I thought that maybe it was a design
decision (I've yet to read your new book, though it's on my list...).

> [Rust] DataFusion "create_physical_plan" returns incorrect schema?
> --
>
> Key: ARROW-8197
> URL: https://issues.apache.org/jira/browse/ARROW-8197
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Affects Versions: 0.15.1
>Reporter: Paddy Horan
>Assignee: Andy Grove
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I am using DataFusion in a situation where I know there will only be a single 
> file.  DataFusion currently collects all batches into a vector.
> As I am writing the data back out I want to work with an iterator instead of 
> a vector.
> I have something as follows:
> {code:java}
> let plan = ctx.create_logical_plan().unwrap();
> let plan = ctx.optimize().unwrap();
> dbg!(plan.schema());  // Returns field names
> let plan = ctx.create_physical_plan(, batch_size).unwrap();
> dbg!(plan.schema()); // Returns c0, c1, etc{code}
> Maybe this is expected after turning the plan into a physical plan?
> I can change the schema of the returned batches, would this be the 
> recommended way to address this or is there something in DataFusion I should 
> leverage to do this?





[jira] [Updated] (ARROW-8197) [Rust] DataFusion "create_physical_plan" returns incorrect schema?

2020-03-24 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan updated ARROW-8197:
---
Description: 
I am using DataFusion in a situation where I know there will only be a single 
file.  DataFusion currently collects all batches into a vector.

As I am writing the data back out I want to work with an iterator instead of a 
vector.

I have something as follows:
{code:java}
let plan = ctx.create_logical_plan().unwrap();
let plan = ctx.optimize().unwrap();
dbg!(plan.schema());  // Returns field names
let plan = ctx.create_physical_plan(, batch_size).unwrap();
dbg!(plan.schema()); // Returns c0, c1, etc{code}
Maybe this is expected after turning the plan into a physical plan?

I can change the schema of the returned batches, would this be the recommended 
way to address this or is there something in DataFusion I should leverage to do 
this?

  was:
I am using DataFusion in a situation where I know there will only be a single 
file.  DataFusion currently collects all batches into a vector.

As I am writing the data back out I want to work with an iterator instead of a 
vector.

I have something as follows:
{code:java}
let plan = ctx.create_logical_plan().unwrap();
let plan = ctx.optimize().unwrap();
dbg!(plan.schema());  // Returns field names
let plan = ctx.create_physical_plan(, batch_size).unwrap();
dbg!(plan.schema()); // Returns c0, c1, etc{code}
Maybe this is expected after turning the plan into a physical plan?

I can change the schema of the returned batches, would this be the recommended 
way to address this or is their something in DataFusion I should leverage to do 
this?


> [Rust] DataFusion "create_physical_plan" returns incorrect schema?
> --
>
> Key: ARROW-8197
> URL: https://issues.apache.org/jira/browse/ARROW-8197
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Affects Versions: 0.15.1
>Reporter: Paddy Horan
>Priority: Minor
>
> I am using DataFusion in a situation where I know there will only be a single 
> file.  DataFusion currently collects all batches into a vector.
> As I am writing the data back out I want to work with an iterator instead of 
> a vector.
> I have something as follows:
> {code:java}
> let plan = ctx.create_logical_plan().unwrap();
> let plan = ctx.optimize().unwrap();
> dbg!(plan.schema());  // Returns field names
> let plan = ctx.create_physical_plan(, batch_size).unwrap();
> dbg!(plan.schema()); // Returns c0, c1, etc{code}
> Maybe this is expected after turning the plan into a physical plan?
> I can change the schema of the returned batches, would this be the 
> recommended way to address this or is there something in DataFusion I should 
> leverage to do this?





[jira] [Created] (ARROW-8197) [Rust] DataFusion "create_physical_plan" returns incorrect schema?

2020-03-24 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8197:
--

 Summary: [Rust] DataFusion "create_physical_plan" returns 
incorrect schema?
 Key: ARROW-8197
 URL: https://issues.apache.org/jira/browse/ARROW-8197
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Affects Versions: 0.15.1
Reporter: Paddy Horan


I am using DataFusion in a situation where I know there will only be a single 
file.  DataFusion currently collects all batches into a vector.

As I am writing the data back out I want to work with an iterator instead of a 
vector.

I have something as follows:
{code:java}
let plan = ctx.create_logical_plan().unwrap();
let plan = ctx.optimize().unwrap();
dbg!(plan.schema());  // Returns field names
let plan = ctx.create_physical_plan(, batch_size).unwrap();
dbg!(plan.schema()); // Returns c0, c1, etc{code}
Maybe this is expected after turning the plan into a physical plan?

I can change the schema of the returned batches, would this be the recommended 
way to address this or is their something in DataFusion I should leverage to do 
this?





[jira] [Resolved] (ARROW-8177) [rust] Make schema_to_fb_offset public because it is very useful!

2020-03-21 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-8177.

Fix Version/s: 0.17.0
   Resolution: Fixed

Issue resolved by pull request 6677
[https://github.com/apache/arrow/pull/6677]

> [rust] Make schema_to_fb_offset public because it is very useful!
> -
>
> Key: ARROW-8177
> URL: https://issues.apache.org/jira/browse/ARROW-8177
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Max Burke
>Assignee: Max Burke
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We also use flatbuffer encoded data into which we expose Arrow schemas and so 
> it would be tremendously useful to not have to duplicate this code in our 
> codebase.





[jira] [Updated] (ARROW-8177) [rust] Make schema_to_fb_offset public because it is very useful!

2020-03-21 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan updated ARROW-8177:
---
Component/s: Rust

> [rust] Make schema_to_fb_offset public because it is very useful!
> -
>
> Key: ARROW-8177
> URL: https://issues.apache.org/jira/browse/ARROW-8177
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Max Burke
>Assignee: Max Burke
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We also use flatbuffer encoded data into which we expose Arrow schemas and so 
> it would be tremendously useful to not have to duplicate this code in our 
> codebase.





[jira] [Assigned] (ARROW-8177) [rust] Make schema_to_fb_offset public because it is very useful!

2020-03-21 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan reassigned ARROW-8177:
--

Assignee: Max Burke

> [rust] Make schema_to_fb_offset public because it is very useful!
> -
>
> Key: ARROW-8177
> URL: https://issues.apache.org/jira/browse/ARROW-8177
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Max Burke
>Assignee: Max Burke
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We also use flatbuffer encoded data into which we expose Arrow schemas and so 
> it would be tremendously useful to not have to duplicate this code in our 
> codebase.





[jira] [Resolved] (ARROW-8117) [Rust] [Datafusion] Allow CAST from number to timestamp

2020-03-14 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-8117.

Fix Version/s: 0.17.0
   Resolution: Fixed

Issue resolved by pull request 6618
[https://github.com/apache/arrow/pull/6618]

> [Rust] [Datafusion] Allow CAST from number to timestamp
> ---
>
> Key: ARROW-8117
> URL: https://issues.apache.org/jira/browse/ARROW-8117
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Morgan Cassels
>Assignee: Morgan Cassels
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> With the current sqlparser version, timestamps cannot be parsed in queries. 
> Upgrading datafusion to a new sqlparser version is a major change. In the 
> meantime allow casting number to timestamp to allow selection on timestamp 
> type columns.





[jira] [Assigned] (ARROW-8117) [Rust] [Datafusion] Allow CAST from number to timestamp

2020-03-14 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan reassigned ARROW-8117:
--

Assignee: Morgan Cassels

> [Rust] [Datafusion] Allow CAST from number to timestamp
> ---
>
> Key: ARROW-8117
> URL: https://issues.apache.org/jira/browse/ARROW-8117
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Morgan Cassels
>Assignee: Morgan Cassels
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> With the current sqlparser version, timestamps cannot be parsed in queries. 
> Upgrading datafusion to a new sqlparser version is a major change. In the 
> meantime allow casting number to timestamp to allow selection on timestamp 
> type columns.





[jira] [Resolved] (ARROW-5357) [Rust] Add capacity field in Buffer

2020-02-20 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-5357.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 4331
[https://github.com/apache/arrow/pull/4331]

> [Rust] Add capacity field in Buffer
> ---
>
> Key: ARROW-5357
> URL: https://issues.apache.org/jira/browse/ARROW-5357
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Currently {{Buffer}} only has {{len}}, but no {{capacity}}. We should add 
> both.
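Here is a minimal sketch of a buffer that tracks `len` and `capacity` separately. It assumes capacity is padded to a multiple of 64 bytes. `SketchBuffer` is hypothetical and is backed by a zero-filled `Vec<u8>`, so it models only the bookkeeping, not 64-byte-aligned memory:

```rust
/// Padding multiple assumed for this sketch.
const ALIGNMENT: usize = 64;

/// Rounds `n` up to the next multiple of `ALIGNMENT`.
fn round_up(n: usize) -> usize {
    (n + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT
}

/// Hypothetical buffer: `storage.len()` is the capacity (all bytes initialized),
/// and `len` is the number of bytes actually in use.
pub struct SketchBuffer {
    storage: Vec<u8>,
    len: usize,
}

impl SketchBuffer {
    pub fn with_capacity(capacity: usize) -> Self {
        SketchBuffer { storage: vec![0; round_up(capacity)], len: 0 }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn capacity(&self) -> usize {
        self.storage.len()
    }

    /// Ensures room for `additional` more bytes, at least doubling when it grows.
    pub fn reserve(&mut self, additional: usize) {
        let needed = self.len + additional;
        if needed > self.capacity() {
            let new_cap = round_up(needed.max(self.capacity() * 2));
            self.storage.resize(new_cap, 0);
        }
    }

    pub fn extend_from_slice(&mut self, bytes: &[u8]) {
        self.reserve(bytes.len());
        self.storage[self.len..self.len + bytes.len()].copy_from_slice(bytes);
        self.len += bytes.len();
    }

    /// Only the bytes in use are visible to readers.
    pub fn as_slice(&self) -> &[u8] {
        &self.storage[..self.len]
    }
}
```

Keeping both numbers lets appends reuse spare capacity, while readers still see only `len` bytes.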





[jira] [Resolved] (ARROW-7836) [Rust] "allocate_aligned"/"reallocate" need to initialize memory to avoid UB

2020-02-17 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-7836.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 6422
[https://github.com/apache/arrow/pull/6422]

> [Rust] "allocate_aligned"/"reallocate" need to initialize memory to avoid UB
> 
>
> Key: ARROW-7836
> URL: https://issues.apache.org/jira/browse/ARROW-7836
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> See [this thread|https://github.com/apache/arrow/pull/6397] for background.
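The following is a minimal sketch of allocation and reallocation that never exposes uninitialized bytes. It assumes 64-byte alignment and uses only `std::alloc`. The function names are illustrative, not the arrow crate's actual `memory` module:

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, realloc, Layout};
use std::ptr::NonNull;

/// Alignment assumed for this sketch.
pub const ALIGNMENT: usize = 64;

fn layout(size: usize) -> Layout {
    Layout::from_size_align(size, ALIGNMENT).expect("invalid layout")
}

/// Allocates `size` zero-initialized bytes.
/// # Safety
/// `size` must be non-zero; free with `free_aligned(ptr, size)`.
pub unsafe fn allocate_aligned_zeroed(size: usize) -> NonNull<u8> {
    assert!(size > 0);
    let l = layout(size);
    // SAFETY: `l` has a non-zero size.
    let raw = unsafe { alloc_zeroed(l) };
    NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(l))
}

/// Resizes an allocation and zeroes any bytes added past `old_size`.
/// # Safety
/// `ptr` must come from this module with size `old_size`; `new_size` must be non-zero.
pub unsafe fn reallocate_zeroed(ptr: NonNull<u8>, old_size: usize, new_size: usize) -> NonNull<u8> {
    assert!(old_size > 0 && new_size > 0);
    // SAFETY: guaranteed by the caller; the alignment is carried over from the old layout.
    let raw = unsafe { realloc(ptr.as_ptr(), layout(old_size), new_size) };
    let new_ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout(new_size)));
    if new_size > old_size {
        // `realloc` leaves the grown tail uninitialized; zero it so later reads are defined.
        unsafe { new_ptr.as_ptr().add(old_size).write_bytes(0, new_size - old_size) };
    }
    new_ptr
}

/// # Safety
/// `ptr` must come from this module with the given `size`.
pub unsafe fn free_aligned(ptr: NonNull<u8>, size: usize) {
    unsafe { dealloc(ptr.as_ptr(), layout(size)) };
}
```

`alloc_zeroed` covers the fresh allocation. `realloc` makes no promise about the bytes past the old size, so the grown tail is zeroed explicitly with `write_bytes`.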





[jira] [Updated] (ARROW-7836) [Rust] "allocate_aligned"/"reallocate" need to initialize memory to avoid UB

2020-02-17 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan updated ARROW-7836:
---
Component/s: Rust

> [Rust] "allocate_aligned"/"reallocate" need to initialize memory to avoid UB
> 
>
> Key: ARROW-7836
> URL: https://issues.apache.org/jira/browse/ARROW-7836
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> See [this thread|https://github.com/apache/arrow/pull/6397] for background.





[jira] [Assigned] (ARROW-7836) [Rust] "allocate_aligned"/"reallocate" need to initialize memory to avoid UB

2020-02-13 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan reassigned ARROW-7836:
--

Assignee: Paddy Horan

> [Rust] "allocate_aligned"/"reallocate" need to initialize memory to avoid UB
> 
>
> Key: ARROW-7836
> URL: https://issues.apache.org/jira/browse/ARROW-7836
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Major
>
> See [this thread|https://github.com/apache/arrow/pull/6397] for background.





[jira] [Created] (ARROW-7836) [Rust] "allocate_aligned"/"reallocate" need to initialize memory to avoid UB

2020-02-11 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-7836:
--

 Summary: [Rust] "allocate_aligned"/"reallocate" need to initialize 
memory to avoid UB
 Key: ARROW-7836
 URL: https://issues.apache.org/jira/browse/ARROW-7836
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Paddy Horan


See [this thread|https://github.com/apache/arrow/pull/6397] for background.





[jira] [Resolved] (ARROW-7624) [Rust] Soundness issues via `Buffer` methods

2020-02-11 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-7624.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 6397
[https://github.com/apache/arrow/pull/6397]

> [Rust] Soundness issues via `Buffer` methods
> 
>
> Key: ARROW-7624
> URL: https://issues.apache.org/jira/browse/ARROW-7624
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.15.1
>Reporter: Jim Turner
>Assignee: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> This is my first time creating an issue, so please let me know if I need to 
> do anything differently.
> There are a few soundness issues with the methods currently available on 
> {{Buffer}}.
>  # Using a combination of {{from_raw_parts}} and {{data}}/{{as_ref}}, e.g. 
> {{Buffer::from_raw_parts(ptr, len).data()}}, it's possible to dereference 
> arbitrary memory locations, break pointer aliasing rules, etc. To fix this, 
> `from_raw_parts` needs to be `unsafe`, and the safety requirements on `ptr` 
> and `len` should be specified. (For an example of a similar method in the 
> standard library, see 
> [{{std::slice::from_raw_parts}}|https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html].)
>  # By implementing the {{ArrowNativeType}} trait on a struct, it's possible 
> for a user to create invalid values of that struct using the {{typed_data}} 
> method. To fix this, the {{ArrowNativeType}} trait needs to be {{unsafe}}, or 
> users need to be prevented from implementing {{ArrowNativeType}} on arbitrary 
> types. Alternatively, the {{typed_data}} method could be made unsafe.
>  # It's possible to create invalid values of the {{bool}} type using 
> {{typed_data}}. ([Values of {{bool}} must be {{0x00}} or 
> {{0x01}}|https://doc.rust-lang.org/nomicon/what-unsafe-does.html]; arbitrary 
> {{u8}} cannot safely be reinterpreted as {{bool}}.) To fix this, 
> {{typed_data::<bool>()}} needs to iterate over all the data and check that
> all the elements are valid, or {{typed_data}} needs to be marked {{unsafe}}.
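Fixes 2 and 3 above can be sketched under a few stated assumptions. `NativeType` is a hypothetical `unsafe` marker trait standing in for `ArrowNativeType`. `typed_data` and `typed_bools` are illustrative free functions over a byte slice, not methods on `Buffer`:

```rust
use std::mem::{align_of, size_of};
use std::slice;

/// Hypothetical marker trait (not arrow's actual `ArrowNativeType`). It is `unsafe`
/// because implementors promise that every bit pattern of `size_of::<Self>()`
/// bytes is a valid value, which reinterpreting raw buffer bytes relies on.
pub unsafe trait NativeType: Copy + 'static {}

unsafe impl NativeType for u8 {}
unsafe impl NativeType for i32 {}
unsafe impl NativeType for i64 {}
unsafe impl NativeType for f64 {}
// Deliberately no impl for `bool`: only 0x00 and 0x01 are valid `bool` bytes.

/// Safe view of raw bytes as `&[T]`; panics on misalignment or a partial trailing element.
pub fn typed_data<T: NativeType>(bytes: &[u8]) -> &[T] {
    assert_eq!(bytes.as_ptr() as usize % align_of::<T>(), 0, "buffer is misaligned for T");
    assert_eq!(bytes.len() % size_of::<T>(), 0, "length is not a multiple of size_of::<T>()");
    // SAFETY: alignment and length are checked above, and `NativeType`
    // guarantees that any bit pattern is a valid `T`.
    unsafe { slice::from_raw_parts(bytes.as_ptr() as *const T, bytes.len() / size_of::<T>()) }
}

/// Checked view of raw bytes as `&[bool]`: `None` if any byte is not 0 or 1.
pub fn typed_bools(bytes: &[u8]) -> Option<&[bool]> {
    if bytes.iter().all(|&b| b <= 1) {
        // SAFETY: `bool` has the size and alignment of `u8`, and every byte was validated.
        Some(unsafe { slice::from_raw_parts(bytes.as_ptr() as *const bool, bytes.len()) })
    } else {
        None
    }
}
```

Making the trait `unsafe` moves the obligation onto whoever writes the impl, which is the same approach the standard library takes for marker traits such as `Send`.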





[jira] [Assigned] (ARROW-7624) [Rust] Soundness issues via `Buffer` methods

2020-02-10 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan reassigned ARROW-7624:
--

Assignee: Paddy Horan

> [Rust] Soundness issues via `Buffer` methods
> 
>
> Key: ARROW-7624
> URL: https://issues.apache.org/jira/browse/ARROW-7624
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.15.1
>Reporter: Jim Turner
>Assignee: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is my first time creating an issue, so please let me know if I need to 
> do anything differently.
> There are a few soundness issues with the methods currently available on 
> {{Buffer}}.
>  # Using a combination of {{from_raw_parts}} and {{data}}/{{as_ref}}, e.g. 
> {{Buffer::from_raw_parts(ptr, len).data()}}, it's possible to dereference 
> arbitrary memory locations, break pointer aliasing rules, etc. To fix this, 
> `from_raw_parts` needs to be `unsafe`, and the safety requirements on `ptr` 
> and `len` should be specified. (For an example of a similar method in the 
> standard library, see 
> [{{std::slice::from_raw_parts}}|https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html].)
>  # By implementing the {{ArrowNativeType}} trait on a struct, it's possible 
> for a user to create invalid values of that struct using the {{typed_data}} 
> method. To fix this, the {{ArrowNativeType}} trait needs to be {{unsafe}}, or 
> users need to be prevented from implementing {{ArrowNativeType}} on arbitrary 
> types. Alternatively, the {{typed_data}} method could be made unsafe.
>  # It's possible to create invalid values of the {{bool}} type using 
> {{typed_data}}. ([Values of {{bool}} must be {{0x00}} or 
> {{0x01}}|https://doc.rust-lang.org/nomicon/what-unsafe-does.html]; arbitrary 
> {{u8}} cannot safely be reinterpreted as {{bool}}.) To fix this, 
> {{typed_data::<bool>()}} needs to iterate over all the data and check that
> all the elements are valid, or {{typed_data}} needs to be marked {{unsafe}}.





[jira] [Commented] (ARROW-7624) [Rust] Soundness issues via `Buffer` methods

2020-01-20 Thread Paddy Horan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17019793#comment-17019793
 ] 

Paddy Horan commented on ARROW-7624:


Thanks for reporting this [~jturner314].

We recently had a discussion about these kinds of issues on the [mailing
list|https://lists.apache.org/thread.html/r07f4ba469563a764d19fd08622adbcd0e3ac895a6e8165ae44b8dee8%40%3Cdev.arrow.apache.org%3E].
I'm going to start trying to clean them up soon.

> [Rust] Soundness issues via `Buffer` methods
> 
>
> Key: ARROW-7624
> URL: https://issues.apache.org/jira/browse/ARROW-7624
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.15.1
>Reporter: Jim Turner
>Priority: Major
>
> This is my first time creating an issue, so please let me know if I need to 
> do anything differently.
> There are a few soundness issues with the methods currently available on 
> {{Buffer}}.
>  # Using a combination of {{from_raw_parts}} and {{data}}/{{as_ref}}, e.g. 
> {{Buffer::from_raw_parts(ptr, len).data()}}, it's possible to dereference 
> arbitrary memory locations, break pointer aliasing rules, etc. To fix this, 
> `from_raw_parts` needs to be `unsafe`, and the safety requirements on `ptr` 
> and `len` should be specified. (For an example of a similar method in the 
> standard library, see 
> [{{std::slice::from_raw_parts}}|https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html].)
>  # By implementing the {{ArrowNativeType}} trait on a struct, it's possible 
> for a user to create invalid values of that struct using the {{typed_data}} 
> method. To fix this, the {{ArrowNativeType}} trait needs to be {{unsafe}}, or 
> users need to be prevented from implementing {{ArrowNativeType}} on arbitrary 
> types. Alternatively, the {{typed_data}} method could be made unsafe.
>  # It's possible to create invalid values of the {{bool}} type using 
> {{typed_data}}. ([Values of {{bool}} must be {{0x00}} or 
> {{0x01}}|https://doc.rust-lang.org/nomicon/what-unsafe-does.html]; arbitrary 
> {{u8}} cannot safely be reinterpreted as {{bool}}.) To fix this, 
> {{typed_data::<bool>()}} needs to iterate over all the data and check that
> all the elements are valid, or {{typed_data}} needs to be marked {{unsafe}}.





[jira] [Commented] (ARROW-7559) [Rust] Possibly incorrect index check assertion in StringArray and BinaryArray

2020-01-13 Thread Paddy Horan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014817#comment-17014817
 ] 

Paddy Horan commented on ARROW-7559:


Thanks [~jhorstmann].

It took a while to convince myself, but I agree with your suggested fix. I
posted a PR [here|https://github.com/apache/arrow/pull/6180] to fix it.

FYI: `ArrayData` is fairly low level. You can use the other builders so that
you don't have to manage the offsets, etc. yourself; see the test I added in the
PR (which is basically your exact example above).

> [Rust] Possibly incorrect index check assertion in StringArray and BinaryArray
> --
>
> Key: ARROW-7559
> URL: https://issues.apache.org/jira/browse/ARROW-7559
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.16.0
>Reporter: Jörn Horstmann
>Assignee: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following code tries to build a list array based on an underlying string 
> array and panics on master (commit acfcdee75acb4b1814f2e727c150a7403d618e8f)
> {code:java}
> #[test]
> fn nested_string_array() {
>     let strarray = StringArray::from(vec!["foo", "bar", "foobar"]);
>     let nestedData = ArrayData::builder(DataType::List(Box::new(DataType::Utf8)))
>         .len(2)
>         .add_buffer(Buffer::from(&[0, 2, 3].to_byte_slice()))
>         .add_child_data(
>             ArrayData::builder(DataType::Utf8)
>                 .len(strarray.len())
>                 .add_buffer(strarray.value_offsets())
>                 .add_buffer(strarray.value_data())
>                 .build(),
>         )
>         .build();
>     let nestedArray = ListArray::from(nestedData);
>     dbg!(nestedArray);
> }{code}
> My guess is that the index check in StringArray.value is incorrect, instead 
> of 
> {code:java}
> pub fn value(&self, i: usize) -> &str {
> assert!(
> i + self.offset() < self.data.len(),
> "StringArray out of bounds access"
> );
> {code}
> it should probably compare {{i}} without adding the offset. The same check is 
> also done in {{BinaryArray}}. Changing this results in the expected output of
> {code:java}
> [arrow/src/array/array.rs:2460] nestedArray = ListArray
> [
>   StringArray
> [
>   "foo",
>   "bar",
> ],
>   StringArray
> [
>   "foobar",
> ],
> ]
>  {code}
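Why the offset should not be added can be seen in a small std-only model. `StrArray` below is a hypothetical simplification of a sliced string array, not Arrow's `StringArray`. Presumably the second list element in the example is the child range [2, 3), i.e. a slice with offset 2 and length 1, so the original check `i + offset < len` (2 < 1) panics for the valid index 0.

```rust
/// Simplified stand-in for a variable-length string array. `offsets` has one
/// more entry than the full (unsliced) number of values; a slice is described
/// by a starting `offset` and a logical `len`.
struct StrArray {
    offsets: Vec<i32>,
    values: String,
    offset: usize, // index of the first visible element
    len: usize,    // number of visible elements
}

impl StrArray {
    /// Returns element `i` of the logical (possibly sliced) array.
    fn value(&self, i: usize) -> &str {
        // `i` is a logical index, so it is compared with the logical length
        // directly; adding `self.offset` would reject valid indices in any
        // slice that does not start at zero.
        assert!(i < self.len, "StrArray out of bounds access");
        // The offset is applied only when locating the physical data.
        let start = self.offsets[self.offset + i] as usize;
        let end = self.offsets[self.offset + i + 1] as usize;
        &self.values[start..end]
    }
}
```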





[jira] [Assigned] (ARROW-7559) [Rust] Possibly incorrect index check assertion in StringArray and BinaryArray

2020-01-13 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan reassigned ARROW-7559:
--

Assignee: Paddy Horan

> [Rust] Possibly incorrect index check assertion in StringArray and BinaryArray
> --
>
> Key: ARROW-7559
> URL: https://issues.apache.org/jira/browse/ARROW-7559
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.16.0
>Reporter: Jörn Horstmann
>Assignee: Paddy Horan
>Priority: Major
>
> The following code tries to build a list array based on an underlying string 
> array and panics on master (commit acfcdee75acb4b1814f2e727c150a7403d618e8f)
> {code:java}
>  #[test]
> fn nested_string_array() {
> let strarray = StringArray::from(vec!["foo", "bar", "foobar"]);
> let nestedData = 
> ArrayData::builder(DataType::List(Box::new(DataType::Utf8)))
> .len(2)
> .add_buffer(Buffer::from(&[0, 2, 3].to_byte_slice()))
> .add_child_data(ArrayData::builder(DataType::Utf8)
> .len(strarray.len())
> .add_buffer(strarray.value_offsets())
> .add_buffer(strarray.value_data())
> .build())
> .build();
> let nestedArray = ListArray::from(nestedData);
> dbg!(nestedArray);
> }{code}
> My guess is that the index check in StringArray.value is incorrect, instead 
> of 
> {code:java}
> pub fn value(&self, i: usize) -> &str {
> assert!(
> i + self.offset() < self.data.len(),
> "StringArray out of bounds access"
> );
> {code}
> it should probably compare {{i}} without adding the offset. The same check is 
> also done in {{BinaryArray}}. Changing this results in the expected output of
> {code:java}
> [arrow/src/array/array.rs:2460] nestedArray = ListArray
> [
>   StringArray
> [
>   "foo",
>   "bar",
> ],
>   StringArray
> [
>   "foobar",
> ],
> ]
>  {code}





[jira] [Resolved] (ARROW-7193) [Rust] Create Arrow stream reader

2019-12-31 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-7193.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 6013
[https://github.com/apache/arrow/pull/6013]

> [Rust] Create Arrow stream reader
> -
>
> Key: ARROW-7193
> URL: https://issues.apache.org/jira/browse/ARROW-7193
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-6718) [Rust] packed_simd requires nightly

2019-12-24 Thread Paddy Horan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17002842#comment-17002842
 ] 

Paddy Horan commented on ARROW-6718:


I would love to get back on stable. I added a feature to disable explicit SIMD 
to try to make progress toward this goal, although the main thing we need is 
specialization on stable.

 

If we can get the same level of performance, then I'm all for removing 
packed_simd. At the time we adopted it, the author was trying to get it 
adopted into std. Since then, he has stopped driving this forward until other 
features land.

 

I'll take a look in the next few days to compare performance, etc. 

> [Rust] packed_simd requires nightly 
> 
>
> Key: ARROW-6718
> URL: https://issues.apache.org/jira/browse/ARROW-6718
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Andy Grove
>Priority: Major
>
> See [https://github.com/rust-lang/rfcs/pull/2366] for more info on 
> stabilization of this crate.
>  
> {code:java}
> error[E0554]: `#![feature]` may not be used on the stable release channel
>--> 
> /home/andy/.cargo/registry/src/github.com-1ecc6299db9ec823/packed_simd-0.3.3/src/lib.rs:202:1
> |
> 202 | / #![feature(
> 203 | | repr_simd,
> 204 | | const_fn,
> 205 | | platform_intrinsics,
> ...   |
> 215 | | custom_inner_attributes
> 216 | | )]
> | |__^
>  {code}
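One way to run the comparison mentioned in the comment is to benchmark explicit SIMD against a kernel written so that the compiler can vectorize it on stable Rust. A hedged sketch for a sum follows; `sum_lanes` and the lane width of 8 are assumptions, not Arrow code, and whether it matches packed_simd's performance would have to be measured.

```rust
/// Sums an `f32` slice using 8 independent accumulators. Splitting the work
/// into fixed-width lanes gives the compiler a loop it can often
/// auto-vectorize on stable Rust, without packed_simd or nightly features.
fn sum_lanes(values: &[f32]) -> f32 {
    const LANES: usize = 8;
    let mut acc = [0.0f32; LANES];
    let chunks = values.chunks_exact(LANES);
    // Trailing elements that do not fill a whole chunk are summed separately.
    let remainder: f32 = chunks.remainder().iter().sum();
    for chunk in chunks {
        // Each lane accumulates independently, so there is no serial
        // dependency between consecutive additions.
        for (a, v) in acc.iter_mut().zip(chunk) {
            *a += *v;
        }
    }
    acc.iter().sum::<f32>() + remainder
}
```

Using several accumulators changes the order of floating-point additions, which is what permits vectorization; results can therefore differ from a strictly sequential sum in the last bits.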





[jira] [Resolved] (ARROW-7325) [Rust] [Parquet] Update to parquet-format 2.6 and thrift 0.12

2019-12-05 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-7325.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5971
[https://github.com/apache/arrow/pull/5971]

> [Rust] [Parquet] Update to parquet-format 2.6 and thrift 0.12
> -
>
> Key: ARROW-7325
> URL: https://issues.apache.org/jira/browse/ARROW-7325
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Kornelijus Survila
>Assignee: Kornelijus Survila
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> {{parquet-format 2.5}} and {{thrift 0.0.4}} bring in outdated versions of 
> third-party crates such as {{byteorder}}, {{ordered-float}}, and 
> {{num-traits}}. Let's update, as a few of them have reached 1.0.





[jira] [Assigned] (ARROW-7325) [Rust] [Parquet] Update to parquet-format 2.6 and thrift 0.12

2019-12-05 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan reassigned ARROW-7325:
--

Assignee: Kornelijus Survila

> [Rust] [Parquet] Update to parquet-format 2.6 and thrift 0.12
> -
>
> Key: ARROW-7325
> URL: https://issues.apache.org/jira/browse/ARROW-7325
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Kornelijus Survila
>Assignee: Kornelijus Survila
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> {{parquet-format 2.5}} and {{thrift 0.0.4}} bring in outdated versions of 
> third-party crates such as {{byteorder}}, {{ordered-float}}, and 
> {{num-traits}}. Let's update, as a few of them have reached 1.0.





[jira] [Resolved] (ARROW-5181) [Rust] Create Arrow File reader

2019-11-19 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-5181.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 4167
[https://github.com/apache/arrow/pull/4167]

> [Rust] Create Arrow File reader
> ---
>
> Key: ARROW-5181
> URL: https://issues.apache.org/jira/browse/ARROW-5181
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Initial support for reading the Arrow File format





[jira] [Resolved] (ARROW-7006) [Rust] Bump flatbuffers version to avoid vulnerability

2019-10-28 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-7006.

Resolution: Fixed

Issue resolved by pull request 5744
[https://github.com/apache/arrow/pull/5744]

> [Rust] Bump flatbuffers version to avoid vulnerability
> --
>
> Key: ARROW-7006
> URL: https://issues.apache.org/jira/browse/ARROW-7006
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.15.0
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> From GitHub user emilk:
> [{{cargo audit}}|https://github.com/RustSec/cargo-audit] output:
>  
> {{ID:  RUSTSEC-2019-0028
> Crate: flatbuffers
> Version: 0.5.0
> Date:  2019-10-20
> URL:   https://github.com/google/flatbuffers/issues/5530
> Title: Unsound `impl Follow for bool`}}
> The fix should be as simple as editing 
> [https://github.com/apache/arrow/blob/master/rust/arrow/Cargo.toml] from 
> {{flatbuffers = "0.5.0"}} to {{flatbuffers = "0.6.0"}}
> A more long-term improvement is to add a call to {{cargo audit}} in your CI to 
> catch these problems as early as possible
>  





[jira] [Updated] (ARROW-7006) [Rust] Bump flatbuffers version to avoid vulnerability

2019-10-28 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan updated ARROW-7006:
---
  Component/s: Rust
Fix Version/s: 1.0.0

> [Rust] Bump flatbuffers version to avoid vulnerability
> --
>
> Key: ARROW-7006
> URL: https://issues.apache.org/jira/browse/ARROW-7006
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.15.0
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> From GitHub user emilk:
> [{{cargo audit}}|https://github.com/RustSec/cargo-audit] output:
>  
> {{ID:  RUSTSEC-2019-0028
> Crate: flatbuffers
> Version: 0.5.0
> Date:  2019-10-20
> URL:   https://github.com/google/flatbuffers/issues/5530
> Title: Unsound `impl Follow for bool`}}
> The fix should be as simple as editing 
> [https://github.com/apache/arrow/blob/master/rust/arrow/Cargo.toml] from 
> {{flatbuffers = "0.5.0"}} to {{flatbuffers = "0.6.0"}}
> A more long-term improvement is to add a call to {{cargo audit}} in your CI to 
> catch these problems as early as possible
>  





[jira] [Assigned] (ARROW-7006) [Rust] Bump flatbuffers version to avoid vulnerability

2019-10-28 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan reassigned ARROW-7006:
--

Assignee: Paddy Horan

> [Rust] Bump flatbuffers version to avoid vulnerability
> --
>
> Key: ARROW-7006
> URL: https://issues.apache.org/jira/browse/ARROW-7006
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.15.0
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Major
>
> From GitHub user emilk:
> [{{cargo audit}}|https://github.com/RustSec/cargo-audit] output:
>  
> {{ID:  RUSTSEC-2019-0028
> Crate: flatbuffers
> Version: 0.5.0
> Date:  2019-10-20
> URL:   https://github.com/google/flatbuffers/issues/5530
> Title: Unsound `impl Follow for bool`}}
> The fix should be as simple as editing 
> [https://github.com/apache/arrow/blob/master/rust/arrow/Cargo.toml] from 
> {{flatbuffers = "0.5.0"}} to {{flatbuffers = "0.6.0"}}
> A more long-term improvement is to add a call to {{cargo audit}} in your CI to 
> catch these problems as early as possible
>  





[jira] [Created] (ARROW-7006) [Rust] Bump flatbuffers version to avoid vulnerability

2019-10-28 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-7006:
--

 Summary: [Rust] Bump flatbuffers version to avoid vulnerability
 Key: ARROW-7006
 URL: https://issues.apache.org/jira/browse/ARROW-7006
 Project: Apache Arrow
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Paddy Horan


From GitHub user emilk:

[{{cargo audit}}|https://github.com/RustSec/cargo-audit] output:

 

{{ID:RUSTSEC-2019-0028
Crate:   flatbuffers
Version: 0.5.0
Date:2019-10-20
URL: https://github.com/google/flatbuffers/issues/5530
Title:   Unsound `impl Follow for bool`}}

The fix should be as simple as editing 
[https://github.com/apache/arrow/blob/master/rust/arrow/Cargo.toml] from 
{{flatbuffers = "0.5.0"}} to {{flatbuffers = "0.6.0"}}

A more long-term improvement is to add a call to {{cargo audit}} in your CI to 
catch these problems as early as possible

 





[jira] [Created] (ARROW-7005) [Rust] run "cargo audit" in CI

2019-10-28 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-7005:
--

 Summary: [Rust] run "cargo audit" in CI
 Key: ARROW-7005
 URL: https://issues.apache.org/jira/browse/ARROW-7005
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan








[jira] [Closed] (ARROW-6971) [Rust] Replace "RecordBatchReader" with "BatchIterator"

2019-10-23 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan closed ARROW-6971.
--
Fix Version/s: (was: 1.0.0)
   Resolution: Not A Bug

> [Rust] Replace "RecordBatchReader" with "BatchIterator"
> ---
>
> Key: ARROW-6971
> URL: https://issues.apache.org/jira/browse/ARROW-6971
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Affects Versions: 0.15.0
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Minor
>
> As part of the recent reader work we introduced 
> {code:java}
> // arrow::record_batch::RecordBatchReader{code}
> but in datafusion we have
> {code:java}
> // datafusion::physical_plan::BatchIterator
> {code}
> These two traits are almost identical (BatchIterator is bound by Send + Sync, 
> whereas RecordBatchReader is not). I propose we replace RecordBatchReader 
> with BatchIterator (i.e. move it to arrow, as it's generally useful outside of 
> datafusion) and update parquet and datafusion accordingly.
> [~andygrove] [~liurenjie1024] do you see any issues with this? 
>  
>  
>  





[jira] [Commented] (ARROW-6971) [Rust] Replace "RecordBatchReader" with "BatchIterator"

2019-10-23 Thread Paddy Horan (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958012#comment-16958012
 ] 

Paddy Horan commented on ARROW-6971:


Ahh, ok then.

 

> [Rust] Replace "RecordBatchReader" with "BatchIterator"
> ---
>
> Key: ARROW-6971
> URL: https://issues.apache.org/jira/browse/ARROW-6971
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Affects Versions: 0.15.0
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Minor
> Fix For: 1.0.0
>
>
> As part of the recent reader work we introduced 
> {code:java}
> // arrow::record_batch::RecordBatchReader{code}
> but in datafusion we have
> {code:java}
> // datafusion::physical_plan::BatchIterator
> {code}
> These two traits are almost identical (BatchIterator is bound by Send + Sync, 
> whereas RecordBatchReader is not). I propose we replace RecordBatchReader 
> with BatchIterator (i.e. move it to arrow, as it's generally useful outside of 
> datafusion) and update parquet and datafusion accordingly.
> [~andygrove] [~liurenjie1024] do you see any issues with this? 
>  
>  
>  





[jira] [Created] (ARROW-6971) [Rust] Replace "RecordBatchReader" with "BatchIterator"

2019-10-22 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-6971:
--

 Summary: [Rust] Replace "RecordBatchReader" with "BatchIterator"
 Key: ARROW-6971
 URL: https://issues.apache.org/jira/browse/ARROW-6971
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Affects Versions: 0.15.0
Reporter: Paddy Horan
Assignee: Paddy Horan
 Fix For: 1.0.0


As part of the recent reader work we introduced 
{code:java}
// arrow::record_batch::RecordBatchReader{code}
but in datafusion we have
{code:java}
// datafusion::physical_plan::BatchIterator
{code}
These two traits are almost identical (BatchIterator is bound by Send + Sync, 
whereas RecordBatchReader is not). I propose we replace RecordBatchReader 
with BatchIterator (i.e. move it to arrow, as it's generally useful outside of 
datafusion) and update parquet and datafusion accordingly.

[~andygrove] [~liurenjie1024] do you see any issues with this? 
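The practical difference between the two traits is the `Send + Sync` supertrait bound. A simplified, std-only sketch: `Schema`, `RecordBatch`, `MemoryBatches` and the method `next_batch` are hypothetical stand-ins (error handling omitted), not the actual arrow or datafusion definitions.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-ins for Arrow's Schema and RecordBatch.
struct Schema {
    fields: Vec<String>,
}

struct RecordBatch {
    num_rows: usize,
}

/// Every implementor must also be Send + Sync.
trait BatchIterator: Send + Sync {
    fn schema(&self) -> Arc<Schema>;
    fn next_batch(&mut self) -> Option<RecordBatch>;
}

/// An in-memory implementation used for illustration.
struct MemoryBatches {
    schema: Arc<Schema>,
    batches: Vec<RecordBatch>,
}

impl BatchIterator for MemoryBatches {
    fn schema(&self) -> Arc<Schema> {
        Arc::clone(&self.schema)
    }

    fn next_batch(&mut self) -> Option<RecordBatch> {
        if self.batches.is_empty() {
            None
        } else {
            Some(self.batches.remove(0))
        }
    }
}

/// Thanks to the supertrait bound, a boxed iterator can be moved to another
/// thread without adding `+ Send` at the use site.
fn count_rows_on_thread(mut batches: Box<dyn BatchIterator>) -> usize {
    std::thread::spawn(move || {
        let mut rows = 0;
        while let Some(batch) = batches.next_batch() {
            rows += batch.num_rows;
        }
        rows
    })
    .join()
    .unwrap()
}
```

With a trait lacking the bound, `count_rows_on_thread` would have to take `Box<dyn Trait + Send>` instead.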

 

 

 





[jira] [Resolved] (ARROW-6650) [Rust] [Integration] Create methods to test Arrow files against Integration JSON

2019-10-17 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-6650.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5679
[https://github.com/apache/arrow/pull/5679]

> [Rust] [Integration] Create methods to test Arrow files against Integration 
> JSON
> 
>
> Key: ARROW-6650
> URL: https://issues.apache.org/jira/browse/ARROW-6650
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Integration, Rust
>Affects Versions: 0.14.1
>Reporter: Neville Dipale
>Assignee: Neville Dipale
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> [~emkornfi...@gmail.com] recommended that we use the integration IPC files. 
> To be able to compare against the JSON files that are used, we need to be 
> able to generate a JSON representation of Arrow data in Rust.
> We can already do this for schemas, and this ticket is for supporting 
> converting RecordBatch to JSON.
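A rough sketch of the RecordBatch-to-JSON direction for a single nullable Int32 column, shaped like the integration JSON's per-batch `count`/`columns` objects with `VALIDITY` and `DATA` arrays. The function names are hypothetical, writing 0 into null slots is an assumption, and a real implementation would more likely use a JSON library (e.g. serde_json) than string formatting.

```rust
/// Renders one nullable Int32 column as a JSON object with a VALIDITY array
/// of 1/0 flags and a DATA array. Assumptions: null slots are written as 0,
/// and `name` contains no characters that need JSON escaping.
fn int32_column_to_json(name: &str, values: &[Option<i32>]) -> String {
    let validity: Vec<String> = values
        .iter()
        .map(|v| if v.is_some() { "1".to_string() } else { "0".to_string() })
        .collect();
    let data: Vec<String> = values.iter().map(|v| v.unwrap_or(0).to_string()).collect();
    format!(
        "{{\"name\":\"{}\",\"count\":{},\"VALIDITY\":[{}],\"DATA\":[{}]}}",
        name,
        values.len(),
        validity.join(","),
        data.join(",")
    )
}

/// Wraps already-rendered columns into one record-batch object.
fn batch_to_json(count: usize, columns: &[String]) -> String {
    format!("{{\"count\":{},\"columns\":[{}]}}", count, columns.join(","))
}
```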





[jira] [Resolved] (ARROW-6901) [Rust][Parquet] SerializedFileWriter writes total_num_rows as zero

2019-10-16 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-6901.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5672
[https://github.com/apache/arrow/pull/5672]

> [Rust][Parquet] SerializedFileWriter writes total_num_rows as zero
> --
>
> Key: ARROW-6901
> URL: https://issues.apache.org/jira/browse/ARROW-6901
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.14.1, 0.15.0
>Reporter: Matthew Franglen
>Assignee: Matthew Franglen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The SerializedFileWriter does not update total_num_rows at any point. This 
> results in consistently writing zero as the number of rows in the file.
>  
> This code will fail:
> {code:java}
> let data = vec![vec![1, 2, 3, 4, 5]];
> let file: File = ...;
> let schema = Rc::new(
> types::Type::group_type_builder("schema")
> .with_fields( vec![Rc::new(
> types::Type::primitive_type_builder("col1", Type::INT32)
> .with_repetition(Repetition::REQUIRED)
> .build()
> .unwrap(),
> )])
> .build()
> .unwrap(),
> );
> let props = Rc::new(WriterProperties::builder().build());
> let mut file_writer =
> SerializedFileWriter::new(file.try_clone().unwrap(), schema, 
> props).unwrap();
> let mut rows: i64 = 0;
> for subset in &data {
> let mut row_group_writer = file_writer.next_row_group().unwrap();
> let col_writer = row_group_writer.next_column().unwrap();
> if let Some(mut writer) = col_writer {
> match writer {
> ColumnWriter::Int32ColumnWriter(ref mut typed) => {
> rows += typed.write_batch(&subset[..], None, None).unwrap() 
> as i64;
> }
> _ => {
> unimplemented!();
> }
> }
> row_group_writer.close_column(writer).unwrap();
> }
> file_writer.close_row_group(row_group_writer).unwrap();
> }
> file_writer.close().unwrap();
> let reader = SerializedFileReader::new(file).unwrap();
> assert_eq!(reader.num_row_groups(), data.len());
> assert_eq!(reader.metadata().file_metadata().num_rows(), rows, "row count in 
> metadata not equal to number of rows written");
> {code}
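The missing bookkeeping can be sketched with a hypothetical, heavily simplified writer: each closed row group adds its row count to a running total that goes into the footer metadata on close. `FileWriter` and `FileMetadata` here are illustrative stand-ins, not the parquet crate's types.

```rust
/// Hypothetical, heavily simplified file writer.
struct FileWriter {
    row_group_rows: Vec<i64>,
    total_num_rows: i64,
}

/// Stand-in for the footer metadata produced on close.
struct FileMetadata {
    num_rows: i64,
    num_row_groups: usize,
}

impl FileWriter {
    fn new() -> Self {
        FileWriter { row_group_rows: Vec::new(), total_num_rows: 0 }
    }

    /// Called once per row group with the number of rows written to it.
    fn close_row_group(&mut self, num_rows: i64) {
        self.row_group_rows.push(num_rows);
        // The step the report says is missing: keep the total in sync.
        self.total_num_rows += num_rows;
    }

    /// Consumes the writer and produces the metadata written to the footer.
    fn close(self) -> FileMetadata {
        FileMetadata {
            num_rows: self.total_num_rows,
            num_row_groups: self.row_group_rows.len(),
        }
    }
}
```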





[jira] [Assigned] (ARROW-6901) [Rust][Parquet] SerializedFileWriter writes total_num_rows as zero

2019-10-16 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan reassigned ARROW-6901:
--

Assignee: Matthew Franglen

> [Rust][Parquet] SerializedFileWriter writes total_num_rows as zero
> --
>
> Key: ARROW-6901
> URL: https://issues.apache.org/jira/browse/ARROW-6901
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.14.1, 0.15.0
>Reporter: Matthew Franglen
>Assignee: Matthew Franglen
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The SerializedFileWriter does not update total_num_rows at any point. This 
> results in consistently writing zero as the number of rows in the file.
>  
> This code will fail:
> {code:java}
> let data = vec![vec![1, 2, 3, 4, 5]];
> let file: File = ...;
> let schema = Rc::new(
> types::Type::group_type_builder("schema")
> .with_fields( vec![Rc::new(
> types::Type::primitive_type_builder("col1", Type::INT32)
> .with_repetition(Repetition::REQUIRED)
> .build()
> .unwrap(),
> )])
> .build()
> .unwrap(),
> );
> let props = Rc::new(WriterProperties::builder().build());
> let mut file_writer =
> SerializedFileWriter::new(file.try_clone().unwrap(), schema, 
> props).unwrap();
> let mut rows: i64 = 0;
> for subset in &data {
> let mut row_group_writer = file_writer.next_row_group().unwrap();
> let col_writer = row_group_writer.next_column().unwrap();
> if let Some(mut writer) = col_writer {
> match writer {
> ColumnWriter::Int32ColumnWriter(ref mut typed) => {
> rows += typed.write_batch(&subset[..], None, None).unwrap() 
> as i64;
> }
> _ => {
> unimplemented!();
> }
> }
> row_group_writer.close_column(writer).unwrap();
> }
> file_writer.close_row_group(row_group_writer).unwrap();
> }
> file_writer.close().unwrap();
> let reader = SerializedFileReader::new(file).unwrap();
> assert_eq!(reader.num_row_groups(), data.len());
> assert_eq!(reader.metadata().file_metadata().num_rows(), rows, "row count in 
> metadata not equal to number of rows written");
> {code}





[jira] [Created] (ARROW-6881) [Rust] Remove "array_ops" in favor of the "compute" sub-module

2019-10-14 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-6881:
--

 Summary: [Rust] Remove "array_ops" in favor of the "compute" 
sub-module
 Key: ARROW-6881
 URL: https://issues.apache.org/jira/browse/ARROW-6881
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 0.15.0
Reporter: Paddy Horan
Assignee: Paddy Horan
 Fix For: 1.0.0


Once ARROW-4591 (https://issues.apache.org/jira/browse/ARROW-4591) is complete, 
only filter and limit will remain in the "array_ops" module, and they can be 
moved under the "compute" sub-module.





[jira] [Created] (ARROW-6880) [Rust] Add explicit SIMD for min/max kernel

2019-10-14 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-6880:
--

 Summary: [Rust] Add explicit SIMD for min/max kernel
 Key: ARROW-6880
 URL: https://issues.apache.org/jira/browse/ARROW-6880
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Affects Versions: 0.15.0
Reporter: Paddy Horan
Assignee: Paddy Horan
 Fix For: 1.0.0








[jira] [Updated] (ARROW-6879) [Rust] Add explicit SIMD for sum kernel

2019-10-14 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan updated ARROW-6879:
---
Parent: ARROW-4591
Issue Type: Sub-task  (was: Improvement)

> [Rust] Add explicit SIMD for sum kernel
> ---
>
> Key: ARROW-6879
> URL: https://issues.apache.org/jira/browse/ARROW-6879
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Affects Versions: 0.15.0
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Minor
> Fix For: 1.0.0
>
>






[jira] [Created] (ARROW-6879) [Rust] Add explicit SIMD for sum kernel

2019-10-14 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-6879:
--

 Summary: [Rust] Add explicit SIMD for sum kernel
 Key: ARROW-6879
 URL: https://issues.apache.org/jira/browse/ARROW-6879
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 0.15.0
Reporter: Paddy Horan
Assignee: Paddy Horan
 Fix For: 1.0.0








[jira] [Resolved] (ARROW-6744) [Rust] Export JsonEqual trait in the array module

2019-10-03 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-6744.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5549
[https://github.com/apache/arrow/pull/5549]

> [Rust] Export JsonEqual trait in the array module
> -
>
> Key: ARROW-6744
> URL: https://issues.apache.org/jira/browse/ARROW-6744
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Kyle McCarthy
>Assignee: Kyle McCarthy
>Priority: Trivial
>  Labels: easyfix, pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> ARROW-5901 added checking for array equality with JSON arrays. This added a 
> JsonEqual trait bound to the Array trait, but the trait isn't exported, making 
> it effectively private.
> JsonEqual is a public trait, but the equal module is private and the 
> JsonEqual trait isn't re-exported like the ArrayEqual trait.
> AFAIK this makes it impossible to implement your own arrays that are bound by 
> the Array trait.
> I suggest that JsonEqual is exported with pub use like the ArrayEqual trait 
> from the array module. 
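The suggested fix is a `pub use` re-export, which makes a trait defined in a private module nameable from outside. A simplified, self-contained sketch: the trait here compares against a JSON string rather than a JSON value, and `MyArray` is a hypothetical user-defined array.

```rust
mod array {
    // The module holding the trait stays private...
    mod equal {
        /// Simplified version of the trait: compare against JSON text.
        pub trait JsonEqual {
            fn equals_json(&self, json: &str) -> bool;
        }
    }

    // ...but the re-export makes the trait nameable as `array::JsonEqual`,
    // so code outside this module can implement it.
    pub use self::equal::JsonEqual;

    /// A trait bound on the re-exported trait, as the issue describes.
    pub trait Array: JsonEqual {
        fn len(&self) -> usize;
    }
}

use array::{Array, JsonEqual};

/// A user-defined array type that can now satisfy the `Array` bound.
struct MyArray(Vec<i32>);

impl JsonEqual for MyArray {
    fn equals_json(&self, json: &str) -> bool {
        // Naive comparison against a compact JSON array such as "[1,2,3]".
        let rendered: Vec<String> = self.0.iter().map(|v| v.to_string()).collect();
        json == format!("[{}]", rendered.join(","))
    }
}

impl Array for MyArray {
    fn len(&self) -> usize {
        self.0.len()
    }
}
```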





[jira] [Assigned] (ARROW-6744) [Rust] Export JsonEqual trait in the array module

2019-10-03 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan reassigned ARROW-6744:
--

Assignee: Kyle McCarthy

> [Rust] Export JsonEqual trait in the array module
> -
>
> Key: ARROW-6744
> URL: https://issues.apache.org/jira/browse/ARROW-6744
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Kyle McCarthy
>Assignee: Kyle McCarthy
>Priority: Trivial
>  Labels: easyfix, pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> ARROW-5901 added checking for array equality with JSON arrays. This added a 
> JsonEqual trait bound to the Array trait, but the trait isn't exported, making 
> it effectively private.
> JsonEqual is a public trait, but the equal module is private and the 
> JsonEqual trait isn't re-exported like the ArrayEqual trait.
> AFAIK this makes it impossible to implement your own arrays that are bound by 
> the Array trait.
> I suggest that JsonEqual is exported with pub use like the ArrayEqual trait 
> from the array module. 





[jira] [Resolved] (ARROW-6761) [Rust] Travis CI builds not respecting rust-toolchain

2019-10-02 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-6761.

Resolution: Fixed

Issue resolved by pull request 5561
[https://github.com/apache/arrow/pull/5561]

> [Rust] Travis CI builds not respecting rust-toolchain
> -
>
> Key: ARROW-6761
> URL: https://issues.apache.org/jira/browse/ARROW-6761
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 1.0.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Travis builds recently started failing with a Rust ICE (Internal Compiler 
> Error) which has been reported to the Rust compiler team 
> ([https://github.com/rust-lang/rust/issues/64908]).
>  





[jira] [Resolved] (ARROW-6745) [Rust] Fix a variety of typos

2019-10-01 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-6745.

Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 5548
[https://github.com/apache/arrow/pull/5548]

> [Rust] Fix a variety of typos
> -
>
> Key: ARROW-6745
> URL: https://issues.apache.org/jira/browse/ARROW-6745
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Bruce Mitchener
>Assignee: Bruce Mitchener
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Assigned] (ARROW-6745) [Rust] Fix a variety of typos

2019-10-01 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan reassigned ARROW-6745:
--

Assignee: Bruce Mitchener

> [Rust] Fix a variety of typos
> -
>
> Key: ARROW-6745
> URL: https://issues.apache.org/jira/browse/ARROW-6745
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Bruce Mitchener
>Assignee: Bruce Mitchener
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-6716) [CI] [Rust] New 1.40.0 nightly causing builds to fail

2019-09-27 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-6716.

Fix Version/s: (was: 1.0.0)
   0.15.0
   Resolution: Fixed

Issue resolved by pull request 5519
[https://github.com/apache/arrow/pull/5519]

> [CI] [Rust] New 1.40.0 nightly causing builds to fail
> -
>
> Key: ARROW-6716
> URL: https://issues.apache.org/jira/browse/ARROW-6716
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: CI, Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> So much for pinning the nightly version ... that doesn't work when there is a 
> new major version of a nightly apparently.
> Travis is now using:
> {code:java}
> rustc 1.40.0-nightly (37538aa13 2019-09-25) {code}
> Despite rust-toolchain containing:
> {code:java}
> nightly-2019-07-30 {code}
>  
>  





[jira] [Resolved] (ARROW-6669) [Rust] [DataFusion] Implement physical expression for binary expressions

2019-09-24 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-6669.

Fix Version/s: 0.15.0
   Resolution: Fixed

Issue resolved by pull request 5478
[https://github.com/apache/arrow/pull/5478]

> [Rust] [DataFusion] Implement physical expression for binary expressions
> 
>
> Key: ARROW-6669
> URL: https://issues.apache.org/jira/browse/ARROW-6669
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Implement comparison operators (<, <=, >, >=, =, !=) as well as binary 
> operators AND and OR.
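Here is a minimal sketch of what such a physical expression computes. It makes several simplifying assumptions: plain `i64` slices stand in for Arrow arrays, nulls are ignored, and the names `CmpOp`, `BoolOp`, `compare` and `combine` are hypothetical. They are not DataFusion's actual API. Comparison operators produce a boolean column, and AND/OR combine two boolean columns element-wise.

```rust
// Hypothetical sketch: these names are illustrative, not DataFusion's API.
// Nulls are ignored; real Arrow arrays also carry a validity bitmap.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum CmpOp { Lt, LtEq, Gt, GtEq, Eq, NotEq }

#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum BoolOp { And, Or }

/// Compare two equal-length columns element-wise, producing a boolean column.
fn compare(op: CmpOp, left: &[i64], right: &[i64]) -> Vec<bool> {
    assert_eq!(left.len(), right.len(), "columns must have the same length");
    left.iter()
        .zip(right)
        .map(|(l, r)| match op {
            CmpOp::Lt => l < r,
            CmpOp::LtEq => l <= r,
            CmpOp::Gt => l > r,
            CmpOp::GtEq => l >= r,
            CmpOp::Eq => l == r,
            CmpOp::NotEq => l != r,
        })
        .collect()
}

/// Combine two equal-length boolean columns with AND or OR.
fn combine(op: BoolOp, left: &[bool], right: &[bool]) -> Vec<bool> {
    assert_eq!(left.len(), right.len(), "columns must have the same length");
    left.iter()
        .zip(right)
        .map(|(&l, &r)| match op {
            BoolOp::And => l && r,
            BoolOp::Or => l || r,
        })
        .collect()
}

fn main() {
    let a = vec![1, 5, 3];
    let b = vec![2, 5, 1];
    let lt = compare(CmpOp::Lt, &a, &b); // a < b
    let eq = compare(CmpOp::Eq, &a, &b); // a = b
    let le = combine(BoolOp::Or, &lt, &eq); // (a < b) OR (a = b)
    println!("{:?}", le); // prints "[true, true, false]"
}
```

Keeping comparisons and boolean combinators as separate steps mirrors how an expression tree like `(a < b) OR (a = b)` is evaluated bottom-up, one operator per node.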





[jira] [Resolved] (ARROW-6668) [Rust] [DataFusion] Implement CAST expression

2019-09-23 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-6668.

Fix Version/s: (was: 1.0.0)
   0.15.0
   Resolution: Fixed

Issue resolved by pull request 5477
[https://github.com/apache/arrow/pull/5477]

> [Rust] [DataFusion] Implement CAST expression
> -
>
> Key: ARROW-6668
> URL: https://issues.apache.org/jira/browse/ARROW-6668
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Implement CAST expression





[jira] [Resolved] (ARROW-6621) [Rust][DataFusion] Examples for DataFusion are not executed in CI

2019-09-22 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-6621.

Fix Version/s: (was: 1.0.0)
   0.15.0
   Resolution: Fixed

Issue resolved by pull request 5467
[https://github.com/apache/arrow/pull/5467]

> [Rust][DataFusion] Examples for DataFusion are not executed in CI
> -
>
> Key: ARROW-6621
> URL: https://issues.apache.org/jira/browse/ARROW-6621
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Affects Versions: 0.14.1
>Reporter: Paddy Horan
>Assignee: Andy Grove
>Priority: Minor
>  Labels: beginner, pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See the CI scripts; we already test the examples for the Arrow sub-crate.





[jira] [Resolved] (ARROW-6660) [Rust] [DataFusion] Minor docs update for 0.15.0 release

2019-09-22 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan resolved ARROW-6660.

Resolution: Fixed

Issue resolved by pull request 5466
[https://github.com/apache/arrow/pull/5466]

> [Rust] [DataFusion] Minor docs update for 0.15.0 release
> 
>
> Key: ARROW-6660
> URL: https://issues.apache.org/jira/browse/ARROW-6660
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Minor docs update for 0.15.0 release





[jira] [Created] (ARROW-6621) [Rust][DataFusion] Examples for DataFusion are not executed in CI

2019-09-19 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-6621:
--

 Summary: [Rust][DataFusion] Examples for DataFusion are not 
executed in CI
 Key: ARROW-6621
 URL: https://issues.apache.org/jira/browse/ARROW-6621
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Affects Versions: 0.14.1
Reporter: Paddy Horan


See the CI scripts; we already test the examples for the Arrow sub-crate.





[jira] [Updated] (ARROW-6409) [Rust] Expand testing of kernels to other numeric types

2019-09-17 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan updated ARROW-6409:
---
Fix Version/s: (was: 0.15.0)

> [Rust] Expand testing of kernels to other numeric types
> ---
>
> Key: ARROW-6409
> URL: https://issues.apache.org/jira/browse/ARROW-6409
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.14.1
>Reporter: Paddy Horan
>Priority: Minor
>
> When testing compute kernels, we normally use a simple example that is 
> specific to a particular numeric type.  We should write a generic helper 
> function and expand testing to all numeric types.
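A minimal sketch of the kind of generic helper the issue proposes, assuming plain slices in place of Arrow arrays. `add_kernel`, `check_binary_kernel` and the `test_add_for_types!` macro are hypothetical names used only for illustration. A real version would presumably be bounded on the crate's own numeric array types instead.

```rust
use std::fmt::Debug;
use std::ops::Add;

/// Hypothetical stand-in for a compute kernel: element-wise addition.
fn add_kernel<T: Copy + Add<Output = T>>(left: &[T], right: &[T]) -> Vec<T> {
    left.iter().zip(right).map(|(&l, &r)| l + r).collect()
}

/// Generic helper: run a binary kernel on `left` and `right` and
/// assert that the result equals `expected`.
fn check_binary_kernel<T, F>(kernel: F, left: &[T], right: &[T], expected: &[T])
where
    T: PartialEq + Debug,
    F: Fn(&[T], &[T]) -> Vec<T>,
{
    let actual = kernel(left, right);
    assert_eq!(actual.as_slice(), expected, "kernel output mismatch");
}

/// Expand one test case to every listed numeric type.
macro_rules! test_add_for_types {
    ($($t:ty),*) => {
        $(
            check_binary_kernel(
                add_kernel::<$t>,
                &[1 as $t, 2 as $t, 3 as $t],
                &[4 as $t, 5 as $t, 6 as $t],
                &[5 as $t, 7 as $t, 9 as $t],
            );
        )*
    };
}

fn main() {
    test_add_for_types!(i8, i16, i32, i64, u8, u16, u32, u64, f32, f64);
    println!("add kernel checked for all numeric types");
}
```

Using `as` casts inside the macro lets the same literal test data be written once and reused for both integer and floating-point types.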





[jira] [Updated] (ARROW-6482) [Rust] Investigate enabling features in regex crate to reduce compile times

2019-09-17 Thread Paddy Horan (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan updated ARROW-6482:
---
Fix Version/s: (was: 0.15.0)

> [Rust] Investigate enabling features in regex crate to reduce compile times
> ---
>
> Key: ARROW-6482
> URL: https://issues.apache.org/jira/browse/ARROW-6482
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.14.1
>Reporter: Paddy Horan
>Priority: Minor
>  Labels: beginner
>
> The regex crate recently added a feature flag to reduce compile times and 
> binary size if certain Unicode-related features are not needed.  We should 
> investigate using this feature.




