[jira] [Created] (ARROW-8959) [Rust] Broken build due to new benchmark crate using old API

2020-05-26 Thread Andy Grove (Jira)
Andy Grove created ARROW-8959:
-

 Summary: [Rust] Broken build due to new benchmark crate using old 
API
 Key: ARROW-8959
 URL: https://issues.apache.org/jira/browse/ARROW-8959
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Broken build due to new benchmark crate using old API



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8925) [Rust] [DataFusion] CsvExec::schema() returns incorrect results

2020-05-24 Thread Andy Grove (Jira)
Andy Grove created ARROW-8925:
-

 Summary: [Rust] [DataFusion] CsvExec::schema() returns incorrect 
results
 Key: ARROW-8925
 URL: https://issues.apache.org/jira/browse/ARROW-8925
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


CsvExec::schema() returns the underlying CSV schema and not the projected 
schema. Also, the documentation for the CsvExec schema field is incorrect.
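
To illustrate the difference between the underlying schema and the projected schema, here is a minimal sketch that models a schema as a plain list of field names. This is not the CsvExec code; the function name project_fields and the use of string slices in place of Arrow fields are assumptions made for illustration only.

```rust
/// Hypothetical helper: applies an optional projection (a list of column
/// indices) to a list of field names. Returns all fields when there is no
/// projection, and None if any index is out of range.
fn project_fields(fields: &[&str], projection: Option<&[usize]>) -> Option<Vec<String>> {
    match projection {
        // No projection: the projected schema equals the underlying schema.
        None => Some(fields.iter().map(|f| f.to_string()).collect()),
        // With a projection: keep only the selected columns, in the given order.
        Some(indices) => indices
            .iter()
            .map(|&i| fields.get(i).map(|f| f.to_string()))
            .collect(),
    }
}
```

With fields a, b, c and projection [2, 0], a schema() method following this sketch would report c, a rather than all three columns.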





[jira] [Created] (ARROW-8910) [Rust] [DataFusion] Add support for explicit casts between signed and unsigned ints

2020-05-23 Thread Andy Grove (Jira)
Andy Grove created ARROW-8910:
-

 Summary: [Rust] [DataFusion] Add support for explicit casts 
between signed and unsigned ints
 Key: ARROW-8910
 URL: https://issues.apache.org/jira/browse/ARROW-8910
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Add support for explicit casts between signed and unsigned ints.

Note that the type coercion optimizer rule should never implicitly perform casts 
between types when data would be lost, e.g. from a negative value to an unsigned type.
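
A minimal sketch of the lossless-versus-lossy distinction, using only the Rust standard library rather than the Arrow cast kernels. The helper names are hypothetical; the point is that a checked conversion reports values that cannot be represented instead of silently wrapping them.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: explicit cast from signed to unsigned.
/// Returns None when the value is negative and so cannot be represented.
fn cast_i32_to_u32(value: i32) -> Option<u32> {
    u32::try_from(value).ok()
}

/// Hypothetical helper for the reverse direction.
/// Returns None when the value exceeds i32::MAX.
fn cast_u32_to_i32(value: u32) -> Option<i32> {
    i32::try_from(value).ok()
}
```

An explicit cast could map the None case to a null or an error, while an implicit coercion rule would simply not rewrite the expression when such a loss is possible.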





[jira] [Created] (ARROW-8869) [Rust] [DataFusion] Type Coercion optimizer rule does not support new scan nodes

2020-05-19 Thread Andy Grove (Jira)
Andy Grove created ARROW-8869:
-

 Summary: [Rust] [DataFusion] Type Coercion optimizer rule does not 
support new scan nodes
 Key: ARROW-8869
 URL: https://issues.apache.org/jira/browse/ARROW-8869
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Affects Versions: 1.0.0
Reporter: Andy Grove
 Fix For: 1.0.0


Type Coercion optimizer rule does not support new scan nodes





[jira] [Created] (ARROW-8859) [Rust] [Integration Testing] Implement --quiet / verbose correctly

2020-05-19 Thread Andy Grove (Jira)
Andy Grove created ARROW-8859:
-

 Summary: [Rust] [Integration Testing] Implement --quiet / verbose 
correctly
 Key: ARROW-8859
 URL: https://issues.apache.org/jira/browse/ARROW-8859
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


The Rust tester has verbose=true hard-coded for now.

When {{archery --quiet}} is passed, RustTester should receive a {{quiet: Bool}} via 
[kwargs|https://github.com/apache/arrow/blob/master/dev/archery/archery/integration/runner.py#L335]
 somewhere.





[jira] [Created] (ARROW-8856) [Rust] [Integration Testing] Reading types other than record batches not yet supported

2020-05-18 Thread Andy Grove (Jira)
Andy Grove created ARROW-8856:
-

 Summary: [Rust] [Integration Testing] Reading types other than 
record batches not yet supported
 Key: ARROW-8856
 URL: https://issues.apache.org/jira/browse/ARROW-8856
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Andy Grove
 Fix For: 1.0.0


{code:java}
Error: IoError("Reading types other than record batches not yet supported") 
{code}





[jira] [Created] (ARROW-8855) [Rust] [Integration Testing] data type Date32(Day) not supported

2020-05-18 Thread Andy Grove (Jira)
Andy Grove created ARROW-8855:
-

 Summary: [Rust] [Integration Testing] data type Date32(Day) not 
supported
 Key: ARROW-8855
 URL: https://issues.apache.org/jira/browse/ARROW-8855
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Andy Grove
 Fix For: 1.0.0


{code:java}
Error: JsonError("data type Date32(Day) not supported") {code}





[jira] [Created] (ARROW-8854) [Rust] [Integration Testing] Show output from arrow-json-integration-test

2020-05-18 Thread Andy Grove (Jira)
Andy Grove created ARROW-8854:
-

 Summary: [Rust] [Integration Testing] Show output from 
arrow-json-integration-test
 Key: ARROW-8854
 URL: https://issues.apache.org/jira/browse/ARROW-8854
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Show output from arrow-json-integration-test. It is currently hidden, making it 
hard to debug the test failures.





[jira] [Created] (ARROW-8853) [Rust] [Integration Testing] Enable Flight tests

2020-05-18 Thread Andy Grove (Jira)
Andy Grove created ARROW-8853:
-

 Summary: [Rust] [Integration Testing] Enable Flight tests
 Key: ARROW-8853
 URL: https://issues.apache.org/jira/browse/ARROW-8853
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Andy Grove
 Fix For: 1.0.0








[jira] [Created] (ARROW-8838) [Rust] File reader fails to read header from valid files

2020-05-17 Thread Andy Grove (Jira)
Andy Grove created ARROW-8838:
-

 Summary: [Rust] File reader fails to read header from valid files
 Key: ARROW-8838
 URL: https://issues.apache.org/jira/browse/ARROW-8838
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Andy Grove
 Fix For: 1.0.0


{code:java}
Rust: arrow-file-to-stream; 
args=["/home/andy/git/andygrove/arrow/rust/target/debug/arrow-file-to-stream", 
"/tmp/tmpnq6vtil1/567f8248_generated_primitive_no_batches.json_as_file"]
Reading from Arrow file 
/home/andy/git/andygrove/arrow/rust/target/debug/arrow-file-to-stream
Error: IoError("Arrow file does not contain correct header")
 {code}





[jira] [Created] (ARROW-8837) [Rust] Type Null not supported

2020-05-17 Thread Andy Grove (Jira)
Andy Grove created ARROW-8837:
-

 Summary: [Rust] Type Null not supported
 Key: ARROW-8837
 URL: https://issues.apache.org/jira/browse/ARROW-8837
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andy Grove
 Fix For: 1.0.0


{code:java}
thread 'main' panicked at 'not implemented: Type Null not supported', 
arrow/src/ipc/convert.rs:316:14
 {code}





[jira] [Created] (ARROW-8835) [Rust] Implement arrow-stream-to-file for integration testing

2020-05-17 Thread Andy Grove (Jira)
Andy Grove created ARROW-8835:
-

 Summary: [Rust] Implement arrow-stream-to-file for integration 
testing
 Key: ARROW-8835
 URL: https://issues.apache.org/jira/browse/ARROW-8835
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Andy Grove
 Fix For: 1.0.0


Implement arrow-stream-to-file for integration testing





[jira] [Created] (ARROW-8834) [Rust] Implement arrow-file-to-stream for integration testing

2020-05-17 Thread Andy Grove (Jira)
Andy Grove created ARROW-8834:
-

 Summary: [Rust] Implement arrow-file-to-stream for integration 
testing
 Key: ARROW-8834
 URL: https://issues.apache.org/jira/browse/ARROW-8834
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Andy Grove
 Fix For: 1.0.0


Implement arrow-file-to-stream for integration testing





[jira] [Created] (ARROW-8833) [Rust] Implement VALIDATE mode in integration test binary

2020-05-17 Thread Andy Grove (Jira)
Andy Grove created ARROW-8833:
-

 Summary: [Rust] Implement VALIDATE mode in integration test binary
 Key: ARROW-8833
 URL: https://issues.apache.org/jira/browse/ARROW-8833
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Andy Grove
 Fix For: 1.0.0


The binary arrow-json-integration-test.rs already supports converting arrow 
files to json, and json to arrow.

We now need to implement the VALIDATE command, which should read an Arrow and a 
JSON file and ensure that they contain the same batches.

See the Java implementation at 
java/tools/src/main/java/org/apache/arrow/tools/Integration.java for an example.





[jira] [Created] (ARROW-8829) [Rust] Implement SQL parser

2020-05-16 Thread Andy Grove (Jira)
Andy Grove created ARROW-8829:
-

 Summary: [Rust] Implement SQL parser
 Key: ARROW-8829
 URL: https://issues.apache.org/jira/browse/ARROW-8829
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Implement SQL parser that can take a Vec and produce a SQL AST.

We can potentially break this down into separate JIRAs.

It needs to support:
 * Single table SELECT ... FROM
 * WHERE
 * GROUP BY
 * ORDER BY
 * LIMIT

It needs to support the following expressions:
 * Literals (long, string, double)
 * Identifiers
 * Binary expressions
 ** Arithmetic (+, -, *, /, %)
 ** Boolean (AND, OR)
 ** Comparison (=, !=, <, <=, >, >=, <>)
 * Unary boolean expression: NOT
 * CAST(expr AS type)
 * Aliased expressions: expr AS alias
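
The requirements above could be captured by an AST along the following lines. This is only a sketch: every type and variant name here is hypothetical, both != and <> are assumed to map to a single NotEq operator, and expr_depth is included just to show how such a tree might be traversed.

```rust
/// Hypothetical binary operators covering the arithmetic, boolean and
/// comparison operators listed in the issue.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BinaryOp {
    Plus, Minus, Multiply, Divide, Modulo,
    And, Or,
    Eq, NotEq, Lt, LtEq, Gt, GtEq,
}

/// Hypothetical expression node.
#[derive(Debug, Clone, PartialEq)]
enum SqlExpr {
    LiteralLong(i64),
    LiteralString(String),
    LiteralDouble(f64),
    Identifier(String),
    Binary { left: Box<SqlExpr>, op: BinaryOp, right: Box<SqlExpr> },
    Not(Box<SqlExpr>),
    Cast { expr: Box<SqlExpr>, data_type: String },
    Alias { expr: Box<SqlExpr>, alias: String },
}

/// Hypothetical single-table SELECT statement with the listed clauses.
#[derive(Debug, Clone, PartialEq)]
struct SelectStatement {
    projection: Vec<SqlExpr>,
    table: String,
    selection: Option<SqlExpr>, // WHERE
    group_by: Vec<SqlExpr>,
    order_by: Vec<SqlExpr>,
    limit: Option<u64>,
}

/// Returns the height of an expression tree (a leaf has depth 1).
fn expr_depth(expr: &SqlExpr) -> usize {
    match expr {
        SqlExpr::LiteralLong(_)
        | SqlExpr::LiteralString(_)
        | SqlExpr::LiteralDouble(_)
        | SqlExpr::Identifier(_) => 1,
        SqlExpr::Binary { left, right, .. } => 1 + expr_depth(left).max(expr_depth(right)),
        SqlExpr::Not(inner) => 1 + expr_depth(inner),
        SqlExpr::Cast { expr, .. } | SqlExpr::Alias { expr, .. } => 1 + expr_depth(expr),
    }
}
```

Under this sketch, the expression a + 1 AS b would be an Alias wrapping a Binary node with an Identifier on the left and a LiteralLong on the right.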





[jira] [Created] (ARROW-8828) [Rust] Implement SQL tokenizer

2020-05-16 Thread Andy Grove (Jira)
Andy Grove created ARROW-8828:
-

 Summary: [Rust] Implement SQL tokenizer
 Key: ARROW-8828
 URL: https://issues.apache.org/jira/browse/ARROW-8828
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Implement enum for all supported SQL tokens and implement a tokenizer that can 
tokenize a SQL string and produce a Vec.
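
As a rough sketch of what this could look like (the token set and all names here are assumptions, and the element type of the Vec is not stated above): a token enum plus a function that walks the input character by character. Multi-character operators such as <= and string literals are deliberately left out to keep the example short.

```rust
/// Hypothetical token type; a real tokenizer would need more variants.
#[derive(Debug, Clone, PartialEq)]
enum SqlToken {
    Word(String),   // keyword or identifier
    Number(String), // numeric literal, kept as text
    Symbol(char),   // single-character operator or punctuation
}

/// Splits a SQL string into tokens, skipping whitespace.
fn tokenize_sql(sql: &str) -> Vec<SqlToken> {
    let mut tokens = Vec::new();
    let mut chars = sql.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_alphabetic() || c == '_' {
            // Consume a run of identifier characters.
            let mut word = String::new();
            while let Some(&w) = chars.peek() {
                if w.is_alphanumeric() || w == '_' {
                    word.push(w);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(SqlToken::Word(word));
        } else if c.is_ascii_digit() {
            // Consume digits and decimal points as one numeric literal.
            let mut number = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_ascii_digit() || d == '.' {
                    number.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(SqlToken::Number(number));
        } else {
            // Anything else becomes a single-character symbol.
            tokens.push(SqlToken::Symbol(c));
            chars.next();
        }
    }
    tokens
}
```

The parser would then consume this Vec and decide which Word tokens are keywords.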





[jira] [Created] (ARROW-8827) [Integration Testing] Initial skeleton for Rust integration tests

2020-05-16 Thread Andy Grove (Jira)
Andy Grove created ARROW-8827:
-

 Summary: [Integration Testing] Initial skeleton for Rust 
integration tests
 Key: ARROW-8827
 URL: https://issues.apache.org/jira/browse/ARROW-8827
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Initial skeleton for Rust integration tests





[jira] [Created] (ARROW-8824) [Rust] [DataFusion] Implement new SQL parser

2020-05-16 Thread Andy Grove (Jira)
Andy Grove created ARROW-8824:
-

 Summary: [Rust] [DataFusion] Implement new SQL parser
 Key: ARROW-8824
 URL: https://issues.apache.org/jira/browse/ARROW-8824
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


We currently depend on the sqlparser crate that I originally created, but it has 
moved on since the version we use. That project is aiming to support 
multiple SQL dialects, and I don't think it is appropriate for what we need in 
DataFusion.

I think it would be better to build a new SQL parser as part of the DataFusion 
crate so that we can more easily maintain it, and it can use Arrow as the 
native type system.

Another option would be to try and donate the sqlparser 0.2.x code base but 
there are a fair number of committers and it is probably easier just to 
implement it from scratch (without referencing the existing code).

 





[jira] [Created] (ARROW-8822) [Rust] [DataFusion] Add MemoryScan variant to LogicalPlan

2020-05-16 Thread Andy Grove (Jira)
Andy Grove created ARROW-8822:
-

 Summary: [Rust] [DataFusion] Add MemoryScan variant to LogicalPlan
 Key: ARROW-8822
 URL: https://issues.apache.org/jira/browse/ARROW-8822
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Allow queries against Vec





[jira] [Created] (ARROW-8809) [Rust] schema mismatch in integration test

2020-05-14 Thread Andy Grove (Jira)
Andy Grove created ARROW-8809:
-

 Summary: [Rust] schema mismatch in integration test
 Key: ARROW-8809
 URL: https://issues.apache.org/jira/browse/ARROW-8809
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


I apologize for the vagueness here; I will flesh out the details when I learn more, 
but it looks like Rust is specifying an int64 as a 32-bit type somewhere.
{code:java}

diff schema1.txt schema2.txt 
15c15
<  int64_nullable: Int(32,
---
>  int64_nullable: Int(64,
17c17
<  int64_nonnullable: Int(32,
---
>  int64_nonnullable: Int(64,
 {code}





[jira] [Created] (ARROW-8808) [Rust] Divide by zero in arrays/builder.rs

2020-05-14 Thread Andy Grove (Jira)
Andy Grove created ARROW-8808:
-

 Summary: [Rust] Divide by zero in arrays/builder.rs
 Key: ARROW-8808
 URL: https://issues.apache.org/jira/browse/ARROW-8808
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andy Grove
 Fix For: 1.0.0


Integration testing exposed a bug in cases where values_data.len() is zero. 
This fails with a divide-by-zero error.
{code:java}
// check that values_data length is multiple of len
assert!(
values_data.len() / len == self.list_len as usize,
"Values of FixedSizeList must have equal lengths, values have length {} and 
list has {}",
values_data.len(),
len
); {code}
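
One possible way to avoid the division, sketched here as a standalone function rather than as a patch to builder.rs, is to compare by multiplication instead. The function name is hypothetical. Note that this check is slightly stricter than the original, since integer division would also accept a values length that is not an exact multiple.

```rust
/// Hypothetical helper: true when `values_len` is exactly `len * list_len`.
/// Multiplying instead of dividing means `len == 0` cannot cause a
/// divide-by-zero panic; `checked_mul` also guards against overflow.
fn fixed_size_list_lengths_match(values_len: usize, len: usize, list_len: usize) -> bool {
    len.checked_mul(list_len) == Some(values_len)
}
```

With this form, an empty list (values_len = 0, len = 0) passes the check for any list_len.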





[jira] [Created] (ARROW-8807) [Docs] Integration testing instructions for base docker image are incorrect

2020-05-14 Thread Andy Grove (Jira)
Andy Grove created ARROW-8807:
-

 Summary: [Docs] Integration testing instructions for base docker 
image are incorrect
 Key: ARROW-8807
 URL: https://issues.apache.org/jira/browse/ARROW-8807
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andy Grove


The dev/README.md says to build a base image using this command, but the 
referenced file does not exist.
{code:java}
docker build -t arrow_integration_xenial_base -f 
docker_common/Dockerfile.xenial.base . {code}





[jira] [Created] (ARROW-8789) [Rust] Add separate crate for integration test binaries

2020-05-13 Thread Andy Grove (Jira)
Andy Grove created ARROW-8789:
-

 Summary: [Rust] Add separate crate for integration test binaries
 Key: ARROW-8789
 URL: https://issues.apache.org/jira/browse/ARROW-8789
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Add separate crate for integration test binaries





[jira] [Created] (ARROW-8784) [Rust] [DataFusion] Remove use of Arc from LogicalPlan

2020-05-13 Thread Andy Grove (Jira)
Andy Grove created ARROW-8784:
-

 Summary: [Rust] [DataFusion] Remove use of Arc from LogicalPlan
 Key: ARROW-8784
 URL: https://issues.apache.org/jira/browse/ARROW-8784
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


The reason that the LogicalPlan currently uses Arc rather than Box is to 
support the ability to pass the logical plan between threads.

I am no longer sure that this is a requirement, and if it is, then it would 
perhaps be better to serialize the plan to JSON using the serde crate.
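
For what it's worth, a Box-based plan can still be moved to another thread as long as everything inside it is Send; Arc is only needed when several threads must share the same plan without cloning it. A small sketch with a made-up two-variant plan type (not the real LogicalPlan):

```rust
use std::thread;

/// Hypothetical, heavily simplified plan type using Box for child nodes.
#[derive(Debug, Clone, PartialEq)]
enum BoxedPlan {
    Scan { table: String },
    Limit { n: usize, input: Box<BoxedPlan> },
}

/// Counts the number of nested plan nodes.
fn boxed_plan_depth(plan: &BoxedPlan) -> usize {
    match plan {
        BoxedPlan::Scan { .. } => 1,
        BoxedPlan::Limit { input, .. } => 1 + boxed_plan_depth(input),
    }
}

/// Moves the whole plan into a worker thread and inspects it there.
/// This compiles because BoxedPlan contains only Send types.
fn depth_on_worker_thread(plan: BoxedPlan) -> usize {
    thread::spawn(move || boxed_plan_depth(&plan))
        .join()
        .expect("worker thread panicked")
}
```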





[jira] [Created] (ARROW-8783) [Rust] [DataFusion] Logical plan should have ParquetScan and CsvScan entries

2020-05-13 Thread Andy Grove (Jira)
Andy Grove created ARROW-8783:
-

 Summary: [Rust] [DataFusion] Logical plan should have ParquetScan 
and CsvScan entries
 Key: ARROW-8783
 URL: https://issues.apache.org/jira/browse/ARROW-8783
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


The LogicalPlan currently has a TableScan entry which references a Table (any 
logical plan registered with an ExecutionContext) and is often backed by a 
Parquet or CSV data source.

I am finding it increasingly inconvenient that we can't just create a logical 
plan referencing a Parquet or CSV file, without having to create an execution 
context first and register the data sources with it.

This addition will not remove any existing behavior.
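
Roughly, the idea is to add file-based scan variants next to the existing table scan. The sketch below uses a made-up enum and field names (the real LogicalPlan variants carry schemas, projections and so on, which are omitted here):

```rust
/// Hypothetical, simplified plan type showing only the scan variants.
#[derive(Debug, Clone, PartialEq)]
enum ScanPlanSketch {
    /// Existing behavior: refers to a table registered with a context.
    TableScan { table_name: String },
    /// Proposed: refers directly to a Parquet file or directory.
    ParquetScan { path: String },
    /// Proposed: refers directly to a CSV file.
    CsvScan { path: String, has_header: bool },
}

/// Produces a one-line description of a scan node.
fn describe_scan(plan: &ScanPlanSketch) -> String {
    match plan {
        ScanPlanSketch::TableScan { table_name } => format!("TableScan: {}", table_name),
        ScanPlanSketch::ParquetScan { path } => format!("ParquetScan: {}", path),
        ScanPlanSketch::CsvScan { path, has_header } => {
            format!("CsvScan: {} (header={})", path, has_header)
        }
    }
}
```

A plan built from ParquetScan or CsvScan needs no execution context until it is actually executed.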





[jira] [Created] (ARROW-8782) [Rust] [DataFusion] Add benchmarks based on NYC Taxi data set

2020-05-13 Thread Andy Grove (Jira)
Andy Grove created ARROW-8782:
-

 Summary: [Rust] [DataFusion] Add benchmarks based on NYC Taxi data 
set
 Key: ARROW-8782
 URL: https://issues.apache.org/jira/browse/ARROW-8782
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


I plan on adding a new benchmarks folder beneath the datafusion crate, 
containing benchmarks based on the NYC Taxi data set. The benchmark will be a 
CLI and will support running a number of different queries against CSV and 
Parquet.

The README will contain instructions for downloading the data set.

The benchmark will produce CSV files containing results.

These benchmarks will allow us to manually verify performance before major 
releases and on an ongoing basis as we make changes to Arrow/Parquet/DataFusion.

I will be basing this on existing benchmarks I recently built in Ballista [1] 
(I am the only contributor to these benchmarks so far).

A dockerfile will be provided, making it easy to restrict CPU and RAM when 
running these benchmarks.

[1] https://github.com/ballista-compute/ballista/tree/master/rust/benchmarks


 





[jira] [Created] (ARROW-8774) [Rust] [DataFusion] Improve threading model

2020-05-12 Thread Andy Grove (Jira)
Andy Grove created ARROW-8774:
-

 Summary: [Rust] [DataFusion] Improve threading model
 Key: ARROW-8774
 URL: https://issues.apache.org/jira/browse/ARROW-8774
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


DataFusion currently spawns one thread per partition and this results in poor 
performance if there are more partitions than available cores/threads. It would 
be better to have a thread-pool that defaults to number of available cores.
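
A minimal sketch of the idea using only the standard library: a fixed number of workers, defaulting to the number of available cores, pull partitions from a shared queue. This is not DataFusion code; the function name is hypothetical, summing a Vec of integers stands in for processing a partition, and std::thread::available_parallelism requires Rust 1.59 or later.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Processes every partition on a bounded pool of worker threads and
/// returns one result (here: the sum) per partition, in partition order.
fn run_partitions(partitions: Vec<Vec<u64>>) -> Vec<u64> {
    // Default the pool size to the number of available cores.
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let partition_count = partitions.len();
    let worker_count = cores.min(partition_count.max(1));

    let (task_tx, task_rx) = mpsc::channel::<(usize, Vec<u64>)>();
    let task_rx = Arc::new(Mutex::new(task_rx));
    let (result_tx, result_rx) = mpsc::channel::<(usize, u64)>();

    let mut handles = Vec::new();
    for _ in 0..worker_count {
        let task_rx = Arc::clone(&task_rx);
        let result_tx = result_tx.clone();
        handles.push(thread::spawn(move || loop {
            // Hold the lock only while taking the next task off the queue.
            let task = task_rx.lock().unwrap().recv();
            match task {
                Ok((index, data)) => {
                    let sum: u64 = data.iter().sum();
                    result_tx.send((index, sum)).unwrap();
                }
                Err(_) => break, // queue closed: no more partitions
            }
        }));
    }
    drop(result_tx); // only the workers hold result senders now

    for (index, partition) in partitions.into_iter().enumerate() {
        task_tx.send((index, partition)).unwrap();
    }
    drop(task_tx); // close the queue so idle workers exit

    let mut results = vec![0u64; partition_count];
    for (index, sum) in result_rx {
        results[index] = sum;
    }
    for handle in handles {
        handle.join().unwrap();
    }
    results
}
```

With this shape, 100 partitions on an 8-core machine would use 8 threads rather than 100.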





[jira] [Created] (ARROW-8737) [Rust] [Parquet] Parquet array reader panics

2020-05-07 Thread Andy Grove (Jira)
Andy Grove created ARROW-8737:
-

 Summary: [Rust] [Parquet] Parquet array reader panics
 Key: ARROW-8737
 URL: https://issues.apache.org/jira/browse/ARROW-8737
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.17.0
Reporter: Andy Grove


I'm trying to read some parquet files produced by Apache Spark 3.0.0-preview2 
and the parquet crate is panicking. It should at least fail with an Err rather 
than panic.
{code:java}
thread '' panicked at 'index out of bounds: the len is 1024 but the 
index is 1087', 
/home/andy/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-0.17.0/src/arrow/record_reader.rs:415:21
stack backtrace:
   0: 0x564dbc25a9d4 - 
backtrace::backtrace::libunwind::trace::hfcd33194db0151d4
   at 
/cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
   1: 0x564dbc25a9d4 - 
backtrace::backtrace::trace_unsynchronized::hfd1904bbbd5335b5
   at 
/cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
   2: 0x564dbc25a9d4 - 
std::sys_common::backtrace::_print_fmt::h8476c57b177b254e
   at src/libstd/sys_common/backtrace.rs:78
   3: 0x564dbc25a9d4 - 
::fmt::h73acbc5f6d4b1044
   at src/libstd/sys_common/backtrace.rs:59
   4: 0x564dbc28727c - core::fmt::write::hdf236390fbd68d3d
   at src/libcore/fmt/mod.rs:1069
   5: 0x564dbc2536c3 - std::io::Write::write_fmt::h5722fa40bb2afafd
   at src/libstd/io/mod.rs:1532
   6: 0x564dbc25d2d5 - std::sys_common::backtrace::_print::ha468e873aada7c78
   at src/libstd/sys_common/backtrace.rs:62
   7: 0x564dbc25d2d5 - std::sys_common::backtrace::print::h149365a2f029de62
   at src/libstd/sys_common/backtrace.rs:49
   8: 0x564dbc25d2d5 - 
std::panicking::default_hook::{{closure}}::hb4a33f9e05934a52
   at src/libstd/panicking.rs:198
   9: 0x564dbc25d012 - std::panicking::default_hook::hc4535d7b0c743abd
   at src/libstd/panicking.rs:218
  10: 0x564dbc25d918 - 
std::panicking::rust_panic_with_hook::haa34a96a6dbd5a2e
   at src/libstd/panicking.rs:477
  11: 0x564dbc25d51b - rust_begin_unwind
   at src/libstd/panicking.rs:385
  12: 0x564dbc285071 - core::panicking::panic_fmt::hd101a87121fa411f
   at src/libcore/panicking.rs:89
  13: 0x564dbc285032 - 
core::panicking::panic_bounds_check::ha0668dcff6357ef4
   at src/libcore/panicking.rs:65
  14: 0x564dbbcdbf46 - 
parquet::arrow::record_reader::RecordReader::read_records::hc8f50faae4afaae7
  15: 0x564dbbc4da98 - 
 as 
parquet::arrow::array_reader::ArrayReader>::next_batch::hb4e5b687cd08ee46
  16: 0x564dbbcca3c9 -  as 
core::iter::traits::iterator::Iterator>::try_fold::h4206004da76eb745
  17: 0x564dbbc51c51 - ::next_batch::hf1c89300e65c72e8
  18: 0x564dbbcacaba - 
::next_batch::ha906d7eb32c7238a
  19: 0x564dbbbe33b8 - 
std::sys_common::backtrace::__rust_begin_short_backtrace::hc2fd908045ecbee0
  20: 0x564dbbb4a7ff - 
core::ops::function::FnOnce::call_once{{vtable.shim}}::h58c848a35fea035b
  21: 0x564dbc264f7a -  as 
core::ops::function::FnOnce>::call_once::ha26a994a135d55de
   at 
/rustc/1836e3b42a5b2f37fd79104eedbe8f48a5afdee6/src/liballoc/boxed.rs:1034
  22: 0x564dbc264f7a -  as 
core::ops::function::FnOnce>::call_once::h677072ad3ba2806b
   at 
/rustc/1836e3b42a5b2f37fd79104eedbe8f48a5afdee6/src/liballoc/boxed.rs:1034
  23: 0x564dbc264f7a - 
std::sys::unix::thread::Thread::new::thread_start::h7c46ce580f54dd0e
   at src/libstd/sys/unix/thread.rs:87
  24: 0x7f332cf79669 - start_thread
   at 
/build/glibc-t7JzpG/glibc-2.30/nptl/pthread_create.c:479
  25: 0x7f332ce85323 - clone
  26:0x0 - 
Error: DataFusionError(General("Error receiving batch: RecvError"))
 {code}
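
To illustrate what failing with an Err rather than a panic could look like, here is a small sketch of a bounds-checked read. It is not the record_reader.rs code; the error type and function name are hypothetical.

```rust
/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
enum ReadError {
    IndexOutOfBounds { index: usize, len: usize },
}

/// Reads one value from a buffer, returning an error instead of panicking
/// when the index is past the end of the buffer.
fn read_level(levels: &[i16], index: usize) -> Result<i16, ReadError> {
    levels
        .get(index)
        .copied()
        .ok_or(ReadError::IndexOutOfBounds { index, len: levels.len() })
}
```

With a buffer of length 1024 and index 1087, as in the panic message above, this returns an Err that the caller can propagate.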





[jira] [Created] (ARROW-8736) [Rust] [DataFusion] Table API should provide a schema() method

2020-05-07 Thread Andy Grove (Jira)
Andy Grove created ARROW-8736:
-

 Summary: [Rust] [DataFusion] Table API should provide a schema() 
method
 Key: ARROW-8736
 URL: https://issues.apache.org/jira/browse/ARROW-8736
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Table API should provide a schema() method. It is currently not possible to 
examine the schema of a registered table.





[jira] [Created] (ARROW-8735) [Rust] [Parquet] Parquet crate fails to compile on Arm architecture

2020-05-07 Thread Andy Grove (Jira)
Andy Grove created ARROW-8735:
-

 Summary: [Rust] [Parquet] Parquet crate fails to compile on Arm 
architecture
 Key: ARROW-8735
 URL: https://issues.apache.org/jira/browse/ARROW-8735
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.17.0
Reporter: Andy Grove
 Fix For: 1.0.0


I'm trying to compile the project on Raspbian, on a Raspberry Pi, and the build 
fails:
{code:java}
error[E0308]: mismatched types
  --> /home/pi/git/arrow/rust/parquet/src/util/hash_util.rs:26:37
   |
26 | fn hash_(data: &[u8], seed: u32) -> u32 {
   |-^^^ expected `u32`, found `()`
   ||
   |implicitly returns `()` as its body has no tail or `return` expression
 {code}
This method is only implemented for x86, x86_64 and aarch64.
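
One way to make such a function compile everywhere is to pair the architecture-specific definition with a cfg(not(...)) fallback. The sketch below shows only the shape of that approach: the function names are hypothetical, and the FNV-1a style hash is a stand-in chosen for brevity, not the hash the parquet crate actually uses.

```rust
/// Selected on the architectures that have a dedicated implementation.
/// In this sketch it simply delegates to the portable version.
#[cfg(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64"))]
fn hash_sketch(data: &[u8], seed: u32) -> u32 {
    portable_hash_sketch(data, seed)
}

/// Fallback for every other architecture (for example 32-bit Arm), so the
/// function always has a body that returns a u32.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64")))]
fn hash_sketch(data: &[u8], seed: u32) -> u32 {
    portable_hash_sketch(data, seed)
}

/// Stand-in portable hash (FNV-1a style, with the seed mixed into the basis).
fn portable_hash_sketch(data: &[u8], seed: u32) -> u32 {
    let mut hash = 0x811c_9dc5u32 ^ seed;
    for &byte in data {
        hash ^= u32::from(byte);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}
```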





[jira] [Created] (ARROW-8730) [Rust] Use slice instead of for function arguments

2020-05-07 Thread Andy Grove (Jira)
Andy Grove created ARROW-8730:
-

 Summary: [Rust] Use slice instead of  for function arguments
 Key: ARROW-8730
 URL: https://issues.apache.org/jira/browse/ARROW-8730
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.18


It is best practice to use slice instead of  for function arguments
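
A short sketch of the pattern, assuming the type elided above is a borrowed Vec such as &Vec<T>. The function name is made up for the example.

```rust
/// Takes a slice rather than a borrowed Vec, so callers can pass a Vec,
/// an array, or any sub-range without allocating.
fn total_len(values: &[String]) -> usize {
    values.iter().map(|s| s.len()).sum()
}
```

A caller holding a Vec can still write total_len(&names), because a reference to a Vec coerces to a slice, and can also pass a sub-range such as &names[1..].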





[jira] [Created] (ARROW-8701) [Rust] Unresolved import `crate::compute::util::simd_load_set_invalid` on Raspberry Pi

2020-05-04 Thread Andy Grove (Jira)
Andy Grove created ARROW-8701:
-

 Summary: [Rust] Unresolved import 
`crate::compute::util::simd_load_set_invalid` on Raspberry Pi
 Key: ARROW-8701
 URL: https://issues.apache.org/jira/browse/ARROW-8701
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.17.0
Reporter: Andy Grove
Assignee: Andy Grove


I'm trying to run some Rust code that has a dependency on the Arrow 0.17 crates 
and the build fails as follows.
{code:java}
error[E0432]: unresolved import `crate::compute::util::simd_load_set_invalid`
  --> 
/home/pi/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-0.17.0/src/compute/kernels/arithmetic.rs:42:5
   |
42 | use crate::compute::util::simd_load_set_invalid;
   | ^^^ no `simd_load_set_invalid` 
in `compute::util`
 {code}





[jira] [Created] (ARROW-8650) [Rust] [Website] Add documentation to Arrow website

2020-04-30 Thread Andy Grove (Jira)
Andy Grove created ARROW-8650:
-

 Summary: [Rust] [Website] Add documentation to Arrow website
 Key: ARROW-8650
 URL: https://issues.apache.org/jira/browse/ARROW-8650
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Website
Reporter: Andy Grove
 Fix For: 1.0.0


The documentation page [1] on the Arrow site has links for C, C++, Java, 
Python, JavaScript, and R. It would be good to add Rust here as well, even if 
the docs here are brief and link to the rustdocs on docs.rs [2] (which are 
currently broken due to ARROW-8536 [3]).

 

[1] [https://arrow.apache.org/docs/]

[2] https://docs.rs/crate/arrow/0.17.0

[3] https://issues.apache.org/jira/browse/ARROW-8536





[jira] [Created] (ARROW-8649) [Java] [Website] Java documentation on website is hidden

2020-04-30 Thread Andy Grove (Jira)
Andy Grove created ARROW-8649:
-

 Summary: [Java] [Website] Java documentation on website is hidden
 Key: ARROW-8649
 URL: https://issues.apache.org/jira/browse/ARROW-8649
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andy Grove
 Fix For: 1.0.0


There is some excellent Java documentation on the web site that is hard to find 
because the Java documentation link [1] goes straight to the generated 
javadocs.

 

 [1] https://arrow.apache.org/docs/java





[jira] [Created] (ARROW-8634) [Java] Create an example

2020-04-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8634:
-

 Summary: [Java] Create an example
 Key: ARROW-8634
 URL: https://issues.apache.org/jira/browse/ARROW-8634
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


The Java implementation doesn't seem to have any documentation or examples on 
how to get started with basic operations such as creating an array. Javadocs 
exist but how do new users even know which class to look for?

I would like to create an examples module and one simple example as a starting 
point. I hope to have a PR soon.





[jira] [Created] (ARROW-8614) [Website] Create Rust-specific 0.17.0 blog post

2020-04-28 Thread Andy Grove (Jira)
Andy Grove created ARROW-8614:
-

 Summary: [Website] Create Rust-specific 0.17.0 blog post
 Key: ARROW-8614
 URL: https://issues.apache.org/jira/browse/ARROW-8614
 Project: Apache Arrow
  Issue Type: Task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove


I wrote a brief blog post about DataFusion 0.17.0 [1] and Wes suggested that I 
post it on the Arrow blog. We might want to expand this to cover the Rust 
implementation of Arrow in general and it might be a good opportunity to talk 
about things we want help with for the next release (such as integration 
testing, implementing a parquet writer, etc).

 [1] https://andygrove.io/2020/04/datafusion-0.17.0/





[jira] [Created] (ARROW-8573) [Rust] Upgrade to Rust 1.44 nightly

2020-04-23 Thread Andy Grove (Jira)
Andy Grove created ARROW-8573:
-

 Summary: [Rust] Upgrade to Rust 1.44 nightly
 Key: ARROW-8573
 URL: https://issues.apache.org/jira/browse/ARROW-8573
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Rust 1.43.0 was just released, so we should update to 1.44 nightly.





[jira] [Created] (ARROW-8536) [Rust] Failed to locate format/Flight.proto in any parent directory

2020-04-20 Thread Andy Grove (Jira)
Andy Grove created ARROW-8536:
-

 Summary: [Rust] Failed to locate format/Flight.proto in any parent 
directory
 Key: ARROW-8536
 URL: https://issues.apache.org/jira/browse/ARROW-8536
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.17.0
Reporter: Andy Grove


When using Arrow 0.17.0 as a dependency, it is likely that you will get the 
error "Failed to locate format/Flight.proto in any parent directory".

The workaround is to create a directory `/format` in the root of your file 
system and place the Flight.proto file there.

 





[jira] [Created] (ARROW-8535) [Rust] Fix issues discovered when releasing 0.17.0

2020-04-20 Thread Andy Grove (Jira)
Andy Grove created ARROW-8535:
-

 Summary: [Rust] Fix issues discovered when releasing 0.17.0
 Key: ARROW-8535
 URL: https://issues.apache.org/jira/browse/ARROW-8535
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.17.0
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Issues ...

1) Arrow Cargo.toml does not specify a version for Arrow Flight. This is 
trivial to fix, we just need to add the "version =" part.

2) "Failed to locate format/Flight.proto in any parent directory" when 
publishing Arrow crate
{code:java}
error: failed to run custom build command for `arrow-flight v0.17.0`
Caused by:
  process didn't exit successfully: `/home/andy/apache-arrow-0.17.0/rust/target/package/arrow-0.17.0/target/debug/build/arrow-flight-33d3d930b565975b/build-script-build` (exit code: 1)
--- stderr
Error: "Failed to locate format/Flight.proto in any parent directory"
warning: build failed, waiting for other jobs to finish...
error: failed to verify package tarball
Caused by:
  build failed
 {code}
I'm not sure how to resolve this yet.

 





[jira] [Created] (ARROW-8464) [Rust] [DataFusion] Add support for dictionary types

2020-04-14 Thread Andy Grove (Jira)
Andy Grove created ARROW-8464:
-

 Summary: [Rust] [DataFusion] Add support for dictionary types
 Key: ARROW-8464
 URL: https://issues.apache.org/jira/browse/ARROW-8464
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Reporter: Andy Grove


 
 * BatchIterator should accept both DictionaryBatch and RecordBatch
 * Type Coercion optimizer rule should inject expression for converting 
dictionary value types to index types (for equality expressions, and IN(values, 
...))
 * Physical expression would look up index for dictionary values referenced in 
the query so that at runtime, only indices are being compared per batch
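
As a rough illustration of the last point, here is a minimal sketch of comparing indices instead of values. It assumes a simplified dictionary-encoded string column with unique dictionary values and no nulls; `DictColumn` and `eq_literal` are hypothetical names and not part of the Arrow or DataFusion API.

```rust
/// Hypothetical, simplified dictionary-encoded string column:
/// `keys[i]` is an index into `values` (assumed unique, no nulls).
struct DictColumn {
    values: Vec<String>,
    keys: Vec<usize>,
}

/// Evaluate `column = literal` by resolving the literal to a dictionary
/// index once per batch, then comparing integer keys for each row.
fn eq_literal(column: &DictColumn, literal: &str) -> Vec<bool> {
    match column.values.iter().position(|v| v == literal) {
        // The literal is in the dictionary: compare indices only.
        Some(index) => column.keys.iter().map(|&k| k == index).collect(),
        // The literal is not in the dictionary: no row can match.
        None => vec![false; column.keys.len()],
    }
}
```

The string comparison happens once per dictionary entry rather than once per row, which is the benefit the bullet above is aiming for.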



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8426) [Rust] [Parquet] Add support for writing dictionary types

2020-04-13 Thread Andy Grove (Jira)
Andy Grove created ARROW-8426:
-

 Summary: [Rust] [Parquet] Add support for writing dictionary types
 Key: ARROW-8426
 URL: https://issues.apache.org/jira/browse/ARROW-8426
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andy Grove






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8425) [Rust] [Parquet] Add support for writing timestamp types

2020-04-13 Thread Andy Grove (Jira)
Andy Grove created ARROW-8425:
-

 Summary: [Rust] [Parquet] Add support for writing timestamp types
 Key: ARROW-8425
 URL: https://issues.apache.org/jira/browse/ARROW-8425
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andy Grove






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8424) [Rust] [Parquet] Add support for writing floating point types

2020-04-13 Thread Andy Grove (Jira)
Andy Grove created ARROW-8424:
-

 Summary: [Rust] [Parquet] Add support for writing floating point 
types
 Key: ARROW-8424
 URL: https://issues.apache.org/jira/browse/ARROW-8424
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andy Grove






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8422) [Rust] Implement function to convert Arrow schema to Parquet schema

2020-04-13 Thread Andy Grove (Jira)
Andy Grove created ARROW-8422:
-

 Summary: [Rust] Implement function to convert Arrow schema to 
Parquet schema
 Key: ARROW-8422
 URL: https://issues.apache.org/jira/browse/ARROW-8422
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andy Grove


Implement function to convert Arrow schema to Parquet schema



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8421) [Rust] [Parquet] Implement parquet writer

2020-04-13 Thread Andy Grove (Jira)
Andy Grove created ARROW-8421:
-

 Summary: [Rust] [Parquet] Implement parquet writer
 Key: ARROW-8421
 URL: https://issues.apache.org/jira/browse/ARROW-8421
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove
 Fix For: 1.0.0


This is the parent story. See subtasks for more information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8407) [Rust] Add rustdoc for Dictionary type

2020-04-12 Thread Andy Grove (Jira)
Andy Grove created ARROW-8407:
-

 Summary: [Rust] Add rustdoc for Dictionary type
 Key: ARROW-8407
 URL: https://issues.apache.org/jira/browse/ARROW-8407
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


Add rustdoc for Dictionary type



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8366) [Rust] Need to revert recent arrow-flight build change

2020-04-07 Thread Andy Grove (Jira)
Andy Grove created ARROW-8366:
-

 Summary: [Rust] Need to revert recent arrow-flight build change
 Key: ARROW-8366
 URL: https://issues.apache.org/jira/browse/ARROW-8366
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


The PR [1] merged for ARROW-7794 causes problems with projects that have a 
dependency on this crate where the build.rs code becomes an infinite loop 
looking for a parent directory named "arrow" that doesn't exist.

This PR simply reverts that change. I will need to find a better approach to 
resolving the original issue.

 [1] https://github.com/apache/arrow/pull/6858
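
For illustration only, a parent-directory search that cannot loop forever might look like the sketch below. It is not the code from the PR; `find_flight_proto` is a hypothetical helper, and it assumes the caller passes an absolute starting directory (for example the value of `CARGO_MANIFEST_DIR` in a build script).

```rust
use std::path::{Path, PathBuf};

/// Search `start` and each of its parent directories for
/// `format/Flight.proto`. `Path::ancestors` stops at the filesystem
/// root, so the search always terminates.
fn find_flight_proto(start: &Path) -> Result<PathBuf, String> {
    for dir in start.ancestors() {
        let candidate = dir.join("format").join("Flight.proto");
        if candidate.is_file() {
            return Ok(candidate);
        }
    }
    // Hypothetical error text for the "not found" case.
    Err("Failed to locate format/Flight.proto in any parent directory".to_string())
}
```

Returning an error when the root is reached lets a dependent project fail fast with a clear message instead of hanging.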



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8357) [Rust] [DataFusion] Dockerfile for CLI is missing format dir

2020-04-06 Thread Andy Grove (Jira)
Andy Grove created ARROW-8357:
-

 Summary: [Rust] [DataFusion] Dockerfile for CLI is missing format 
dir
 Key: ARROW-8357
 URL: https://issues.apache.org/jira/browse/ARROW-8357
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


{code:java}
error: failed to run custom build command for `arrow-flight v1.0.0-SNAPSHOT (/arrow/rust/arrow-flight)`

Caused by:
  process didn't exit successfully: `/arrow/rust/target/release/build/arrow-flight-a0fb14daffea70f5/build-script-build` (exit code: 1)
--- stderr
Error: Custom { kind: Other, error: "protoc failed: ../../format: warning: directory does not exist.\nCould not make proto path relative: ../../format/Flight.proto: No such file or directory\n" }

warning: build failed, waiting for other jobs to finish...
error: failed to compile `datafusion v1.0.0-SNAPSHOT (/arrow/rust/datafusion)`, intermediate artifacts can be found at `/arrow/rust/target`

Caused by:
  build failed
 {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8289) Implement Arrow Parquet writer

2020-03-31 Thread Andy Grove (Jira)
Andy Grove created ARROW-8289:
-

 Summary: Implement Arrow Parquet writer
 Key: ARROW-8289
 URL: https://issues.apache.org/jira/browse/ARROW-8289
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Implement an Arrow writer for Parquet so that RecordBatches can be written to a 
Parquet file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8287) [Rust] Arrow examples should use utility to print results

2020-03-31 Thread Andy Grove (Jira)
Andy Grove created ARROW-8287:
-

 Summary: [Rust] Arrow examples should use utility to print results
 Key: ARROW-8287
 URL: https://issues.apache.org/jira/browse/ARROW-8287
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove
 Fix For: 1.0.0


[https://github.com/apache/arrow/pull/6773] added a utility for printing record 
batches and the DataFusion examples were updated to use this. We should now do 
the same for the Arrow examples. This will require moving the utility method 
from the datafusion crate to the arrow crate.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8265) [Rust] [DataFusion] Table API collect() should not require context

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8265:
-

 Summary: [Rust] [DataFusion] Table API collect() should not 
require context
 Key: ARROW-8265
 URL: https://issues.apache.org/jira/browse/ARROW-8265
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


The Table API requires the context to be passed into the collect() method which 
leads to this odd code.
{code:java}
let results = ctx.table("alltypes_plain")?
    .filter(col("c12").gt(_f64(0.5)))?
    .aggregate(vec![col("c1")], vec![min(col("c12"))])?
    .collect( ctx, 1024)?;
{code}
Since the table comes from the context, it should not be necessary to pass the 
context back in.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8264) [Rust] Create utility for printing record batches

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8264:
-

 Summary: [Rust] Create utility for printing record batches
 Key: ARROW-8264
 URL: https://issues.apache.org/jira/browse/ARROW-8264
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove
 Fix For: 1.0.0


It is too difficult to write examples that print record batches and it would be 
good to have a utility method to print a batch or to get rows from a batch as a 
Vec. We already have code in the CSV writer that could be repurposed.

Another option is to modify the CSV writer to be able to print to a string 
rather than a file.
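
To make the idea concrete, here is a minimal sketch of such a utility. It assumes the batch has already been converted to rows of strings; `format_table` is a hypothetical helper and not an existing Arrow function. Widths are computed in bytes, so alignment is only exact for ASCII values.

```rust
/// Render column names and rows of already-stringified values as an
/// aligned text table, one line per row.
fn format_table(columns: &[&str], rows: &[Vec<String>]) -> String {
    // Each column is as wide as its longest header or cell.
    let mut widths: Vec<usize> = columns.iter().map(|c| c.len()).collect();
    for row in rows {
        for (width, cell) in widths.iter_mut().zip(row) {
            *width = (*width).max(cell.len());
        }
    }

    // Left-align every cell to its column width and join with separators.
    let render = |cells: Vec<&str>| -> String {
        let padded: Vec<String> = cells
            .iter()
            .zip(&widths)
            .map(|(cell, width)| format!("{:<width$}", cell, width = *width))
            .collect();
        format!("| {} |", padded.join(" | "))
    };

    let mut lines = vec![render(columns.to_vec())];
    for row in rows {
        lines.push(render(row.iter().map(|s| s.as_str()).collect()));
    }
    lines.join("\n")
}
```

Because the function returns a `String`, examples can print it directly and tests can compare it against an expected value.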



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8263) [Rust] [DataFusion] Add documentation for supported SQL functions

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8263:
-

 Summary: [Rust] [DataFusion] Add documentation for supported SQL 
functions
 Key: ARROW-8263
 URL: https://issues.apache.org/jira/browse/ARROW-8263
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


Add documentation for supported SQL functions



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8262) [Rust] [DataFusion] Add example that uses LogicalPlanBuilder

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8262:
-

 Summary: [Rust] [DataFusion] Add example that uses 
LogicalPlanBuilder
 Key: ARROW-8262
 URL: https://issues.apache.org/jira/browse/ARROW-8262
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


Add example that uses LogicalPlanBuilder



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8261) [Rust] [DataFusion] LogicalPlanBuilder.limit() should take a literal argument

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8261:
-

 Summary: [Rust] [DataFusion] LogicalPlanBuilder.limit() should 
take a literal argument
 Key: ARROW-8261
 URL: https://issues.apache.org/jira/browse/ARROW-8261
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


LogicalPlanBuilder.limit() should take a literal argument rather than requiring 
an expression representing a literal value, or maybe we have two versions of 
this method.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8260) [Rust] [DataFusion] Add validation for unreferenced table in query

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8260:
-

 Summary: [Rust] [DataFusion] Add validation for unreferenced table 
in query
 Key: ARROW-8260
 URL: https://issues.apache.org/jira/browse/ARROW-8260
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andy Grove
 Fix For: 1.0.0


This is an edge case, but the query "SELECT 1 FROM t" causes an error in the 
Parquet reader because we are not reading any columns. We should have the query 
planner recognize this and fail the query as invalid.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8259) [Rust] [DataFusion] ProjectionPushDownRule does not rewrite LIMIT

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8259:
-

 Summary: [Rust] [DataFusion] ProjectionPushDownRule does not 
rewrite LIMIT
 Key: ARROW-8259
 URL: https://issues.apache.org/jira/browse/ARROW-8259
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


ProjectionPushDownRule does not rewrite LIMIT



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8258) [Rust] [DataFusion] SELECT * LIMIT 1 fails with schema error

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8258:
-

 Summary: [Rust] [DataFusion] SELECT * LIMIT 1 fails with schema 
error
 Key: ARROW-8258
 URL: https://issues.apache.org/jira/browse/ARROW-8258
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


{code:java}
> SELECT * FROM taxi LIMIT 1;
General("InvalidArgumentError(\"column types must match schema types, expected 
Timestamp(Microsecond, None) but found UInt64 at column index 1\")") {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8256) [Rust] [DataFusion] Update CLI documentation for 0.17.0 release

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8256:
-

 Summary: [Rust] [DataFusion] Update CLI documentation for 0.17.0 
release
 Key: ARROW-8256
 URL: https://issues.apache.org/jira/browse/ARROW-8256
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andy Grove
Assignee: Andy Grove


Update CLI documentation for 0.17.0 release



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8255) [Rust] [DataFusion] COUNT(*) results in confusing error

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8255:
-

 Summary: [Rust] [DataFusion] COUNT(*) results in confusing error
 Key: ARROW-8255
 URL: https://issues.apache.org/jira/browse/ARROW-8255
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


COUNT(*) is not supported and results in a confusing error. We should implement 
this support or at least provide an error saying that it isn't supported.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8254) [Rust] [DataFusion] Cannot run SELECT COUNT(*) against CSV

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8254:
-

 Summary: [Rust] [DataFusion] Cannot run SELECT COUNT(*) against CSV
 Key: ARROW-8254
 URL: https://issues.apache.org/jira/browse/ARROW-8254
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 0.17.0


{code:java}
> SELECT COUNT(*) FROM aggregate_test_100;

ArrowError(InvalidArgumentError("at least one column must be defined to create 
a record batch"))
 {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8253) [Rust] [DataFusion] Improve ergonomics of registering UDFs

2020-03-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-8253:
-

 Summary: [Rust] [DataFusion] Improve ergonomics of registering UDFs
 Key: ARROW-8253
 URL: https://issues.apache.org/jira/browse/ARROW-8253
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Creating and registering UDFs currently requires quite a lot of boilerplate 
code and it would be good to improve this. See the comments on 
[https://github.com/apache/arrow/pull/6749] for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8249) [Rust] [DataFusion] Make Table and LogicalPlanBuilder APIs more consistent

2020-03-27 Thread Andy Grove (Jira)
Andy Grove created ARROW-8249:
-

 Summary: [Rust] [DataFusion] Make Table and LogicalPlanBuilder 
APIs more consistent
 Key: ARROW-8249
 URL: https://issues.apache.org/jira/browse/ARROW-8249
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


We now have two similar APIs with Table and LogicalPlanBuilder and although 
they are similar, there are some differences and it would be good to unify 
them. There is also code duplication and it most likely makes sense for the 
Table API to delegate to the query builder API to build logical plans.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8243) [Rust] [DataFusion] Fix inconsistent API in LogicalPlanBuilder

2020-03-27 Thread Andy Grove (Jira)
Andy Grove created ARROW-8243:
-

 Summary: [Rust] [DataFusion] Fix inconsistent API in 
LogicalPlanBuilder
 Key: ARROW-8243
 URL: https://issues.apache.org/jira/browse/ARROW-8243
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andy Grove
Assignee: Andy Grove


LogicalPlanBuilder project method takes a  whereas other methods take a 
Vec. It makes sense to take Vec and take ownership of these inputs since they 
are being used to build the plan.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8241) Add convenience methods to Schema

2020-03-27 Thread Andy Grove (Jira)
Andy Grove created ARROW-8241:
-

 Summary: Add convenience methods to Schema
 Key: ARROW-8241
 URL: https://issues.apache.org/jira/browse/ARROW-8241
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0


I would like to add the following methods to Schema to make it easier to work 
with.

 
{code:java}
pub fn field_with_name(, name: ) -> Result<>;

pub fn index_of(, name: ) -> Result;
{code}
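
A minimal sketch of what these two lookups could look like is shown below. The parameter and return types here are assumptions made for illustration, and `Field` and `Schema` are simplified stand-ins rather than the real Arrow types.

```rust
/// Simplified stand-in for an Arrow field (name only).
#[derive(Debug, PartialEq)]
struct Field {
    name: String,
}

/// Simplified stand-in for an Arrow schema.
struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    /// Position of the first field called `name`, or an error message.
    fn index_of(&self, name: &str) -> Result<usize, String> {
        self.fields
            .iter()
            .position(|f| f.name == name)
            .ok_or_else(|| format!("no field named '{}'", name))
    }

    /// Reference to the first field called `name`, or an error message.
    fn field_with_name(&self, name: &str) -> Result<&Field, String> {
        self.index_of(name).map(|i| &self.fields[i])
    }
}
```

Implementing `field_with_name` on top of `index_of` keeps the lookup logic in one place.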



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8205) [Rust] Arrow should enforce unique field names in a schema

2020-03-24 Thread Andy Grove (Jira)
Andy Grove created ARROW-8205:
-

 Summary: [Rust] Arrow should enforce unique field names in a schema
 Key: ARROW-8205
 URL: https://issues.apache.org/jira/browse/ARROW-8205
 Project: Apache Arrow
  Issue Type: Improvement
Affects Versions: 0.16.0
Reporter: Andy Grove
 Fix For: 0.17.0


There does not seem to be any validation to avoid schemas being created with 
duplicate field names. We should add this along with unit tests.
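
As a sketch of the kind of check meant here, the function below rejects a list of field names that contains a duplicate. `validate_unique_names` is a hypothetical helper, not an existing Arrow function, and it works on plain names rather than real `Field` values.

```rust
use std::collections::HashSet;

/// Return an error naming the first field name that appears more than once.
fn validate_unique_names(names: &[&str]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for name in names {
        // `insert` returns false when the value was already present.
        if !seen.insert(*name) {
            return Err(format!("duplicate field name '{}'", name));
        }
    }
    Ok(())
}
```

A schema constructor could call a check like this before building the schema and return the error to the caller.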



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8204) [Rust] [DataFusion] Add support for aliased expressions in SQL

2020-03-24 Thread Andy Grove (Jira)
Andy Grove created ARROW-8204:
-

 Summary: [Rust] [DataFusion] Add support for aliased expressions 
in SQL
 Key: ARROW-8204
 URL: https://issues.apache.org/jira/browse/ARROW-8204
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.17.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8123) [Rust] [DataFusion] Create LogicalPlanBuilder

2020-03-15 Thread Andy Grove (Jira)
Andy Grove created ARROW-8123:
-

 Summary: [Rust] [DataFusion] Create LogicalPlanBuilder
 Key: ARROW-8123
 URL: https://issues.apache.org/jira/browse/ARROW-8123
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Building logical plans is arduous and a builder would make this nicer. Example:
{code:java}
let plan = LogicalPlanBuilder::new()
    .scan(
        "default",
        "employee.csv",
        _schema(),
        Some(vec![0, 3]),
    )?
    .filter(col(1).eq(_str("CO")))?
    .project(vec![col(0)])?
    .build()?;
{code}
Note that I am already working on this and will have a PR shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7947) [Rust] [Flight] [DataFusion] Implement example for get_schema

2020-02-26 Thread Andy Grove (Jira)
Andy Grove created ARROW-7947:
-

 Summary: [Rust] [Flight] [DataFusion] Implement example for 
get_schema
 Key: ARROW-7947
 URL: https://issues.apache.org/jira/browse/ARROW-7947
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Implement example for get_schema and implement the required helper methods.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7941) [Rust] Logical plan should refer to columns by name not index

2020-02-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-7941:
-

 Summary: [Rust] Logical plan should refer to columns by name not 
index
 Key: ARROW-7941
 URL: https://issues.apache.org/jira/browse/ARROW-7941
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Affects Versions: 0.16.0
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


I made a mistake in the design of the logical plan. It is better to refer to 
columns by name rather than index.

Benefits of making this change:
 * Allows for support for schemaless data sources e.g. JSON
 * Reduces the complexity of the optimizer rules

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7794) [Rust] cargo publish fails for arrow-flight due to relative path to Flight.proto

2020-02-07 Thread Andy Grove (Jira)
Andy Grove created ARROW-7794:
-

 Summary: [Rust] cargo publish fails for arrow-flight due to 
relative path to Flight.proto
 Key: ARROW-7794
 URL: https://issues.apache.org/jira/browse/ARROW-7794
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.16.0
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Running "cargo publish" for the arrow-flight crate resulted in this error:
{code:java}
error: failed to run custom build command for `arrow-flight v0.16.0 (/home/andy/apache-arrow-0.16.0/rust/target/package/arrow-flight-0.16.0)`

Caused by:
  process didn't exit successfully: `/home/andy/apache-arrow-0.16.0/rust/target/package/arrow-flight-0.16.0/target/debug/build/arrow-flight-1b2906a3933d2832/build-script-build` (exit code: 1)
--- stderr
Error: Custom { kind: Other, error: "protoc failed: ../../format: warning: directory does not exist.\nCould not make proto path relative: ../../format/Flight.proto: No such file or directory\n" }
 {code}
The workaround was to edit the build.rs and make the path absolute and then run 
"cargo publish --allow-dirty", but we should find a better solution before the 
next release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7763) [Java] Update README with instructions for IntelliJ users

2020-02-04 Thread Andy Grove (Jira)
Andy Grove created ARROW-7763:
-

 Summary: [Java] Update README with instructions for IntelliJ users
 Key: ARROW-7763
 URL: https://issues.apache.org/jira/browse/ARROW-7763
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


IntelliJ needs to be configured to use the errorprone compiler and this is not 
currently documented, making it hard for new contributors to build/test the 
project. We can pretty much just link to the instructions at 
https://errorprone.info/docs/installation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7756) [Java] Explore using Avatica as basis for Flight JDBC Driver

2020-02-03 Thread Andy Grove (Jira)
Andy Grove created ARROW-7756:
-

 Summary: [Java] Explore using Avatica as basis for Flight JDBC 
Driver
 Key: ARROW-7756
 URL: https://issues.apache.org/jira/browse/ARROW-7756
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Explore using Avatica as basis for Flight JDBC Driver to see how suitable it is 
compared to building the JDBC driver from the ground up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7744) [Java] Implement Flight JDBC Driver

2020-02-02 Thread Andy Grove (Jira)
Andy Grove created ARROW-7744:
-

 Summary: [Java] Implement Flight JDBC Driver
 Key: ARROW-7744
 URL: https://issues.apache.org/jira/browse/ARROW-7744
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


As a Java developer, I would like the ability to use JDBC to interact with 
Flight servers. For example, there is now an example in the Arrow repo to run a 
Flight server wrapping DataFusion and it supports executing SQL against CSV and 
Parquet files. I would like to be able to call this from Java.

An Arrow Flight JDBC driver would also simplify developing integrations 
with other Apache projects, such as building a Spark V2 Data Source or a Drill 
storage plugin. It would also be directly usable from many BI tools.

I propose that the class name of the driver should be 
"org.apache.arrow.jdbc.Driver" and the connection string should be 
"jdbc:arrow://host:port?[properties]". I'm purposely leaving "flight" out of 
these because I don't think it makes sense to support multiple protocols now 
that we have flight and it is easier for users to remember "arrow" rather than 
needing to know about the protocol. This is easy to change if there are 
objections.

JDBC is designed around sending queries as strings and then receiving results. 
These strings could be SQL queries, JSON-encoded query plans, or something 
else. The JDBC driver will not make any assumptions about the format or dialect 
of these strings. Queries would be executed using the "DoGet" method.

The JDBC metadata functionality for reading schema information could possibly 
use ListFlights but I haven't looked into this part yet.

I do expect that this JDBC driver will serve as a base that could be extended 
to add specific functionality for different Flight servers rather than attempt 
to support them all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7684) [Rust] Provide example of Flight server

2020-01-26 Thread Andy Grove (Jira)
Andy Grove created ARROW-7684:
-

 Summary: [Rust] Provide example of Flight server
 Key: ARROW-7684
 URL: https://issues.apache.org/jira/browse/ARROW-7684
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove
 Fix For: 1.0.0


Now that IPC is in place and we have the Flight crate, it should be possible to 
build a working Flight server in Rust and call it from other languages such as 
Java.

I have started an initial attempt at this in 
[https://github.com/andygrove/datafusion-flight-poc/] with a JVM client and 
Rust server wrapping DataFusion, and this is probably overly complex for an 
example to be included in Rust, but might provide some inspiration.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7642) [Rust] Create build.rs to generate flatbuffers files

2020-01-21 Thread Andy Grove (Jira)
Andy Grove created ARROW-7642:
-

 Summary: [Rust] Create build.rs to generate flatbuffers files
 Key: ARROW-7642
 URL: https://issues.apache.org/jira/browse/ARROW-7642
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove
 Fix For: 1.0.0


We should take the logic from the regen.sh [1] bash script and convert it into 
a Rust build.rs script that can run in CI. This would require flatc to be 
installed to be able to build the project.

 

[1] https://github.com/apache/arrow/blob/master/rust/arrow/regen.sh



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7253) [CI] Fix master failure with release test

2019-11-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-7253:
-

 Summary: [CI] Fix master failure with release test
 Key: ARROW-7253
 URL: https://issues.apache.org/jira/browse/ARROW-7253
 Project: Apache Arrow
  Issue Type: Bug
  Components: CI
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Fix master failure with release test



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7249) [CI] Release test fails in master due to new arrow-flight Rust crate

2019-11-24 Thread Andy Grove (Jira)
Andy Grove created ARROW-7249:
-

 Summary: [CI] Release test fails in master due to new arrow-flight 
Rust crate
 Key: ARROW-7249
 URL: https://issues.apache.org/jira/browse/ARROW-7249
 Project: Apache Arrow
  Issue Type: Bug
  Components: CI
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


See https://github.com/apache/arrow/runs/318192961



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7192) [Rust] Implement Flight crate

2019-11-16 Thread Andy Grove (Jira)
Andy Grove created ARROW-7192:
-

 Summary: [Rust] Implement Flight crate
 Key: ARROW-7192
 URL: https://issues.apache.org/jira/browse/ARROW-7192
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove


Implement a flight crate that can run a gRPC server that understands the flight 
protocol. This will be a library and the user will need to provide 
implementations of one or more traits to plug in their logic.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7003) [Format] [Rust] Generate flatbuffers files in build script

2019-10-27 Thread Andy Grove (Jira)
Andy Grove created ARROW-7003:
-

 Summary: [Format] [Rust] Generate flatbuffers files in build script
 Key: ARROW-7003
 URL: https://issues.apache.org/jira/browse/ARROW-7003
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Affects Versions: 1.0.0
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


We should generate the flatbuffers files rather than check them in.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6947) [Rust] [DataFusion] Add support for scalar UDFs

2019-10-20 Thread Andy Grove (Jira)
Andy Grove created ARROW-6947:
-

 Summary: [Rust] [DataFusion] Add support for scalar UDFs
 Key: ARROW-6947
 URL: https://issues.apache.org/jira/browse/ARROW-6947
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove


As a user, I would like to be able to define my own functions and then use them 
in SQL statements.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6945) [Rust] Enable integration tests

2019-10-19 Thread Andy Grove (Jira)
Andy Grove created ARROW-6945:
-

 Summary: [Rust] Enable integration tests
 Key: ARROW-6945
 URL: https://issues.apache.org/jira/browse/ARROW-6945
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Use docker-compose to generate test files using the Java implementation and 
then have Rust tests read them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6892) [Rust] [DataFusion] Implement optimizer rule to remove redundant projections

2019-10-15 Thread Andy Grove (Jira)
Andy Grove created ARROW-6892:
-

 Summary: [Rust] [DataFusion] Implement optimizer rule to remove 
redundant projections
 Key: ARROW-6892
 URL: https://issues.apache.org/jira/browse/ARROW-6892
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Currently we have code in the SQL query planner that wraps aggregate queries in 
a projection (if needed) to preserve the order of the final results. This is 
needed because the aggregate query execution always returns a result with 
grouping expressions first and then aggregate expressions.

It would be better (simpler, more readable code) to always wrap aggregates in 
projections and have an optimizer rule to remove redundant projections. There 
are likely other use cases where redundant projections might exist too.
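
To illustrate the proposed rule, here is a minimal sketch over a toy plan representation. The `Plan` enum and `remove_redundant_projections` function are hypothetical and much simpler than DataFusion's logical plan; a projection is treated as redundant only when it selects exactly the columns its input already produces, in the same order.

```rust
/// Toy logical plan with just enough structure to show the rule.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { columns: Vec<String> },
    Projection { columns: Vec<String>, input: Box<Plan> },
}

impl Plan {
    /// Names of the columns this plan node produces, in order.
    fn output_columns(&self) -> &[String] {
        match self {
            Plan::Scan { columns } | Plan::Projection { columns, .. } => columns,
        }
    }
}

/// Drop any projection whose output is identical to its input's output.
fn remove_redundant_projections(plan: Plan) -> Plan {
    match plan {
        Plan::Projection { columns, input } => {
            // Optimize the child first so nested projections collapse too.
            let input = remove_redundant_projections(*input);
            if input.output_columns() == columns.as_slice() {
                input
            } else {
                Plan::Projection { columns, input: Box::new(input) }
            }
        }
        other => other,
    }
}
```

With a rule like this in place, the planner could wrap every aggregate in a projection unconditionally and rely on the optimizer to remove the ones that change nothing.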



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6891) [Rust] [Parquet] Add Utf8 support to ArrowReader

2019-10-15 Thread Andy Grove (Jira)
Andy Grove created ARROW-6891:
-

 Summary: [Rust] [Parquet] Add Utf8 support to ArrowReader 
 Key: ARROW-6891
 URL: https://issues.apache.org/jira/browse/ARROW-6891
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Affects Versions: 1.0.0
Reporter: Andy Grove
 Fix For: 1.0.0


Add Utf8 support to ArrowReader



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6890) [Rust] [Parquet] ArrowReader fails with seg fault

2019-10-15 Thread Andy Grove (Jira)
Andy Grove created ARROW-6890:
-

 Summary: [Rust] [Parquet] ArrowReader fails with seg fault
 Key: ARROW-6890
 URL: https://issues.apache.org/jira/browse/ARROW-6890
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 1.0.0
Reporter: Andy Grove
 Fix For: 1.0.0


ArrowReader fails with a segmentation fault when trying to read an unsupported 
type, such as Utf8. It should return an Err instead.

 

See [https://github.com/apache/arrow/pull/5641] for a reproducible test.
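
A minimal sketch of the desired behaviour, assuming the reader dispatches on the column's data type. The DataType, ReaderError, ColumnReader and build_column_reader names are hypothetical stand-ins for illustration, not the actual parquet crate API.

```rust
use std::fmt;

/// Hypothetical subset of data types, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Float64,
    Utf8,
}

/// Hypothetical error type returned instead of crashing.
#[derive(Debug, PartialEq)]
enum ReaderError {
    UnsupportedType(DataType),
}

impl fmt::Display for ReaderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ReaderError::UnsupportedType(t) => write!(f, "unsupported data type: {:?}", t),
        }
    }
}

impl std::error::Error for ReaderError {}

/// Placeholder for the per-type readers a real implementation would build.
#[derive(Debug, PartialEq)]
enum ColumnReader {
    Int32,
    Float64,
}

/// Builds a reader for supported types and reports every other type as an
/// error, so the caller can handle it instead of hitting undefined behaviour.
fn build_column_reader(data_type: &DataType) -> Result<ColumnReader, ReaderError> {
    match data_type {
        DataType::Int32 => Ok(ColumnReader::Int32),
        DataType::Float64 => Ok(ColumnReader::Float64),
        other => Err(ReaderError::UnsupportedType(other.clone())),
    }
}
```

The catch-all arm is the important part: any type without an explicit reader is rejected up front rather than being passed to code that assumes a supported layout.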





[jira] [Created] (ARROW-6803) [Rust] [DataFusion] Aggregate queries are slower with new physical query plan

2019-10-07 Thread Andy Grove (Jira)
Andy Grove created ARROW-6803:
-

 Summary: [Rust] [DataFusion] Aggregate queries are slower with new 
physical query plan
 Key: ARROW-6803
 URL: https://issues.apache.org/jira/browse/ARROW-6803
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Affects Versions: 1.0.0
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Executing directly from the logical plan:
{code:java}
aggregate_query_no_group_by
time:   [13.096 us 13.187 us 13.294 us]
change: [-88.712% -88.554% -88.398%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

aggregate_query_group_by
time:   [44.153 us 44.816 us 45.541 us]
change: [-77.984% -77.485% -77.009%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

aggregate_query_group_by_with_filter
time:   [75.383 us 76.076 us 76.817 us]
change: [-72.345% -71.811% -71.097%] (p = 0.00 < 0.05)
Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
{code}
Executing from the physical plan:
{code:java}
aggregate_query_no_group_by
time:   [112.13 us 113.63 us 115.26 us]
change: [-3.8005% -2.0342% -0.3584%] (p = 0.02 < 0.05)
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

aggregate_query_group_by
time:   [195.12 us 198.63 us 202.39 us]
change: [-1.3814% +1.0612% +3.5732%] (p = 0.40 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

aggregate_query_group_by_with_filter
time:   [270.69 us 272.18 us 273.63 us]
change: [-2.1583% -0.4877% +1.0161%] (p = 0.56 > 0.05)
No change in performance detected.
{code}





[jira] [Created] (ARROW-6801) [Rust] Arrow source release tarball is missing benchmarks

2019-10-06 Thread Andy Grove (Jira)
Andy Grove created ARROW-6801:
-

 Summary: [Rust] Arrow source release tarball is missing benchmarks
 Key: ARROW-6801
 URL: https://issues.apache.org/jira/browse/ARROW-6801
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.15.0
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


The Arrow source tarball omits the benchmarks, but Cargo.toml still references 
them, which causes the release script to fail. We need to include the 
benchmarks in future source tarballs to avoid having to release the crates 
manually.





[jira] [Created] (ARROW-6798) [CI] [Rust] Improve build times by caching dependencies in the Docker image

2019-10-06 Thread Andy Grove (Jira)
Andy Grove created ARROW-6798:
-

 Summary: [CI] [Rust] Improve build times by caching dependencies 
in the Docker image
 Key: ARROW-6798
 URL: https://issues.apache.org/jira/browse/ARROW-6798
 Project: Apache Arrow
  Issue Type: Test
  Components: CI, Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Improve Rust build times by caching dependencies in the Docker image





[jira] [Created] (ARROW-6761) [Rust] Travis builds failing due to Rust Internal Compiler Error

2019-10-01 Thread Andy Grove (Jira)
Andy Grove created ARROW-6761:
-

 Summary: [Rust] Travis builds failing due to Rust Internal 
Compiler Error
 Key: ARROW-6761
 URL: https://issues.apache.org/jira/browse/ARROW-6761
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 1.0.0
Reporter: Andy Grove
 Fix For: 1.0.0


Travis builds recently started failing with a Rust ICE (Internal Compiler 
Error) which has been reported to the Rust compiler team 
([https://github.com/rust-lang/rust/issues/64908]).

 





[jira] [Created] (ARROW-6736) [Rust] [DataFusion] Aggregate expressions get evaluated repeatedly

2019-09-29 Thread Andy Grove (Jira)
Andy Grove created ARROW-6736:
-

 Summary: [Rust] [DataFusion] Aggregate expressions get evaluated 
repeatedly
 Key: ARROW-6736
 URL: https://issues.apache.org/jira/browse/ARROW-6736
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Affects Versions: 0.15.0
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


There is a design flaw in the new aggregate expression traits and 
implementations where the input to the aggregate expression gets evaluated 
against the whole batch once for each row in the batch. For example, if the 
batch has 1024 rows then the expression gets evaluated 1024 times instead of 
once.
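
A minimal sketch of the difference, using hypothetical stand-ins (Batch, CountingExpr) rather than the actual DataFusion traits. The expression counts its own evaluations so that the cost of each pattern is visible.

```rust
/// Hypothetical stand-in for a record batch with a single integer column.
struct Batch {
    values: Vec<i64>,
}

/// Hypothetical input expression that records how often it is evaluated.
struct CountingExpr {
    evaluations: usize,
}

impl CountingExpr {
    /// Evaluates the expression against the whole batch, one result per row.
    fn evaluate(&mut self, batch: &Batch) -> Vec<i64> {
        self.evaluations += 1;
        batch.values.iter().map(|v| v * 2).collect()
    }
}

/// Flawed pattern: the input is re-evaluated against the whole batch for
/// every row, so an N-row batch costs N full evaluations.
fn sum_per_row(expr: &mut CountingExpr, batch: &Batch) -> i64 {
    let mut sum = 0;
    for row in 0..batch.values.len() {
        sum += expr.evaluate(batch)[row];
    }
    sum
}

/// Intended pattern: evaluate the input once per batch, then accumulate.
fn sum_per_batch(expr: &mut CountingExpr, batch: &Batch) -> i64 {
    expr.evaluate(batch).into_iter().sum()
}
```

Both functions return the same sum; for a 1024-row batch the first evaluates the input 1024 times and the second evaluates it once.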





[jira] [Created] (ARROW-6731) [CI] [Rust] Set up Github Action to run Rust tests

2019-09-27 Thread Andy Grove (Jira)
Andy Grove created ARROW-6731:
-

 Summary: [CI] [Rust] Set up Github Action to run Rust tests
 Key: ARROW-6731
 URL: https://issues.apache.org/jira/browse/ARROW-6731
 Project: Apache Arrow
  Issue Type: Improvement
  Components: CI, Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


Set up Github Action to run Rust tests





[jira] [Created] (ARROW-6718) [Rust] packed_simd requires nightly

2019-09-26 Thread Andy Grove (Jira)
Andy Grove created ARROW-6718:
-

 Summary: [Rust] packed_simd requires nightly 
 Key: ARROW-6718
 URL: https://issues.apache.org/jira/browse/ARROW-6718
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Andy Grove


{code:java}
error[E0554]: `#![feature]` may not be used on the stable release channel
   --> /home/andy/.cargo/registry/src/github.com-1ecc6299db9ec823/packed_simd-0.3.3/src/lib.rs:202:1
    |
202 | / #![feature(
203 | |     repr_simd,
204 | |     const_fn,
205 | |     platform_intrinsics,
...   |
215 | |     custom_inner_attributes
216 | | )]
    | |__^
 {code}





[jira] [Created] (ARROW-6717) Support stable Rust

2019-09-26 Thread Andy Grove (Jira)
Andy Grove created ARROW-6717:
-

 Summary: Support stable Rust
 Key: ARROW-6717
 URL: https://issues.apache.org/jira/browse/ARROW-6717
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andy Grove


I'm creating this issue to track all the stories we need to implement to be 
able to use stable Rust.





[jira] [Created] (ARROW-6716) [CI] [Rust] New 1.40.0 nightly causing builds to fail

2019-09-26 Thread Andy Grove (Jira)
Andy Grove created ARROW-6716:
-

 Summary: [CI] [Rust] New 1.40.0 nightly causing builds to fail
 Key: ARROW-6716
 URL: https://issues.apache.org/jira/browse/ARROW-6716
 Project: Apache Arrow
  Issue Type: Bug
  Components: CI, Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


So much for pinning the nightly version ... that doesn't work when there is a 
new major version of a nightly apparently.

Travis is now using:
rustc 1.40.0-nightly (37538aa13 2019-09-25)
Despite rust-toolchain containing:
{code:java}
nightly-2019-07-30 {code}
 

 





[jira] [Created] (ARROW-6706) [Developer Tools] Cannot merge PRs from authors with "Á" (U+00C1) in their name

2019-09-26 Thread Andy Grove (Jira)
Andy Grove created ARROW-6706:
-

 Summary: [Developer Tools] Cannot merge PRs from authors with "Á" 
(U+00C1) in their name
 Key: ARROW-6706
 URL: https://issues.apache.org/jira/browse/ARROW-6706
 Project: Apache Arrow
  Issue Type: Bug
  Components: Developer Tools
Reporter: Andy Grove


I tried merging a PR from Ádám Lippai ([https://github.com/alippai]) and the 
merge script failed with:


 
{code:java}
./dev/merge_arrow_pr.py
ARROW_HOME = /home/andy/git/andygrove/arrow/dev
PROJECT_NAME = arrow
Which pull request would you like to merge? (e.g. 34): 5499
Env APACHE_JIRA_USERNAME not set, please enter your JIRA username:andygrove
Env APACHE_JIRA_PASSWORD not set, please enter your JIRA password:
=== Pull Request #5499 ===
title   ARROW-6705: [Rust] [DataFusion] README has invalid github URL
source  alippai/patch-1
target  master
url     https://api.github.com/repos/apache/arrow/pulls/5499
=== JIRA ARROW-6705 ===
Summary     [Rust] [DataFusion] README has invalid github URL
Assignee    NOT ASSIGNED!!!
Components  Rust
Status      Open
URL         https://issues.apache.org/jira/browse/ARROW-6705
Proceed with merging pull request #5499? (y/n): y
Switched to branch 'PR_TOOL_MERGE_PR_5499_MASTER'
Automatic merge went well; stopped before committing as requested
Traceback (most recent call last):
  File "./dev/merge_arrow_pr.py", line 571, in <module>
    cli()
  File "./dev/merge_arrow_pr.py", line 556, in cli
    pr.merge()
  File "./dev/merge_arrow_pr.py", line 354, in merge
    print("Author {}: {}".format(i + 1, author))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc1' in position 0: ordinal not in range(128)
 {code}





[jira] [Created] (ARROW-6700) [Rust] [DataFusion] Use new parquet arrow reader

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6700:
-

 Summary: [Rust] [DataFusion] Use new parquet arrow reader
 Key: ARROW-6700
 URL: https://issues.apache.org/jira/browse/ARROW-6700
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Once [https://github.com/apache/arrow/pull/5378] is merged, DataFusion should 
be updated to use this new array reader support instead of the current parquet 
reader code in the DataFusion crate.





[jira] [Created] (ARROW-6697) [Rust] [DataFusion] Validate that all parquet partitions have the same schema

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6697:
-

 Summary: [Rust] [DataFusion] Validate that all parquet partitions 
have the same schema
 Key: ARROW-6697
 URL: https://issues.apache.org/jira/browse/ARROW-6697
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


When reading a partitioned Parquet file in DataFusion, the schema is read from 
the first partition and it is assumed that all other partitions have the same 
schema.

It would be better to actually validate that all of the partitions have the 
same schema since there is no support for schema merging yet.
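
A minimal sketch of such a check, assuming the schema of every partition has already been read. The Field and Schema types and the validate_partition_schemas function are hypothetical simplifications for illustration, not the Arrow or DataFusion types.

```rust
/// Hypothetical, simplified field description (name plus type name).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
}

/// Hypothetical, simplified schema: an ordered list of fields.
type Schema = Vec<Field>;

/// Returns the common schema if every partition matches the first one,
/// otherwise an error naming the first partition that differs.
fn validate_partition_schemas(schemas: &[Schema]) -> Result<&Schema, String> {
    let (first, rest) = schemas
        .split_first()
        .ok_or_else(|| "no partitions found".to_string())?;
    for (i, schema) in rest.iter().enumerate() {
        if schema != first {
            // `rest` starts at partition 1, hence the offset.
            return Err(format!(
                "partition {} has a different schema than partition 0",
                i + 1
            ));
        }
    }
    Ok(first)
}
```

Strict equality is used here because, as noted above, there is no support for schema merging yet; a looser compatibility check would only make sense once merging exists.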





[jira] [Created] (ARROW-6696) [Rust] [DataFusion] Implement simple math operations in physical query plan

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6696:
-

 Summary: [Rust] [DataFusion] Implement simple math operations in 
physical query plan
 Key: ARROW-6696
 URL: https://issues.apache.org/jira/browse/ARROW-6696
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Update BinaryExpr to support simple math operations such as +, -, *, / using 
compute kernels where possible.

See the original implementation when executing directly from the logical plan 
for inspiration.
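
A minimal sketch of the evaluation logic, using plain slices in place of Arrow arrays. The Operator enum and evaluate_binary function are hypothetical names for illustration; a real implementation would delegate to the Arrow compute kernels where they exist instead of the element-wise loop shown here.

```rust
/// Hypothetical subset of binary operators supported by the expression.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Operator {
    Plus,
    Minus,
    Multiply,
    Divide,
}

/// Applies `op` element-wise to two equally sized columns.
/// Plain `f64` slices stand in for Arrow arrays in this sketch.
fn evaluate_binary(op: Operator, left: &[f64], right: &[f64]) -> Result<Vec<f64>, String> {
    if left.len() != right.len() {
        return Err(format!(
            "length mismatch: {} vs {}",
            left.len(),
            right.len()
        ));
    }
    // Select the arithmetic for the operator, then apply it row by row.
    let apply = |a: f64, b: f64| match op {
        Operator::Plus => a + b,
        Operator::Minus => a - b,
        Operator::Multiply => a * b,
        Operator::Divide => a / b,
    };
    Ok(left
        .iter()
        .zip(right.iter())
        .map(|(&a, &b)| apply(a, b))
        .collect())
}
```

Note that this sketch follows IEEE 754 semantics for floating-point division by zero and ignores nulls; both would need an explicit decision in the real BinaryExpr.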





[jira] [Created] (ARROW-6695) [Rust] [DataFusion] Remove execution of logical plan

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6695:
-

 Summary: [Rust] [DataFusion] Remove execution of logical plan
 Key: ARROW-6695
 URL: https://issues.apache.org/jira/browse/ARROW-6695
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Remove execution of logical plan





[jira] [Created] (ARROW-6693) [Rust] [DataFusion] Update unit tests to use physical query plan

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6693:
-

 Summary: [Rust] [DataFusion] Update unit tests to use physical 
query plan
 Key: ARROW-6693
 URL: https://issues.apache.org/jira/browse/ARROW-6693
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Update unit tests to use physical query plan (once all features are supported)




