[jira] [Created] (ARROW-17909) [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 2: Encoding Structs and Lists

2022-10-01 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-17909:
---

 Summary: [Website] Arbitrarily Nested Data in Parquet and Arrow: 
Part 2: Encoding Structs and Lists
 Key: ARROW-17909
 URL: https://issues.apache.org/jira/browse/ARROW-17909
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17910) [Website] Arbitrarily Nested Data in Parquet and Arrow: Part

2022-10-01 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-17910:
---

 Summary: [Website] Arbitrarily Nested Data in Parquet and Arrow: 
Part 
 Key: ARROW-17910
 URL: https://issues.apache.org/jira/browse/ARROW-17910
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb








[jira] [Created] (ARROW-17908) [Website] Arbitrarily Nested Data in Parquet and Arrow: Part 1: Introduction

2022-10-01 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-17908:
---

 Summary: [Website] Arbitrarily Nested Data in Parquet and Arrow: 
Part 1: Introduction
 Key: ARROW-17908
 URL: https://issues.apache.org/jira/browse/ARROW-17908
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb








[jira] [Created] (ARROW-17907) [Website] Blog about Arrow <--> Parquet translation and structured representation

2022-10-01 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-17907:
---

 Summary: [Website] Blog about Arrow <--> Parquet translation and 
structured representation 
 Key: ARROW-17907
 URL: https://issues.apache.org/jira/browse/ARROW-17907
 Project: Apache Arrow
  Issue Type: Task
Reporter: Andrew Lamb
Assignee: Andrew Lamb


@tustvold has spent a significant amount of time fixing the Rust implementation 
of the parquet <–> arrow conversion logic for all the corner cases of nulls, 
etc. 

 

During that process, he observed that relatively little information on the 
topic is available, so we would like to write some blog posts to remedy that 
and explain the format and parquet





[jira] [Created] (ARROW-16846) [Rust] Write blog post with Rust release highlights

2022-06-16 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-16846:
---

 Summary: [Rust] Write blog post with Rust release highlights
 Key: ARROW-16846
 URL: https://issues.apache.org/jira/browse/ARROW-16846
 Project: Apache Arrow
  Issue Type: Task
  Components: Website
Reporter: Andrew Lamb
Assignee: Andrew Lamb


See details here 

https://github.com/apache/arrow-rs/issues/1808

 





[jira] [Created] (ARROW-15902) [Website] Add new committers: Raphael Taylor-Davies, Wang Xudong, Yijie Shen, Kun Liu

2022-03-10 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-15902:
---

 Summary: [Website] Add new committers: Raphael Taylor-Davies, 
Wang Xudong, Yijie Shen, Kun Liu
 Key: ARROW-15902
 URL: https://issues.apache.org/jira/browse/ARROW-15902
 Project: Apache Arrow
  Issue Type: Task
  Components: Website
Reporter: Andrew Lamb


 

Reference: [https://lists.apache.org/thread/n26odmwlv7vgxvp9xboql0txk00nyypx]





[jira] [Created] (ARROW-15683) [Rust] [DataFusion] Make a 7.0.0 release announcement blog

2022-02-14 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-15683:
---

 Summary: [Rust] [DataFusion] Make a 7.0.0 release announcement blog
 Key: ARROW-15683
 URL: https://issues.apache.org/jira/browse/ARROW-15683
 Project: Apache Arrow
  Issue Type: Task
  Components: Rust, Rust - DataFusion, Website
Reporter: Andrew Lamb
Assignee: Andrew Lamb








[jira] [Created] (ARROW-15675) [Rust] Blog post for versions 7-9

2022-02-14 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-15675:
---

 Summary: [Rust] Blog post for versions 7-9
 Key: ARROW-15675
 URL: https://issues.apache.org/jira/browse/ARROW-15675
 Project: Apache Arrow
  Issue Type: Task
  Components: Website
Reporter: Andrew Lamb


It would be good to tell the world about the progress we have made





[jira] [Created] (ARROW-12427) [Rust][DataFusion] Reenable physical_optimizer::repartition::Repartition;

2021-04-16 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12427:
---

 Summary: [Rust][DataFusion] Reenable 
physical_optimizer::repartition::Repartition;
 Key: ARROW-12427
 URL: https://issues.apache.org/jira/browse/ARROW-12427
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb


To fix https://issues.apache.org/jira/browse/ARROW-12421, we disabled the 
physical_optimizer::repartition::Repartition rule in 
https://github.com/apache/arrow/pull/10069

This ticket tracks finding the root cause of the CI test failure and 
re-enabling physical_optimizer::repartition::Repartition.






[jira] [Created] (ARROW-12411) [Rust] Add Builder interface for adding Arrays to record batches

2021-04-15 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12411:
---

 Summary: [Rust] Add Builder interface for adding Arrays to record 
batches
 Key: ARROW-12411
 URL: https://issues.apache.org/jira/browse/ARROW-12411
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andrew Lamb
Assignee: Andrew Lamb



Use case:

While writing tests (both in IOx and in DataFusion) where I need a single 
`RecordBatch`, I often find myself doing something like this:

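```
use std::sync::Arc;

// Imports assumed from the names used below (`ArrowField` and `ArrowDataType`
// are treated as aliases of the arrow crate's `Field` and `DataType`)
use arrow::array::{ArrayRef, Float64Array, Int64Array};
use arrow::datatypes::{DataType as ArrowDataType, Field as ArrowField, Schema};
use arrow::record_batch::RecordBatch;

// The schema states each field's name and type...
let schema = Arc::new(Schema::new(vec![
    ArrowField::new("float_field", ArrowDataType::Float64, true),
    ArrowField::new("time", ArrowDataType::Int64, true),
]));

// ...and the array types state the same type information again
let float_array: ArrayRef = Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
let timestamp_array: ArrayRef = Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));

let batch = RecordBatch::try_new(schema, vec![float_array, timestamp_array])
    .expect("created new record batch");
```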

This is annoying because the information that `float_field` is a float is 
encoded both in the Schema and the `Float64Array`.

I would much rather be able to construct RecordBatches in a builder style, 
to avoid the redundancy and reduce the amount of typing:


```
let float_array: ArrayRef = Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
let timestamp_array: ArrayRef = Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));

// Proposed API: `empty` and `append` are what this ticket asks for
let batch = RecordBatch::empty()
    .append("float_field", float_array).unwrap()
    .append("time", timestamp_array).unwrap();
```

The proposal is to add a method to `RecordBatch` like

```
impl RecordBatch {
    ...
    fn append(self, field_name: &str, field_values: ArrayRef) -> Result<Self>
}
```

That would append a field with the given name to the current schema, returning 
an error if `field_name` was already present.

The nullability of the field would be set based on the actual null count of 
`field_values`.
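
As a rough illustration of the proposed behaviour, here is a minimal, standard-library-only sketch. It does not use the arrow crate: `Batch`, `Field`, `Column` and `AppendError` are hypothetical stand-ins invented for this example, and a column is modelled as a `Vec<Option<f64>>` where `None` plays the role of a null.

```rust
/// Toy column: `None` models a null slot (a stand-in for an Arrow array).
type Column = Vec<Option<f64>>;

/// Field metadata derived entirely from the appended column.
#[derive(Debug, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

#[derive(Debug, PartialEq)]
enum AppendError {
    DuplicateField(String),
}

/// Toy batch (hypothetical; not the Arrow `RecordBatch`).
#[derive(Debug, Default)]
struct Batch {
    fields: Vec<Field>,
    columns: Vec<Column>,
}

impl Batch {
    fn empty() -> Self {
        Self::default()
    }

    /// Append a named column, rejecting names that are already present.
    fn append(mut self, field_name: &str, values: Column) -> Result<Self, AppendError> {
        if self.fields.iter().any(|f| f.name == field_name) {
            return Err(AppendError::DuplicateField(field_name.to_string()));
        }
        // Nullability comes from the data: nullable only if a null is present.
        let nullable = values.iter().any(|v| v.is_none());
        self.fields.push(Field { name: field_name.to_string(), nullable });
        self.columns.push(values);
        Ok(self)
    }
}

fn main() {
    let batch = Batch::empty()
        .append("float_field", vec![Some(10.1), Some(20.1)])
        .unwrap()
        .append("time", vec![Some(1000.0), None])
        .unwrap();
    println!("{} columns, fields: {:?}", batch.columns.len(), batch.fields);
}
```

Taking `self` by value and returning `Result<Self, _>` is what allows the calls to be chained, at the cost of an `unwrap()` (or `?`) after each step.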






[jira] [Created] (ARROW-12397) [Rust] [DataFusion] Simplify readme example #10038

2021-04-15 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12397:
---

 Summary: [Rust] [DataFusion] Simplify readme example #10038
 Key: ARROW-12397
 URL: https://issues.apache.org/jira/browse/ARROW-12397
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Reporter: Andrew Lamb


MINOR: [Rust] [DataFusion] Simplify readme example #10038






[jira] [Created] (ARROW-12339) [Rust][DataFusion] COUNT DISTINCT does not support for `Boolean`

2021-04-12 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12339:
---

 Summary: [Rust][DataFusion] COUNT DISTINCT does not support for 
`Boolean`
 Key: ARROW-12339
 URL: https://issues.apache.org/jira/browse/ARROW-12339
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Reporter: Andrew Lamb


If you try to run a `COUNT (DISTINCT ..)` query on a float column you get the 
following error:

thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', 
datafusion/src/scalar.rs:342:22

Reproducer:
{code}
 echo "foo,1.23" > /tmp/foo.csv
 ./target/debug/datafusion-cli

> CREATE EXTERNAL TABLE t (a varchar, b float) STORED AS CSV LOCATION '/tmp/foo.csv';
0 rows in set. Query took 0 seconds.
> select count(distinct a) from t;
+-------------------+
| COUNT(DISTINCT a) |
+-------------------+
| 1                 |
+-------------------+
1 rows in set. Query took 0 seconds.
> select count(distinct b) from t;
thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', 
datafusion/src/scalar.rs:342:22
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
ArrowError(ExternalError(Canceled))
{code}





[jira] [Created] (ARROW-12319) [Rust][DataFusion] Improve the errors that result when a aggregate type is not supported

2021-04-09 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12319:
---

 Summary: [Rust][DataFusion] Improve the errors that result when a 
aggregate type is not supported
 Key: ARROW-12319
 URL: https://issues.apache.org/jira/browse/ARROW-12319
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Reporter: Andrew Lamb



When you try and run a query such as

{code}
select AVG(ts_column) from t;
{code}

where ts_column has `DataType::Timestamp` type, you get a pretty unintelligible 
error message

"Coercion from [Timestamp(Nanosecond, None)] to the signature Uniform(1, [Int8, 
Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float32, Float64]) failed."

This error should be improved to say something more like "AVG is not supported 
for {datatype}; try an explicit cast".
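
A minimal sketch of the kind of check that could produce such a message. This is not DataFusion code: data types are modelled as plain strings, and `check_avg_input` and `AVG_SUPPORTED` are hypothetical names; the list of accepted types is copied from the signature in the error message above.

```rust
/// Type names AVG accepts in this sketch (copied from the quoted signature).
const AVG_SUPPORTED: [&str; 10] = [
    "Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64", "Float32",
    "Float64",
];

/// Return a user-facing error that names the function and the offending type.
fn check_avg_input(data_type: &str) -> Result<(), String> {
    if AVG_SUPPORTED.contains(&data_type) {
        Ok(())
    } else {
        Err(format!(
            "AVG is not supported for {}; try an explicit cast",
            data_type
        ))
    }
}

fn main() {
    match check_avg_input("Timestamp(Nanosecond, None)") {
        Ok(()) => println!("ok"),
        Err(msg) => println!("{}", msg),
    }
}
```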






[jira] [Created] (ARROW-12318) [Rust][DataFusion] Add support for AVG(Timestamp) types

2021-04-09 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12318:
---

 Summary: [Rust][DataFusion] Add support for AVG(Timestamp) types
 Key: ARROW-12318
 URL: https://issues.apache.org/jira/browse/ARROW-12318
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Reporter: Andrew Lamb


This is a follow on to ARROW-12277

Background: Support for Min/Max/Sum/Count was added for DataType::Timestamp(*) 
types in https://github.com/apache/arrow/pull/9970.

This ticket tracks adding support for Avg, which is slightly more involved 
because Avg currently assumes the output type is always F64, whereas in this 
case I think Avg(timestamp) should also be a timestamp. We should double check 
what postgres does in this case.
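
To make the idea concrete, here is a small sketch of an average that stays in the timestamp domain. It assumes timestamps are `i64` nanoseconds since the epoch with `None` for nulls; `avg_timestamp_nanos` is a hypothetical helper, not a DataFusion function, and truncating integer division is an assumption of this sketch rather than a decided behaviour.

```rust
/// Average of nanosecond timestamps, skipping nulls.
/// The result is itself a timestamp (i64 nanoseconds), not a float.
fn avg_timestamp_nanos(values: &[Option<i64>]) -> Option<i64> {
    // Accumulate in i128 so that summing many large i64 values cannot overflow.
    let mut sum: i128 = 0;
    let mut count: i128 = 0;
    for v in values.iter().flatten() {
        sum += *v as i128;
        count += 1;
    }
    if count == 0 {
        None // no non-null inputs: the average is null
    } else {
        Some((sum / count) as i64) // integer division truncates toward zero
    }
}

fn main() {
    let ts = [Some(1_617_130_000_000_000_000), None, Some(1_617_130_060_000_000_000)];
    println!("{:?}", avg_timestamp_nanos(&ts));
}
```

The average of a set of `i64` values always lies between their minimum and maximum, so the final cast back to `i64` cannot overflow.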






[jira] [Created] (ARROW-12317) [Rust] JSON writer does not support time, date or interval types

2021-04-09 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12317:
---

 Summary: [Rust] JSON writer does not support time, date or 
interval types
 Key: ARROW-12317
 URL: https://issues.apache.org/jira/browse/ARROW-12317
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andrew Lamb


While working on https://issues.apache.org/jira/browse/ARROW-12267 , adding 
support for writing Timestamp types, I noticed we were also lacking support for 
other time types. An example of adding support for timestamps is on 
https://github.com/apache/arrow/pull/9968

Specifically, if you try to write an array with any of the following types as 
JSON it will panic:

```
pub type Date32Array = PrimitiveArray<Date32Type>;
pub type Date64Array = PrimitiveArray<Date64Type>;
pub type Time32SecondArray = PrimitiveArray<Time32SecondType>;
pub type Time32MillisecondArray = PrimitiveArray<Time32MillisecondType>;
pub type Time64MicrosecondArray = PrimitiveArray<Time64MicrosecondType>;
pub type Time64NanosecondArray = PrimitiveArray<Time64NanosecondType>;
pub type IntervalYearMonthArray = PrimitiveArray<IntervalYearMonthType>;
pub type IntervalDayTimeArray = PrimitiveArray<IntervalDayTimeType>;
pub type DurationSecondArray = PrimitiveArray<DurationSecondType>;
pub type DurationMillisecondArray = PrimitiveArray<DurationMillisecondType>;
pub type DurationMicrosecondArray = PrimitiveArray<DurationMicrosecondType>;
pub type DurationNanosecondArray = PrimitiveArray<DurationNanosecondType>;
```







[jira] [Created] (ARROW-12312) [Rust][DataFusion] COUNT DISTINCT not support for `Float64`

2021-04-09 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12312:
---

 Summary: [Rust][DataFusion] COUNT DISTINCT not support for 
`Float64`
 Key: ARROW-12312
 URL: https://issues.apache.org/jira/browse/ARROW-12312
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust - DataFusion
Reporter: Andrew Lamb


If you try to run a `COUNT (DISTINCT ..)` query on a float column you get the 
following error:

thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', 
datafusion/src/scalar.rs:342:22

Reproducer:
{code}
 echo "foo,1.23" > /tmp/foo.csv
 ./target/debug/datafusion-cli

> CREATE EXTERNAL TABLE t (a varchar, b float) STORED AS CSV LOCATION '/tmp/foo.csv';
0 rows in set. Query took 0 seconds.
> select count(distinct a) from t;
+-------------------+
| COUNT(DISTINCT a) |
+-------------------+
| 1                 |
+-------------------+
1 rows in set. Query took 0 seconds.
> select count(distinct b) from t;
thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', 
datafusion/src/scalar.rs:342:22
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
ArrowError(ExternalError(Canceled))
{code}





[jira] [Created] (ARROW-12278) [Rust][DataFusion]Use Timestamp(Nanosecond, None) for SQL TIMESTAMP Type

2021-04-07 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12278:
---

 Summary: [Rust][DataFusion]Use Timestamp(Nanosecond, None) for SQL 
TIMESTAMP Type
 Key: ARROW-12278
 URL: https://issues.apache.org/jira/browse/ARROW-12278
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Reporter: Andrew Lamb
Assignee: Andrew Lamb


# Rationale
Running the query `CREATE EXTERNAL TABLE .. (c TIMESTAMP)` today in DataFusion 
will result in a data type of "Date64", which means that anything more specific 
than the date will be ignored.

This leads to strange behavior such as

{code}
echo "Andrew,2018-11-13T17:11:10.011" > /tmp/foo.csv
echo "Jorge,2018-12-13T12:12:10.011" >> /tmp/foo.csv

cargo run -p datafusion --bin datafusion-cli
Finished dev [unoptimized + debuginfo] target(s) in 0.23s
 Running `target/debug/datafusion-cli`
> CREATE EXTERNAL TABLE t(a varchar, b TIMESTAMP)
STORED AS CSV
LOCATION '/tmp/foo.csv';

0 rows in set. Query took 0 seconds.
> select * from t;
+--------+------------+
| a      | b          |
+--------+------------+
| Andrew | 2018-11-13 |
| Jorge  | 2018-12-13 |
+--------+------------+
{code}

(note how it is only a date, not a timestamp)





[jira] [Created] (ARROW-12277) [Rust][DataFusion] Aggregates are not supported for timestamp types

2021-04-07 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12277:
---

 Summary: [Rust][DataFusion] Aggregates are not supported for 
timestamp types
 Key: ARROW-12277
 URL: https://issues.apache.org/jira/browse/ARROW-12277
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb
Assignee: Andrew Lamb


If you try and aggregate (via SUM, for example) a column of a timestamp type, 
it generates an error:
```
Coercion from [Timestamp(Nanosecond, None)] to the signature Uniform(1, [Int8, 
Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float32, Float64]) failed.
```

For example:

{code}
> show columns from t;
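+---------------+--------------+------------+-------------+-----------------------------+-------------+
| table_catalog | table_schema | table_name | column_name | data_type                   | is_nullable |
+---------------+--------------+------------+-------------+-----------------------------+-------------+
| datafusion    | public       | t          | a           | Utf8                        | NO          |
| datafusion    | public       | t          | b           | Timestamp(Nanosecond, None) | NO          |
+---------------+--------------+------------+-------------+-----------------------------+-------------+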
2 row in set. Query took 0 seconds.
> select sum(b) from t;
Plan("Coercion from [Timestamp(Nanosecond, None)] to the signature Uniform(1, 
[Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float32, Float64]) 
failed.")
{code}






[jira] [Created] (ARROW-12267) [Rust] JSON writer does not support timestamp types

2021-04-07 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12267:
---

 Summary: [Rust] JSON writer does not support timestamp types
 Key: ARROW-12267
 URL: https://issues.apache.org/jira/browse/ARROW-12267
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andrew Lamb
Assignee: Andrew Lamb


Looks like the json writer.rs code in arrow doesn't support writing out 
timestamps. When I tried to write out a `TimestampNanosecondArray` I got the 
following error:

```
thread 'influxdb_ioxd::http::tests::test_query_json' panicked at 'Unsupported 
datatype: Timestamp(
Nanosecond,
None,
)', 
/Users/alamb/.cargo/git/checkouts/arrow-3a9cfebb6b7b2bdc/3e825a7/rust/arrow/src/json/writer.rs:326:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```






[jira] [Created] (ARROW-12254) [Rust][DataFusion] Limit keeps polling input after limit is reached

2021-04-07 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12254:
---

 Summary: [Rust][DataFusion] Limit keeps polling input after limit 
is reached
 Key: ARROW-12254
 URL: https://issues.apache.org/jira/browse/ARROW-12254
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andrew Lamb
Assignee: Andrew Lamb


Once the number of rows needed for a limit query has been produced, any further 
work done to read values from its input is wasted.

The current implementation of LimitStream keeps polling its input for the next 
value, and returning Poll::Ready(None), even once the limit has been reached, 
which is wasteful.
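
The desired behaviour can be sketched with a synchronous `Iterator` standing in for the async stream. This is a toy model, not the DataFusion `LimitStream`: each input item is just the row count of a batch, and the `Limit` struct and its fields are hypothetical names. The point is the early return: once `remaining` hits zero the input is never touched again.

```rust
use std::cell::Cell;

/// Toy limit operator: items are the row counts of incoming batches.
struct Limit<I: Iterator<Item = usize>> {
    input: I,
    remaining: usize,
}

impl<I: Iterator<Item = usize>> Iterator for Limit<I> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.remaining == 0 {
            // Limit reached: report end-of-stream without polling the input.
            return None;
        }
        let rows = self.input.next()?;
        // Emit at most `remaining` rows from this batch.
        let take = rows.min(self.remaining);
        self.remaining -= take;
        Some(take)
    }
}

fn main() {
    // Count how often the input is actually polled.
    let polls = Cell::new(0usize);
    let input = vec![4usize, 4, 4, 4]
        .into_iter()
        .inspect(|_| polls.set(polls.get() + 1));
    let out: Vec<usize> = Limit { input, remaining: 6 }.collect();
    println!("batches {:?}, input polled {} times", out, polls.get());
}
```

With a limit of 6 and batches of 4 rows, only the first two batches are pulled from the input; the remaining two are never requested.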






[jira] [Created] (ARROW-12235) [Rust][DataFusion] LIMIT returns incorrect results when used with several small partitions

2021-04-06 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12235:
---

 Summary: [Rust][DataFusion] LIMIT returns incorrect results when 
used with several small partitions
 Key: ARROW-12235
 URL: https://issues.apache.org/jira/browse/ARROW-12235
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust - DataFusion
Reporter: Andrew Lamb
Assignee: Andrew Lamb


I noticed when I was running some queries locally that `LIMIT` was not behaving 
correctly. For my case, a query with `LIMIT 10` was always returning zero rows. 

I spent some time and found a self-contained reproducer. If you put the 
following test in `rust/src/datafusion/execution/context.rs` it will fail.


{code}


/// Return a RecordBatch with a single Int32 array with values (0..sz)
fn make_partition(sz: i32) -> RecordBatch {
    let seq_start = 0;
    let seq_end = sz;
    let values = (seq_start..seq_end).collect::<Vec<_>>();
    let schema = Arc::new(Schema::new(vec![Field::new("i", DataType::Int32, true)]));
    let arr = Arc::new(Int32Array::from(values));
    let arr = arr as ArrayRef;

    RecordBatch::try_new(schema.clone(), vec![arr]).unwrap()
}

#[tokio::test]
async fn limit_multi_partitions() -> Result<()> {
    let tmp_dir = TempDir::new()?;
    let mut ctx = create_ctx(&tmp_dir, 1)?;

    // Six partitions holding 0, 1, 2, 3, 4 and 5 rows (15 rows in total)
    let partitions = vec![
        vec![make_partition(0)],
        vec![make_partition(1)],
        vec![make_partition(2)],
        vec![make_partition(3)],
        vec![make_partition(4)],
        vec![make_partition(5)],
    ];
    let schema = partitions[0][0].schema();
    let provider = Arc::new(MemTable::try_new(schema, partitions).unwrap());

    ctx.register_table("t", provider)
        .unwrap();

    // select all rows
    let results = plan_and_collect(&mut ctx, "SELECT i FROM t")
        .await
        .unwrap();

    let num_rows: usize = results.into_iter().map(|b| b.num_rows()).sum();
    assert_eq!(num_rows, 15);

    // each limit below 15 should return exactly `limit` rows
    for limit in 1..10 {
        let query = format!("SELECT i FROM t limit {}", limit);
        let results = plan_and_collect(&mut ctx, &query)
            .await
            .unwrap();

        let num_rows: usize = results.into_iter().map(|b| b.num_rows()).sum();
        assert_eq!(num_rows, limit, "mismatch with query {}", query);
    }

    Ok(())
}

{code}





[jira] [Created] (ARROW-12234) [Rust][DataFusion] Can't subtract timestamps

2021-04-06 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12234:
---

 Summary: [Rust][DataFusion] Can't subtract timestamps
 Key: ARROW-12234
 URL: https://issues.apache.org/jira/browse/ARROW-12234
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb



I have two columns, time_of_last_write and time_of_first_write, that have 
type `Timestamp(Nanosecond, None)`.

When I try to subtract them I get an error that there isn't a common type to 
coerce the types to:

{code}
> select id, partition_key, storage, estimated_bytes, time_of_last_write - time_of_first_write as time_open from chunks where database_name = '844910ece80be8bc_7be09b71c487d5d3' order by id;
Plan("\'Timestamp(Nanosecond, None) - Timestamp(Nanosecond, None)\' can\'t be evaluated because there isn\'t a common type to coerce the types to")
> 
{code}


Expected behavior: The query works (the resulting column should be a duration)
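
A minimal sketch of the expected semantics, assuming both columns hold `i64` nanoseconds since the epoch and `None` stands for null. `subtract_timestamps` is a hypothetical helper, not a DataFusion or Arrow kernel, and mapping overflow to null is an assumption of this sketch.

```rust
/// Element-wise `last - first` over two nanosecond timestamp columns.
/// Each result is a duration in nanoseconds; it is null if either input
/// is null or if the subtraction overflows i64.
fn subtract_timestamps(last: &[Option<i64>], first: &[Option<i64>]) -> Vec<Option<i64>> {
    last.iter()
        .zip(first.iter())
        .map(|(l, f)| match (l, f) {
            (Some(l), Some(f)) => l.checked_sub(*f),
            _ => None,
        })
        .collect()
}

fn main() {
    let time_of_last_write = [Some(1_617_734_829_065_541_747), None];
    let time_of_first_write = [Some(1_617_734_812_356_380_931), Some(0)];
    let time_open = subtract_timestamps(&time_of_last_write, &time_of_first_write);
    println!("{:?}", time_open);
}
```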

The data looks like this:
{code}
> select * from chunks where database_name = '844910ece80be8bc_7be09b71c487d5d3' order by id;
+-----------------------------------+-----+---------------------+---------------------+-----------------+-------------------------------+-------------------------------+-------------------------------+
| database_name                     | id  | partition_key       | storage             | estimated_bytes | time_of_first_write           | time_of_last_write            | time_closing                  |
+-----------------------------------+-----+---------------------+---------------------+-----------------+-------------------------------+-------------------------------+-------------------------------+
| 844910ece80be8bc_7be09b71c487d5d3 | 452 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 10746690        | 2021-04-06 18:46:52.356380931 | 2021-04-06 18:47:09.065541747 | 2021-04-06 18:47:09.098939917 |
| 844910ece80be8bc_7be09b71c487d5d3 | 453 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11248853        | 2021-04-06 18:47:09.495662420 | 2021-04-06 18:47:13.032639050 | 2021-04-06 18:47:13.058829814 |
| 844910ece80be8bc_7be09b71c487d5d3 | 454 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11249404        | 2021-04-06 18:47:13.594526676 | 2021-04-06 18:47:16.697048218 | 2021-04-06 18:47:16.723124402 |
| 844910ece80be8bc_7be09b71c487d5d3 | 455 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11248972        | 2021-04-06 18:47:17.128724226 | 2021-04-06 18:47:20.055123319 | 2021-04-06 18:47:20.081196973 |
| 844910ece80be8bc_7be09b71c487d5d3 | 456 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11248778        | 2021-04-06 18:47:20.609498175 | 2021-04-06 18:47:24.196610989 | 2021-04-06 18:47:24.233891509 |
| 844910ece80be8bc_7be09b71c487d5d3 | 457 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11249297        | 2021-04-06 18:47:24.660687691 | 2021-04-06 18:47:27.734848138 | 2021-04-06 18:47:27.762860931 |
| 844910ece80be8bc_7be09b71c487d5d3 | 458 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11249046        | 2021-04-06 18:47:28.128078919 | 2021-04-06 18:47:31.652250155 | 2021-04-06 18:47:31.690460702 |
| 844910ece80be8bc_7be09b71c487d5d3 | 459 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11249824        | 2021-04-06 18:47:32.286068833 | 2021-04-06 18:47:36.461676369 | 2021-04-06 18:47:36.486294829 |
| 844910ece80be8bc_7be09b71c487d5d3 | 460 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11249913        | 2021-04-06 18:47:36.944984769 | 2021-04-06 18:47:40.162251810 | 2021-04-06 18:47:40.188262747 |
| 844910ece80be8bc_7be09b71c487d5d3 | 461 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11248237        | 2021-04-06 18:47:40.719734516 | 2021-04-06 18:47:44.370867837 | 2021-04-06 18:47:44.397872698 |
| 844910ece80be8bc_7be09b71c487d5d3 | 462 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11602754        | 2021-04-06 18:47:44.844728218 | 2021-04-06 18:48:24.309093588 | 2021-04-06 18:48:24.339811197 |
| 844910ece80be8bc_7be09b71c487d5d3 | 463 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11249162        | 2021-04-06 18:48:24.847852183 | 2021-04-06 18:48:30.529014754 | 2021-04-06 18:48:30.556962859 |
| 844910ece80be8bc_7be09b71c487d5d3 | 464 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11248908        | 2021-04-06 18:48:31.148468537 | 2021-04-06 18:48:36.805296070 | 2021-04-06 18:48:36.830190418 |
| 844910ece80be8bc_7be09b71c487d5d3 | 465 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11250833        | 2021-04-06 18:48:37.258673133 | 2021-04-06 18:48:39.849493178 | 2021-04-06 18:48:39.875272790 |
| 844910ece80be8bc_7be09b71c487d5d3 | 466 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11248570        | 2021-04-06 18:48:40.304598973 | 2021-04-06 18:48:43.572838266 | 2021-04-06 18:48:43.597973739 |
| 844910ece80be8bc_7be09b71c487d5d3 | 467 | 2021-04-06 18:00:00 | ClosedMutableBuffer | 11248882        | 2021-04-06 18:48:44.086791040 | 2021-04-06 18:48:46.746045462 | 2021-04-06 

[jira] [Created] (ARROW-12224) [Rust] Use stable rust for no default test, clean up CI tests

2021-04-06 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12224:
---

 Summary:  [Rust] Use stable rust for no default test, clean up CI 
tests
 Key: ARROW-12224
 URL: https://issues.apache.org/jira/browse/ARROW-12224
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andrew Lamb
Assignee: Andrew Lamb



# Rationale

1. As @jorgecarleitao noted on 
https://github.com/apache/arrow/pull/9889#discussion_r607720790, we should 
check that arrow compiles with stable rust, as that is what we target for the 
arrow crate
2. I noticed that there were several redundant settings of `RUSTFLAGS`
3. The titles of many of the tests are confusing (to me) as they contain a lot 
of detailed architecture / rust version information rather than the test title






[jira] [Created] (ARROW-12214) [Rust][DataFusion] Add some tests for limit

2021-04-05 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12214:
---

 Summary: [Rust][DataFusion] Add some tests for limit
 Key: ARROW-12214
 URL: https://issues.apache.org/jira/browse/ARROW-12214
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb
Assignee: Andrew Lamb








[jira] [Created] (ARROW-12210) [Rust][DataFusion] Document SHOW TABLES / SHOW COLUMNS / InformationSchema

2021-04-05 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12210:
---

 Summary: [Rust][DataFusion] Document SHOW TABLES / SHOW COLUMNS / 
InformationSchema
 Key: ARROW-12210
 URL: https://issues.apache.org/jira/browse/ARROW-12210
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb








[jira] [Created] (ARROW-12204) [Rust][CI]

2021-04-05 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12204:
---

 Summary: [Rust][CI]
 Key: ARROW-12204
 URL: https://issues.apache.org/jira/browse/ARROW-12204
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb


# Rationale

The [integration test](https://github.com/apache/arrow/pull/9884/checks?check_run_id=2263730460) 
has a fixed-size builder docker image and contains builds from several Arrow 
implementations.

The Rust build artifacts (compiled binaries) in the integration tests still 
consume ~ 1GB of space even after https://github.com/apache/arrow/pull/9879 
(see @pitrou's comment on 
https://github.com/apache/arrow/pull/9884#issuecomment-813037756).

It would be nice to reduce this even more (and speed up the integration test 
while we are at it).





[jira] [Created] (ARROW-12194) [Rust] [Parquet] Update zstd version

2021-04-04 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12194:
---

 Summary: [Rust] [Parquet] Update zstd version
 Key: ARROW-12194
 URL: https://issues.apache.org/jira/browse/ARROW-12194
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andrew Lamb
Assignee: Andrew Lamb


Update the zstd version used by the parquet crate to zstd = "0.7.0+zstd.1.4.9".







[jira] [Created] (ARROW-12171) [Rust] Clippy error

2021-03-31 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12171:
---

 Summary: [Rust] Clippy error
 Key: ARROW-12171
 URL: https://issues.apache.org/jira/browse/ARROW-12171
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Andrew Lamb
Assignee: Andrew Lamb








[jira] [Created] (ARROW-12159) [Rust][DataFusion] Support grouping on expressions

2021-03-30 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12159:
---

 Summary: [Rust][DataFusion] Support grouping on expressions
 Key: ARROW-12159
 URL: https://issues.apache.org/jira/browse/ARROW-12159
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Andrew Lamb


Usecase:

I want to group based on time windows (as defined by the `date_trunc` 
function). 

For example, given the table:

{code}
+------+-------------------+-------------+-------------+------------------+-------------------+--------------+-----------+------------+---------------+-------------+--------------------+--------------------+
| cpu  | host              | time        | usage_guest | usage_guest_nice | usage_idle        | usage_iowait | usage_irq | usage_nice | usage_softirq | usage_steal | usage_system       | usage_user         |
+------+-------------------+-------------+-------------+------------------+-------------------+--------------+-----------+------------+---------------+-------------+--------------------+--------------------+
| cpu0 | MacBook-Pro.local | 16171301300 | 0           | 0                | 65.30408773649165 | 0            | 0         | 0          | 0             | 0           | 18.444666002000673 | 16.251246261217506 |
| cpu1 | MacBook-Pro.local | 16171301300 | 0           | 0                | 84.43113772402216 | 0            | 0         | 0          | 0             | 0           | 3.193612774446795  | 12.37524950097282  |
| cpu2 | MacBook-Pro.local | 16171301300 | 0           | 0                | 65.96806387199344 | 0            | 0         | 0          | 0             | 0           | 15.469061876247794 | 18.56287425146831  |
| cpu3 | MacBook-Pro.local | 16171301300 | 0           | 0                | 84.0478564307993  | 0            | 0         | 0          | 0             | 0           | 3.0907278165770684 | 12.861415752863932 |
| cpu4 | MacBook-Pro.local | 16171301300 | 0           | 0                | 63.21036889281897 | 0            | 0         | 0          | 0             | 0           | 13.758723828377473 | 23.030907278223218 |
| cpu5 | MacBook-Pro.local | 16171301300 | 0           | 0                | 83.94815553242313 | 0            | 0         | 0          | 0             | 0           | 2.991026919231221  | 13.0608175473346   |
| cpu6 | MacBook-Pro.local | 16171301300 | 0           | 0                | 70.85828343276965 | 0            | 0         | 0          | 0             | 0           | 12.87425149699077  | 16.26746506987651  |
| cpu7 | MacBook-Pro.local | 16171301300 | 0           | 0                | 83.9321357287122  | 0            | 0         | 0          | 0             | 0           | 3.093812375243205  | 12.974051896176206 |
| cpu8 | MacBook-Pro.local | 16171301300 | 0           | 0                | 74.80079681313936 | 0            | 0         | 0          | 0             | 0           | 10.756972111708253 | 14.442231075949556 |
| cpu9 | MacBook-Pro.local | 16171301300 | 0           | 0                | 83.84845463618315 | 0            | 0         | 0          | 0             | 0           | 3.0907278165434624 | 13.060817547316466 |
+------+-------------------+-------------+-------------+------------------+-------------------+--------------+-----------+------------+---------------+-------------+--------------------+--------------------+

{code}

I want to be able to find the min and max of `usage_user`, grouped by minute:

{code}
select 
  date_trunc('minute', cast (time as timestamp)), 
  min(usage_user), 
  max(usage_user) 
from
  cpu 
group by 
  date_trunc('minute', cast (time as timestamp))
{code}

Or alternately

{code}
select 
  date_trunc('minute', cast (time as timestamp)), 
  min(usage_user), 
  max(usage_user) 
from
  cpu 
group by 
  1
{code}



Instead, as of now I get a planning error:
{code}
Error preparing query Error during planning: Projection references 
non-aggregate values
{code}
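
For reference, the result the queries above are meant to compute can be sketched in plain Rust. This is only a model of the semantics, not of how DataFusion would plan or execute it: rows are `(time, usage_user)` pairs with `time` assumed to be `i64` nanoseconds since the epoch, and `trunc_minute` and `min_max_by_minute` are hypothetical helpers.

```rust
use std::collections::BTreeMap;

const NANOS_PER_MINUTE: i64 = 60 * 1_000_000_000;

/// Stand-in for `date_trunc('minute', ...)` on nanosecond timestamps.
fn trunc_minute(ts_nanos: i64) -> i64 {
    // div_euclid rounds toward negative infinity, so pre-epoch values
    // also truncate down to the start of their minute.
    ts_nanos.div_euclid(NANOS_PER_MINUTE) * NANOS_PER_MINUTE
}

/// Group rows by the truncated minute and track (min, max) of usage_user.
fn min_max_by_minute(rows: &[(i64, f64)]) -> BTreeMap<i64, (f64, f64)> {
    let mut groups: BTreeMap<i64, (f64, f64)> = BTreeMap::new();
    for &(time, usage_user) in rows {
        // The group key is the value of an expression, not a raw column.
        let entry = groups
            .entry(trunc_minute(time))
            .or_insert((usage_user, usage_user));
        entry.0 = entry.0.min(usage_user);
        entry.1 = entry.1.max(usage_user);
    }
    groups
}

fn main() {
    let rows = [
        (0, 16.25),
        (30 * 1_000_000_000, 12.37),
        (NANOS_PER_MINUTE, 18.56),
    ];
    for (minute, (min, max)) in min_max_by_minute(&rows) {
        println!("{} min={} max={}", minute, min, max);
    }
}
```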





[jira] [Created] (ARROW-12158) [Rust][DataFusion]: Implement support for the `now()` sql function

2021-03-30 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12158:
---

 Summary: [Rust][DataFusion]: Implement support for the `now()` sql 
function
 Key: ARROW-12158
 URL: https://issues.apache.org/jira/browse/ARROW-12158
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Andrew Lamb
Assignee: Andrew Lamb


Usecase: selecting the last 5 minutes of data

I would like to be able to run queries like this:
```
select * from cpu where time > now() - interval '5' minute;
```

Proposed implementation:
follow postgres functions:  
https://www.postgresql.org/docs/current/functions-datetime.html
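
A small sketch of the use case, under stated assumptions: timestamps are `i64` nanoseconds since the epoch, and `to_nanos` and `newer_than` are hypothetical helpers rather than DataFusion functions. The current time is passed in once and reused for every row, which mirrors postgres, where `now()` returns the same value for the whole transaction.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Nanoseconds since the Unix epoch for `t`.
fn to_nanos(t: SystemTime) -> i64 {
    t.duration_since(UNIX_EPOCH)
        .expect("time is before the Unix epoch")
        .as_nanos() as i64
}

/// Model of `time > now() - interval`: keep timestamps newer than the cutoff.
/// `now` is captured once by the caller, not re-read per row.
fn newer_than(times: &[i64], now: SystemTime, window: Duration) -> Vec<i64> {
    let cutoff = to_nanos(now) - window.as_nanos() as i64;
    times.iter().copied().filter(|&t| t > cutoff).collect()
}

fn main() {
    let now = SystemTime::now();
    let now_nanos = to_nanos(now);
    // One row from a minute ago, one from ten minutes ago.
    let times = [now_nanos - 60_000_000_000, now_nanos - 600_000_000_000];
    let recent = newer_than(&times, now, Duration::from_secs(5 * 60));
    println!("{} of {} rows are from the last 5 minutes", recent.len(), times.len());
}
```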






[jira] [Created] (ARROW-12108) [Rust][DataFusion] Support `SHOW TABLES`

2021-03-26 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12108:
---

 Summary: [Rust][DataFusion] Support `SHOW TABLES`
 Key: ARROW-12108
 URL: https://issues.apache.org/jira/browse/ARROW-12108
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb








[jira] [Created] (ARROW-12109) [Rust][DataFusion] Support `SHOW COLUMNS`

2021-03-26 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12109:
---

 Summary: [Rust][DataFusion] Support `SHOW COLUMNS`
 Key: ARROW-12109
 URL: https://issues.apache.org/jira/browse/ARROW-12109
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb








[jira] [Created] (ARROW-12107) [Rust][DataFusion] Support `SELECT * from information_schema.columns`

2021-03-26 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12107:
---

 Summary: [Rust][DataFusion] Support `SELECT * from 
information_schema.columns`
 Key: ARROW-12107
 URL: https://issues.apache.org/jira/browse/ARROW-12107
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb
Assignee: Andrew Lamb








[jira] [Created] (ARROW-12106) [Rust][DataFusion] Support `SELECT * from information_schema.tables`

2021-03-26 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12106:
---

 Summary: [Rust][DataFusion] Support `SELECT * from 
information_schema.tables`
 Key: ARROW-12106
 URL: https://issues.apache.org/jira/browse/ARROW-12106
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb








[jira] [Created] (ARROW-12076) Fix build

2021-03-24 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12076:
---

 Summary: Fix build
 Key: ARROW-12076
 URL: https://issues.apache.org/jira/browse/ARROW-12076
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andrew Lamb
Assignee: Andrew Lamb




There was a logical conflict between 
https://github.com/apache/arrow/commit/eebf64b00e3a26f61c4bebec7241a0b24d27ec67 
which removed the Arc in `ArrayData` and  
https://github.com/apache/arrow/commit/8dd6abbb72b6b8958f3b2f35512bdadcaf43066f 
which optimized the compute kernels.









[jira] [Created] (ARROW-12075) [Rust][DataFusion] Add CTE to list of supported features

2021-03-24 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12075:
---

 Summary: [Rust][DataFusion] Add CTE to list of supported features
 Key: ARROW-12075
 URL: https://issues.apache.org/jira/browse/ARROW-12075
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Reporter: Andrew Lamb
Assignee: Andrew Lamb








[jira] [Created] (ARROW-12024) [Rust] Rust 1.52 has additional clippy lint failure

2021-03-19 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12024:
---

 Summary: [Rust] Rust 1.52 has additional clippy lint failure
 Key: ARROW-12024
 URL: https://issues.apache.org/jira/browse/ARROW-12024
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andrew Lamb
Assignee: Andrew Lamb


A new Rust 1.52 nightly was released yesterday:

info: latest update on 2021-03-19, rust version 1.52.0-nightly (1705a7d64 
2021-03-18)

Resulting in lint failures such as
https://github.com/apache/arrow/pull/9749/checks?check_run_id=2144048180

{code}
error: this `else { if .. }` block can be collapsed
   --> arrow/src/array/array_binary.rs:427:20
|
427 |   } else {
|  ^
428 | | if let Some(size) = size {
429 | | buffer.extend_zeros(size);
430 | | } else {
431 | | prepend += 1;
432 | | }
433 | | }
| |_^
|
= note: `-D clippy::collapsible-if` implied by `-D warnings`
= help: for further information visit 
https://rust-lang.github.io/rust-clippy/master/index.html#collapsible_if
help: collapse nested if block
|
427 | } else if let Some(size) = size {
428 | buffer.extend_zeros(size);
429 | } else {
430 | prepend += 1;
431 | }
|
{code}

Reproduce via running `rustup`





[jira] [Created] (ARROW-12020) [Rust][DataFusion] Adding SHOW TABLES and SHOW COLUMNS + partial information_schema support to DataFusion

2021-03-18 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-12020:
---

 Summary: [Rust][DataFusion] Adding SHOW TABLES and SHOW COLUMNS + 
partial information_schema support to DataFusion
 Key: ARROW-12020
 URL: https://issues.apache.org/jira/browse/ARROW-12020
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust - DataFusion
Reporter: Andrew Lamb
Assignee: Andrew Lamb


See proposal here: 
https://docs.google.com/document/d/12cpZUSNPqVH9Z0BBx6O8REu7TFqL-NPPAYCUPpDls1k/edit#





[jira] [Created] (ARROW-11992) [Rust][Parquet] Add upgrade notes on 4.0 rename of LogicalType #9731

2021-03-16 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11992:
---

 Summary: [Rust][Parquet] Add upgrade notes on 4.0 rename of 
LogicalType #9731
 Key: ARROW-11992
 URL: https://issues.apache.org/jira/browse/ARROW-11992
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andrew Lamb
Assignee: Andrew Lamb








[jira] [Created] (ARROW-11979) [Rust] Combine limit into SortOptions

2021-03-16 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11979:
---

 Summary: [Rust] Combine limit into SortOptions
 Key: ARROW-11979
 URL: https://issues.apache.org/jira/browse/ARROW-11979
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb
Assignee: Andrew Lamb


The `sort_limit` kernel was added by @sundy-li in 
https://github.com/apache/arrow/pull/9602

While writing some doc examples in https://github.com/apache/arrow/pull/9721, 
it occurred to me that we could potentially simplify the API, so I figured I 
would offer a proposed PR for comment.

# Rationale

Since we already have a `SortOptions` structure that controls sorting options, 
we could add `limit` to that structure instead of having a separate 
`sort_limit` function, and still avoid changing the signatures of the existing 
sort functions.

# Changes

Move the `limit` option to `SortOptions`
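
A minimal sketch of what this could look like, using a stand-in struct rather than the real arrow type. The `limit` field and the `sort_with_options` helper are hypothetical names for illustration; only `descending` and `nulls_first` mirror the existing options.

```rust
/// Stand-in for arrow's `SortOptions`; the `limit` field is the proposed addition.
#[derive(Clone, Copy, Debug, Default)]
struct SortOptions {
    descending: bool,
    nulls_first: bool,
    /// Hypothetical: return at most this many rows when set.
    limit: Option<usize>,
}

/// Sort nullable values according to `options`, then apply the optional limit.
/// A real kernel could use a partial sort instead of sorting everything first.
fn sort_with_options(values: &[Option<i32>], options: SortOptions) -> Vec<Option<i32>> {
    let null_count = values.iter().filter(|v| v.is_none()).count();
    let mut valid: Vec<i32> = values.iter().flatten().copied().collect();
    valid.sort_unstable();
    if options.descending {
        valid.reverse();
    }
    let valid = valid.into_iter().map(Some);
    let nulls = std::iter::repeat(None::<i32>).take(null_count);
    let mut sorted: Vec<Option<i32>> = if options.nulls_first {
        nulls.chain(valid).collect()
    } else {
        valid.chain(nulls).collect()
    };
    if let Some(limit) = options.limit {
        sorted.truncate(limit);
    }
    sorted
}
```

With this shape, a caller that does not care about the limit leaves it as `None` and gets the same behavior as a plain sort.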






[jira] [Created] (ARROW-11977) [Rust] Add documentation examples for sort kernel

2021-03-16 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11977:
---

 Summary: [Rust] Add documentation examples for sort kernel
 Key: ARROW-11977
 URL: https://issues.apache.org/jira/browse/ARROW-11977
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb
Assignee: Andrew Lamb








[jira] [Created] (ARROW-11969) [Rust][DataFusion] Improve Examples in documentation

2021-03-15 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11969:
---

 Summary: [Rust][DataFusion] Improve Examples in documentation
 Key: ARROW-11969
 URL: https://issues.apache.org/jira/browse/ARROW-11969
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb
Assignee: Andrew Lamb


It would be cool to have an example on the main README.md of datafusion (which 
appears on the crates.io homepage) that shows a prospective user what 
DataFusion offers 
(e.g. look at how tokio does it: https://crates.io/crates/tokio).

I plan to lift the nice example from 
https://docs.rs/datafusion/3.0.0/datafusion/





[jira] [Created] (ARROW-11962) [Rust][DataFusion] Update Datafusion Docs / readme

2021-03-14 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11962:
---

 Summary: [Rust][DataFusion] Update Datafusion Docs / readme
 Key: ARROW-11962
 URL: https://issues.apache.org/jira/browse/ARROW-11962
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb
Assignee: Andrew Lamb








[jira] [Created] (ARROW-11952) [Rust] Make ArrayData --> GenericListArray fallible instead of `panic!`

2021-03-13 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11952:
---

 Summary: [Rust] Make ArrayData --> GenericListArray fallible 
instead of `panic!`
 Key: ARROW-11952
 URL: https://issues.apache.org/jira/browse/ARROW-11952
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andrew Lamb








[jira] [Created] (ARROW-11951) [Rust] Remove OffsetSize::prefix

2021-03-13 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11951:
---

 Summary: [Rust] Remove OffsetSize::prefix
 Key: ARROW-11951
 URL: https://issues.apache.org/jira/browse/ARROW-11951
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb
Assignee: Andrew Lamb



Background:
Leftover cleanups suggested by @sunchao on 
https://github.com/apache/arrow/pull/9425

Broken out from https://github.com/apache/arrow/pull/9508

Rationale:
This function is redundant with `OffsetSize::is_large`






[jira] [Created] (ARROW-11908) [Rust] Intermittent Flight integration test failures

2021-03-08 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11908:
---

 Summary: [Rust] Intermittent Flight integration test failures
 Key: ARROW-11908
 URL: https://issues.apache.org/jira/browse/ARROW-11908
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andrew Lamb
Assignee: Andrew Lamb


This is similar to the symptoms seen in ARROW-11717, but it is still happening 
intermittently.

On two separate PRs I see similar failures:
https://github.com/apache/arrow/pull/9645/checks?check_run_id=2052183132
https://github.com/apache/arrow/pull/9647/checks?check_run_id=2051946608

Example failure:

{code}

subprocess.CalledProcessError: Command 
'['/build/cpp/debug/flight-test-integration-client', '-host', 'localhost', 
'-port=41743', '-scenario', 'auth:basic_proto']' died with .

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/arrow/dev/archery/archery/integration/runner.py", line 308, in 
_run_flight_test_case
consumer.flight_request(port, **client_args)
  File "/arrow/dev/archery/archery/integration/tester_cpp.py", line 116, in 
flight_request
run_cmd(cmd)
  File "/arrow/dev/archery/archery/integration/util.py", line 148, in run_cmd
raise RuntimeError(sio.getvalue())
RuntimeError: Command failed: /build/cpp/debug/flight-test-integration-client 
-host localhost -port=41743 -scenario auth:basic_proto
With output:
--
-- Arrow Fatal Error --
Invalid: Expected UNAUTHENTICATED but got Unavailable

--

# FAILURES #
FAILED TEST: auth:basic_proto Rust producing,  C++ consuming

{code}





[jira] [Created] (ARROW-11896) [Rust] Hang in CI

2021-03-06 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11896:
---

 Summary: [Rust] Hang in CI
 Key: ARROW-11896
 URL: https://issues.apache.org/jira/browse/ARROW-11896
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andrew Lamb


As observed first by [~nevi_me] on 
https://github.com/apache/arrow/pull/9592#issuecomment-791901636

The Rust CI tests seem to be failing due to a timeout. For 
example: https://github.com/apache/arrow/runs/2045186826 







[jira] [Created] (ARROW-11882) [Rust] Implement Debug printing "kernel"

2021-03-05 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11882:
---

 Summary: [Rust] Implement Debug printing "kernel"
 Key: ARROW-11882
 URL: https://issues.apache.org/jira/browse/ARROW-11882
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andrew Lamb


[~jorgecarleitao] offered a great way to improve the Debug/Display 
implementations for various Array implementations on 
https://github.com/apache/arrow/pull/9624#issuecomment-790976766 

The only reason we are implementing to_isize/to_usize on NativeType is because 
we have a function to represent an array (for Display) that accepts a generic 
physical type T, and then tries to convert it to an isize depending on a logical 
type (DataType::Date). However, there is already a many-to-one relationship 
between logical and physical types.

Thus, a solution for this is to have the `Debug` function branch off depending 
on the (logical) datatype, implementing the custom string representation 
depending on it, instead of having a loop of native type T and then branching 
off according to the DataType inside the loop.

I.e. instead of

{code}
for i in ... {
   match data_type {
 DataType::Date32 => represent array[i] as date
 DataType::Int32 => represent array[i] as int
   }
}
{code}

imo we should have

{code}
match data_type {
 DataType::Date32 => for i in ... {represent array[i] as date}
 DataType::Int32 => for i in ... {represent array[i] as int}
}
{code}

i.e. treat the Display as any other "kernel", where behavior is logical, not 
physical, type-dependent.
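
A runnable sketch of the second shape, under stated assumptions: `DataType` here is a two-variant stand-in for arrow's enum, `format_array` is a hypothetical name, and dates are rendered simply as a day count (Date32 stores days since the UNIX epoch) instead of a calendar date.

```rust
/// Tiny stand-in for arrow's logical `DataType`.
enum DataType {
    Int32,
    Date32,
}

/// Format an i32-backed array. The logical type is matched once per array,
/// and each arm runs its own loop over the physical values.
fn format_array(data_type: &DataType, values: &[i32]) -> Vec<String> {
    match data_type {
        DataType::Int32 => values.iter().map(|v| v.to_string()).collect(),
        DataType::Date32 => values
            .iter()
            .map(|v| format!("{} days since epoch", v))
            .collect(),
    }
}
```

Because the branch happens outside the loop, neither arm needs a generic conversion such as `to_isize` on the native type.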





[jira] [Created] (ARROW-11881) [Rust][DataFusion] Fix Clippy Lint

2021-03-05 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11881:
---

 Summary: [Rust][DataFusion] Fix Clippy Lint
 Key: ARROW-11881
 URL: https://issues.apache.org/jira/browse/ARROW-11881
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andrew Lamb


A linter error has appeared on master somehow:

```
error: unnecessary parentheses around `for` iterator expression
   --> datafusion/src/physical_plan/merge.rs:124:31
|
124 | for part_i in (0..input_partitions) {
|   ^ help: remove these parentheses
|
= note: `-D unused-parens` implied by `-D warnings`

error: aborting due to previous error
```

Seen on these PRs:

https://github.com/apache/arrow/pull/9612/checks?check_run_id=2042047472

https://github.com/apache/arrow/pull/9639/checks?check_run_id=2042649120





[jira] [Created] (ARROW-11863) [Rust][DataFusion] No way to get to the examples from docs.rs

2021-03-04 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11863:
---

 Summary: [Rust][DataFusion] No way to get to the examples from 
docs.rs
 Key: ARROW-11863
 URL: https://issues.apache.org/jira/browse/ARROW-11863
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andrew Lamb
 Attachments: Screen Shot 2021-03-04 at 2.51.54 PM.png

https://docs.rs/datafusion/3.0.0/datafusion/ has a tantalizing piece of text 
about the examples, but no link or explanation of how to find them

 !Screen Shot 2021-03-04 at 2.51.54 PM.png! 

The examples are at 
https://github.com/apache/arrow/tree/master/rust/datafusion/examples

The ideal outcome would be to point people somehow at the examples directory 
for the version of the docs they are looking at on docs.rs. An acceptable 
outcome would be to always point the docs on docs.rs at master. 





[jira] [Created] (ARROW-11862) [Rust] String and BinaryArray created from iterators that don't accurately report size can lead to undefined behavior

2021-03-04 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11862:
---

 Summary: [Rust] String and BinaryArray created from iterators that 
don't accurately report size can lead to undefined behavior
 Key: ARROW-11862
 URL: https://issues.apache.org/jira/browse/ARROW-11862
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andrew Lamb


As [~jorgecarleitao] says on  
https://github.com/apache/arrow/pull/9588#discussion_r584290701

The (Rust) Iterator spec recommends, but does not require, that the iterator 
reports a correct length. Consumers that exhibit undefined behavior because of 
an incorrect size_hint are themselves the cause of that undefined behavior.

The only case where consumers can trust the iterator's length is when the 
iterator implements the unsafe trait TrustedLen. Unfortunately, TrustedLen is 
still unstable. For that reason, we have been exposing unsafe 
Buffer::from_trusted_len_iter and the like for those cases.

So the code should be updated to handle the case where the reported `size_hint` 
turns out to be incorrect
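
One way to do this, sketched with plain `Vec`s rather than the arrow buffer types: treat `size_hint` purely as a capacity hint and let the buffers grow as needed. The names `build_string_buffers` and `LyingIter` are made up for this illustration, and the cap of 1024 on the initial reservation is an arbitrary assumption.

```rust
/// Build `(offsets, values)` buffers for a string array from any iterator.
/// `size_hint` is used only to pre-reserve space; correctness never depends on it.
fn build_string_buffers<'a, I>(iter: I) -> (Vec<i32>, Vec<u8>)
where
    I: Iterator<Item = &'a str>,
{
    let (lower, _) = iter.size_hint();
    // Cap the reservation so an absurdly large hint cannot exhaust memory.
    let mut offsets: Vec<i32> = Vec::with_capacity(lower.min(1024) + 1);
    let mut values: Vec<u8> = Vec::new();
    offsets.push(0);
    for s in iter {
        values.extend_from_slice(s.as_bytes());
        assert!(values.len() <= i32::MAX as usize, "offset overflow");
        // `push` grows the vector when needed, so a wrong hint is harmless.
        offsets.push(values.len() as i32);
    }
    (offsets, values)
}

/// An iterator that deliberately misreports its length.
struct LyingIter<I> {
    inner: I,
    claimed: usize,
}

impl<I: Iterator> Iterator for LyingIter<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        self.inner.next()
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        (self.claimed, Some(self.claimed))
    }
}
```

The safe growth path costs an occasional reallocation when the hint is too small; the unsafe trusted-length constructors remain available for callers that can guarantee the length.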





[jira] [Created] (ARROW-11860) [Rust] [DataFusion] Add DataFusion logos

2021-03-04 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11860:
---

 Summary: [Rust] [DataFusion] Add DataFusion logos
 Key: ARROW-11860
 URL: https://issues.apache.org/jira/browse/ARROW-11860
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andrew Lamb


Issue automatically created from Pull Request 
[9630|https://github.com/apache/arrow/pull/9630]
I don't think this needs a JIRA?

These are the DataFusion logos that I had created before the project was 
donated to Apache Arrow. They weren't part of the source code repo so didn't 
get donated at the time.

https://user-images.githubusercontent.com/934084/109990656-d55ddf80-7cc6-11eb-8bbc-f21946fd1dfc.png

https://user-images.githubusercontent.com/934084/109990665-d68f0c80-7cc6-11eb-891c-bf367cb5f447.png






[jira] [Created] (ARROW-11851) [Rust][DataFusion] Add coercion support for `NULL` literals

2021-03-03 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11851:
---

 Summary: [Rust][DataFusion] Add coercion support for `NULL` 
literals
 Key: ARROW-11851
 URL: https://issues.apache.org/jira/browse/ARROW-11851
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Reporter: Andrew Lamb


As we observed in 
https://github.com/apache/arrow/pull/9565#discussion_r586347165, datafusion 
won't coerce null literals, forcing strange syntax such as:

```
rpad('hi', CAST(NULL AS INT), 'xy')
```

We should add automatic coercion logic from the null literal to any type, so 
that this expression works just fine (producing a NULL output):

```
rpad('hi', NULL, 'xy')
```





[jira] [Created] (ARROW-11821) [Rust] Edit Rust README

2021-02-27 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11821:
---

 Summary: [Rust] Edit Rust README
 Key: ARROW-11821
 URL: https://issues.apache.org/jira/browse/ARROW-11821
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andrew Lamb


Issue automatically created from Pull Request 
[9576|https://github.com/apache/arrow/pull/9576]
Edits and fixes for some missing words, punctuation, and wording.





[jira] [Created] (ARROW-11819) [Rust] Add link to the doc

2021-02-27 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11819:
---

 Summary: [Rust] Add link to the doc
 Key: ARROW-11819
 URL: https://issues.apache.org/jira/browse/ARROW-11819
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andrew Lamb


Issue automatically created from Pull Request 
[9594|https://github.com/apache/arrow/pull/9594]
This is a test PR with a minor fix, that has no JIRA issue, to automatically 
create the issue





[jira] [Created] (ARROW-11818) [Rust] Add link to the doc

2021-02-27 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11818:
---

 Summary: [Rust] Add link to the doc
 Key: ARROW-11818
 URL: https://issues.apache.org/jira/browse/ARROW-11818
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andrew Lamb


Issue automatically created from Pull Request 
[9594|https://github.com/apache/arrow/pull/9594]
This is a test PR with a minor fix, that has no JIRA issue, to automatically 
create the issue





[jira] [Created] (ARROW-11817) TESTING PLEASE IGNORE[Rust] Add link to the doc

2021-02-27 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11817:
---

 Summary: TESTING PLEASE IGNORE[Rust] Add link to the doc
 Key: ARROW-11817
 URL: https://issues.apache.org/jira/browse/ARROW-11817
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andrew Lamb


Issue automatically created from Pull Request 
[9594|https://github.com/apache/arrow/pull/9594]
This is a test PR with a minor fix, that has no JIRA issue, to automatically 
create the issue





[jira] [Created] (ARROW-11816) TESTING PLEASE IGNORE[Rust] Add link to the doc

2021-02-27 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11816:
---

 Summary: TESTING PLEASE IGNORE[Rust] Add link to the doc
 Key: ARROW-11816
 URL: https://issues.apache.org/jira/browse/ARROW-11816
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andrew Lamb


Issue automatically created from Pull Request 
[9594|https://github.com/apache/arrow/pull/9594]
This is a test PR with a minor fix, that has no JIRA issue, to automatically 
create the issue





[jira] [Created] (ARROW-11815) TESTING PLEASE IGNORE[Rust] Add link to the doc

2021-02-27 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11815:
---

 Summary: TESTING PLEASE IGNORE[Rust] Add link to the doc
 Key: ARROW-11815
 URL: https://issues.apache.org/jira/browse/ARROW-11815
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andrew Lamb


Issue automatically created from Pull Request 
[9594|https://github.com/apache/arrow/pull/9594]
This is a test PR with a minor fix, that has no JIRA issue, to automatically 
create the issue





[jira] [Created] (ARROW-11814) TESTING PLEASE IGNORE[Rust] Add link to the doc

2021-02-27 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11814:
---

 Summary: TESTING PLEASE IGNORE[Rust] Add link to the doc
 Key: ARROW-11814
 URL: https://issues.apache.org/jira/browse/ARROW-11814
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andrew Lamb


Issue automatically created from Pull Request 
[9594|https://github.com/apache/arrow/pull/9594]
This is a test PR with a minor fix, that has no JIRA issue, to automatically 
create the issue





[jira] [Created] (ARROW-11813) TESTING PLEASE IGNORE[Rust] Add link to the doc

2021-02-27 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11813:
---

 Summary: TESTING PLEASE IGNORE[Rust] Add link to the doc
 Key: ARROW-11813
 URL: https://issues.apache.org/jira/browse/ARROW-11813
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andrew Lamb


Issue automatically created from Pull Request 
[9594|https://github.com/apache/arrow/pull/9594]
This is a test PR with a minor fix, that has no JIRA issue, to automatically 
create the issue





[jira] [Created] (ARROW-11812) TESTING PLEASE IGNORE[Rust] Add link to the doc

2021-02-27 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11812:
---

 Summary: TESTING PLEASE IGNORE[Rust] Add link to the doc
 Key: ARROW-11812
 URL: https://issues.apache.org/jira/browse/ARROW-11812
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andrew Lamb


Issue automatically created from Pull Request 
[9594|https://github.com/apache/arrow/pull/9594]
This is a test PR with a minor fix, that has no JIRA issue, to automatically 
create the issue





[jira] [Created] (ARROW-11811) TESTING PLEASE IGNORE[Rust] Add link to the doc

2021-02-27 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11811:
---

 Summary: TESTING PLEASE IGNORE[Rust] Add link to the doc
 Key: ARROW-11811
 URL: https://issues.apache.org/jira/browse/ARROW-11811
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andrew Lamb


Issue automatically created from Pull Request 
[9594|https://github.com/apache/arrow/pull/9594]
This is a test PR with a minor fix, that has no JIRA issue, to automatically 
create the issue





[jira] [Created] (ARROW-11810) TESTING PLEASE IGNORE[Rust] Add link to the doc

2021-02-27 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11810:
---

 Summary: TESTING PLEASE IGNORE[Rust] Add link to the doc
 Key: ARROW-11810
 URL: https://issues.apache.org/jira/browse/ARROW-11810
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andrew Lamb


Issue automatically created from Pull Request 
[9594|https://github.com/apache/arrow/pull/9594]
This is a test PR with a minor fix, that has no JIRA issue, to automatically 
create the issue





[jira] [Created] (ARROW-11809) TESTING PLEASE IGNORE[Rust] Add link to the doc

2021-02-27 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11809:
---

 Summary: TESTING PLEASE IGNORE[Rust] Add link to the doc
 Key: ARROW-11809
 URL: https://issues.apache.org/jira/browse/ARROW-11809
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andrew Lamb


Issue automatically created from Pull Request 
[9594|https://github.com/apache/arrow/pull/9594]
This is a test PR with a minor fix, that has no JIRA issue, to automatically 
create the issue





[jira] [Created] (ARROW-11808) TESTING PLEASE IGNORE[Rust] Add link to the doc

2021-02-27 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11808:
---

 Summary: TESTING PLEASE IGNORE[Rust] Add link to the doc
 Key: ARROW-11808
 URL: https://issues.apache.org/jira/browse/ARROW-11808
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andrew Lamb


Issue automatically created from Pull Request 
[PRNUM|https://github.com/apache/arrow/pull/9594]
This is a test PR with a minor fix, that has no JIRA issue, to automatically 
create the issue





[jira] [Created] (ARROW-11807) TESTING PLEASE IGNORE[Rust] Add link to the doc

2021-02-27 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11807:
---

 Summary: TESTING PLEASE IGNORE[Rust] Add link to the doc
 Key: ARROW-11807
 URL: https://issues.apache.org/jira/browse/ARROW-11807
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andrew Lamb


This is a test PR with a minor fix, that has no JIRA issue, to automatically 
create the issue





[jira] [Created] (ARROW-11805) [Rust] Add link to the doc

2021-02-27 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11805:
---

 Summary: [Rust] Add link to the doc
 Key: ARROW-11805
 URL: https://issues.apache.org/jira/browse/ARROW-11805
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andrew Lamb


This is a test PR with a minor fix, that has no JIRA issue, to automatically 
create the issue





[jira] [Created] (ARROW-11804) [Developer] Add option to auto-create JIRA issue for PRs which don't have it

2021-02-27 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11804:
---

 Summary: [Developer] Add option to auto-create JIRA issue for PRs 
which don't have it
 Key: ARROW-11804
 URL: https://issues.apache.org/jira/browse/ARROW-11804
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb
Assignee: Andrew Lamb


Improve dev workflow by automatically creating JIRA tickets if requested, as 
discussed here:
https://lists.apache.org/thread.html/rd4533c7f882adbfc51061aceafebe8d84ea194fa5108d6cebc3621e1%40%3Cdev.arrow.apache.org%3E





[jira] [Created] (ARROW-11802) [Rust][DataFusion] Mixing of crossbeam channel and async tasks can lead to deadlock

2021-02-26 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11802:
---

 Summary: [Rust][DataFusion] Mixing of crossbeam channel and async 
tasks can lead to deadlock
 Key: ARROW-11802
 URL: https://issues.apache.org/jira/browse/ARROW-11802
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust - DataFusion
Reporter: Andrew Lamb


[~edrevo] noticed, on 
https://github.com/apache/arrow/pull/9523#issuecomment-786237494, that the use 
of crossbeam channels can potentially deadlock datafusion

The use of crossbeam channels is left over from earlier, non-`async` 
implementations and has been implicated in some hangs that [~MikeSeddonAU] has 
observed in DataFusion. Specifically, a crossbeam channel can block a thread 
when the channel is full or empty, which can result in blocking all the tokio 
executor threads and deadlocking the system.

The proposal is to use tokio's mpsc channels instead of crossbeam; these can 
properly yield back to tokio to run another task when the channel is either 
full or empty.






[jira] [Created] (ARROW-11790) [Rust][DataFusion] Change plan builder signature to take Vec<Expr> rather than &[Expr]

2021-02-25 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11790:
---

 Summary: [Rust][DataFusion] Change plan builder signature to take 
Vec<Expr> rather than &[Expr]
 Key: ARROW-11790
 URL: https://issues.apache.org/jira/browse/ARROW-11790
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb


Another thing to do is to change the signature of LogicalPlanBuilder
from taking slices of owned things, &[Expr], to taking Vec<Expr> instead.

The rationale is that at all call sites you need to have an owned vec and 
DataFusion is going to copy anyway, so it would be better to allow the caller 
to give up ownership.
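
The difference between the two signatures can be sketched with stand-in types. `Expr`, `PlanBuilder`, `project_from_slice` and `project` below are simplified, hypothetical names for illustration and are not the actual DataFusion definitions.

```rust
/// Simplified stand-in for DataFusion's `Expr`.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
}

/// Simplified stand-in for the logical plan builder.
#[derive(Debug, Default)]
struct PlanBuilder {
    projection: Vec<Expr>,
}

impl PlanBuilder {
    /// Slice-based signature: the builder has to clone every expression,
    /// even if the caller no longer needs its own copy.
    fn project_from_slice(mut self, exprs: &[Expr]) -> Self {
        self.projection = exprs.to_vec();
        self
    }

    /// Owned signature: the caller's vector is moved in, so nothing is cloned.
    fn project(mut self, exprs: Vec<Expr>) -> Self {
        self.projection = exprs;
        self
    }
}
```

A caller that still needs the expressions afterwards can pass `exprs.clone()` to the owned version, so the copy becomes explicit and opt-in.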





[jira] [Created] (ARROW-11773) [Rust] Allow json writer to write out JSON arrays as well as newline formatted objects

2021-02-24 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11773:
---

 Summary: [Rust] Allow json writer to write out JSON arrays as well 
as newline formatted objects
 Key: ARROW-11773
 URL: https://issues.apache.org/jira/browse/ARROW-11773
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andrew Lamb



Currently the arrow json writer makes JSON that looks like this (one record per 
line):
```
{"foo":1}
{"bar":1}
```

Whereas a JSON array looks like this
```
[
  {"foo":1},
  {"bar":1}
]
```
It would be nice to write out json in a streaming fashion (we added such a 
feature in IOx via https://github.com/influxdata/influxdb_iox/pull/870/files)

/// Writes out well-formed JSON arrays in a streaming fashion
///
/// [{"foo": "bar"}, {"foo": "baz"}]
///
/// This is based on the arrow JSON writer (json::writer::Writer)
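
A minimal sketch of the streaming idea, independent of the arrow writer and of the IOx implementation linked above: emit `[` before the first record, `,` before each later one, and `]` at the end. `JsonArrayWriter` and its methods are hypothetical names, and each record is assumed to arrive already serialized as a JSON object.

```rust
use std::io::{self, Write};

/// Writes pre-serialized JSON objects as one JSON array, a record at a time.
struct JsonArrayWriter<W: Write> {
    out: W,
    started: bool,
}

impl<W: Write> JsonArrayWriter<W> {
    fn new(out: W) -> Self {
        Self { out, started: false }
    }

    /// Append one record; `record` is assumed to already be valid JSON.
    fn write_record(&mut self, record: &str) -> io::Result<()> {
        // "[" before the first record, "," before every later one.
        self.out.write_all(if self.started { b"," } else { b"[" })?;
        self.started = true;
        self.out.write_all(record.as_bytes())
    }

    /// Close the array and hand back the underlying writer.
    fn finish(mut self) -> io::Result<W> {
        // An array with no records still has to be written as "[]".
        if !self.started {
            self.out.write_all(b"[")?;
        }
        self.out.write_all(b"]")?;
        Ok(self.out)
    }
}
```

Only a single boolean of state is kept between records, so the output never has to be buffered in full.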






[jira] [Created] (ARROW-11753) [Rust][DataFusion] Add test for Join Statement: Schema contains duplicate unqualified field name

2021-02-23 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11753:
---

 Summary: [Rust][DataFusion] Add test for Join Statement: Schema 
contains duplicate unqualified field name
 Key: ARROW-11753
 URL: https://issues.apache.org/jira/browse/ARROW-11753
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb


PR to add a test for this ticket





[jira] [Created] (ARROW-11742) [Rust] [DataFusion] Add Expr::is_null and Expr::is_not_null functions

2021-02-23 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11742:
---

 Summary: [Rust] [DataFusion] Add Expr::is_null and 
Expr::is_not_null functions
 Key: ARROW-11742
 URL: https://issues.apache.org/jira/browse/ARROW-11742
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb


There are functions such as `Expr::lt` for building up expression trees more 
simply.

I recently noticed that there is no `Expr::is_null()` or `Expr::is_not_null()` 
for easily creating `Expr::IsNull(..)` and `Expr::IsNotNull(..)`, respectively. 
Instead, users must currently do something like:
```
let tag_name_is_not_null = Expr::IsNotNull(Box::new(col(tag_name)));
```
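
A minimal sketch of the proposed helpers, using a hypothetical, heavily simplified `Expr` (the real DataFusion enum has many more variants, and the `col` helper here is a stand-in):

```rust
/// Hypothetical stand-in for DataFusion's `Expr`, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    IsNull(Box<Expr>),
    IsNotNull(Box<Expr>),
}

impl Expr {
    /// Wraps `self` in `Expr::IsNull`.
    pub fn is_null(self) -> Expr {
        Expr::IsNull(Box::new(self))
    }

    /// Wraps `self` in `Expr::IsNotNull`.
    pub fn is_not_null(self) -> Expr {
        Expr::IsNotNull(Box::new(self))
    }
}

/// Stand-in for the `col` helper.
pub fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}
```

With something like this the example above would read `col(tag_name).is_not_null()`.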





[jira] [Created] (ARROW-11717) [Integration] Intermittent (but frequent) flight integration failures with auth:basic_proto

2021-02-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11717:
---

 Summary: [Integration] Intermittent (but frequent) flight 
integration failures with auth:basic_proto
 Key: ARROW-11717
 URL: https://issues.apache.org/jira/browse/ARROW-11717
 Project: Apache Arrow
  Issue Type: Bug
  Components: Integration
Reporter: Andrew Lamb


Link to discussion on list: 
https://lists.apache.org/thread.html/r0dcdc2b6334e7f067a828634cf7584406ed859ff4d3fb622fef1bdd7%40%3Cdev.arrow.apache.org%3E

I noticed that the Rust/CPP integration tests are failing seemingly
intermittently on master (and on Rust PRs). The tests pass if they are re-run 
(enough times).

There are several commits that have the little red `X`, meaning that CI didn't
pass on master: https://github.com/apache/arrow/commits/master

Here are some example CI runs that are failing:
https://github.com/apache/arrow/runs/1935673508
https://github.com/apache/arrow/runs/1926705212

Here is another example:
https://github.com/apache/arrow/pull/9359/checks?check_run_id=1941967422

Example failure:
{code}

==
Testing file auth:basic_proto
==
Traceback (most recent call last):
  File "/arrow/dev/archery/archery/integration/util.py", line 139, in run_cmd
output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/opt/conda/envs/arrow/lib/python3.8/subprocess.py", line 411, in 
check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/opt/conda/envs/arrow/lib/python3.8/subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 
'['/build/cpp/debug/flight-test-integration-client', '-host', 'localhost', 
'-port=33569', '-scenario', 'auth:basic_proto']' died with .

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/arrow/dev/archery/archery/integration/runner.py", line 308, in 
_run_flight_test_case
consumer.flight_request(port, **client_args)
  File "/arrow/dev/archery/archery/integration/tester_cpp.py", line 116, in 
flight_request
run_cmd(cmd)
  File "/arrow/dev/archery/archery/integration/util.py", line 148, in run_cmd
raise RuntimeError(sio.getvalue())
RuntimeError: Command failed: /build/cpp/debug/flight-test-integration-client 
-host localhost -port=33569 -scenario auth:basic_proto
With output:
--
-- Arrow Fatal Error --
Invalid: Expected UNAUTHENTICATED but got Unavailable
{code}






[jira] [Created] (ARROW-11716) [Rust][DataFusion] Change tests in sql.rs to use `assert_batch`

2021-02-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11716:
---

 Summary: [Rust][DataFusion] Change tests in sql.rs to use 
`assert_batch` 
 Key: ARROW-11716
 URL: https://issues.apache.org/jira/browse/ARROW-11716
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb



The idea is to make the tests in 
[sql.rs|https://github.com/apache/arrow/blob/master/rust/datafusion/tests/sql.rs#L103]
 more maintainable by using the `assert_batches_eq` macro that was introduced 
here: https://github.com/apache/arrow/pull/9264






[jira] [Created] (ARROW-11715) [Rust] Ensure a successful MIRI Run on CI

2021-02-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11715:
---

 Summary: [Rust] Ensure a successful MIRI Run on CI
 Key: ARROW-11715
 URL: https://issues.apache.org/jira/browse/ARROW-11715
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb


Currently the MIRI check is set up to pass even if `cargo miri` returns an error.

https://github.com/apache/arrow/blob/master/.github/workflows/rust.yml#L263-L264
{code}
  # Ignore MIRI errors until we can get a clean run
  cargo miri test || true
{code}

The goal is to make MIRI pass cleanly and then remove the `|| true` workaround from CI.






[jira] [Created] (ARROW-11714) [Rust] Fix MIRI build on CI

2021-02-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11714:
---

 Summary: [Rust] Fix MIRI build on CI
 Key: ARROW-11714
 URL: https://issues.apache.org/jira/browse/ARROW-11714
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb


The MIRI check doesn't even compile anymore:

{code}

   Compiling criterion v0.3.4
   Compiling h2 v0.3.0
   Compiling tower v0.4.5
   Compiling hyper v0.14.4
error[E0463]: can't find crate for `tracing`
  --> 
/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1
   |
68 | extern crate tracing;
   | ^ can't find crate

error: aborting due to previous error
{code}





[jira] [Created] (ARROW-11713) [Rust] Get MIRI running again

2021-02-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11713:
---

 Summary: [Rust] Get MIRI running again
 Key: ARROW-11713
 URL: https://issues.apache.org/jira/browse/ARROW-11713
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb


Rust's MIRI https://github.com/rust-lang/miri can help detect logical errors in 
programs

The Rust arrow implementation now runs the MIRI checks as part of CI, but it 
does not pass cleanly

For example:
https://github.com/apache/arrow/pull/9535/checks?check_run_id=1941313240

{code}

   Compiling criterion v0.3.4
   Compiling h2 v0.3.0
   Compiling tower v0.4.5
   Compiling hyper v0.14.4
error[E0463]: can't find crate for `tracing`
  --> 
/home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.4/src/lib.rs:68:1
   |
68 | extern crate tracing;
   | ^ can't find crate

error: aborting due to previous error
{code}

Previously MIRI ran but the check failed in FFI somewhere





[jira] [Created] (ARROW-11712) [Rust][DataFusion] Introduce PlanRewriter for rewriting plans

2021-02-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11712:
---

 Summary: [Rust][DataFusion] Introduce PlanRewriter for rewriting 
plans
 Key: ARROW-11712
 URL: https://issues.apache.org/jira/browse/ARROW-11712
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb


Introduce a PlanRewriter to encapsulate visiting all logical plan nodes and 
rewriting them bottom up (and get rid of utils::inputs, utils::exprs, etc)





[jira] [Created] (ARROW-11711) [Rust][DataFusion] Rename ExpressionVisitor --> ExprVisitor and standardize input

2021-02-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11711:
---

 Summary: [Rust][DataFusion] Rename ExpressionVisitor --> 
ExprVisitor and standardize input
 Key: ARROW-11711
 URL: https://issues.apache.org/jira/browse/ARROW-11711
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb


Rename ExpressionVisitor to ExprVisitor for consistency, and change it to use 
` self` rather than consuming the visitor, for consistency with `PlanVisitor` (as 
well as the soon to be created `ExprVisitor`)





[jira] [Created] (ARROW-11710) [Rust][DataFusion] Implement ExprRewriter to avoid tree traversal redundancy

2021-02-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11710:
---

 Summary: [Rust][DataFusion] Implement ExprRewriter to avoid tree 
traversal redundancy
 Key: ARROW-11710
 URL: https://issues.apache.org/jira/browse/ARROW-11710
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb



The idea is to:
1. Reduce the amount of repetition in optimizer rules to make them easier to 
implement

2. Reduce the amount of repetition to make it easier to see the actual logic 
(rather than having it intertwined with the code needed to do recursion)

3. Set the stage for a more general `PlanRewriter` that doesn't have to clone 
its input and can instead take its input by value and consume it

The plan is to make an ExprRewriter, the mutable counterpart to `ExpressionVisitor`, 
and demonstrate its usefulness by rewriting several expression transformation 
passes using it.
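
To make the idea concrete, here is a rough sketch of what such a rewriter could look like. Everything in it is an assumption for illustration: the `Expr` enum is a tiny stand-in, and the `mutate` / `rewrite` names and the `RemoveDoubleNot` pass are hypothetical, not the eventual DataFusion API.

```rust
/// Tiny stand-in for DataFusion's `Expr`, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(i64),
    Not(Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Hypothetical rewriter trait: the recursion lives in `Expr::rewrite`,
/// so a pass only has to say what to do with a single node.
pub trait ExprRewriter {
    /// Called on each node after its children have been rewritten.
    fn mutate(&mut self, expr: Expr) -> Expr;
}

impl Expr {
    /// Rewrites the tree bottom up, taking `self` by value so no clone is needed.
    pub fn rewrite<R: ExprRewriter>(self, rewriter: &mut R) -> Expr {
        let expr = match self {
            Expr::Not(e) => Expr::Not(Box::new((*e).rewrite(rewriter))),
            Expr::And(l, r) => Expr::And(
                Box::new((*l).rewrite(rewriter)),
                Box::new((*r).rewrite(rewriter)),
            ),
            leaf => leaf,
        };
        rewriter.mutate(expr)
    }
}

/// Example pass: removes double negation, `NOT (NOT x)` becomes `x`.
pub struct RemoveDoubleNot;

impl ExprRewriter for RemoveDoubleNot {
    fn mutate(&mut self, expr: Expr) -> Expr {
        match expr {
            Expr::Not(inner) => match *inner {
                Expr::Not(e) => *e,
                other => Expr::Not(Box::new(other)),
            },
            other => other,
        }
    }
}
```

The pass itself contains no recursion at all, which is the point of items 1 and 2 above.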





[jira] [Created] (ARROW-11709) [Rust][DataFusion] Move `expressions` and `inputs` into LogicalPlan rather than helpers in util

2021-02-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11709:
---

 Summary: [Rust][DataFusion] Move `expressions` and `inputs` into 
LogicalPlan rather than helpers in util
 Key: ARROW-11709
 URL: https://issues.apache.org/jira/browse/ARROW-11709
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb



Move `expressions` and `inputs` into LogicalPlan rather than helpers in util, 
and use Visitor rather than hard coded list

Goal is to consolidate the expression walking in one place





[jira] [Created] (ARROW-11692) [Rust][DataFusion] Improve documentation on Optimizer

2021-02-18 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11692:
---

 Summary: [Rust][DataFusion] Improve documentation on Optimizer
 Key: ARROW-11692
 URL: https://issues.apache.org/jira/browse/ARROW-11692
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb








[jira] [Created] (ARROW-11690) [Rust][DataFusion] Avoid Expr::clone in Expr builder methods

2021-02-18 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11690:
---

 Summary: [Rust][DataFusion] Avoid Expr::clone in Expr builder 
methods
 Key: ARROW-11690
 URL: https://issues.apache.org/jira/browse/ARROW-11690
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Andrew Lamb








[jira] [Created] (ARROW-11689) [Rust][DataFusion] Reduce copies in DataFusion LogicalPlan and Expr creation

2021-02-18 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11689:
---

 Summary: [Rust][DataFusion] Reduce copies in DataFusion 
LogicalPlan and Expr creation
 Key: ARROW-11689
 URL: https://issues.apache.org/jira/browse/ARROW-11689
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Andrew Lamb


The theme of this overall epic is to make the plan and expression rewriting phases 
of DataFusion more efficient by avoiding copies by leveraging the Rust type 
system

Benefits:
* More standard / idiomatic Rust usage
* Faster / more efficient (I don't have numbers to back this up)

Downsides:
* These will be backwards incompatible changes


h1. Background

Many things in DataFusion look like

Input --transformation--> output

And the input is not used again. In Rust, you can model this by giving 
ownership to the transformation

At a high level the idea is to avoid so much cloning in DataFusion

The basic principle is that if a function needs to `clone` one of its arguments, 
the caller should be given the choice of when to do that. Often, the caller can 
give up ownership without issue

I envision at least the following items:
1. Optimizer passes that take `` and produce a new `LogicalPlan` 
even though most callsites do not need the original
2. Expr builder calls that take `` and return a new `Expr`
3. An expression rewriter (TODO) while running down optimizer passes


I think this style takes advantage of Rust's ownership model and will let us 
avoid a lot of copying and allocations and avoid the need for something like 
slab allocators
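
To illustrate the principle, here is a small sketch with a hypothetical, stripped down `Expr`. The functions `not_borrowed` and `not_owned` are made up for this example and are not DataFusion APIs; they just contrast the two calling conventions.

```rust
/// Stripped down stand-in for DataFusion's `Expr`, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Not(Box<Expr>),
}

/// Borrowing style: the function has to clone its argument,
/// even when the caller never uses the original again.
pub fn not_borrowed(expr: &Expr) -> Expr {
    Expr::Not(Box::new(expr.clone()))
}

/// Owning style: the argument is moved into the result, no clone.
/// A caller that still needs the original can clone explicitly.
pub fn not_owned(expr: Expr) -> Expr {
    Expr::Not(Box::new(expr))
}
```

With the owning style, a caller that is done with its input writes `not_owned(e)`, and one that still needs it writes `not_owned(e.clone())`, so the copy only happens where it is actually required.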






[jira] [Created] (ARROW-11671) [Arrow][DataFusion

2021-02-17 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11671:
---

 Summary: [Arrow][DataFusion
 Key: ARROW-11671
 URL: https://issues.apache.org/jira/browse/ARROW-11671
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb








[jira] [Created] (ARROW-11667) [Rust] Add docs for utf8 comparison functions

2021-02-17 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11667:
---

 Summary: [Rust] Add docs for utf8 comparison functions
 Key: ARROW-11667
 URL: https://issues.apache.org/jira/browse/ARROW-11667
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andrew Lamb








[jira] [Created] (ARROW-11635) [Rust] [DataFusion] Improve performance for grouping/hashing on dictionary encoded data

2021-02-15 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11635:
---

 Summary: [Rust] [DataFusion] Improve performance for 
grouping/hashing on dictionary encoded data
 Key: ARROW-11635
 URL: https://issues.apache.org/jira/browse/ARROW-11635
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb


I am recording this for posterity / potential for someone else to help if they 
want:

While adding support for GROUP BY hash, [~jorgecarleitao] had some great 
suggestions
https://github.com/apache/arrow/pull/9233#issuecomment-762174671

The initial GROUP BY implementation hashes the actual value of the dictionary 
(aka looks up the underlying value). For the common case such as when the 
dictionary contains strings, this will likely do much more work than is 
necessary. 

In the common case we should be able to hash the dictionary indexes directly, 
or  possibly skip hashing entirely and build an aggregate table directly from 
the indexes  -- this would work incredibly well for low cardinality string 
columns

What makes it tricky is that we would have to handle the case where the 
dictionary itself is not the same across all record batches (and thus indexes 
in one record batch may not correspond to the same value in another)

Some possible implementation ideas are:
Implement a special case for a shared dictionary across all input record 
batches, and have code to switch back to the more general case (hash table) if 
the dictionary ever changes.

Alternately, we could hold a hash table (or equivalent) for each distinct 
dictionary we saw and merge them all at the end. 

The second approach would likely be the fastest, but would also 
potentially consume the most resources
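
A rough sketch of the index-based idea, for a simple COUNT per group. The `DictBatch` type and `count_groups` function are hypothetical stand-ins (not arrow's `DictionaryArray`), nulls are ignored, and for brevity the per-dictionary tables are merged after every batch rather than once at the end.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one dictionary-encoded string column in a record batch.
pub struct DictBatch {
    /// The distinct values of this batch's dictionary.
    pub dictionary: Vec<String>,
    /// One dictionary index per row (assumed to be in bounds).
    pub keys: Vec<usize>,
}

/// Counts rows per distinct string value across batches whose
/// dictionaries may differ.
pub fn count_groups(batches: &[DictBatch]) -> HashMap<String, u64> {
    let mut totals: HashMap<String, u64> = HashMap::new();
    for batch in batches {
        // Per-batch aggregate table indexed directly by dictionary key:
        // no string lookup and no hashing per row.
        let mut counts = vec![0u64; batch.dictionary.len()];
        for &key in &batch.keys {
            counts[key] += 1;
        }
        // Merge step: each dictionary value is hashed at most once per batch,
        // which handles indexes meaning different values in different batches.
        for (value, count) in batch.dictionary.iter().zip(counts) {
            if count > 0 {
                *totals.entry(value.clone()).or_insert(0) += count;
            }
        }
    }
    totals
}
```

For low cardinality columns the per-row work is just an array increment, and the string hashing cost scales with the dictionary size rather than the row count.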





[jira] [Created] (ARROW-11620) [Rust] [DataFusion] Inconsistent use of Box and Arc for TableProvider

2021-02-13 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11620:
---

 Summary: [Rust] [DataFusion] Inconsistent use of Box and Arc for 
TableProvider
 Key: ARROW-11620
 URL: https://issues.apache.org/jira/browse/ARROW-11620
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andrew Lamb
Assignee: Andrew Lamb


The API inconsistently uses Box and Arc -- we 
should standardize on Arc





[jira] [Created] (ARROW-11602) [Rust] Clippy CI is failing

2021-02-11 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11602:
---

 Summary: [Rust] Clippy CI is failing
 Key: ARROW-11602
 URL: https://issues.apache.org/jira/browse/ARROW-11602
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andrew Lamb
Assignee: Andrew Lamb


CI uses "stable" Rust, and stable was updated to 1.50 today: 
https://blog.rust-lang.org/2021/02/11/Rust-1.50.0.html

The new clippy is pickier, resulting in many clippy warnings such as 
https://github.com/apache/arrow/pull/9469/checks?check_run_id=1881854256

We need to get CI back green





[jira] [Created] (ARROW-11594) [Rust] Support pretty printing with NullArrays

2021-02-11 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11594:
---

 Summary: [Rust] Support pretty printing with NullArrays
 Key: ARROW-11594
 URL: https://issues.apache.org/jira/browse/ARROW-11594
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andrew Lamb




The whole point of `NullArray::new_with_type` is to be able to cheaply 
construct entirely null columns, with a smaller memory footprint.

Currently trying to print them out causes a panic:

{code}
#[test]
fn test_pretty_format_null() -> Result<()> {
    // define a schema.
    let schema = Arc::new(Schema::new(vec![
        Field::new("a", DataType::Utf8, true),
        Field::new("b", DataType::Int32, true),
    ]));

    let num_rows = 4;

    // define data (null)
    let batch = RecordBatch::try_new(
        schema,
        vec![
            Arc::new(NullArray::new_with_type(num_rows, DataType::Utf8)),
            Arc::new(NullArray::new_with_type(num_rows, DataType::Int32)),
        ],
    )?;

    // this call currently panics
    let table = pretty_format_batches(&[batch])?;
    Ok(())
}

{code}

Panics:

{code}

failures:

 util::pretty::tests::test_pretty_format_null stdout 
thread 'util::pretty::tests::test_pretty_format_null' panicked at 'called 
`Option::unwrap()` on a `None` value', arrow/src/util/display.rs:201:27

{code}





[jira] [Created] (ARROW-11576) [Rust] Remove unused variable in example

2021-02-09 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11576:
---

 Summary: [Rust] Remove unused variable in example
 Key: ARROW-11576
 URL: https://issues.apache.org/jira/browse/ARROW-11576
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb
Assignee: Andrew Lamb


As shown in 
https://github.com/apache/arrow/commit/3a380a4c4193c6683a71ba72dc31f8456bc661d5






[jira] [Created] (ARROW-11489) [Rust][DataFusion] Make DataFrame should be Send+Sync

2021-02-03 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11489:
---

 Summary: [Rust][DataFusion] Make DataFrame should be Send+Sync
 Key: ARROW-11489
 URL: https://issues.apache.org/jira/browse/ARROW-11489
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andrew Lamb
Assignee: Andrew Lamb


Inspired by a question on the mailing list
https://lists.apache.org/thread.html/r8f81fae08346817fa283804037ed79a4309bb54aa8ed77c354d7baf0%40%3Cuser.arrow.apache.org%3E

Things need to be `Send + Sync` in order to be sent between threads (or async 
tasks). Thus we should make DataFrame require Send + Sync as well.
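
A minimal sketch of what the bound buys us. The trait here is a hypothetical one-method stand-in for DataFusion's `DataFrame`, and `InMemoryFrame` / `count_on_worker` are made up for the example.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical one-method stand-in for the `DataFrame` trait.
/// The `Send + Sync` supertraits are what make `Arc<dyn DataFrame>`
/// usable from other threads.
pub trait DataFrame: Send + Sync {
    fn row_count(&self) -> usize;
}

/// Made up implementation for the example.
pub struct InMemoryFrame {
    pub rows: Vec<i64>,
}

impl DataFrame for InMemoryFrame {
    fn row_count(&self) -> usize {
        self.rows.len()
    }
}

/// Moves the shared DataFrame into a worker thread. Without the
/// `Send + Sync` bound on the trait this would not compile.
pub fn count_on_worker(df: Arc<dyn DataFrame>) -> usize {
    thread::spawn(move || df.row_count())
        .join()
        .expect("worker thread panicked")
}
```

The same reasoning applies to async tasks spawned on a multi-threaded runtime, which also require their captured state to be `Send`.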





[jira] [Created] (ARROW-11457) [Rust] Make string comparisson kernels generic over Utf8 and LargeUtf8

2021-02-01 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11457:
---

 Summary: [Rust] Make string comparisson kernels generic over Utf8 
and LargeUtf8 
 Key: ARROW-11457
 URL: https://issues.apache.org/jira/browse/ARROW-11457
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb
Assignee: Ritchie








[jira] [Created] (ARROW-11414) [Rust] Reduce copies in Schema::try_merge

2021-01-28 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11414:
---

 Summary: [Rust] Reduce copies in Schema::try_merge
 Key: ARROW-11414
 URL: https://issues.apache.org/jira/browse/ARROW-11414
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb


https://github.com/apache/arrow/blob/ab5fc979c69ccc5dde07e1bc1467b02951b4b7e9/rust/arrow/src/datatypes.rs#L1832-L1860

I was looking at this code yesterday while using it in IOx -- 
https://github.com/influxdata/influxdb_iox/pull/703

Even though Schema::try_merge requires a slice of Schemas (not schema refs), it 
copies all of its fields. This is not ideal in the common case where most of 
the fields in the Schema will be the same
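
One possible direction, sketched with hypothetical simplified `Field` and `Schema` types (the data type is just a string here, and this `try_merge` is not the arrow signature): take the schemas by value so fields can be moved into the result instead of cloned.

```rust
/// Simplified stand-ins for arrow's `Field` and `Schema`, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
}

/// Merges schemas by consuming them: a field seen for the first time is
/// moved into the result, a repeated field is simply dropped.
pub fn try_merge(schemas: impl IntoIterator<Item = Schema>) -> Result<Schema, String> {
    let mut merged: Vec<Field> = Vec::new();
    for schema in schemas {
        for field in schema.fields {
            match merged.iter().position(|f| f.name == field.name) {
                Some(i) if merged[i].data_type != field.data_type => {
                    return Err(format!("conflicting types for field {}", field.name));
                }
                Some(_) => {} // already present with the same type
                None => merged.push(field), // moved, not cloned
            }
        }
    }
    Ok(Schema { fields: merged })
}
```

The linear `position` lookup is only there to keep the sketch short; a real implementation would likely want something better for wide schemas.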





[jira] [Created] (ARROW-11375) [Rust] CI fails due to deprecation warning in clippy

2021-01-25 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11375:
---

 Summary: [Rust] CI fails due to deprecation warning in clippy
 Key: ARROW-11375
 URL: https://issues.apache.org/jira/browse/ARROW-11375
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Andrew Lamb
Assignee: Jorge Leitão


Rust clippy lint test on CI started failing with this error:

{code}
   Compiling arrow-flight v3.0.0-SNAPSHOT (/__w/arrow/arrow/rust/arrow-flight)
error: use of deprecated struct `criterion::Benchmark`: Please use 
BenchmarkGroups instead.
  --> arrow/benches/builder.rs:39:9
   |
39 | Benchmark::new("bench_primitive", move |b| {
   | ^^
   |
   = note: `-D deprecated` implied by `-D warnings`

error: use of deprecated struct `criterion::Benchmark`: Please use 
BenchmarkGroups instead.
  --> arrow/benches/builder.rs:62:9
   |
62 | Benchmark::new("bench_bool", move |b| {
   | ^^

error: use of deprecated associated function 
`criterion::Criterionbench`: Please use BenchmarkGroups instead.
  --> arrow/benches/builder.rs:37:7
   |
37 | c.bench(
   |   ^

error: use of deprecated associated function 
`criterion::Criterionbench`: Please use BenchmarkGroups instead.
  --> arrow/benches/builder.rs:60:7
   |
60 | c.bench(
   |   ^
{code}

It appears related to the latest release of criterion: 
https://crates.io/crates/criterion/0.3.4 (On Jan 24 2021)







[jira] [Created] (ARROW-11330) [Rust][DataFusion] Add ExpressionVisitor pattern

2021-01-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11330:
---

 Summary: [Rust][DataFusion] Add ExpressionVisitor pattern
 Key: ARROW-11330
 URL: https://issues.apache.org/jira/browse/ARROW-11330
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb
Assignee: Andrew Lamb








[jira] [Created] (ARROW-11327) [Rust] [DataFusion] Add DictionaryArray support for create_batch_empty

2021-01-20 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11327:
---

 Summary: [Rust] [DataFusion] Add DictionaryArray support for 
create_batch_empty
 Key: ARROW-11327
 URL: https://issues.apache.org/jira/browse/ARROW-11327
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Andrew Lamb
Assignee: Andrew Lamb


The `create_batch_empty` function is used for creating output during aggregation. 
As part of my plan for better dictionary support it also needs to support 
DictionaryArray.





[jira] [Created] (ARROW-11323) [Rust][DataFusion] with queries with ORDER BY or GROUP BY that return no

2021-01-19 Thread Andrew Lamb (Jira)
Andrew Lamb created ARROW-11323:
---

 Summary: [Rust][DataFusion]  with queries with ORDER BY or GROUP 
BY that return no 
 Key: ARROW-11323
 URL: https://issues.apache.org/jira/browse/ARROW-11323
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Andrew Lamb


If you run a SQL query in DataFusion whose predicates produce no rows 
and that also includes a GROUP BY or ORDER BY clause, you get the following error:

Error of "ArrowError(ComputeError("concat requires input of at least one 
array"))"

Here are two test cases that show the problem: 
https://github.com/apache/arrow/blob/master/rust/datafusion/src/execution/context.rs#L889

{code}
#[tokio::test]
async fn sort_empty() -> Result<()> {
    // The predicate on this query purposely generates no results
    let results =
        execute("SELECT c1, c2 FROM test WHERE c1 > 10 ORDER BY c1 DESC, c2 ASC", 4).await?;
    assert_eq!(results.len(), 0);
    Ok(())
}


#[tokio::test]
async fn aggregate_empty() -> Result<()> {
    // The predicate on this query purposely generates no results
    let results = execute("SELECT SUM(c1), SUM(c2) FROM test where c1 > 10", 4).await?;
    assert_eq!(results.len(), 0);
    Ok(())
}

{code}
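
One way to picture a fix is to guard the concat call when there is no input. The sketch below is only an illustration under that assumption: `concat` here is a hypothetical stand-in operating on `Vec<i32>` (not the arrow kernel) that mirrors the reported error, and `concat_or_empty` is a made up wrapper.

```rust
/// Hypothetical stand-in for the concat kernel: like the error above,
/// it refuses an empty list of inputs.
pub fn concat(arrays: &[Vec<i32>]) -> Result<Vec<i32>, String> {
    if arrays.is_empty() {
        return Err("concat requires input of at least one array".to_string());
    }
    Ok(arrays.iter().flatten().copied().collect())
}

/// Guarded variant: when the input produced no batches at all,
/// return an empty result instead of an error.
pub fn concat_or_empty(arrays: &[Vec<i32>]) -> Result<Vec<i32>, String> {
    if arrays.is_empty() {
        Ok(Vec::new())
    } else {
        concat(arrays)
    }
}
```

In the real operators the empty result would also need to carry the right schema, which this sketch does not attempt to show.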




