[jira] [Assigned] (ARROW-5530) [C++] Add options to ValueCount/Unique/DictEncode kernel to toggle null behavior

2021-04-17 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc reassigned ARROW-5530:
-

Assignee: Rok Mihevc

> [C++] Add options to ValueCount/Unique/DictEncode kernel to toggle null 
> behavior
> 
>
> Key: ARROW-5530
> URL: https://issues.apache.org/jira/browse/ARROW-5530
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Francois Saint-Jacques
>Assignee: Rok Mihevc
>Priority: Major
>  Labels: analytics
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-11759) [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-04-17 Thread Rok Mihevc (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc reassigned ARROW-11759:
--

Assignee: Rok Mihevc

> [C++] Kernel to extract datetime components (year, month, day, etc) from 
> timestamp type
> ---
>
> Key: ARROW-11759
> URL: https://issues.apache.org/jira/browse/ARROW-11759
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Joris Van den Bossche
>Assignee: Rok Mihevc
>Priority: Major
>
> It can be very useful to extract certain "fields" from the timestamp, such as 
> the year, month, day, etc.
> See eg 
> https://pandas.pydata.org/docs/user_guide/timeseries.html#time-date-components
>  for the ones available in pandas. 
> Using pandas as an example, there are the basic components of the datetime:
> {code}
> >>> ts = pd.Timestamp.now()
> >>> ts
> Timestamp('2021-02-24 10:47:54.294504')
> >>> ts.year
> 2021
> >>> ts.month
> 2
> >>> ts.day
> 24
> >>> ts.hour
> 10
> >>> ts.minute
> 47
> >>> ts.second
> 54
> >>> ts.microsecond
> 294504
> >>> ts.nanosecond
> 0
> {code}
> (only for the sub-second part, it is not fully clear how to divide it into 
> milliseconds, microseconds, etc.)
> In addition, there are also some more "advanced" components, like:
> {code}
> >>> ts.dayofyear
> 55
> >>> ts.dayofweek
> 2
> >>> ts.week
> 8
> >>> ts.isocalendar()
> (2021, 8, 3)
> {code}
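
As a rough illustration of the components listed above, here is a minimal sketch using only Python's standard `datetime` module. It is not the proposed Arrow kernel and not pandas; the function name `datetime_components` is hypothetical, and the sub-second part is reported as microseconds only, which is one possible choice.

```python
from datetime import datetime


def datetime_components(ts: datetime) -> dict:
    """Return basic and "advanced" components of a single timestamp."""
    # isocalendar() yields (ISO year, ISO week number, ISO weekday).
    iso_year, iso_week, iso_weekday = ts.isocalendar()
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "minute": ts.minute,
        "second": ts.second,
        "microsecond": ts.microsecond,
        "dayofyear": ts.timetuple().tm_yday,
        "dayofweek": ts.weekday(),  # Monday == 0, matching pandas' dayofweek
        "week": iso_week,
        "isocalendar": (iso_year, iso_week, iso_weekday),
    }


# The timestamp value is taken from the example in the issue description.
components = datetime_components(datetime(2021, 2, 24, 10, 47, 54, 294504))
```

A columnar kernel would presumably compute each field for a whole array at once; the dictionary here only makes the set of candidate fields concrete.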





[jira] [Assigned] (ARROW-12437) [Rust] [Ballista] Ballista plans must not include RepartitionExec

2021-04-17 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-12437:
--

Assignee: Andy Grove

> [Rust] [Ballista] Ballista plans must not include RepartitionExec
> -
>
> Key: ARROW-12437
> URL: https://issues.apache.org/jira/browse/ARROW-12437
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - Ballista
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 5.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Ballista plans must not include RepartitionExec because it produces 
> incorrect results. Ballista needs to manage its own repartitioning in a 
> distributed-aware way later on. For now we just need to configure the 
> DataFusion context to disable repartition.





[jira] [Resolved] (ARROW-12437) [Rust] [Ballista] Ballista plans must not include RepartitionExec

2021-04-17 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-12437.

Fix Version/s: 5.0.0
   Resolution: Fixed

Issue resolved by pull request 10086
[https://github.com/apache/arrow/pull/10086]

> [Rust] [Ballista] Ballista plans must not include RepartitionExec
> -
>
> Key: ARROW-12437
> URL: https://issues.apache.org/jira/browse/ARROW-12437
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - Ballista
>Reporter: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 5.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Ballista plans must not include RepartitionExec because it produces 
> incorrect results. Ballista needs to manage its own repartitioning in a 
> distributed-aware way later on. For now we just need to configure the 
> DataFusion context to disable repartition.





[jira] [Updated] (ARROW-12437) [Rust] [Ballista] Ballista plans must not include RepartitionExec

2021-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-12437:
---
Labels: pull-request-available  (was: )

> [Rust] [Ballista] Ballista plans must not include RepartitionExec
> -
>
> Key: ARROW-12437
> URL: https://issues.apache.org/jira/browse/ARROW-12437
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - Ballista
>Reporter: Andy Grove
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Ballista plans must not include RepartitionExec because it produces 
> incorrect results. Ballista needs to manage its own repartitioning in a 
> distributed-aware way later on. For now we just need to configure the 
> DataFusion context to disable repartition.





[jira] [Created] (ARROW-12437) [Rust] [Ballista] Ballista plans must not include RepartitionExec

2021-04-17 Thread Andy Grove (Jira)
Andy Grove created ARROW-12437:
--

 Summary: [Rust] [Ballista] Ballista plans must not include 
RepartitionExec
 Key: ARROW-12437
 URL: https://issues.apache.org/jira/browse/ARROW-12437
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust - Ballista
Reporter: Andy Grove


Ballista plans must not include RepartitionExec because it produces incorrect 
results. Ballista needs to manage its own repartitioning in a distributed-aware 
way later on. For now we just need to configure the DataFusion context to 
disable repartition.





[jira] [Updated] (ARROW-12436) [Rust][Ballista] Add watch capabilities to config backend trait

2021-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-12436:
---
Labels: pull-request-available  (was: )

> [Rust][Ballista] Add watch capabilities to config backend trait
> ---
>
> Key: ARROW-12436
> URL: https://issues.apache.org/jira/browse/ARROW-12436
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust - Ballista
>Reporter: Ximo Guanter
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [arrow/lib.rs at 66aa3e7c365a8d4c4eca6e23668f2988e714b493 · apache/arrow 
> (github.com)|https://github.com/apache/arrow/blob/66aa3e7c365a8d4c4eca6e23668f2988e714b493/rust/ballista/rust/scheduler/src/lib.rs#L183]





[jira] [Resolved] (ARROW-12334) [Rust] [Ballista] Aggregate queries producing incorrect results

2021-04-17 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-12334.

Fix Version/s: 5.0.0
   Resolution: Fixed

Issue resolved by pull request 10083
[https://github.com/apache/arrow/pull/10083]

> [Rust] [Ballista] Aggregate queries producing incorrect results
> ---
>
> Key: ARROW-12334
> URL: https://issues.apache.org/jira/browse/ARROW-12334
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - Ballista
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 5.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I just ran benchmarks for the first time in a while and I see duplicate 
> entries for group by keys.
>  
> For example, query 1 has "group by l_returnflag, l_linestatus" and I see 
> multiple results with l_returnflag = 'A' and l_linestatus = 'F'.





[jira] [Updated] (ARROW-12433) [Rust] Builds failing due to new flatbuffer release introducing const generics

2021-04-17 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-12433:
---
Component/s: Rust

> [Rust] Builds failing due to new flatbuffer release introducing const generics
> --
>
> Key: ARROW-12433
> URL: https://issues.apache.org/jira/browse/ARROW-12433
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 4.0.0
>Reporter: Andy Grove
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I filed [https://github.com/google/flatbuffers/issues/6572] but for now we 
> should pin the dependency to 0.8.3





[jira] [Resolved] (ARROW-12433) [Rust] Builds failing due to new flatbuffer release introducing const generics

2021-04-17 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-12433.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 10082
[https://github.com/apache/arrow/pull/10082]

> [Rust] Builds failing due to new flatbuffer release introducing const generics
> --
>
> Key: ARROW-12433
> URL: https://issues.apache.org/jira/browse/ARROW-12433
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Andy Grove
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I filed [https://github.com/google/flatbuffers/issues/6572] but for now we 
> should pin the dependency to 0.8.3





[jira] [Created] (ARROW-12436) [Rust][Ballista] Add watch capabilities to config backend trait

2021-04-17 Thread Ximo Guanter (Jira)
Ximo Guanter created ARROW-12436:


 Summary: [Rust][Ballista] Add watch capabilities to config backend 
trait
 Key: ARROW-12436
 URL: https://issues.apache.org/jira/browse/ARROW-12436
 Project: Apache Arrow
  Issue Type: Task
  Components: Rust - Ballista
Reporter: Ximo Guanter


[arrow/lib.rs at 66aa3e7c365a8d4c4eca6e23668f2988e714b493 · apache/arrow 
(github.com)|https://github.com/apache/arrow/blob/66aa3e7c365a8d4c4eca6e23668f2988e714b493/rust/ballista/rust/scheduler/src/lib.rs#L183]





[jira] [Resolved] (ARROW-12419) [Java] flatc is not used in mvn

2021-04-17 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-12419.
--
Fix Version/s: 5.0.0
   Resolution: Fixed

Issue resolved by pull request 10067
[https://github.com/apache/arrow/pull/10067]

> [Java] flatc is not used in mvn
> ---
>
> Key: ARROW-12419
> URL: https://issues.apache.org/jira/browse/ARROW-12419
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 4.0.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 5.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> ARROW-12111 removed the usage of flatc during the build process in mvn. Thus, 
> it is not necessary to explicitly download flatc for s390x.





[jira] [Updated] (ARROW-12435) [Rust][DataFusion] Remove unnecessary references to namespace in executor

2021-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-12435:
---
Labels: pull-request-available  (was: )

> [Rust][DataFusion] Remove unnecessary references to namespace in executor
> -
>
> Key: ARROW-12435
> URL: https://issues.apache.org/jira/browse/ARROW-12435
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Rust - Ballista
>Reporter: Ximo Guanter
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There is no need to support multiple executor clusters from a scheduler, so 
> the namespace of an executor is implicitly defined by the scheduler it 
> connects to. See 
> [https://the-asf.slack.com/archives/C01QUFS30TD/p1618679585211100] for more 
> context





[jira] [Created] (ARROW-12435) [Rust][DataFusion] Remove unnecessary references to namespace in executor

2021-04-17 Thread Ximo Guanter (Jira)
Ximo Guanter created ARROW-12435:


 Summary: [Rust][DataFusion] Remove unnecessary references to 
namespace in executor
 Key: ARROW-12435
 URL: https://issues.apache.org/jira/browse/ARROW-12435
 Project: Apache Arrow
  Issue Type: Task
  Components: Rust - Ballista
Reporter: Ximo Guanter


There is no need to support multiple executor clusters from a scheduler, so the 
namespace of an executor is implicitly defined by the scheduler it connects to. 
See [https://the-asf.slack.com/archives/C01QUFS30TD/p1618679585211100] for more 
context





[jira] [Updated] (ARROW-12334) [Rust] [Ballista] Aggregate queries producing incorrect results

2021-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-12334:
---
Labels: pull-request-available  (was: )

> [Rust] [Ballista] Aggregate queries producing incorrect results
> ---
>
> Key: ARROW-12334
> URL: https://issues.apache.org/jira/browse/ARROW-12334
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - Ballista
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I just ran benchmarks for the first time in a while and I see duplicate 
> entries for group by keys.
>  
> For example, query 1 has "group by l_returnflag, l_linestatus" and I see 
> multiple results with l_returnflag = 'A' and l_linestatus = 'F'.





[jira] [Commented] (ARROW-12433) [Rust] Builds failing due to new flatbuffer release introducing const generics

2021-04-17 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324321#comment-17324321
 ] 

Andy Grove commented on ARROW-12433:


Thanks [~alippai] that is a good suggestion

 

So the issue is that our builds with nightly Rust are failing (our SIMD feature 
requires nightly, and the nightly version of Rust we use does not have const 
generics yet). I went ahead with a PR to pin to 0.8.3 to fix our builds.
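
To make the effect of pinning concrete, here is a minimal Python sketch of Cargo's version-matching rules. It assumes, hypothetically, that the dependency was previously declared with a plain (caret) requirement such as `"0.8.3"`; how the Arrow `Cargo.toml` actually declared it is not shown in this thread. The helper names are illustrative and not part of any Cargo API.

```python
def parse(version: str) -> tuple:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def caret_upper_bound(req: tuple) -> tuple:
    """Exclusive upper bound for a caret requirement.

    The leftmost non-zero component must stay fixed.
    """
    major, minor, patch = req
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def caret_matches(requirement: str, candidate: str) -> bool:
    """Cargo's default rule for a bare requirement such as "0.8.3"."""
    req, cand = parse(requirement), parse(candidate)
    return req <= cand < caret_upper_bound(req)


def exact_matches(requirement: str, candidate: str) -> bool:
    """Rule for an exact pin, written "=0.8.3" in Cargo.toml."""
    return parse(requirement) == parse(candidate)


picks_up_new_release = caret_matches("0.8.3", "0.8.4")
pinned_accepts_new_release = exact_matches("0.8.3", "0.8.4")
```

Under a caret requirement a newly published 0.8.4 is an acceptable match, while an exact pin rejects it, which is why pinning shields the build from the new release.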

> [Rust] Builds failing due to new flatbuffer release introducing const generics
> --
>
> Key: ARROW-12433
> URL: https://issues.apache.org/jira/browse/ARROW-12433
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Andy Grove
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I filed [https://github.com/google/flatbuffers/issues/6572] but for now we 
> should pin the dependency to 0.8.3





[jira] [Updated] (ARROW-12433) [Rust] Builds failing due to new flatbuffer release introducing const generics

2021-04-17 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-12433:
---
Priority: Blocker  (was: Critical)

> [Rust] Builds failing due to new flatbuffer release introducing const generics
> --
>
> Key: ARROW-12433
> URL: https://issues.apache.org/jira/browse/ARROW-12433
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Andy Grove
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I filed [https://github.com/google/flatbuffers/issues/6572] but for now we 
> should pin the dependency to 0.8.3





[jira] [Updated] (ARROW-12433) [Rust] Builds failing due to new flatbuffer release introducing const generics

2021-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-12433:
---
Labels: pull-request-available  (was: )

> [Rust] Builds failing due to new flatbuffer release introducing const generics
> --
>
> Key: ARROW-12433
> URL: https://issues.apache.org/jira/browse/ARROW-12433
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Andy Grove
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I filed [https://github.com/google/flatbuffers/issues/6572] but for now we 
> should pin the dependency to 0.8.3





[jira] [Updated] (ARROW-12434) [Rust] [Ballista] Show executed plans with metrics

2021-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-12434:
---
Labels: pull-request-available  (was: )

> [Rust] [Ballista] Show executed plans with metrics
> --
>
> Key: ARROW-12434
> URL: https://issues.apache.org/jira/browse/ARROW-12434
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust - Ballista
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 5.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Show executed plans with metrics to help with debugging and performance tuning





[jira] [Created] (ARROW-12434) [Rust] [Ballista] Show executed plans with metrics

2021-04-17 Thread Andy Grove (Jira)
Andy Grove created ARROW-12434:
--

 Summary: [Rust] [Ballista] Show executed plans with metrics
 Key: ARROW-12434
 URL: https://issues.apache.org/jira/browse/ARROW-12434
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust - Ballista
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 5.0.0


Show executed plans with metrics to help with debugging and performance tuning





[jira] [Commented] (ARROW-12433) [Rust] Builds failing due to new flatbuffer release introducing const generics

2021-04-17 Thread Adam Lippai (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324312#comment-17324312
 ] 

Adam Lippai commented on ARROW-12433:
-

[~andygrove] I don't know if it makes any difference, I filed this 
https://github.com/google/flatbuffers/pull/6573/files

> [Rust] Builds failing due to new flatbuffer release introducing const generics
> --
>
> Key: ARROW-12433
> URL: https://issues.apache.org/jira/browse/ARROW-12433
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Andy Grove
>Priority: Critical
>
> I filed [https://github.com/google/flatbuffers/issues/6572] but for now we 
> should pin the dependency to 0.8.3





[jira] [Commented] (ARROW-12433) [Rust] Builds failing due to new flatbuffer release introducing const generics

2021-04-17 Thread Adam Lippai (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324311#comment-17324311
 ] 

Adam Lippai commented on ARROW-12433:
-

No, likely I'm the one who doesn't understand something here.
I just followed the link in the error message to the closed issue, looked into 
the feature, and found the blog post saying that it has shipped (in a minimal version).

For me, flatbuffers 0.8.4 and arrow master compile:
{code:java}
alippai:/mnt/c/Repositories/arrow/rust/arrow$ cargo clean
alippai:/mnt/c/Repositories/arrow/rust/arrow$ cargo --version --verbose
cargo 1.51.0 (43b129a20 2021-03-16)
release: 1.51.0
commit-hash: 43b129a20fbf1ede0df411396ccf0c024bf34134
commit-date: 2021-03-16
alippai@DESKTOP-HTTH82C:/mnt/c/Repositories/arrow/rust/arrow$ cargo build
   Compiling autocfg v1.0.1
   Compiling libc v0.2.93
   Compiling proc-macro2 v1.0.24
   Compiling memchr v2.3.4
   Compiling unicode-xid v0.2.1
   Compiling syn v1.0.67
   Compiling serde v1.0.125
   Compiling cfg-if v1.0.0
   Compiling getrandom v0.1.16
   Compiling ryu v1.0.5
   Compiling bitflags v1.2.1
   Compiling byteorder v1.4.3
   Compiling lexical-core v0.7.5
   Compiling hashbrown v0.9.1
   Compiling cfg_aliases v0.1.1
   Compiling itoa v0.4.7
   Compiling serde_json v1.0.64
   Compiling ppv-lite86 v0.2.10
   Compiling serde_derive v1.0.125
   Compiling lazy_static v1.4.0
   Compiling regex-syntax v0.6.23
   Compiling arrayvec v0.5.2
   Compiling smallvec v1.6.1
   Compiling static_assertions v1.1.0
   Compiling hex v0.4.3
   Compiling arrow v4.0.0-SNAPSHOT (/mnt/c/Repositories/arrow/rust/arrow)
   Compiling regex-automata v0.1.9
   Compiling num-traits v0.2.14
   Compiling num-integer v0.1.44
   Compiling num-bigint v0.3.2
   Compiling num-rational v0.3.2
   Compiling num-iter v0.1.42
   Compiling indexmap v1.6.2
   Compiling aho-corasick v0.7.15
   Compiling csv-core v0.1.10
   Compiling quote v1.0.9
   Compiling regex v1.4.5
   Compiling time v0.1.43
   Compiling rand_core v0.5.1
   Compiling rand_chacha v0.2.2
   Compiling num-complex v0.3.1
   Compiling rand v0.7.3
   Compiling chrono v0.4.19
   Compiling bstr v0.2.15
   Compiling csv v1.1.6
   Compiling num v0.3.1
   Compiling thiserror-impl v1.0.24
   Compiling thiserror v1.0.24
   Compiling flatbuffers v0.8.4
Finished dev [unoptimized + debuginfo] target(s) in 48.18s
{code}

> [Rust] Builds failing due to new flatbuffer release introducing const generics
> --
>
> Key: ARROW-12433
> URL: https://issues.apache.org/jira/browse/ARROW-12433
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Andy Grove
>Priority: Critical
>
> I filed [https://github.com/google/flatbuffers/issues/6572] but for now we 
> should pin the dependency to 0.8.3





[jira] [Commented] (ARROW-12433) [Rust] Builds failing due to new flatbuffer release introducing const generics

2021-04-17 Thread Jira


[ 
https://issues.apache.org/jira/browse/ARROW-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324310#comment-17324310
 ] 

Daniël Heres commented on ARROW-12433:
--

I think the nightly versions of rust are outdated in the CI [~andygrove]

> [Rust] Builds failing due to new flatbuffer release introducing const generics
> --
>
> Key: ARROW-12433
> URL: https://issues.apache.org/jira/browse/ARROW-12433
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Andy Grove
>Priority: Critical
>
> I filed [https://github.com/google/flatbuffers/issues/6572] but for now we 
> should pin the dependency to 0.8.3





[jira] [Commented] (ARROW-12433) [Rust] Builds failing due to new flatbuffer release introducing const generics

2021-04-17 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324308#comment-17324308
 ] 

Andy Grove commented on ARROW-12433:


[~alippai] Am I misunderstanding this issue?

> [Rust] Builds failing due to new flatbuffer release introducing const generics
> --
>
> Key: ARROW-12433
> URL: https://issues.apache.org/jira/browse/ARROW-12433
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Andy Grove
>Priority: Critical
>
> I filed [https://github.com/google/flatbuffers/issues/6572] but for now we 
> should pin the dependency to 0.8.3





[jira] [Comment Edited] (ARROW-12430) [C++] Support LZO compression

2021-04-17 Thread Haowei Yu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324306#comment-17324306
 ] 

Haowei Yu edited comment on ARROW-12430 at 4/17/21, 4:21 PM:
-

Yes, it's for parquet.


was (Author: howryu):
Yes, it's for parquest.

> [C++] Support LZO compression
> -
>
> Key: ARROW-12430
> URL: https://issues.apache.org/jira/browse/ARROW-12430
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Haowei Yu
>Priority: Major
>
> I have some code that supports arrow compression with LZO and am willing to 
> contribute. However, I do understand there is a license concern w.r.t using 
> lzo library since it's under GPL2. I am not sure if you can take the change 
> set.





[jira] [Commented] (ARROW-12433) [Rust] Builds failing due to new flatbuffer release introducing const generics

2021-04-17 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324307#comment-17324307
 ] 

Andy Grove commented on ARROW-12433:


CI is already using 1.51 ... "latest update on 2021-03-25, rust version 1.51.0"

> [Rust] Builds failing due to new flatbuffer release introducing const generics
> --
>
> Key: ARROW-12433
> URL: https://issues.apache.org/jira/browse/ARROW-12433
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Andy Grove
>Priority: Critical
>
> I filed [https://github.com/google/flatbuffers/issues/6572] but for now we 
> should pin the dependency to 0.8.3





[jira] [Commented] (ARROW-12430) [C++] Support LZO compression

2021-04-17 Thread Haowei Yu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324306#comment-17324306
 ] 

Haowei Yu commented on ARROW-12430:
---

Yes, it's for parquet.

> [C++] Support LZO compression
> -
>
> Key: ARROW-12430
> URL: https://issues.apache.org/jira/browse/ARROW-12430
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Haowei Yu
>Priority: Major
>
> I have some code that supports arrow compression with LZO and am willing to 
> contribute. However, I do understand there is a license concern w.r.t using 
> lzo library since it's under GPL2. I am not sure if you can take the change 
> set.





[jira] [Commented] (ARROW-12433) [Rust] Builds failing due to new flatbuffer release introducing const generics

2021-04-17 Thread Adam Lippai (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324304#comment-17324304
 ] 

Adam Lippai commented on ARROW-12433:
-

Shouldn't we bump stable rust version to 1.51 instead? Ref: 
https://blog.rust-lang.org/2021/03/25/Rust-1.51.0.html

> [Rust] Builds failing due to new flatbuffer release introducing const generics
> --
>
> Key: ARROW-12433
> URL: https://issues.apache.org/jira/browse/ARROW-12433
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Andy Grove
>Priority: Critical
>
> I filed [https://github.com/google/flatbuffers/issues/6572] but for now we 
> should pin the dependency to 0.8.3





[jira] [Created] (ARROW-12433) [Rust] Builds failing due to new flatbuffer release introducing const generics

2021-04-17 Thread Andy Grove (Jira)
Andy Grove created ARROW-12433:
--

 Summary: [Rust] Builds failing due to new flatbuffer release 
introducing const generics
 Key: ARROW-12433
 URL: https://issues.apache.org/jira/browse/ARROW-12433
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Andy Grove


I filed [https://github.com/google/flatbuffers/issues/6572] but for now we 
should pin the dependency to 0.8.3





[jira] [Updated] (ARROW-12432) [Rust] [DataFusion] Add metrics for SortExec

2021-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-12432:
---
Labels: pull-request-available  (was: )

> [Rust] [DataFusion] Add metrics for SortExec
> 
>
> Key: ARROW-12432
> URL: https://issues.apache.org/jira/browse/ARROW-12432
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 5.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add metrics for SortExec





[jira] [Created] (ARROW-12432) [Rust] [DataFusion] Add metrics for SortExec

2021-04-17 Thread Andy Grove (Jira)
Andy Grove created ARROW-12432:
--

 Summary: [Rust] [DataFusion] Add metrics for SortExec
 Key: ARROW-12432
 URL: https://issues.apache.org/jira/browse/ARROW-12432
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust - DataFusion
Reporter: Andy Grove
 Fix For: 5.0.0


Add metrics for SortExec





[jira] [Commented] (ARROW-12334) [Rust] [Ballista] Aggregate queries producing incorrect results

2021-04-17 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324288#comment-17324288
 ] 

Andy Grove commented on ARROW-12334:


I tracked this down and there are two separate bugs:

1. We are getting RepartitionExec in the plan which is not compatible with 
Ballista and explodes the number of partitions (and likely causes incorrect 
results)
2. The query actually works fine and the final sort produces 2 rows, but the 
results are created by reading all the intermediate results as well
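
The second point can be illustrated with a toy model of two-stage aggregation in plain Python. This is not Ballista or DataFusion code; the function names and sample rows are hypothetical, and the aggregate is a simple row count per (l_returnflag, l_linestatus) key.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """First stage: count rows per group key within one partition."""
    counts = defaultdict(int)
    for returnflag, linestatus in rows:
        counts[(returnflag, linestatus)] += 1
    return dict(counts)


def final_aggregate(partials):
    """Second stage: merge per-partition counts into one row per key."""
    merged = defaultdict(int)
    for partial in partials:
        for key, count in partial.items():
            merged[key] += count
    return dict(merged)


# Two hypothetical partitions that both contain the key ('A', 'F').
partitions = [
    [("A", "F"), ("N", "O")],
    [("A", "F"), ("A", "F")],
]
partials = [partial_aggregate(rows) for rows in partitions]

# Treating intermediate results as output yields one row per partition per key.
intermediate_rows = [
    (key, count) for partial in partials for key, count in partial.items()
]

# Only the merged stage has exactly one row per group key.
final_rows = sorted(final_aggregate(partials).items())
```

In this model the key ('A', 'F') appears twice among the intermediate rows but once in the final rows, which matches the symptom of duplicate group-by entries when intermediate results are read alongside the final ones.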

> [Rust] [Ballista] Aggregate queries producing incorrect results
> ---
>
> Key: ARROW-12334
> URL: https://issues.apache.org/jira/browse/ARROW-12334
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - Ballista
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> I just ran benchmarks for the first time in a while and I see duplicate 
> entries for group by keys.
>  
> For example, query 1 has "group by l_returnflag, l_linestatus" and I see 
> multiple results with l_returnflag = 'A' and l_linestatus = 'F'.





[jira] [Commented] (ARROW-12430) [C++] Support LZO compression

2021-04-17 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324287#comment-17324287
 ] 

Antoine Pitrou commented on ARROW-12430:


Is it for Parquet? Otherwise I'm not sure what you'd need it for.

> [C++] Support LZO compression
> -
>
> Key: ARROW-12430
> URL: https://issues.apache.org/jira/browse/ARROW-12430
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Haowei Yu
>Priority: Major
>
> I have some code that supports Arrow compression with LZO and am willing to 
> contribute it. However, I understand there is a license concern with using 
> the LZO library, since it is under the GPLv2. I am not sure whether you can 
> take the change set.





[jira] [Commented] (ARROW-12430) [C++] Support LZO compression

2021-04-17 Thread Haowei Yu (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324280#comment-17324280
 ] 

Haowei Yu commented on ARROW-12430:
---

Ah ok, I really want Arrow to support LZO, since right now I have to keep the 
diff somewhere and reapply it whenever I upgrade the Arrow version, which is 
painful. I don't need a binary distribution; I can compile Arrow myself, and 
that is fine.






[jira] [Resolved] (ARROW-12429) [C++] MergedGeneratorTestFixture is incorrectly instantiated

2021-04-17 Thread David Li (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Li resolved ARROW-12429.
--
Fix Version/s: 5.0.0
   Resolution: Fixed

Issue resolved by pull request 10075
[https://github.com/apache/arrow/pull/10075]

> [C++] MergedGeneratorTestFixture is incorrectly instantiated
> 
>
> Key: ARROW-12429
> URL: https://issues.apache.org/jira/browse/ARROW-12429
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: David Li
>Assignee: David Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 5.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [https://gist.github.com/kou/868eaed328b348e45865747044044272#file-source-cpp-txt]
> Looks like the base class was accidentally instantiated instead of the actual 
> test





[jira] [Comment Edited] (ARROW-12430) [C++] Support LZO compression

2021-04-17 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324218#comment-17324218
 ] 

Antoine Pitrou edited comment on ARROW-12430 at 4/17/21, 10:25 AM:
---

Indeed, the license issue is a bit tricky. It is not clear whether making use 
of the LZO APIs absolutely requires adherence to the GPL by Arrow itself.

The GNU readline library (GPL-licensed) is in a similar situation and it 
[states|https://tiswww.cwru.edu/php/chet/readline/rltop.html] that you may make 
use of it inside software licensed under any GPL-compatible license (the Apache 
license 2.0 is GPL-compatible according to the FSF). However, the FSF 
[contradicts its own 
advice|https://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.en.html#IfLibraryIsGPL]
 in the GPL FAQ.

If you feel strongly about this feature, you should probably contact the LZO 
author and ask them their position, because that is what matters.

Note that, in any case, we would not distribute binaries with LZO enabled; you 
would have to compile Arrow yourself for that.


was (Author: pitrou):
Indeed, the license issue is a bit tricky. It is not clear whether making use 
of the LZO APIs absolutely requires adherence to the GPL by Arrow itself.

The GNU readline library (GPL-licensed) is in a similar situation and it 
[states|https://tiswww.cwru.edu/php/chet/readline/rltop.html] that you may make 
use of it inside software licensed under any GPL-compatible license (the Apache 
license 2.0 is GPL-compatible according to the FSF). However, the FSF 
[contradicts its own 
advice|https://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.en.html#IfLibraryIsGPL]
 in the GPL FAQ.

If you feel strongly about this feature, you should probably contact the LZO 
author and ask them their position.

Note that, in any case, we would not distribute binaries with LZO enabled; you 
would have to compile Arrow yourself for that.






[jira] [Comment Edited] (ARROW-12430) [C++] Support LZO compression

2021-04-17 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324218#comment-17324218
 ] 

Antoine Pitrou edited comment on ARROW-12430 at 4/17/21, 10:24 AM:
---

Indeed, the license issue is a bit tricky. It is not clear whether making use 
of the LZO APIs absolutely requires adherence to the GPL by Arrow itself.

The GNU readline library (GPL-licensed) is in a similar situation and it 
[states|https://tiswww.cwru.edu/php/chet/readline/rltop.html] that you may make 
use of it inside software licensed under any GPL-compatible license (the Apache 
license 2.0 is GPL-compatible according to the FSF). However, the FSF 
[contradicts its own 
advice|https://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.en.html#IfLibraryIsGPL]
 in the GPL FAQ.

If you feel strongly about this feature, you should probably contact the LZO 
author and ask them their position.

Note that, in any case, we would not distribute binaries with LZO enabled; you 
would have to compile Arrow yourself for that.


was (Author: pitrou):
Indeed, the license issue is a bit tricky. It is not clear whether making use 
of the LZO APIs absolutely requires adherence to the GPL by Arrow itself. 

The GNU readline library (GPL-licensed) is in a similar situation and it 
[states|https://tiswww.cwru.edu/php/chet/readline/rltop.html] that you may make 
use of it inside software licensed under software licensed under any 
GPL-compatible license (the Apache license 2.0 is GPL-compatible according to 
the FSF). However, the FSF [contradicts its own 
advice|https://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.en.html#IfLibraryIsGPL]
 in the GPL FAQ.

If you feel strongly about this feature, you should probably contact the LZO 
author and ask them their position.

Note that, in any case, we would not distribute binaries with LZO enabled; you 
would have to compile Arrow yourself for that.







[jira] [Commented] (ARROW-12430) [C++] Support LZO compression

2021-04-17 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324218#comment-17324218
 ] 

Antoine Pitrou commented on ARROW-12430:


Indeed, the license issue is a bit tricky. It is not clear whether making use 
of the LZO APIs absolutely requires adherence to the GPL by Arrow itself. 

The GNU readline library (GPL-licensed) is in a similar situation and it 
[states|https://tiswww.cwru.edu/php/chet/readline/rltop.html] that you may make 
use of it inside software licensed under software licensed under any 
GPL-compatible license (the Apache license 2.0 is GPL-compatible according to 
the FSF). However, the FSF [contradicts its own 
advice|https://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.en.html#IfLibraryIsGPL]
 in the GPL FAQ.

If you feel strongly about this feature, you should probably contact the LZO 
author and ask them their position.

Note that, in any case, we would not distribute binaries with LZO enabled; you 
would have to compile Arrow yourself for that.







[jira] [Created] (ARROW-12431) [Python] pa.array mask inverted when type is binary and value to be converted in numpy array

2021-04-17 Thread Daniel Nugent (Jira)
Daniel Nugent created ARROW-12431:
-

 Summary: [Python] pa.array mask inverted when type is binary and 
value to be converted in numpy array
 Key: ARROW-12431
 URL: https://issues.apache.org/jira/browse/ARROW-12431
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Daniel Nugent


{code:python}
Python 3.9.2 | packaged by conda-forge | (default, Feb 21 2021, 05:02:46)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pyarrow as pa
>>>
>>> pa.array(np.array([b'\x00']),type=pa.binary(1), mask = np.array([False]))

[
  null
]
>>> pa.array(np.array([b'\x00']),type=pa.binary(1), mask = np.array([True]))

[
  00
]
>>> pa.array([b'\x00'],type=pa.binary(1), mask = np.array([False]))

[
  00
]
>>> pa.__version__
'3.0.0'
>>> np.__version__
'1.20.1'
{code}

This happens with both FixedSizeBinary and variable-sized binary (I was working 
with FixedSizeBinary). It does not happen for integers, and presumably not for 
other types either, although I didn't check exhaustively.
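
For reference, the expected behaviour can be written down as a small pure-Python model. This is only a sketch: `expected_with_mask` is a hypothetical helper, not part of pyarrow, and it assumes the convention documented for the `mask` argument of `pa.array`, where True marks a null entry. The `reported` table is transcribed by hand from the session above.

```python
def expected_with_mask(values, mask):
    """Reference model: an entry is null (None) exactly where mask is True."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    return [None if is_null else value for value, is_null in zip(values, mask)]


# Outputs shown above for the NumPy-input path, keyed by the mask flag used.
reported = {
    False: [None],     # mask=[False] printed "null"
    True: [b"\x00"],   # mask=[True] printed "00"
}

for flag, observed in reported.items():
    expected = expected_with_mask([b"\x00"], [flag])
    status = "ok" if expected == observed else "inverted"
    print(f"mask=[{flag}]: expected {expected}, observed {observed} -> {status}")
```

Under this model both NumPy-input cases come out inverted, while the plain-list case from the session (mask False, value kept) agrees with the model.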


